Azure DocumentIntelligence (formerly FormRecognizer) REST client library for JavaScript - version 1.0.0-beta.2

Article
03/22/2024

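Extracts content, layout, and structured data from documents.

Note: Form Recognizer has been rebranded to Document Intelligence. Please check the Migration Guide from @azure/ai-form-recognizer to @azure-rest/ai-document-intelligence.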

This version of the client library defaults to the "2024-02-29-preview" version of the service.

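The following table shows the relationship between SDK versions and the API versions supported by the service:

SDK version	Supported API version of service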
1.0.0-beta.2	2024-02-29-preview
1.0.0-beta.1	2023-10-31-preview

For retired models such as "prebuilt-businessCard" and "prebuilt-document", rely on the older @azure/ai-form-recognizer library, which targets the older service API versions. For more information, see the Changelog.

The following table describes the relationship between each client and its supported API version(s):

Service API version	Supported clients	Package
2024-02-29-preview	DocumentIntelligenceClient	`@azure-rest/ai-document-intelligence` version `1.0.0-beta.2`
2023-10-31-preview	DocumentIntelligenceClient	`@azure-rest/ai-document-intelligence` version `1.0.0-beta.1`
2023-07-31	DocumentAnalysisClient and DocumentModelAdministrationClient	`@azure/ai-form-recognizer` version `^5.0.0`
2022-08-01	DocumentAnalysisClient and DocumentModelAdministrationClient	`@azure/ai-form-recognizer` version `^4.0.0`

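Getting started

Currently supported environments

LTS versions of Node.js

Prerequisites

You must have an Azure subscription to use this package.

Install the `@azure-rest/ai-document-intelligence` package

Install the Azure DocumentIntelligence (formerly FormRecognizer) REST client library for JavaScript with npm: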

npm install @azure-rest/ai-document-intelligence

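Create and authenticate a `DocumentIntelligenceClient`

To use an Azure Active Directory (AAD) token credential, provide an instance of the desired credential type obtained from the @azure/identity library.

To authenticate with AAD, you must first npm install @azure/identity.

After installation, you can choose which type of credential from @azure/identity to use. For example, DefaultAzureCredential can be used to authenticate the client.

Set the values of the client ID, tenant ID, and client secret of the AAD application as environment variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET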
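Before constructing a credential, it can help to fail fast when one of the variables above is missing. The following is a minimal sketch and not part of any Azure library: `missingEnvVars` and `REQUIRED_AAD_VARS` are hypothetical names introduced here, and the environment is passed in as a plain object so the function can be exercised without touching the real process environment.

```typescript
// Hypothetical helper (not part of @azure/identity): report which of the
// required variables are unset or empty in the given environment object.
const REQUIRED_AAD_VARS = [
  "AZURE_CLIENT_ID",
  "AZURE_TENANT_ID",
  "AZURE_CLIENT_SECRET",
] as const;

function missingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_AAD_VARS,
): string[] {
  // An empty string is treated the same as an unset variable.
  return required.filter((name) => !env[name]);
}
```

In a Node.js program you would typically call `missingEnvVars(process.env)` and throw if the returned array is not empty.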

Using a token credential

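import DocumentIntelligence from "@azure-rest/ai-document-intelligence";
// DefaultAzureCredential comes from the separately installed @azure/identity package.
import { DefaultAzureCredential } from "@azure/identity";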

const client = DocumentIntelligence(
  process.env["DOCUMENT_INTELLIGENCE_ENDPOINT"],
  new DefaultAzureCredential()
);

Using an API key

import DocumentIntelligence from "@azure-rest/ai-document-intelligence";

const client = DocumentIntelligence(process.env["DOCUMENT_INTELLIGENCE_ENDPOINT"], {
  key: process.env["DOCUMENT_INTELLIGENCE_API_KEY"],
});

Document Models

Analyze prebuilt-layout (urlSource)

const initialResponse = await client
  .path("/documentModels/{modelId}:analyze", "prebuilt-layout")
  .post({
    contentType: "application/json",
    body: {
      urlSource:
        "https://raw.githubusercontent.com/Azure/azure-sdk-for-js/6704eff082aaaf2d97c1371a28461f512f8d748a/sdk/formrecognizer/ai-form-recognizer/assets/forms/Invoice_1.pdf",
    },
    queryParameters: { locale: "en-IN" },
  });

Analyze prebuilt-layout (base64Source)

import fs from "fs";
import path from "path";

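// ASSET_PATH is assumed to be defined elsewhere as the local directory that holds the sample assets.
const filePath = path.join(ASSET_PATH, "forms", "Invoice_1.pdf");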
const base64Source = fs.readFileSync(filePath, { encoding: "base64" });
const initialResponse = await client
  .path("/documentModels/{modelId}:analyze", "prebuilt-layout")
  .post({
    contentType: "application/json",
    body: {
      base64Source,
    },
    queryParameters: { locale: "en-IN" },
  });

Continue creating the poller from the initial response

import {
  getLongRunningPoller,
  AnalyzeResultOperationOutput,
  isUnexpected,
} from "@azure-rest/ai-document-intelligence";

if (isUnexpected(initialResponse)) {
  throw initialResponse.body.error;
}
const poller = await getLongRunningPoller(client, initialResponse);
const result = (await poller.pollUntilDone()).body as AnalyzeResultOperationOutput;
console.log(result);
// {
//   status: 'succeeded',
//   createdDateTime: '2023-11-10T13:31:31Z',
//   lastUpdatedDateTime: '2023-11-10T13:31:34Z',
//   analyzeResult: {
//     apiVersion: '2023-10-31-preview',
//     .
//     .
//     .
//     contentFormat: 'text'
//   }
// }
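The sample output above carries a `status` alongside `analyzeResult`, so it can be worth checking the status before reading the result. The following is a minimal sketch under that assumption; `OperationLike` and `unwrapAnalyzeResult` are hypothetical names introduced for illustration and are not exports of the library.

```typescript
// Minimal local shape, modeled on the sample output above (an assumption,
// not the library's own type definition).
interface OperationLike<T> {
  status: string;
  analyzeResult?: T;
  error?: { code?: string; message?: string };
}

// Return the analyze result only when the operation reports success;
// otherwise throw with whatever error message is available.
function unwrapAnalyzeResult<T>(operation: OperationLike<T>): T {
  if (operation.status !== "succeeded" || operation.analyzeResult === undefined) {
    const reason = operation.error?.message ?? "no error message";
    throw new Error(`Analysis did not succeed (status: ${operation.status}): ${reason}`);
  }
  return operation.analyzeResult;
}
```

With the snippet above, this could be applied as `unwrapAnalyzeResult(result)` right after `pollUntilDone()` resolves.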

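Markdown content format

The service supports output in Markdown content format in addition to the default plain text. For now, this is only supported for "prebuilt-layout". Markdown content format is considered a friendlier format for LLM consumption in chat or automation scenarios.

The service follows the GFM spec (GitHub Flavored Markdown) for the Markdown format. It also introduces a new contentFormat property with the value "text" or "markdown" to indicate the result content format.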

import DocumentIntelligence from "@azure-rest/ai-document-intelligence";
const client = DocumentIntelligence(process.env["DOCUMENT_INTELLIGENCE_ENDPOINT"], {
  key: process.env["DOCUMENT_INTELLIGENCE_API_KEY"],
});

const initialResponse = await client
  .path("/documentModels/{modelId}:analyze", "prebuilt-layout")
  .post({
    contentType: "application/json",
    body: {
      urlSource:
        "https://raw.githubusercontent.com/Azure/azure-sdk-for-js/6704eff082aaaf2d97c1371a28461f512f8d748a/sdk/formrecognizer/ai-form-recognizer/assets/forms/Invoice_1.pdf",
    },
    queryParameters: { outputContentFormat: "markdown" }, // <-- new query parameter
  });
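Once the operation completes, the `contentFormat` property described above tells you how to treat the returned content. The following is a minimal sketch, assuming a result object that carries `content` and `contentFormat`; the `AnalyzeResultLike` interface and both functions are hypothetical names introduced for illustration, not library exports.

```typescript
// Minimal local shape (an assumption for this sketch).
interface AnalyzeResultLike {
  content: string;
  contentFormat?: "text" | "markdown";
}

// A missing contentFormat is treated as plain text, since plain text is the default.
function isMarkdownResult(result: AnalyzeResultLike): boolean {
  return result.contentFormat === "markdown";
}

// Pick a file extension to use when saving the extracted content.
function fileExtensionFor(result: AnalyzeResultLike): ".md" | ".txt" {
  return isMarkdownResult(result) ? ".md" : ".txt";
}
```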

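Query Fields

When this feature flag is specified, the service further extracts the values of the fields specified via the queryFields query parameter, supplementing any existing fields defined by the model as a fallback.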

await client.path("/documentModels/{modelId}:analyze", "prebuilt-layout").post({
  contentType: "application/json",
  body: { urlSource: "..." },
  queryParameters: {
    features: ["queryFields"],
    queryFields: ["NumberOfGuests", "StoreNumber"],
  }, // <-- new query parameter
});
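After polling completes, the requested fields still have to be read back from the result. The following is a minimal sketch, assuming each analyzed document exposes a `fields` dictionary keyed by field name and that each field carries a `content` string; `FieldLike`, `DocumentLike`, and `pickQueryFields` are hypothetical names introduced here, not library exports.

```typescript
// Minimal local shapes (assumptions for this sketch).
interface FieldLike {
  content?: string;
  confidence?: number;
}
interface DocumentLike {
  fields?: Record<string, FieldLike>;
}

// For each requested name, return the content of the first document that has
// that field, or undefined when no document contains it.
function pickQueryFields(
  documents: DocumentLike[],
  names: string[],
): Record<string, string | undefined> {
  const picked: Record<string, string | undefined> = {};
  for (const name of names) {
    const owner = documents.find((doc) => doc.fields?.[name] !== undefined);
    picked[name] = owner?.fields?.[name]?.content;
  }
  return picked;
}
```

For the request above, this would be called with the names `["NumberOfGuests", "StoreNumber"]` and the documents found in the analyze result.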

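Split Options

In the previous API versions supported by the older @azure/ai-form-recognizer library, the document splitting and classification operation ("/documentClassifiers/{classifierId}:analyze") always tried to split the input file into multiple documents.

To enable a wider set of scenarios, the service introduces a "split" query parameter with the new "2023-10-31-preview" service version. The following values are supported: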

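split: "auto"
Let the service determine where to split.

split: "none"
The entire file is treated as a single document. No splitting is performed.

split: "perPage"
Each page is treated as a separate document. Each empty page is kept as its own document.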
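The three values above can be captured in a small type so that a misspelled split mode is caught at compile time. The following is a minimal sketch that mirrors the request shape used in the analyze samples earlier; `SplitMode` and `buildClassifyRequest` are hypothetical names, and the default of "none" is a choice made for this sketch rather than a statement about the service default.

```typescript
// The split values listed above, as a string-literal union.
type SplitMode = "auto" | "none" | "perPage";

// Hypothetical helper: build the request options for a classify call.
function buildClassifyRequest(urlSource: string, split: SplitMode = "none") {
  return {
    contentType: "application/json" as const,
    body: { urlSource },
    queryParameters: { split },
  };
}
```

The returned object would presumably be passed to `.post(...)` on the "/documentClassifiers/{classifierId}:analyze" path, in the same way as the analyze requests shown earlier.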

Document Classifiers #Build

import {
  DocumentClassifierBuildOperationDetailsOutput,
  getLongRunningPoller,
  isUnexpected,
} from "@azure-rest/ai-document-intelligence";

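// Fall back to an empty string so the declared return type holds when the variable is unset.
const containerSasUrl = (): string =>
  process.env["DOCUMENT_INTELLIGENCE_TRAINING_CONTAINER_SAS_URL"] ?? "";
const initialResponse = await client.path("/documentClassifiers:build").post({
  body: {
    // getRandomNumber() is a helper defined elsewhere (not shown in this snippet).
    classifierId: `customClassifier${getRandomNumber()}`,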
    description: "Custom classifier description",
    docTypes: {
      foo: {
        azureBlobSource: {
          containerUrl: containerSasUrl(),
        },
      },
      bar: {
        azureBlobSource: {
          containerUrl: containerSasUrl(),
        },
      },
    },
  },
});

if (isUnexpected(initialResponse)) {
  throw initialResponse.body.error;
}
const poller = await getLongRunningPoller(client, initialResponse);
const response = (await poller.pollUntilDone())
  .body as DocumentClassifierBuildOperationDetailsOutput;
console.log(response);
//  {
//    operationId: '31466834048_f3ee629e-73fb-48ab-993b-1d55d73ca460',
//    kind: 'documentClassifierBuild',
//    status: 'succeeded',
//    .
//    .
//    result: {
//      classifierId: 'customClassifier10978',
//      createdDateTime: '2023-11-09T12:45:56Z',
//      .
//      .
//      description: 'Custom classifier description'
//    },
//    apiVersion: '2023-10-31-preview'
//  }

Get Info

const response = await client.path("/info").get();
if (isUnexpected(response)) {
  throw response.body.error;
}
console.log(response.body.customDocumentModels.limit);
// 20000

List Document Models

import { paginate, isUnexpected } from "@azure-rest/ai-document-intelligence";
const response = await client.path("/documentModels").get();
if (isUnexpected(response)) {
  throw response.body.error;
}

// Collect the ID of every model in the account, following pagination.
const modelsInAccount: string[] = [];
for await (const model of paginate(client, response)) {
  modelsInAccount.push(model.modelId);
}
console.log(modelsInAccount);

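Troubleshooting

Logging

Enabling logging may help uncover useful information about failures. To see a log of HTTP requests and responses, set the AZURE_LOG_LEVEL environment variable to info. Alternatively, logging can be enabled at runtime by calling setLogLevel in @azure/logger: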

const { setLogLevel } = require("@azure/logger");

setLogLevel("info");

For more detailed instructions on how to enable logging, see the @azure/logger package docs.
