Document Models - Analyze Batch Documents

Analyzes a batch of documents with the specified document model.

POST {endpoint}/documentintelligence/documentModels/{modelId}:analyzeBatch?api-version=2024-07-31-preview
POST {endpoint}/documentintelligence/documentModels/{modelId}:analyzeBatch?api-version=2024-07-31-preview&pages={pages}&locale={locale}&stringIndexType={stringIndexType}&features={features}&queryFields={queryFields}&outputContentFormat={outputContentFormat}&output={output}

URI Parameters

Name In Required Type Description
endpoint
path True

string

uri

The Document Intelligence service endpoint.

modelId
path True

string

Unique document model name.

Regex pattern: ^[a-zA-Z0-9][a-zA-Z0-9._~-]{1,63}$

api-version
query True

string

The API version to use for this operation.

features
query

DocumentAnalysisFeature[]

List of optional analysis features.

locale
query

string

Locale hint for text recognition and document analysis. The value may be either a bare language code (e.g. "en", "fr") or a BCP 47 language tag (e.g. "en-US").

output
query

AnalyzeOutputOption[]

Additional outputs to generate during analysis.

outputContentFormat
query

ContentFormat

Format of the analyze result top-level content.

pages
query

string

List of 1-based page numbers or page ranges to analyze. Example: "1-3,5,7-9"

Regex pattern: ^(\d+(-\d+)?)(,\s*(\d+(-\d+)?))*$

queryFields
query

string[]

List of additional fields to extract. Example: "NumberOfGuests,StoreNumber"

stringIndexType
query

StringIndexType

Method used to compute string offset and length.
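
The parameters above can be combined into a request URL along the following lines. This is a minimal sketch, not an official client: the helper name `build_analyze_batch_url` and its keyword arguments are hypothetical, and the choice to send array-valued parameters (`features`, `queryFields`, `output`) as comma-separated lists is an assumption based on the "NumberOfGuests,StoreNumber" example. The two regular expressions are copied from the table.

```python
import re
from urllib.parse import urlencode

# Patterns copied from the URI Parameters table.
MODEL_ID_PATTERN = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9._~-]{1,63}$")
PAGES_PATTERN = re.compile(r"^(\d+(-\d+)?)(,\s*(\d+(-\d+)?))*$")


def build_analyze_batch_url(endpoint, model_id,
                            api_version="2024-07-31-preview",
                            pages=None, locale=None, string_index_type=None,
                            features=None, query_fields=None,
                            output_content_format=None, output=None):
    """Build the analyzeBatch URL (hypothetical helper, not part of any SDK)."""
    # Validate locally so that malformed values fail before any request is sent.
    if not MODEL_ID_PATTERN.fullmatch(model_id):
        raise ValueError(f"invalid modelId: {model_id!r}")
    if pages is not None and not PAGES_PATTERN.fullmatch(pages):
        raise ValueError(f"invalid pages: {pages!r}")

    params = [("api-version", api_version)]
    if pages is not None:
        params.append(("pages", pages))
    if locale is not None:
        params.append(("locale", locale))
    if string_index_type is not None:
        params.append(("stringIndexType", string_index_type))
    # Assumption: array parameters are serialized as comma-separated lists.
    if features:
        params.append(("features", ",".join(features)))
    if query_fields:
        params.append(("queryFields", ",".join(query_fields)))
    if output_content_format is not None:
        params.append(("outputContentFormat", output_content_format))
    if output:
        params.append(("output", ",".join(output)))

    base = (f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
            f"{model_id}:analyzeBatch")
    # Keep commas literal so list values stay readable in the query string.
    return f"{base}?{urlencode(params, safe=',')}"
```

Validating `modelId` and `pages` on the client is optional; the service remains the authority on what it accepts.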

Request Body

Name Required Type Description
resultContainerUrl True

string

Azure Blob Storage container URL where analyze result files will be stored.

azureBlobFileListSource

AzureBlobFileListContentSource

Azure Blob Storage file list specifying the batch documents. Either azureBlobSource or azureBlobFileListSource must be specified.

azureBlobSource

AzureBlobContentSource

Azure Blob Storage location containing the batch documents. Either azureBlobSource or azureBlobFileListSource must be specified.

overwriteExisting

boolean

Whether to overwrite existing analyze result files.

resultPrefix

string

Blob name prefix of result files.
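
A request body that follows the table above might be assembled as shown here. This is a sketch under stated assumptions: `build_analyze_batch_body` is a hypothetical helper, and treating `azureBlobSource` and `azureBlobFileListSource` as mutually exclusive is one reading of "either ... must be specified" rather than documented behavior.

```python
def build_analyze_batch_body(result_container_url,
                             azure_blob_source=None,
                             azure_blob_file_list_source=None,
                             result_prefix=None,
                             overwrite_existing=False):
    """Return the JSON-serializable request body (hypothetical helper)."""
    if not result_container_url:
        raise ValueError("resultContainerUrl is required")
    # Assumption: exactly one of the two source properties should be present.
    if (azure_blob_source is None) == (azure_blob_file_list_source is None):
        raise ValueError(
            "specify exactly one of azureBlobSource or azureBlobFileListSource")

    body = {"resultContainerUrl": result_container_url}
    if azure_blob_source is not None:
        body["azureBlobSource"] = azure_blob_source
    else:
        body["azureBlobFileListSource"] = azure_blob_file_list_source
    if result_prefix is not None:
        body["resultPrefix"] = result_prefix
    # The documented default is False, so only send the flag when it is set.
    if overwrite_existing:
        body["overwriteExisting"] = True
    return body
```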

Responses

Name Type Description
202 Accepted

The request has been accepted for processing, but processing has not yet completed.

Headers

  • Operation-Location: string
  • Retry-After: integer

Other Status Codes

ErrorResponse

An unexpected error response.
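
Because a 202 response only acknowledges the request, a caller typically reads the two headers and polls the Operation-Location URL later. The sketch below shows one way to extract them from a plain mapping of headers; the function name and the 5-second fallback delay are assumptions, not values defined by the service.

```python
def parse_accepted_response(status, headers, default_retry_after=5):
    """Return (operation_location, retry_after_seconds) for a 202 response.

    `headers` is any mapping of header names to string values.
    """
    if status != 202:
        raise RuntimeError(f"expected 202 Accepted, got {status}")

    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    operation_location = lowered.get("operation-location")
    if not operation_location:
        raise RuntimeError("202 response is missing Operation-Location")

    # Retry-After is documented as an integer; fall back if absent or malformed.
    try:
        retry_after = int(lowered.get("retry-after", default_retry_after))
    except (TypeError, ValueError):
        retry_after = default_retry_after
    return operation_location, retry_after
```

A polling loop would then sleep for `retry_after_seconds` before requesting the returned URL.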

Security

Ocp-Apim-Subscription-Key

Type: apiKey
In: header

OAuth2Auth

Type: oauth2
Flow: accessCode
Authorization URL: https://login.microsoftonline.com/common/oauth2/authorize
Token URL: https://login.microsoftonline.com/common/oauth2/token

Scopes

Name Description
https://cognitiveservices.azure.com/.default
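
The two security schemes translate into request headers roughly as follows. This is an illustrative sketch: `build_auth_headers` is a hypothetical helper, sending the OAuth2 access token as a `Bearer` value in the `Authorization` header is the conventional usage rather than something this page states, and the `Content-Type` header is included on the assumption that the body is JSON. Acquiring the token itself is out of scope here.

```python
def build_auth_headers(api_key=None, access_token=None):
    """Return headers for exactly one of the two documented auth schemes."""
    if (api_key is None) == (access_token is None):
        raise ValueError("provide exactly one of api_key or access_token")

    # Assumption: the request body is JSON, so declare it as such.
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # apiKey scheme: the key travels in the Ocp-Apim-Subscription-Key header.
        headers["Ocp-Apim-Subscription-Key"] = api_key
    else:
        # oauth2 scheme: conventional Bearer token in the Authorization header.
        headers["Authorization"] = f"Bearer {access_token}"
    return headers
```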

Examples

Analyze Batch Documents

Sample request

POST https://myendpoint.cognitiveservices.azure.com/documentintelligence/documentModels/customModel:analyzeBatch?api-version=2024-07-31-preview&pages=1-5&locale=en-US&stringIndexType=textElements

{
  "azureBlobSource": {
    "containerUrl": "https://myStorageAccount.blob.core.windows.net/myContainer?mySasToken",
    "prefix": "trainingDocs/"
  },
  "resultContainerUrl": "https://myStorageAccount.blob.core.windows.net/myOutputContainer?mySasToken",
  "resultPrefix": "trainingDocsResult/",
  "overwriteExisting": true
}
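
For readers who want to issue the sample request from code, here is a minimal sketch using only the Python standard library. It builds the request object without sending it; `make_analyze_batch_request` is a hypothetical helper, and key-based authentication is assumed.

```python
import json
import urllib.request


def make_analyze_batch_request(url, body, api_key):
    """Build (but do not send) a POST request carrying a JSON body."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": api_key,
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; on success the response status is expected to be 202, with the Operation-Location header pointing at the batch result.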

Sample response

Operation-Location: https://myendpoint.cognitiveservices.azure.com/documentintelligence/documentModels/customModel/analyzeBatchResults/3b31320d-8bab-4f88-b19c-2322a7f11034?api-version=2024-07-31-preview
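
The last path segment of the Operation-Location URL identifies the batch result. A small sketch for pulling it out is shown below; the function name is hypothetical, and the assumption that the ID always follows an `analyzeBatchResults` segment is inferred from the sample response only.

```python
from urllib.parse import urlsplit


def extract_result_id(operation_location):
    """Return the ID that follows .../analyzeBatchResults/ in the URL path."""
    segments = [s for s in urlsplit(operation_location).path.split("/") if s]
    # Assumption: the path ends with "analyzeBatchResults/<id>" as in the sample.
    if len(segments) < 2 or segments[-2] != "analyzeBatchResults":
        raise ValueError(f"unexpected Operation-Location: {operation_location!r}")
    return segments[-1]
```

In most cases the full URL can be polled as returned, so extracting the ID is mainly useful for logging or correlation.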

Definitions

Name Description
AnalyzeBatchDocumentsRequest

Batch document analysis parameters.

AnalyzeOutputOption

Additional outputs to generate during analysis.

AzureBlobContentSource

Azure Blob Storage content.

AzureBlobFileListContentSource

File list in Azure Blob Storage.

ContentFormat

Format of the content in analyzed result.

DocumentAnalysisFeature

Document analysis features to enable.

Error

The error object.

ErrorResponse

Error response object.

InnerError

An object containing more specific information about the error.

StringIndexType

Method used to compute string offset and length.

AnalyzeBatchDocumentsRequest

Batch document analysis parameters.

Name Type Default value Description
azureBlobFileListSource

AzureBlobFileListContentSource

Azure Blob Storage file list specifying the batch documents. Either azureBlobSource or azureBlobFileListSource must be specified.

azureBlobSource

AzureBlobContentSource

Azure Blob Storage location containing the batch documents. Either azureBlobSource or azureBlobFileListSource must be specified.

overwriteExisting

boolean

False

Whether to overwrite existing analyze result files.

resultContainerUrl

string

Azure Blob Storage container URL where analyze result files will be stored.

resultPrefix

string

Blob name prefix of result files.

AnalyzeOutputOption

Additional outputs to generate during analysis.

Name Type Description
figures

string

Generate cropped images of detected figures.

pdf

string

Generate searchable PDF output.

AzureBlobContentSource

Azure Blob Storage content.

Name Type Description
containerUrl

string

Azure Blob Storage container URL.

prefix

string

Blob name prefix.

AzureBlobFileListContentSource

File list in Azure Blob Storage.

Name Type Description
containerUrl

string

Azure Blob Storage container URL.

fileList

string

Path to a JSONL file within the container specifying a subset of documents.
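
This reference does not spell out the line format of the JSONL file. The sketch below assumes each line is a JSON object with a single `file` property holding a blob path relative to the container; both that schema and the helper name `make_file_list_jsonl` are assumptions and should be checked against the service documentation before use.

```python
import json


def make_file_list_jsonl(blob_paths):
    """Serialize blob paths as JSONL, one JSON object per line."""
    # Assumption: each line has the shape {"file": "<path within container>"}.
    return "".join(json.dumps({"file": path}) + "\n" for path in blob_paths)
```

The resulting text would be uploaded to the container, and its blob path passed as `fileList`.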

ContentFormat

Format of the content in analyzed result.

Name Type Description
markdown

string

Markdown representation of the document content with section headings, tables, etc.

text

string

Plain text representation of the document content without any formatting.

DocumentAnalysisFeature

Document analysis features to enable.

Name Type Description
barcodes

string

Enable the detection of barcodes in the document.

formulas

string

Enable the detection of mathematical expressions in the document.

keyValuePairs

string

Enable the detection of general key value pairs (form fields) in the document.

languages

string

Enable the detection of the text content language.

ocrHighResolution

string

Perform OCR at a higher resolution to handle documents with fine print.

queryFields

string

Enable the extraction of additional fields via the queryFields query parameter.

styleFont

string

Enable the recognition of various font styles.

Error

The error object.

Name Type Description
code

string

One of a server-defined set of error codes.

details

Error[]

An array of details about specific errors that led to this reported error.

innererror

InnerError

An object containing more specific information than the current object about the error.

message

string

A human-readable representation of the error.

target

string

The target of the error.

ErrorResponse

Error response object.

Name Type Description
error

Error

Error info.

InnerError

An object containing more specific information about the error.

Name Type Description
code

string

One of a server-defined set of error codes.

innererror

InnerError

Inner error.

message

string

A human-readable representation of the error.
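
Since `innererror` can nest recursively, the most specific code sits at the bottom of the chain. A minimal sketch for walking an ErrorResponse payload that has already been parsed into a dictionary follows; the function name is hypothetical.

```python
def most_specific_error(error_response):
    """Return (code, message) from the deepest innererror in an ErrorResponse."""
    node = error_response["error"]
    # Follow the innererror chain until there is no deeper object.
    while isinstance(node.get("innererror"), dict):
        node = node["innererror"]
    return node.get("code"), node.get("message")
```

Logging the top-level `code` alongside the innermost one usually gives both the broad category and the precise cause.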

StringIndexType

Method used to compute string offset and length.

Name Type Description
textElements

string

User-perceived display character, or grapheme cluster, as defined by Unicode 8.0.0.

unicodeCodePoint

string

Character unit represented by a single Unicode code point. Used by Python 3.

utf16CodeUnit

string

Character unit represented by a 16-bit Unicode code unit. Used by JavaScript, Java, and .NET.
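
The difference between the index types shows up as soon as the text contains characters outside the Basic Multilingual Plane. The sketch below compares the two units that can be computed with the Python standard library and shows one way to slice a Python string using UTF-16 offsets; the function names are hypothetical. Counting `textElements` requires grapheme cluster segmentation, which the standard library does not provide, so it is omitted.

```python
def span_lengths(text):
    """Length of `text` under two of the documented string index types."""
    return {
        # Python 3 strings are indexed by Unicode code point.
        "unicodeCodePoint": len(text),
        # Each UTF-16 code unit is two bytes in the utf-16-le encoding.
        "utf16CodeUnit": len(text.encode("utf-16-le")) // 2,
    }


def slice_by_utf16(text, offset, length):
    """Return the substring addressed by a UTF-16 code unit offset and length."""
    data = text.encode("utf-16-le")
    return data[2 * offset: 2 * (offset + length)].decode("utf-16-le")
```

For the three-character string `"a\U0001F600b"`, the emoji occupies one code point but two UTF-16 code units, so offsets reported with `utf16CodeUnit` would be off by one after it if applied directly to a Python string.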