Freigeben über


DocumentAnalysisClient Class

DocumentAnalysisClient analyzes information from documents and images, and classifies documents. It is the interface to use for analyzing with prebuilt models (receipts, business cards, invoices, identity documents, among others), analyzing layout from documents, analyzing general document types, and analyzing custom documents with built models (to see a full list of models supported by the service, see: https://aka.ms/azsdk/formrecognizer/models). It provides different methods based on inputs from a URL and inputs from a stream.

Note

DocumentAnalysisClient should be used with API versions

2022-08-31 and up. To use API versions <=v2.1, instantiate a FormRecognizerClient.

New in version 2022-08-31: The DocumentAnalysisClient and its client methods.

Inheritance
azure.ai.formrecognizer._form_base_client.FormRecognizerClientBase
DocumentAnalysisClient

Constructor

DocumentAnalysisClient(endpoint: str, credential: AzureKeyCredential | TokenCredential, **kwargs: Any)

Parameters

Name Description
endpoint
Required
str

Supported Cognitive Services endpoints (protocol and hostname, for example: https://westus2.api.cognitive.microsoft.com).

credential
Required

Credentials needed for the client to connect to Azure. This is an instance of AzureKeyCredential if using an API key or a token credential from identity.

Keyword-Only Parameters

Name Description
api_version

The API version of the service to use for requests. It defaults to the latest service version. Setting to an older version may result in reduced feature compatibility. To use API versions <=v2.1, instantiate a FormRecognizerClient.

Examples

Creating the DocumentAnalysisClient with an endpoint and API key.


   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

   document_analysis_client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))

Creating the DocumentAnalysisClient with a token credential.


   """DefaultAzureCredential will use the values from these environment
   variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET
   """
   from azure.ai.formrecognizer import DocumentAnalysisClient
   from azure.identity import DefaultAzureCredential

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   credential = DefaultAzureCredential()

   document_analysis_client = DocumentAnalysisClient(endpoint, credential)

Methods

begin_analyze_document

Analyze field text and semantic values from a given document.

New in version 2023-07-31: The features keyword argument.

begin_analyze_document_from_url

Analyze field text and semantic values from a given document. The input must be the location (URL) of the document to be analyzed.

New in version 2023-07-31: The features keyword argument.

begin_classify_document

Classify a document using a document classifier. For more information on how to build a custom classifier model, see https://aka.ms/azsdk/formrecognizer/buildclassifiermodel.

New in version 2023-07-31: The begin_classify_document client method.

begin_classify_document_from_url

Classify a given document with a document classifier. For more information on how to build a custom classifier model, see https://aka.ms/azsdk/formrecognizer/buildclassifiermodel. The input must be the location (URL) of the document to be classified.

New in version 2023-07-31: The begin_classify_document_from_url client method.

close

Close the DocumentAnalysisClient session.

send_request

Runs a network request using the client's existing pipeline.

The request URL can be relative to the base URL. The service API version used for the request is the same as the client's unless otherwise specified. Overriding the client's configured API version in relative URL is supported on client with API version 2022-08-31 and later. Overriding in absolute URL supported on client with any API version. This method does not raise if the response is an error; to raise an exception, call raise_for_status() on the returned response object. For more information about how to send custom requests with this method, see https://aka.ms/azsdk/dpcodegen/python/send_request.

begin_analyze_document

Analyze field text and semantic values from a given document.

New in version 2023-07-31: The features keyword argument.

begin_analyze_document(model_id: str, document: bytes | IO[bytes], **kwargs: Any) -> LROPoller[AnalyzeResult]

Parameters

Name Description
model_id
Required
str

A unique model identifier can be passed in as a string. Use this to specify the custom model ID or prebuilt model ID. Prebuilt model IDs supported can be found here: https://aka.ms/azsdk/formrecognizer/models

document
Required
bytes or IO[bytes]

File stream or bytes. For service supported file types, see: https://aka.ms/azsdk/formrecognizer/supportedfiles.

Keyword-Only Parameters

Name Description
pages
str

Custom page numbers for multi-page documents(PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages="1-3, 5-6". Separate each page number or range with a comma.

locale
str

Locale hint of the input document. See supported locales here: https://aka.ms/azsdk/formrecognizer/supportedlocales.

features

Document analysis features to enable.

Returns

Type Description

An instance of an LROPoller. Call result() on the poller object to return a AnalyzeResult.

Exceptions

Type Description

Examples

Analyze an invoice. For more samples see the samples folder.


   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )
   with open(path_to_sample_documents, "rb") as f:
       poller = document_analysis_client.begin_analyze_document(
           "prebuilt-invoice", document=f, locale="en-US"
       )
   invoices = poller.result()

   for idx, invoice in enumerate(invoices.documents):
       print(f"--------Analyzing invoice #{idx + 1}--------")
       vendor_name = invoice.fields.get("VendorName")
       if vendor_name:
           print(
               f"Vendor Name: {vendor_name.value} has confidence: {vendor_name.confidence}"
           )
       vendor_address = invoice.fields.get("VendorAddress")
       if vendor_address:
           print(
               f"Vendor Address: {vendor_address.value} has confidence: {vendor_address.confidence}"
           )
       vendor_address_recipient = invoice.fields.get("VendorAddressRecipient")
       if vendor_address_recipient:
           print(
               f"Vendor Address Recipient: {vendor_address_recipient.value} has confidence: {vendor_address_recipient.confidence}"
           )
       customer_name = invoice.fields.get("CustomerName")
       if customer_name:
           print(
               f"Customer Name: {customer_name.value} has confidence: {customer_name.confidence}"
           )
       customer_id = invoice.fields.get("CustomerId")
       if customer_id:
           print(
               f"Customer Id: {customer_id.value} has confidence: {customer_id.confidence}"
           )
       customer_address = invoice.fields.get("CustomerAddress")
       if customer_address:
           print(
               f"Customer Address: {customer_address.value} has confidence: {customer_address.confidence}"
           )
       customer_address_recipient = invoice.fields.get("CustomerAddressRecipient")
       if customer_address_recipient:
           print(
               f"Customer Address Recipient: {customer_address_recipient.value} has confidence: {customer_address_recipient.confidence}"
           )
       invoice_id = invoice.fields.get("InvoiceId")
       if invoice_id:
           print(
               f"Invoice Id: {invoice_id.value} has confidence: {invoice_id.confidence}"
           )
       invoice_date = invoice.fields.get("InvoiceDate")
       if invoice_date:
           print(
               f"Invoice Date: {invoice_date.value} has confidence: {invoice_date.confidence}"
           )
       invoice_total = invoice.fields.get("InvoiceTotal")
       if invoice_total:
           print(
               f"Invoice Total: {invoice_total.value} has confidence: {invoice_total.confidence}"
           )
       due_date = invoice.fields.get("DueDate")
       if due_date:
           print(f"Due Date: {due_date.value} has confidence: {due_date.confidence}")
       purchase_order = invoice.fields.get("PurchaseOrder")
       if purchase_order:
           print(
               f"Purchase Order: {purchase_order.value} has confidence: {purchase_order.confidence}"
           )
       billing_address = invoice.fields.get("BillingAddress")
       if billing_address:
           print(
               f"Billing Address: {billing_address.value} has confidence: {billing_address.confidence}"
           )
       billing_address_recipient = invoice.fields.get("BillingAddressRecipient")
       if billing_address_recipient:
           print(
               f"Billing Address Recipient: {billing_address_recipient.value} has confidence: {billing_address_recipient.confidence}"
           )
       shipping_address = invoice.fields.get("ShippingAddress")
       if shipping_address:
           print(
               f"Shipping Address: {shipping_address.value} has confidence: {shipping_address.confidence}"
           )
       shipping_address_recipient = invoice.fields.get("ShippingAddressRecipient")
       if shipping_address_recipient:
           print(
               f"Shipping Address Recipient: {shipping_address_recipient.value} has confidence: {shipping_address_recipient.confidence}"
           )
       print("Invoice items:")
       for idx, item in enumerate(invoice.fields.get("Items").value):
           print(f"...Item #{idx + 1}")
           item_description = item.value.get("Description")
           if item_description:
               print(
                   f"......Description: {item_description.value} has confidence: {item_description.confidence}"
               )
           item_quantity = item.value.get("Quantity")
           if item_quantity:
               print(
                   f"......Quantity: {item_quantity.value} has confidence: {item_quantity.confidence}"
               )
           unit = item.value.get("Unit")
           if unit:
               print(f"......Unit: {unit.value} has confidence: {unit.confidence}")
           unit_price = item.value.get("UnitPrice")
           if unit_price:
               unit_price_code = unit_price.value.code if unit_price.value.code else ""
               print(
                   f"......Unit Price: {unit_price.value}{unit_price_code} has confidence: {unit_price.confidence}"
               )
           product_code = item.value.get("ProductCode")
           if product_code:
               print(
                   f"......Product Code: {product_code.value} has confidence: {product_code.confidence}"
               )
           item_date = item.value.get("Date")
           if item_date:
               print(
                   f"......Date: {item_date.value} has confidence: {item_date.confidence}"
               )
           tax = item.value.get("Tax")
           if tax:
               print(f"......Tax: {tax.value} has confidence: {tax.confidence}")
           amount = item.value.get("Amount")
           if amount:
               print(
                   f"......Amount: {amount.value} has confidence: {amount.confidence}"
               )
       subtotal = invoice.fields.get("SubTotal")
       if subtotal:
           print(f"Subtotal: {subtotal.value} has confidence: {subtotal.confidence}")
       total_tax = invoice.fields.get("TotalTax")
       if total_tax:
           print(
               f"Total Tax: {total_tax.value} has confidence: {total_tax.confidence}"
           )
       previous_unpaid_balance = invoice.fields.get("PreviousUnpaidBalance")
       if previous_unpaid_balance:
           print(
               f"Previous Unpaid Balance: {previous_unpaid_balance.value} has confidence: {previous_unpaid_balance.confidence}"
           )
       amount_due = invoice.fields.get("AmountDue")
       if amount_due:
           print(
               f"Amount Due: {amount_due.value} has confidence: {amount_due.confidence}"
           )
       service_start_date = invoice.fields.get("ServiceStartDate")
       if service_start_date:
           print(
               f"Service Start Date: {service_start_date.value} has confidence: {service_start_date.confidence}"
           )
       service_end_date = invoice.fields.get("ServiceEndDate")
       if service_end_date:
           print(
               f"Service End Date: {service_end_date.value} has confidence: {service_end_date.confidence}"
           )
       service_address = invoice.fields.get("ServiceAddress")
       if service_address:
           print(
               f"Service Address: {service_address.value} has confidence: {service_address.confidence}"
           )
       service_address_recipient = invoice.fields.get("ServiceAddressRecipient")
       if service_address_recipient:
           print(
               f"Service Address Recipient: {service_address_recipient.value} has confidence: {service_address_recipient.confidence}"
           )
       remittance_address = invoice.fields.get("RemittanceAddress")
       if remittance_address:
           print(
               f"Remittance Address: {remittance_address.value} has confidence: {remittance_address.confidence}"
           )
       remittance_address_recipient = invoice.fields.get("RemittanceAddressRecipient")
       if remittance_address_recipient:
           print(
               f"Remittance Address Recipient: {remittance_address_recipient.value} has confidence: {remittance_address_recipient.confidence}"
           )

Analyze a custom document. For more samples see the samples folder.


   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
   model_id = os.getenv("CUSTOM_BUILT_MODEL_ID", custom_model_id)

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )

   # Make sure your document's type is included in the list of document types the custom model can analyze
   with open(path_to_sample_documents, "rb") as f:
       poller = document_analysis_client.begin_analyze_document(
           model_id=model_id, document=f
       )
   result = poller.result()

   for idx, document in enumerate(result.documents):
       print(f"--------Analyzing document #{idx + 1}--------")
       print(f"Document has type {document.doc_type}")
       print(f"Document has document type confidence {document.confidence}")
       print(f"Document was analyzed with model with ID {result.model_id}")
       for name, field in document.fields.items():
           field_value = field.value if field.value else field.content
           print(
               f"......found field of type '{field.value_type}' with value '{field_value}' and with confidence {field.confidence}"
           )

   # iterate over tables, lines, and selection marks on each page
   for page in result.pages:
       print(f"\nLines found on page {page.page_number}")
       for line in page.lines:
           print(f"...Line '{line.content}'")
       for word in page.words:
           print(f"...Word '{word.content}' has a confidence of {word.confidence}")
       if page.selection_marks:
           print(f"\nSelection marks found on page {page.page_number}")
           for selection_mark in page.selection_marks:
               print(
                   f"...Selection mark is '{selection_mark.state}' and has a confidence of {selection_mark.confidence}"
               )

   for i, table in enumerate(result.tables):
       print(f"\nTable {i + 1} can be found on page:")
       for region in table.bounding_regions:
           print(f"...{region.page_number}")
       for cell in table.cells:
           print(
               f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'"
           )
   print("-----------------------------------")

begin_analyze_document_from_url

Analyze field text and semantic values from a given document. The input must be the location (URL) of the document to be analyzed.

New in version 2023-07-31: The features keyword argument.

begin_analyze_document_from_url(model_id: str, document_url: str, **kwargs: Any) -> LROPoller[AnalyzeResult]

Parameters

Name Description
model_id
Required
str

A unique model identifier can be passed in as a string. Use this to specify the custom model ID or prebuilt model ID. Prebuilt model IDs supported can be found here: https://aka.ms/azsdk/formrecognizer/models

document_url
Required
str

The URL of the document to analyze. The input must be a valid, properly encoded (i.e. encode special characters, such as empty spaces), and publicly accessible URL. For service supported file types, see: https://aka.ms/azsdk/formrecognizer/supportedfiles.

Keyword-Only Parameters

Name Description
pages
str

Custom page numbers for multi-page documents(PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages="1-3, 5-6". Separate each page number or range with a comma.

locale
str

Locale hint of the input document. See supported locales here: https://aka.ms/azsdk/formrecognizer/supportedlocales.

features

Document analysis features to enable.

Returns

Type Description

An instance of an LROPoller. Call result() on the poller object to return a AnalyzeResult.

Exceptions

Type Description

Examples

Analyze a receipt. For more samples see the samples folder.


   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )
   url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/receipt/contoso-receipt.png"
   poller = document_analysis_client.begin_analyze_document_from_url(
       "prebuilt-receipt", document_url=url
   )
   receipts = poller.result()

   for idx, receipt in enumerate(receipts.documents):
       print(f"--------Analysis of receipt #{idx + 1}--------")
       print(f"Receipt type: {receipt.doc_type if receipt.doc_type else 'N/A'}")
       merchant_name = receipt.fields.get("MerchantName")
       if merchant_name:
           print(
               f"Merchant Name: {merchant_name.value} has confidence: "
               f"{merchant_name.confidence}"
           )
       transaction_date = receipt.fields.get("TransactionDate")
       if transaction_date:
           print(
               f"Transaction Date: {transaction_date.value} has confidence: "
               f"{transaction_date.confidence}"
           )
       if receipt.fields.get("Items"):
           print("Receipt items:")
           for idx, item in enumerate(receipt.fields.get("Items").value):
               print(f"...Item #{idx + 1}")
               item_description = item.value.get("Description")
               if item_description:
                   print(
                       f"......Item Description: {item_description.value} has confidence: "
                       f"{item_description.confidence}"
                   )
               item_quantity = item.value.get("Quantity")
               if item_quantity:
                   print(
                       f"......Item Quantity: {item_quantity.value} has confidence: "
                       f"{item_quantity.confidence}"
                   )
               item_price = item.value.get("Price")
               if item_price:
                   print(
                       f"......Individual Item Price: {item_price.value} has confidence: "
                       f"{item_price.confidence}"
                   )
               item_total_price = item.value.get("TotalPrice")
               if item_total_price:
                   print(
                       f"......Total Item Price: {item_total_price.value} has confidence: "
                       f"{item_total_price.confidence}"
                   )
       subtotal = receipt.fields.get("Subtotal")
       if subtotal:
           print(f"Subtotal: {subtotal.value} has confidence: {subtotal.confidence}")
       tax = receipt.fields.get("TotalTax")
       if tax:
           print(f"Total tax: {tax.value} has confidence: {tax.confidence}")
       tip = receipt.fields.get("Tip")
       if tip:
           print(f"Tip: {tip.value} has confidence: {tip.confidence}")
       total = receipt.fields.get("Total")
       if total:
           print(f"Total: {total.value} has confidence: {total.confidence}")
       print("--------------------------------------")

begin_classify_document

Classify a document using a document classifier. For more information on how to build a custom classifier model, see https://aka.ms/azsdk/formrecognizer/buildclassifiermodel.

New in version 2023-07-31: The begin_classify_document client method.

begin_classify_document(classifier_id: str, document: bytes | IO[bytes], **kwargs: Any) -> LROPoller[AnalyzeResult]

Parameters

Name Description
classifier_id
Required
str

A unique document classifier identifier can be passed in as a string.

document
Required
bytes or IO[bytes]

File stream or bytes. For service supported file types, see: https://aka.ms/azsdk/formrecognizer/supportedfiles.

Keyword-Only Parameters

Name Description
pages
str

Custom page numbers for multi-page documents(PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages="1-3, 5-6". Separate each page number or range with a comma.

locale
str

Locale hint of the input document. See supported locales here: https://aka.ms/azsdk/formrecognizer/supportedlocales.

features

Document analysis features to enable.

Returns

Type Description

An instance of an LROPoller. Call result() on the poller object to return a AnalyzeResult.

Exceptions

Type Description

Examples

Classify a document. For more samples see the samples folder.


   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
   classifier_id = os.getenv("CLASSIFIER_ID", classifier_id)

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )
   with open(path_to_sample_documents, "rb") as f:
       poller = document_analysis_client.begin_classify_document(
           classifier_id, document=f
       )
   result = poller.result()

   print("----Classified documents----")
   for doc in result.documents:
       print(
           f"Found document of type '{doc.doc_type or 'N/A'}' with a confidence of {doc.confidence} contained on "
           f"the following pages: {[region.page_number for region in doc.bounding_regions]}"
       )

begin_classify_document_from_url

Classify a given document with a document classifier. For more information on how to build a custom classifier model, see https://aka.ms/azsdk/formrecognizer/buildclassifiermodel. The input must be the location (URL) of the document to be classified.

New in version 2023-07-31: The begin_classify_document_from_url client method.

begin_classify_document_from_url(classifier_id: str, document_url: str, **kwargs: Any) -> LROPoller[AnalyzeResult]

Parameters

Name Description
classifier_id
Required
str

A unique document classifier identifier can be passed in as a string.

document_url
Required
str

The URL of the document to classify. The input must be a valid, properly encoded (i.e. encode special characters, such as empty spaces), and publicly accessible URL of one of the supported formats: https://aka.ms/azsdk/formrecognizer/supportedfiles.

Keyword-Only Parameters

Name Description
pages
str

Custom page numbers for multi-page documents(PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages="1-3, 5-6". Separate each page number or range with a comma.

locale
str

Locale hint of the input document. See supported locales here: https://aka.ms/azsdk/formrecognizer/supportedlocales.

features

Document analysis features to enable.

Returns

Type Description

An instance of an LROPoller. Call result() on the poller object to return a AnalyzeResult.

Exceptions

Type Description

Examples

Classify a document. For more samples see the samples folder.


   from azure.core.credentials import AzureKeyCredential
   from azure.ai.formrecognizer import DocumentAnalysisClient

   endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
   key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
   classifier_id = os.getenv("CLASSIFIER_ID", classifier_id)

   document_analysis_client = DocumentAnalysisClient(
       endpoint=endpoint, credential=AzureKeyCredential(key)
   )

   url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/forms/IRS-1040.pdf"

   poller = document_analysis_client.begin_classify_document_from_url(
       classifier_id, document_url=url
   )
   result = poller.result()

   print("----Classified documents----")
   for doc in result.documents:
       print(
           f"Found document of type '{doc.doc_type or 'N/A'}' with a confidence of {doc.confidence} contained on "
           f"the following pages: {[region.page_number for region in doc.bounding_regions]}"
       )

close

Close the DocumentAnalysisClient session.

close() -> None

Keyword-Only Parameters

Name Description
pages
str

Custom page numbers for multi-page documents(PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages="1-3, 5-6". Separate each page number or range with a comma.

locale
str

Locale hint of the input document. See supported locales here: https://aka.ms/azsdk/formrecognizer/supportedlocales.

features

Document analysis features to enable.

Exceptions

Type Description

send_request

Runs a network request using the client's existing pipeline.

The request URL can be relative to the base URL. The service API version used for the request is the same as the client's unless otherwise specified. Overriding the client's configured API version in relative URL is supported on client with API version 2022-08-31 and later. Overriding in absolute URL supported on client with any API version. This method does not raise if the response is an error; to raise an exception, call raise_for_status() on the returned response object. For more information about how to send custom requests with this method, see https://aka.ms/azsdk/dpcodegen/python/send_request.

send_request(request: HttpRequest, *, stream: bool = False, **kwargs) -> HttpResponse

Parameters

Name Description
request
Required

The network request you want to make.

Keyword-Only Parameters

Name Description
stream

Whether the response payload will be streamed. Defaults to False.

Returns

Type Description

The response of your network call. Does not do error handling on your response.

Exceptions

Type Description