Image Analysis - Analyze Image

Analyzes the input image. The request contains either an image stream with any content type ['image/*', 'application/octet-stream'], or a JSON payload that includes a url property used to retrieve the image stream.

POST /imageanalysis:analyze?api-version=2023-04-01-preview
POST /imageanalysis:analyze?features={features}&model-name={model-name}&language={language}&smartcrops-aspect-ratios={smartcrops-aspect-ratios}&gender-neutral-caption={gender-neutral-caption}&api-version=2023-04-01-preview

URI Parameters

Name In Required Type Description
api-version
query True

string

Requested API version.

features
query

VisualFeature[]

The visual features requested: tags, objects, caption, denseCaptions, read, smartCrops, people. This parameter needs to be specified if the parameter "model-name" is not specified.

gender-neutral-caption
query

boolean

Boolean flag for enabling gender-neutral captioning for caption and denseCaptions features. If this parameter is not specified, the default value is "false".

language
query

string

The desired language for output generation. If this parameter is not specified, the default value is "en". See https://aka.ms/cv-languages for a list of supported languages.

model-name
query

string

The name of the custom trained model. This parameter needs to be specified if the parameter "features" is not specified.

smartcrops-aspect-ratios
query

string

A list of aspect ratios to use for smartCrops feature. Aspect ratios are calculated by dividing the target crop width by the height. Supported values are between 0.75 and 1.8 (inclusive). Multiple values should be comma-separated. If this parameter is not specified, the service will return one crop suggestion with an aspect ratio it sees fit between 0.5 and 2.0 (inclusive).
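The parameter rules above can be sketched as a small URL builder. This is a minimal illustration, not an official client: the `base_url` argument and the function name are hypothetical, joining multiple `features` with commas is an assumption (the table only states comma separation for smartcrops-aspect-ratios), and the path prefix in front of `/imageanalysis:analyze` may differ for a real resource.

```python
from urllib.parse import urlencode

API_VERSION = "2023-04-01-preview"
VISUAL_FEATURES = {"tags", "objects", "caption", "denseCaptions",
                   "read", "smartCrops", "people"}


def build_analyze_url(base_url, features=None, model_name=None, language=None,
                      smartcrops_aspect_ratios=None, gender_neutral_caption=None):
    """Build an analyze URL. `base_url` is a hypothetical resource base URL."""
    # Either "features" or "model-name" has to be present.
    if not features and not model_name:
        raise ValueError("specify features or model_name")
    params = {}
    if features:
        unknown = set(features) - VISUAL_FEATURES
        if unknown:
            raise ValueError(f"unknown features: {sorted(unknown)}")
        # Assumption: multiple features are sent as one comma-separated value.
        params["features"] = ",".join(features)
    if model_name:
        params["model-name"] = model_name
    if language:
        params["language"] = language
    if smartcrops_aspect_ratios:
        for ratio in smartcrops_aspect_ratios:
            # Supported explicit values are 0.75 to 1.8, inclusive.
            if not 0.75 <= ratio <= 1.8:
                raise ValueError(f"aspect ratio out of range: {ratio}")
        params["smartcrops-aspect-ratios"] = ",".join(
            str(r) for r in smartcrops_aspect_ratios)
    if gender_neutral_caption is not None:
        params["gender-neutral-caption"] = "true" if gender_neutral_caption else "false"
    params["api-version"] = API_VERSION
    # safe="," keeps the comma separators readable instead of percent-encoding them.
    query = urlencode(params, safe=",")
    return f"{base_url.rstrip('/')}/imageanalysis:analyze?{query}"
```

Optional parameters that are left as `None` are omitted from the query string, so the service defaults described in the table apply.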

Request Body

Name Required Type Description
url True

string

Publicly reachable URL of an image.
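The two request forms (a JSON payload with a url property, or a raw image stream) might be prepared as follows. This is a sketch under stated assumptions: the helper name is hypothetical, `application/json` is assumed as the content type for the JSON form, and authentication headers are left out because this page does not describe them. The request object is only built, not sent.

```python
import json
from urllib.request import Request


def build_analyze_request(url, image_url=None, image_bytes=None):
    """Return a POST request carrying either a JSON url payload or raw bytes."""
    if (image_url is None) == (image_bytes is None):
        raise ValueError("provide exactly one of image_url or image_bytes")
    if image_url is not None:
        # JSON form: the service retrieves the image from the given URL.
        body = json.dumps({"url": image_url}).encode("utf-8")
        content_type = "application/json"  # assumed content type for JSON
    else:
        # Stream form: the image bytes are the request body.
        body = image_bytes
        content_type = "application/octet-stream"
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": content_type})
```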

Responses

Name Type Description
200 OK

ImageAnalysisResult

Success

Other Status Codes

ErrorResponse

Error

Headers

x-ms-error-code: string
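One way to separate the 200 OK case from other status codes is sketched below. The exception class and function name are hypothetical, the headers are assumed to arrive as a plain dict with the exact key `x-ms-error-code`, and the body is assumed to be JSON in both cases.

```python
import json


class AnalyzeError(Exception):
    """Hypothetical exception carrying the fields of an ErrorResponse."""

    def __init__(self, status, code, message):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def parse_analyze_response(status, headers, body):
    """Return the ImageAnalysisResult dict on 200, raise AnalyzeError otherwise."""
    payload = json.loads(body)
    if status == 200:
        return payload
    error = payload.get("error", {})
    # Prefer the x-ms-error-code header, fall back to the body's error.code.
    code = headers.get("x-ms-error-code") or error.get("code")
    raise AnalyzeError(status, code, error.get("message"))
```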

Examples

AnalyzeImage_CustomModel

Sample request

POST /imageanalysis:analyze?model-name=my_model_name&api-version=2023-04-01-preview

{
  "url": "https://example.com/image.jpg"
}

Sample response

{
  "customModelResult": {
    "objectsResult": {
      "values": [
        {
          "id": "1",
          "boundingBox": {
            "x": 197,
            "y": 68,
            "w": 356,
            "h": 394
          },
          "tags": [
            {
              "name": "class1",
              "confidence": 0.92431640625
            }
          ]
        },
        {
          "id": "2",
          "boundingBox": {
            "x": 0,
            "y": 77,
            "w": 241,
            "h": 359
          },
          "tags": [
            {
              "name": "class1",
              "confidence": 0.87890625
            }
          ]
        }
      ]
    }
  },
  "modelVersion": "2023-04-01-preview",
  "metadata": {
    "width": 660,
    "height": 495
  }
}
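As an illustration of reading the sample response above, the following sketch lists each detected object with its highest-confidence tag. The function name is hypothetical, and the embedded JSON is the sample response with whitespace condensed.

```python
import json

SAMPLE_RESPONSE = json.loads("""
{
  "customModelResult": {"objectsResult": {"values": [
    {"id": "1", "boundingBox": {"x": 197, "y": 68, "w": 356, "h": 394},
     "tags": [{"name": "class1", "confidence": 0.92431640625}]},
    {"id": "2", "boundingBox": {"x": 0, "y": 77, "w": 241, "h": 359},
     "tags": [{"name": "class1", "confidence": 0.87890625}]}
  ]}},
  "modelVersion": "2023-04-01-preview",
  "metadata": {"width": 660, "height": 495}
}
""")


def summarize_custom_objects(result):
    """Return (id, top tag name, confidence, (x, y, w, h)) per detected object."""
    values = (result.get("customModelResult", {})
                    .get("objectsResult", {})
                    .get("values", []))
    rows = []
    for obj in values:
        # Pick the tag with the highest classification confidence.
        top = max(obj["tags"], key=lambda tag: tag["confidence"])
        box = obj["boundingBox"]
        rows.append((obj["id"], top["name"], top["confidence"],
                     (box["x"], box["y"], box["w"], box["h"])))
    return rows


for row in summarize_custom_objects(SAMPLE_RESPONSE):
    print(row)
```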

Definitions

Name Description
AdultMatch

An object describing adult content match.

AdultResult

An object describing whether the image contains adult-oriented content and/or is racy.

BoundingBox

A bounding box for an area inside an image.

CaptionResult

A brief description of what the image depicts.

CropRegion

A region identified for smart cropping. There will be one region returned for each requested aspect ratio.

DenseCaption

A brief description of what the image depicts.

DenseCaptionsResult

A list of captions.

DetectedObject

Describes a detected object in an image.

DetectedPerson

A person detected in an image.

DocumentLine

A content line object consisting of an adjacent sequence of content elements, such as words and selection marks.

DocumentPage

The content and layout elements extracted from a page from the input.

DocumentSpan

Contiguous region of the concatenated content property, specified as an offset and length.

DocumentStyle

An object representing observed text styles.

DocumentWord

A word object consisting of a contiguous sequence of characters. For non-space delimited languages, such as Chinese, Japanese, and Korean, each character is represented as its own word.

ErrorResponse

Response returned when an error occurs.

ErrorResponseDetails

Error info.

ErrorResponseInnerError

Detailed error.

ImageAnalysisResult

Describes the combined results of different types of image analysis.

ImageMetadataApiModel

The image metadata information such as height and width.

ImagePredictionResult

Describes the prediction result of an image.

ImageUrl

A JSON document with a URL pointing to the image that is to be analyzed.

ObjectsResult

Describes detected objects in an image.

PeopleResult

An object describing whether the image contains people.

ReadResult

The results of a Read operation.

SmartCropsResult

Smart cropping result.

Tag

An entity observation in the image, along with the confidence score.

TagsResult

A list of tags with confidence level.

VisualFeature

The visual features requested: tags, objects, caption, denseCaptions, read, smartCrops, people. This parameter needs to be specified if the parameter "model-name" is not specified.

AdultMatch

An object describing adult content match.

Name Type Description
confidence

number

A value indicating the confidence level of matched adult content.

isMatch

boolean

A value indicating if the image is matched adult content.

AdultResult

An object describing whether the image contains adult-oriented content and/or is racy.

Name Type Description
adult

AdultMatch

An object describing adult content match.

gore

AdultMatch

An object describing adult content match.

racy

AdultMatch

An object describing adult content match.

BoundingBox

A bounding box for an area inside an image.

Name Type Description
h

integer

Height measured from the top-left point of the area, in pixels.

w

integer

Width measured from the top-left point of the area, in pixels.

x

integer

Left-coordinate of the top left point of the area, in pixels.

y

integer

Top-coordinate of the top left point of the area, in pixels.
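A BoundingBox can be converted to corner coordinates and checked against the image dimensions, as in this sketch. The helper names are hypothetical, and treating `x + w` and `y + h` as the right and bottom edges is an assumption about how the box should be interpreted.

```python
def box_corners(box):
    """Return (left, top, right, bottom) in pixels for a BoundingBox dict."""
    left, top = box["x"], box["y"]
    return left, top, left + box["w"], top + box["h"]


def fits_in_image(box, metadata):
    """Check that the box lies inside an image described by width and height."""
    left, top, right, bottom = box_corners(box)
    return (0 <= left and 0 <= top
            and right <= metadata["width"] and bottom <= metadata["height"])
```

For the first object in the custom model sample (x 197, y 68, w 356, h 394) and a 660 by 495 image, the corners are (197, 68, 553, 462), which lie inside the image.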

CaptionResult

A brief description of what the image depicts.

Name Type Description
confidence

number

The level of confidence the service has in the caption.

text

string

The text of the caption.

CropRegion

A region identified for smart cropping. There will be one region returned for each requested aspect ratio.

Name Type Description
aspectRatio

number

The aspect ratio of the crop region.

boundingBox

BoundingBox

A bounding box for an area inside an image.
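Since aspect ratios are defined as crop width divided by height, a CropRegion can be sanity-checked against its own bounding box. This is a minimal sketch: the function names and the example region are hypothetical, and the tolerance is an assumption to allow for pixel rounding.

```python
def crop_aspect_ratio(region):
    """Width divided by height of the region's bounding box."""
    box = region["boundingBox"]
    return box["w"] / box["h"]


def matches_reported_ratio(region, tolerance=0.01):
    """Compare the computed ratio with the region's aspectRatio field."""
    return abs(crop_aspect_ratio(region) - region["aspectRatio"]) <= tolerance


# Hypothetical region, not taken from a real response.
example_region = {"aspectRatio": 1.33,
                  "boundingBox": {"x": 0, "y": 0, "w": 400, "h": 300}}
print(matches_reported_ratio(example_region))
```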

DenseCaption

A brief description of what the image depicts.

Name Type Description
boundingBox

BoundingBox

A bounding box for an area inside an image.

confidence

number

The level of confidence the service has in the caption.

text

string

The text of the caption.

DenseCaptionsResult

A list of captions.

Name Type Description
values

DenseCaption[]

A list of captions.

DetectedObject

Describes a detected object in an image.

Name Type Description
boundingBox

BoundingBox

A bounding box for an area inside an image.

id

string

Id of the detected object.

tags

Tag[]

Classification confidences of the detected object.

DetectedPerson

A person detected in an image.

Name Type Description
boundingBox

BoundingBox

A bounding box for an area inside an image.

confidence

number

Confidence score of having observed the person in the image, as a value ranging from 0 to 1.

DocumentLine

A content line object consisting of an adjacent sequence of content elements, such as words and selection marks.

Name Type Description
boundingBox

number[]

Bounding box of the line.

content

string

Concatenated content of the contained elements in reading order.

spans

DocumentSpan[]

Location of the line in the reading order concatenated content.

DocumentPage

The content and layout elements extracted from a page from the input.

Name Type Description
angle

number

The general orientation of the content in the clockwise direction, measured in degrees in the range (-180, 180].

height

number

The height of the image/PDF in pixels/inches, respectively.

lines

DocumentLine[]

Extracted lines from the page, potentially containing both textual and visual elements.

pageNumber

integer

1-based page number in the input document.

spans

DocumentSpan[]

Location of the page in the reading order concatenated content.

width

number

The width of the image/PDF in pixels/inches, respectively.

words

DocumentWord[]

Extracted words from the page.

DocumentSpan

Contiguous region of the concatenated content property, specified as an offset and length.

Name Type Description
length

integer

Number of characters in the content represented by the span.

offset

integer

Zero-based index of the content represented by the span.
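A DocumentSpan can be resolved to text by slicing the concatenated content, but the unit of offset and length depends on the stringIndexType reported in ReadResult. The sketch below handles two of the listed values; the function name is hypothetical, and 'textElements' is deliberately not handled because it needs grapheme segmentation that the standard library does not provide.

```python
def span_text(content, span, string_index_type="unicodeCodePoint"):
    """Return the part of `content` that a DocumentSpan dict refers to."""
    offset, length = span["offset"], span["length"]
    if string_index_type == "unicodeCodePoint":
        # Python strings are indexed by code point, so plain slicing works.
        return content[offset:offset + length]
    if string_index_type == "utf16CodeUnit":
        # Each UTF-16 code unit is two bytes in the little-endian encoding.
        units = content.encode("utf-16-le")
        return units[2 * offset:2 * (offset + length)].decode("utf-16-le")
    raise ValueError(f"unsupported stringIndexType: {string_index_type}")
```

The two modes differ only when the content contains characters outside the Basic Multilingual Plane, each of which counts as one code point but two UTF-16 code units.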

DocumentStyle

An object representing observed text styles.

Name Type Description
confidence

number

Confidence of correctly identifying the style.

isHandwritten

boolean

Whether the content is handwritten.

spans

DocumentSpan[]

Location of the text elements in the concatenated content the style applies to.

DocumentWord

A word object consisting of a contiguous sequence of characters. For non-space delimited languages, such as Chinese, Japanese, and Korean, each character is represented as its own word.

Name Type Description
boundingBox

number[]

Bounding box of the word.

confidence

number

Confidence of correctly extracting the word.

content

string

Text content of the word.

span

DocumentSpan

Contiguous region of the concatenated content property, specified as an offset and length.

ErrorResponse

Response returned when an error occurs.

Name Type Description
error

ErrorResponseDetails

Error info.

ErrorResponseDetails

Error info.

Name Type Description
code

string

Error code.

details

ErrorResponseDetails[]

List of detailed errors.

innererror

ErrorResponseInnerError

Detailed error.

message

string

Error message.

target

string

Target of the error.

ErrorResponseInnerError

Detailed error.

Name Type Description
code

string

Error code.

innererror

ErrorResponseInnerError

Detailed error.

message

string

Error message.
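Because innererror can nest to any depth, the most specific error is found by following the chain to its end. A minimal sketch, with a hypothetical function name and made-up error codes used purely for illustration:

```python
def innermost_error(details):
    """Follow innererror links and return (code, message) of the deepest one."""
    node = details
    while isinstance(node.get("innererror"), dict):
        node = node["innererror"]
    return node.get("code"), node.get("message")


# Hypothetical ErrorResponseDetails value, not taken from a real response.
example_error = {
    "code": "OuterCode",
    "message": "outer message",
    "innererror": {
        "code": "MiddleCode",
        "message": "middle message",
        "innererror": {"code": "InnerCode", "message": "inner message"},
    },
}
print(innermost_error(example_error))
```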

ImageAnalysisResult

Describes the combined results of different types of image analysis.

Name Type Description
adultResult

AdultResult

An object describing whether the image contains adult-oriented content and/or is racy.

captionResult

CaptionResult

A brief description of what the image depicts.

customModelResult

ImagePredictionResult

Describes the prediction result of an image.

denseCaptionsResult

DenseCaptionsResult

A list of captions.

metadata

ImageMetadataApiModel

The image metadata information such as height and width.

modelVersion

string

Model Version.

objectsResult

ObjectsResult

Describes detected objects in an image.

peopleResult

PeopleResult

An object describing whether the image contains people.

readResult

ReadResult

The results of a Read operation.

smartCropsResult

SmartCropsResult

Smart cropping result.

tagsResult

TagsResult

A list of tags with confidence level.

ImageMetadataApiModel

The image metadata information such as height and width.

Name Type Description
height

integer

The height of the image in pixels.

width

integer

The width of the image in pixels.

ImagePredictionResult

Describes the prediction result of an image.

Name Type Description
objectsResult

ObjectsResult

Describes detected objects in an image.

tagsResult

TagsResult

A list of tags with confidence level.

ImageUrl

A JSON document with a URL pointing to the image that is to be analyzed.

Name Type Description
url

string

Publicly reachable URL of an image.

ObjectsResult

Describes detected objects in an image.

Name Type Description
values

DetectedObject[]

An array of detected objects.

PeopleResult

An object describing whether the image contains people.

Name Type Description
values

DetectedPerson[]

An array of detected people.

ReadResult

The results of a Read operation.

Name Type Description
content

string

Concatenated string representation of all textual and visual elements in reading order.

pages

DocumentPage[]

A list of analyzed pages.

stringIndexType

string

The method used to compute string offsets and lengths. Possible values include 'textElements', 'unicodeCodePoint', and 'utf16CodeUnit'.

styles

DocumentStyle[]

Extracted font styles.

SmartCropsResult

Smart cropping result.

Name Type Description
values

CropRegion[]

Recommended regions for cropping the image.

Tag

An entity observation in the image, along with the confidence score.

Name Type Description
confidence

number

The level of confidence that the entity was observed.

name

string

Name of the entity.

TagsResult

A list of tags with confidence level.

Name Type Description
values

Tag[]

A list of tags with confidence level.
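A common way to consume a TagsResult is to keep only tags above a confidence threshold, ordered from most to least confident. The sketch below assumes nothing beyond the values, name, and confidence fields; the function name, the default threshold, and the example tags are hypothetical.

```python
def confident_tags(tags_result, threshold=0.5):
    """Return Tag dicts at or above `threshold`, highest confidence first."""
    kept = [tag for tag in tags_result.get("values", [])
            if tag["confidence"] >= threshold]
    return sorted(kept, key=lambda tag: tag["confidence"], reverse=True)


# Hypothetical TagsResult value, not taken from a real response.
example_tags = {"values": [
    {"name": "grass", "confidence": 0.4},
    {"name": "dog", "confidence": 0.9},
    {"name": "outdoor", "confidence": 0.75},
]}
print([tag["name"] for tag in confident_tags(example_tags)])
```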

VisualFeature

The visual features requested: tags, objects, caption, denseCaptions, read, smartCrops, people. This parameter needs to be specified if the parameter "model-name" is not specified.

Name Type Description
caption

string

denseCaptions

string

objects

string

people

string

read

string

smartCrops

string

tags

string