Analyze Image

This operation extracts a rich set of visual features from the image content. Two input methods are supported: (1) uploading an image or (2) specifying an image URL. An optional request parameter lets you choose which features to return; by default, image categories are returned in the response. A successful response is returned as JSON. If the request fails, the response contains an error code and a message to help you understand what went wrong.

POST {Endpoint}/vision/v3.2/analyze
POST {Endpoint}/vision/v3.2/analyze?visualFeatures={visualFeatures}&details={details}&language={language}&descriptionExclude={descriptionExclude}&model-version={model-version}
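As a rough illustration of how the optional query parameters combine into the second request form, the sketch below assembles the URL with Python's standard library. The function name `build_analyze_url` and its argument names are hypothetical; only the path and the query parameter names come from this page.

```python
from urllib.parse import urlencode


def build_analyze_url(endpoint, visual_features=None, details=None,
                      language=None, description_exclude=None,
                      model_version=None):
    """Assemble the Analyze Image URL; omitted parameters are left out."""
    params = {}
    if visual_features:
        params["visualFeatures"] = ",".join(visual_features)
    if details:
        params["details"] = ",".join(details)
    if language:
        params["language"] = language
    if description_exclude:
        params["descriptionExclude"] = ",".join(description_exclude)
    if model_version:
        params["model-version"] = model_version
    base = endpoint.rstrip("/") + "/vision/v3.2/analyze"
    # safe="," keeps the comma separators literal instead of encoding them as %2C.
    query = urlencode(params, safe=",")
    return base + ("?" + query if query else "")


url = build_analyze_url(
    "https://westus.api.cognitive.microsoft.com",
    visual_features=["Categories", "Tags"],
    details=["Landmarks"],
    language="en",
)
print(url)
```

Calling the function with only an endpoint yields the first, parameterless request form.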

URI Parameters

Name In Required Type Description
Endpoint
path True

string

Supported Cognitive Services endpoints.

descriptionExclude
query

DescriptionExclude[]

Turn off specified domain models when generating the description.

details
query

Details[]

A string indicating which domain-specific details to return. Multiple values should be comma-separated. Valid detail types include: Celebrities - identifies celebrities if detected in the image. Landmarks - identifies notable landmarks in the image.

language
query

string

The desired language for output generation. If this parameter is not specified, the default value is "en". See https://aka.ms/cv-languages for list of supported languages.

model-version
query

string

Optional parameter to specify the version of the AI model. Accepted values are: "latest", "2021-04-01", "2021-05-01". Defaults to "latest".

Regex pattern: ^(latest|\d{4}-\d{2}-\d{2})(-preview)?$

visualFeatures
query

VisualFeatureTypes[]

A string indicating which visual feature types to return. Multiple values should be comma-separated. Valid visual feature types include: Categories - categorizes image content according to a taxonomy defined in the documentation. Tags - tags the image with a detailed list of words related to the image content. Description - describes the image content with a complete English sentence. Faces - detects whether faces are present and, if so, generates coordinates, gender, and age. ImageType - detects whether the image is clip art or a line drawing. Color - determines the accent color, the dominant color, and whether the image is black and white. Adult - detects whether the image is pornographic in nature (depicts nudity or a sex act) or gory (depicts extreme violence or blood). Sexually suggestive content (also known as racy content) is also detected. Objects - detects various objects within an image, including their approximate location. The Objects argument is only available in English. Brands - detects various brands within an image, including their approximate location. The Brands argument is only available in English.
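The regex pattern listed for model-version can be applied client-side before a request is sent. The following is a minimal sketch; the helper name `is_valid_model_version` is hypothetical, and the check only confirms the shape of the string, not that the service accepts that particular version.

```python
import re

# Pattern copied from the model-version parameter description.
MODEL_VERSION_RE = re.compile(r"^(latest|\d{4}-\d{2}-\d{2})(-preview)?$")


def is_valid_model_version(value):
    """Return True if value has the shape the documented pattern allows."""
    # fullmatch rejects a trailing newline, which "$" alone would tolerate.
    return MODEL_VERSION_RE.fullmatch(value) is not None


for candidate in ["latest", "2021-04-01", "2021-05-01-preview", "2021-4-1"]:
    print(candidate, is_valid_model_version(candidate))
```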

Request Header

Name Required Type Description
Ocp-Apim-Subscription-Key True

string

Request Body

Name Required Type Description
url True

string

Publicly reachable URL of an image.

Responses

Name Type Description
200 OK

ImageAnalysis

The response includes the extracted features in JSON format. Here are the definitions for the enumeration types:

ClipartType

Non-clipart = 0, ambiguous = 1, normal-clipart = 2, good-clipart = 3.

LineDrawingType

Non-LineDrawing = 0, LineDrawing = 1.

Other Status Codes

ComputerVisionErrorResponse

Error response.

Security

Ocp-Apim-Subscription-Key

Type: apiKey
In: header
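Putting the header, body, and security requirements together, a request for the URL input method might be constructed as sketched below. The helper name `make_analyze_request` and the placeholder key and image URL are hypothetical, and the `Content-Type: application/json` header is an assumption that is not stated on this page. The request is built but not sent.

```python
import json
import urllib.request


def make_analyze_request(url, subscription_key, image_url):
    """Build (but do not send) a POST request carrying an image URL."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            # Assumption: a JSON body is announced with this content type.
            "Content-Type": "application/json",
        },
    )


req = make_analyze_request(
    "https://westus.api.cognitive.microsoft.com/vision/v3.2/analyze",
    "<your-subscription-key>",        # placeholder, not a real key
    "https://example.com/image.jpg",  # placeholder image URL
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; that call is left out here so the sketch runs without network access or a valid key.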

Examples

Successful AnalyzeImage request

Sample request

POST https://westus.api.cognitive.microsoft.com/vision/v3.2/analyze?visualFeatures=Categories,Adult,Tags,Description,Faces,Color,ImageType,Objects,Brands&details=Celebrities,Landmarks&language=en


{
  "url": "{url}"
}

Sample response

{
  "categories": [
    {
      "name": "abstract_",
      "score": 0.00390625
    },
    {
      "name": "people_",
      "score": 0.83984375,
      "detail": {
        "celebrities": [
          {
            "name": "Satya Nadella",
            "faceRectangle": {
              "left": 597,
              "top": 162,
              "width": 248,
              "height": 248
            },
            "confidence": 0.999028444
          }
        ]
      }
    },
    {
      "name": "building_",
      "score": 0.984375,
      "detail": {
        "landmarks": [
          {
            "name": "Forbidden City",
            "confidence": 0.9829016923904419
          }
        ]
      }
    }
  ],
  "adult": {
    "isAdultContent": false,
    "isRacyContent": false,
    "isGoryContent": false,
    "adultScore": 0.0934349000453949,
    "racyScore": 0.06861349195241928,
    "goreScore": 0.012872257380997575
  },
  "tags": [
    {
      "name": "person",
      "confidence": 0.9897908568382263
    },
    {
      "name": "man",
      "confidence": 0.9449388980865479
    },
    {
      "name": "outdoor",
      "confidence": 0.938492476940155
    },
    {
      "name": "window",
      "confidence": 0.8951393961906433
    },
    {
      "name": "pangolin",
      "confidence": 0.7250059783791661,
      "hint": "mammal"
    }
  ],
  "description": {
    "tags": [
      "person",
      "man",
      "outdoor",
      "window",
      "glasses"
    ],
    "captions": [
      {
        "text": "Satya Nadella sitting on a bench",
        "confidence": 0.48293603002174407
      }
    ]
  },
  "requestId": "0dbec5ad-a3d3-4f7e-96b4-dfd57efe967d",
  "metadata": {
    "width": 1500,
    "height": 1000,
    "format": "Jpeg"
  },
  "modelVersion": "2021-04-01",
  "faces": [
    {
      "age": 44,
      "gender": "Male",
      "faceRectangle": {
        "left": 593,
        "top": 160,
        "width": 250,
        "height": 250
      }
    }
  ],
  "color": {
    "dominantColorForeground": "Brown",
    "dominantColorBackground": "Brown",
    "dominantColors": [
      "Brown",
      "Black"
    ],
    "accentColor": "873B59",
    "isBWImg": false
  },
  "imageType": {
    "clipArtType": 0,
    "lineDrawingType": 0
  },
  "objects": [
    {
      "rectangle": {
        "x": 0,
        "y": 0,
        "w": 50,
        "h": 50
      },
      "object": "tree",
      "confidence": 0.9,
      "parent": {
        "object": "plant",
        "confidence": 0.95
      }
    }
  ],
  "brands": [
    {
      "name": "Pepsi",
      "confidence": 0.857,
      "rectangle": {
        "x": 489,
        "y": 79,
        "w": 161,
        "h": 177
      }
    },
    {
      "name": "Coca-Cola",
      "confidence": 0.893,
      "rectangle": {
        "x": 216,
        "y": 55,
        "w": 171,
        "h": 372
      }
    }
  ]
}
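In the sample response, celebrity and landmark results are nested under the `detail` field of individual categories rather than at the top level. One possible way to gather them is sketched below on a trimmed copy of the sample; the helper name `collect_details` is hypothetical.

```python
import json

# Trimmed copy of the sample response above.
sample = json.loads("""
{
  "categories": [
    {"name": "abstract_", "score": 0.00390625},
    {"name": "people_", "score": 0.83984375,
     "detail": {"celebrities": [{"name": "Satya Nadella",
                                 "confidence": 0.999028444}]}},
    {"name": "building_", "score": 0.984375,
     "detail": {"landmarks": [{"name": "Forbidden City",
                               "confidence": 0.9829016923904419}]}}
  ]
}
""")


def collect_details(analysis, kind):
    """Gather names of 'celebrities' or 'landmarks' across all categories."""
    names = []
    for category in analysis.get("categories", []):
        # Not every category carries a 'detail' object (see "abstract_").
        for item in category.get("detail", {}).get(kind, []):
            names.append(item["name"])
    return names


print(collect_details(sample, "celebrities"))
print(collect_details(sample, "landmarks"))
```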

Definitions

Name Description
AdultInfo

An object describing whether the image contains adult-oriented content and/or is racy.

BoundingRect

A bounding box for an area inside an image.

Category

An object describing identified category.

CategoryDetail

An object describing additional category details.

CelebritiesModel

An object describing possible celebrity identification.

ColorInfo

An object providing additional metadata describing color attributes.

ComputerVisionError

The API request error.

ComputerVisionErrorCodes

The error code.

ComputerVisionErrorResponse

The API error response.

ComputerVisionInnerError

Details about the API request error.

ComputerVisionInnerErrorCodeValue

The error code.

DescriptionExclude

Turn off specified domain models when generating the description.

Details

A string indicating which domain-specific details to return. Multiple values should be comma-separated. Valid detail types include: Celebrities - identifies celebrities if detected in the image. Landmarks - identifies notable landmarks in the image.

DetectedBrand

A brand detected in an image.

DetectedObject

An object detected in an image.

FaceDescription

An object describing a face identified in the image.

FaceRectangle

An object describing face rectangle.

Gender

Possible gender of the face.

ImageAnalysis

Result of AnalyzeImage operation.

ImageCaption

An image caption, i.e. a brief description of what the image depicts.

ImageDescriptionDetails

A collection of content tags, along with a list of captions sorted by confidence level, and image metadata.

ImageMetadata

Image metadata.

ImageTag

An entity observation in the image, along with the confidence score.

ImageType

An object providing possible image types and matching confidence levels.

ImageUrl

LandmarksModel

A landmark recognized in the image.

ObjectHierarchy

An object detected inside an image.

VisualFeatureTypes

A string indicating which visual feature types to return. Multiple values should be comma-separated. Valid visual feature types include: Categories - categorizes image content according to a taxonomy defined in the documentation. Tags - tags the image with a detailed list of words related to the image content. Description - describes the image content with a complete English sentence. Faces - detects whether faces are present and, if so, generates coordinates, gender, and age. ImageType - detects whether the image is clip art or a line drawing. Color - determines the accent color, the dominant color, and whether the image is black and white. Adult - detects whether the image is pornographic in nature (depicts nudity or a sex act) or gory (depicts extreme violence or blood). Sexually suggestive content (also known as racy content) is also detected. Objects - detects various objects within an image, including their approximate location. The Objects argument is only available in English. Brands - detects various brands within an image, including their approximate location. The Brands argument is only available in English.

AdultInfo

An object describing whether the image contains adult-oriented content and/or is racy.

Name Type Description
adultScore

number

Score from 0 to 1 that indicates how much the content is considered adult-oriented within the image.

goreScore

number

Score from 0 to 1 that indicates how gory the image is.

isAdultContent

boolean

A value indicating if the image contains adult-oriented content.

isGoryContent

boolean

A value indicating if the image is gory.

isRacyContent

boolean

A value indicating if the image is racy.

racyScore

number

Score from 0 to 1 that indicates how suggestive the image is.
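Because AdultInfo exposes the raw scores alongside the boolean flags, a caller could apply its own cutoff instead of relying on the flags. The sketch below assumes a caller-chosen threshold; the helper name `flag_adult_info` and the default of 0.5 are hypothetical and do not reflect how the service sets its own flags.

```python
def flag_adult_info(adult, threshold=0.5):
    """Return the names of scores at or above a caller-chosen threshold."""
    scores = {
        "adult": adult.get("adultScore", 0.0),
        "racy": adult.get("racyScore", 0.0),
        "gore": adult.get("goreScore", 0.0),
    }
    return sorted(name for name, score in scores.items() if score >= threshold)


adult = {  # values taken from the sample response
    "isAdultContent": False,
    "isRacyContent": False,
    "isGoryContent": False,
    "adultScore": 0.0934349000453949,
    "racyScore": 0.06861349195241928,
    "goreScore": 0.012872257380997575,
}
print(flag_adult_info(adult))
print(flag_adult_info(adult, threshold=0.05))
```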

BoundingRect

A bounding box for an area inside an image.

Name Type Description
h

integer

Height measured from the top-left point of the area, in pixels.

w

integer

Width measured from the top-left point of the area, in pixels.

x

integer

X-coordinate of the top left point of the area, in pixels.

y

integer

Y-coordinate of the top left point of the area, in pixels.
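Many imaging libraries expect a box as corner coordinates rather than an origin plus size. A minimal conversion from a BoundingRect is sketched below, using one of the brand rectangles from the sample response; the helper name `rect_to_corners` is hypothetical.

```python
def rect_to_corners(rect):
    """Convert {x, y, w, h} to (left, top, right, bottom) pixel coordinates."""
    left, top = rect["x"], rect["y"]
    return (left, top, left + rect["w"], top + rect["h"])


pepsi = {"x": 489, "y": 79, "w": 161, "h": 177}  # from the sample response
print(rect_to_corners(pepsi))
```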

Category

An object describing identified category.

Name Type Description
detail

CategoryDetail

Details of the identified category.

name

string

Name of the category.

score

number

Scoring of the category.

CategoryDetail

An object describing additional category details.

Name Type Description
celebrities

CelebritiesModel[]

An array of celebrities, if any were identified.

landmarks

LandmarksModel[]

An array of landmarks, if any were identified.

CelebritiesModel

An object describing possible celebrity identification.

Name Type Description
confidence

number

Confidence level for the celebrity recognition as a value ranging from 0 to 1.

faceRectangle

FaceRectangle

Location of the identified face in the image.

name

string

Name of the celebrity.

ColorInfo

An object providing additional metadata describing color attributes.

Name Type Description
accentColor

string

Possible accent color.

dominantColorBackground

string

Possible dominant background color.

dominantColorForeground

string

Possible dominant foreground color.

dominantColors

string[]

An array of possible dominant colors.

isBWImg

boolean

A value indicating if the image is black and white.
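In the sample response, `accentColor` is the string "873B59". Assuming it is a six-digit RRGGBB hexadecimal value without a leading "#", which this page does not state explicitly, it could be decoded as sketched below. The helper name `accent_to_rgb` is hypothetical.

```python
def accent_to_rgb(accent_color):
    """Decode a six-digit hex string such as '873B59' into (r, g, b)."""
    raw = bytes.fromhex(accent_color)
    if len(raw) != 3:
        raise ValueError("expected six hex digits, got %r" % accent_color)
    return tuple(raw)


print(accent_to_rgb("873B59"))
```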

ComputerVisionError

The API request error.

Name Type Description
code

ComputerVisionErrorCodes

The error code.

innererror

ComputerVisionInnerError

Inner error contains more specific information.

message

string

A message explaining the error reported by the service.

ComputerVisionErrorCodes

The error code.

Name Type Description
InternalServerError

string

InvalidArgument

string

InvalidRequest

string

ServiceUnavailable

string

ComputerVisionErrorResponse

The API error response.

Name Type Description
error

ComputerVisionError

Error contents.

ComputerVisionInnerError

Details about the API request error.

Name Type Description
code

ComputerVisionInnerErrorCodeValue

The error code.

message

string

Error message.
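The error definitions above nest as ComputerVisionErrorResponse, then ComputerVisionError, then an optional ComputerVisionInnerError. The sketch below reads such a body; the JSON is a hypothetical example shaped after those definitions (the messages and the pairing of codes are illustrative, not captured from the service), and the helper name `summarize_error` is also hypothetical.

```python
import json

# Hypothetical error body, shaped after the definitions above.
error_body = json.loads("""
{"error": {"code": "InvalidRequest",
           "message": "Example message.",
           "innererror": {"code": "InvalidImageUrl",
                          "message": "Example inner message."}}}
""")


def summarize_error(response):
    """Return (code, inner_code, message); inner_code may be None."""
    error = response.get("error", {})
    inner = error.get("innererror") or {}
    return (error.get("code"), inner.get("code"), error.get("message"))


print(summarize_error(error_body))
```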

ComputerVisionInnerErrorCodeValue

The error code.

Name Type Description
BadArgument

string

CancelledRequest

string

DetectFaceError

string

FailedToProcess

string

InternalServerError

string

InvalidDetails

string

InvalidImageFormat

string

InvalidImageSize

string

InvalidImageUrl

string

InvalidModel

string

InvalidThumbnailSize

string

NotSupportedFeature

string

NotSupportedImage

string

NotSupportedLanguage

string

NotSupportedVisualFeature

string

StorageException

string

Timeout

string

Unspecified

string

UnsupportedMediaType

string

DescriptionExclude

Turn off specified domain models when generating the description.

Name Type Description
Celebrities

string

Landmarks

string

Details

A string indicating which domain-specific details to return. Multiple values should be comma-separated. Valid detail types include: Celebrities - identifies celebrities if detected in the image. Landmarks - identifies notable landmarks in the image.

Name Type Description
Celebrities

string

Landmarks

string

DetectedBrand

A brand detected in an image.

Name Type Description
confidence

number

Confidence score of having observed the brand in the image, as a value ranging from 0 to 1.

name

string

Label for the brand.

rectangle

BoundingRect

Approximate location of the detected brand.

DetectedObject

An object detected in an image.

Name Type Description
confidence

number

Confidence score of having observed the object in the image, as a value ranging from 0 to 1.

object

string

Label for the object.

parent

ObjectHierarchy

The parent object, from a taxonomy perspective. The parent object is a more generic form of this object. For example, a 'bulldog' would have a parent of 'dog'.

rectangle

BoundingRect

Approximate location of the detected object.

FaceDescription

An object describing a face identified in the image.

Name Type Description
age

integer

Possible age of the face.

faceRectangle

FaceRectangle

Rectangle in the image containing the identified face.

gender

Gender

Possible gender of the face.

FaceRectangle

An object describing face rectangle.

Name Type Description
height

integer

Height measured from the top-left point of the face, in pixels.

left

integer

X-coordinate of the top left point of the face, in pixels.

top

integer

Y-coordinate of the top left point of the face, in pixels.

width

integer

Width measured from the top-left point of the face, in pixels.

Gender

Possible gender of the face.

Name Type Description
Female

string

Male

string

ImageAnalysis

Result of AnalyzeImage operation.

Name Type Description
adult

AdultInfo

An object describing whether the image contains adult-oriented content and/or is racy.

brands

DetectedBrand[]

Array of brands detected in the image.

categories

Category[]

An array indicating identified categories.

color

ColorInfo

An object providing additional metadata describing color attributes.

description

ImageDescriptionDetails

A collection of content tags, along with a list of captions sorted by confidence level, and image metadata.

faces

FaceDescription[]

An array of possible faces within the image.

imageType

ImageType

An object providing possible image types and matching confidence levels.

metadata

ImageMetadata

Image metadata.

modelVersion

string

Version of the AI model.

objects

DetectedObject[]

Array of objects describing what was detected in the image.

requestId

string

ID of the REST API request.

tags

ImageTag[]

A list of tags with confidence level.

ImageCaption

An image caption, i.e. a brief description of what the image depicts.

Name Type Description
confidence

number

The level of confidence the service has in the caption.

text

string

The text of the caption.

ImageDescriptionDetails

A collection of content tags, along with a list of captions sorted by confidence level, and image metadata.

Name Type Description
captions

ImageCaption[]

A list of captions, sorted by confidence level.

tags

string[]

A collection of image tags.

ImageMetadata

Image metadata.

Name Type Description
format

string

Image format.

height

integer

Image height, in pixels.

width

integer

Image width, in pixels.

ImageTag

An entity observation in the image, along with the confidence score.

Name Type Description
confidence

number

The level of confidence that the entity was observed.

hint

string

Optional hint/details for this tag.

name

string

Name of the entity.
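Since every ImageTag carries a confidence and, optionally, a hint, a caller may want to keep only the stronger tags and show the hint where one exists. A minimal sketch follows, using the tags from the sample response; the helper name `confident_tags` and the 0.9 default cutoff are hypothetical.

```python
def confident_tags(tags, minimum=0.9):
    """Return tag labels at or above `minimum`, appending any hint."""
    labels = []
    for tag in tags:
        if tag["confidence"] >= minimum:
            hint = tag.get("hint")
            labels.append(f'{tag["name"]} ({hint})' if hint else tag["name"])
    return labels


tags = [  # from the sample response
    {"name": "person", "confidence": 0.9897908568382263},
    {"name": "man", "confidence": 0.9449388980865479},
    {"name": "outdoor", "confidence": 0.938492476940155},
    {"name": "window", "confidence": 0.8951393961906433},
    {"name": "pangolin", "confidence": 0.7250059783791661, "hint": "mammal"},
]
print(confident_tags(tags))
print(confident_tags(tags, minimum=0.7))
```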

ImageType

An object providing possible image types and matching confidence levels.

Name Type Description
clipArtType

integer

Confidence level that the image is clip art, as an integer from 0 (non-clipart) to 3 (good-clipart).

lineDrawingType

integer

Confidence level that the image is a line drawing: 0 (non-line-drawing) or 1 (line drawing).
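The integer values of `clipArtType` and `lineDrawingType` correspond to the ClipartType and LineDrawingType enumerations given in the Responses section. One way to make them readable in client code is sketched below; the Python class and member names are hypothetical spellings of those documented values.

```python
from enum import IntEnum


class ClipartType(IntEnum):
    NON_CLIPART = 0
    AMBIGUOUS = 1
    NORMAL_CLIPART = 2
    GOOD_CLIPART = 3


class LineDrawingType(IntEnum):
    NON_LINE_DRAWING = 0
    LINE_DRAWING = 1


image_type = {"clipArtType": 0, "lineDrawingType": 0}  # from the sample response
clip = ClipartType(image_type["clipArtType"])
line = LineDrawingType(image_type["lineDrawingType"])
print(clip.name, line.name)
```

Constructing an enum from a value outside the documented range raises `ValueError`, which surfaces unexpected responses early.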

ImageUrl

Name Type Description
url

string

Publicly reachable URL of an image.

LandmarksModel

A landmark recognized in the image.

Name Type Description
confidence

number

Confidence level for the landmark recognition as a value ranging from 0 to 1.

name

string

Name of the landmark.

ObjectHierarchy

An object detected inside an image.

Name Type Description
confidence

number

Confidence score of having observed the object in the image, as a value ranging from 0 to 1.

object

string

Label for the object.

parent

ObjectHierarchy

The parent object, from a taxonomy perspective. The parent object is a more generic form of this object. For example, a 'bulldog' would have a parent of 'dog'.
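Because each ObjectHierarchy may itself have a `parent`, the taxonomy for a detected object forms a chain. A minimal sketch that walks the chain from the most specific label to the most generic is shown below, using the object from the sample response; the helper name `object_lineage` is hypothetical.

```python
def object_lineage(detected):
    """Return labels from the detected object up through its ancestors."""
    labels = []
    node = detected
    while node is not None:
        labels.append(node["object"])
        node = node.get("parent")  # absent at the top of the taxonomy
    return labels


tree = {  # from the sample response
    "object": "tree",
    "confidence": 0.9,
    "parent": {"object": "plant", "confidence": 0.95},
}
print(object_lineage(tree))
```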

VisualFeatureTypes

A string indicating which visual feature types to return. Multiple values should be comma-separated. Valid visual feature types include: Categories - categorizes image content according to a taxonomy defined in the documentation. Tags - tags the image with a detailed list of words related to the image content. Description - describes the image content with a complete English sentence. Faces - detects whether faces are present and, if so, generates coordinates, gender, and age. ImageType - detects whether the image is clip art or a line drawing. Color - determines the accent color, the dominant color, and whether the image is black and white. Adult - detects whether the image is pornographic in nature (depicts nudity or a sex act) or gory (depicts extreme violence or blood). Sexually suggestive content (also known as racy content) is also detected. Objects - detects various objects within an image, including their approximate location. The Objects argument is only available in English. Brands - detects various brands within an image, including their approximate location. The Brands argument is only available in English.

Name Type Description
Adult

string

Brands

string

Categories

string

Color

string

Description

string

Faces

string

ImageType

string

Objects

string

Tags

string
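The VisualFeatureTypes description notes that Objects and Brands are only available in English. A client could flag that combination before sending a request, as in the sketch below. The helper name `english_only_conflicts` is hypothetical, and how the service itself responds to such a request is not described on this page.

```python
# Features documented above as available only in English.
ENGLISH_ONLY = {"Objects", "Brands"}


def english_only_conflicts(visual_features, language="en"):
    """List requested features that are English-only when language is not 'en'."""
    if language == "en":
        return []
    return sorted(f for f in visual_features if f in ENGLISH_ONLY)


print(english_only_conflicts(["Tags", "Objects", "Brands"], language="es"))
print(english_only_conflicts(["Tags", "Objects"], language="en"))
```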