Technical limitations, operational factors, and ranges
All vectors uploaded to Azure AI Search must be generated outside the service by using a model of your choice. It is your responsibility to consider the technical limitations and operational factors of each model, and whether the embeddings it creates are well optimized, or even appropriate, for your use case. This includes both the inferences of meaning extracted from content and the dimensionality of the vector embedding space.
The vectorization model creates an embedding space that defines the end-user search experience of an application. If a model does not align well with the intended use case, or if the embeddings it generates are poorly optimized, both the functionality and the performance of the resulting search experience can suffer.
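To make the external embedding workflow described above concrete, the following is a minimal sketch that generates an embedding with an Azure OpenAI deployment and uploads it to a vector field in an Azure AI Search index. It assumes the openai and azure-search-documents packages; the endpoint, key, deployment, index, and field names (such as contentVector) are placeholders, not values from this article.

```python
# Minimal sketch: generate embeddings outside Azure AI Search, then upload them.
# Assumes the `openai` and `azure-search-documents` packages; all endpoints, keys,
# deployment names, index names, and field names below are placeholders.
from openai import AzureOpenAI
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

aoai = AzureOpenAI(
    azure_endpoint="https://<your-aoai-resource>.openai.azure.com",
    api_key="<your-aoai-key>",
    api_version="2024-02-01",
)

# 1. Create the embedding externally with a model of your choice.
text = "Azure AI Search supports vector and hybrid retrieval."
embedding = aoai.embeddings.create(
    model="<your-embedding-deployment>",  # for example, a text-embedding deployment
    input=text,
).data[0].embedding  # dimensionality is fixed by the model you choose

# 2. Upload the document, including the precomputed vector, to the index.
search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="<your-index>",
    credential=AzureKeyCredential("<your-search-admin-key>"),
)
search_client.upload_documents(documents=[{
    "id": "1",
    "content": text,
    "contentVector": embedding,  # hypothetical vector field defined in your index
}])
```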
While many limitations of vector search stem from the model used to generate embeddings, there are additional options you should consider at query time. You can choose from two algorithms to determine relevance for vector search results: exhaustive k-nearest neighbors (KNN) or Hierarchical Navigable Small World (HNSW). Exhaustive KNN performs a brute-force search of the entire vector space, calculating the distances between the query and all data points to find the exact k nearest neighbors of the query point. While more precise, this algorithm can be slow. If low latency is the primary goal, consider HNSW, which performs an efficient approximate nearest neighbor (ANN) search in high-dimensional embedding spaces. See the vector search documentation for more information about these options.
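A minimal sketch of how both algorithms might be declared in an index definition follows, assuming the azure-search-documents Python SDK (version 11.4 or later); the configuration names, profile names, and HNSW parameter values are illustrative placeholders rather than recommendations.

```python
# Minimal sketch: declare both HNSW and exhaustive KNN algorithm configurations
# in an index, assuming azure-search-documents 11.4+; all names are placeholders.
from azure.search.documents.indexes.models import (
    VectorSearch,
    VectorSearchProfile,
    HnswAlgorithmConfiguration,
    HnswParameters,
    ExhaustiveKnnAlgorithmConfiguration,
)

vector_search = VectorSearch(
    algorithms=[
        # Approximate nearest neighbor search: fast, with a tunable recall/latency trade-off.
        HnswAlgorithmConfiguration(
            name="my-hnsw",
            parameters=HnswParameters(m=4, ef_construction=400, ef_search=500),
        ),
        # Brute-force search of the entire vector space: exact, but slower at scale.
        ExhaustiveKnnAlgorithmConfiguration(name="my-eknn"),
    ],
    profiles=[
        # Each vector field in the index references a profile, which selects an algorithm.
        VectorSearchProfile(name="hnsw-profile", algorithm_configuration_name="my-hnsw"),
        VectorSearchProfile(name="eknn-profile", algorithm_configuration_name="my-eknn"),
    ],
)
```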
- Do spend time A/B testing your application with the different content and query types you expect your application to support. Figure out which query experience is best for your needs.
- Do spend time testing your models with a full range of input content to understand how it behaves in many situations. This content could include potentially sensitive input to understand whether there is any bias inherent in the model. The Azure OpenAI Responsible AI overview provides guidance for how to responsibly use AI.
- Do consider adding Azure AI Content Safety to your application architecture. It includes an API to detect harmful user-generated and AI-generated text or images in applications and services; a brief sketch of how it can be called follows this list.
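As a rough illustration, the following sketch screens a piece of text with the Azure AI Content Safety text analysis API. It assumes the azure-ai-contentsafety package (attribute names reflect the current GA version); the endpoint and key values are placeholders, and the severity thresholds you apply are your own design decision.

```python
# Minimal sketch: screen text with Azure AI Content Safety before indexing or display.
# Assumes the `azure-ai-contentsafety` package; endpoint and key are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions

client = ContentSafetyClient(
    endpoint="https://<your-content-safety-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<your-content-safety-key>"),
)

response = client.analyze_text(
    AnalyzeTextOptions(text="<user-generated or AI-generated text>")
)

# Each analyzed category (hate, self-harm, sexual, violence) gets a severity score;
# apply your own thresholds to decide whether to block, review, or allow the content.
for item in response.categories_analysis:
    print(item.category, item.severity)
```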
Evaluating and integrating vector search for your use
To ensure optimal performance, conduct your own evaluations of the solutions you plan to implement by using vector search. Follow an evaluation process that: (1) uses some internal stakeholders to evaluate results, (2) uses A/B experimentation to roll out vector search to users, (3) incorporates key performance indicators (KPIs) and metrics monitoring when the service is deployed in experiences for the first time, and (4) tests and tweaks the vector search configuration and/or index definition, including the surrounding experiences like user interface placement or business processes.
Microsoft has rigorously evaluated vector search in terms of latency, recall, and relevance by using diverse datasets to measure the speed, scalability, and accuracy of the results returned. The primary focus of your own evaluation efforts should be on selecting the appropriate model for your specific use case, understanding the limitations and biases of that model, and rigorously testing the end-to-end vector search experience.
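One common way to quantify the precision/latency trade-off in your own evaluation is to compare approximate (HNSW) results against an exhaustive search over the same query vector and compute recall@k. The following is a minimal sketch under the assumption that the azure-search-documents SDK (11.4 or later) and the VectorizedQuery exhaustive override are available in your API version; the index and field names are placeholders.

```python
# Minimal sketch: estimate ANN recall@k by comparing HNSW results with an
# exhaustive search for the same query vector. Assumes azure-search-documents 11.4+,
# the VectorizedQuery `exhaustive` override, and placeholder index/field names.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="<your-index>",
    credential=AzureKeyCredential("<your-query-key>"),
)

def top_k_ids(query_vector, k, exhaustive):
    """Return the document keys of the top-k vector matches."""
    vq = VectorizedQuery(
        vector=query_vector,
        k_nearest_neighbors=k,
        fields="contentVector",   # hypothetical vector field
        exhaustive=exhaustive,    # True forces brute-force KNN, used here as ground truth
    )
    results = search_client.search(
        search_text=None, vector_queries=[vq], select=["id"], top=k
    )
    return [doc["id"] for doc in results]

def recall_at_k(query_vector, k=10):
    """Fraction of exact top-k neighbors that the approximate search also returned."""
    approx = set(top_k_ids(query_vector, k, exhaustive=False))
    exact = set(top_k_ids(query_vector, k, exhaustive=True))
    return len(approx & exact) / len(exact) if exact else 0.0
```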
Technical limitations, operational factors, and ranges
There may be cases where semantic results, captions, and answers do not appear to be correct. The models used by semantic ranker are trained on various data sources (including open-source datasets and selections from the Microsoft Bing corpus). Semantic ranker supports a broad range of languages and attempts to match user queries to content from your search results. Semantic ranker is also a premium feature at additional cost, which should be considered when projecting the overall cost of your end-to-end solution.
Semantic ranker is most likely to improve relevance over content that is semantically rich, such as articles and descriptions. It looks for context and relatedness among terms, elevating matches that make more sense given the query. Language understanding "finds" summarizations, captions, and answers within your content, but unlike generative models such as the Azure OpenAI Service GPT-3.5 and GPT-4 models, it does not create them. Only verbatim text from source documents is included in the response, which can then be rendered on a search results page for a more productive search experience.
State-of-the-art, pretrained models are used for summarization and ranking. To maintain the fast performance that users expect from search, semantic summarization and ranking are applied only to the top 50 results, as scored by the default scoring algorithm. Inputs are derived from the content in the search result; semantic ranker cannot reach back to the search index to access other fields in the search document that were not returned in the query response. Inputs are also subject to a limit of 8,960 tokens. These limits are necessary to maintain millisecond response times.
The underlying ranking technology comes from Bing and Microsoft Research and is integrated into the Azure AI Search infrastructure as an add-on feature. The models are used internally, are not exposed to the developer, and are nonconfigurable. For more information about the research and AI investments backing semantic ranker, see How AI from Bing is powering Azure AI Search (Microsoft Research Blog).
Semantic ranker also offers answers, captions, and highlighting within the response. For example, if the model classifies a query as a question and is 70% confident in the answer, the model returns a semantic answer. Additionally, semantic captions extract the most relevant content from each result as a brief snippet and highlight the most relevant words or phrases within that snippet.
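The following sketch shows one way such a query might be issued with the azure-search-documents SDK (11.4 or later), requesting extractive answers and captions against a semantic configuration you have already defined; the endpoint, key, index, configuration name, and query text are placeholders.

```python
# Minimal sketch: request semantic ranking with extractive answers and captions.
# Assumes azure-search-documents 11.4+ and an existing semantic configuration;
# all endpoint, key, index, and configuration names are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="<your-index>",
    credential=AzureKeyCredential("<your-query-key>"),
)

results = search_client.search(
    search_text="what is the refund policy",
    query_type="semantic",
    semantic_configuration_name="my-semantic-config",
    query_answer="extractive",    # return a semantic answer when confidence is high enough
    query_caption="extractive",   # return a caption with highlighted phrases per result
    top=5,
)

# Answers (if any) are returned alongside the result set.
for answer in results.get_answers() or []:
    print("ANSWER:", answer.text)

# Each result can carry a caption extracted verbatim from that document.
for doc in results:
    for caption in doc.get("@search.captions") or []:
        print("CAPTION:", caption.text)
```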
Semantic ranker results are based on the data in the underlying search index, and the models provide relevance ranking, answers, and captions based on the information retrieved from the index. Prior to using semantic ranker in a production environment, it is important to do further testing and to ensure that the dataset is accurate and appropriate for the intended use case. For more information and examples of how to evaluate semantic ranker, please see the content and appendix here.
In many AI systems, performance is often defined in relation to accuracy—that is, how often the AI system offers a correct prediction or output. With large-scale natural language models, two different users may look at the same output and have different opinions of how useful or relevant it is, which means that performance for these systems must be defined more flexibly. Here, we broadly consider performance to mean that the application performs as you and your users expect, including not generating harmful outputs.
Semantic ranker was trained on public content. As a result, the semantic relevance will vary based on the documents in the index and the queries issued against it. It is important to use your own judgment and research when you use this content for decision making.
- Do spend time A/B testing your application with different query types, such as keyword versus hybrid plus semantic ranker. Figure out which query experience is best for your needs.
- Do expend a reasonable effort to set up your semantic configuration in accordance with the feature documentation.
- Do not trust the semantic answers if you do not have confidence in the accuracy of the information within the search index.
- Do not always trust semantic captions, because they are extracted from customer content through a series of models that predict the most relevant content to surface in a brief snippet.
Evaluation of semantic ranker
Evaluation methods
Semantic ranker was evaluated through internal testing, including automated and human judgment on multiple datasets, as well as feedback from internal customers. Testing included scoring documents as relevant or not relevant, as well as ranking documents in priority order of relevance. Likewise, the captions and answers functionality was also evaluated through internal testing.
Evaluation results
We strive to ship all model updates regression-free (that is, an updated model should only improve on the current production model). Each candidate is compared directly to the current production model by using metrics suitable for the feature being evaluated, for example, Normalized Discounted Cumulative Gain (NDCG) for ranking and precision/recall for answers; a generic sketch of NDCG follows the lists below. Semantic ranker models are trained, tuned, and evaluated by using a wide range of training data that is representative of documents that have different properties (language, length, formatting, styles, and tones) to support the broadest array of search scenarios. Our training and test data are drawn from:
Sources of documents:
- Academic and industry benchmarks
- Customer data (testing only, performed with customer permission)
- Synthetic data
Sources of queries:
- Benchmark query sets
- Customer-provided query sets (testing only, performed with customer permission)
- Synthetic query sets
- Human-generated query sets
Sources of labels for scoring query and document pairs:
- Academic and industry benchmark labels
- Customer labels (testing only, performed with customer permission)
- Synthetic data labels
- Human-scored labels
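To illustrate how a ranking metric such as NDCG works in an offline evaluation like the one referenced above, the following is a generic sketch, not Microsoft's internal evaluation code; the relevance labels are made-up values for illustration only.

```python
# Generic sketch of NDCG@k for offline ranking evaluation; not Microsoft's internal
# evaluation code. Relevance labels (higher = more relevant) are illustrative only.
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked results."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of an ideally ordered result list (1.0 is perfect)."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Labels for the documents in the order a candidate ranker returned them.
candidate_run = [3, 2, 3, 0, 1]
print(ndcg_at_k(candidate_run, k=5))  # closer to 1.0 means closer to the ideal ordering
```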
Evaluating and integrating semantic ranker for your use
The performance of semantic ranker varies depending on the real-world uses and conditions in which people use it. The quality of the relevance provided through the deep learning models that power semantic ranker is directly correlated with the data quality of your search index. For example, the models currently have token limitations and consider only the first 8,960 tokens for semantic answers. Therefore, if the semantic answer to a search query is found toward the end of a long document (beyond the 8,960-token limit), the answer is not provided. The same rule applies to captions. Also, the semantic configuration lists relevant search fields in priority order. You can reorder the fields in this list to help tailor relevance to better suit your needs.
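A semantic configuration with prioritized fields might be declared as in the sketch below, assuming the azure-search-documents SDK (11.4 or later); the configuration and field names are placeholders, and the ordering of the content fields is what you can adjust to tune relevance for your scenario.

```python
# Minimal sketch: a semantic configuration that lists searchable fields in priority order.
# Assumes azure-search-documents 11.4+; configuration and field names are placeholders.
from azure.search.documents.indexes.models import (
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticField,
    SemanticSearch,
)

semantic_search = SemanticSearch(
    configurations=[
        SemanticConfiguration(
            name="my-semantic-config",
            prioritized_fields=SemanticPrioritizedFields(
                title_field=SemanticField(field_name="title"),
                # Content fields are considered in the order listed; reorder them
                # to tailor relevance to your needs.
                content_fields=[
                    SemanticField(field_name="description"),
                    SemanticField(field_name="content"),
                ],
                keywords_fields=[SemanticField(field_name="tags")],
            ),
        )
    ]
)
# Attach `semantic_search` to your SearchIndex definition when creating or updating the index.
```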
To ensure optimal performance in their scenarios, customers should conduct their own evaluations of the solutions they implement by using semantic ranker. Customers should generally follow an evaluation process that: (1) uses some internal stakeholders to evaluate results, (2) uses A/B experimentation to roll out semantic ranker to users, (3) incorporates KPIs and metrics monitoring when the service is deployed in experiences for the first time, and (4) tests and tweaks the semantic ranker configuration and/or index definition, including the surrounding experiences like user interface placement or business processes.
If you are developing an application in a high-stakes domain or industry, such as healthcare, human resources, education, or the legal field, assess how well the application works in your scenario, implement strong human oversight, evaluate how well users understand the limitations of the application, and comply with all relevant laws. Consider other mitigations based on your scenario.