Create or Update Index (Preview REST API)
Important
This API reference is for a legacy version. See Data plane REST operations for updated reference documentation.
Applies to: 2023-07-01-Preview. This version is no longer supported. Upgrade immediately to a newer version.
Important
2023-07-01-Preview adds vector search.
- "vectorSearch" object, a configuration of vector search settings. Applies to vector search algorithms only.
- "Collection(Edm.Single)" data type, required for a vector field. Represents a single-precision floating-point number as the primitive type.
- "dimensions" property, required for a vector field. Represents the dimensionality of your vector embeddings.
- "vectorSearchConfiguration" property, required for a vector field. Selects the algorithm configuration for this field.
2021-04-30-Preview adds:
- "semanticConfiguration" used for scoping semantic ranking to specific fields.
- "identity", under "encryptionKey", used to retrieve a customer-managed encryption key from Azure Key Vault using a user-assigned managed identity.
2020-06-30-Preview adds:
- "normalizers", used for case-insensitivity on sorts and filters.
An index specifies the index schema, including the fields collection (field names, data types, and attributes), but also other constructs (suggesters, scoring profiles, and CORS configuration) that define other search behaviors.
You can use POST or PUT on a create request. For either one, the request body provides the object definition.
POST https://[servicename].search.windows.net/indexes?api-version=[api-version]
Content-Type: application/json
api-key: [admin key]
For update requests, use PUT and specify the index name on the URI.
PUT https://[servicename].search.windows.net/indexes/[index name]?api-version=[api-version]
Content-Type: application/json
api-key: [admin key]
HTTPS is required for all service requests. If the index doesn't exist, it's created. If it already exists, it's updated to the new definition.
Creating an index establishes the schema and metadata. Populating the index is a separate operation. For this step, you can use an indexer (see Indexer operations, available for supported data sources) or Add, Update or Delete Documents. The maximum number of indexes that you can create varies by pricing tier. Within each index, there are limits on individual elements. For more information, see Service limits for Azure AI Search.
Updating an existing index must include the full schema definition, including any original definitions that you want to preserve. In general, the best pattern for updates is to retrieve the index definition with a GET, modify it, and then update it with PUT.
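The retrieve, modify, and PUT pattern can be sketched as follows. This is a minimal illustration, not an official client: the helper names, the service name "my-service", and the placeholder key are hypothetical, and the `current` dictionary stands in for the body that a GET on the index would return. The request is prepared but not sent.

```python
import copy
import json
import urllib.request

API_VERSION = "2023-07-01-Preview"


def index_url(service_name, index_name, api_version=API_VERSION):
    """Build the index URI used by both GET and PUT."""
    return (f"https://{service_name}.search.windows.net"
            f"/indexes/{index_name}?api-version={api_version}")


def with_added_field(definition, field):
    """Return a copy of the full definition with one more field appended."""
    updated = copy.deepcopy(definition)
    updated["fields"].append(field)
    return updated


def build_put_request(service_name, admin_key, definition):
    """Prepare (but do not send) a PUT that carries the complete schema."""
    return urllib.request.Request(
        index_url(service_name, definition["name"]),
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="PUT",
    )


# Stand-in for the JSON body that a GET on the index would return.
current = {"name": "hotels",
           "fields": [{"name": "HotelId", "type": "Edm.String", "key": True}]}
updated = with_added_field(current, {"name": "Rating", "type": "Edm.Double"})
request = build_put_request("my-service", "<admin-key>", updated)
# Sending is left to the caller, for example urllib.request.urlopen(request).
```

Copying the retrieved definition before modifying it keeps every original element in the PUT body, which is what the full-schema requirement asks for.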
Because an existing index contains content, many index modifications require an index drop and rebuild. The following schema changes are an exception to this rule:
- Adding new fields
- Adding or changing scoring profiles
- Adding or changing semantic configurations
- Changing CORS options
- Changing existing fields with any of the following three modifications:
  - Show or hide fields ("retrievable": true | false)
  - Change the analyzer used at query time ("searchAnalyzer")
  - Add or edit the synonymMap used at query time ("synonymMaps")
To make any of the above schema changes to an existing index, specify the name of the index on the request URI, and then include a fully specified index definition with the new or changed elements.
When a new field is added, all existing documents in the index automatically have a null value for that field. No extra storage space is consumed until one of two things occurs: a value is provided for the new field (using merge), or new documents are added.
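As a rough pre-flight check before sending an update, you could compare the old and new field lists against the rules above. The sketch below is an assumption-laden illustration: the function names are hypothetical, it compares literal JSON values without resolving attribute defaults, it looks only at top-level fields, and the service remains the authority on whether an update is accepted.

```python
# Attributes that the rules above list as changeable on an existing field.
UPDATABLE_FIELD_ATTRIBUTES = {"retrievable", "searchAnalyzer", "synonymMaps"}


def changed_attributes(old_field, new_field):
    """Return the attribute names whose literal values differ."""
    keys = set(old_field) | set(new_field)
    return {k for k in keys if old_field.get(k) != new_field.get(k)}


def fields_possibly_needing_rebuild(old_fields, new_fields):
    """Names of existing fields that were removed or changed beyond the updatable set."""
    new_by_name = {f["name"]: f for f in new_fields}
    flagged = []
    for old in old_fields:
        new = new_by_name.get(old["name"])
        if new is None or changed_attributes(old, new) - UPDATABLE_FIELD_ATTRIBUTES:
            flagged.append(old["name"])
    return flagged


old = [
    {"name": "HotelId", "type": "Edm.String", "key": True},
    {"name": "Category", "type": "Edm.String", "retrievable": True, "filterable": True},
]
new = [
    old[0],
    {"name": "Category", "type": "Edm.String", "retrievable": False, "filterable": True},
    {"name": "Rating", "type": "Edm.Double"},  # a newly added field is allowed
]
flagged = fields_possibly_needing_rebuild(old, new)
```

Here `flagged` is empty because only "retrievable" changed and a new field was added. Changing "filterable" on "Category" instead would flag that field.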
Updates to a suggester have similar constraints: new fields can be added to a suggester at the same time fields are added, but existing fields can't be removed from or added to suggesters without an index rebuild.
Updates to an analyzer, a tokenizer, a token filter or a char filter aren't allowed. New ones can be created with the changes you want, but you must take the index offline when adding the new analyzer definitions. Setting the allowIndexDowntime flag to true in the index update request takes the index offline:
PUT https://[search service name].search.windows.net/indexes/[index name]?api-version=[api-version]&allowIndexDowntime=true
This operation takes your index offline for at least a few seconds, which means indexing and query requests fail until the index is back online and ready to handle requests.
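One way to decide when to add the flag is to compare the relevant sections of the old and new definitions. This is a hedged sketch: the helper names are hypothetical, the section list follows the allowIndexDowntime parameter description in this article, and an empty list is treated the same as a missing section.

```python
from urllib.parse import urlencode

# Sections that, per the allowIndexDowntime parameter description,
# call for allowIndexDowntime=true when they are added or modified.
DOWNTIME_SECTIONS = ("analyzers", "tokenizers", "tokenFilters",
                     "charFilters", "similarity")


def needs_downtime(old_def, new_def):
    """True if any downtime-related section differs between the definitions."""
    return any((old_def.get(s) or None) != (new_def.get(s) or None)
               for s in DOWNTIME_SECTIONS)


def update_url(service_name, index_name, api_version, allow_index_downtime=False):
    """Build the PUT URI, appending allowIndexDowntime=true only when requested."""
    query = {"api-version": api_version}
    if allow_index_downtime:
        query["allowIndexDowntime"] = "true"
    return (f"https://{service_name}.search.windows.net"
            f"/indexes/{index_name}?{urlencode(query)}")


old_def = {"name": "hotels", "fields": []}
new_def = {"name": "hotels", "fields": [],
           "analyzers": [{"name": "tagsAnalyzer"}]}
url = update_url("my-service", "hotels", "2023-07-01-Preview",
                 allow_index_downtime=needs_downtime(old_def, new_def))
```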
Parameter | Description |
---|---|
service name | Required. Set this value to the unique, user-defined name of your search service. |
index name | Required on the URI if using PUT. The name must be lower case, start with a letter or number, have no slashes or dots, and be fewer than 128 characters. Dashes can't be consecutive. |
api-version | Required. See API versions for more versions. |
allowIndexDowntime | Optional. False by default. Set to true for certain updates, such as adding or modifying an analyzer, tokenizer, token filter, char filter, or similarity property. The index is taken offline during the update, usually not more than several seconds. |
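The index name rules can be checked client-side before a request is sent. The following is a minimal sketch with a hypothetical function name. It combines the rules stated in this article (lowercase letters, digits, and dashes only; no leading, trailing, or consecutive dashes) and assumes a maximum length of 128 characters, so adjust `max_length` if your service enforces a stricter limit.

```python
import re

# Runs of lowercase letters or digits separated by single dashes. This rules
# out leading, trailing, and consecutive dashes, as well as slashes and dots.
_INDEX_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_index_name(name, max_length=128):
    """Return True if the name satisfies the naming rules sketched above."""
    return len(name) <= max_length and _INDEX_NAME.fullmatch(name) is not None
```

For example, `is_valid_index_name("hotels-2024")` is true, while "Hotels", "my--index", and "hotels.v2" are rejected.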
The following table describes the required and optional request headers.
Fields | Description |
---|---|
Content-Type | Required. Set this value to application/json |
api-key | Optional if you're using Azure roles and a bearer token is provided on the request. Otherwise, a key is required. An api-key is a unique, system-generated string that authenticates the request to your search service. Create requests that use key authentication must include an api-key header set to your admin key (as opposed to a query key). See Connect to Azure AI Search using key authentication for details. |
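The two authentication options in the table can be expressed as a small header builder. This is a sketch with a hypothetical function name. It assumes that a bearer token is sent in the standard `Authorization: Bearer` header and that a token, when present, takes precedence over a key.

```python
def build_headers(admin_key=None, bearer_token=None):
    """Return request headers for key-based or token-based authentication."""
    headers = {"Content-Type": "application/json"}
    if bearer_token:
        # Role-based access: the api-key header is optional in this case.
        headers["Authorization"] = f"Bearer {bearer_token}"
    elif admin_key:
        # Key-based access: index operations need an admin key, not a query key.
        headers["api-key"] = admin_key
    else:
        raise ValueError("Provide an admin api-key or a bearer token")
    return headers
```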
The body of the request contains a schema definition, which includes the list of data fields within documents that are fed into this index.
The following JSON is a high-level representation of a schema that supports vector search. A schema requires a key field, and that key field can be searchable, filterable, sortable, and facetable.
A vector search field is of type Collection(Edm.Single). Because vector fields aren't textual, a vector field can't be used as a key, and it doesn't accept analyzers, normalizers, suggesters, or synonyms. It must have a "dimensions" property and a "vectorSearchConfiguration" property.
A schema that supports vector search can also support keyword search. Other nonvector fields in the index can use any analyzers, synonyms, and scoring profiles that you include in your index.
{
"name": (optional on PUT; required on POST) "Name of the index",
"description": (optional) "Description of the index",
"fields": [
{
"name": "name_of_field",
"type": "Edm.String | Edm.Int32 | Edm.Int64 | Edm.Double | Edm.Boolean | Edm.DateTimeOffset | Edm.GeographyPoint | Edm.ComplexType | Collection(Edm.String) | Collection(Edm.Int32) | Collection(Edm.Int64) | Collection(Edm.Single) | Collection(Edm.Double) | Collection(Edm.Boolean) | Collection(Edm.DateTimeOffset) | Collection(Edm.GeographyPoint) | Collection(Edm.ComplexType)",
"key": true | false (default, only Edm.String fields can be keys, enable on one field only),
"searchable": true (default where applicable) | false (only Edm.String and Collection(Edm.String) fields can be searchable),
"filterable": true (default) | false,
"sortable": true (default where applicable) | false (Collection(Edm.String) fields cannot be sortable),
"facetable": true (default where applicable) | false (Edm.GeographyPoint fields cannot be facetable),
"retrievable": true (default) | false,
"analyzer": "name_of_analyzer_for_search_and_indexing", (only if 'searchAnalyzer' and 'indexAnalyzer' are not set)
"searchAnalyzer": "name_of_search_analyzer", (only if 'indexAnalyzer' is set and 'analyzer' is not set)
"indexAnalyzer": "name_of_indexing_analyzer", (only if 'searchAnalyzer' is set and 'analyzer' is not set)
"normalizer": "name_of_normalizer", (optional, applies only to filterable, facetable, or sortable Edm.String and Collection(Edm.String) fields.)
"synonymMaps": [ "name_of_synonym_map" ], (optional, only one synonym map per field is currently supported),
"fields" : [ ... ], (optional, a list of sub-fields if this is a field of type Edm.ComplexType or Collection(Edm.ComplexType). Must be null or empty for simple fields.)
"dimensions": 1234, (required for vector field definitions. Prohibited for non-vector fields. Integer specifying the dimensionality of the embeddings generated by a machine learning model)
"vectorSearchConfiguration": "name_of_algorithm_config" (required for vector field definitions. Prohibited for non-vector fields. This should reference an algorithm configuration.)
}
],
"similarity": (optional) { },
"suggesters": (optional) [ ... ],
"scoringProfiles": (optional) [ ... ],
"semantic": (optional) { },
"vectorSearch": (optional) {
"algorithmConfigurations": [
{
"name": "name_of_algorithm_config",
"kind": "hnsw" (algorithm type. Only "hnsw" is supported currently.),
"hnswParameters": {
"m": 4,
"efConstruction": 400,
"efSearch": 500,
"metric": "cosine"
}
}
]},
"normalizers":(optional) [ ... ],
"analyzers":(optional) [ ... ],
"charFilters":(optional) [ ... ],
"tokenizers":(optional) [ ... ],
"tokenFilters":(optional) [ ... ],
"defaultScoringProfile": (optional) "Name of a custom scoring profile to use as the default",
"corsOptions": (optional) { },
"encryptionKey":(optional) { }
}
The request contains the following properties:
Property | Description |
---|---|
name | Required. The name of the index. An index name must only contain lowercase letters, digits or dashes, can't start or end with dashes and is limited to 128 characters. |
description | An optional description. |
fields | A collection of fields for this index, where each field has a name, a supported data type that conforms to the Entity Data Model (EDM), and attributes that define allowable actions on that field. The fields collection must have one field of type Edm.String with "key" set to "true". This field represents the unique identifier, sometimes called the document ID, for each document stored with the index. The fields collection now accepts vector fields. |
similarity | Optional. For services created before July 15, 2020, set this property to opt in to the BM25 ranking algorithm. |
suggesters | Specifies a construct that stores prefixes for matching on partial queries like autocomplete and suggestions. |
scoringProfiles | Optional. Used for relevance tuning for full text queries. |
semantic | Optional. Defines the parameters of a search index that influence semantic search capabilities. A semantic configuration is required for semantic queries. For more information, see Create a semantic query. |
vectorSearch | Optional. Configures various vector search settings. Only vector search algorithms can be configured. |
normalizers | Optional. Normalizes the lexicographical ordering of strings, producing case-insensitive sorting and filtering output. |
analyzers, charFilters, tokenizers, tokenFilters | Optional. Specify these sections of the index if you're defining custom analyzers. By default, these sections are null. |
defaultScoringProfile | Name of a custom scoring profile that overwrites the default scoring behaviors. |
corsOptions | Optional. Used for cross-origin queries to your index. |
encryptionKey | Optional. Used for extra encryption of the index, through customer-managed encryption keys (CMK) in Azure Key Vault. Available for billable search services created on or after 2019-01-01. |
For a successful create request, you should see status code "201 Created". By default, the response body contains the JSON for the index definition that was created. However, if the Prefer request header is set to return=minimal, the response body is empty, and the success status code is "204 No Content" instead of "201 Created". This is true regardless of whether PUT or POST is used to create the index.
For a successful update request, you should see "204 No Content". By default, the response body is empty. However, if the Prefer request header is set to return=representation, the response body contains the JSON for the index definition that was updated. In this case, the success status code is "200 OK".
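The status code rules above reduce to a small lookup, which can be handy when writing client-side assertions. The function name is hypothetical, and the sketch covers only the success cases described in this section.

```python
def expected_success_status(is_create, prefer=None):
    """Map the operation and the Prefer header value to the expected status code."""
    if is_create:
        # 201 Created with a body, unless the caller asked for a minimal response.
        return 204 if prefer == "return=minimal" else 201
    # Updates return 204 No Content, unless a representation was requested.
    return 200 if prefer == "return=representation" else 204
```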
Example: Vector
Vector search is implemented at the field level. This definition puts the focus on vector fields. Vector fields must be of type Collection(Edm.Single), which stores single-precision floating-point values. Vector fields have a "dimensions" property that holds the number of output dimensions supported by the machine learning model used to generate embeddings. For example, if you're using text-embedding-ada-002, the maximum number of output dimensions is 1536. The "vectorSearchConfiguration" property is set to the name of an algorithm configuration defined in the "vectorSearch" section of your index. You can define multiple configurations in the index, and then specify one per field.
Many attributes apply only to nonvector fields. Attributes like "filterable", "sortable", "facetable", "analyzer", "normalizer", and "synonymMaps" are ignored for vector fields. Likewise, if you set vector-only properties like "dimensions" or "vectorSearchConfiguration" on a field containing alphanumeric content, those attributes are ignored.
{
"name": "{{index-name}}",
"fields": [
{
"name": "id",
"type": "Edm.String",
"key": true,
"searchable": true,
"retrievable": true,
"filterable": true
},
{
"name": "titleVector",
"type": "Collection(Edm.Single)",
"key": false,
"searchable": true,
"retrievable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"analyzer": "",
"searchAnalyzer": "",
"indexAnalyzer": "",
"normalizer": "",
"synonymMaps": "",
"dimensions": 1536,
"vectorSearchConfiguration": "my-vector-config"
},
{
"name": "contentVector",
"type": "Collection(Edm.Single)",
"key": false,
"searchable": true,
"retrievable": true,
"filterable": false,
"sortable": false,
"facetable": false,
"analyzer": "",
"searchAnalyzer": "",
"indexAnalyzer": "",
"normalizer": "",
"synonymMaps": "",
"dimensions": 1536,
"vectorSearchConfiguration": "my-vector-config"
}
],
"vectorSearch": {
"algorithmConfigurations": [
{
"name": "my-vector-config",
"kind": "hnsw",
"hnswParameters": {
"m": 4,
"efConstruction": 400,
"efSearch": 500,
"metric": "cosine"
}
}
]
}
}
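Before sending a definition like the one above, the vector field requirements can be checked locally. This is a minimal sketch, not the service's own validation: the function name is hypothetical, and it checks only the rules stated in this article (no vector key field, an integer "dimensions" value, and a "vectorSearchConfiguration" that names a defined algorithm configuration).

```python
VECTOR_TYPE = "Collection(Edm.Single)"


def vector_field_problems(index_def):
    """Return human-readable problems found in the vector field definitions."""
    vector_search = index_def.get("vectorSearch") or {}
    configs = {c["name"] for c in vector_search.get("algorithmConfigurations", [])}
    problems = []
    for field in index_def.get("fields", []):
        if field.get("type") != VECTOR_TYPE:
            continue
        name = field["name"]
        if field.get("key"):
            problems.append(f"{name}: a vector field can't be the key")
        if not isinstance(field.get("dimensions"), int):
            problems.append(f"{name}: missing integer 'dimensions'")
        config = field.get("vectorSearchConfiguration")
        if config not in configs:
            problems.append(f"{name}: unknown vectorSearchConfiguration {config!r}")
    return problems


index = {
    "name": "demo",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "titleVector", "type": VECTOR_TYPE, "dimensions": 1536,
         "vectorSearchConfiguration": "my-vector-config"},
        # Deliberately broken: no dimensions and an undefined configuration.
        {"name": "contentVector", "type": VECTOR_TYPE,
         "vectorSearchConfiguration": "missing-config"},
    ],
    "vectorSearch": {"algorithmConfigurations": [
        {"name": "my-vector-config", "kind": "hnsw"}]},
}
problems = vector_field_problems(index)
```

In this example, only "contentVector" is reported, once for the missing "dimensions" value and once for the undefined configuration name.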
Example: Field collections with vector and non-vector fields
Vector search is implemented at the field level. To support hybrid query scenarios, create pairs of fields for vector and nonvector queries. The "title", "titleVector", "content", "contentVector" fields follow this convention. If you also want to use semantic search, you must have nonvector text fields for those behaviors.
{
"name": "{{index-name}}",
"fields": [
{
"name": "id",
"type": "Edm.String",
"key": true,
"filterable": true
},
{
"name": "title",
"type": "Edm.String",
"searchable": true,
"retrievable": true
},
{
"name": "content",
"type": "Edm.String",
"searchable": true,
"retrievable": true
},
{
"name": "category",
"type": "Edm.String",
"filterable": true,
"searchable": true,
"retrievable": true
},
{
"name": "titleVector",
"type": "Collection(Edm.Single)",
"searchable": true,
"retrievable": true,
"dimensions": 1536,
"vectorSearchConfiguration": "my-vector-config"
},
{
"name": "contentVector",
"type": "Collection(Edm.Single)",
"searchable": true,
"retrievable": true,
"dimensions": 1536,
"vectorSearchConfiguration": "my-vector-config"
}
],
"corsOptions": {
"allowedOrigins": [
"*"
],
"maxAgeInSeconds": 60
},
"vectorSearch": {
"algorithmConfigurations": [
{
"name": "my-vector-config",
"kind": "hnsw",
"hnswParameters": {
"m": 4,
"efConstruction": 400,
"efSearch": 500,
"metric": "cosine"
}
}
]
},
"semantic": {
"configurations": [
{
"name": "my-semantic-config",
"prioritizedFields": {
"titleField": {
"fieldName": "title"
},
"prioritizedContentFields": [
{
"fieldName": "content"
}
],
"prioritizedKeywordsFields": [
{
"fieldName": "category"
}
]
}
}
]
}
}
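The text-and-vector pairing convention used in this example can be generated rather than typed by hand. The sketch below uses a hypothetical helper name and assumes the "<name>Vector" naming seen in the example. The attribute values mirror the JSON above.

```python
def text_and_vector_fields(name, dimensions, config_name):
    """Return a searchable text field plus its companion vector field."""
    text_field = {"name": name, "type": "Edm.String",
                  "searchable": True, "retrievable": True}
    vector_field = {"name": f"{name}Vector", "type": "Collection(Edm.Single)",
                    "searchable": True, "retrievable": True,
                    "dimensions": dimensions,
                    "vectorSearchConfiguration": config_name}
    return [text_field, vector_field]


fields = [{"name": "id", "type": "Edm.String", "key": True, "filterable": True}]
for base_name in ("title", "content"):
    fields.extend(text_and_vector_fields(base_name, 1536, "my-vector-config"))
```

The resulting list contains "id", "title", "titleVector", "content", and "contentVector", ready to be placed in the "fields" collection of an index definition.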
Example: An index schema with simple and complex fields
This example shows a complete index schema with simple and complex fields. Exactly one top-level Edm.String field must have "key" set to true.
{
"name": "hotels",
"fields": [
{ "name": "HotelId", "type": "Edm.String", "key": true, "filterable": true },
{ "name": "HotelName", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": true, "facetable": false },
{ "name": "Description", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false, "analyzer": "en.microsoft" },
{ "name": "Description_fr", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false, "analyzer": "fr.microsoft" },
{ "name": "Category", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true, "facetable": true },
{ "name": "Tags", "type": "Collection(Edm.String)", "searchable": true, "filterable": true, "sortable": false, "facetable": true, "analyzer": "tagsAnalyzer", "normalizer": "tagsNormalizer" },
{ "name": "ParkingIncluded", "type": "Edm.Boolean", "filterable": true, "sortable": true, "facetable": true },
{ "name": "LastRenovationDate", "type": "Edm.DateTimeOffset", "filterable": true, "sortable": true, "facetable": true },
{ "name": "Rating", "type": "Edm.Double", "filterable": true, "sortable": true, "facetable": true },
{ "name": "Address", "type": "Edm.ComplexType",
"fields": [
{ "name": "StreetAddress", "type": "Edm.String", "filterable": false, "sortable": false, "facetable": false, "searchable": true },
{ "name": "City", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true, "facetable": true, "normalizer": "lowercase" },
{ "name": "StateProvince", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true, "facetable": true },
{ "name": "PostalCode", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true, "facetable": true },
{ "name": "Country", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true, "facetable": true }
]
},
{ "name": "Location", "type": "Edm.GeographyPoint", "filterable": true, "sortable": true },
{ "name": "Rooms", "type": "Collection(Edm.ComplexType)",
"fields": [
{ "name": "Description", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false, "analyzer": "en.lucene" },
{ "name": "Description_fr", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false, "analyzer": "fr.lucene" },
{ "name": "Type", "type": "Edm.String", "searchable": true },
{ "name": "BaseRate", "type": "Edm.Double", "filterable": true, "facetable": true },
{ "name": "BedOptions", "type": "Edm.String", "searchable": true },
{ "name": "SleepsCount", "type": "Edm.Int32", "filterable": true, "facetable": true },
{ "name": "SmokingAllowed", "type": "Edm.Boolean", "filterable": true, "facetable": true },
{ "name": "Tags", "type": "Collection(Edm.String)", "searchable": true, "filterable": true, "facetable": true, "analyzer": "tagsAnalyzer", "normalizer": "tagsNormalizer" }
]
}
],
"suggesters": [ ],
"analyzers": [ ],
"normalizers": [ ],
"encryptionKey": null
}
Example: Suggesters
A suggester definition should specify "searchable" and "retrievable" string fields (in the REST APIs, all simple fields are "retrievable": true by default). After a suggester is defined, you can reference it by name on query requests that use either the Suggestions API or Autocomplete API, depending on whether you want to return a match or the remainder of a query term.
{
"name": "hotels",
"fields": [
{ "name": "HotelId", "type": "Edm.String", "key": true, "filterable": true },
{ "name": "HotelName", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": true, "facetable": false },
{ "name": "Description", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false, "analyzer": "en.microsoft" },
{ "name": "Description_fr", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false, "analyzer": "fr.microsoft" },
{ "name": "Category", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true, "facetable": true },
{ "name": "Tags", "type": "Collection(Edm.String)", "searchable": true, "filterable": true, "sortable": false, "facetable": true, "analyzer": "tagsAnalyzer", "normalizer": "tagsNormalizer" },
{ "name": "Rating", "type": "Edm.Double", "filterable": true, "sortable": true, "facetable": true }
],
"suggesters": [
{
"name": "sg",
"searchMode": "analyzingInfixMatching",
"sourceFields": ["HotelName", "Category", "Tags"]
}
]
}
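The guidance on suggester source fields can be turned into a local check. This is a simplified sketch with hypothetical names: it inspects top-level fields only (so a sub-field path would be reported as undefined), and it treats an omitted "searchable" or "retrievable" attribute as true, following the defaults described in this article.

```python
STRING_TYPES = {"Edm.String", "Collection(Edm.String)"}


def suggester_problems(index_def):
    """Report suggester source fields that aren't searchable, retrievable strings."""
    fields = {f["name"]: f for f in index_def.get("fields", [])}
    problems = []
    for suggester in index_def.get("suggesters") or []:
        for source in suggester.get("sourceFields", []):
            field = fields.get(source)
            if field is None:
                problems.append(f"{source}: not defined as a top-level field")
            elif field.get("type") not in STRING_TYPES:
                problems.append(f"{source}: not a string field")
            elif field.get("searchable", True) is False:
                problems.append(f"{source}: not searchable")
            elif field.get("retrievable", True) is False:
                problems.append(f"{source}: not retrievable")
    return problems


hotels = {
    "name": "hotels",
    "fields": [
        {"name": "HotelName", "type": "Edm.String", "searchable": True},
        {"name": "Rating", "type": "Edm.Double", "filterable": True},
    ],
    "suggesters": [{"name": "sg", "searchMode": "analyzingInfixMatching",
                    "sourceFields": ["HotelName", "Rating", "Tags"]}],
}
issues = suggester_problems(hotels)
```

Here "Rating" is reported because it isn't a string field, and "Tags" is reported because this trimmed-down definition doesn't declare it.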
Example: Analyzers and normalizers
Analyzers and normalizers are referenced on field definitions and can be either predefined or custom. If you're using custom analyzers or normalizers, specify them in the index in the "analyzers" and "normalizers" sections.
The following example illustrates custom analyzers and normalizers for "Tags". It also demonstrates a predefined normalizer (standard) and analyzer (en.microsoft) for "HotelName" and "Description", respectively.
{
"name": "hotels",
"fields": [
{ "name": "HotelId", "type": "Edm.String", "key": true, "filterable": true },
{ "name": "HotelName", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": true, "facetable": false, "normalizer": "standard" },
{ "name": "Description", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false, "analyzer": "en.microsoft"},
{ "name": "Description_fr", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false, "analyzer": "fr.microsoft" },
{ "name": "Category", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true, "facetable": true },
{ "name": "Tags", "type": "Collection(Edm.String)", "searchable": true, "filterable": true, "sortable": false, "facetable": true, "analyzer": "tagsAnalyzer", "normalizer": "tagsNormalizer" },
{ "name": "Rating", "type": "Edm.Double", "filterable": true, "sortable": true, "facetable": true }
],
"analyzers": [
{
"@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "tagsAnalyzer",
"charFilters": [ "html_strip" ],
"tokenizer": "standard_v2"
}
],
"normalizers": [
{
"@odata.type": "#Microsoft.Azure.Search.CustomNormalizer",
"name": "tagsNormalizer",
"tokenFilters": [ "asciifolding", "lowercase" ]
}
]
}
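Because a field can reference either a predefined or a custom analyzer or normalizer, it can help to list the names that aren't defined in the index itself and confirm that each one is a predefined name. The sketch below uses hypothetical function names and doesn't know the list of predefined names; it only separates custom references from everything else, including references on sub-fields.

```python
def iter_fields(fields):
    """Yield every field, descending into the sub-fields of complex types."""
    for field in fields or []:
        yield field
        yield from iter_fields(field.get("fields"))


def non_custom_references(index_def):
    """Names referenced on fields that the index doesn't define itself."""
    custom_analyzers = {a["name"] for a in index_def.get("analyzers") or []}
    custom_normalizers = {n["name"] for n in index_def.get("normalizers") or []}
    found = {"analyzer": set(), "normalizer": set()}
    for field in iter_fields(index_def.get("fields")):
        for attr in ("analyzer", "searchAnalyzer", "indexAnalyzer"):
            name = field.get(attr)
            if name and name not in custom_analyzers:
                found["analyzer"].add(name)
        name = field.get("normalizer")
        if name and name not in custom_normalizers:
            found["normalizer"].add(name)
    return found


hotels = {
    "fields": [
        {"name": "HotelName", "type": "Edm.String", "normalizer": "standard"},
        {"name": "Description", "type": "Edm.String", "analyzer": "en.microsoft"},
        {"name": "Tags", "type": "Collection(Edm.String)",
         "analyzer": "tagsAnalyzer", "normalizer": "tagsNormalizer"},
    ],
    "analyzers": [{"name": "tagsAnalyzer"}],
    "normalizers": [{"name": "tagsNormalizer"}],
}
references = non_custom_references(hotels)
```

For this definition, the remaining names are "en.microsoft" and "standard", which match the predefined analyzer and normalizer used in the example above.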
Example: Similarity for search relevance
This property sets the ranking algorithm used to create a relevance score in search results of a full text search query. In services created after July 15, 2020, this property is ignored because the similarity algorithm is always BM25. For existing services created before July 15, 2020, you can opt in to BM25 by setting this construct as follows:
"similarity": {
"@odata.type": "#Microsoft.Azure.Search.BM25Similarity"
}
Example: CORS Options
Client-side JavaScript can't call any APIs by default since the browser prevents all cross-origin requests. To allow cross-origin queries to your index, enable CORS (cross-origin resource sharing) by setting the "corsOptions" attribute. For security reasons, only query APIs support CORS.
{
"name": "hotels",
"fields": [ omitted for brevity ],
"suggesters": [ omitted for brevity ],
"analyzers": [ omitted for brevity ],
"corsOptions": (optional) {
"allowedOrigins": ["*"] | ["https://docs.microsoft.com:80", "https://azure.microsoft.com:80", ...],
"maxAgeInSeconds": (optional) max_age_in_seconds (non-negative integer)
}
}
Example: Encryption keys with access credentials
Encryption keys are customer-managed keys used for extra encryption. For more information, see Encryption using customer-managed keys in Azure Key Vault.
{
"name": "hotels",
"fields": [ omitted for brevity ],
"suggesters": [ omitted for brevity ],
"analyzers": [ omitted for brevity ],
"encryptionKey": (optional) {
"keyVaultKeyName": "Name of the Azure Key Vault key used for encryption",
"keyVaultKeyVersion": "Version of the Azure Key Vault key",
"keyVaultUri": "URI of Azure Key Vault, also referred to as DNS name, that provides the key. An example URI might be https://my-keyvault-name.vault.azure.net",
"accessCredentials": (optional, only if not using managed system identity) {
"applicationId": "AAD Application ID that was granted access permissions to your specified Azure Key Vault",
"applicationSecret": "Authentication key of the specified AAD application"
}
}
}
Example: Encryption keys with managed identity
You can authenticate to Azure Key Vault using a system-assigned or user-assigned (preview) managed identity. In this case, omit access credentials, or set to null. The following example shows a user-assigned managed identity. To use a system-assigned managed identity, omit access credentials and identity. As long as the system identity of your search service has permissions in Azure Key Vault, the connection request should succeed.
{
"name": "hotels",
"fields": [ omitted for brevity ],
"suggesters": [ omitted for brevity ],
"analyzers": [ omitted for brevity ],
"encryptionKey": (optional) {
"keyVaultKeyName": "Name of the Azure Key Vault key used for encryption",
"keyVaultKeyVersion": "Version of the Azure Key Vault key",
"keyVaultUri": "URI of Azure Key Vault, also referred to as DNS name, that provides the key. An example URI might be https://my-keyvault-name.vault.azure.net",
"accessCredentials": null,
"identity" : {
"@odata.type": "#Microsoft.Azure.Search.DataUserAssignedIdentity",
"userAssignedIdentity" : "/subscriptions/[subscription ID]/resourceGroups/[resource group name]/providers/Microsoft.ManagedIdentity/userAssignedIdentities/[managed identity name]"
}
}
}
Example: Scoring Profiles
A scoring profile is a section of the schema that defines custom scoring behaviors that let you influence which documents appear higher in the search results. Scoring profiles are made up of field weights and functions. To use them, you specify a profile by name on the query string. For more information, see Add scoring profiles to a search index (Azure AI Search REST API).
{
"name": "hotels",
"fields": [ omitted for brevity ],
"suggesters": [ omitted for brevity ],
"analyzers": [ omitted for brevity ],
"scoringProfiles": [
{
"name": "name of scoring profile",
"text": (optional, only applies to searchable fields) {
"weights": {
"searchable_field_name": relative_weight_value (positive #'s),
...
}
},
"functions": (optional) [
{
"type": "magnitude | freshness | distance | tag",
"boost": # (positive number used as multiplier for raw score != 1),
"fieldName": "...",
"interpolation": "constant | linear (default) | quadratic | logarithmic",
"magnitude": {
"boostingRangeStart": #,
"boostingRangeEnd": #,
"constantBoostBeyondRange": true | false (default)
},
"freshness": {
"boostingDuration": "..." (value representing timespan leading to now over which boosting occurs)
},
"distance": {
"referencePointParameter": "...", (parameter to be passed in queries to use as reference location)
"boostingDistance": # (the distance in kilometers from the reference location where the boosting range ends)
},
"tag": {
"tagsParameter": "..." (parameter to be passed in queries to specify a list of tags to compare against target fields)
}
}
],
"functionAggregation": (optional, applies only when functions are specified)
"sum (default) | average | minimum | maximum | firstMatching"
}
]
}
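A concrete profile that follows the template above can be assembled programmatically. This sketch is illustrative only: the helper names and the profile name "boost-rating" are hypothetical, only the magnitude function type is covered, and the checks reflect the constraints noted in the template (positive weights, and a positive boost other than 1).

```python
def magnitude_function(field_name, boost, range_start, range_end,
                       interpolation="linear"):
    """Build a magnitude scoring function for a numeric field."""
    if boost <= 0 or boost == 1:
        raise ValueError("boost must be a positive number other than 1")
    return {
        "type": "magnitude",
        "boost": boost,
        "fieldName": field_name,
        "interpolation": interpolation,
        "magnitude": {"boostingRangeStart": range_start,
                      "boostingRangeEnd": range_end,
                      "constantBoostBeyondRange": False},
    }


def scoring_profile(name, weights=None, functions=None, function_aggregation="sum"):
    """Build a scoring profile from text weights and scoring functions."""
    profile = {"name": name}
    if weights:
        if any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be positive")
        profile["text"] = {"weights": dict(weights)}
    if functions:
        profile["functions"] = list(functions)
        # functionAggregation applies only when functions are specified.
        profile["functionAggregation"] = function_aggregation
    return profile


profile = scoring_profile(
    "boost-rating",
    weights={"HotelName": 2, "Description": 1},
    functions=[magnitude_function("Rating", 2, 1, 5)],
)
```

The resulting dictionary can be appended to the "scoringProfiles" array of an index definition.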
Example: Semantic Configurations
A semantic configuration is a part of an index definition that's used to configure which fields are utilized by semantic search for ranking, captions, highlights, and answers. To use semantic search, you must specify the name of a semantic configuration at query time. For more information, see Create a semantic query.
{
"name": "hotels",
"fields": [ omitted for brevity ],
"suggesters": [ omitted for brevity ],
"analyzers": [ omitted for brevity ],
"semantic": {
"configurations": [
{
"name": "my-semantic-config",
"prioritizedFields": {
"titleField": {
"fieldName": "hotelName"
},
"prioritizedContentFields": [
{
"fieldName": "description"
},
{
"fieldName": "description_fr"
}
],
"prioritizedKeywordsFields": [
{
"fieldName": "tags"
},
{
"fieldName": "category"
}
]
}
}
]
}
}
Link | Description |
---|---|
corsOptions | Lists the domains or origins that are granted access to your index. |
defaultScoringProfile | Name of a custom scoring profile that overwrites the default scoring behaviors. |
encryptionKey | Configures a connection to Azure Key Vault for customer-managed encryption. |
fields | Sets definitions and attributes of a field in a search index. |
normalizers | Configures a custom normalizer. Normalizes the lexicographical ordering of strings, producing case-insensitive sorting, faceting, and filtering output. |
semantic | Configures fields used by semantic search for ranking, captions, highlights, and answers. |
scoringProfiles | Used for relevance tuning for full text queries. |
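similarity | Sets the ranking algorithm used to score full text queries. For services created before July 15, 2020, set this property to opt in to BM25. |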
suggesters | Configures internal prefix storage for matching on partial queries like autocomplete and suggestions. |
vectorSearch | Configures the algorithm used for vector fields. |
corsOptions
Client-side JavaScript can't call any APIs by default since the browser prevents all cross-origin requests. To allow cross-origin queries to your index, enable CORS (Cross-Origin Resource Sharing) by setting the "corsOptions" attribute. For security reasons, only query APIs support CORS.
Attribute | Description |
---|---|
allowedOrigins | Required. A comma-delimited list of origins that are granted access to your index, where each origin is typically of the form protocol://<fully-qualified-domain-name>:<port> (although the <port> is often omitted). This means that any JavaScript code served from those origins is allowed to query your index (assuming it provides a valid API key). If you want to allow access to all origins, specify * as a single item in the "allowedOrigins" array. This isn't recommended for production, but might be useful for development or debugging. |
maxAgeInSeconds | Optional. Browsers use this value to determine the duration (in seconds) to cache CORS preflight responses. This must be a non-negative integer. Performance improves if this value is larger, but those gains are offset by the amount of time required for CORS policy changes to take effect. If it isn't set, a default duration of 5 minutes is used. |
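The constraints in this table can be enforced when building the "corsOptions" object. The function below is a hypothetical helper, sketched under the assumption that a wildcard mixed with other origins should be rejected client-side. The service may apply further validation.

```python
def cors_options(allowed_origins, max_age_in_seconds=None):
    """Build a corsOptions object, checking the constraints in the table above."""
    if isinstance(allowed_origins, str):
        allowed_origins = [allowed_origins]  # accept a single origin string
    origins = list(allowed_origins)
    if not origins:
        raise ValueError("allowedOrigins is required")
    if "*" in origins and len(origins) > 1:
        raise ValueError('"*" must be the only item when allowing all origins')
    options = {"allowedOrigins": origins}
    if max_age_in_seconds is not None:
        is_int = isinstance(max_age_in_seconds, int) and not isinstance(max_age_in_seconds, bool)
        if not is_int or max_age_in_seconds < 0:
            raise ValueError("maxAgeInSeconds must be a non-negative integer")
        options["maxAgeInSeconds"] = max_age_in_seconds
    return options
```

For example, `cors_options(["*"], 60)` reproduces the development-style setting used in the hybrid example earlier in this article, while a production index would list specific origins instead.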
defaultScoringProfile
Optional. A string that is the name of a custom scoring profile defined in the index. A default profile is invoked whenever a custom profile isn't explicitly specified on the query string. For more information, see Add scoring profiles to a search index.
encryptionKey
Configures a connection to Azure Key Vault for supplemental customer-managed encryption keys (CMK). Available for billable search services created on or after January 1, 2019.
A connection to the key vault must be authenticated. You can use either "accessCredentials" or a managed identity for this purpose.
Managed identities can be system or user-assigned (preview). If the search service has both a system-assigned managed identity and a role assignment that grants read access to the key vault, you can omit both "identity" and "accessCredentials", and the request will authenticate using the managed identity. If the search service has user-assigned identity and role assignment, set the "identity" property to the resource ID of that identity.
Attribute | Description |
---|---|
keyVaultKeyName | Required. Name of the Azure Key Vault key used for encryption. |
keyVaultKeyVersion | Required. Version of the Azure Key Vault key. |
keyVaultUri | Required. URI of Azure Key Vault (also referred to as DNS name) that provides the key. An example URI might be https://my-keyvault-name.vault.azure.net |
accessCredentials | Optional. Omit this property if you're using a managed identity. Otherwise, the properties of "accessCredentials" are "applicationId" (an Azure Active Directory application ID that has access permissions to your specified Azure Key Vault) and "applicationSecret" (the authentication key of the specified Azure AD application). |
identity | Optional unless you're using a user-assigned managed identity for the search service connection to Azure Key Vault. The format is "/subscriptions/[subscription ID]/resourceGroups/[resource group name]/providers/Microsoft.ManagedIdentity/userAssignedIdentities/[managed identity name]" . |
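The attributes above can be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper name build_encryption_key and the sample key name and version are hypothetical, and treating "accessCredentials" and "identity" as mutually exclusive is an assumption based on the guidance to omit "accessCredentials" when a managed identity is used. The exact JSON shape of "identity" isn't shown in this section, so the sketch stores the resource ID string as the table describes.

```python
import json


def build_encryption_key(key_name, key_version, vault_uri,
                         application_id=None, application_secret=None,
                         identity_resource_id=None):
    """Assemble an "encryptionKey" object from the attributes described above."""
    encryption_key = {
        "keyVaultKeyName": key_name,
        "keyVaultKeyVersion": key_version,
        "keyVaultUri": vault_uri,
    }
    uses_credentials = application_id is not None or application_secret is not None
    if uses_credentials and identity_resource_id is not None:
        # Assumption: the two authentication options are mutually exclusive.
        raise ValueError("Specify accessCredentials or identity, not both.")
    if uses_credentials:
        if not (application_id and application_secret):
            raise ValueError("accessCredentials needs applicationId and applicationSecret.")
        encryption_key["accessCredentials"] = {
            "applicationId": application_id,
            "applicationSecret": application_secret,
        }
    elif identity_resource_id is not None:
        # Assumption: the resource ID string is stored as-is; the exact JSON
        # shape of "identity" is not shown in this section.
        encryption_key["identity"] = identity_resource_id
    return encryption_key


# System-assigned managed identity: both optional properties are omitted.
system_assigned = build_encryption_key(
    "my-key", "my-key-version", "https://my-keyvault-name.vault.azure.net")
print(json.dumps(system_assigned, indent=2))
```

Calling the helper with application_id and application_secret instead produces the "accessCredentials" variant.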
Contains information about attributes on a field definition.
Attribute | Description |
---|---|
name | Required. Sets the name of the field, which must be unique within the fields collection of the index or parent field. |
type | Required. Sets the data type for the field. Fields can be simple or complex. Simple fields are of primitive types, like Edm.String for text or Edm.Int32 for integers. Complex fields can have subfields that are themselves either simple or complex. This allows you to model objects and arrays of objects, which in turn enables you to upload most JSON object structures to your index. Collection(Edm.Single) accommodates single-precision floating point values. It's used only for vector fields, and it's required. See Supported data types for the complete list of supported types. |
key | Required. Set this attribute to true to designate that a field's values uniquely identify documents in the index. The maximum length of values in a key field is 1024 characters. Exactly one top-level field in each index must be chosen as the key field, and it must be of type Edm.String. Default is false for simple fields and null for complex fields. Key fields can be used to look up documents directly and update or delete specific documents. The values of key fields are handled in a case-sensitive manner when looking up or indexing documents. See Lookup Document and Add, Update or Delete Documents for details. |
retrievable | Indicates whether the field can be returned in a search result. Set this attribute to false if you want to use a field (for example, margin) as a filter, sorting, or scoring mechanism but don't want the field to be visible to the end user. This attribute must be true for key fields, and it must be null for complex fields. This attribute can be changed on existing fields. Setting retrievable to true doesn't cause any increase in index storage requirements. Default is true for simple fields and null for complex fields. |
searchable | Indicates whether the field is full-text searchable and can be referenced in search queries. This means it undergoes lexical analysis such as word-breaking during indexing. If you set a searchable field to a value like "Sunny day", internally it's normalized into the individual tokens "sunny" and "day". This enables full-text searches for these terms. Fields of type Edm.String or Collection(Edm.String) are searchable by default. This attribute must be false for simple fields of other nonstring data types, and it must be null for complex fields. A searchable field consumes extra space in your index since Azure AI Search processes the contents of those fields and organizes them in auxiliary data structures for performant searching. If you want to save space in your index and you don't need a field to be included in searches, set searchable to false. See How full-text search works in Azure AI Search for details. |
filterable | Indicates whether to enable the field to be referenced in $filter queries. Filterable differs from searchable in how strings are handled. Fields of type Edm.String or Collection(Edm.String) that are filterable don't undergo lexical analysis, so comparisons are for exact matches only. For example, if you set such a field f to "Sunny day", $filter=f eq 'sunny' finds no matches, but $filter=f eq 'Sunny day' will. This attribute must be null for complex fields. Default is true for simple fields and null for complex fields. To reduce index size, set this attribute to false on fields that you won't be filtering on. |
sortable | Indicates whether to enable the field to be referenced in $orderby expressions. By default Azure AI Search sorts results by score, but in many experiences users want to sort by fields in the documents. A simple field can be sortable only if it's single-valued (it has a single value in the scope of the parent document). Simple collection fields can't be sortable, since they're multi-valued. Simple subfields of complex collections are also multi-valued, and therefore can't be sortable. This is true whether it's an immediate parent field, or an ancestor field, that's the complex collection. Complex fields can't be sortable and the sortable attribute must be null for such fields. The default for sortable is true for single-valued simple fields, false for multi-valued simple fields, and null for complex fields. |
facetable | Indicates whether to enable the field to be referenced in facet queries. Typically used in a presentation of search results that includes hit count by category (for example, search for digital cameras and see hits by brand, by megapixels, by price, and so on). This attribute must be null for complex fields. Fields of type Edm.GeographyPoint or Collection(Edm.GeographyPoint) can't be facetable. Default is true for all other simple fields. To reduce index size, set this attribute to false on fields that you won't be faceting on. |
analyzer | Sets the lexical analyzer for tokenizing strings during indexing and query operations. Valid values for this property include language analyzers, built-in analyzers, and custom analyzers. The default is standard.lucene. This attribute can only be used with searchable fields, and it can't be set together with either searchAnalyzer or indexAnalyzer. Once the analyzer is chosen and the field is created in the index, it can't be changed for the field. Must be null for complex fields. |
searchAnalyzer | Set this property together with indexAnalyzer to specify different lexical analyzers for indexing and queries. If you use this property, set analyzer to null and make sure indexAnalyzer is set to an allowed value. Valid values for this property include built-in analyzers and custom analyzers. This attribute can be used only with searchable fields. The search analyzer can be updated on an existing field since it's only used at query-time. Must be null for complex fields. |
indexAnalyzer | Set this property together with searchAnalyzer to specify different lexical analyzers for indexing and queries. If you use this property, set analyzer to null and make sure searchAnalyzer is set to an allowed value. Valid values for this property include built-in analyzers and custom analyzers. This attribute can be used only with searchable fields. Once the index analyzer is chosen, it can't be changed for the field. Must be null for complex fields. |
normalizer | Sets the normalizer for filtering, sorting, and faceting operations. It can be the name of a predefined normalizer or a custom normalizer defined within the index. The default is null, which results in an exact match on verbatim, unanalyzed text. This attribute can be used only with Edm.String and Collection(Edm.String) fields that have at least one of filterable, sortable, or facetable set to true. A normalizer can be set on a field only when the field is added to the index, and it can't be changed later. Must be null for complex fields. Valid values for a predefined normalizer are: standard (lowercases the text, then applies asciifolding); lowercase (transforms characters to lowercase); uppercase (transforms characters to uppercase); asciifolding (transforms characters that aren't in the Basic Latin Unicode block to their ASCII equivalent, if one exists, for example changing "à" to "a"); elision (removes elisions from the beginning of tokens). |
synonymMaps | A list of the names of synonym maps to associate with this field. This attribute can be used only with searchable fields. Currently only one synonym map per field is supported. Assigning a synonym map to a field ensures that query terms targeting that field are expanded at query-time using the rules in the synonym map. This attribute can be changed on existing fields. Must be null or an empty collection for complex fields. |
fields | A list of subfields if this is a field of type Edm.ComplexType or Collection(Edm.ComplexType). Must be null or empty for simple fields. See How to model complex data types in Azure AI Search for more information on how and when to use subfields. |
dimensions | Integer. Required for vector fields. This value must match the output embedding size of your embedding model. For example, the Azure OpenAI model text-embedding-ada-002 produces embeddings with 1536 dimensions, so a vector field that stores its output sets dimensions to 1536. The value must be between 2 and 2048; that is, each vector holds from 2 to 2048 floating-point values. |
vectorSearchConfiguration | Required for vector field definitions. Specifies the name of the "vectorSearch" algorithm configuration used by the vector field. Once the field is created, you can't change the name of the vectorSearchConfiguration, but you can change the properties of the algorithm configuration in the index. This allows for adjustments to the algorithm type and parameters. |
Note
Fields of type Edm.String that are filterable, sortable, or facetable can be at most 32 kilobytes in length. This is because values of such fields are treated as a single search term, and the maximum length of a term in Azure AI Search is 32 kilobytes. If you need to store more text than this in a single string field, you will need to explicitly set filterable, sortable, and facetable to false in your index definition.
Setting a field as searchable, filterable, sortable, or facetable has an impact on index size and query performance. Don't set those attributes on fields that are not meant to be referenced in query expressions.
If a field is not set to be searchable, filterable, sortable, or facetable, the field can't be referenced in any query expression. This is useful for fields that are not used in queries, but are needed in search results.
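Several of the field rules above lend themselves to a pre-flight check before an index definition is sent. The following Python sketch is illustrative only: the function validate_fields and the field names (HotelId, Description, DescriptionVector) are hypothetical, and it covers just a handful of the documented constraints (unique names, exactly one retrievable Edm.String key, and the vector field requirements). It is not a substitute for the service's own validation.

```python
def validate_fields(fields):
    """Return a list of problems found in a top-level fields collection."""
    problems = []
    names = [f.get("name") for f in fields]
    if len(names) != len(set(names)):
        problems.append("field names must be unique within the collection")

    keys = [f for f in fields if f.get("key") is True]
    if len(keys) != 1:
        problems.append("exactly one top-level key field is required")
    else:
        if keys[0].get("type") != "Edm.String":
            problems.append("the key field must be of type Edm.String")
        # retrievable defaults to true for simple fields when omitted.
        if keys[0].get("retrievable", True) is not True:
            problems.append("retrievable must be true for the key field")

    for f in fields:
        if f.get("type") != "Collection(Edm.Single)":
            continue
        # Vector fields need both dimensions and vectorSearchConfiguration.
        dims = f.get("dimensions")
        if not isinstance(dims, int) or not 2 <= dims <= 2048:
            problems.append(f"{f.get('name')}: dimensions must be an integer from 2 to 2048")
        if not f.get("vectorSearchConfiguration"):
            problems.append(f"{f.get('name')}: vectorSearchConfiguration is required")
    return problems


# Hypothetical fields collection; the names are placeholders.
example_fields = [
    {"name": "HotelId", "type": "Edm.String", "key": True, "filterable": True},
    {"name": "Description", "type": "Edm.String", "searchable": True,
     "filterable": False, "sortable": False, "facetable": False},
    {"name": "DescriptionVector", "type": "Collection(Edm.Single)",
     "dimensions": 1536, "vectorSearchConfiguration": "my-vector-config"},
]
print(validate_fields(example_fields))
```

The Description field turns off filterable, sortable, and facetable, which is the setting the note above recommends for string values that may exceed 32 kilobytes.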
Defines a custom normalizer that has a user-defined combination of character filters and token filters. After defining a custom normalizer in the index, you can specify it by name on a field definition.
Attribute | Description |
---|---|
name | Required. String that specifies the name of a user-defined custom normalizer. |
charFilters | Used in a custom normalizer. It can be one or more of the available character filters supported for use in a custom normalizer: mapping, pattern_replace. |
tokenFilters | Used in a custom normalizer. It can be one or more of the available token filters supported for use in a custom normalizer: arabic_normalization, asciifolding, cjk_width, elision, german_normalization, hindi_normalization, indic_normalization, persian_normalization, scandinavian_normalization, scandinavian_folding, sorani_normalization, lowercase, uppercase. |
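As a rough illustration of the table above, the following Python sketch builds a custom normalizer definition and rejects filter names that aren't in the supported lists. The helper name build_custom_normalizer and the normalizer name my_normalizer are hypothetical. The sketch emits only the three attributes documented here; a real request may need additional properties that this table doesn't cover.

```python
ALLOWED_CHAR_FILTERS = {"mapping", "pattern_replace"}
ALLOWED_TOKEN_FILTERS = {
    "arabic_normalization", "asciifolding", "cjk_width", "elision",
    "german_normalization", "hindi_normalization", "indic_normalization",
    "persian_normalization", "scandinavian_normalization",
    "scandinavian_folding", "sorani_normalization", "lowercase", "uppercase",
}


def build_custom_normalizer(name, char_filters=(), token_filters=()):
    """Build a custom normalizer from the attributes listed in the table above."""
    unknown_char = set(char_filters) - ALLOWED_CHAR_FILTERS
    unknown_token = set(token_filters) - ALLOWED_TOKEN_FILTERS
    if unknown_char or unknown_token:
        raise ValueError(f"unsupported filters: {sorted(unknown_char | unknown_token)}")
    return {
        "name": name,
        "charFilters": list(char_filters),
        "tokenFilters": list(token_filters),
    }


# Hypothetical normalizer that lowercases text and then folds it to ASCII.
normalizer = build_custom_normalizer(
    "my_normalizer", token_filters=["lowercase", "asciifolding"])
print(normalizer)
```

A field would then reference this definition by setting its normalizer attribute to the same name.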
Scoring profiles apply to full text search. A profile is defined in an index and specifies custom logic that can award higher search scores to matching documents that meet the criteria defined in the profile. You can create multiple scoring profiles, and then assign the one you want to a query.
If you create a custom profile, you can make it the default by setting defaultScoringProfile. For more information, see Add scoring profiles to a search index.
A semantic configuration is a part of an index definition that's used to configure which fields are utilized by semantic search for ranking, captions, highlights, and answers. Semantic configurations are made up of a title field, prioritized content fields, and prioritized keyword fields. At least one of the three subproperties (titleField, prioritizedKeywordsFields, and prioritizedContentFields) must specify a field. Any field of type Edm.String or Collection(Edm.String) can be used as part of a semantic configuration.
To use semantic search, you must specify the name of a semantic configuration at query time. For more information, see Create a semantic query.
{
  "name": "hotels",
  "fields": [ omitted for brevity ],
  "suggesters": [ omitted for brevity ],
  "analyzers": [ omitted for brevity ],
  "semantic": {
    "configurations": [
      {
        "name": "name of the semantic configuration",
        "prioritizedFields": {
          "titleField": {
            "fieldName": "..."
          },
          "prioritizedContentFields": [
            {
              "fieldName": "..."
            },
            {
              "fieldName": "..."
            }
          ],
          "prioritizedKeywordsFields": [
            {
              "fieldName": "..."
            },
            {
              "fieldName": "..."
            }
          ]
        }
      }
    ]
  }
}
Attribute | Description |
---|---|
name | Required. The name of the semantic configuration. |
prioritizedFields | Required. Describes the title, content, and keyword fields to be used for semantic ranking, captions, highlights, and answers. At least one of the three subproperties (titleField, prioritizedKeywordsFields, and prioritizedContentFields) must be set. |
prioritizedFields.titleField | Defines the title field to be used for semantic ranking, captions, highlights, and answers. If you don't have a title field in your index, leave this blank. |
prioritizedFields.prioritizedContentFields | Defines the content fields to be used for semantic ranking, captions, highlights, and answers. For the best result, the selected fields should contain text in natural language form. The order of the fields in the array represents their priority. Fields with lower priority may get truncated if the content is long. |
prioritizedFields.prioritizedKeywordsFields | Defines the keyword fields to be used for semantic ranking, captions, highlights, and answers. For the best result, the selected fields should contain a list of keywords. The order of the fields in the array represents their priority. Fields with lower priority may get truncated if the content is long. |
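The structure in the table above can be generated from plain field names. The following Python sketch is a hedged illustration: build_semantic_configuration and the field names it is called with are hypothetical, and the check enforces only the rule stated in the table that at least one of the three subproperties is set. List order is preserved because the order of fields represents their priority.

```python
def build_semantic_configuration(name, title_field=None,
                                 content_fields=(), keyword_fields=()):
    """Build one entry for the "configurations" array of the "semantic" object."""
    if not (title_field or content_fields or keyword_fields):
        raise ValueError("set at least one of titleField, "
                         "prioritizedContentFields, or prioritizedKeywordsFields")
    prioritized = {}
    if title_field:
        prioritized["titleField"] = {"fieldName": title_field}
    if content_fields:
        # Earlier entries have higher priority.
        prioritized["prioritizedContentFields"] = [
            {"fieldName": field} for field in content_fields]
    if keyword_fields:
        prioritized["prioritizedKeywordsFields"] = [
            {"fieldName": field} for field in keyword_fields]
    return {"name": name, "prioritizedFields": prioritized}


# Hypothetical configuration and field names.
config = build_semantic_configuration(
    "my-semantic-config",
    title_field="HotelName",
    content_fields=["Description"],
    keyword_fields=["Tags"],
)
print(config)
```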
Optional property that applies to services created before July 15, 2020. For those services, you can set this property to use the BM25 ranking algorithm that was introduced in July 2020. Valid values include "#Microsoft.Azure.Search.ClassicSimilarity" (the previous default) or "#Microsoft.Azure.Search.BM25Similarity".
For all services created after July 2020, setting this property has no effect. All newer services use BM25 as the sole ranking algorithm for full text search. For more information, see Ranking algorithms in Azure AI Search.
Specifies a construct that stores prefixes for matching on partial queries like autocomplete and suggestions.
Attribute | Description |
---|---|
name | Required. The name of the suggester. |
sourceFields | Required. One or more string fields for which you're enabling autocomplete or suggested results. |
searchMode | Required, and always set to analyzingInfixMatching. It specifies that matching occurs on any term in the query string. |
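Because searchMode has a single allowed value, a suggester definition reduces to a name and a list of source fields. The following Python sketch shows one way to build it; the helper name build_suggester, the suggester name sg, and the field names are hypothetical.

```python
def build_suggester(name, source_fields):
    """Build a suggester definition from the attributes in the table above."""
    if not source_fields:
        raise ValueError("sourceFields needs at least one string field")
    return {
        "name": name,
        "sourceFields": list(source_fields),
        # The only value the table allows for searchMode.
        "searchMode": "analyzingInfixMatching",
    }


# Hypothetical suggester over two string fields.
suggester = build_suggester("sg", ["HotelName", "Category"])
print(suggester)
```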
The vectorSearch object allows configuration of vector search properties. Currently, only algorithm configurations can be set, which control the algorithm type and algorithm parameters used for vector fields. You can have multiple configurations. A configuration that is referenced by a vector field can't be modified or deleted. A configuration that isn't referenced can be modified or deleted. A vector field definition (in the fields collection) must specify, through its vectorSearchConfiguration property, which vector search algorithm configuration the field uses.
"vectorSearch": {
  "algorithmConfigurations": [
    {
      "name": "my-vector-config",
      "kind": "hnsw",
      "hnswParameters": {
        "m": 4,
        "efConstruction": 400,
        "efSearch": 500,
        "metric": "cosine"
      }
    }
  ]
}
Attribute | Description |
---|---|
name | Required. The name of the algorithm configuration. |
kind | The algorithm type to use. Only "hnsw" is supported, which is the Hierarchical Navigable Small World (HNSW) algorithm. |
hnswParameters | Optional. Parameters for "hnsw" algorithm. If this object is omitted, default values are used. |
This object contains the customizations to hnsw algorithm parameters. All properties are optional and default values are used if any are omitted.
Attribute | Description |
---|---|
metric | String. The similarity metric to use for vector comparisons. For hnsw, the allowed values are "cosine", "euclidean", and "dotProduct". The default value is "cosine". |
m | Integer. The number of bi-directional links created for every new element during construction. The default is 4. The allowable range is 4 to 10. Larger values lead to denser graphs, improving query performance, but require more memory and computation. |
efConstruction | Integer. The size of the dynamic list for the nearest neighbors used during indexing. The default is 400. The allowable range is 100 to 1000. Larger values lead to a better index quality, but require more memory and computation. |
efSearch | Integer. The size of the dynamic list containing the nearest neighbors, which is used during search time. The default is 500. The allowable range is 100 to 1000. Increasing this parameter may improve search results, but it slows down query performance. |
Since efSearch is a query-time parameter, this value can be updated even if an existing field is using an algorithm configuration.
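The defaults and allowable ranges in the hnswParameters table can be checked before a configuration is submitted. The following Python sketch applies the documented defaults and validates overrides against the documented ranges; the function name build_hnsw_configuration is hypothetical, and the sketch is only a client-side check, not a description of how the service validates requests.

```python
HNSW_DEFAULTS = {"m": 4, "efConstruction": 400, "efSearch": 500, "metric": "cosine"}
HNSW_RANGES = {"m": (4, 10), "efConstruction": (100, 1000), "efSearch": (100, 1000)}
HNSW_METRICS = {"cosine", "euclidean", "dotProduct"}


def build_hnsw_configuration(name, **overrides):
    """Build an algorithm configuration, filling omitted parameters with defaults."""
    unknown = set(overrides) - set(HNSW_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown hnsw parameters: {sorted(unknown)}")
    params = {**HNSW_DEFAULTS, **overrides}
    for key, (low, high) in HNSW_RANGES.items():
        if not low <= params[key] <= high:
            raise ValueError(f"{key} must be between {low} and {high}")
    if params["metric"] not in HNSW_METRICS:
        raise ValueError(f"metric must be one of {sorted(HNSW_METRICS)}")
    return {"name": name, "kind": "hnsw", "hnswParameters": params}


# With no overrides, the result matches the sample configuration shown earlier.
vector_search = {"algorithmConfigurations": [build_hnsw_configuration("my-vector-config")]}
print(vector_search)
```

Passing, for example, efSearch=800 overrides only that parameter and keeps the other defaults.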