Tutorial: Maximize relevance (RAG in Azure AI Search)

Article
11/19/2024

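In this tutorial, learn how to improve the relevance of search results used in RAG solutions. Relevance tuning can be an important factor in delivering a RAG solution that meets user expectations. In Azure AI Search, relevance tuning includes L2 semantic ranking and scoring profiles.

To implement these capabilities, you revisit the index schema to add configurations for semantic ranking and scoring profiles. You then rerun the queries using the new constructs.

In this tutorial, you modify the existing search index and queries to use:

L2 semantic ranking
A scoring profile for document boosting

This tutorial updates the search index created by the indexing pipeline. The updates don't affect existing content, so no rebuild is necessary and you don't need to rerun the indexer.

Note

More relevance features are available in preview, including vector query weighting and minimum thresholds, but this tutorial omits them because they're in preview.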

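Prerequisites

Visual Studio Code with the Python extension and the Jupyter package.
Azure AI Search, Basic tier or higher for managed identity and semantic ranking, in the same region as Azure OpenAI and Azure AI services.
Azure OpenAI, with a deployment of text-embedding-002 and gpt-35-turbo, in the same region as Azure AI Search.

Download the sample

The sample notebook includes an updated index and query request.

Run a baseline query

Let's start with a new query: "Are there any cloud formations specific to oceans and large bodies of water?"

To compare outcomes after adding relevance features, first run the query against the existing index schema, before adding semantic ranking or a scoring profile.

For the Azure Government cloud, modify the API endpoint on the token provider to "https://cognitiveservices.azure.us/.default".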

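from azure.identity import get_bearer_token_provider
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery
from openai import AzureOpenAI

# credential, AZURE_OPENAI_ACCOUNT, AZURE_SEARCH_SERVICE, and index_name are
# assumed to be defined in earlier notebook cells
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
openai_client = AzureOpenAI(
    api_version="2024-06-01",
    azure_endpoint=AZURE_OPENAI_ACCOUNT,
    azure_ad_token_provider=token_provider
)

deployment_name = "gpt-4o"

search_client = SearchClient(
    endpoint=AZURE_SEARCH_SERVICE,
    index_name=index_name,
    credential=credential
)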

GROUNDED_PROMPT="""
You are an AI assistant that helps users learn from the information found in the source material.
Answer the query using only the sources provided below.
Use bullets if the answer has multiple points.
If the answer is longer than 3 sentences, provide a summary.
Answer ONLY with the facts listed in the list of sources below. Cite your source when you answer the question.
If there isn't enough information below, say you don't know.
Do not generate answers that don't use the sources below.
Query: {query}
Sources:\n{sources}
"""

# Focused query on cloud formations and bodies of water
query="Are there any cloud formations specific to oceans and large bodies of water?"
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="text_vector")

search_results = search_client.search(
    search_text=query,
    vector_queries= [vector_query],
    select=["title", "chunk", "locations"],
    top=5,
)

sources_formatted = "=================\n".join([f'TITLE: {document["title"]}, CONTENT: {document["chunk"]}, LOCATIONS: {document["locations"]}' for document in search_results])

response = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": GROUNDED_PROMPT.format(query=query, sources=sources_formatted)
        }
    ],
    model=deployment_name
)

print(response.choices[0].message.content)

Output from this request might look like the following example.

Yes, there are cloud formations specific to oceans and large bodies of water. 
A notable example is "cloud streets," which are parallel rows of clouds that form over 
the Bering Strait in the Arctic Ocean. These cloud streets occur when wind blows from 
a cold surface like sea ice over warmer, moister air near the open ocean, leading to 
the formation of spinning air cylinders. Clouds form along the upward cycle of these cylinders, 
while skies remain clear along the downward cycle (Source: page-21.pdf).

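Update the index for semantic ranking and scoring profiles

In a previous tutorial, you designed an index schema for RAG workloads. We purposely omitted relevance enhancements from that schema so that you could focus on the fundamentals. Deferring relevance to a separate exercise gives you a before-and-after comparison of search result quality once the updates are made.

Update the import statements to include the classes for semantic ranking and scoring profiles.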

from azure.identity import DefaultAzureCredential
from azure.identity import get_bearer_token_provider
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    AzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters,
    SearchIndex,
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticField,
    SemanticSearch,
    ScoringProfile,
    TagScoringFunction,
    TagScoringParameters
)

Add the following semantic configuration to the search index. You can find this example in the update schema step of the notebook.

# New semantic configuration
semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="title"),
        keywords_fields=[SemanticField(field_name="locations")],
        content_fields=[SemanticField(field_name="chunk")]
    )
)

# Create the semantic settings with the configuration
semantic_search = SemanticSearch(configurations=[semantic_config])

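A semantic configuration has a name and a prioritized list of fields that help optimize the inputs to the semantic ranker. For more information, see Configure semantic ranking.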
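Because the semantic configuration refers to index fields by name, a quick check before updating the index can catch a misspelled field. The following is a minimal sketch, not part of the sample notebook. The helper name `missing_semantic_fields` is hypothetical, and the list of index field names is an assumption based on the fields used in this tutorial's queries.

```python
def missing_semantic_fields(index_field_names, title_field, content_fields, keywords_fields):
    """Return the prioritized field names that don't exist in the index."""
    known = set(index_field_names)
    referenced = [title_field, *content_fields, *keywords_fields]
    return [name for name in referenced if name not in known]


# Assumed field names; your index might define additional fields
index_field_names = ["title", "chunk", "locations", "text_vector"]

missing = missing_semantic_fields(
    index_field_names,
    title_field="title",
    content_fields=["chunk"],
    keywords_fields=["locations"],
)
print(missing)  # an empty list means every prioritized field was found
```

If the list isn't empty, correct the field names in the semantic configuration before you update the index.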

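Next, add a scoring profile definition. As with a semantic configuration, a scoring profile can be added to an index schema at any time. This example is also in the update schema step of the notebook, following the semantic configuration.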
```
# New scoring profile
scoring_profiles = [  
    ScoringProfile(  
        name="my-scoring-profile",
        functions=[
            TagScoringFunction(  
                field_name="locations",  
                boost=5.0,  
                parameters=TagScoringParameters(  
                    tags_parameter="tags",  
                ),  
            ) 
        ]
    )
]
```
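This profile uses the tag function, which boosts the scores of documents where a match is found in the locations field. Recall that the search index has a vector field, and multiple nonvector fields for title, chunks, and locations. The locations field is a string collection, and string collections can be boosted using the tags function in a scoring profile. For more information, see Add a scoring profile and Enhancing Search Relevance with Document Boosting (blog post).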
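At query time, the value for the profile's `tags` parameter is passed as a string that starts with the parameter name, followed by a hyphen and a comma-separated list of tags. The following sketch shows one way to build that string. The helper name `build_tag_scoring_parameter` is hypothetical, and the quoting rule (single quotes around values that contain spaces or commas, with embedded single quotes doubled) is an assumption modeled on the query string used later in this tutorial.

```python
def build_tag_scoring_parameter(parameter_name, tags):
    """Format a scoring parameter as 'name-value1,value2,...'."""
    formatted = []
    for tag in tags:
        if "," in tag or " " in tag or "'" in tag:
            # Assumed escaping: wrap in single quotes and double any embedded quote
            escaped = tag.replace("'", "''")
            formatted.append(f"'{escaped}'")
        else:
            formatted.append(tag)
    return f"{parameter_name}-{','.join(formatted)}"


scoring_parameter = build_tag_scoring_parameter(
    "tags", ["ocean", "sea surface", "seas", "surface"]
)
print(scoring_parameter)  # tags-ocean,'sea surface',seas,surface
```

The resulting string can be supplied in the `scoring_parameters` list of a query that names the scoring profile.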

Update the index definition on the search service.

# Update the search index with the semantic configuration and scoring profile
index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search, semantic_search=semantic_search, scoring_profiles=scoring_profiles)
result = index_client.create_or_update_index(index)
print(f"{result.name} updated")

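Update queries for semantic ranking and scoring profiles

In a previous tutorial, you ran queries that execute on the search engine, passing the response and other information to an LLM for chat completion.

This example modifies the query request to include the semantic configuration and scoring profile.

For the Azure Government cloud, modify the API endpoint on the token provider to "https://cognitiveservices.azure.us/.default".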

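# Import libraries
from azure.identity import get_bearer_token_provider
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery
from openai import AzureOpenAI

# credential, AZURE_OPENAI_ACCOUNT, AZURE_SEARCH_SERVICE, and index_name are
# assumed to be defined in earlier notebook cells
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
openai_client = AzureOpenAI(
    api_version="2024-06-01",
    azure_endpoint=AZURE_OPENAI_ACCOUNT,
    azure_ad_token_provider=token_provider
)

deployment_name = "gpt-4o"

search_client = SearchClient(
    endpoint=AZURE_SEARCH_SERVICE,
    index_name=index_name,
    credential=credential
)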

# Prompt is unchanged in this update
GROUNDED_PROMPT="""
You are an AI assistant that helps users learn from the information found in the source material.
Answer the query using only the sources provided below.
Use bullets if the answer has multiple points.
If the answer is longer than 3 sentences, provide a summary.
Answer ONLY with the facts listed in the list of sources below. Cite your source when you answer the question.
If there isn't enough information below, say you don't know.
Do not generate answers that don't use the sources below.
Query: {query}
Sources:\n{sources}
"""

# Queries are unchanged in this update
query="Are there any cloud formations specific to oceans and large bodies of water?"
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="text_vector")

# Add query_type semantic and semantic_configuration_name
# Add scoring_profile and scoring_parameters
search_results = search_client.search(
    query_type="semantic",
    semantic_configuration_name="my-semantic-config",
    scoring_profile="my-scoring-profile",
    scoring_parameters=["tags-ocean, 'sea surface', seas, surface"],
    search_text=query,
    vector_queries= [vector_query],
    select="title, chunk, locations",
    top=5,
)
sources_formatted = "=================\n".join([f'TITLE: {document["title"]}, CONTENT: {document["chunk"]}, LOCATIONS: {document["locations"]}' for document in search_results])

response = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": GROUNDED_PROMPT.format(query=query, sources=sources_formatted)
        }
    ],
    model=deployment_name
)

print(response.choices[0].message.content)

Output from a semantically ranked and boosted query might look like the following example.

Yes, there are specific cloud formations influenced by oceans and large bodies of water:

- **Stratus Clouds Over Icebergs**: Low stratus clouds can frame holes over icebergs, 
such as Iceberg A-56 in the South Atlantic Ocean, likely due to thermal instability caused 
by the iceberg (source: page-39.pdf).

- **Undular Bores**: These are wave structures in the atmosphere created by the collision 
of cool, dry air from a continent with warm, moist air over the ocean, as seen off the 
coast of Mauritania (source: page-23.pdf).

- **Ship Tracks**: These are narrow clouds formed by water vapor condensing around tiny 
particles from ship exhaust. They are observed over the oceans, such as in the Pacific Ocean 
off the coast of California (source: page-31.pdf).

These specific formations are influenced by unique interactions between atmospheric conditions 
and the presence of large water bodies or objects within them.

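Adding semantic ranking and a scoring profile positively affects the response from the LLM by promoting results that meet the scoring criteria and are semantically relevant.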
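To see the effect on ranking rather than only on the LLM response, you can inspect the scores returned with each search result. The following is a minimal sketch. The helper name `summarize_scores` is hypothetical, and the demo rows use placeholder values, not real output. It assumes each result behaves like a dictionary that carries `@search.score` and, for semantic queries, `@search.reranker_score`.

```python
def summarize_scores(results):
    """Collect the title and relevance scores for each search result."""
    rows = []
    for document in results:
        rows.append({
            "title": document.get("title"),
            "score": document.get("@search.score"),
            # Missing (None) when the query doesn't use semantic ranking
            "reranker_score": document.get("@search.reranker_score"),
        })
    return rows


# Placeholder results for illustration only
demo_results = [
    {"title": "page-39.pdf", "@search.score": 0.03, "@search.reranker_score": 2.9},
    {"title": "page-21.pdf", "@search.score": 0.02},
]

for row in summarize_scores(demo_results):
    print(row["title"], row["score"], row["reranker_score"])
```

Search results are returned as an iterator, so if you also format them as sources for the LLM, materialize them first, for example with `list(search_results)`, and pass that list to both steps.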

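Now that you have a better understanding of index and query design, let's move on to optimizing for speed and concision. We revisit the schema definition to implement quantization and storage reduction, but the rest of the pipeline and models remain intact.

Next steps

Minimize vector storage and costs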
