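Continuously monitor your generative AI applications

Article
11/23/2024

Important

Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.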

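Continuous advances in generative AI have led organizations to build increasingly complex applications to solve a variety of problems (chatbots, RAG systems, agentic systems, and so on). These applications are being used to drive innovation, improve customer experiences, and enhance decision-making. Although the models that power these applications (for example, GPT-4) are extremely capable, continuous monitoring has never been more important for ensuring high-quality, safe, and reliable results. Continuous monitoring is effective when an application is observed from multiple perspectives. These perspectives include token usage and cost, operational metrics (latency, request count, and so on) and, importantly, continuous evaluation. To learn more about evaluation, see Evaluation of generative AI applications.

Azure AI and Azure Monitor provide tools for continuously monitoring the performance of your generative AI applications from multiple perspectives. With Azure AI Online Evaluation, you can continuously evaluate your application regardless of where it's deployed or which orchestration framework it uses (for example, LangChain). You can use a variety of built-in evaluators, which maintain parity with the Azure AI Evaluation SDK, or define your own custom evaluators. By continuously running the right evaluators over your collected trace data, your team can more effectively identify and mitigate security, quality, and safety concerns, whether in pre-production or post-production. Azure AI Online Evaluation integrates with the comprehensive suite of observability tooling available in Azure Monitor Application Insights, so you can build custom dashboards, visualize your evaluation results, and configure alerts for advanced application monitoring.

In summary, monitoring your generative AI applications has never been more important, given the complexity and rapid evolution of the AI industry. Azure AI Online Evaluation, integrated with Azure Monitor Application Insights, lets you continuously evaluate your deployed applications to ensure that they're performant, safe, and produce high-quality results in production.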

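How online evaluation works

In this section, you learn how Azure AI Online Evaluation works, how it integrates with Azure Monitor Application Insights, and how you can use it to run continuous evaluations over trace data from your generative AI applications.

Trace your generative AI application

The first step in continuously monitoring your application is to ensure that its telemetry data is captured and stored for analysis. To accomplish this, you instrument your generative AI application's code with the Azure AI Tracing package so that trace data is logged to an Azure Monitor Application Insights resource of your choice. This package fully conforms to the OpenTelemetry standard for observability. After you instrument your application's code, the trace data is logged to your Application Insights resource.

After you include tracing in your application code, you can view the trace data in Azure AI Foundry or in your Azure Monitor Application Insights resource. To learn more, see Monitor your generative AI application.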

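Online evaluation

After your application is instrumented to send trace data to Application Insights, it's time to set up an online evaluation schedule to continuously evaluate this data. Azure AI Online Evaluation is a service that uses Azure AI compute to continuously run a configurable set of evaluators. After you set up an online evaluation schedule with the Azure AI Project SDK, it runs on a configurable schedule. Each time the service runs, it performs the following steps:

Query application trace data from the connected Application Insights resource using the provided Kusto (KQL) query.
Run each evaluator over the trace data and calculate each metric (for example, groundedness: 3).
Write the evaluation scores back to each trace using standardized semantic conventions.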
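
The three steps above can be sketched in plain Python. This is a minimal illustration, not the service's implementation: the sample rows, the word_count_score evaluator, and the run_evaluation_cycle function are all hypothetical stand-ins. The real service reads rows from Application Insights and writes the results back there.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Evaluator = Callable[[str, str], int]


def word_count_score(query: str, response: str) -> int:
    """Toy stand-in evaluator (not a real quality metric).

    Clamps the number of words in the response to the range 1..5.
    """
    return max(1, min(5, len(response.split())))


def run_evaluation_cycle(rows: List[Row], evaluators: Dict[str, Evaluator]) -> List[dict]:
    """Mimic one scheduled run over rows that a KQL query already returned."""
    results = []
    for row in rows:  # step 1 produced one row per trace
        for metric, evaluate in evaluators.items():  # step 2: run every evaluator
            results.append({  # step 3: one record per (trace, metric) pair
                "operation_Id": row["operation_Id"],
                "metric": metric,
                "score": evaluate(row["Input"], row["Output"]),
            })
    return results


# Hypothetical rows standing in for the output of the KQL query.
sample_rows = [
    {"operation_Id": "op-1", "Input": "What is KQL?", "Output": "A query language."},
    {"operation_Id": "op-2", "Input": "Hi", "Output": "Hello there, how can I help you today?"},
]

results = run_evaluation_cycle(sample_rows, {"toy_length": word_count_score})
for record in results:
    print(record)
```

Each trace yields one record per configured evaluator, which mirrors how the service attaches one score per metric to every trace it evaluates.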

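Note

Azure AI Online Evaluation supports the same metrics as Azure AI Evaluation. For more information on how evaluation works and which evaluation metrics are supported, see Evaluate your generative AI application with the Azure AI Evaluation SDK.

For example, let's say you have a deployed chat application that receives many customer questions each day. You want to continuously evaluate the quality of the responses from your application. You set up an online evaluation schedule with a daily recurrence and configure three evaluators: groundedness, coherence, and fluency. Every day, the service computes the evaluation scores for these metrics and writes the data back to Application Insights for each trace that was collected during the recurrence time window (in this example, the past 24 hours). You can then query the data from each trace and access it in Azure AI Foundry and Azure Monitor Application Insights.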

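The evaluation results written back to each trace within Application Insights follow these conventions. A unique span is added to each trace for each evaluation metric.

Property	Application Insights table	Fields for a given operation_Id	Example value
Evaluation metric	traces, AppTraces	`customDimensions["event.name"]`	`gen_ai.evaluation.relevance`
Evaluation metric score	traces, AppTraces	`customDimensions["gen_ai.evaluation.score"]`	`3`
Evaluation metric comment (if applicable)	traces, AppTraces	`message`	`{"comment": "I like the response"}`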
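
To read these values back, you could filter the traces table on the fields listed in the table above. The following sketch builds such a KQL query as a Python string. The helper name build_score_query and the example operation ID are hypothetical, and the query assumes the traces table exposes the timestamp, operation_Id, customDimensions, and message columns.

```python
def build_score_query(metric: str, operation_id: str) -> str:
    """Build a KQL query that selects one metric's evaluation event for one trace."""
    for value in (metric, operation_id):
        # Keep the generated KQL string literals well formed.
        if '"' in value:
            raise ValueError("values must not contain double quotes")
    event_name = f"gen_ai.evaluation.{metric}"
    return "\n".join([
        "traces",
        f'| where operation_Id == "{operation_id}"',
        f'| where tostring(customDimensions["event.name"]) == "{event_name}"',
        '| project timestamp, operation_Id, '
        'score = toint(customDimensions["gen_ai.evaluation.score"]), message',
    ])


# "example-operation-id" is a placeholder, not a real operation ID.
query = build_score_query("relevance", "example-operation-id")
print(query)
```

You can paste the printed query into the Logs view of your Application Insights resource after replacing the placeholder operation ID.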

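Now that you understand how Azure AI Online Evaluation works and how it connects to Azure Monitor Application Insights, the next step is to set up the service.

Set up online evaluation

In this section, you learn how to configure an online evaluation schedule to continuously monitor your deployed generative AI application. The Azure AI Project SDK offers this capability through a Python API and supports all of the features available in local evaluations. Use the following steps to submit an online evaluation schedule over your data with built-in or custom evaluators.

Note

Evaluations are supported only in the same regions as AI-assisted risk and safety metrics.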

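Prerequisites

Complete the following prerequisite steps to set up your environment and authentication to the necessary resources:

An Azure subscription.
A resource group in an evaluation-supported region.
A new user-assigned managed identity in the same resource group and region. Make a note of the clientId; you need it later.
An Azure AI hub in the same resource group and region.
An Azure AI project in this hub. See Create a project in Azure AI Foundry portal.
An Azure Monitor Application Insights resource.
Navigate to the hub page in the Azure portal and add the Application Insights resource. See Update Azure Application Insights and Azure Container Registry.
An Azure OpenAI deployment with a GPT model that supports chat completion, for example gpt-4.
The connection string for your Azure AI project, which you use to create an AIProjectClient object. You can find the project connection string under Project details on the project's Overview page.
Navigate to your Application Insights resource in the Azure portal and use the Access control (IAM) tab to add the Log Analytics Contributor role to the user-assigned managed identity you created previously.
Attach the user-assigned managed identity to your project.
Navigate to your Azure AI Services resource in the Azure portal and use the Access control (IAM) tab to add the Cognitive Services OpenAI Contributor role to the user-assigned managed identity you created previously.
Run az login to make sure you're signed in to your Azure subscription.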

Installation instructions

Create a virtual environment of your choice. To create one using conda, run the following commands:

conda create -n online-evaluation
conda activate online-evaluation

Install the required packages by running the following command:

pip install azure-identity azure-ai-projects azure-ai-ml

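Tip

Optionally, you can pip install azure-ai-evaluation if you want a code-first experience for fetching the evaluator ID of built-in evaluators in code. The sample script later in this article imports CoherenceEvaluator from azure.ai.evaluation, so install this package if you plan to run that script as written. To learn how to do this, see Specifying evaluators from the evaluator library.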

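Set up tracing for your generative AI application

The first step in monitoring your application is to set up tracing. To learn how to do this so that data is logged to Application Insights, see Set up tracing for your generative AI application.

Using service name in trace data

To identify your service via a unique ID in Application Insights, you can use the service name OpenTelemetry property in your trace data. This is particularly useful if you're logging data from multiple applications to the same Application Insights resource and you want to differentiate between them. For example, let's say you have two applications, App-1 and App-2, with tracing configured to log data to the same Application Insights resource. Perhaps you'd like to set up App-1 to be evaluated continuously by relevance and App-2 to be evaluated continuously by groundedness. You can use the service name to differentiate between the applications in your online evaluation configurations.

To set up the service name property, you can do so directly in your application code by following the steps in Using multiple tracer providers with different Resource. Alternatively, you can set the environment variable OTEL_SERVICE_NAME before deploying your app. To learn more about working with the service name, see OpenTelemetry environment variables and Service Resource Semantic Conventions.

To query trace data for a given service name, filter on the cloud_RoleName property. Add the following line to the KQL query you use in your online evaluation setup: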

| where cloud_RoleName == "service_name"
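
If you keep your KQL in Python, a small helper can generate this filter line from the same service name your application was deployed with. This is a sketch under assumptions: the helper name service_name_filter is hypothetical, and "App-1" is only a fallback used when OTEL_SERVICE_NAME isn't set in the current environment.

```python
import os


def service_name_filter(service_name: str) -> str:
    """Return the KQL filter line that keeps only one service's trace data."""
    if '"' in service_name:
        raise ValueError("service name must not contain double quotes")
    return f'| where cloud_RoleName == "{service_name}"'


# Reuse the deployed service name when available; "App-1" is a placeholder.
service_name = os.environ.get("OTEL_SERVICE_NAME", "App-1")
print(service_name_filter(service_name))
```

Reading the name from OTEL_SERVICE_NAME keeps the evaluation query and the application's tracing configuration in sync, so a renamed service doesn't silently drop out of evaluation.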

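Query stored trace data in Application Insights

Using the Kusto Query Language (KQL), you can query your generative AI application's trace data from Application Insights to use for continuous online evaluation. If you use the Azure AI Tracing package to trace your generative AI application, you can use the following Kusto query to view the data in Application Insights:

Important

The KQL query used by the online evaluation service must output the following columns: operation_Id, operation_ParentId, and gen_ai_response_id. Additionally, each evaluator has its own input data requirements. The KQL query must output these columns so that they can be used as inputs to the evaluators themselves. For a list of data requirements for evaluators, see data requirements for built-in evaluators.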

let gen_ai_spans = (
    dependencies
    | where isnotnull(customDimensions["gen_ai.system"])
    | extend response_id = tostring(customDimensions["gen_ai.response.id"])
    | project id, operation_Id, operation_ParentId, timestamp, response_id
);
let gen_ai_events = (
    traces
    | where message in ("gen_ai.choice", "gen_ai.user.message", "gen_ai.system.message")
        or tostring(customDimensions["event.name"]) in ("gen_ai.choice", "gen_ai.user.message", "gen_ai.system.message")
    | project 
        id = operation_ParentId, 
        operation_Id, 
        operation_ParentId, 
        user_input = iff(
            message == "gen_ai.user.message" or tostring(customDimensions["event.name"]) == "gen_ai.user.message", 
            parse_json(iff(message == "gen_ai.user.message", tostring(customDimensions["gen_ai.event.content"]), message)).content, 
            ""
        ), 
        system = iff(
            message == "gen_ai.system.message" or tostring(customDimensions["event.name"]) == "gen_ai.system.message", 
            parse_json(iff(message == "gen_ai.system.message", tostring(customDimensions["gen_ai.event.content"]), message)).content, 
            ""
        ), 
        llm_response = iff(
            message == "gen_ai.choice", 
            parse_json(tostring(parse_json(tostring(customDimensions["gen_ai.event.content"])).message)).content, 
            iff(tostring(customDimensions["event.name"]) == "gen_ai.choice", parse_json(parse_json(message).message).content, "")
        )
    | summarize 
        operation_ParentId = any(operation_ParentId), 
        Input = maxif(user_input, user_input != ""), 
        System = maxif(system, system != ""), 
        Output = maxif(llm_response, llm_response != "") 
    by operation_Id, id
);
gen_ai_spans
| join kind=inner (gen_ai_events) on id, operation_Id
| project Input, System, Output, operation_Id, operation_ParentId, gen_ai_response_id = response_id
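
Before you submit a schedule, it can help to check that your query's output columns cover both the required columns and the columns your evaluators read. The following sketch does that check offline. The missing_columns helper is hypothetical, and it assumes evaluator inputs are referenced in the `${data.<column>}` form used in the evaluator configurations later in this article.

```python
REQUIRED_COLUMNS = {"operation_Id", "operation_ParentId", "gen_ai_response_id"}
PREFIX, SUFFIX = "${data.", "}"


def missing_columns(output_columns, data_mapping):
    """Return, sorted, the columns that are needed but absent from the query output."""
    mapped = {
        ref[len(PREFIX):-len(SUFFIX)]
        for ref in data_mapping.values()
        if ref.startswith(PREFIX) and ref.endswith(SUFFIX)
    }
    return sorted((REQUIRED_COLUMNS | mapped) - set(output_columns))


# Columns produced by the final "project" of the query above.
query_output = [
    "Input", "System", "Output",
    "operation_Id", "operation_ParentId", "gen_ai_response_id",
]

ok = missing_columns(query_output, {"query": "${data.Input}", "response": "${data.Output}"})
bad = missing_columns(query_output, {"context": "${data.Context}"})
print(ok)   # []
print(bad)  # ['Context']
```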

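Optionally, you can use the sample operator or the take operator in your Kusto query so that it returns only a subset of traces. Because AI-assisted evaluations can be costly at scale, this approach helps you control costs by evaluating only a random sample (or n traces) of your data.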
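
As a minimal sketch of this idea, the helper below appends either operator to an existing query string. The function name limit_rows and its parameters are hypothetical. Note that sample returns a random subset, while take returns an arbitrary subset with no guarantee about which rows you get.

```python
def limit_rows(query: str, n: int, random_sample: bool = True) -> str:
    """Append '| sample n' (random rows) or '| take n' (arbitrary rows) to a KQL query."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    operator = "sample" if random_sample else "take"
    return f"{query.rstrip()}\n| {operator} {n}"


sampled = limit_rows("traces", 100)
first_rows = limit_rows("traces", 5, random_sample=False)
print(sampled)
print(first_rows)
```

Appending the operator at the end of the full query limits the number of rows handed to the evaluators, which is what drives the evaluation cost.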

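Set up online evaluation with Azure AI Project SDK

You can submit an online evaluation scheduled job with the Azure AI Project SDK via a Python API. See the following script to learn how to set up online evaluation with performance and quality (AI-assisted) evaluators. To view a comprehensive list of supported evaluators, see Evaluate with the Azure AI Evaluation SDK. To learn how to use custom evaluators, see custom evaluators.

Start by importing the required packages and configuring the required variables: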

from azure.ai.projects import AIProjectClient 
from azure.identity import DefaultAzureCredential 
from azure.ai.projects.models import ( 
    ApplicationInsightsConfiguration,
    EvaluatorConfiguration,
    EvaluationSchedule,
    RecurrenceTrigger,
)
from azure.ai.evaluation import CoherenceEvaluator 

# This sample includes the setup for an online evaluation schedule using the Azure AI Project SDK and Azure AI Evaluation SDK
# The schedule is configured to run daily over the collected trace data while running two evaluators: CoherenceEvaluator and RelevanceEvaluator
# This sample can be modified to fit your application's requirements

# Name of your online evaluation schedule
SAMPLE_NAME = "online_eval_name"

# Name of your generative AI application (will be available in trace data in Application Insights)
SERVICE_NAME = "service_name"

# Connection string to your Azure AI Foundry project
# Currently, it should be in the format "<HostName>;<AzureSubscriptionId>;<ResourceGroup>;<HubName>"
PROJECT_CONNECTION_STRING = "<HostName>;<AzureSubscriptionId>;<ResourceGroup>;<HubName>"

# Your Application Insights resource ID
APPLICATION_INSIGHTS_RESOURCE_ID = "appinsights_resource_id"

# Kusto Query Language (KQL) query to query data from Application Insights resource
# This query is compatible with data logged by the Azure AI Inferencing Tracing SDK (linked in documentation)
# You can modify it depending on your data schema
# The KQL query must output these required columns: operation_Id, operation_ParentId, and gen_ai_response_id
# You can choose which other columns to output as required by the evaluators you are using
KUSTO_QUERY = "let gen_ai_spans=(dependencies | where isnotnull(customDimensions[\"gen_ai.system\"]) | extend response_id = tostring(customDimensions[\"gen_ai.response.id\"]) | project id, operation_Id, operation_ParentId, timestamp, response_id); let gen_ai_events=(traces | where message in (\"gen_ai.choice\", \"gen_ai.user.message\", \"gen_ai.system.message\") or tostring(customDimensions[\"event.name\"]) in (\"gen_ai.choice\", \"gen_ai.user.message\", \"gen_ai.system.message\") | project id= operation_ParentId, operation_Id, operation_ParentId, user_input = iff(message == \"gen_ai.user.message\" or tostring(customDimensions[\"event.name\"]) == \"gen_ai.user.message\", parse_json(iff(message == \"gen_ai.user.message\", tostring(customDimensions[\"gen_ai.event.content\"]), message)).content, \"\"), system = iff(message == \"gen_ai.system.message\" or tostring(customDimensions[\"event.name\"]) == \"gen_ai.system.message\", parse_json(iff(message == \"gen_ai.system.message\", tostring(customDimensions[\"gen_ai.event.content\"]), message)).content, \"\"), llm_response = iff(message == \"gen_ai.choice\", parse_json(tostring(parse_json(tostring(customDimensions[\"gen_ai.event.content\"])).message)).content, iff(tostring(customDimensions[\"event.name\"]) == \"gen_ai.choice\", parse_json(parse_json(message).message).content, \"\")) | summarize operation_ParentId = any(operation_ParentId), Input = maxif(user_input, user_input != \"\"), System = maxif(system, system != \"\"), Output = maxif(llm_response, llm_response != \"\") by operation_Id, id); gen_ai_spans | join kind=inner (gen_ai_events) on id, operation_Id | project Input, System, Output, operation_Id, operation_ParentId, gen_ai_response_id = response_id"
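
A connection string that still contains placeholders, or has the wrong number of fields, can be caught with a quick local check before you create the client. The sketch below assumes only the four-field, semicolon-separated format shown in the comment above. The ProjectConnection type, the parse_connection_string helper, and the example values are hypothetical and aren't part of the SDK.

```python
from typing import NamedTuple


class ProjectConnection(NamedTuple):
    host_name: str
    subscription_id: str
    resource_group: str
    hub_name: str


def parse_connection_string(conn_str: str) -> ProjectConnection:
    """Split '<HostName>;<AzureSubscriptionId>;<ResourceGroup>;<HubName>' into fields."""
    parts = [part.strip() for part in conn_str.split(";")]
    if len(parts) != 4 or not all(parts):
        raise ValueError("expected four non-empty, semicolon-separated fields")
    if any(part.startswith("<") and part.endswith(">") for part in parts):
        raise ValueError("replace the <...> placeholders with real values")
    return ProjectConnection(*parts)


# Made-up example values, for illustration only.
example = parse_connection_string(
    "example-host.example.com;00000000-0000-0000-0000-000000000000;my-resource-group;my-hub"
)
print(example.resource_group)
```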

Next, define a client and an Azure OpenAI GPT deployment (for example, GPT-4) to use for running your online evaluation schedule. Also, connect to your Application Insights resource:

# Connect to your Azure AI Foundry Project
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str=PROJECT_CONNECTION_STRING
)

# Connect to your Application Insights resource 
app_insights_config = ApplicationInsightsConfiguration(
    resource_id=APPLICATION_INSIGHTS_RESOURCE_ID,
    query=KUSTO_QUERY,
    service_name=SERVICE_NAME
)

# Connect to your AOAI resource, you must use an AOAI GPT model
deployment_name = "gpt-4"
api_version = "2024-08-01-preview"

# This is your AOAI connection name, which can be found in your AI Foundry project under the 'Models + Endpoints' tab
default_connection = project_client.connections._get_connection(
    "aoai_connection_name"
)

model_config = {
    "azure_deployment": deployment_name,
    "api_version": api_version,
    "type": "azure_openai",
    "azure_endpoint": default_connection.properties["target"]
}

Next, configure the evaluators you want to use:

# RelevanceEvaluator
# id for each evaluator can be found in your AI Foundry registry - please see documentation for more information
# init_params is the configuration for the model to use to perform the evaluation
# data_mapping is used to map the output columns of your query to the names required by the evaluator
relevance_evaluator_config = EvaluatorConfiguration(
    id="azureml://registries/azureml-staging/models/Relevance-Evaluator/versions/4",
    init_params={"model_config": model_config},
    data_mapping={"query": "${data.Input}", "response": "${data.Output}"}
)

# CoherenceEvaluator
coherence_evaluator_config = EvaluatorConfiguration(
    id=CoherenceEvaluator.id,
    init_params={"model_config": model_config},
    data_mapping={"query": "${data.Input}", "response": "${data.Output}"}
)
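
To make the role of data_mapping concrete, the following sketch shows one plausible way a `${data.<column>}` reference could be resolved against a single row of query output to produce an evaluator's keyword arguments. This is an assumption about the mapping semantics for illustration only. The resolve_inputs helper and the sample row are hypothetical and don't describe the service's internals.

```python
import re

# Matches references of the form ${data.ColumnName}.
_REFERENCE = re.compile(r"^\$\{data\.(\w+)\}$")


def resolve_inputs(data_mapping: dict, row: dict) -> dict:
    """Map evaluator parameter names to the values of the referenced row columns."""
    inputs = {}
    for parameter, reference in data_mapping.items():
        match = _REFERENCE.match(reference)
        if match is None:
            raise ValueError(f"unsupported reference: {reference!r}")
        inputs[parameter] = row[match.group(1)]
    return inputs


# Hypothetical row with the Input, System, and Output columns from the query.
row = {"Input": "What is KQL?", "System": "", "Output": "KQL is the Kusto Query Language."}
evaluator_inputs = resolve_inputs(
    {"query": "${data.Input}", "response": "${data.Output}"}, row
)
print(evaluator_inputs)
```

In this reading, the keys of data_mapping are the evaluator's parameter names (query, response) and the values name the query output columns (Input, Output) that feed them.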

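Finally, define the recurrence and create the schedule:

Note: In the prerequisite steps, you created a user-assigned managed identity to authenticate the online evaluation schedule to your Application Insights resource. The AzureMSIClientId in the properties parameter of the EvaluationSchedule class is the clientId of this identity.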

# Frequency to run the schedule
recurrence_trigger = RecurrenceTrigger(frequency="day", interval=1)

# Dictionary of evaluators
evaluators = {
    "relevance": relevance_evaluator_config,
    "coherence" : coherence_evaluator_config
}

name = SAMPLE_NAME
description = f"{SAMPLE_NAME} description"
# AzureMSIClientId is the clientID of the User-assigned managed identity created during set-up - see documentation for how to find it
properties = {"AzureMSIClientId": "your_client_id"}

# Configure the online evaluation schedule
evaluation_schedule = EvaluationSchedule(
    data=app_insights_config,
    evaluators=evaluators,
    trigger=recurrence_trigger,
    description=description,
    properties=properties)

# Create the online evaluation schedule 
created_evaluation_schedule = project_client.evaluations.create_or_replace_schedule(name, evaluation_schedule)
print(f"Successfully submitted the online evaluation schedule creation request - {created_evaluation_schedule.name}, currently in {created_evaluation_schedule.provisioning_state} state.")

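Perform operations on an online evaluation schedule

You can get, list, and disable online evaluation schedules by adding the following code to your online evaluation configuration script:

Warning: Wait a short time (about 30 seconds) between creating an online evaluation schedule and running the get_schedule() API.

Get an online evaluation schedule: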

name = "<my-online-evaluation-name>"
get_evaluation_schedule = project_client.evaluations.get_schedule(name)
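
Rather than sleeping for a fixed interval after creating a schedule, you could retry the lookup a few times. The sketch below is a generic retry helper demonstrated with a fake fetch function. The names call_with_retry and fake_get_schedule are hypothetical, and catching every Exception is an assumption, because the error raised for a schedule that isn't available yet isn't documented here.

```python
import time


def call_with_retry(fetch, attempts=4, delay_seconds=10.0, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting delay_seconds between failed attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay_seconds)


# Demonstration with a fake lookup that fails twice before succeeding.
calls = {"count": 0}


def fake_get_schedule():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("schedule not available yet")
    return "schedule-ready"


# The sleep function is replaced so the demonstration doesn't actually wait.
result = call_with_retry(fake_get_schedule, sleep=lambda seconds: None)
print(result, calls["count"])
```

With the real client, fetch could be `lambda: project_client.evaluations.get_schedule(name)`, and the default of four attempts ten seconds apart covers roughly the 30-second window mentioned in the warning.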

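List all online evaluation schedules:

count = 0
for evaluation_schedule in project_client.evaluations.list_schedule():
    count += 1
    # Print each schedule's name and whether it is currently enabled
    print(f"{count}. {evaluation_schedule.name} "
          f"[IsEnabled: {evaluation_schedule.is_enabled}]")
# Print the total once, after the loop finishes
print(f"Total evaluation schedules: {count}")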

Disable (soft-delete) an online evaluation schedule:

name = "<my-online-evaluation-name>"
project_client.evaluations.disable_schedule(name)

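Monitor your generative AI application

In this section, you learn how Azure AI integrates with Azure Monitor Application Insights to give you an out-of-the-box dashboard view with insights tailored to your generative AI application, so you can stay up to date on its latest status.

Insights for your generative AI application

If you haven't set this up yet, here are some quick steps:

Navigate to your project in Azure AI Foundry.
Select the Tracing page on the left-hand side.
Connect your Application Insights resource to your project.

If you already set up tracing in Azure AI Foundry portal, all you need to do is select the link to Check out your Insights for Generative AI application dashboard.

Once your data is streaming into your Application Insights resource, it automatically populates this customized dashboard.

This view is a great place to get started with your monitoring needs.

You can view token consumption over time to understand whether you need to increase your usage limits or do additional cost analysis.
You can view evaluation metrics as trend lines to understand the quality of your application on a daily basis.
You can debug when exceptions occur and drill into traces by using the Azure Monitor end-to-end transaction details view to find out what went wrong.

This is an Azure Workbook that queries data stored in your Application Insights resource. You can customize this workbook and tailor it to fit your business needs. To learn more, see editing Azure Workbooks.

This lets you add other custom evaluators that you might have logged, or other markdown text to share summaries and use for reporting purposes.

You can also share this workbook with your team so they stay informed with the latest!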
