Azure AI model inference API | Azure AI Foundry

Article
11/24/2024

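Important

Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement and isn't recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

Azure AI model inference is an API that exposes a common set of capabilities for foundational models, so developers can consume predictions from a diverse set of models in a uniform, consistent way. Developers can talk with different models deployed in the Azure AI Foundry portal without changing the underlying code they use.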

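Benefits

Foundational models, such as language models, have made remarkable strides in recent years. These advances have transformed fields including natural language processing and computer vision, and have enabled applications such as chatbots, virtual assistants, and language translation services.

While foundational models excel in specific domains, they lack a uniform set of capabilities. Some models are better at specific tasks, and even on the same task different models may approach the problem in different ways. Developers can benefit from this diversity by using the right model for the right job, which lets them:

Improve the performance in a specific downstream task.
Use more efficient models for simpler tasks.
Use smaller models that can run faster on specific tasks.
Compose multiple models to develop intelligent experiences.

A uniform way to consume foundational models lets developers realize all of these benefits without sacrificing portability or changing the underlying code.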

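Availability

The Azure AI model inference API is available for the following models:

Models deployed to serverless API endpoints:

Cohere Embed V3 family of models
Cohere Command R family of models
Meta Llama 2 chat family of models
Meta Llama 3 instruct family of models
Mistral-Small
Mistral-Large
Jais family of models
Jamba family of models
Phi-3 family of models

Models deployed to managed inference:

Meta Llama 3 instruct family of models
Phi-3 family of models
Mistral and Mixtral family of models

The API is compatible with Azure OpenAI model deployments.

Note

The Azure AI model inference API is available in managed inference (managed online endpoints) for models deployed after June 24, 2024. To take advantage of the API, redeploy your endpoint if the model was deployed before that date.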

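Capabilities

The following section describes some of the capabilities the API exposes. For the full specification of the API, see the reference section.

Modalities

The API indicates how developers can consume predictions for the following modalities:

Get info: Returns information about the model deployed under the endpoint.
Text embeddings: Creates an embedding vector representing the input text.
Text completions: Creates a completion for the provided prompt and parameters.
Chat completions: Creates a model response for the given chat conversation.
Image embeddings: Creates an embedding vector representing the input text and image.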
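
Each modality corresponds to a REST route on the endpoint. The sketch below pairs every modality with the HTTP method and route it is assumed to use and builds the request URL. Only /chat/completions and the api-version value appear in this article; the other routes, the dictionary keys, and the helper name build_url are assumptions to check against the API reference.

```python
from urllib.parse import urlencode

# API version used by the request examples in this article.
API_VERSION = "2024-04-01-preview"

# Modality -> (HTTP method, route). Only /chat/completions is confirmed by
# this article; the remaining routes are assumptions.
MODALITY_ROUTES = {
    "get_info": ("GET", "/info"),
    "text_embeddings": ("POST", "/embeddings"),
    "text_completions": ("POST", "/completions"),
    "chat_completions": ("POST", "/chat/completions"),
    "image_embeddings": ("POST", "/images/embeddings"),
}


def build_url(endpoint: str, modality: str) -> str:
    """Return the full request URL for a modality on a given endpoint."""
    _method, route = MODALITY_ROUTES[modality]
    query = urlencode({"api-version": API_VERSION})
    return f"{endpoint.rstrip('/')}{route}?{query}"


# example.invalid is a placeholder host, not a real endpoint.
print(build_url("https://example.invalid/", "chat_completions"))
```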

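Inference SDK support

You can use streamlined inference clients in the language of your choice to consume predictions from models running the Azure AI model inference API.

Install the package azure-ai-inference using your package manager, such as pip:

pip install azure-ai-inference

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions: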

import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

model = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]),
)

If you're using an endpoint with support for Entra ID, you can create the client as follows:

import os
from azure.ai.inference import ChatCompletionsClient
from azure.identity import DefaultAzureCredential

model = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
    credential=DefaultAzureCredential(),
)

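Explore our samples and read the API reference documentation to get started.

Install the package @azure-rest/ai-inference using npm:

npm install @azure-rest/ai-inference

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions: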

import ModelClient from "@azure-rest/ai-inference";
import { isUnexpected } from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

const client = new ModelClient(
    process.env.AZUREAI_ENDPOINT_URL, 
    new AzureKeyCredential(process.env.AZUREAI_ENDPOINT_KEY)
);

For endpoints with support for Microsoft Entra ID, you can create the client as follows:

import ModelClient from "@azure-rest/ai-inference";
import { isUnexpected } from "@azure-rest/ai-inference";
import { DefaultAzureCredential } from "@azure/identity";

const client = new ModelClient(
    process.env.AZUREAI_ENDPOINT_URL, 
    new DefaultAzureCredential()
);

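Explore our samples and read the API reference documentation to get started.

Install the Azure AI inference library with the following command:

dotnet add package Azure.AI.Inference --prerelease

For endpoints with support for Microsoft Entra ID (formerly Azure Active Directory), install the Azure.Identity package:

dotnet add package Azure.Identity

Import the following namespaces: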

using Azure;
using Azure.Identity;
using Azure.AI.Inference;

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

ChatCompletionsClient client = new ChatCompletionsClient(
    new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
);

For endpoints with support for Microsoft Entra ID (formerly Azure Active Directory):

ChatCompletionsClient client = new ChatCompletionsClient(
    new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
    new DefaultAzureCredential(includeInteractiveCredentials: true)
);

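Explore our samples and read the API reference documentation to get started.

Use the reference section to explore the API design and the available parameters. For example, the reference section for chat completions details how to use the route /chat/completions to generate predictions based on chat-formatted instructions:

Request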

POST /chat/completions?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
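
For readers who want to see the raw request without an SDK, the following sketch assembles the same POST /chat/completions call with Python's standard library. It builds the request but does not send it. The function name build_chat_request, the placeholder host, and the placeholder token are assumptions for illustration only.

```python
import json
import urllib.request


def build_chat_request(endpoint: str, token: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) a POST /chat/completions request."""
    url = f"{endpoint.rstrip('/')}/chat/completions?api-version=2024-04-01-preview"
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_chat_request(
    "https://example.invalid",  # placeholder endpoint
    "example-token",            # placeholder bearer token
    [{"role": "user", "content": "How many languages are in the world?"}],
)
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out here so the sketch runs without network access or credentials.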

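Extensibility

The Azure AI model inference API specifies a set of modalities and parameters that models can subscribe to. However, some models may have capabilities beyond those the API indicates. In those cases, the API lets the developer pass them as extra parameters in the payload.

When you set the header extra-parameters: pass-through, the API attempts to pass any unknown parameter directly to the underlying model. If the model can handle that parameter, the request completes.

The following example shows a request passing the parameter safe_prompt, which is supported by Mistral-Large but isn't specified in the Azure AI model inference API.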

from azure.ai.inference.models import SystemMessage, UserMessage

response = model.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    model_extras={
        "safe_prompt": True
    }
)

print(response.choices[0].message.content)

Tip

When you use the Azure AI Inference SDK, using model_extras configures the request with extra-parameters: pass-through automatically for you.

var messages = [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "How many languages are in the world?" },
];

var response = await client.path("/chat/completions").post({
    headers: {
        "extra-parameters": "pass-through"
    },
    body: {
        messages: messages,
        safe_prompt: true
    }
});

console.log(response.body.choices[0].message.content)

requestOptions = new ChatCompletionsOptions()
{
    Messages = {
        new ChatRequestSystemMessage("You are a helpful assistant."),
        new ChatRequestUserMessage("How many languages are in the world?")
    },
    AdditionalProperties = { { "logprobs", BinaryData.FromString("true") } },
};

response = client.Complete(requestOptions, extraParams: ExtraParameters.PassThrough);
Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");

Request

POST /chat/completions?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json
extra-parameters: pass-through

{
    "messages": [
    {
        "role": "system",
        "content": "You are a helpful assistant"
    },
    {
        "role": "user",
        "content": "Explain Riemann's conjecture in 1 paragraph"
    }
    ],
    "temperature": 0,
    "top_p": 1,
    "response_format": { "type": "text" },
    "safe_prompt": true
}

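Note

The default value for extra-parameters is error, which returns an error if an extra parameter is indicated in the payload. Alternatively, you can set extra-parameters: drop to drop any unknown parameter in the request. Use this capability if you send requests with extra parameters that you know the model doesn't support but you want the request to complete anyway. A typical example is indicating the seed parameter.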
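
To make the three header values concrete, here is a small local simulation of how error, drop, and pass-through treat an unknown payload key. It is an illustration of the behavior described above, not the service's implementation; the KNOWN_PARAMETERS set is an illustrative subset and the function name apply_extra_parameters is hypothetical.

```python
# Parameters this sketch treats as "known" to the API; an illustrative
# subset, not the full specification.
KNOWN_PARAMETERS = {"messages", "temperature", "top_p", "response_format"}


def apply_extra_parameters(payload: dict, mode: str = "error") -> dict:
    """Mimic how each extra-parameters value treats unknown payload keys."""
    unknown = sorted(set(payload) - KNOWN_PARAMETERS)
    if mode == "pass-through":
        # Everything is forwarded as-is to the underlying model.
        return dict(payload)
    if mode == "drop":
        # Unknown keys are silently removed before the model sees them.
        return {k: v for k, v in payload.items() if k in KNOWN_PARAMETERS}
    if mode == "error":
        # Default: any unknown key rejects the whole request.
        if unknown:
            raise ValueError(f"Unknown parameters: {', '.join(unknown)}")
        return dict(payload)
    raise ValueError(f"Unsupported extra-parameters value: {mode}")


payload = {"messages": [], "temperature": 0, "safe_prompt": True}
print(apply_extra_parameters(payload, "drop"))
```

With drop, the safe_prompt key is removed and the rest of the payload goes through; with the default error mode, the same payload raises instead.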

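Models with disparate sets of capabilities

The Azure AI model inference API indicates a general set of capabilities, but each model can decide whether to implement them. A specific error is returned when the model can't support a specific parameter.

The following example shows the response for a chat completions request that indicates the parameter response_format and asks for a reply in JSON format. In the example, because the model doesn't support that capability, error 422 is returned to the user.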

import json
from azure.ai.inference.models import SystemMessage, UserMessage, ChatCompletionsResponseFormatJSON
from azure.core.exceptions import HttpResponseError

try:
    response = model.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="How many languages are in the world?"),
        ],
        response_format=ChatCompletionsResponseFormatJSON()
    )
except HttpResponseError as ex:
    if ex.status_code == 422:
        response = json.loads(ex.response._content.decode('utf-8'))
        if isinstance(response, dict) and "detail" in response:
            for offending in response["detail"]:
                param = ".".join(offending["loc"])
                value = offending["input"]
                print(
                    f"Looks like the model doesn't support the parameter '{param}' with value '{value}'"
                )
    else:
        raise ex

try {
    var messages = [
        { role: "system", content: "You are a helpful assistant" },
        { role: "user", content: "How many languages are in the world?" },
    ];
    
    var response = await client.path("/chat/completions").post({
        body: {
            messages: messages,
            response_format: { type: "json_object" }
        }
    });
}
catch (error) {
    if (error.status_code == 422) {
        var response = JSON.parse(error.response._content)
        if (response.detail) {
            for (const offending of response.detail) {
                var param = offending.loc.join(".")
                var value = offending.input
                console.log(`Looks like the model doesn't support the parameter '${param}' with value '${value}'`)
            }
        }
    }
    else 
    {
        throw error
    }
}

try
{
    requestOptions = new ChatCompletionsOptions()
    {
        Messages = {
            new ChatRequestSystemMessage("You are a helpful assistant"),
            new ChatRequestUserMessage("How many languages are in the world?"),
        },
        ResponseFormat = new ChatCompletionsResponseFormatJSON()
    };

    response = client.Complete(requestOptions);
    Console.WriteLine(response.Value.Choices[0].Message.Content);
}
catch (RequestFailedException ex)
{
    if (ex.Status == 422)
    {
        Console.WriteLine($"Looks like the model doesn't support a parameter: {ex.Message}");
    }
    else
    {
        throw;
    }
}

Request

POST /chat/completions?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json

{
    "messages": [
    {
        "role": "system",
        "content": "You are a helpful assistant"
    },
    {
        "role": "user",
        "content": "Explain Riemann's conjecture in 1 paragraph"
    }
    ],
    "temperature": 0,
    "top_p": 1,
    "response_format": { "type": "json_object" }
}

Response

{
    "status": 422,
    "code": "parameter_not_supported",
    "detail": {
        "loc": [ "body", "response_format" ],
        "input": "json_object"
    },
    "message": "One of the parameters contain invalid values."
}

Tip

You can inspect the property detail.loc to find the location of the offending parameter and detail.input to see the value that was passed in the request.
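
As a minimal sketch of that inspection, the following code parses the 422 response body shown above and reports the offending parameter. The function name describe_unsupported is hypothetical, and accepting detail as either a single object or a list is an assumption, since the response above shows an object while the SDK examples iterate over it.

```python
import json

# Response body copied from the 422 example above.
RAW_RESPONSE = """
{
    "status": 422,
    "code": "parameter_not_supported",
    "detail": {
        "loc": [ "body", "response_format" ],
        "input": "json_object"
    },
    "message": "One of the parameters contain invalid values."
}
"""


def describe_unsupported(raw: str) -> list:
    """Return one readable string per offending parameter."""
    payload = json.loads(raw)
    detail = payload.get("detail", [])
    # Assumption: accept either a single object (as shown) or a list.
    entries = detail if isinstance(detail, list) else [detail]
    return [
        f"{'.'.join(map(str, entry['loc']))} = {entry['input']!r}"
        for entry in entries
    ]


print(describe_unsupported(RAW_RESPONSE))
```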

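Content safety

The Azure AI model inference API supports Azure AI Content Safety. When you use deployments with Azure AI Content Safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering (preview) system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.

The following example shows the response for a chat completions request that triggered content safety.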

import json
from azure.ai.inference.models import AssistantMessage, UserMessage, SystemMessage
from azure.core.exceptions import HttpResponseError

try:
    response = model.complete(
        messages=[
            SystemMessage(content="You are an AI assistant that helps people find information."),
            UserMessage(content="Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."),
        ]
    )

    print(response.choices[0].message.content)

except HttpResponseError as ex:
    if ex.status_code == 400:
        response = json.loads(ex.response._content.decode('utf-8'))
        if isinstance(response, dict) and "error" in response:
            print(f"Your request triggered an {response['error']['code']} error:\n\t {response['error']['message']}")
        else:
            raise ex
    else:
        raise ex

try {
    var messages = [
        { role: "system", content: "You are an AI assistant that helps people find information." },
        { role: "user", content: "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills." },
    ]

    var response = await client.path("/chat/completions").post({
        body: {
            messages: messages,
        }
    });
    
    console.log(response.body.choices[0].message.content)
}
catch (error) {
    if (error.status_code == 400) {
        var response = JSON.parse(error.response._content)
        if (response.error) {
            console.log(`Your request triggered an ${response.error.code} error:\n\t ${response.error.message}`)
        }
        else
        {
            throw error
        }
    }
}

try
{
    requestOptions = new ChatCompletionsOptions()
    {
        Messages = {
            new ChatRequestSystemMessage("You are an AI assistant that helps people find information."),
            new ChatRequestUserMessage(
                "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
            ),
        },
    };

    response = client.Complete(requestOptions);
    Console.WriteLine(response.Value.Choices[0].Message.Content);
}
catch (RequestFailedException ex)
{
    if (ex.ErrorCode == "content_filter")
    {
        Console.WriteLine($"Your query has triggered Azure Content Safety: {ex.Message}");
    }
    else
    {
        throw;
    }
}

Request

POST /chat/completions?api-version=2024-04-01-preview
Authorization: Bearer <bearer-token>
Content-Type: application/json

{
    "messages": [
    {
        "role": "system",
        "content": "You are a helpful assistant"
    },
    {
        "role": "user",
        "content": "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
    }
    ],
    "temperature": 0,
    "top_p": 1
}

Response

{
    "status": 400,
    "code": "content_filter",
    "message": "The response was filtered",
    "param": "messages",
    "type": null
}
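
A client can tell a content safety rejection apart from other failures by checking the status and code fields of that body. The sketch below does so for the flat payload shown above; the function name is_content_filtered is hypothetical, and a payload that nests these fields under an error key would need a different lookup.

```python
import json

# Response body copied from the content safety example above.
RAW_RESPONSE = """
{
    "status": 400,
    "code": "content_filter",
    "message": "The response was filtered",
    "param": "messages",
    "type": null
}
"""


def is_content_filtered(raw: str) -> bool:
    """True when the body is a 400 error whose code is content_filter."""
    payload = json.loads(raw)
    return payload.get("status") == 400 and payload.get("code") == "content_filter"


if is_content_filtered(RAW_RESPONSE):
    print("Blocked by content safety:", json.loads(RAW_RESPONSE)["message"])
```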

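Getting started

The Azure AI model inference API is currently supported for models deployed as serverless API endpoints and managed online endpoints. Deploy any of the supported models and use the exact same code to consume their predictions.

The client library azure-ai-inference performs inference, including chat completions, for AI models deployed by Azure AI Foundry and Azure Machine Learning studio. It supports serverless API endpoints and managed compute endpoints (formerly known as managed online endpoints).

Explore our samples and read the API reference documentation to get started.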
