ChatCompletionsClient Class

ChatCompletionsClient.

Inheritance
azure.ai.inference._client.ChatCompletionsClient
ChatCompletionsClient

Constructor

ChatCompletionsClient(endpoint: str, credential: AzureKeyCredential | TokenCredential, *, frequency_penalty: float | None = None, presence_penalty: float | None = None, temperature: float | None = None, top_p: float | None = None, max_tokens: int | None = None, response_format: ChatCompletionsResponseFormat | None = None, stop: List[str] | None = None, tools: List[ChatCompletionsToolDefinition] | None = None, tool_choice: str | ChatCompletionsToolChoicePreset | ChatCompletionsNamedToolChoice | None = None, seed: int | None = None, model: str | None = None, model_extras: Dict[str, Any] | None = None, **kwargs: Any)

Parameters

Name Description
endpoint
Required
str

Service host. Required.

credential
Required

Credential used to authenticate requests to the service. Is either a AzureKeyCredential type or a TokenCredential type. Required.

Keyword-Only Parameters

Name Description
frequency_penalty

A value that influences the probability of generated tokens appearing based on their cumulative frequency in generated text. Positive values will make tokens less likely to appear as their frequency increases and decrease the likelihood of the model repeating the same statements verbatim. Supported range is [-2, 2]. Default value is None.

presence_penalty

A value that influences the probability of generated tokens appearing based on their existing presence in generated text. Positive values will make tokens less likely to appear when they already exist and increase the model's likelihood to output new topics. Supported range is [-2, 2]. Default value is None.

temperature

The sampling temperature to use that controls the apparent creativity of generated completions. Higher values will make output more random while lower values will make results more focused and deterministic. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Default value is None.

top_p

An alternative to sampling with temperature called nucleus sampling. This value causes the model to consider the results of tokens with the provided probability mass. As an example, a value of 0.15 will cause only the tokens comprising the top 15% of probability mass to be considered. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Default value is None.

max_tokens
int

The maximum number of tokens to generate. Default value is None.

response_format

The format that the model must output. Use this to enable JSON mode instead of the default text mode. Note that to enable JSON mode, some AI models may also require you to instruct the model to produce JSON via a system or user message. Default value is None.

stop

A collection of textual sequences that will end completions generation. Default value is None.

tools

The available tool definitions that the chat completions request can use, including caller-defined functions. Default value is None.

tool_choice

If specified, the model will configure which of the provided tools it can use for the chat completions response. Is either a Union[str, "_models.ChatCompletionsToolChoicePreset"] type or a ChatCompletionsNamedToolChoice type. Default value is None.

seed
int

If specified, the system will make a best effort to sample deterministically such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed. Default value is None.

model
str

ID of the specific AI model to use, if more than one model is available on the endpoint. Default value is None.

model_extras

Additional, model-specific parameters that are not in the standard request payload. They will be added as-is to the root of the JSON in the request body. How the service handles these extra parameters depends on the value of the extra-parameters request header. Default value is None.

api_version
str

The API version to use for this operation. Default value is "2024-05-01-preview". Note that overriding this default value may result in unsupported behavior.

Methods

close
complete

Gets chat completions for the provided chat messages. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data. When using this method with stream=True, the response is streamed back to the client. Iterate over the resulting StreamingChatCompletions object to get content updates as they arrive.

get_model_info

Returns information about the AI model. The method makes a REST API call to the /info route on the given endpoint. This method will only work when using Serverless API or Managed Compute endpoint. It will not work for GitHub Models endpoint or Azure OpenAI endpoint.

send_request

Runs the network request through the client's chained policies.


>>> from azure.core.rest import HttpRequest
>>> request = HttpRequest("GET", "https://www.example.org/")
<HttpRequest [GET], url: 'https://www.example.org/'>
>>> response = client.send_request(request)
<HttpResponse: 200 OK>

For more information on this code flow, see https://aka.ms/azsdk/dpcodegen/python/send_request

close

close() -> None

complete

Gets chat completions for the provided chat messages. Completions support a wide variety of tasks and generate text that continues from or "completes" provided prompt data. When using this method with stream=True, the response is streamed back to the client. Iterate over the resulting StreamingChatCompletions object to get content updates as they arrive.

complete(*, messages: List[ChatRequestMessage] | List[Dict[str, Any]], stream: Literal[False] = False, frequency_penalty: float | None = None, presence_penalty: float | None = None, temperature: float | None = None, top_p: float | None = None, max_tokens: int | None = None, response_format: ChatCompletionsResponseFormat | None = None, stop: List[str] | None = None, tools: List[ChatCompletionsToolDefinition] | None = None, tool_choice: str | ChatCompletionsToolChoicePreset | ChatCompletionsNamedToolChoice | None = None, seed: int | None = None, model: str | None = None, model_extras: Dict[str, Any] | None = None, **kwargs: Any) -> ChatCompletions

Parameters

Name Description
body
<xref:JSON> or IO[bytes]

Is either a MutableMapping[str, Any] type (like a dictionary) or a IO[bytes] type that specifies the full request payload. Required.

Keyword-Only Parameters

Name Description
messages

The collection of context messages associated with this chat completions request. Typical usage begins with a chat message for the System role that provides instructions for the behavior of the assistant, followed by alternating messages between the User and Assistant roles. Required.

Default value: <object object at 0x000001635165BF20>
stream

A value indicating whether chat completions should be streamed for this request. Default value is False. If streaming is enabled, the response will be a StreamingChatCompletions. Otherwise the response will be a ChatCompletions.

frequency_penalty

A value that influences the probability of generated tokens appearing based on their cumulative frequency in generated text. Positive values will make tokens less likely to appear as their frequency increases and decrease the likelihood of the model repeating the same statements verbatim. Supported range is [-2, 2]. Default value is None.

presence_penalty

A value that influences the probability of generated tokens appearing based on their existing presence in generated text. Positive values will make tokens less likely to appear when they already exist and increase the model's likelihood to output new topics. Supported range is [-2, 2]. Default value is None.

temperature

The sampling temperature to use that controls the apparent creativity of generated completions. Higher values will make output more random while lower values will make results more focused and deterministic. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Default value is None.

top_p

An alternative to sampling with temperature called nucleus sampling. This value causes the model to consider the results of tokens with the provided probability mass. As an example, a value of 0.15 will cause only the tokens comprising the top 15% of probability mass to be considered. It is not recommended to modify temperature and top_p for the same completions request as the interaction of these two settings is difficult to predict. Supported range is [0, 1]. Default value is None.

max_tokens
int

The maximum number of tokens to generate. Default value is None.

response_format

The format that the model must output. Use this to enable JSON mode instead of the default text mode. Note that to enable JSON mode, some AI models may also require you to instruct the model to produce JSON via a system or user message. Default value is None.

stop

A collection of textual sequences that will end completions generation. Default value is None.

tools

The available tool definitions that the chat completions request can use, including caller-defined functions. Default value is None.

tool_choice

If specified, the model will configure which of the provided tools it can use for the chat completions response. Is either a Union[str, "_models.ChatCompletionsToolChoicePreset"] type or a ChatCompletionsNamedToolChoice type. Default value is None.

seed
int

If specified, the system will make a best effort to sample deterministically such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed. Default value is None.

model
str

ID of the specific AI model to use, if more than one model is available on the endpoint. Default value is None.

model_extras

Additional, model-specific parameters that are not in the standard request payload. They will be added as-is to the root of the JSON in the request body. How the service handles these extra parameters depends on the value of the extra-parameters request header. Default value is None.

Returns

Type Description

ChatCompletions for non-streaming, or Iterable[StreamingChatCompletionsUpdate] for streaming.

Exceptions

Type Description

get_model_info

Returns information about the AI model. The method makes a REST API call to the /info route on the given endpoint. This method will only work when using Serverless API or Managed Compute endpoint. It will not work for GitHub Models endpoint or Azure OpenAI endpoint.

get_model_info(**kwargs: Any) -> ModelInfo

Returns

Type Description

ModelInfo. The ModelInfo is compatible with MutableMapping

Exceptions

Type Description

send_request

Runs the network request through the client's chained policies.


>>> from azure.core.rest import HttpRequest
>>> request = HttpRequest("GET", "https://www.example.org/")
<HttpRequest [GET], url: 'https://www.example.org/'>
>>> response = client.send_request(request)
<HttpResponse: 200 OK>

For more information on this code flow, see https://aka.ms/azsdk/dpcodegen/python/send_request

send_request(request: HttpRequest, *, stream: bool = False, **kwargs: Any) -> HttpResponse

Parameters

Name Description
request
Required

The network request you want to make. Required.

Keyword-Only Parameters

Name Description
stream

Whether the response payload will be streamed. Defaults to False.

Returns

Type Description

The response of your network call. Does not do error handling on your response.