With chat completion, you can simulate a back-and-forth conversation with an AI agent. This is of course useful for creating chat bots, but it can also be used for creating autonomous agents that can complete business processes, generate code, and more. As the primary model type provided by OpenAI, Google, Mistral, Facebook, and others, chat completion is the most common AI service that you will add to your Semantic Kernel project.
When picking out a chat completion model, you will need to consider the following:
What modalities does the model support (e.g., text, image, audio, etc.)?
Does it support function calling?
How fast does it receive and generate tokens?
How much does each token cost?
Important
Of all the above questions, the most important is whether the model supports function calling. If it does not, you will not be able to use the model to call your existing code. Most of the latest models from OpenAI, Google, Mistral, and Amazon all support function calling. Support from small language models, however, is still limited.
Setting up your local environment
Some of the AI Services can be hosted locally and may require some setup. Below are instructions for those that support this.
After the container has started, launch a Terminal window for the docker container, e.g. if using
docker desktop, choose Open in Terminal from actions.
From this terminal download the required models, e.g. here we are downloading the phi3 model.
ollama pull phi3
No local setup.
No local setup.
Clone the repository containing the ONNX model you would like to use.
Before adding chat completion to your kernel, you will need to install the necessary packages. Below are the packages you will need to install for each AI service provider.
Now that you've installed the necessary packages, you can create chat completion services. Below are the several ways you can create chat completion services using Semantic Kernel.
Adding directly to the kernel
To add a chat completion service, you can use the following code to add it to the kernel's inner service provider.
using Microsoft.SemanticKernel;
IKernelBuilder kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddAzureOpenAIChatCompletion(
deploymentName: "NAME_OF_YOUR_DEPLOYMENT",
apiKey: "YOUR_API_KEY",
endpoint: "YOUR_AZURE_ENDPOINT",
modelId: "gpt-4", // Optional name of the underlying model if the deployment name doesn't match the model name
serviceId: "YOUR_SERVICE_ID", // Optional; for targeting specific services within Semantic Kernel
httpClient: new HttpClient() // Optional; if not provided, the HttpClient from the kernel will be used
);
Kernel kernel = kernelBuilder.Build();
using Microsoft.SemanticKernel;
IKernelBuilder kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddOpenAIChatCompletion(
modelId: "gpt-4",
apiKey: "YOUR_API_KEY",
orgId: "YOUR_ORG_ID", // Optional
serviceId: "YOUR_SERVICE_ID", // Optional; for targeting specific services within Semantic Kernel
httpClient: new HttpClient() // Optional; if not provided, the HttpClient from the kernel will be used
);
Kernel kernel = kernelBuilder.Build();
Important
The Mistral chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel;
#pragma warning disable SKEXP0070
IKernelBuilder kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddMistralChatCompletion(
modelId: "NAME_OF_MODEL",
apiKey: "API_KEY",
endpoint: new Uri("YOUR_ENDPOINT"), // Optional
serviceId: "SERVICE_ID", // Optional; for targeting specific services within Semantic Kernel
httpClient: new HttpClient() // Optional; for customizing HTTP client
);
Kernel kernel = kernelBuilder.Build();
Important
The Google chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.Google;
#pragma warning disable SKEXP0070
IKernelBuilder kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddGoogleAIGeminiChatCompletion(
modelId: "NAME_OF_MODEL",
apiKey: "API_KEY",
apiVersion: GoogleAIVersion.V1, // Optional
serviceId: "SERVICE_ID", // Optional; for targeting specific services within Semantic Kernel
httpClient: new HttpClient() // Optional; for customizing HTTP client
);
Kernel kernel = kernelBuilder.Build();
Important
The Hugging Face chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel;
#pragma warning disable SKEXP0070
IKernelBuilder kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddHuggingFaceChatCompletion(
model: "NAME_OF_MODEL",
apiKey: "API_KEY",
endpoint: new Uri("YOUR_ENDPOINT"), // Optional
serviceId: "SERVICE_ID", // Optional; for targeting specific services within Semantic Kernel
httpClient: new HttpClient() // Optional; for customizing HTTP client
);
Kernel kernel = kernelBuilder.Build();
Important
The Azure AI Inference chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel;
#pragma warning disable SKEXP0070
IKernelBuilder kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddAzureAIInferenceChatCompletion(
modelId: "NAME_OF_MODEL",
apiKey: "API_KEY",
endpoint: new Uri("YOUR_ENDPOINT"), // Optional
serviceId: "SERVICE_ID", // Optional; for targeting specific services within Semantic Kernel
httpClient: new HttpClient() // Optional; for customizing HTTP client
);
Kernel kernel = kernelBuilder.Build();
Important
The Ollama chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel;
#pragma warning disable SKEXP0070
IKernelBuilder kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddOllamaChatCompletion(
modelId: "NAME_OF_MODEL", // E.g. "phi3" if phi3 was downloaded as described above.
endpoint: new Uri("YOUR_ENDPOINT"), // E.g. "http://localhost:11434" if Ollama has been started in docker as described above.
serviceId: "SERVICE_ID" // Optional; for targeting specific services within Semantic Kernel
);
Kernel kernel = kernelBuilder.Build();
Important
The Bedrock chat completion connector which is required for Anthropic is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel;
#pragma warning disable SKEXP0070
IKernelBuilder kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddBedrockChatCompletionService(
modelId: "NAME_OF_MODEL",
bedrockRuntime: amazonBedrockRuntime, // Optional; An instance of IAmazonBedrockRuntime, used to communicate with Azure Bedrock.
serviceId: "SERVICE_ID" // Optional; for targeting specific services within Semantic Kernel
);
Kernel kernel = kernelBuilder.Build();
Important
The Bedrock chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel;
#pragma warning disable SKEXP0070
IKernelBuilder kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddBedrockChatCompletionService(
modelId: "NAME_OF_MODEL",
bedrockRuntime: amazonBedrockRuntime, // Optional; An instance of IAmazonBedrockRuntime, used to communicate with Azure Bedrock.
serviceId: "SERVICE_ID" // Optional; for targeting specific services within Semantic Kernel
);
Kernel kernel = kernelBuilder.Build();
Important
The ONNX chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel;
#pragma warning disable SKEXP0070
IKernelBuilder kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddOnnxRuntimeGenAIChatCompletion(
modelId: "NAME_OF_MODEL", // E.g. phi-3
modelPath: "PATH_ON_DISK", // Path to the model on disk e.g. C:\Repos\huggingface\microsoft\Phi-3-mini-4k-instruct-onnx\cpu_and_mobile\cpu-int4-rtn-block-32
serviceId: "SERVICE_ID", // Optional; for targeting specific services within Semantic Kernel
jsonSerializerOptions: customJsonSerializerOptions // Optional; for providing custom serialization settings for e.g. function argument / result serialization and parsing.
);
Kernel kernel = kernelBuilder.Build();
For other AI service providers that support the OpenAI chat completion API (e.g., LLM Studio), you can use the following code to reuse the existing OpenAI chat completion connector.
Important
Using custom endpoints with the OpenAI connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0010.
using Microsoft.SemanticKernel;
#pragma warning disable SKEXP0010
IKernelBuilder kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddOpenAIChatCompletion(
modelId: "NAME_OF_MODEL",
apiKey: "API_KEY",
endpoint: new Uri("YOUR_ENDPOINT"), // Used to point to your service
serviceId: "SERVICE_ID", // Optional; for targeting specific services within Semantic Kernel
httpClient: new HttpClient() // Optional; for customizing HTTP client
);
Kernel kernel = kernelBuilder.Build();
Using dependency injection
If you're using dependency injection, you'll likely want to add your AI services directly to the service provider. This is helpful if you want to create singletons of your AI services and reuse them in transient kernels.
using Microsoft.SemanticKernel;
var builder = Host.CreateApplicationBuilder(args);
builder.Services.AddAzureOpenAIChatCompletion(
deploymentName: "NAME_OF_YOUR_DEPLOYMENT",
apiKey: "YOUR_API_KEY",
endpoint: "YOUR_AZURE_ENDPOINT",
modelId: "gpt-4", // Optional name of the underlying model if the deployment name doesn't match the model name
serviceId: "YOUR_SERVICE_ID" // Optional; for targeting specific services within Semantic Kernel
);
builder.Services.AddTransient((serviceProvider)=> {
return new Kernel(serviceProvider);
});
using Microsoft.SemanticKernel;
var builder = Host.CreateApplicationBuilder(args);
builder.Services.AddOpenAIChatCompletion(
modelId: "gpt-4",
apiKey: "YOUR_API_KEY",
orgId: "YOUR_ORG_ID", // Optional; for OpenAI deployment
serviceId: "YOUR_SERVICE_ID" // Optional; for targeting specific services within Semantic Kernel
);
builder.Services.AddTransient((serviceProvider)=> {
return new Kernel(serviceProvider);
});
Important
The Mistral chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel;
var builder = Host.CreateApplicationBuilder(args);
#pragma warning disable SKEXP0070
builder.Services.AddMistralChatCompletion(
modelId: "NAME_OF_MODEL",
apiKey: "API_KEY",
endpoint: new Uri("YOUR_ENDPOINT"), // Optional
serviceId: "SERVICE_ID" // Optional; for targeting specific services within Semantic Kernel
);
builder.Services.AddTransient((serviceProvider)=> {
return new Kernel(serviceProvider);
});
Important
The Google chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.Google;
var builder = Host.CreateApplicationBuilder(args);
#pragma warning disable SKEXP0070
builder.Services.AddGoogleAIGeminiChatCompletion(
modelId: "NAME_OF_MODEL",
apiKey: "API_KEY",
apiVersion: GoogleAIVersion.V1, // Optional
serviceId: "SERVICE_ID" // Optional; for targeting specific services within Semantic Kernel
);
builder.Services.AddTransient((serviceProvider)=> {
return new Kernel(serviceProvider);
});
Important
The Hugging Face chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel;
var builder = Host.CreateApplicationBuilder(args);
#pragma warning disable SKEXP0070
builder.Services.AddHuggingFaceChatCompletion(
model: "NAME_OF_MODEL",
apiKey: "API_KEY",
endpoint: new Uri("YOUR_ENDPOINT"), // Optional
serviceId: "SERVICE_ID" // Optional; for targeting specific services within Semantic Kernel
);
builder.Services.AddTransient((serviceProvider)=> {
return new Kernel(serviceProvider);
});
Important
The Azure AI Inference chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel;
var builder = Host.CreateApplicationBuilder(args);
#pragma warning disable SKEXP0070
builder.Services.AddAzureAIInferenceChatCompletion(
modelId: "NAME_OF_MODEL",
apiKey: "API_KEY",
endpoint: new Uri("YOUR_ENDPOINT"), // Optional
serviceId: "SERVICE_ID" // Optional; for targeting specific services within Semantic Kernel
);
builder.Services.AddTransient((serviceProvider)=> {
return new Kernel(serviceProvider);
});
Important
The Ollama chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel;
var builder = Host.CreateApplicationBuilder(args);
#pragma warning disable SKEXP0070
builder.Services.AddOllamaChatCompletion(
modelId: "NAME_OF_MODEL", // E.g. "phi3" if phi3 was downloaded as described above.
endpoint: new Uri("YOUR_ENDPOINT"), // E.g. "http://localhost:11434" if Ollama has been started in docker as described above.
serviceId: "SERVICE_ID" // Optional; for targeting specific services within Semantic Kernel
);
builder.Services.AddTransient((serviceProvider)=> {
return new Kernel(serviceProvider);
});
Important
The Bedrock chat completion connector which is required for Anthropic is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel;
var builder = Host.CreateApplicationBuilder(args);
#pragma warning disable SKEXP0070
builder.Services.AddBedrockChatCompletionService(
modelId: "NAME_OF_MODEL",
bedrockRuntime: amazonBedrockRuntime, // Optional; An instance of IAmazonBedrockRuntime, used to communicate with Azure Bedrock.
serviceId: "SERVICE_ID" // Optional; for targeting specific services within Semantic Kernel
);
builder.Services.AddTransient((serviceProvider)=> {
return new Kernel(serviceProvider);
});
Important
The Bedrock chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel;
var builder = Host.CreateApplicationBuilder(args);
#pragma warning disable SKEXP0070
builder.Services.AddBedrockChatCompletionService(
modelId: "NAME_OF_MODEL",
bedrockRuntime: amazonBedrockRuntime, // Optional; An instance of IAmazonBedrockRuntime, used to communicate with Azure Bedrock.
serviceId: "SERVICE_ID" // Optional; for targeting specific services within Semantic Kernel
);
builder.Services.AddTransient((serviceProvider)=> {
return new Kernel(serviceProvider);
});
Important
The ONNX chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel;
var builder = Host.CreateApplicationBuilder(args);
#pragma warning disable SKEXP0070
builder.Services.AddOnnxRuntimeGenAIChatCompletion(
modelId: "NAME_OF_MODEL", // E.g. phi-3
modelPath: "PATH_ON_DISK", // Path to the model on disk e.g. C:\Repos\huggingface\microsoft\Phi-3-mini-4k-instruct-onnx\cpu_and_mobile\cpu-int4-rtn-block-32
serviceId: "SERVICE_ID", // Optional; for targeting specific services within Semantic Kernel
jsonSerializerOptions: customJsonSerializerOptions // Optional; for providing custom serialization settings for e.g. function argument / result serialization and parsing.
);
builder.Services.AddTransient((serviceProvider)=> {
return new Kernel(serviceProvider);
});
For other AI service providers that support the OpenAI chat completion API (e.g., LLM Studio), you can use the following code to reuse the existing OpenAI chat completion connector.
Important
Using custom endpoints with the OpenAI connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0010.
using Microsoft.SemanticKernel;
var builder = Host.CreateApplicationBuilder(args);
#pragma warning disable SKEXP0010
builder.Services.AddOpenAIChatCompletion(
modelId: "NAME_OF_MODEL",
apiKey: "API_KEY",
endpoint: new Uri("YOUR_ENDPOINT"), // Used to point to your service
serviceId: "SERVICE_ID", // Optional; for targeting specific services within Semantic Kernel
httpClient: new HttpClient() // Optional; for customizing HTTP client
);
builder.Services.AddTransient((serviceProvider)=> {
return new Kernel(serviceProvider);
});
Creating standalone instances
Lastly, you can create instances of the service directly so that you can either add them to a kernel later or use them directly in your code without ever injecting them into the kernel or in a service provider.
using Microsoft.SemanticKernel.Connectors.AzureOpenAI;
AzureOpenAIChatCompletionService chatCompletionService = new (
deploymentName: "NAME_OF_YOUR_DEPLOYMENT",
apiKey: "YOUR_API_KEY",
endpoint: "YOUR_AZURE_ENDPOINT",
modelId: "gpt-4", // Optional name of the underlying model if the deployment name doesn't match the model name
httpClient: new HttpClient() // Optional; if not provided, the HttpClient from the kernel will be used
);
using Microsoft.SemanticKernel.Connectors.OpenAI;
OpenAIChatCompletionService chatCompletionService = new (
modelId: "gpt-4",
apiKey: "YOUR_API_KEY",
organization: "YOUR_ORG_ID", // Optional
httpClient: new HttpClient() // Optional; if not provided, the HttpClient from the kernel will be used
);
Important
The Mistral chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel.Connectors.MistralAI;
#pragma warning disable SKEXP0070
MistralAIChatCompletionService chatCompletionService = new (
modelId: "NAME_OF_MODEL",
apiKey: "API_KEY",
endpoint: new Uri("YOUR_ENDPOINT"), // Optional
httpClient: new HttpClient() // Optional; for customizing HTTP client
);
Important
The Google chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel.Connectors.Google;
#pragma warning disable SKEXP0070
GoogleAIGeminiChatCompletionService chatCompletionService = new (
modelId: "NAME_OF_MODEL",
apiKey: "API_KEY",
apiVersion: GoogleAIVersion.V1, // Optional
httpClient: new HttpClient() // Optional; for customizing HTTP client
);
Important
The Hugging Face chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel.Connectors.HuggingFace;
#pragma warning disable SKEXP0070
HuggingFaceChatCompletionService chatCompletionService = new (
model: "NAME_OF_MODEL",
apiKey: "API_KEY",
endpoint: new Uri("YOUR_ENDPOINT") // Optional
);
Important
The Azure AI Inference chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel.Connectors.AzureAIInference;
#pragma warning disable SKEXP0070
AzureAIInferenceChatCompletionService chatCompletionService = new (
modelId: "YOUR_MODEL_ID",
apiKey: "YOUR_API_KEY",
endpoint: new Uri("YOUR_ENDPOINT"), // Used to point to your service
httpClient: new HttpClient() // Optional; if not provided, the HttpClient from the kernel will be used
);
Important
The Ollama chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel.ChatCompletion;
using OllamaSharp;
#pragma warning disable SKEXP0070
using var ollamaClient = new OllamaApiClient(
uriString: "YOUR_ENDPOINT" // E.g. "http://localhost:11434" if Ollama has been started in docker as described above.
defaultModel: "NAME_OF_MODEL" // E.g. "phi3" if phi3 was downloaded as described above.
);
IChatCompletionService chatCompletionService = ollamaClient.AsChatCompletionService();
Important
The Bedrock chat completion connector which is required for Anthropic is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel.Connectors.Amazon;
#pragma warning disable SKEXP0070
BedrockChatCompletionService chatCompletionService = new BedrockChatCompletionService(
modelId: "NAME_OF_MODEL",
bedrockRuntime: amazonBedrockRuntime // Optional; An instance of IAmazonBedrockRuntime, used to communicate with Azure Bedrock.
);
Important
The Bedrock chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel.Connectors.Amazon;
#pragma warning disable SKEXP0070
BedrockChatCompletionService chatCompletionService = new BedrockChatCompletionService(
modelId: "NAME_OF_MODEL",
bedrockRuntime: amazonBedrockRuntime // Optional; An instance of IAmazonBedrockRuntime, used to communicate with Azure Bedrock.
);
Important
The ONNX chat completion connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0070.
using Microsoft.SemanticKernel.Connectors.Onnx;
#pragma warning disable SKEXP0070
OnnxRuntimeGenAIChatCompletionService chatCompletionService = new OnnxRuntimeGenAIChatCompletionService(
modelId: "NAME_OF_MODEL", // E.g. phi-3
modelPath: "PATH_ON_DISK", // Path to the model on disk e.g. C:\Repos\huggingface\microsoft\Phi-3-mini-4k-instruct-onnx\cpu_and_mobile\cpu-int4-rtn-block-32
jsonSerializerOptions: customJsonSerializerOptions // Optional; for providing custom serialization settings for e.g. function argument / result serialization and parsing.
);
For other AI service providers that support the OpenAI chat completion API (e.g., LLM Studio), you can use the following code to reuse the existing OpenAI chat completion connector.
Important
Using custom endpoints with the OpenAI connector is currently experimental. To use it, you will need to add #pragma warning disable SKEXP0010.
using Microsoft.SemanticKernel.Connectors.OpenAI;
#pragma warning disable SKEXP0010
OpenAIChatCompletionService chatCompletionService = new (
modelId: "gpt-4",
apiKey: "YOUR_API_KEY",
organization: "YOUR_ORG_ID", // Optional
endpoint: new Uri("YOUR_ENDPOINT"), // Used to point to your service
httpClient: new HttpClient() // Optional; if not provided, the HttpClient from the kernel will be used
);
To create a chat completion service, you need to install and import the necessary modules and create an instance of the service. Below are the steps to install and create a chat completion service for each AI service provider.
The Semantic Kernel package comes with all the necessary packages to use Azure OpenAI. There are no additional packages required to use Azure OpenAI.
The Semantic Kernel package comes with all the necessary packages to use OpenAI. There are no additional packages required to use OpenAI.
pip install semantic-kernel[azure]
pip install semantic-kernel[anthropic]
pip install semantic-kernel[aws]
pip install semantic-kernel[google]
pip install semantic-kernel[google]
pip install semantic-kernel[mistralai]
pip install semantic-kernel[ollama]
pip install semantic-kernel[onnx]
Creating a chat completion service
Tip
There are three methods to supply the required information to AI services. You may either provide the information directly through the constructor, set the necessary environment variables, or create a .env file within your project directory containing the environment variables. You can visit this page to find all the required environment variables for each AI service provider: https://github.com/microsoft/semantic-kernel/blob/main/python/samples/concepts/setup/ALL_SETTINGS.md
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
chat_completion_service = AzureChatCompletion(
deployment_name="my-deployment",
api_key="my-api-key",
endpoint="my-api-endpoint", # Used to point to your service
service_id="my-service-id", # Optional; for targeting specific services within Semantic Kernel
)
# You can do the following if you have set the necessary environment variables or created a .env file
chat_completion_service = AzureChatCompletion(service_id="my-service-id")
Note
The AzureChatCompletion service also supports Microsoft Entra authentication. If you don't provide an API key, the service will attempt to authenticate using the Entra token.
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
chat_completion_service = OpenAIChatCompletion(
ai_model_id="my-deployment",
api_key="my-api-key",
service_id="my-service-id", # Optional; for targeting specific services within Semantic Kernel
)
# You can do the following if you have set the necessary environment variables or created a .env file
chat_completion_service = OpenAIChatCompletion(service_id="my-service-id")
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatCompletion
chat_completion_service = AzureAIInferenceChatCompletion(
ai_model_id="my-deployment",
api_key="my-api-key",
endpoint="my-api-endpoint", # Used to point to your service
service_id="my-service-id", # Optional; for targeting specific services within Semantic Kernel
)
# You can do the following if you have set the necessary environment variables or created a .env file
chat_completion_service = AzureAIInferenceChatCompletion(ai_model_id="my-deployment", service_id="my-service-id")
# You can also use an Azure OpenAI deployment with the Azure AI Inference service
from azure.ai.inference.aio import ChatCompletionsClient
from azure.identity.aio import DefaultAzureCredential
chat_completion_service = AzureAIInferenceChatCompletion(
ai_model_id="my-deployment",
client=ChatCompletionsClient(
endpoint=f"{str(endpoint).strip('/')}/openai/deployments/{deployment_name}",
credential=DefaultAzureCredential(),
credential_scopes=["https://cognitiveservices.azure.com/.default"],
),
)
Note
The AzureAIInferenceChatCompletion service also supports Microsoft Entra authentication. If you don't provide an API key, the service will attempt to authenticate using the Entra token.
from semantic_kernel.connectors.ai.anthropic import AnthropicChatCompletion
chat_completion_service = AnthropicChatCompletion(
chat_model_id="model-id",
api_key="my-api-key",
service_id="my-service-id", # Optional; for targeting specific services within Semantic Kernel
)
from semantic_kernel.connectors.ai.bedrock import BedrockChatCompletion
chat_completion_service = BedrockChatCompletion(
model_id="model-id",
service_id="my-service-id", # Optional; for targeting specific services within Semantic Kernel
)
Note
Amazon Bedrock does not accept an API key. Follow this guide to configure your environment.
from semantic_kernel.connectors.ai.google.google_ai import GoogleAIChatCompletion
chat_completion_service = GoogleAIChatCompletion(
gemini_model_id="model-id",
api_key="my-api-key",
service_id="my-service-id", # Optional; for targeting specific services within Semantic Kernel
)
Tip
Users can access Google's Gemini models via Google AI Studio or Google Vertex platform. Follow this guide to configure your environment.
from semantic_kernel.connectors.ai.google.vertex_ai import VertexAIChatCompletion
chat_completion_service = VertexAIChatCompletion(
project_id="my-project-id",
gemini_model_id="model-id",
service_id="my-service-id", # Optional; for targeting specific services within Semantic Kernel
)
Tip
Users can access Google's Gemini models via Google AI Studio or Google Vertex platform. Follow this guide to configure your environment.
from semantic_kernel.connectors.ai.mistral_ai import MistralAIChatCompletion
chat_completion_service = MistralAIChatCompletion(
ai_model_id="model-id",
api_key="my-api-key",
service_id="my-service-id", # Optional; for targeting specific services within Semantic Kernel
)
from semantic_kernel.connectors.ai.ollama import OllamaChatCompletion
chat_completion_service = OllamaChatCompletion(
ai_model_id="model-id",
service_id="my-service-id", # Optional; for targeting specific services within Semantic Kernel
)
Tip
Learn more about Ollama and download the necessary software from here.
from semantic_kernel.connectors.ai.onnx import OnnxGenAIChatCompletion
chat_completion_service = OnnxGenAIChatCompletion(
template="phi3v",
ai_model_path="model-path",
service_id="my-service-id", # Optional; for targeting specific services within Semantic Kernel
)
You can start using the completion service right away or add the chat completion service to a kernel. You can use the following code to add a service to the kernel.
from semantic_kernel import Kernel
# Initialize the kernel
kernel = Kernel()
# Add the chat completion service created above to the kernel
kernel.add_service(chat_completion_service)
You can create instances of the chat completion service directly and either add them to
a kernel or use them directly in your code without injecting them into the kernel. The
following code shows how to create a a chat completion service and add it to the kernel.
import com.azure.ai.openai.OpenAIAsyncClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.microsoft.semantickernel.Kernel;
import com.microsoft.semantickernel.services.chatcompletion.ChatCompletionService;
// Create the client
OpenAIAsyncClient client = new OpenAIClientBuilder()
.credential(azureOpenAIClientCredentials)
.endpoint(azureOpenAIClientEndpoint)
.buildAsyncClient();
// Create the chat completion service
ChatCompletionService openAIChatCompletion = OpenAIChatCompletion.builder()
.withOpenAIAsyncClient(client)
.withModelId(modelId)
.build();
// Initialize the kernel
Kernel kernel = Kernel.builder()
.withAIService(ChatCompletionService.class, openAIChatCompletion)
.build();
import com.azure.ai.openai.OpenAIAsyncClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.microsoft.semantickernel.Kernel;
import com.microsoft.semantickernel.services.chatcompletion.ChatCompletionService;
// Create the client
OpenAIAsyncClient client = new OpenAIClientBuilder()
.credential(openAIClientCredentials)
.buildAsyncClient();
// Create the chat completion service
ChatCompletionService openAIChatCompletion = OpenAIChatCompletion.builder()
.withOpenAIAsyncClient(client)
.withModelId(modelId)
.build();
// Initialize the kernel
Kernel kernel = Kernel.builder()
.withAIService(ChatCompletionService.class, openAIChatCompletion)
.build();
Retrieving chat completion services
Once you've added chat completion services to your kernel, you can retrieve them using the get service method. Below is an example of how you can retrieve a chat completion service from the kernel.
var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
from semantic_kernel.connectors.ai.chat_completion_client_base import ChatCompletionClientBase
# Retrieve the chat completion service by type
chat_completion_service = kernel.get_service(type=ChatCompletionClientBase)
# Retrieve the chat completion service by id
chat_completion_service = kernel.get_service(service_id="my-service-id")
# Retrieve the default inference settings
execution_settings = kernel.get_prompt_execution_settings_from_service_id("my-service-id")
Adding the chat completion service to the kernel is not required if you don't need to use other services in the kernel. You can use the chat completion service directly in your code.
Using chat completion services
Now that you have a chat completion service, you can use it to generate responses from an AI agent. There are two main ways to use a chat completion service:
Non-streaming: You wait for the service to generate an entire response before returning it to the user.
Streaming: Individual chunks of the response are generated and returned to the user as they are created.
Before getting started, you will need to manually create an execution settings instance to use the chat completion service if you did not register the service with the kernel.
from semantic_kernel.connectors.ai.open_ai import OpenAIChatPromptExecutionSettings
execution_settings = OpenAIChatPromptExecutionSettings()
from semantic_kernel.connectors.ai.open_ai import OpenAIChatPromptExecutionSettings
execution_settings = OpenAIChatPromptExecutionSettings()
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatPromptExecutionSettings
execution_settings = AzureAIInferenceChatPromptExecutionSettings()
from semantic_kernel.connectors.ai.anthropic import AnthropicChatPromptExecutionSettings
execution_settings = AnthropicChatPromptExecutionSettings()
from semantic_kernel.connectors.ai.bedrock import BedrockChatPromptExecutionSettings
execution_settings = BedrockChatPromptExecutionSettings()
from semantic_kernel.connectors.ai.google.google_ai import GoogleAIChatPromptExecutionSettings
execution_settings = GoogleAIChatPromptExecutionSettings()
from semantic_kernel.connectors.ai.google.vertex_ai import VertexAIChatPromptExecutionSettings
execution_settings = VertexAIChatPromptExecutionSettings()
from semantic_kernel.connectors.ai.mistral_ai import MistralAIChatPromptExecutionSettings
execution_settings = MistralAIChatPromptExecutionSettings()
from semantic_kernel.connectors.ai.ollama import OllamaChatPromptExecutionSettings
execution_settings = OllamaChatPromptExecutionSettings()
from semantic_kernel.connectors.ai.onnx import OnnxGenAIPromptExecutionSettings
execution_settings = OnnxGenAIPromptExecutionSettings()
Tip
To see what you can configure in the execution settings, you can check the class definition in the source code or check out the API documentation.
Below are the two ways you can use a chat completion service to generate responses.
Non-streaming chat completion
To use non-streaming chat completion, you can use the following code to generate a response from the AI agent.
ChatHistory history = [];
history.AddUserMessage("Hello, how are you?");
var response = await chatCompletionService.GetChatMessageContentAsync(
history,
kernel: kernel
);
chat_history = ChatHistory()
chat_history.add_user_message("Hello, how are you?")
response = await chat_completion.get_chat_message_content(
chat_history=history,
settings=execution_settings,
)
ChatHistory history = new ChatHistory();
history.addUserMessage("Hello, how are you?");
InvocationContext optionalInvocationContext = null;
List<ChatMessageContent<?>> response = chatCompletionService.getChatMessageContentsAsync(
history,
kernel,
optionalInvocationContext
);
Streaming chat completion
To use streaming chat completion, you can use the following code to generate a response from the AI agent.
ChatHistory history = [];
history.AddUserMessage("Hello, how are you?");
var response = chatCompletionService.GetStreamingChatMessageContentsAsync(
chatHistory: history,
kernel: kernel
);
await foreach (var chunk in response)
{
Console.Write(chunk);
}
chat_history = ChatHistory()
chat_history.add_user_message("Hello, how are you?")
response = chat_completion.get_streaming_chat_message_content(
chat_history=history,
settings=execution_settings,
)
async for chunk in response:
print(chunk, end="")
Note
Semantic Kernel for Java does not support the streaming response model.
Next steps
Now that you've added chat completion services to your Semantic Kernel project, you can start creating conversations with your AI agent. To learn more about using a chat completion service, check out the following articles: