Use custom and local AI models with the Semantic Kernel SDK

Article
05/24/2024

This article shows how to integrate custom and local models into the Semantic Kernel SDK and use them for text generation and chat completions.

You can adapt the steps to work with any model you can access, regardless of where it is hosted or how you reach it. For example, you can integrate the codellama model with the Semantic Kernel SDK to enable code generation and discussion.

Custom and local models often provide access through REST APIs; for example, see Ollama OpenAI compatibility. Before you can integrate your model, it must be hosted and accessible to your .NET application via HTTPS.

Prerequisites

An Azure account with an active subscription. Create an account for free.
.NET SDK
The Microsoft.SemanticKernel NuGet package
A custom or local model, deployed and accessible to your .NET application

Implement text generation using a local model

The following section shows how to integrate your model with the Semantic Kernel SDK and then use it to generate text completions.

Create a service class that implements the ITextGenerationService interface. For example:

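using System.Collections.Immutable;
using System.Net.Http.Json;
using System.Runtime.CompilerServices;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.TextGeneration;

class MyTextGenerationService : ITextGenerationService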
{
    private IReadOnlyDictionary<string, object?>? _attributes;
    public IReadOnlyDictionary<string, object?> Attributes =>
        _attributes ??= new Dictionary<string, object?>();

    public string ModelUrl { get; init; } = "<default url to your model's text generation API>";
    public required string ModelApiKey { get; init; }

    public async IAsyncEnumerable<StreamingTextContent> GetStreamingTextContentsAsync(
        string prompt,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default
    )
    {
        // Build your model's request object, specify that streaming is requested
        MyModelRequest request = MyModelRequest.FromPrompt(prompt, executionSettings);
        request.Stream = true;

        // Send the completion request via HTTP
        using var httpClient = new HttpClient();

        // Send a POST to your model with the serialized request in the body
        using HttpResponseMessage httpResponse = await httpClient.PostAsJsonAsync(
            ModelUrl,
            request,
            cancellationToken
        );

        // Verify the request was completed successfully
        httpResponse.EnsureSuccessStatusCode();

        // Read your model's response as a stream
        using StreamReader reader =
            new(await httpResponse.Content.ReadAsStreamAsync(cancellationToken));

        // Iteratively read a chunk of the response until the end of the stream
        // It is more efficient to use a buffer that is the same size as the internal buffer of the stream
        // If the size of the internal buffer was unspecified when the stream was constructed, its default size is 4 kilobytes (2048 UTF-16 characters)
        char[] buffer = new char[2048];
        int readCount;

        // Read asynchronously so the calling thread is not blocked while waiting for data
        // ReadAsync returns 0 at the end of the stream and throws if the cancellation token is signaled
        while ((readCount = await reader.ReadAsync(buffer.AsMemory(), cancellationToken)) > 0)
        {
            // Convert the character buffer to a string, only include as many characters as were just read
            string chunk = new(buffer, 0, readCount);

            yield return new StreamingTextContent(chunk);
        }
    }

    public async Task<IReadOnlyList<TextContent>> GetTextContentsAsync(
        string prompt,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        CancellationToken cancellationToken = default
    )
    {
        // Build your model's request object
        MyModelRequest request = MyModelRequest.FromPrompt(prompt, executionSettings);

        // Send the completion request via HTTP
        using var httpClient = new HttpClient();

        // Send a POST to your model with the serialized request in the body
        using HttpResponseMessage httpResponse = await httpClient.PostAsJsonAsync(
            ModelUrl,
            request,
            cancellationToken
        );

        // Verify the request was completed successfully
        httpResponse.EnsureSuccessStatusCode();

        // Deserialize the response body to your model's response object
        // Handle when the deserialization fails and returns null
        MyModelResponse response =
            await httpResponse.Content.ReadFromJsonAsync<MyModelResponse>(cancellationToken)
            ?? throw new Exception("Failed to deserialize response from model");

        // Convert your model's response into a list of TextContent
        return response
            .Completions.Select<string, TextContent>(completion => new(completion))
            .ToImmutableList();
    }
}
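
The service class above relies on two types, MyModelRequest and MyModelResponse, that the article does not define because they depend on your model's API. The following is a minimal sketch of what they might look like. Every JSON property name here ("model", "prompt", "messages", "stream", "completions") and the MyModelMessage record are assumptions made for illustration; rename them to match the request and response format your model actually uses.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Serialization;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical request shape; adjust the JSON property names to your model's API
class MyModelRequest
{
    [JsonPropertyName("model")]
    public string? Model { get; set; }

    // Used for plain text generation requests
    [JsonPropertyName("prompt")]
    public string? Prompt { get; set; }

    // Used for chat completion requests
    [JsonPropertyName("messages")]
    public List<MyModelMessage>? Messages { get; set; }

    [JsonPropertyName("stream")]
    public bool Stream { get; set; }

    // Build a text generation request from a prompt and optional settings
    public static MyModelRequest FromPrompt(
        string prompt,
        PromptExecutionSettings? executionSettings = null
    ) => new() { Model = executionSettings?.ModelId, Prompt = prompt };

    // Build a chat request by mapping each chat message to a role/content pair
    public static MyModelRequest FromChatHistory(
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null
    ) =>
        new()
        {
            Model = executionSettings?.ModelId,
            Messages = chatHistory
                .Select(m => new MyModelMessage(m.Role.Label, m.Content ?? string.Empty))
                .ToList()
        };
}

// Hypothetical message shape: a role label ("system", "user", "assistant") plus text
record MyModelMessage(
    [property: JsonPropertyName("role")] string Role,
    [property: JsonPropertyName("content")] string Content
);

// Hypothetical response shape: one string per completion returned by the model
class MyModelResponse
{
    [JsonPropertyName("completions")]
    public List<string> Completions { get; set; } = new();
}
```

Keeping the mapping logic in static factory methods means the service classes only deal with HTTP, so a change in the model's wire format touches these types alone.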

Include the new service class when building the Kernel. For example:

IKernelBuilder builder = Kernel.CreateBuilder();

// Add your text generation service as a singleton instance
builder.Services.AddKeyedSingleton<ITextGenerationService>(
    "myTextService1",
    new MyTextGenerationService
    {
        // Specify any properties specific to your service, such as the url or API key
        ModelUrl = "https://localhost:38748",
        ModelApiKey = "myApiKey"
    }
);

// Alternatively, add your text generation service as a factory method
builder.Services.AddKeyedSingleton<ITextGenerationService>(
    "myTextService2",
    (_, _) =>
        new MyTextGenerationService
        {
            // Specify any properties specific to your service, such as the url or API key
            ModelUrl = "https://localhost:38748",
            ModelApiKey = "myApiKey"
        }
);

// Add any other Kernel services or configurations
// ...
Kernel kernel = builder.Build();
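
Because the example registers services under the keys "myTextService1" and "myTextService2", it can be useful to request a specific one rather than whichever the Kernel resolves by default. A small sketch, assuming a Kernel built as above; the TextServiceLookup helper is a hypothetical name introduced only for this illustration:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.TextGeneration;

static class TextServiceLookup
{
    // Resolve the text generation service registered under the given key,
    // for example "myTextService1"; throws if no such service is registered
    public static ITextGenerationService GetByKey(Kernel kernel, string serviceKey) =>
        kernel.GetRequiredService<ITextGenerationService>(serviceKey);
}
```

For example, TextServiceLookup.GetByKey(kernel, "myTextService2") would return the instance created by the factory registration.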

Send a text generation request to your model directly through the Kernel or by using the service class. For example:

var executionSettings = new PromptExecutionSettings
{
    // Add execution settings, such as the ModelID and ExtensionData
    ModelId = "MyModelId",
    ExtensionData = new Dictionary<string, object> { { "MaxTokens", 500 } }
};

// Send a prompt to your model directly through the Kernel
// The Kernel response will be null if the model can't be reached
string prompt = "Please list three services offered by Azure";
string? response = await kernel.InvokePromptAsync<string>(prompt);
Console.WriteLine($"Output: {response}");

// Alternatively, send a prompt to your model through the text generation service
ITextGenerationService textService = kernel.GetRequiredService<ITextGenerationService>();
TextContent responseContents = await textService.GetTextContentAsync(
    prompt,
    executionSettings
);
Console.WriteLine($"Output: {responseContents.Text}");
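
The example above calls only the non-streaming method. The streaming method defined on the service can be consumed with await foreach, as in the following sketch. The TextStreamingExample class and its WriteStreamAsync method are hypothetical names used for illustration; the chunks are written exactly as the service yields them, so with the implementation shown earlier they contain the raw response text from your model.

```csharp
using System;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.TextGeneration;

static class TextStreamingExample
{
    // Write each chunk to the console as it arrives and return the full text
    public static async Task<string> WriteStreamAsync(
        ITextGenerationService textService,
        string prompt,
        PromptExecutionSettings? executionSettings = null,
        CancellationToken cancellationToken = default
    )
    {
        var fullText = new StringBuilder();

        await foreach (
            StreamingTextContent chunk in textService.GetStreamingTextContentsAsync(
                prompt,
                executionSettings,
                kernel: null,
                cancellationToken: cancellationToken
            )
        )
        {
            // Text can be null for chunks that carry no text
            Console.Write(chunk.Text);
            fullText.Append(chunk.Text);
        }

        Console.WriteLine();
        return fullText.ToString();
    }
}
```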

Implement chat completion using a local model

The following section shows how to integrate your model with the Semantic Kernel SDK and then use it for chat completions.

Create a service class that implements the IChatCompletionService interface. For example:

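using System.Collections.Immutable;
using System.Net.Http.Json;
using System.Runtime.CompilerServices;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

class MyChatCompletionService : IChatCompletionService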
{
    private IReadOnlyDictionary<string, object?>? _attributes;
    public IReadOnlyDictionary<string, object?> Attributes =>
        _attributes ??= new Dictionary<string, object?>();

    public string ModelUrl { get; init; } = "<default url to your model's Chat API>";
    public required string ModelApiKey { get; init; }

    public async Task<IReadOnlyList<ChatMessageContent>> GetChatMessageContentsAsync(
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        CancellationToken cancellationToken = default
    )
    {
        // Build your model's request object
        MyModelRequest request = MyModelRequest.FromChatHistory(chatHistory, executionSettings);

        // Send the completion request via HTTP
        using var httpClient = new HttpClient();

        // Send a POST to your model with the serialized request in the body
        using HttpResponseMessage httpResponse = await httpClient.PostAsJsonAsync(
            ModelUrl,
            request,
            cancellationToken
        );

        // Verify the request was completed successfully
        httpResponse.EnsureSuccessStatusCode();

        // Deserialize the response body to your model's response object
        // Handle when the deserialization fails and returns null
        MyModelResponse response =
            await httpResponse.Content.ReadFromJsonAsync<MyModelResponse>(cancellationToken)
            ?? throw new Exception("Failed to deserialize response from model");

        // Convert your model's response into a list of ChatMessageContent
        return response
            .Completions.Select<string, ChatMessageContent>(completion =>
                new(AuthorRole.Assistant, completion)
            )
            .ToImmutableList();
    }

    public async IAsyncEnumerable<StreamingChatMessageContent> GetStreamingChatMessageContentsAsync(
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default
    )
    {
        // Build your model's request object, specify that streaming is requested
        MyModelRequest request = MyModelRequest.FromChatHistory(chatHistory, executionSettings);
        request.Stream = true;

        // Send the completion request via HTTP
        using var httpClient = new HttpClient();

        // Send a POST to your model with the serialized request in the body
        using HttpResponseMessage httpResponse = await httpClient.PostAsJsonAsync(
            ModelUrl,
            request,
            cancellationToken
        );

        // Verify the request was completed successfully
        httpResponse.EnsureSuccessStatusCode();

        // Read your model's response as a stream
        using StreamReader reader =
            new(await httpResponse.Content.ReadAsStreamAsync(cancellationToken));

        // Iteratively read a chunk of the response until the end of the stream
        // It is more efficient to use a buffer that is the same size as the internal buffer of the stream
        // If the size of the internal buffer was unspecified when the stream was constructed, its default size is 4 kilobytes (2048 UTF-16 characters)
        char[] buffer = new char[2048];
        int readCount;

        // Read asynchronously so the calling thread is not blocked while waiting for data
        // ReadAsync returns 0 at the end of the stream and throws if the cancellation token is signaled
        while ((readCount = await reader.ReadAsync(buffer.AsMemory(), cancellationToken)) > 0)
        {
            // Convert the character buffer to a string, only include as many characters as were just read
            string chunk = new(buffer, 0, readCount);

            yield return new StreamingChatMessageContent(AuthorRole.Assistant, chunk);
        }
    }
}
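
Both service classes declare a ModelApiKey property, but the sample requests never send it. In addition, PostAsJsonAsync waits for the whole response body by default, so the streaming methods only start yielding once the model has finished. The sketch below shows one way to address both points with a shared HttpClient. The ModelHttp helper is a hypothetical name, and the Bearer authorization scheme is an assumption; check which header your model's API expects.

```csharp
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;
using System.Threading;
using System.Threading.Tasks;

static class ModelHttp
{
    // Reuse one HttpClient instead of creating a new client for every request
    private static readonly HttpClient s_client = new();

    // POST the request as JSON with the API key attached, returning as soon as
    // the response headers arrive so the body can be read while it streams
    public static async Task<HttpResponseMessage> PostForStreamingAsync<TRequest>(
        string modelUrl,
        string apiKey,
        TRequest request,
        CancellationToken cancellationToken = default
    )
    {
        using var message = new HttpRequestMessage(HttpMethod.Post, modelUrl)
        {
            Content = JsonContent.Create(request)
        };

        // Assumption: the model accepts the key as a Bearer token
        message.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);

        return await s_client.SendAsync(
            message,
            HttpCompletionOption.ResponseHeadersRead,
            cancellationToken
        );
    }
}
```

In the streaming methods, the call to httpClient.PostAsJsonAsync could then be swapped for ModelHttp.PostForStreamingAsync(ModelUrl, ModelApiKey, request, cancellationToken); the caller remains responsible for disposing the returned response.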

Include the new service class when building the Kernel. For example:

IKernelBuilder builder = Kernel.CreateBuilder();

// Add your chat completion service as a singleton instance
builder.Services.AddKeyedSingleton<IChatCompletionService>(
    "myChatService1",
    new MyChatCompletionService
    {
        // Specify any properties specific to your service, such as the url or API key
        ModelUrl = "https://localhost:38748",
        ModelApiKey = "myApiKey"
    }
);

// Alternatively, add your chat completion service as a factory method
builder.Services.AddKeyedSingleton<IChatCompletionService>(
    "myChatService2",
    (_, _) =>
        new MyChatCompletionService
        {
            // Specify any properties specific to your service, such as the url or API key
            ModelUrl = "https://localhost:38748",
            ModelApiKey = "myApiKey"
        }
);

// Add any other Kernel services or configurations
// ...
Kernel kernel = builder.Build();

Send a chat completion request to your model directly through the Kernel or by using the service class. For example:

var executionSettings = new PromptExecutionSettings
{
    // Add execution settings, such as the ModelID and ExtensionData
    ModelId = "MyModelId",
    ExtensionData = new Dictionary<string, object> { { "MaxTokens", 500 } }
};

// Send a string representation of the chat history to your model directly through the Kernel
// This uses a special syntax to denote the role for each message
// For more information on this syntax see:
// https://zcusa.951200.xyz/en-us/semantic-kernel/prompts/your-first-prompt?tabs=Csharp
string prompt = """
    <message role="system">the initial system message for your chat history</message>
    <message role="user">the user's initial message</message>
    """;

string? response = await kernel.InvokePromptAsync<string>(prompt);
Console.WriteLine($"Output: {response}");

// Alternatively, send a prompt to your model through the chat completion service
// First, initialize a chat history with your initial system message
string systemMessage = "<the initial system message for your chat history>";
Console.WriteLine($"System Prompt: {systemMessage}");
var chatHistory = new ChatHistory(systemMessage);

// Add the user's input to your chat history
string userRequest = "<the user's initial message>";
Console.WriteLine($"User: {userRequest}");
chatHistory.AddUserMessage(userRequest);

// Get the model's response and add it to the chat history
IChatCompletionService service = kernel.GetRequiredService<IChatCompletionService>();
ChatMessageContent responseMessage = await service.GetChatMessageContentAsync(
    chatHistory,
    executionSettings
);
Console.WriteLine($"Assistant: {responseMessage.Content}");
chatHistory.Add(responseMessage);

// Continue sending and receiving messages between the user and model
// ...
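
To continue the conversation with streamed replies, the streaming method on the chat service can be consumed in the same way. The following sketch prints each chunk as it arrives and then records the assembled reply in the chat history so the next turn has the full context. ChatStreamingExample and StreamReplyAsync are hypothetical names introduced for this illustration.

```csharp
using System;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

static class ChatStreamingExample
{
    // Stream the assistant's reply to the console, then add it to the chat history
    public static async Task<string> StreamReplyAsync(
        IChatCompletionService chatService,
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null,
        CancellationToken cancellationToken = default
    )
    {
        var reply = new StringBuilder();

        await foreach (
            StreamingChatMessageContent chunk in chatService.GetStreamingChatMessageContentsAsync(
                chatHistory,
                executionSettings,
                kernel: null,
                cancellationToken: cancellationToken
            )
        )
        {
            // Content can be null for chunks that carry no text
            Console.Write(chunk.Content);
            reply.Append(chunk.Content);
        }

        Console.WriteLine();

        // Store the complete reply so later requests include it
        string fullReply = reply.ToString();
        chatHistory.AddAssistantMessage(fullReply);
        return fullReply;
    }
}
```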
