Get started with ONNX models in your WinUI app with ONNX Runtime

12/14/2024

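This article walks you through creating a WinUI 3 app that uses an ONNX model to classify objects in an image and display the confidence of each classification. For more information on using AI and machine learning models in your Windows app, see Get started with AI and Machine Learning models in your Windows app.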

When using AI features, we recommend that you review: Developing Responsible Generative AI Applications and Features on Windows.

What is the ONNX Runtime

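ONNX Runtime is a cross-platform machine learning model accelerator with a flexible interface for integrating hardware-specific libraries. ONNX Runtime can be used with models from PyTorch, TensorFlow/Keras, TFLite, scikit-learn, and other frameworks. For more information, see the ONNX Runtime website at https://onnxruntime.ai/docs/.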

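This sample uses the DirectML Execution Provider, which abstracts over and runs across the different hardware options on Windows devices and supports execution on local accelerators such as the GPU and NPU.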

Prerequisites

Your device must have developer mode enabled. For more information, see Enable your device for development.
Visual Studio 2022 or later with the .NET desktop development workload.

Create a new C# WinUI app

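In Visual Studio, create a new project. In the [Create a new project] dialog, set the language filter to "C#" and the project type filter to "winui", then select the [Blank App, Packaged (WinUI 3 in Desktop)] template. Name the new project "ONNXWinUIExample".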

Add references to NuGet packages

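In [Solution Explorer], right-click [Dependencies] and select [Manage NuGet Packages...]. In the NuGet package manager, select the [Browse] tab. Search for each of the following packages, select the latest stable version in the [Version] drop-down, and then click [Install].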

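Package	Description
Microsoft.ML.OnnxRuntime.DirectML	Provides APIs for running ONNX models on the GPU.
SixLabors.ImageSharp	Provides image utilities for processing images for model input.
SharpDX.DXGI	Provides APIs for accessing the DirectX device from C#.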

Add the following using directives to the top of MainWindow.xaml.cs to access the APIs from these libraries.

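// MainWindow.xaml.cs
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;
using SharpDX.DXGI;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Formats;
using SixLabors.ImageSharp.PixelFormats;
using SixLabors.ImageSharp.Processing;
// Also needed by the later snippets if your template does not already include them:
using Microsoft.UI.Xaml.Media.Imaging; // BitmapImage
using Windows.Storage;                 // StorageFile
using Windows.Storage.Pickers;         // FileOpenPicker, PickerViewMode
using WinRT.Interop;                   // InitializeWithWindow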

Add the model to your project

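In [Solution Explorer], right-click your project and select [Add] > [New Folder]. Name the new folder "model". For this example, we will use the resnet50-v2-7.onnx model from https://github.com/onnx/models. Go to the repository view for the model at https://github.com/onnx/models/blob/main/validated/vision/classification/resnet/model/resnet50-v2-7.onnx. Click the [Download raw file] button. Copy this file into the "model" directory you just created.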

In [Solution Explorer], click the model file and set [Copy to Output Directory] to [Copy if newer].

Create a simple UI

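For this example, we will create a simple UI that includes a Button that lets the user select an image to evaluate with the model, an Image control to display the selected image, and a TextBlock to list the objects the model detected in the image along with the confidence of each classification.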

In the MainWindow.xaml file, replace the default StackPanel element with the following XAML code.

<!--MainWindow.xaml-->
<Grid Padding="25" >
    <Grid.ColumnDefinitions>
        <ColumnDefinition/>
        <ColumnDefinition/>
        <ColumnDefinition/>
    </Grid.ColumnDefinitions>
    <Button x:Name="myButton" Click="myButton_Click" Grid.Column="0" VerticalAlignment="Top">Select photo</Button>
    <Image x:Name="myImage" MaxWidth="300" Grid.Column="1" VerticalAlignment="Top"/>
    <TextBlock x:Name="featuresTextBlock" Grid.Column="2" VerticalAlignment="Top"/>
</Grid>

Initialize the model

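In the MainWindow.xaml.cs file, inside the MainWindow class, create a helper method called InitModel that initializes the model. This method uses APIs from the SharpDX.DXGI library to select the first available adapter. The selected adapter is set in the SessionOptions object for the DirectML execution provider in this session. Finally, a new InferenceSession is initialized, passing in the path to the model file and the session options.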

// MainWindow.xaml.cs

private InferenceSession _inferenceSession;
private string modelDir = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "model");

private void InitModel()
{
    if (_inferenceSession != null)
    {
        return;
    }

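    // Select a graphics device. Index 0 is the first adapter that DXGI enumerates;
    // the same index is passed to the DirectML execution provider below.
    using var factory1 = new Factory1();
    int deviceId = 0;

    using Adapter1 selectedAdapter = factory1.GetAdapter1(deviceId);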

    // Create the inference session
    var sessionOptions = new SessionOptions
    {
        LogSeverityLevel = OrtLoggingLevel.ORT_LOGGING_LEVEL_INFO
    };
    sessionOptions.AppendExecutionProvider_DML(deviceId);
    _inferenceSession = new InferenceSession($@"{modelDir}\resnet50-v2-7.onnx", sessionOptions);

}

Load and analyze an image

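For simplicity, in this example all of the steps for loading and formatting the image, invoking the model, and displaying the results are placed within the button click handler. Note that we add the async keyword to the button click handler included in the default template so that we can run asynchronous operations in the handler.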

// MainWindow.xaml.cs

private async void myButton_Click(object sender, RoutedEventArgs e)
{
    ...
}

Use a FileOpenPicker to let the user select an image from their computer to analyze, and display that image in the UI.

    FileOpenPicker fileOpenPicker = new()
    {
        ViewMode = PickerViewMode.Thumbnail,
        FileTypeFilter = { ".jpg", ".jpeg", ".png", ".gif" },
    };
    InitializeWithWindow.Initialize(fileOpenPicker, WinRT.Interop.WindowNative.GetWindowHandle(this));
    StorageFile file = await fileOpenPicker.PickSingleFileAsync();
    if (file == null)
    {
        return;
    }

    // Display the image in the UI
    var bitmap = new BitmapImage();
    bitmap.SetSource(await file.OpenAsync(Windows.Storage.FileAccessMode.Read));
    myImage.Source = bitmap;

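Next we need to process the input to get it into a format that the model supports. The SixLabors.ImageSharp library is used to load the image in 24-bit RGB format and resize it to 224x224 pixels. Then the pixel values are normalized with a mean of 255*[0.485, 0.456, 0.406] and a standard deviation of 255*[0.229, 0.224, 0.225]. The details of the format the model expects can be found on the resnet model's GitHub page.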

    using var fileStream = await file.OpenStreamForReadAsync();

    IImageFormat format = SixLabors.ImageSharp.Image.DetectFormat(fileStream);
    using Image<Rgb24> image = SixLabors.ImageSharp.Image.Load<Rgb24>(fileStream);


    // Resize image
    using Stream imageStream = new MemoryStream();
    image.Mutate(x =>
    {
        x.Resize(new ResizeOptions
        {
            Size = new SixLabors.ImageSharp.Size(224, 224),
            Mode = ResizeMode.Crop
        });
    });

    image.Save(imageStream, format);

    // Preprocess image
    // We use DenseTensor for multi-dimensional access to populate the image data
    var mean = new[] { 0.485f, 0.456f, 0.406f };
    var stddev = new[] { 0.229f, 0.224f, 0.225f };
    DenseTensor<float> processedImage = new(new[] { 1, 3, 224, 224 });
    image.ProcessPixelRows(accessor =>
    {
        for (int y = 0; y < accessor.Height; y++)
        {
            Span<Rgb24> pixelSpan = accessor.GetRowSpan(y);
            for (int x = 0; x < accessor.Width; x++)
            {
                processedImage[0, 0, y, x] = ((pixelSpan[x].R / 255f) - mean[0]) / stddev[0];
                processedImage[0, 1, y, x] = ((pixelSpan[x].G / 255f) - mean[1]) / stddev[1];
                processedImage[0, 2, y, x] = ((pixelSpan[x].B / 255f) - mean[2]) / stddev[2];
            }
        }
    });
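
The loop above divides each channel by 255 before subtracting the mean, which is equivalent to the "255*mean, 255*stddev" form in the text: (p - 255*m) / (255*s) = (p/255 - m) / s. As a minimal sketch of that per-pixel arithmetic in isolation (the PixelNormalizer class and Normalize method are hypothetical names used only for illustration, not part of the sample):

```csharp
using System;

// Hypothetical helper, not part of the sample: normalizes one 8-bit RGB pixel
// the same way the preprocessing loop does (scale to [0,1], subtract the
// per-channel mean, divide by the per-channel standard deviation).
public static class PixelNormalizer
{
    private static readonly float[] Mean = { 0.485f, 0.456f, 0.406f };
    private static readonly float[] StdDev = { 0.229f, 0.224f, 0.225f };

    // Returns the normalized values in R, G, B order.
    public static float[] Normalize(byte r, byte g, byte b)
    {
        return new[]
        {
            ((r / 255f) - Mean[0]) / StdDev[0],
            ((g / 255f) - Mean[1]) / StdDev[1],
            ((b / 255f) - Mean[2]) / StdDev[2],
        };
    }
}
```

For example, a black pixel, PixelNormalizer.Normalize(0, 0, 0), comes out at roughly -2.118, -2.036, -1.804, so the model input is not confined to the range [0,1].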

Next, we set up the inputs by creating an OrtValue of type tensor on top of the managed image data array.

    // Setup inputs
    // Pin tensor buffer and create a OrtValue with native tensor that makes use of
    // DenseTensor buffer directly. This avoids extra data copy within OnnxRuntime.
    // It will be unpinned on ortValue disposal
    using var inputOrtValue = OrtValue.CreateTensorValueFromMemory(OrtMemoryInfo.DefaultInstance,
        processedImage.Buffer, new long[] { 1, 3, 224, 224 });

    var inputs = new Dictionary<string, OrtValue>
    {
        { "data", inputOrtValue }
    };
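
The shape { 1, 3, 224, 224 } is batch, channel, height, width. Assuming the default row-major layout of the tensor buffer, the element written as processedImage[0, c, y, x] sits at a single flat offset in that buffer. The sketch below shows the offset calculation; NchwLayout and FlatIndex are hypothetical names used only for illustration.

```csharp
// Hypothetical helper, not part of the sample: computes the position of element
// [n, c, y, x] in a row-major buffer of shape [batch, channels, height, width].
public static class NchwLayout
{
    public static int FlatIndex(int n, int c, int y, int x,
        int channels = 3, int height = 224, int width = 224)
    {
        // The last dimension (x) varies fastest, then y, then channel, then batch.
        return ((n * channels + c) * height + y) * width + x;
    }
}
```

With these defaults, each channel occupies a contiguous plane of 224 * 224 = 50176 floats, so all red values come first, followed by all green values, then all blue values.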

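Next, if the inference session has not yet been initialized, call the InitModel helper method. Then call the Run method to run the model and retrieve the results.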

    // Run inference
    if (_inferenceSession == null)
    {
        InitModel();
    }
    using var runOptions = new RunOptions();
    using IDisposableReadOnlyCollection<OrtValue> results = _inferenceSession.Run(runOptions, inputs, _inferenceSession.OutputNames);

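The model outputs the results as a native tensor buffer. The following code converts the output to an array of floats. A softmax function is then applied so that the values lie in the range [0,1] and sum to 1.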

    // Postprocess output
    // We copy results to array only to apply algorithms, otherwise data can be accessed directly
    // from the native buffer via ReadOnlySpan<T> or Span<T>
    var output = results[0].GetTensorDataAsSpan<float>().ToArray();
    float sum = output.Sum(x => (float)Math.Exp(x));
    IEnumerable<float> softmax = output.Select(x => (float)Math.Exp(x) / sum);
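
The softmax above exponentiates the raw outputs directly, which is fine when they are small. If the raw values could be large, (float)Math.Exp(x) can overflow to infinity and the division then yields NaN. A common variant, shown here as a sketch rather than as what the sample does, subtracts the maximum value first; the result is mathematically the same. SoftmaxSketch and Softmax are hypothetical names.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the sample: softmax with the maximum
// subtracted before exponentiation so that no exponent is greater than 0.
public static class SoftmaxSketch
{
    public static float[] Softmax(float[] logits)
    {
        if (logits.Length == 0)
        {
            return Array.Empty<float>();
        }

        float max = logits.Max();
        // Accumulate in double to limit rounding error over many classes.
        double[] exps = logits.Select(v => Math.Exp(v - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => (float)(e / sum)).ToArray();
    }
}
```

In the handler, this could stand in for the two lines that compute sum and softmax, for example `float[] softmax = SoftmaxSketch.Softmax(output);`.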

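The index of each value in the output array maps to a label that the model was trained on, and the value at that index is the model's confidence that the label represents an object detected in the input image. We choose the 10 results with the highest confidence values. This code uses some helper objects that we will define in the next step.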

    // Extract top 10
    IEnumerable<Prediction> top10 = softmax.Select((x, i) => new Prediction { Label = LabelMap.Labels[i], Confidence = x })
        .OrderByDescending(x => x.Confidence)
        .Take(10);

    // Print results
    featuresTextBlock.Text = "Top 10 predictions for ResNet50 v2...\n";
    featuresTextBlock.Text += "-------------------------------------\n";
    foreach (var t in top10)
    {
        featuresTextBlock.Text += $"Label: {t.Label}, Confidence: {t.Confidence}\n";
    }
} // End of myButton_Click
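
The selection step in the handler relies on the score at index i belonging to the label at index i. The sketch below isolates that pairing with a tiny made-up label list so the behavior is easy to check by hand; TopKSketch and TopK are hypothetical names, and the length check is an added assumption rather than something the sample performs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the sample: pairs each score with the label
// at the same index and returns the k highest-scoring pairs.
public static class TopKSketch
{
    public static List<(string Label, float Confidence)> TopK(
        IReadOnlyList<float> scores, IReadOnlyList<string> labels, int k)
    {
        // Assumption: a mismatch means the label list does not belong to this model.
        if (scores.Count != labels.Count)
        {
            throw new ArgumentException("scores and labels must have the same length.");
        }

        return scores
            .Select((score, i) => (Label: labels[i], Confidence: score))
            .OrderByDescending(p => p.Confidence)
            .Take(k)
            .ToList();
    }
}
```

For example, TopK(new[] { 0.1f, 0.7f, 0.2f }, new[] { "a", "b", "c" }, 2) returns the pairs for "b" and then "c".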

Declare helper objects

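The Prediction class simply provides a way to associate an object label with a confidence value. In MainWindow.xaml.cs, add this class inside the ONNXWinUIExample namespace block, but outside the MainWindow class definition.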

internal class Prediction
{
    public object Label { get; set; }
    public float Confidence { get; set; }
}

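Next, add the LabelMap helper class, which lists all of the object labels that the model was trained on in a specific order, so that the labels map to the indices of the results returned by the model. The list of labels is too long to present in full here. You can copy the complete LabelMap class from a sample code file in the ONNXRuntime GitHub repository and paste it into the ONNXWinUIExample namespace block.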

public class LabelMap
{
    public static readonly string[] Labels = new[] {
        "tench",
        "goldfish",
        "great white shark",
        ...
        "hen-of-the-woods",
        "bolete",
        "ear",
        "toilet paper"};
}

Run the example

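Build and run the project. Click the [Select photo] button and pick an image file to analyze. You can look at the LabelMap helper class definition to see what the model can recognize, and pick an image that might produce interesting results. The model is initialized the first time it is run. After the model has finished processing, you should see a list of the objects detected in the image and the confidence value of each prediction.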

Top 10 predictions for ResNet50 v2...
-------------------------------------
Label: lakeshore, Confidence: 0.91674984
Label: seashore, Confidence: 0.033412453
Label: promontory, Confidence: 0.008877817
Label: shoal, Confidence: 0.0046836217
Label: container ship, Confidence: 0.001940886
Label: Lakeland Terrier, Confidence: 0.0016400366
Label: maze, Confidence: 0.0012478716
Label: breakwater, Confidence: 0.0012336193
Label: ocean liner, Confidence: 0.0011933135
Label: pier, Confidence: 0.0011284945
