Train your image classifier model with PyTorch

Article
11/20/2024

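Note

For greater functionality, PyTorch can also be used with DirectML on Windows.

In the previous stage of this tutorial, we acquired the dataset we'll use to train our image classifier with PyTorch. Now it's time to put that data to use.

To train the image classifier with PyTorch, you need to complete the following steps:

Load the data. If you've completed the previous step of this tutorial, you've handled this already.
Define a Convolution Neural Network.
Define a loss function.
Train the model on the training data.
Test the network on the test data.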

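Define a Convolution Neural Network.

To build a neural network with PyTorch, you'll use the torch.nn package. This package contains modules, extensible classes, and all the components required to build neural networks.

Here, you'll build a basic convolution neural network (CNN) to classify the images from the CIFAR10 dataset.

A CNN is a class of neural networks, defined as multilayered neural networks designed to detect complex features in data. They're most commonly used in computer vision applications.

Our network will be structured with the following 14 layers: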

Conv -> BatchNorm -> ReLU -> Conv -> BatchNorm -> ReLU -> MaxPool -> Conv -> BatchNorm -> ReLU -> Conv -> BatchNorm -> ReLU -> Linear.

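The convolution layer

The convolution layer is the main layer of a CNN and helps us detect features in images. Each layer has a number of channels to detect specific features in images, and a kernel size that defines the size of the detected feature. Therefore, a convolution layer with 64 channels and a kernel size of 3 x 3 would detect 64 distinct features, each of size 3 x 3. When you define a convolution layer, you provide the number of in-channels, the number of out-channels, and the kernel size. The number of out-channels in one layer serves as the number of in-channels of the next layer.

For example: a convolution layer with in-channels=3, out-channels=10, and kernel-size=6 takes an RGB image (3 channels) as input and applies 10 feature detectors with a kernel size of 6x6 to the image. Smaller kernel sizes reduce computation time and the number of weights in each kernel.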
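The layer from the example above can be sketched as follows. This is a minimal illustration, assuming a single random 32x32 RGB image as input (the image size is an assumption chosen to match CIFAR10, and the variable names are made up for this sketch).

```python
import torch
import torch.nn as nn

# A convolution layer matching the example in the text:
# 3 input channels (RGB), 10 output channels, 6x6 kernel.
conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=6)

# One dummy 32x32 RGB image (batch size 1); random values, not real data.
image = torch.randn(1, 3, 32, 32)
features = conv(image)

# With no padding and stride 1, each spatial side shrinks to 32 - 6 + 1 = 27.
print(features.shape)      # torch.Size([1, 10, 27, 27])
# The layer stores one 3x6x6 kernel per output channel.
print(conv.weight.shape)   # torch.Size([10, 3, 6, 6])
```

Each of the 10 output channels is the response of one feature detector slid across the whole image.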

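Other layers

The following other layers are involved in our network:

The ReLU layer is an activation function that forces all incoming features to be 0 or greater. When you apply this layer, any number less than 0 is changed to zero, while others are kept the same.
The BatchNorm2d layer normalizes its inputs to have zero mean and unit variance, which typically helps training and can improve the network's accuracy.
The MaxPool layer helps ensure that the location of an object in an image doesn't affect the network's ability to detect its specific features.
The Linear layer is the final layer in our network, and it computes the scores of each of the classes. In the CIFAR10 dataset, there are ten classes of labels. The label with the highest score will be the one the model predicts. In the linear layer, you have to specify the number of input features and the number of output features, which should correspond to the number of classes.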
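The effect of each of these layers can be seen on small dummy tensors. The sketch below is illustrative only: the tensor shapes and values are assumptions chosen for the demonstration and are not taken from the tutorial's network.

```python
import torch
import torch.nn as nn

# ReLU: negative values become 0, everything else is unchanged.
relu_out = nn.ReLU()(torch.tensor([-2.0, 0.0, 3.0]))
print(relu_out)  # tensor([0., 0., 3.])

# BatchNorm2d (in training mode) normalizes each channel over the batch.
x = torch.randn(8, 12, 14, 14) * 5 + 3      # dummy activations, not CIFAR10 data
bn_out = nn.BatchNorm2d(12)(x)
print(bn_out.mean().item(), bn_out.var().item())  # close to 0 and 1

# MaxPool2d(2, 2) keeps the largest value in each 2x2 window,
# halving the height and width.
pooled = nn.MaxPool2d(2, 2)(x)
print(pooled.shape)  # torch.Size([8, 12, 7, 7])

# Linear maps a flattened feature vector to one score per class (10 here).
scores = nn.Linear(12 * 7 * 7, 10)(pooled.flatten(start_dim=1))
print(scores.shape)  # torch.Size([8, 10])
```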

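How does a Neural Network work?

The CNN is a feed-forward network. During training, the network processes the input through all the layers, computes the loss to understand how far the predicted label of the image is from the correct one, and propagates the gradients back through the network to update the weights of the layers. By iterating over a huge dataset of inputs, the network will "learn" to set its weights to achieve the best results.

The forward function computes the network's output, from which the value of the loss function is calculated, and the backward function computes the gradients of the learnable parameters. When you create a neural network with PyTorch, you only need to define the forward function. The backward function is defined automatically.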
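To see the automatically derived backward pass at work, consider the sketch below. TinyNet is a hypothetical toy model invented for this illustration (it is not the tutorial's network), and the inputs and labels are dummy values.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):   # hypothetical toy model, not the tutorial's Network
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 3)

    def forward(self, x):
        # Only the forward pass is written by hand.
        return self.fc(x)

net = TinyNet()
inputs = torch.randn(5, 4)                # 5 dummy samples, 4 features each
targets = torch.tensor([0, 2, 1, 0, 2])   # dummy class labels

loss = nn.CrossEntropyLoss()(net(inputs), targets)
print(net.fc.weight.grad)   # None: no gradients before the backward pass

# autograd derives the backward pass from the operations recorded in forward()
loss.backward()
print(net.fc.weight.grad.shape)   # torch.Size([3, 4])
```

After loss.backward(), every learnable parameter has a gradient of the same shape as the parameter itself, which is what an optimizer uses to update the weights.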

Copy the following code into the PyTorchTraining.py file in Visual Studio to define the CNN.

import torch
import torch.nn as nn
import torchvision
import torch.nn.functional as F

# Define a convolution neural network
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=5, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(12)
        self.conv2 = nn.Conv2d(in_channels=12, out_channels=12, kernel_size=5, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(12)
        self.pool = nn.MaxPool2d(2,2)
        self.conv4 = nn.Conv2d(in_channels=12, out_channels=24, kernel_size=5, stride=1, padding=1)
        self.bn4 = nn.BatchNorm2d(24)
        self.conv5 = nn.Conv2d(in_channels=24, out_channels=24, kernel_size=5, stride=1, padding=1)
        self.bn5 = nn.BatchNorm2d(24)
        self.fc1 = nn.Linear(24*10*10, 10)

    def forward(self, input):
        output = F.relu(self.bn1(self.conv1(input)))      
        output = F.relu(self.bn2(self.conv2(output)))     
        output = self.pool(output)                        
        output = F.relu(self.bn4(self.conv4(output)))     
        output = F.relu(self.bn5(self.conv5(output)))     
        output = output.view(-1, 24*10*10)
        output = self.fc1(output)

        return output

# Instantiate a neural network model 
model = Network()
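The 24*10*10 input size of the final Linear layer follows from how each layer changes the image size. The sketch below traces that arithmetic, assuming the standard 32x32 CIFAR10 input and the usual output-size formula for convolution and pooling layers; conv_out is a hypothetical helper written for this illustration, not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                   # CIFAR10 images are 32x32
size = conv_out(size, kernel=5, padding=1)  # conv1 -> 30
size = conv_out(size, kernel=5, padding=1)  # conv2 -> 28
size = conv_out(size, kernel=2, stride=2)   # pool  -> 14
size = conv_out(size, kernel=5, padding=1)  # conv4 -> 12
size = conv_out(size, kernel=5, padding=1)  # conv5 -> 10

flat_features = 24 * size * size            # 24 channels leave conv5
print(size, flat_features)                  # 10 2400
```

If you change a kernel size, padding, or the input resolution, the number passed to nn.Linear and to view() has to be recomputed the same way.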

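Note

Interested in learning more about neural networks with PyTorch? Check out the PyTorch documentation.

Define a loss function

A loss function computes a value that estimates how far the output is from the target. The main objective is to reduce the loss function's value by changing the weight vector values through backpropagation in the neural network.

The loss value is different from model accuracy. The loss function tells us how well the model behaves after each iteration of optimization on the training set. The accuracy of the model is calculated on the test data and shows the percentage of correct predictions.

In PyTorch, the neural network package contains various loss functions that form the building blocks of deep neural networks. In this tutorial, you'll use a classification loss function, the Classification Cross-Entropy loss, together with an Adam optimizer. The learning rate (lr) controls how much you adjust the weights of the network with respect to the loss gradient. You'll set it to 0.001. The lower it is, the slower training will be.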
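For intuition about what the cross-entropy loss measures, the following sketch computes it by hand for a single sample in plain Python. It mirrors the per-sample quantity that nn.CrossEntropyLoss computes (log-softmax followed by negative log-likelihood, averaged over the batch by default); the cross_entropy helper and the score values are made up for this illustration.

```python
import math

def cross_entropy(scores, target):
    """Cross-entropy of one sample: -log(softmax(scores)[target])."""
    top = max(scores)
    shifted = [s - top for s in scores]        # for numerical stability
    log_sum_exp = math.log(sum(math.exp(s) for s in shifted))
    return log_sum_exp - shifted[target]

# A model that scores all 10 classes equally is at chance level:
uniform = cross_entropy([0.0] * 10, target=3)
print(round(uniform, 4))    # 2.3026, i.e. ln(10)

# A confident, correct prediction gives a loss near 0 ...
confident_right = cross_entropy([10.0] + [0.0] * 9, target=0)
# ... while a confident, wrong prediction is penalized heavily.
confident_wrong = cross_entropy([10.0] + [0.0] * 9, target=1)
print(confident_right < uniform < confident_wrong)   # True
```

A loss of about 2.3 at the start of training on ten classes therefore indicates a model that is still guessing.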

Copy the following code into the PyTorchTraining.py file in Visual Studio to define the loss function and an optimizer.

from torch.optim import Adam
 
# Define the loss function with Classification Cross-Entropy loss and an optimizer with Adam optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=0.001, weight_decay=0.0001)

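Train the model on the training data.

To train the model, you loop over the data iterator, feed the inputs to the network, and optimize. PyTorch doesn't move your model and data to a GPU automatically, but you can define the execution device manually. The device will be an Nvidia GPU if one exists on your machine, or your CPU if it doesn't.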
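Device selection on its own can be sketched as below. The layer and batch are dummy stand-ins chosen for this illustration; the point is only that both the inputs and the parameters have to be moved to the same device.

```python
import torch

# Use the first CUDA GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

batch = torch.randn(4, 3, 32, 32)   # dummy batch created on the CPU
batch = batch.to(device)            # tensors must be moved explicitly ...
layer = torch.nn.Conv2d(3, 12, 5).to(device)   # ... and so must the parameters

# Inputs and parameters now live on the same device, so the forward pass works.
print(layer(batch).device.type == device.type)   # True
```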

Add the following code to the PyTorchTraining.py file.

from torch.autograd import Variable

# Function to save the model
def saveModel():
    path = "./myFirstModel.pth"
    torch.save(model.state_dict(), path)

# Function to test the model with the test dataset and print the accuracy for the test images
def testAccuracy():
    
    model.eval()
    accuracy = 0.0
    total = 0.0
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    
    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            # run the model on the test set to predict labels
            outputs = model(images.to(device))
            # the label with the highest energy will be our prediction
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            accuracy += (predicted == labels.to(device)).sum().item()
    
    # compute the accuracy over all test images
    accuracy = (100 * accuracy / total)
    return(accuracy)


# Training function. We simply have to loop over our data iterator and feed the inputs to the network and optimize.
def train(num_epochs):
    
    best_accuracy = 0.0

    # Define your execution device
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("The model will be running on", device, "device")
    # Convert model parameters and buffers to CPU or Cuda
    model.to(device)

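    for epoch in range(num_epochs):  # loop over the dataset multiple times
        # testAccuracy() leaves the model in eval mode, so switch back to training mode each epoch
        model.train()
        running_loss = 0.0
        running_acc = 0.0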

        for i, (images, labels) in enumerate(train_loader, 0):
            
            # move the inputs to the execution device (Variable wrappers are no longer needed since PyTorch 0.4)
            images = images.to(device)
            labels = labels.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()
            # predict classes using images from the training set
            outputs = model(images)
            # compute the loss based on model output and real labels
            loss = loss_fn(outputs, labels)
            # backpropagate the loss
            loss.backward()
            # adjust parameters based on the calculated gradients
            optimizer.step()

            # Let's print statistics for every 1,000 batches
            running_loss += loss.item()     # extract the loss value
            if i % 1000 == 999:    
                # print the average loss over the last 1,000 batches
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 1000))
                # zero the loss
                running_loss = 0.0

        # Compute and print the accuracy for this epoch when tested over all 10000 test images
        accuracy = testAccuracy()
        print('For epoch', epoch+1,'the test accuracy over the whole test set is %d %%' % (accuracy))
        
        # we want to save the model if the accuracy is the best
        if accuracy > best_accuracy:
            saveModel()
            best_accuracy = accuracy
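The save step stores only the model's state_dict, that is, its parameters and buffers. The round trip can be sketched as follows, assuming a small nn.Linear as a stand-in for the trained model and an in-memory buffer instead of the .pth file used in the tutorial.

```python
import io
import torch
import torch.nn as nn

source = nn.Linear(4, 2)         # stand-in for the trained model
restored = nn.Linear(4, 2)       # freshly initialized, so its weights differ

# torch.save accepts a path or a file-like object; a buffer avoids touching the disk here.
buffer = io.BytesIO()
torch.save(source.state_dict(), buffer)
buffer.seek(0)

# A state_dict holds no architecture, so the target must have the same layers.
restored.load_state_dict(torch.load(buffer))
print(torch.equal(source.weight, restored.weight))   # True
```

This is why the main code later has to construct a new Network() before calling load_state_dict.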

Test the model on the test data.

Now, you can test the model with a batch of images from the test set.

Add the following code to the PyTorchTraining.py file.

import matplotlib.pyplot as plt
import numpy as np

# Function to show the images
def imageshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# Function to test the model with a batch of images and show the labels predictions
def testBatch():
    # get batch of images from the test DataLoader  
    images, labels = next(iter(test_loader))

    # show all images as one image grid
    imageshow(torchvision.utils.make_grid(images))
   
    # Show the real labels on the screen 
    print('Real labels: ', ' '.join('%5s' % classes[labels[j]] 
                               for j in range(batch_size)))
  
    # Let's see whether the model identifies the labels of those examples
    outputs = model(images)
    
    # We got a score for each of the 10 labels. The label with the highest (max) score is the prediction
    _, predicted = torch.max(outputs, 1)
    
    # Let's show the predicted labels on the screen to compare with the real ones
    print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] 
                              for j in range(batch_size)))
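The values returned by the model are raw scores rather than probabilities. If probabilities are wanted, softmax can be applied; the predicted class stays the same. The sketch below uses made-up scores for two images, so the numbers are purely illustrative.

```python
import torch

# Dummy raw scores (logits) for 2 images and 10 classes; illustrative values only.
outputs = torch.tensor([[2.0, 0.5, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                        [0.0, 0.0, 0.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]])

# softmax turns each row of scores into probabilities that sum to 1
probabilities = torch.softmax(outputs, dim=1)
print(probabilities.sum(dim=1))      # each row sums to 1 (up to rounding)

# softmax preserves ordering, so the predicted class is the same either way
_, predicted = torch.max(outputs, 1)
print(predicted)                     # tensor([0, 3])
print(torch.equal(predicted, probabilities.argmax(dim=1)))   # True
```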

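Finally, let's add the main code. This starts model training, saves the model, and displays the results on screen. To keep training time short, we run only a handful of epochs over the training set; the code below calls train(5).

Add the following code to the PyTorchTraining.py file.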

if __name__ == "__main__":
    
    # Let's build our model
    train(5)
    print('Finished Training')

    # Report the overall accuracy on the test set
    print('Final test accuracy: %d %%' % testAccuracy())
    
    # Load the best model saved during training and switch it to evaluation mode
    model = Network()
    path = "myFirstModel.pth"
    model.load_state_dict(torch.load(path))
    model.eval()

    # Test with batch of images
    testBatch()

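Let's run the test! Make sure the drop-down menus in the top toolbar are set to Debug. Change the Solution Platform to x64 to run the project on your local machine if your device is 64-bit, or to x86 if it's 32-bit.

Setting the number of epochs (the number of complete passes through the training dataset) to two ([train(2)]) means the model is also evaluated twice on the entire test dataset of 10,000 images, once after each epoch. With that setting, training takes around 20 minutes on an 8th Generation Intel CPU, and the model should reach a success rate of roughly 65% across the ten label classes. The code above calls train(5), so expect a longer run.

To run the project, click the Start Debugging button on the toolbar, or press F5.

The console window will pop up, and you'll be able to see the training process.

As you defined it, the loss value is printed every 1,000 batches of images, or five times for every iteration over the training set. You expect the loss value to decrease with every loop.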
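The "five times per epoch" figure can be checked with a little arithmetic. The sketch assumes a batch size of 10, the value implied by the "batch of 10" used elsewhere in this tutorial; with a different batch size the count changes.

```python
train_images = 50_000    # size of the CIFAR10 training set
batch_size = 10          # assumption: the value set in the previous tutorial step
print_every = 1_000      # matches `if i % 1000 == 999` in train()

batches_per_epoch = train_images // batch_size
prints_per_epoch = batches_per_epoch // print_every
print(batches_per_epoch, prints_per_epoch)   # 5000 5
```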

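You'll also see the accuracy of the model after each iteration. Model accuracy is different from the loss value: as described earlier, loss is measured on the training set, while accuracy is calculated on the test data and shows the percentage of correct predictions. In our case, it tells us how many images from the 10,000-image test set the model classified correctly after each training iteration.

Once training is complete, you should expect to see output similar to the below. Your numbers won't be exactly the same, because training depends on many factors and won't always return identical results, but they should look similar.

(Screenshot: output from initial model training)

After running just 5 epochs, the model success rate is 70%. This is a good result for a basic model trained for a short period of time!

Testing with the batch of images, the model got 7 images right out of the batch of 10. Not bad at all, and consistent with the model success rate.

(Screenshot: successfully classified images)

You can check which classes our model predicts best. Simply add and run the code below:

Optional: add the following testClassess function to the PyTorchTraining.py file, and add a call to this function, testClassess(), inside the main block (if __name__ == "__main__").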

# Function to test what classes performed well
def testClassess():
    class_correct = list(0. for i in range(number_of_labels))
    class_total = list(0. for i in range(number_of_labels))
    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            c = (predicted == labels).squeeze()
            for i in range(batch_size):
                label = labels[i]
                class_correct[label] += c[i].item()
                class_total[label] += 1

    for i in range(number_of_labels):
        print('Accuracy of %5s : %2d %%' % (
            classes[i], 100 * class_correct[i] / class_total[i]))

The output looks like this:

(Screenshot: initial classification accuracy)
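The same per-class bookkeeping can also be done without an inner Python loop. The following is only a sketch of an alternative, not the tutorial's method: it uses torch.bincount on made-up labels and predictions for three classes.

```python
import torch

num_classes = 3    # illustrative; CIFAR10 has 10
labels    = torch.tensor([0, 0, 1, 1, 1, 2])   # made-up ground truth
predicted = torch.tensor([0, 1, 1, 1, 0, 2])   # made-up predictions

correct = (predicted == labels)
# bincount counts how often each class index occurs
class_total = torch.bincount(labels, minlength=num_classes)
class_correct = torch.bincount(labels[correct], minlength=num_classes)

# 50%, about 66.7%, and 100% for classes 0, 1, and 2
accuracy = 100.0 * class_correct / class_total
print(accuracy)
```

A class that never appears in the labels would produce a division by zero here, so real code should guard against empty classes.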

Next Steps

Now that we have the classification model, the next step is to convert the model to the ONNX format.
