Graphics (C++ AMP)

Article
03/01/2013

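C++ AMP contains several APIs in the Concurrency::graphics namespace that you can use to access the texture support on GPUs. Some common scenarios are:

You can use the texture class as a data container for computation and exploit the spatial locality of the texture cache and the layouts of GPU hardware. Spatial locality is the property of data elements being physically close to each other.
The runtime provides efficient interoperability with non-compute shaders. Pixel, vertex, tessellation, and hull shaders frequently consume or produce textures that you can use in your C++ AMP computations.
The graphics APIs in C++ AMP provide alternative ways to access sub-word packed buffers. Textures whose formats represent texels (texture elements) composed of 8-bit or 16-bit scalars allow access to such packed data storage.

Note
The C++ AMP APIs do not provide texture sampling and filtering functionality. To sample or filter, you must use the interoperability features of C++ AMP and write that code in DirectCompute and HLSL.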

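The norm and unorm Types

The norm and unorm types are scalar types that limit the range of float values; this is known as clamping. These types can be explicitly constructed from other scalar types. In a cast, the value is first converted to float and then clamped to the range allowed by norm [-1.0...1.0] or unorm [0.0...1.0]. Casting from +/- infinity returns +/-1. Casting from NaN is undefined. A norm can be implicitly constructed from a unorm without loss of data. Both types define an implicit conversion operator to float. Binary operators are defined between these types and built-in scalar types such as float and int: +, -, *, /, ==, !=, >, <, >=, <=. The compound assignment operators are also supported: +=, -=, *=, /=. The unary negation operator (-) is defined for the norm type.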
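
The clamping rule can be illustrated with a small sketch in standard C++. The helpers clamp_float, clamp_to_norm, and clamp_to_unorm are hypothetical names introduced here for illustration; they are not part of C++ AMP and only mimic the ranges described above. NaN is passed through unchanged in this sketch because the text leaves that case undefined.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper (not a C++ AMP API): clamps v into [lo, hi].
// A NaN input fails both comparisons and is returned unchanged.
inline float clamp_float(float v, float lo, float hi) {
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

// Mimics the range of norm: [-1.0, 1.0].
inline float clamp_to_norm(float v) { return clamp_float(v, -1.0f, 1.0f); }

// Mimics the range of unorm: [0.0, 1.0].
inline float clamp_to_unorm(float v) { return clamp_float(v, 0.0f, 1.0f); }

// Usage: values outside the range snap to the nearest bound.
inline void clamp_demo() {
    const float inf = std::numeric_limits<float>::infinity();
    assert(clamp_to_norm(2.5f) == 1.0f);
    assert(clamp_to_norm(-inf) == -1.0f);
    assert(clamp_to_unorm(-0.5f) == 0.0f);
    assert(clamp_to_unorm(inf) == 1.0f);
}
```

In this sketch a negative input to clamp_to_unorm snaps to 0.0, which follows directly from the [0.0...1.0] range.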

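Short Vector Library

The Short Vector Library provides some of the functionality of the Vector Type that is defined in HLSL and is typically used to define texels. A short vector is a data structure that holds one to four values of the same type. The supported types are double, float, int, norm, uint, and unorm. The type names are shown in the following table. For each type, there is also a corresponding typedef that has no underscore in its name. The types that have underscores are in the Concurrency::graphics namespace. The types that have no underscores are in the Concurrency::graphics::direct3d namespace so that they are clearly separated from similarly named fundamental types such as __int8 and __int16.

	Length 2	Length 3	Length 4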
double	double_2 double2	double_3 double3	double_4 double4
float	float_2 float2	float_3 float3	float_4 float4
int	int_2 int2	int_3 int3	int_4 int4
norm	norm_2 norm2	norm_3 norm3	norm_4 norm4
uint	uint_2 uint2	uint_3 uint3	uint_4 uint4
unorm	unorm_2 unorm2	unorm_3 unorm3	unorm_4 unorm4

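Operators

If an operator is defined between two short vectors, then it is also defined between a short vector and a scalar. In addition, one of these conditions must be true:

The scalar's type is the same as the short vector's element type.
The scalar's type can be implicitly converted to the vector's element type by using only one user-defined conversion.

The operation is carried out component-wise between each component of the short vector and the scalar. The valid operators are:

Operator type	Valid types
Binary operators	Valid on all types: +, -, *, /. Valid on integer types: %, ^, |, &, <<, >>. The two vectors must have the same size, and the result is a vector of the same size.
Relational operators	Valid on all types: == and !=
Compound assignment operators	Valid on all types: +=, -=, *=, /=. Valid on integer types: %=, ^=, |=, &=, <<=, >>=
Increment and decrement operators	Valid on all types: ++, --. Both prefix and postfix are valid.
Bitwise NOT operator (~)	Valid on integer types.
Unary - operator	Valid on all types except unorm and uint.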
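
As a rough illustration of what "component-wise" means, the following standard C++ sketch defines a hypothetical Int2 struct with a few of the operators listed above. Int2 is a stand-in invented for this example; it is not Concurrency::graphics::int_2 and makes no claim about how the library implements its operators.

```cpp
#include <cassert>

// Hypothetical stand-in for a two-component integer short vector.
// It is NOT Concurrency::graphics::int_2; it only illustrates the semantics.
struct Int2 {
    int x;
    int y;
};

// Vector-vector addition: both operands have the same size, and so does the result.
inline Int2 operator+(Int2 a, Int2 b) { return Int2{a.x + b.x, a.y + b.y}; }

// Vector-scalar addition: the scalar is applied to every component.
inline Int2 operator+(Int2 a, int s) { return Int2{a.x + s, a.y + s}; }

// Integer-only operator: component-wise left shift by a scalar.
inline Int2 operator<<(Int2 a, int s) { return Int2{a.x << s, a.y << s}; }

// In this sketch, two vectors are equal only if every component is equal.
inline bool operator==(Int2 a, Int2 b) { return a.x == b.x && a.y == b.y; }

// Usage: the scalar 5 is added to both components.
inline void operators_demo() {
    Int2 v{1, 2};
    Int2 sum = v + 5;
    assert(sum.x == 6 && sum.y == 7);
}
```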

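Swizzling Expressions

The Short Vector Library supports the vector_type.identifier accessor construct to access the components of a short vector. The identifier, which is known as a swizzling expression, specifies the components of the vector. The expression can be an l-value or an r-value. The individual characters in the identifier can be x, y, z, and w, or r, g, b, and a. x and r refer to component zero, y and g refer to component one, and so on. (Note that x and r cannot be used in the same identifier.) Therefore, rgba and xyzw return the same result. Single-component accessors such as x and y are scalar value types. Multi-component accessors are short vector types. For example, if you construct an int_4 vector that is named fourInts and has the values 2, 4, 6, and 8, then fourInts.y returns the integer 4 and fourInts.rg returns an int_2 object that has the values 2 and 4.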
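
The fourInts example can be mirrored in standard C++ as a minimal sketch. The Int4 and Int2 structs below are hypothetical stand-ins, and the swizzles are written as member functions (rg(), zw()) rather than as properties, so the syntax differs from the fourInts.rg form that the library provides; only the selection logic is illustrated.

```cpp
#include <cassert>

// Hypothetical stand-ins, NOT the C++ AMP short vector types.
struct Int2 {
    int x;
    int y;
};

struct Int4 {
    int x;
    int y;
    int z;
    int w;

    // Single-component accessors return scalars; r and g alias x and y.
    int r() const { return x; }
    int g() const { return y; }

    // Multi-component accessors return a shorter vector.
    Int2 rg() const { return Int2{x, y}; }  // components 0 and 1
    Int2 xy() const { return Int2{x, y}; }  // same components as rg
    Int2 zw() const { return Int2{z, w}; }  // components 2 and 3
};

// Usage: the values from the example in the text.
inline void swizzle_demo() {
    Int4 fourInts{2, 4, 6, 8};
    assert(fourInts.y == 4);
    Int2 rg = fourInts.rg();
    assert(rg.x == 2 && rg.y == 4);
}
```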

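Texture Classes

Many GPUs have hardware and caches that are optimized to fetch pixels and texels and to render images and textures. The texture<T,N> class, which is a container class for texel objects, exposes this texture functionality. A texel can be:

An int, uint, float, double, norm, or unorm scalar.
A short vector that has two or four components. The only exception is double_4, which is not allowed.

A texture object can have a rank of 1, 2, or 3. A texture object can be captured only by reference in the lambda of a call to parallel_for_each. The texture is stored on the GPU as a Direct3D texture object. For more information about textures and texels in Direct3D, see Introduction to Textures in Direct3D 11.

The texel type that you use might be one of the many texture formats that are used in graphics programming. For example, an RGBA format could use 32 bits, with 8 bits each for the R, G, B, and A scalar elements. The texture hardware of a graphics card can access the individual elements based on the format. For example, if you use the RGBA format, the texture hardware can extract each 8-bit element into a 32-bit form. In C++ AMP, you can set the bits per scalar element of your texel so that your code can access the individual scalar elements automatically, without bit-shifting.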
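
To make the saving concrete, the following standard C++ sketch shows the manual bit-shifting that the texture hardware makes unnecessary. Rgba32 and unpack_rgba8 are hypothetical names used only for illustration, and the byte order (R in the lowest byte, A in the highest) is an assumption of this sketch, not something the text specifies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a texel with four 32-bit components.
struct Rgba32 {
    std::uint32_t r;
    std::uint32_t g;
    std::uint32_t b;
    std::uint32_t a;
};

// Manually unpacks one 32-bit word into four 8-bit channels,
// each widened to 32 bits.
// Assumption: R occupies the lowest byte and A the highest.
inline Rgba32 unpack_rgba8(std::uint32_t packed) {
    return Rgba32{
        packed & 0xFFu,
        (packed >> 8) & 0xFFu,
        (packed >> 16) & 0xFFu,
        (packed >> 24) & 0xFFu
    };
}

// Usage: each byte of the packed word lands in its own component.
inline void unpack_demo() {
    Rgba32 c = unpack_rgba8(0x44332211u);
    assert(c.r == 0x11u && c.g == 0x22u);
    assert(c.b == 0x33u && c.a == 0x44u);
}
```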

Instantiating Texture Objects

You can declare a texture object without initialization. The following code example declares several texture objects.

#include <amp.h>
#include <amp_graphics.h>
using namespace concurrency;
using namespace concurrency::graphics;

void declareTextures() {

    // Create a 16-texel texture of int. 
    texture<int, 1> intTexture1(16);  
    texture<int, 1> intTexture2(extent<1>(16)); 

    // Create a 16 x 32 texture of float_2.  
    texture<float_2, 2> floatTexture1(16, 32);  
    texture<float_2, 2> floatTexture2(extent<2>(16, 32));   

    // Create a 2 x 4 x 8 texture of uint_4. 
    texture<uint_4, 3> uintTexture1(2, 4, 8);  
    texture<uint_4, 3> uintTexture2(extent<3>(2, 4, 8));
}

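You can also use a constructor to declare and initialize a texture object. The following code example instantiates a texture object from a vector of int_4 objects. The bits per scalar element is set to the default value. You cannot use this constructor with norm, unorm, or the short vectors of norm and unorm, because they have no default bits per scalar element.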

#include <amp.h>
#include <amp_graphics.h>
#include <vector>
using namespace concurrency;
using namespace concurrency::graphics;

void initializeTexture() {

    std::vector<int_4> texels;
    for (int i = 0; i < 768 * 1024; i++) {
        int_4 i4(i, i, i, i);
        texels.push_back(i4);
    }
    
    texture<int_4, 2> aTexture(768, 1024, texels.begin(), texels.end());
}

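You can also declare and initialize a texture object by using a constructor overload that takes a pointer to the source data, the size of the source data in bytes, and the bits per scalar element.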

void createTextureWithBPC() {
    // Create the source data.
    float source[1024 * 2]; 
    for (int i = 0; i < 1024 * 2; i++) {
        source[i] = (float)i;
    }

    // Initialize the texture by using the size of source in bytes
    // and bits per scalar element.
    texture<float_2, 1> floatTexture(1024, source, (unsigned int)sizeof(source), 32U); 
}

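The textures in these examples are created on the default view of the default accelerator. If you want to specify an accelerator_view object, you can use other overloads of the constructor. You cannot create a texture object on a CPU accelerator.

There are limits on the size of each dimension of a texture object, as shown in the following table. A run-time error is generated if you exceed the limits.

Texture	Size limit per dimension
texture<T,1>	16384
texture<T,2>	16384
texture<T,3>	2048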
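
If you want to catch an oversized extent before constructing a texture, a host-side pre-check along the following lines is one option. This is a sketch in standard C++ that assumes per-dimension limits of 16384 for rank 1 and rank 2 and 2048 for rank 3; max_texture_dimension and fits_texture_limits are hypothetical helpers, not C++ AMP functions.

```cpp
#include <cassert>

// Assumed per-dimension limit for each supported rank.
// Returns 0 for a rank that textures do not support.
inline unsigned int max_texture_dimension(int rank) {
    switch (rank) {
        case 1: return 16384u;
        case 2: return 16384u;
        case 3: return 2048u;
        default: return 0u;
    }
}

// Hypothetical pre-check: true if the rank is supported and every
// dimension is nonzero and within the limit for that rank.
inline bool fits_texture_limits(int rank, const unsigned int* dims) {
    const unsigned int limit = max_texture_dimension(rank);
    if (limit == 0u) return false;
    for (int i = 0; i < rank; ++i) {
        if (dims[i] == 0u || dims[i] > limit) return false;
    }
    return true;
}

// Usage: a 16 x 32 texture fits; a 2 x 4 x 4096 texture does not.
inline void limits_demo() {
    const unsigned int ok[2] = {16u, 32u};
    const unsigned int too_deep[3] = {2u, 4u, 4096u};
    assert(fits_texture_limits(2, ok));
    assert(!fits_texture_limits(3, too_deep));
}
```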

Reading from Texture Objects

You can read from a texture object by using texture::operator[], texture::operator(), or the texture::get method. texture::operator[] and texture::operator() return a value, not a reference. Therefore, you cannot write to a texture object by using texture::operator[].

void readTexture() {
    std::vector<int_2> src;    
    for (int i = 0; i < 16 *32; i++) {
        int_2 i2(i, i);
        src.push_back(i2);
    }

    std::vector<int_2> dst(16 * 32);  
    array_view<int_2, 2> arr(16, 32, dst);  
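    // Do not call arr.discard_data() here: the kernel accumulates with +=,
    // so it must see the zero-initialized contents of dst.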

    const texture<int_2, 2> tex9(16, 32, src.begin(), src.end());  
    parallel_for_each(tex9.extent, [=, &tex9] (index<2> idx) restrict(amp) {          
        // Use the subscript operator.      
        arr[idx].x += tex9[idx].x; 
        // Use the function () operator.      
        arr[idx].x += tex9(idx).x; 
        // Use the get method.
        arr[idx].y += tex9.get(idx).y; 
        // Use the function () operator.  
        arr[idx].y += tex9(idx[0], idx[1]).y; 
    });  

    arr.synchronize();
}

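The following code example demonstrates how to store texture channels in a short vector and then access the individual scalar elements as properties of the short vector.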

void UseBitsPerScalarElement() {
    // Create the image data. 
    // Each unsigned int (32-bit) represents four 8-bit scalar elements(r,g,b,a values).
    const int image_height = 16;
    const int image_width = 16;
    std::vector<unsigned int> image(image_height * image_width);

    extent<2> image_extent(image_height, image_width);

    // By using uint_4 and 8 bits per channel, each 8-bit channel in the data source is 
    // stored in one 32-bit component of a uint_4.
    texture<uint_4, 2> image_texture(image_extent, image.data(), image_extent.size() * 4U,  8U);

    // You can access the RGBA values of the source data by using swizzling expressions of the uint_4.
    parallel_for_each(image_extent,  
         [&image_texture](index<2> idx) restrict(amp) 
    { 
        // 4 bytes are automatically extracted when reading.
        uint_4 color = image_texture[idx]; 
        unsigned int r = color.r; 
        unsigned int g = color.g; 
        unsigned int b = color.b; 
        unsigned int a = color.a; 
    });
}

The following table lists the valid bits per channel for each short vector type.

Texture data type	Valid bits per scalar element
int, int_2, int_4, uint, uint_2, uint_4	8, 16, 32
float, float_2, float_4	16, 32
double, double_2	64
norm, norm_2, norm_4, unorm, unorm_2, unorm_4	8, 16

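Writing to Texture Objects

Use the texture::set method to write to texture objects. A texture object can be read-only or read/write. For a texture object to be both readable and writable, the following conditions must be true:

T has only one scalar component. (Short vectors are not allowed.)
T is not double, norm, or unorm.
The texture::bits_per_scalar_element property is 32.

If any of the three conditions is not true, the texture object is read-only. The first two conditions are checked during compilation: code that tries to write to such a read-only texture object generates a compilation error. The condition on texture::bits_per_scalar_element is checked at run time, and the runtime throws the unsupported_feature exception if you try to write to a read-only texture object.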
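
The three conditions can be written down as a small predicate. The following standard C++ sketch is only a model of the rules as stated above; ScalarKind, TexelInfo, type_allows_write, and is_read_write are hypothetical names, and C++ AMP itself enforces the first two conditions through the type system rather than through a run-time check like this one.

```cpp
#include <cassert>

// Hypothetical description of a texel type; not a C++ AMP type.
enum class ScalarKind { Int, Uint, Float, Double, Norm, Unorm };

struct TexelInfo {
    ScalarKind kind;                       // scalar element type of T
    int components;                        // 1 for a scalar, 2 or 4 for a short vector
    unsigned int bits_per_scalar_element;  // property of the texture object
};

// Conditions 1 and 2 depend only on T
// (C++ AMP checks these during compilation).
inline bool type_allows_write(const TexelInfo& t) {
    const bool scalar_only = (t.components == 1);
    const bool kind_ok = t.kind != ScalarKind::Double &&
                         t.kind != ScalarKind::Norm &&
                         t.kind != ScalarKind::Unorm;
    return scalar_only && kind_ok;
}

// Condition 3 depends on the object (C++ AMP checks it at run time).
inline bool is_read_write(const TexelInfo& t) {
    return type_allows_write(t) && t.bits_per_scalar_element == 32u;
}

// Usage: texture<int, 1> with 32 bits is writable; an int_2 texel is not.
inline void writable_demo() {
    const TexelInfo int32_scalar{ScalarKind::Int, 1, 32u};
    const TexelInfo int32_vec2{ScalarKind::Int, 2, 32u};
    assert(is_read_write(int32_scalar));
    assert(!is_read_write(int32_vec2));
}
```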

The following code example writes values to a texture object.

void writeTexture() {
    texture<int, 1> tex1(16); 
    parallel_for_each(tex1.extent, [&tex1] (index<1> idx) restrict(amp) {    
        tex1.set(idx, 0); 
    });

}

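Using a writeonly_texture_view Object

The writeonly_texture_view class provides a write-only view of a texture object. A writeonly_texture_view object must be captured by value in the lambda expression. The following code example uses a writeonly_texture_view object to write to a texture object that has two components (int_2).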

void write2ComponentTexture() {
    texture<int_2, 1> tex4(16); 
    writeonly_texture_view<int_2, 1> wo_tv4(tex4); 
    parallel_for_each(extent<1>(16), [=] (index<1> idx) restrict(amp) {   
        wo_tv4.set(idx, int_2(1, 1)); 
    });
}

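Copying Texture Objects

You can copy data to and from texture objects by using the copy function or the copy_async function. The following code example copies between host arrays and texture objects.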

void copyHostArrayToTexture() {
    // Copy from source array to texture object by using the copy function.
    float floatSource[1024 * 2]; 
    for (int i = 0; i < 1024 * 2; i++) {
        floatSource[i] = (float)i;
    }
    texture<float_2, 1> floatTexture(1024);
    copy(floatSource, (unsigned int)sizeof(floatSource), floatTexture); 

    // Copy from source array to texture object by using the copy function.
    char charSource[16 * 16]; 
    for (int i = 0; i < 16 * 16; i++) {
        charSource[i] = (char)i;
    }
    texture<int, 2> charTexture(16, 16, 8U);
    copy(charSource, (unsigned int)sizeof(charSource), charTexture); 
    // Copy from texture object to source array by using the copy function.
    copy(charTexture, charSource, (unsigned int)sizeof(charSource)); 
}

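You can also copy from one texture to another by using the texture::copy_to method. The two textures can be on different accelerator_views. When you copy to a writeonly_texture_view object, the data is copied to the underlying texture object. The bits per scalar element and the extent must be the same on the source and destination texture objects. If those requirements are not met, the runtime throws an exception.

Interoperability

The C++ AMP runtime supports interoperability between texture<T,1> and the ID3D11Texture1D interface, between texture<T,2> and the ID3D11Texture2D interface, and between texture<T,3> and the ID3D11Texture3D interface. The get_texture method takes a texture object and returns an IUnknown interface. The make_texture method takes an IUnknown interface and an accelerator_view object and returns a texture object.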