DML_QUANTIZED_LINEAR_MATRIX_MULTIPLY_OPERATOR_DESC structure (directml.h)

Performs a matrix multiplication function on quantized data. This operator is mathematically equivalent to dequantizing the inputs, then performing matrix multiply, and then quantizing the output.

This operator requires the matrix multiply input tensors to be 4D, formatted as { BatchCount, ChannelCount, Height, Width }. The matrix multiply operator performs BatchCount * ChannelCount independent matrix multiplications.

For example, if ATensor has Sizes of { BatchCount, ChannelCount, M, K }, and BTensor has Sizes of { BatchCount, ChannelCount, K, N }, and OutputTensor has Sizes of { BatchCount, ChannelCount, M, N }, then the matrix multiply operator will perform BatchCount * ChannelCount independent matrix multiplications of dimensions {M,K} x {K,N} = {M,N}.

Dequantize function

f(Input, Scale, ZeroPoint) = (Input - ZeroPoint) * Scale

Quantize function

f(Input, Scale, ZeroPoint) = clamp(round(Input / Scale) + ZeroPoint, Min, Max)
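
For illustration only, the following C++ sketch (not part of the DirectML API; the function and parameter names are hypothetical) shows the reference semantics of the operator for INT8 data with per-tensor (scalar) scales and zero points: dequantize both inputs, perform BatchCount * ChannelCount independent matrix multiplications, and quantize the result. A real implementation typically accumulates in integer arithmetic, but the result is mathematically equivalent.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Dequantize: f(Input, Scale, ZeroPoint) = (Input - ZeroPoint) * Scale
static float Dequantize(int8_t input, float scale, int8_t zeroPoint)
{
    return (static_cast<float>(input) - static_cast<float>(zeroPoint)) * scale;
}

// Quantize: f(Input, Scale, ZeroPoint) = clamp(round(Input / Scale) + ZeroPoint, Min, Max)
// Min/Max are the range of the output data type; for INT8 that is [-128, 127].
static int8_t Quantize(float input, float scale, int8_t zeroPoint)
{
    float q = std::round(input / scale) + static_cast<float>(zeroPoint);
    return static_cast<int8_t>(std::clamp(q, -128.0f, 127.0f));
}

// Reference semantics for { BatchCount, ChannelCount, M, K } x { BatchCount, ChannelCount, K, N }
// with per-tensor (scalar) scales and zero points. Tensors are assumed to be packed in
// row-major { BatchCount, ChannelCount, Height, Width } order.
void QuantizedLinearMatMulReference(
    const std::vector<int8_t>& a, float aScale, int8_t aZeroPoint,
    const std::vector<int8_t>& b, float bScale, int8_t bZeroPoint,
    std::vector<int8_t>& out, float outScale, int8_t outZeroPoint,
    uint32_t batchCount, uint32_t channelCount, uint32_t m, uint32_t k, uint32_t n)
{
    // BatchCount * ChannelCount independent {M,K} x {K,N} = {M,N} matrix multiplications.
    for (uint32_t bc = 0; bc < batchCount * channelCount; ++bc)
    {
        const int8_t* aMat = a.data() + static_cast<size_t>(bc) * m * k;
        const int8_t* bMat = b.data() + static_cast<size_t>(bc) * k * n;
        int8_t* outMat = out.data() + static_cast<size_t>(bc) * m * n;

        for (uint32_t i = 0; i < m; ++i)
        {
            for (uint32_t j = 0; j < n; ++j)
            {
                float acc = 0.0f;
                for (uint32_t x = 0; x < k; ++x)
                {
                    acc += Dequantize(aMat[i * k + x], aScale, aZeroPoint) *
                           Dequantize(bMat[x * n + j], bScale, bZeroPoint);
                }
                outMat[i * n + j] = Quantize(acc, outScale, outZeroPoint);
            }
        }
    }
}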

Syntax

struct DML_QUANTIZED_LINEAR_MATRIX_MULTIPLY_OPERATOR_DESC {
  const DML_TENSOR_DESC *ATensor;
  const DML_TENSOR_DESC *AScaleTensor;
  const DML_TENSOR_DESC *AZeroPointTensor;
  const DML_TENSOR_DESC *BTensor;
  const DML_TENSOR_DESC *BScaleTensor;
  const DML_TENSOR_DESC *BZeroPointTensor;
  const DML_TENSOR_DESC *OutputScaleTensor;
  const DML_TENSOR_DESC *OutputZeroPointTensor;
  const DML_TENSOR_DESC *OutputTensor;
};

Members

ATensor

Type: const DML_TENSOR_DESC*

A tensor containing the A data. This tensor's dimensions should be { BatchCount, ChannelCount, M, K }.

AScaleTensor

Type: const DML_TENSOR_DESC*

A tensor containing the ATensor scale data. The expected dimensions of the AScaleTensor are { 1, 1, 1, 1 } if per-tensor quantization is required, or { 1, 1, M, 1 } if per-row quantization is required. These scale values are used for dequantizing the ATensor values.

Note

A scale value of 0 results in undefined behavior.

AZeroPointTensor

Type: _Maybenull_ const DML_TENSOR_DESC*

An optional tensor containing the ATensor zero point data. The expected dimensions of the AZeroPointTensor are { 1, 1, 1, 1 } if per-tensor quantization is required, or { 1, 1, M, 1 } if per-row quantization is required. These zero point values are used for dequantizing the ATensor values.

BTensor

Type: const DML_TENSOR_DESC*

A tensor containing the B data. This tensor's dimensions should be { BatchCount, ChannelCount, K, N }.

BScaleTensor

Type: const DML_TENSOR_DESC*

A tensor containing the BTensor scale data. The expected dimensions of the BScaleTensor are { 1, 1, 1, 1 } if per-tensor quantization is required, or { 1, 1, 1, N } if per-column quantization is required. These scale values are used for dequantizing the BTensor values.

Note

A scale value of 0 results in undefined behavior.

BZeroPointTensor

Type: _Maybenull_ const DML_TENSOR_DESC*

An optional tensor containing the BTensor zero point data. The expected dimensions of the BZeroPointTensor are { 1, 1, 1, 1 } if per-tensor quantization is required, or { 1, 1, 1, N } if per-column quantization is required. These zero point values are used for dequantizing the BTensor values.
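
To make the granularities above concrete, here is a minimal, hypothetical C++ sketch of how the scale and zero-point element for a given matrix element is selected: per-row parameters for ATensor are indexed by the row (M dimension), and per-column parameters for BTensor by the column (N dimension). It assumes packed scale/zero-point tensors and that an omitted zero-point tensor behaves as a zero point of 0; the helper names are illustrative, not part of the API.

#include <cstdint>

struct QuantParams { float scale; int8_t zeroPoint; };

// A is { BatchCount, ChannelCount, M, K }; its scale/zero-point tensors are
// { 1, 1, 1, 1 } (per-tensor) or { 1, 1, M, 1 } (per-row).
QuantParams ParamsForARow(const float* aScales, const int8_t* aZeroPoints,
                          bool perRow, uint32_t row)
{
    uint32_t index = perRow ? row : 0; // one entry per row of A, or a single shared entry
    return { aScales[index], aZeroPoints ? aZeroPoints[index] : int8_t(0) };
}

// B is { BatchCount, ChannelCount, K, N }; its scale/zero-point tensors are
// { 1, 1, 1, 1 } (per-tensor) or { 1, 1, 1, N } (per-column).
QuantParams ParamsForBColumn(const float* bScales, const int8_t* bZeroPoints,
                             bool perColumn, uint32_t column)
{
    uint32_t index = perColumn ? column : 0; // one entry per column of B, or a single shared entry
    return { bScales[index], bZeroPoints ? bZeroPoints[index] : int8_t(0) };
}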

OutputScaleTensor

Type: const DML_TENSOR_DESC*

A tensor containing the OutputTensor scale data. The expected dimensions of the OutputScaleTensor are { 1, 1, 1, 1 } if per-tensor quantization is required, or { 1, 1, M, 1 } if per-row quantization is required. These scale values are used for quantizing the OutputTensor values.

Note

A scale value of 0 results in undefined behavior.

OutputZeroPointTensor

Type: _Maybenull_ const DML_TENSOR_DESC*

An optional tensor containing the OutputTensor zero point data. The expected dimensions of the OutputZeroPointTensor are { 1, 1, 1, 1 } if per-tensor quantization is required, or { 1, 1, M, 1 } if per-row quantization is required. These zero point values are used for quantizing the OutputTensor values.

OutputTensor

Type: const DML_TENSOR_DESC*

A tensor to write the results to. This tensor's dimensions are { BatchCount, ChannelCount, M, N }.
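
As a usage illustration (a sketch, not canonical sample code), the following populates the structure for a single { 1, 1, M, K } x { 1, 1, K, N } INT8 multiplication with per-tensor scales, omits the optional zero-point tensors, and creates the operator with IDMLDevice::CreateOperator. The helper lambda, variable names, and the rounding of TotalTensorSizeInBytes up to 4 bytes are assumptions made for the example.

#include <windows.h>
#include <d3d12.h>
#include <directml.h>

HRESULT CreateQuantizedMatMul(IDMLDevice* dmlDevice, UINT m, UINT k, UINT n, IDMLOperator** op)
{
    // Describes a packed 4D buffer tensor; the 4-byte rounding of the total size is a
    // convention assumed for this example.
    auto MakeBufferDesc = [](DML_TENSOR_DATA_TYPE type, const UINT* sizes, UINT64 elementSize)
    {
        DML_BUFFER_TENSOR_DESC desc = {};
        desc.DataType = type;
        desc.Flags = DML_TENSOR_FLAG_NONE;
        desc.DimensionCount = 4;
        desc.Sizes = sizes;
        desc.Strides = nullptr; // packed layout
        UINT64 elementCount = UINT64(sizes[0]) * sizes[1] * sizes[2] * sizes[3];
        desc.TotalTensorSizeInBytes = (elementCount * elementSize + 3) & ~UINT64(3);
        return desc;
    };

    UINT aSizes[4]      = { 1, 1, m, k };
    UINT bSizes[4]      = { 1, 1, k, n };
    UINT outSizes[4]    = { 1, 1, m, n };
    UINT scalarSizes[4] = { 1, 1, 1, 1 }; // per-tensor quantization parameters

    DML_BUFFER_TENSOR_DESC aBuf        = MakeBufferDesc(DML_TENSOR_DATA_TYPE_INT8, aSizes, 1);
    DML_BUFFER_TENSOR_DESC bBuf        = MakeBufferDesc(DML_TENSOR_DATA_TYPE_INT8, bSizes, 1);
    DML_BUFFER_TENSOR_DESC outBuf      = MakeBufferDesc(DML_TENSOR_DATA_TYPE_INT8, outSizes, 1);
    DML_BUFFER_TENSOR_DESC aScaleBuf   = MakeBufferDesc(DML_TENSOR_DATA_TYPE_FLOAT32, scalarSizes, 4);
    DML_BUFFER_TENSOR_DESC bScaleBuf   = MakeBufferDesc(DML_TENSOR_DATA_TYPE_FLOAT32, scalarSizes, 4);
    DML_BUFFER_TENSOR_DESC outScaleBuf = MakeBufferDesc(DML_TENSOR_DATA_TYPE_FLOAT32, scalarSizes, 4);

    DML_TENSOR_DESC aDesc        = { DML_TENSOR_TYPE_BUFFER, &aBuf };
    DML_TENSOR_DESC bDesc        = { DML_TENSOR_TYPE_BUFFER, &bBuf };
    DML_TENSOR_DESC outDesc      = { DML_TENSOR_TYPE_BUFFER, &outBuf };
    DML_TENSOR_DESC aScaleDesc   = { DML_TENSOR_TYPE_BUFFER, &aScaleBuf };
    DML_TENSOR_DESC bScaleDesc   = { DML_TENSOR_TYPE_BUFFER, &bScaleBuf };
    DML_TENSOR_DESC outScaleDesc = { DML_TENSOR_TYPE_BUFFER, &outScaleBuf };

    DML_QUANTIZED_LINEAR_MATRIX_MULTIPLY_OPERATOR_DESC matMulDesc = {};
    matMulDesc.ATensor               = &aDesc;
    matMulDesc.AScaleTensor          = &aScaleDesc;
    matMulDesc.AZeroPointTensor      = nullptr; // optional
    matMulDesc.BTensor               = &bDesc;
    matMulDesc.BScaleTensor          = &bScaleDesc;
    matMulDesc.BZeroPointTensor      = nullptr; // optional
    matMulDesc.OutputScaleTensor     = &outScaleDesc;
    matMulDesc.OutputZeroPointTensor = nullptr; // optional
    matMulDesc.OutputTensor          = &outDesc;

    DML_OPERATOR_DESC opDesc = { DML_OPERATOR_QUANTIZED_LINEAR_MATRIX_MULTIPLY, &matMulDesc };
    return dmlDevice->CreateOperator(&opDesc, IID_PPV_ARGS(op));
}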

Availability

This operator was introduced in DML_FEATURE_LEVEL_2_1.

Tensor constraints

  • AScaleTensor, AZeroPointTensor, BScaleTensor, BZeroPointTensor, OutputScaleTensor, and OutputZeroPointTensor must have the same DimensionCount.
  • ATensor, BTensor, and OutputTensor must have the same DimensionCount.
  • BTensor and BZeroPointTensor must have the same DataType.
  • OutputTensor and OutputZeroPointTensor must have the same DataType.
  • ATensor and AZeroPointTensor must have the same DataType.

Tensor support

DML_FEATURE_LEVEL_4_0 and above

Tensor                | Kind           | Supported dimension counts | Supported data types
ATensor               | Input          | 2 to 4                     | INT8, UINT8
AScaleTensor          | Input          | 1 to 4                     | FLOAT32
AZeroPointTensor      | Optional input | 1 to 4                     | INT8, UINT8
BTensor               | Input          | 2 to 4                     | INT8, UINT8
BScaleTensor          | Input          | 1 to 4                     | FLOAT32
BZeroPointTensor      | Optional input | 1 to 4                     | INT8, UINT8
OutputScaleTensor     | Input          | 1 to 4                     | FLOAT32
OutputZeroPointTensor | Optional input | 1 to 4                     | INT8, UINT8
OutputTensor          | Output         | 2 to 4                     | INT8, UINT8

DML_FEATURE_LEVEL_2_1 and above

Tensor                | Kind           | Supported dimension counts | Supported data types
ATensor               | Input          | 4                          | INT8, UINT8
AScaleTensor          | Input          | 4                          | FLOAT32
AZeroPointTensor      | Optional input | 4                          | INT8, UINT8
BTensor               | Input          | 4                          | INT8, UINT8
BScaleTensor          | Input          | 4                          | FLOAT32
BZeroPointTensor      | Optional input | 4                          | INT8, UINT8
OutputScaleTensor     | Input          | 4                          | FLOAT32
OutputZeroPointTensor | Optional input | 4                          | INT8, UINT8
OutputTensor          | Output         | 4                          | INT8, UINT8

Requirements

Requirement              | Value
Minimum supported client | Windows 10 Build 20348
Minimum supported server | Windows 10 Build 20348
Header                   | directml.h