Generate unique identifiers in a warehouse table in Microsoft Fabric
Applies to: ✅ Warehouse in Microsoft Fabric
It's a common requirement in data warehouses to assign a unique identifier to each row of a table. In SQL Server-based environments that's typically done by creating an identity column in a table, however this feature isn't supported in a warehouse in Microsoft Fabric. Instead, you'll need to use a workaround technique.
Workaround technique
This article describes a workaround technique that generates unique identifiers in a warehouse table.
Step 1: Create a table with an identifier column
First, you should create a table that includes a column that stores unique identifier values. The column data type should be set to either int or bigint, depending on the volume of data you expect to store. You should also define the column as NOT NULL
to ensure that every row is assigned an identifier.
The following code creates an example table named Orders_with_Identifier
in the dbo
schema.
--Drop a table named 'Orders_with_Identifier' in schema 'dbo', if it exists
IF OBJECT_ID('[dbo].[Orders_with_Identifier]', 'U') IS NOT NULL
DROP TABLE [dbo].[Orders_with_Identifier];
GO
CREATE TABLE [dbo].[Orders_with_Identifier] (
[Row_ID] BIGINT NOT NULL,
[O_OrderKey] BIGINT NULL,
[O_CustomerKey] BIGINT NULL,
[O_OrderStatus] VARCHAR(1) NULL,
[O_TotalPrice] DECIMAL(15, 2) NULL,
[O_OrderDate] DATE NULL,
[O_OrderPriority] VARCHAR(15) NULL,
[O_Clerk] VARCHAR (15) NULL,
[O_ShipPriority] INT NULL,
[O_Comment] VARCHAR (79) NULL
);
GO
Step 2: Determine the last identifier value
Before you insert rows into the table, you need to determine the last identifier value stored in the table. You can do that by retrieving the maximum identifier value. This value should be assigned to a variable so you can refer to it when you insert table rows (in the next step).
The following code assigns the last identifier value to a variable named @MaxID
.
--Assign the last identifier value to a variable
--If the table doesn't contain any rows, assign zero to the variable
DECLARE @MaxID AS BIGINT;
IF EXISTS(SELECT * FROM [dbo].[Orders_with_Identifier])
SET @MaxID = (SELECT MAX([Row_ID]) FROM [dbo].[Orders_with_Identifier]);
ELSE
SET @MaxID = 0;
Step 3: Insert unique identifier values
When you insert rows into the table, unique and sequential numbers are computed by adding the value of the @MaxID
variable to the values returned by the ROW_NUMBER function. This function is a window function that computes a sequential row number starting with 1.
The following code—which is run in the same batch as the script in step 2—inserts rows into the Orders_with_Identifier
table. The values for the Row_ID
column are computed by adding the @MaxID
variable to values returned by the ROW_NUMBER
function. The function must have an ORDER BY
clause, which defines the logical order of the rows within the result set. However when it's set to SELECT NULL
, no logical order is imposed, meaning identifier values are arbitrarily assigned. This ORDER BY
clause results in a faster execution time.
--Insert new rows with unique identifiers
INSERT INTO [dbo].[Orders_with_Identifier]
SELECT
@MaxID + ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS [Row_ID],
[src].[O_OrderKey],
[src].[O_CustomerKey],
[src].[O_OrderStatus],
[src].[O_TotalPrice],
[src].[O_OrderDate],
[src].[O_OrderPriority],
[src].[O_Clerk],
[src].[O_ShipPriority],
[src].[O_Comment]
FROM [dbo].[Orders] AS [src];