Share via


APS Blocked Partition Switch

 

In SQL Server, when you perform a partition switch, a schema lock is acquired briefly to do the operation. This can get blocked by read operations that have a schema-s lock on the table. APS works a little differently as a lot of the locking is controlled within the PDW code before the request even gets sent to the SQL Server. The same result happens, though, and you cannot always use the same techniques as you would in SQL Server to mitigate it (blocking detection, managed lock priority, etc). You need to look for the process to be in a queued status within PDW. You can see this easily in the sys.dm_pdw_lock_waits DMV.

To show demonstrate, I created a partitioned table from the FactInternetSales table in AdventureworksPDW2012 as well as an empty partition aligned table to swap partitions:
[sql]
CREATE TABLE FactInternetSale_PartitionSwapTest
WITH
(
CLUSTERED COLUMNSTORE INDEX,
DISTRIBUTION = HASH(ProductKey),
PARTITION
(
OrderDateKey RANGE RIGHT FOR VALUES
(
20000101,20010101,20020101,20030101,20040101,20050101,20060101,20070101,20080101,20090101,
20100101,20110101,20120101,20130101,20140101,20150101,20160101,20170101,20180101,20190101,
20200101,20210101,20220101,20230101,20240101,20250101,20260101,20270101,20280101,20290101
)
)
)
AS SELECT *
FROM FactInternetSales;

--create shadw table for partition swap

CREATE TABLE [dbo].[FactInternetSale_PartitionSwapTest_AUX] (
[ProductKey] int NOT NULL,
[OrderDateKey] int NOT NULL,
[DueDateKey] int NOT NULL,
[ShipDateKey] int NOT NULL,
[CustomerKey] int NOT NULL,
[PromotionKey] int NOT NULL,
[CurrencyKey] int NOT NULL,
[SalesTerritoryKey] int NOT NULL,
[SalesOrderNumber] nvarchar(20) COLLATE Latin1_General_100_CI_AS_KS_WS NOT NULL,
[SalesOrderLineNumber] tinyint NOT NULL,
[RevisionNumber] tinyint NOT NULL,
[OrderQuantity] smallint NOT NULL,
[UnitPrice] money NOT NULL,
[ExtendedAmount] money NOT NULL,
[UnitPriceDiscountPct] float NOT NULL,
[DiscountAmount] float NOT NULL,
[ProductStandardCost] money NOT NULL,
[TotalProductCost] money NOT NULL,
[SalesAmount] money NOT NULL,
[TaxAmt] money NOT NULL,
[Freight] money NOT NULL,
[CarrierTrackingNumber] nvarchar(25) COLLATE Latin1_General_100_CI_AS_KS_WS NULL,
[CustomerPONumber] nvarchar(25) COLLATE Latin1_General_100_CI_AS_KS_WS NULL
)
WITH (CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = HASH([ProductKey]), PARTITION ([OrderDateKey] RANGE RIGHT FOR VALUES (20000101, 20010101, 20020101,
20030101, 20040101, 20050101, 20060101, 20070101, 20080101, 20090101, 20100101, 20110101, 20120101, 20130101, 20140101, 20150101, 20160101, 20170101,
20180101, 20190101, 20200101, 20210101, 20220101, 20230101, 20240101, 20250101, 20260101, 20270101, 20280101, 20290101)));
[/sql]

To create the blocking situation, in another session, I ran the following to keep the select active:
[sql]
BEGIN TRANSACTION
SELECT * FROM FactInternetSale_PartitionSwapTest
[/sql]

Then, in another session, run the ALTER TABLE Statement. It will just run indefinitely:
[sql]
ALTER TABLE FactInternetSale_PartitionSwapTest SWITCH PARTITION 26 to FactInternetSale_PartitionSwapTest_AUX PARTITION 26
[/sql]

You can see that it is in a queue'd status by querying the sys.dm_pdw_waits DMV:
[sql]
select * from sys.dm_pdw_waits where state='queued'
[/sql]

 

How can I tell what is causing it to be queued? You can join back to the DMV in order to get the blocker and waiter information. In this query I also filtered on the waiter containing SWITCH within the command to capture only queued sessions that are performing partition switch operations:

[sql]
select
Q.object_name ObjectName,
Q.session_id as QueuedSession,
Q.request_id QueuedQID,
datediff(ms,Q.request_time, getdate()) as WaitTimeMS,
QR.command as QueuedCommand,
B.session_id as BlockerSession,
B.request_id BlockerQID,
B.type as BlockerLockType ,
BR.Total_elapsed_time,
BR.Start_time,
BR.End_time,
BR.command as BlockerCommand
from sys.dm_pdw_lock_waits Q
inner join sys.dm_pdw_lock_waits B
on Q.object_name=B.object_name
inner join sys.dm_pdw_exec_requests QR
on Q.request_id = QR.request_id
inner join sys.dm_pdw_exec_requests BR
on B.request_id = BR.request_id
where Q.State='Queued' and B.State='Granted' and Q.Type='Exclusive' and QR.command like '%SWITCH%'
[/sql]

 

 

Now, if you know that you want to perform some action, you can programmatically do that. In this case, I am going to automatically kill any sessions that have been running over 5 minutes that are blocking my partition swap:

[sql]
CREATE TABLE #BlockedXLocks
WITH
(DISTRIBUTION=ROUND_ROBIN,
LOCATION=USER_DB)
AS
select
Q.object_name ObjectName,
Q.session_id as QueuedSession,
Q.request_id QueuedQID,
datediff(ms,Q.request_time, getdate()) as WaitTimeMS,
QR.command as QueuedCommand,
B.session_id as BlockerSession,
B.request_id BlockerQID,
B.type as BlockerLockType ,
BR.Total_elapsed_time,
BR.Start_time,
BR.End_time,
BR.command as BlockerCommand
from sys.dm_pdw_lock_waits Q
inner join sys.dm_pdw_lock_waits B
on Q.object_name=B.object_name
inner join sys.dm_pdw_exec_requests QR
on Q.request_id = QR.request_id
inner join sys.dm_pdw_exec_requests BR
on B.request_id = BR.request_id
where Q.State='Queued' and B.State='Granted' and Q.Type='Exclusive' and QR.command like '%SWITCH%'
--AND CRITERIA TO KILL - WAIT Time, etc)
--and BR.Total_elapsed_time > 300000 -- the select has been running for more than 5 min BUT this does NOT work in a case of a transaction
and datediff(mi, BR.Start_Time, getdate()) >=5

WHILE ((SELECT count(1) from #BlockedXLocks) > 0)
BEGIN
DECLARE @BlockerSID nvarchar(10) = (SELECT top 1 BlockerSession FROM #BlockedXLocks)
DECLARE @sql_code nvarchar(1000) = ( 'kill ''' + @BlockerSID + '''' )
--print @sql_code
EXEC sp_executeSQL @sql_code

DELETE FROM #BlockedXLocks where BlockerSession = @BlockerSID

END
[/sql]

 

Now you have a way to keep long running reports from blocking your load processes that are trying to switch in new partitions into your fact table. Typically, the threshold would be longer than 5 minutes, or you may want to filter on a certain user account (non service account possibly), etc. There are a ton of options to help you decide what is a session that should be killed and what shouldn't.