Condividi tramite


Real-Time Operational Analytics: Simple example using nonclustered clustered columnstore index (NCCI) in SQL Server 2016

The previous blog https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/02/29/real-time-operational-analytics-using-nonclustered-columnstore-index/ described how you can use updateable nonclustered columnstore index (NCCI) for real-time operational analytics. Let us now take an example of how NCCI can be used on a regular rowstore table in the context of an Order Management application. The key table for this application is ‘orders’ that tracks customer information, purchase price and the order status. It is expected that over time a large number of rows will have the order status as ‘Order Received’ (i.e. the order has been received by the customer. We don’t expect any more changes to the order once it has been received by the customer except when customer is not happy with the purchase and wants to return it. So for all practical purposes, once the order has been received by the customer, it can be considered that this row is now ‘cold’.   create table orders ( AccountKey int not null, customername nvarchar (50), OrderNumber bigint, PurchasePrice decimal (9,2), OrderStatus smallint not NULL, OrderStatusDesc nvarchar (50))

-- OrderStatusDesc
-- 0 => 'Order Started'
-- 1 => 'Order Closed'
-- 2 => 'Order Paid'
-- 3 => 'Order Fullfillment Wait'
-- 4 => 'Order Shipped'
-- 5 => 'Order Received'

Let us also create a clustered index on ‘OrderStatus’
create clustered index orders_ci on orders(OrderStatus)
Now we load 3 million rows with the data pattern that 95% of the orders have already been received by the customer.
-- insert into the main table load 3 million rows declare @outerloop int = 0 declare @i int = 0 declare @purchaseprice decimal (9,2) declare @customername nvarchar (50) declare @accountkey int declare @orderstatus smallint declare @orderstatusdesc nvarchar(50) declare @ordernumber bigint while (@outerloop < 3000000) begin Select @i = 0 begin tran while (@i < 2000) begin set @ordernumber = @outerloop + @i set @purchaseprice = rand() * 1000.0 set @accountkey = convert (int, RAND ()*1000) set @orderstatus = convert (smallint, RAND()*100) if (@orderstatus >= 5) set @orderstatus = 5

set @orderstatusdesc  =
case @orderstatus
WHEN 0 THEN  'Order Started'
WHEN 1 THEN  'Order Closed'
WHEN 2 THEN  'Order Paid'
WHEN 3 THEN 'Order Fullfillment'
WHEN 4 THEN  'Order Shipped'
WHEN 5 THEN 'Order Received'
END
insert orders values (@accountkey,
(convert(varchar(6), @accountkey) + 'firstname'),@ordernumber, @purchaseprice,@orderstatus, @orderstatusdesc)

set @i += 1;
end
commit

set @outerloop = @outerloop + 2000
set @i = 0
end
go

Now let us create a nonclustered columnstore index as follows. Note, that it is just a DDL operation and similar to any other btree index that you would create on a rowstore table. No changes to the application needed. Unlike in earlier releases of SQL Server, the NCCI is updateable in SQL Server 2016 so your transaction workload will continue to run.
--create NCCI CREATE NONCLUSTERED COLUMNSTORE INDEX orders_ncci ON orders  (accountkey, customername, purchaseprice, orderstatus, orderstatusdesc)

Let us load another 200k orders but these orders are not ‘yet’ received by the customers. Here is the script that I used --insert additional 200k rows declare @outerloop int = 3000000 declare @i int = 0 declare @purchaseprice decimal (9,2) declare @customername nvarchar (50) declare @accountkey int declare @orderstatus smallint declare @orderstatusdesc nvarchar(50) declare @ordernumber bigint while (@outerloop < 3200000) begin Select @i = 0 begin tran while (@i < 2000) begin set @ordernumber = @outerloop + @i set @purchaseprice = rand() * 1000.0 set @accountkey = convert (int, RAND ()*1000) set @orderstatus = convert (smallint, RAND()*5) set @orderstatusdesc = case @orderstatus WHEN 0 THEN 'Order Started' WHEN 1 THEN 'Order Closed' WHEN 2 THEN 'Order Paid' WHEN 3 THEN 'Order Fullfillment' WHEN 4 THEN 'Order Shipped' WHEN 5 THEN 'Order Received' END insert orders values (@accountkey,(convert(varchar(6), @accountkey) + 'firstname'), @ordernumber, @purchaseprice, @orderstatus, @orderstatusdesc) set @i += 1; end commit set @outerloop = @outerloop + 2000 set @i = 0 end go
If we look at the details on the columnstore index, you will see that 3 million rows are compressed while the new 200k rows are in the delta rowgroup
-- look at the rowgroups select object_name(object_id), index_id, row_group_id, delta_store_hobt_id, state_desc, total_rows, trim_reason_desc, transition_to_compressed_state_desc from sys.dm_db_column_store_row_group_physical_stats where object_id = object_id('orders') rowgroup output

We will now run couple of analytics queries and compare the performance if NCCI was not there. These queries are simple and are not representative of the common real-time operational scenario involving multi-table joins but the intent here is to show you that query optimizer picks NCCI for analytics queries and provides significant speed up over traditional btree indexes. Also note, that the difference in performance will only widen as you add more data. Example-1: --run simple query select max (PurchasePrice) from orders
The query plan is as follows showing that query optimizer chose NCCI.

simple query plan

When running the same query without using NCCI, the query took much longer to run (on my machine, it was 12x slower)
-- run the query without using NCCI select max (PurchasePrice) from orders option (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)

Example-2: Let us now consider a more complex query
-- a more complex query select top 5 customername, sum (PurchasePrice), Avg (PurchasePrice) from orders where purchaseprice > 90.0 and OrderStatus=5 group by customername
Here is the query plan for the query using NCCI. One thing to note the aggregate got pushed down to the scan node, More on this when we cover columnstore index performance improvements in SQL Server 2016

complex-query-plan

-- a more complex query without NCCI
select top 5 customername, sum (PurchasePrice), Avg (PurchasePrice)
from orders
where purchaseprice > 90.0 and OrderStatus = 5
group by customername
option (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)

Here is the run time information for both queries. Note that the query with NCCI ran 25x faster  and took 1/60th CPU resources

complex-query-runtime

Key take aways from this blog are (1) you can enable real-time operational analytics without any changes to your transactional workload (2) SQL Query Optimizer will choose NCCI for analytics queries automatically based on the cost model. In the next blog, I will show you filtered NCCI that minimizes the impact on the OLTP workload but still provides real-time operational analytics

Thanks

Sunil Agarwal