Condividi tramite


Aggregation WITH ROLLUP

In this post, I'm going to discuss how aggregation WITH ROLLUP works.  The WITH ROLLUP clause permits us to execute multiple "levels" of aggregation in a single statement.  For example, suppose we have the following fictitious sales data.  (This is the same data that I used for my series of posts on the PIVOT operator.)

CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)
INSERT Sales VALUES(1, 2005, 12000)
INSERT Sales VALUES(1, 2006, 18000)
INSERT Sales VALUES(1, 2007, 25000)
INSERT Sales VALUES(2, 2005, 15000)
INSERT Sales VALUES(2, 2006, 6000)
INSERT Sales VALUES(3, 2006, 20000)
INSERT Sales VALUES(3, 2007, 24000)

We can write a simple aggregation query to compute the total sales by year:

SELECT Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr

As expected, this query returns three rows - one for each year:

 Yr          Sales
----------- ---------------------
2005        27000.00
2006        44000.00
2007        49000.00

The query plan is a simple stream aggregate:

  |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))
       |--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1010]=COUNT_BIG([Sales].[Sales]), [Expr1011]=SUM([Sales].[Sales])))
            |--Sort(ORDER BY:([Sales].[Yr] ASC))
                 |--Table Scan(OBJECT:([Sales]))

Now suppose that we want to compute not just the sale by year but the total sales as well.  We could write a UNION ALL query:

SELECT Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr
UNION ALL
SELECT NULL, SUM(Sales) AS Sales
FROM Sales

This query works and does give the correct result:

 Yr          Sales
----------- ---------------------
2005        27000.00
2006        44000.00
2007        49000.00
NULL        120000.00

However, the query plan performs two scans and two aggregations (one to compute the sales by year and one to compute the total sales):

  |--Concatenation
       |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1023]=(0) THEN NULL ELSE [Expr1024] END))
       |    |--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1023]=COUNT_BIG([Sales].[Sales]), [Expr1024]=SUM([Sales].[Sales])))
       |         |--Sort(ORDER BY:([Sales].[Yr] ASC))
       |              |--Table Scan(OBJECT:([Sales]))
       |--Compute Scalar(DEFINE:([Expr1010]=NULL))
            |--Compute Scalar(DEFINE:([Expr1009]=CASE WHEN [Expr1025]=(0) THEN NULL ELSE [Expr1026] END))
                 |--Stream Aggregate(DEFINE:([Expr1025]=COUNT_BIG([Sales].[Sales]), [Expr1026]=SUM([Sales].[Sales])))
                      |--Table Scan(OBJECT:([Sales]))

We can do better by adding a WITH ROLLUP clause to the original query:

SELECT Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr WITH ROLLUP

This query is simpler to write and uses a more efficient query plan with only a single scan:

  |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1005]=(0) THEN NULL ELSE [Expr1006] END))
       |--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1005]=SUM([Expr1007]), [Expr1006]=SUM([Expr1008])))
            |--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1007]=COUNT_BIG([Sales].[Sales]), [Expr1008]=SUM([Sales].[Sales])))
                 |--Sort(ORDER BY:([Sales].[Yr] ASC))
                      |--Table Scan(OBJECT:([Sales]))

The bottom stream aggregate in this query plan is the same as the stream aggregate in the original non-ROLLUP query.  This aggregation is a normal aggregation and, as such, it can be implemented using a stream aggregate (as in this example) or a hash aggregate (try adding an OPTION (HASH GROUP) clause to the above query).  It can also be parallelized.

The top stream aggregate is a special aggregate that computes the ROLLUP.  (Unfortunately, in SQL Server 2005 there is no way to discern from the query plan that this aggregate implements a ROLLUP.  This issue will be fixed in SQL Server 2008 graphical and XML plans.)  A ROLLUP aggregate is always implemented using stream aggregate and cannot be parallelized.  In this simple example, the ROLLUP stream aggregate merely returns each pre-aggregated input row while maintaining a running total of the Sales column.  After outputting the final input row, the aggregate also returns one additional row with the final sum.  Since SQL lacks a concept of an ALL value, the Yr column is set to NULL for this final row.  If NULL is valid value for Yr, we can identify the ROLLUP row using the GROUPING(Yr) construct.

SELECT
      CASE WHEN GROUPING(Yr) = 0
            THEN CAST (Yr AS CHAR(5))
            ELSE 'ALL'
      END AS Yr,
      SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr WITH ROLLUP

 Yr    Sales
----- ---------------------
2005  27000.00
2006  44000.00
2007  49000.00
ALL   120000.00

We can also compute multiple ROLLUP levels in a single query.  For example, suppose that we want to compute the sales first by employee and then for each employee by year:

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH ROLLUP

 EmpId       Yr          Sales
----------- ----------- ---------------------
1           2005        12000.00
1           2006        18000.00
1           2007        25000.00
1           NULL        55000.00
2           2005        15000.00
2           2006        6000.00
2           NULL        21000.00
3           2006        20000.00
3           2007        24000.00
3           NULL        44000.00
NULL        NULL        120000.00

There are a couple of points worth noting about this query.  First, since the combination of the EmpId and Yr columns is unique, in the absence of the WITH ROLLUP clause, this query would just return the original data.  However, with the WITH ROLLUP clause the query produces a useful result.  Second, the order of the columns in the GROUP BY clause is relevant with the WITH ROLLUP clause.  To see why simply try the same query but reverse the EmpId and Yr columns.  Instead of computing the sales first by employee it will compute the sales first by year.

The query plan for this query is identical to the query plan for the prior query except that it groups on both the EmpId and Yr columns instead of on just the EmpId column.  Like the prior query plan, this query plan includes two stream aggregates: the bottom one which is a normal stream aggregate and the top one which computes the ROLLUP.  This ROLLUP stream aggregate actually computes two running totals: one which computes the total sales for an employee for all years and one which compute the total sales for all employees and all years.  This table shows how the ROLLUP computation proceeds:

EmpId

Yr

SUM(Sales) BY EmpId, Yr

SUM(Sales) BY EmpId

SUM(Sales)

1

2005

12000.00

12000.00

12000.00

1

2006

18000.00

30000.00

30000.00

1

2007

25000.00

55000.00

55000.00

1

NULL

 

55000.00

55000.00

2

2005

15000.00

15000.00

70000.00

2

2006

6000.00

21000.00

76000.00

2

NULL

 

21000.00

76000.00

3

2006

20000.00

20000.00

96000.00

3

2007

24000.00

44000.00

120000.00

3

NULL

 

44000.00

120000.00

NULL

NULL

 

 

120000.00

In my next post, I'll take a look at the WITH CUBE clause.  I'll discuss how it differs from WITH ROLLUP both in terms of function and in terms of its implementation.

Comments

  • Anonymous
    September 27, 2007
    In my last post, I wrote about how aggregation WITH ROLLUP works. In this post, I will discuss how aggregation

  • Anonymous
    September 27, 2007
    Great post - I'm sure many people would consder this "the basics" but it was new to me & VERY useful.Thanks!Mike Knee

  • Anonymous
    October 11, 2007
    In my last two posts, I gave examples of aggregation WITH ROLLUP and CUBE . SQL Server 2008 continues

  • Anonymous
    July 09, 2010
    Very helpful, thanks for writing blog. gr8 work!!! Keep it up

  • Anonymous
    April 19, 2011
    Thanx a lot!!!!

  • Anonymous
    November 21, 2012
    Post really helps to understand the basic use of ROLLUP

  • Anonymous
    June 07, 2013
    Cool article on WITH ROLLUP ..Thanks :)

  • Anonymous
    October 22, 2013
    Very well explained.  It cleared all my doubts. Thanks Craig :)

  • Anonymous
    October 30, 2013
    How can i display the final result as in the column "SUM(Sales) BY EmpId" of above table

  • Anonymous
    October 31, 2013
    Starting with SQL Server 2012, you can use a window aggregate function as follows: SELECT EmpId, Yr, SUM(Sales) OVER (PARTITION BY EmpId ORDER BY Yr) AS Sales FROM Sales HTH Craig

  • Anonymous
    April 07, 2014
    Actually the Aggregate Windows Function SUM(Sales) OVER (PARTITION BY EmpId)   exists since SQL 2005 but without the Order By Waldemar