Condividi tramite


Partition Elimination in SQL Server 2005

Lubor Kollar

 

Partition elimination is very important when SQL Server executes queries against partitioned tables or partitioned views. In general, SQL Server is doing an excellent job of not scanning the partitions that are excluded by some predicates. Recently we have discovered one scenario where the partition elimination does not work against partitioned tables in SQL Server 2005 and this blog describes the conditions leading to the problem as well as easy workarounds. Additionally you will learn how to discover if partition elimination works for your query or not. You will also learn what is static and dynamic partition elimination.

 

The most reliable way to find out if partition elimination happens in your query is to use the SET STATISTICS PROFILE ON command, run the query and investigate the output. But let me start with building our example table:

 

create partition function PF1 (int) as range for values (100,200,300,400);

create partition scheme PS1 as partition PF1 all to ([PRIMARY]);

go

create table t1 (a int, b int) on PS1 (a);

go

declare @i int;

set @i=1;

set nocount on;

while (@i<22)

begin;

insert into t1 values (20*@i, @i);

set @i=@i+1;

end;

go

 

The following query shows distribution of all rows in table t1 across the five partitions:

 

select $partition.PF1(a) [Partition Number], a, b from t1

 

Partition Number a b

1 20 1

1 40 2

1 60 3

1 80 4

1 100 5

2 120 6

2 140 7

2 160 8

2 180 9

2 200 10

3 220 11

3 240 12

3 260 13

3 280 14

3 300 15

4 320 16

4 340 17

4 360 18

4 380 19

4 400 20

5 420 21

 

First, I will show 5 examples how partition elimination works correctly in SQL Server 2005 and I will explain the difference between the static and dynamic partition elimination. Here is a small batch and we will investigate the output later below

 

set statistics profile on;

declare @i1 int;

declare @i2 int;

set @i1=50;

set @i2=250 ;

select * from t1 where a<50 or a>450; -- (Q1) only two partitions are scanned

select * from t1 where a in (50,450); -- (Q2) only two partitions are scanned

select * from t1 where a<@i2 and a>100; -- (Q3) only two partitions are scanned

select * from t1 where a=100;-- (Q4) only one partition is scanned - static partition elimination

select * from t1 where a=@i2; -- (Q5) only one partition is scanned - dynamic partition elimination

set statistics profile off;

You will see the result set followed by the showplan with the columns “Rows” and “Executes” in front of it for each of the four queries above. For the query Q1

select * from t1 where a<50 or a>450

the showplan output is

Rows Executes StmtText

2 1 select * from t1 where a<50 or a>450; -- (Q1) only two partitions

2 1 |--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1004]) PAR

2 1 |--Constant Scan(VALUES:(((1)),((5))))

2 2 |--Table Scan(OBJECT:([test].[dbo].[t1]), WHERE:([test].[d

The scan of the partitioned table becomes a Nested Loops looping over the partitions. And we see already in the Constant Scan that we will be scanning only partitions 1 and 5. The “Executes” value 2 below confirms we did 2 scans of an individual partition.

The IN predicate “a in (50,450)” in the Q2 is turned into “ a = 50 OR a = 450”, and SQL Server will access only the two partitions, 1, and 5, containing all qualifying rows

Rows Executes StmtText

0 1 select * from t1 where a in (50,450); -- (Q2) only two partitions are sc

0 1 |--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1004]) PAR

2 1 |--Constant Scan(VALUES:(((1)),((5))))

0 2 |--Table Scan(OBJECT:([test].[dbo].[t1]), WHERE:([test].[dbo].[t

The second query, Q3,

select * from t1 where a<@i2 and a>100

yields

Rows Executes StmtText

7 1 select * from t1 where a<@i2 and a>100; -- (Q3) only two partitions

7 1 |--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1004]) PAR

2 1 |--Filter(WHERE:([PtnIds1004]<=RangePartitionNew([@i2],(0)

4 1 | |--Constant Scan(VALUES:(((2)),((3)),((4)),((5))))

7 2 |--Table Scan(OBJECT:([test].[dbo].[t1]), WHERE:([test].[d

In the above plan we see that we have statically eliminated the partition 1 from the Constant Scan because of the a>100 predicate and we are using the predicate a<@i2 to potentially eliminate more partitions using the Filter above the Constant Scan. The later is dynamic elimination because it depends on the run t

Comments

  • Anonymous
    March 15, 2006
    The comment has been removed

  • Anonymous
    March 23, 2006
    great article, but what puzzeles me is

    i have manged to create the following repro that demonstrates a partition elimination problem with 'OR'
    and there is NO paramtrization in the queries ,this repro is from our production environment
    and the partitioned table actually contains 100 million rows ,but the query plans are the same even with empty table,the repro works both on RTM and SP1 MARCH CTP

    /****** Object:  PartitionFunction [YearRange]    Script Date: 03/23/2006 16:36:25 /
    CREATE PARTITION FUNCTION YearRange
    AS RANGE RIGHT FOR VALUES (1995, 1996, 1997, 1998, 1999, 2000,
    2001, 2002, 2003, 2004, 2005, 2006, 2007)
    GO
    /
    Object:  PartitionScheme [YEARS_PS_SHR]    Script Date: 03/23/2006 16:36:01 /
    CREATE PARTITION SCHEME [YEARS_PS_SHR]
    AS PARTITION [YearRange]
    ALL TO ([PRIMARY])
    GO

             
    /
    Object:  Table [dbo].[bth_fact]    Script Date: 03/23/2006 16:39:36 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    SET ANSI_PADDING ON
    GO
    CREATE TABLE [dbo].[bth_fact](
    [rowid] [bigint] NOT NULL,
    [teudat_zehut_src] varchar COLLATE Hebrew_CI_AS NOT NULL,
    [teudat_zehut] varchar(10) NOT NULL,
    [kod_beit_ch] [int] NOT NULL,
    [kod_kupa] [tinyint] NOT NULL,
    [kod_peula_bdika] varchar COLLATE Hebrew_CI_AS NOT NULL,
    [kod_peula_old] varchar COLLATE Hebrew_CI_AS NULL,
    [kod_sug_sherut] [tinyint] NOT NULL,
    [kod_machl_mirp] [int] NOT NULL,
    [shnat_chiuv]smallint NOT NULL,
    [chodesh_chiuv] tinyint NOT NULL,
    [rivon_chiuv] tinyint NOT NULL,
    [date_knisa] [smalldatetime] NOT NULL,
    [date_shichrur] [smalldatetime] NULL,
    [kmt_yamim_tipulim] [smallint] NOT NULL,
    [alut] [decimal](11, 2) NOT NULL,
    [tofes17] varchar COLLATE Hebrew_CI_AS NULL,
    [mispar_ishp_bikur] [bigint] NOT NULL,
    [kod_machl_mirp_src] varchar COLLATE Hebrew_CI_AS NULL,
    [zihui_ishpuz_bikur] [bigint] NOT NULL,
    [kod_sug_kabala] [tinyint] NULL,
    [kod_sug_knisa] [tinyint] NULL,
    [kod_sug_shichrur] [tinyint] NOT NULL,
    [kod_sug_chole] char COLLATE Hebrew_CI_AS NOT NULL,
    [kod_sug_cheshbon] [tinyint] NOT NULL,
    [kod_kart_taktziv] [tinyint] NOT NULL,
    [kod_mosad_memamen] [smallint] NOT NULL,
    [gil] [smallint] NOT NULL,
    [kod_machoz_beit_ch] tinyint NOT NULL,
    [kod_seif_tkz_new] [int] NOT NULL,
    [kod_seif_taktzivi] [int] NOT NULL,
    [alut_hanacha] [decimal](11, 2) NOT NULL,
    [mispar_peul_diff] [smallint] NOT NULL,
    [hist_gk] [int] NOT NULL,
    [shem_prati] varchar COLLATE Hebrew_CI_AS NULL,
    [shem_mishpacha] varchar COLLATE Hebrew_CI_AS NULL,
    [kod_kart_tkz_new] [tinyint] NOT NULL,
    [kod_machoz_mevu] tinyint NOT NULL,
    [kod_peula_bdk_src] varchar COLLATE Hebrew_CI_AS NULL,
    [kod_sug_tz] [tinyint] NOT NULL,
    [sw_global] char COLLATE Hebrew_CI_AS NOT NULL,
    [minun] [int] NULL,
    [peula_mb_gk] [int] NOT NULL,
    [kod_bth_simul] [int] NOT NULL,
    [kod_source] [tinyint] NOT NULL,
    [shnat_knisa] [smallint] NOT NULL,
    [kod_machoz_memamen] [tinyint] NOT NULL,
    [kod_makor] [tinyint] NOT NULL,
    CONSTRAINT [PK_bth_fact] PRIMARY KEY NONCLUSTERED
    (
    [shnat_chiuv] ASC,
    [chodesh_chiuv] ASC,
    [rowid] ASC
    )WITH (IGNORE_DUP_KEY = OFF) ON YEARS_PS_SHR
    ) ON YEARS_PS_SHR

    GO
    SET ANSI_PADDING OFF
    GO
    ALTER TABLE [dbo].[bth_fact]  WITH CHECK ADD  CONSTRAINT [CK_bth_fact_KODS]
    CHECK  (([kod_source]=(1)))

    -------------------

    ---QUERY:

    ---1.good plan

    SET STATISTICS PROFILE ON

    select teudat_zehut,alut_hanacha  
    from      
    dbo.bth_fact  bth
    where      
    bth.shnat_chiuv  in (2003,2005)  

    ---showplan outpt

    0 1 select teudat_zehut,alut_hanacha     from        dbo.bth_fact  bth    where          bth.shnat_chiuv  in (2003,2005) 1 1 0 NULL NULL NULL NULL 1 NULL NULL NULL 0.006573417 NULL NULL SELECT 0 NULL
    0 1  |--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1003]) PARTITION ID:([PtnIds1003])) 1 2 1 Nested Loops Inner Join OUTER REFERENCES:([PtnIds1003]) PARTITION ID:([PtnIds1003]) NULL 1 0 4.18E-06 25 0.006573417 [bth].[teudat_zehut], [bth].[alut_hanacha] NULL PLAN_ROW 0 1
    2 1       |--Constant Scan(VALUES:(((10)),((12)))) 1 3 2 Constant Scan Constant Scan VALUES:(((10)),((12))) NULL 2 0 2.157E-06 11 2.157E-06 [PtnIds1003] NULL PLAN_ROW 0 1
    0 2       |--Table Scan(OBJECT:([AdventureWorks].[dbo].[bth_fact] AS [bth]), WHERE:([AdventureWorks].[dbo].[bth_fact].[shnat_chiuv] as [bth].[shnat_chiuv]=(2003) OR [AdventureWorks].[dbo].[bth_fact].[shnat_chiuv] as [bth].[shnat_chiuv]=(2005)) PARTITION ID:([PtnIds1003])) 1 6 2 Table Scan Table Scan OBJECT:([AdventureWorks].[dbo].[bth_fact] AS [bth]), WHERE:([AdventureWorks].[dbo].[bth_fact].[shnat_chiuv] as [bth].[shnat_chiuv]=(2003) OR [AdventureWorks].[dbo].[bth_fact].[shnat_chiuv] as [bth].[shnat_chiuv]=(2005)) PARTITION ID:([PtnIds1003]) [bth].[teudat_zehut], [bth].[shnat_chiuv], [bth].[alut_hanacha] 1 0.003125 0.0001581 27 0.0034412 [bth].[teudat_zehut], [bth].[shnat_chiuv], [bth].[alut_hanacha] NULL PLAN_ROW 0 2


    ---2. bad plan


    select teudat_zehut,alut_hanacha
    from      
    dbo.bth_fact  bth
    where
    (bth.shnat_chiuv =2003 and bth.chodesh_chiuv in (9,10,11))
    or (bth.shnat_chiuv = 2005 )  


    ---showplan outpt


    0 1 select teudat_zehut,alut_hanacha   from        dbo.bth_fact  bth    where    (bth.shnat_chiuv =2003 and bth.chodesh_chiuv in (9,10,11))    or (bth.shnat_chiuv = 2005 ) 1 1 0 NULL NULL NULL NULL 1 NULL NULL NULL 0.04802274 NULL NULL SELECT 0 NULL
    0 1  |--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1003]) PARTITION ID:([PtnIds1003])) 1 2 1 Nested Loops Inner Join OUTER REFERENCES:([PtnIds1003]) PARTITION ID:([PtnIds1003]) NULL 1 0 4.18E-06 28 0.04802274 [bth].[teudat_zehut], [bth].[alut_hanacha] NULL PLAN_ROW 0 1
    14 1       |--Constant Scan(VALUES:(((1)),((2)),((3)),((4)),((5)),((6)),((7)),((8)),((9)),((10)),((11)),((12)),((13)),((14)))) 1 3 2 Constant Scan Constant Scan VALUES:(((1)),((2)),((3)),((4)),((5)),((6)),((7)),((8)),((9)),((10)),((11)),((12)),((13)),((14))) NULL 14 0 1.4157E-05 11 1.4157E-05 [PtnIds1003] NULL PLAN_ROW 0 1
    0 14       |--Table Scan(OBJECT:([AdventureWorks].[dbo].[bth_fact] AS [bth]), WHERE:([AdventureWorks].[dbo].[bth_fact].[shnat_chiuv] as [bth].[shnat_chiuv]=(2003) AND ([AdventureWorks].[dbo].[bth_fact].[chodesh_chiuv] as [bth].[chodesh_chiuv]=(9) OR [AdventureWorks].[dbo].[bth_fact].[chodesh_chiuv] as [bth].[chodesh_chiuv]=(10) OR [AdventureWorks].[dbo].[bth_fact].[chodesh_chiuv] as [bth].[chodesh_chiuv]=(11)) OR [AdventureWorks].[dbo].[bth_fact].[shnat_chiuv] as [bth].[shnat_chiuv]=(2005)) PARTITION ID:([PtnIds1003])) 1 18 2 Table Scan Table Scan OBJECT:([AdventureWorks].[dbo].[bth_fact] AS [bth]), WHERE:([AdventureWorks].[dbo].[bth_fact].[shnat_chiuv] as [bth].[shnat_chiuv]=(2003) AND ([AdventureWorks].[dbo].[bth_fact].[chodesh_chiuv] as [bth].[chodesh_chiuv]=(9) OR [AdventureWorks].[dbo].[bth_fact].[chodesh_chiuv] as [bth].[chodesh_chiuv]=(10) OR [AdventureWorks].[dbo].[bth_fact].[chodesh_chiuv] as [bth].[chodesh_chiuv]=(11)) OR [AdventureWorks].[dbo].[bth_fact].[shnat_chiuv] as [bth].[shnat_chiuv]=(2005)) PARTITION ID:([PtnIds1003]) [bth].[teudat_zehut], [bth].[alut_hanacha] 1 0.003125 0.0003038857 28 0.0073794 [bth].[teudat_zehut], [bth].[alut_hanacha] NULL PLAN_ROW 0 14



    Any Idea why tis happens  ?
    Danny Ravid
    Ness Technologies
    danny.ravid@ness.com

  • Anonymous
    March 26, 2006
    The comment has been removed

  • Anonymous
    March 26, 2006
    Answer to Danny:
    Danny, thanks a lot for all the details. You are correct - this is another case where SQL Server 2005 does not perform the partition elimination. Luckily the same UNION ALL trick described in my blog works again. So please rewrite your quey as follows:
    select teudat_zehut,alut_hanacha
    from      
    dbo.bth_fact  bth
    where
    (bth.shnat_chiuv =2003 and bth.chodesh_chiuv in (9,10,11))
    UNION ALL
    select teudat_zehut,alut_hanacha
    from      
    dbo.bth_fact  bth
    where
    (bth.shnat_chiuv = 2005 )  

  • Anonymous
    November 27, 2006
    In this post, I’m going to take a look at how query plans involving partitioned tables work. Note that

  • Anonymous
    February 07, 2007
    Hi, Here in Brazil I´m working on MS Gold Certified Partner, and I act in two distinct teams - Data...

  • Anonymous
    April 18, 2007
    Hi, Great aritical, though do you happen to know why the optimizer it NOT grabbing the correct partition when a @varible is used verses a direct value. funny thing is the plan doesn't change between estimated verses actual( you would think it would pick up the correct partition after the exection) in addtional if this actually by design out can you verify that the optimizer really is pulling the correct partition.

  • Anonymous
    April 18, 2007
    The comment has been removed

  • Anonymous
    October 03, 2007
    Lubor: declare @i1 int; set statistics profile on; set @i1=120; select * from t1 where a < @i1; Any reason why the the “Executes” value is 2 here, and not 1? 120 is the first value in Partition#2, thus <120 should only be Partition#1... Thanks for the post, it has provided some much needed direction with Partitioned Tables. ....

  • Anonymous
    October 04, 2007
    I'm not sure what partition function you have in mind... In the one I have in the blog the first boundary value is 100 thus precicate <120 must scan the first two partitions. Or post your Partition Function, Schema, and table to see what you exacly mean.

  • Anonymous
    October 04, 2007
    Ok, I see it now. Its the boundary, now the rows which satisfy. Partition Number     a          b 1                           100     5 2                           120     6 I was thinking (errantly) thinking that because there were only rows which satisfied the <120 condition, it would only scan Part1.  But now I see it has to scan Part2 for 101-119 as well (My conditition overlapping with the boundary definition). Thanks for the followup.

  • Anonymous
    October 04, 2007
    I think it would be interested to see the break down of these types of situations: set statistics profile on; declare @holder table ( hValue int ) insert into @holder (hValue) values ( 20 ) insert into @holder (hValue) values ( 40 ) insert into @holder (hValue) values ( 60 ) --Derived Table Example select a, b  from t1 join ( select hValue as HV from @holder h ) as derived1 on t1.a = derived1.HV --EXISTS Example select a, b  from t1 tOne where EXISTS ( select null from @holder h where h.hValue = tOne.a) I'm still using your examples to learn how to fine-tune my eye for partition elimination.

  • Anonymous
    October 04, 2007
    Man, what would be nice is something like: set partitions profile on; --<this does not exist And then it would just show you: Table   PartitionTouches t1      1 Because it gets cumbersome/confusing sometimes. I guess I can dream. But thank you again for the post...I'd be lost/frustrated without it.

  • Anonymous
    January 15, 2008
    Folks, I have VLDB on SQL2000. Our primary table contains slightly less than 1B records, and I am anticipating 450M records per year. We have implemented distributed view on 2000, and it works reasonably well. As we are migrating to SQL2005, I am not certain of the maturity of partitioning to migrate to that architecture, as well as having everything on a single server. My first reaction is to architect a hybrid solution, where I would use partitioning on the local servers, and then have a distributed view on top of all the partitions. Any thoughts would be appreciated.

  • Anonymous
    January 25, 2008
    Hi, I'm Jungsun Kim, SQL Server MVP in Korea. This is good article, I think. (of course, too old. ^^) I would like to introduce this article to my buddies. so I just translated to Korean and posted in my blog, "http://blog.naver.com/visualdb/50027176087" Thank you.

  • Anonymous
    January 25, 2008
    Answer to clientserverarch: Your approach combining the partitioned tables and partitioned view will work. Before you implement the solution make sure you know WHY you want to partition. Table partitioning is usually introduced for manageability reasons, ability to load data w/out affecting the table access and ability to delete whole partition very fast. Sometimes users parition to enable index rebuild on a single partition or break down the backup/restore process. You should start with clear idea what you want to use it for and go from there. I would also first test it w/out the partitioned view to make sure that you are getting what you hoped for.

  • Anonymous
    January 25, 2008
    Jungsun Kim, thanks a lot for translating to Korean. It is not late at all and we have many customers in Korea - here you can see some: http://www.microsoft.com/casestudies/search.aspx?Keywords=korea&ProTaxID=1273

  • Anonymous
    January 28, 2008
    LuborK, Thanks for your reply. I am planning to distribute the data using both methods, separating the data out by server (perhaps 1-3 servers a year), and further separate out by month into different partitions. I am concerned that MS is going to remove the distributed view functionality in future MSSQL releases, and then I will be back to consolidating data. Thoughts?

  • Anonymous
    January 28, 2008
    There is currently no plan to remove the support for distributed partitioned views in the future releases. That said I cannot ensure it will be supported forever, but similarly I cannot say it about SELECT statement :-).

  • Anonymous
    June 20, 2008
    Is there a plan to release a patch for this issue for SQL Server 2005 or it will be fixed in SQL Server 2008 only? I hope Microsoft is planning to address this issue for SQL Server 2005 because partition elimination not working is the only reason we are not using partitioning in our environment. Unfortunately we cannot use the workarounds suggested in this article because we do not have control over the structure of ad hoc queries generated in our environment.

  • Anonymous
    June 23, 2008
    Kangana, it would be great if you could let us know what kind of queries you are looking for to eliminate partitions. Also please let us know what are the partitioning columns, clustered indexes. Then I can answer your question.

  • Anonymous
    June 25, 2008
    Thanks for the response Lubor. We have a data warehouse environment. The query in its simplest form is: SELECT ColA from FactTableA WHERE  ColA in (select ColB from DimensionTableB where ColC = somevalue ) FactTableA is partitioned on ColA but partition elimination does not work. ColA and ColB are data type int and the subquery returns list of integers. ColC is data type date. FactTableA has a identity int column as its primary and clustered index key. There is a nonclustered index on ColA. This query is generated through a third-party reporting tool Microstrategy. User inputs value of 'somevalue' in a report, SQL query is generated by the reporting tool and then sent to the SQL Server thus resulting in no control over structure of query.

  • Anonymous
    June 25, 2008
    Sorry, query would be more like: SELECT ColA, AmountA, AmountB, AmountC from FactTableA WHERE  ColA in (select ColB from DimensionTableB where ColC = somevalue )

  • Anonymous
    June 29, 2008
    I'm not sure I understand how is your table decleared (with partitioning, uniquness, indexes, etc.). If ColA is the partitioned column and some other column is unique key, then there must be some other index on the unique column that is not partitioned on ColA. If I ignore the above uncertainity and I only assume that the table is partitioned on ColA then we can use several different plans for this query - some of them visiting all partitions some of them not. The following things would encourage using index lookup into the fact table thus avoiding searching all partitions:

  1. create index on ColA and name the additional columns AmountA, AmountB, AmountC as INCLUDE columns in the index OR
  2. create clustered index on ColA But in both cases optimizer may still not choose the "seeks" if it assumes that there exist many rows qualified by the subquery (select ColB from DimensionTableB where ColC = somevalue )in the fact table.
  • Anonymous
    June 30, 2008
    The comment has been removed