Condividi tramite


Seek Predicates

Before SQL Server can perform an index seek, it must determine whether the keys of the index are suitable for evaluating a predicate in the query.

Single column indexes

Single column indexes are fairly straightforward.  SQL Server can use single column indexes to answer most simple comparisons including equality and inequality (greater than, less than, etc.) comparisons.  More complex expressions such as functions over a column and “like” predicates with a leading wildcard character will generally prevent SQL Server from using an index seek.

For example, suppose we have a single column index on a column “a.”  We can use this index to seek on these predicates:

  • a = 3.14
  • a > 100
  • a between 0 and 99
  • a like ‘abc%’
  • a in (2, 3, 5, 7)

However, we cannot use it to seek on these predicates:

  • ABS(a) = 1
  • a + 1 = 9
  • a like ‘%abc’

Multi-column indexes

Multi-column indexes are slightly more complex.  With a multi-column index, the order of the keys matters.  It determines the sort order of the index and it affects the set of seek predicates that SQL Server can evaluate using the index.

For an easy way to visualize why order matters, think about the phone book.  The phone book is like an index with the keys (last name, first name).  The contents of the phone book are sorted by last name and it is easy to look someone up if we know their last name.  However, if we have only a first name, it is very difficult to get a list of people with that name.  We would need another phone book sorted on first name.

In the same way, if we have an index on two columns, we can only use the index to satisfy a predicate on the second column if we have an equality predicate on the first column.  Even if we cannot use the index to satisfy the predicate on the second column, we may be able to use it on the first column.  In this case, we introduce a residual predicate for the predicate on the second column.  This is the same residual predicate that we use for scans.

For example, suppose we have a two column index on column “a” and “b.”  We can use this index to seek on any of the predicates that worked on the single column index.  We can also use it to seek on these additional predicates:

  • a = 3.14 and b = ‘pi’
  • a = ‘xyzzy’ and b <= 0

For the next set of examples, we can use the index to satisfy the predicate on column a, but not on column b.  In these cases, we need a residual predicate.

  • a > 100 and b > 100
  • a like ‘abc%’ and b = 2

Finally, we cannot use the index to seek on the next set of predicates as we cannot seek even on column a.  In these cases, we must use a different index (e.g., one where column b is the leading column) or we must use a scan with a residual predicate.

  • b = 0
  • a + 1 = 9 and b between 1 and 9
  • a like ‘%abc’ and b in (1, 3, 5)

A slightly more concrete example

Consider the following schema:

create table person (id int, last_name varchar(30), first_name varchar(30))
create unique clustered index person_id on person (id)
create index person_name on person (last_name, first_name)

Here are three queries and their corresponding text showplan.  The first query seeks on both columns of the person_name index.  The second query seeks on the first column only and uses a residual predicate to evaluate first_name.  The third query cannot seek and uses a scan with a residual predicate.

select id from person where last_name = 'Doe' and first_name = 'John'
|--Index Seek(OBJECT:([person].[person_name]), SEEK:([person].[last_name]='Doe' AND [person].[first_name]='John'))

select id from person where last_name > 'Doe' and first_name = 'John'
|--Index Seek(OBJECT:([person].[person_name]), SEEK:([person].[last_name] > 'Doe'), WHERE:([person].[first_name]='John'))

select id from person where last_name like '%oe' and first_name = 'John'
|--Index Scan(OBJECT:([person].[person_name]), WHERE:([person].[first_name]='John' AND [person].[last_name] like '%oe'))

Note:  In case you are trying to reproduce these plans, in this and some of my prior examples, I’ve used index hints (which I haven’t shown) to ensure that I get the plan I want to illustrate without having to insert data into the table.

A bit more about index keys

In most cases, the index keys are the set of columns that you specify in the create index statement.  However, when you create a non-unique non-clustered index on a table with a clustered index, we append the clustered index keys to the non-clustered index keys if they are not explicitly part of the non-clustered index keys.  You can seek on these implicit keys just as if you specified them explicitly.

For example, given this schema:

create table T_heap (a int, b int, c int, d int, e int, f int)
create index T_heap_a on T_heap (a)
create index T_heap_bc on T_heap (b, c)
create index T_heap_d on T_heap (d) include (e)
create unique index T_heap_f on T_heap (f)

 

create table T_clu (a int, b int, c int, d int, e int, f int)
create unique clustered index T_clu_a on T_clu (a)
create index T_clu_b on T_clu (b)
create index T_clu_ac on T_clu (a, c)
create index T_clu_d on T_clu (d) include (e)
create unique index T_clu_f on T_clu (f)

 

The key columns and covered columns for each index are:

Index

Key Columns

Covered Columns

T_heap_a

a

a

T_heap_bc

b, c

b, c

T_heap_d

d

d, e

T_heap_f

f

f

T_clu_a

a

a, b, c, d, e

T_clu_b

b, a

a, b

T_clu_ac

a, c

a, c

T_clu_d

d, a

a, d, e

T_clu_f

f

a, f

Note that each of the non-clustered indexes on T_clu include the clustered index key column a with the exception of T_clu_f which is a unique index.

To be continued one last time …

I’m planning one more post about scans and seeks.  I’ll write a bit more about some of the tradeoffs that SQL Server considers when deciding which index to use and whether to perform a scan, a seek, or a seek with a bookmark lookup.

Comments

  • Anonymous
    July 13, 2006
    The optimizer must choose an appropriate “access path” to read data from each table referenced in a query.&amp;nbsp;...

  • Anonymous
    August 11, 2006
    Craig, the behaviour you describe at the very end of this post differs from my previous understanding of non-clustered index architecture.  Namely, unique non-clustered indexes do not have the clustered index key columns incorporated into the index.  Is this new in SQL Server 2005?  The latest version of BOL does not mention this differentiation between unique and non-unique non-clustered indexes.  The 'uniqueifier' that is generated when the clustered key is non-unique is mentioned though.

  • Anonymous
    August 11, 2006
    There is no change between SQL 2000 and 2005.  Unique non-clustered indexes do NOT include the clustering key as a key column but DO include it as a covered column.  That is, you cannot seek on the clustering key, but you can fetch it from the index.  This is because the clustering key is stored only in the leaf pages of the B-tree; it is not stored in the non-leaf pages.  We need the clustering key to do bookmark lookups.  There is no need to seek on the clustering key since a seek on the unique index key can produce at most one row.  (In SQL 2000, we actually can generate a seek on the clustering key, but this is implemented internally as a residual predicate not as a seek.)

  • Anonymous
    August 24, 2006
    Many thanks for the clarification.  I did some testing after you posted your reply, examining DBCC PAGE dumps of node-level and leaf-level non clustered index pages, and found the row structures to be as you describe.  It makes absolute sense when you think about the difference between a unique and non-unique key.  Revisiting Chapter 8 of Kalen Delaney's Inside SQL Server 2000 book is also a great help on this particular subject.  Keep up the great work on this blog, it's much appreciated.

  • Anonymous
    August 05, 2008
    In my last post , I looked at how SQL Server 2008 handles scans on partitioned tables. I explained that

  • Anonymous
    January 22, 2010
    I was under the impression that using the:    a in (2, 3, 5, 7) form would result in poor use of indexes.  Is this mistaken?   Great article!

  • Anonymous
    January 22, 2010
    "a in (2, 3, 5, 7)" is equivalent to writing "a = 2 or a = 3 or a = 5 or a = 7".  Both syntaxes give the same plan and (if an index seek plan is possible and chosen by the optimizer) will result in four separate index seeks (though all four of these seeks these will generally be implemented using a single index seek operator). Craig

  • Anonymous
    August 11, 2010
    in: select id from person where cola > @a and colb > @b, you get index seek only on cola, even if colb is also part of index. colb > @b is put in residue predicate. To have colb also used in seek predicate, you need to write it as cola = @a or cola > @a and colb > @b, the optimizer can recognise it as a range, this is good. however, if you add more columns to the condition, like colc and cold, the optimizer will fail to recognise it as range, and start to use table scan. That's a bad thing. But I found a way to get around of this drawback, I just split the condition into multiple branches combined by several 'or' operators, then compiler will be able to recognize as a seek with multiple ranges, albeit at the cost of less efficient expression.

  • Anonymous
    August 12, 2010
    basically, the question boils down to how to pass a composite index bookmark to engine, such that the compiler recognize it as  "x > bookmark_of_x" where x represnet a composite index, i.e.having more than two columns.

  • Anonymous
    August 16, 2010
    Unfortunately, SQL Server does not support a syntax for "composite" comparisons.

  • Anonymous
    April 08, 2011
    Nice article.

  • Anonymous
    June 09, 2014
    Great Artcile,It's really simple to understand a very complex operation.