Condividi tramite


OPTIMIZE FOR UNKNOWN – a little known SQL Server 2008 feature

Using parameterized queries is a well known SQL Server Best Practice. This technique ensures caching and reuse of existing query execution plans (instead of constantly compiling new plans), as well as avoiding SQL injection by mandating that input data be data type safe.

See more about SQL Server parameterization Best Practices here: https://blogs.msdn.com/sqlprogrammability/archive/2007/01/13/6-0-best-programming-practices.aspx

An application that I work with presented me with an interesting dilemma; It wanted to utilize the benefits of plan reuse but the parameter values that the application initially sends to SQL Server are not representative of the values passed in the subsequent re-execution of the statement. SQL Server compiled and cached an optimal plan for the first parameter values. Unfortunately, this had the unintended side-effect of caching a poor execution plan for all subsequent parameter values. To make this clearer let’s look at the following example;

CREATE PROCEDURE AllFromT

@p1 int,

@p2 int

AS

SELECT * FROM t WHERE col1 > @p1 or col2 > @p2 ORDER BY col1

GO

Let’s assume, for simplicities sake, that the table consists of three columns (col1, col2, col3), col1 is unique, col2 has 1000 distinct values, col3 contains a single default value, there are 10,000,000 rows in the table, the clustered index consists of col1, and a nonclustered index exists on col2.

Imagine the query execution plan created for the following initially passed parameters: @P1= 1 @P2=99

These values would result in an optimal queryplan for the statement contained in the stored procedure using the substituted parameters:

SELECT * FROM t WHERE col1 > 1 or col2 > 99 ORDER BY col1

Now, imagine the query execution plan if the initial parameter values were: @P1 = 11890000 and @P2 = 990.

Likewise, an optimal queryplan would be created after substituting the passed parameters:

SELECT * FROM t WHERE col1 > 11890000 or col2 > 990 ORDER BY col1

These two identical parameterized SQL Statements would potentially create and cache very different execution plans due to the difference of the initially passed parameter values. Both queryplans are valid for the parameter values that were passed, but may not be optimal for subsequent invocations of the query. This can be a particular problem if the initially passed parameters are such that the query returns no results.

There are a number of ways to work-around this issue;

· Recompile every time the query is executed using the RECOMPILE hint - This can be very CPU intensive and effectively eliminates the benefits of cached queryplans.

· Unparameterize the query – Not a viable option in most cases due to SQL injection risk.

· Hint with specific parameters using the OPTIMIZE FOR hint (However, what value(s) should the app developer use?) This is a great option if the values in the rows are static, that is; not growing in number, etc. – However in my case the rows were not static.

· Forcing the use of a specific index

· Use a plan guide – Using any of the recommendations above.

SQL Server 2008 provides another alternative: OPTIMIZE FOR UNKNOWN

SQL Server 2008 provides a different alternative; the OPTIMIZE FOR UNKNOWN optimizer hint. This hint directs the query optimizer to use the standard algorithms it has always used if no parameters values had been passed to the query at all. In this case the optimizer will look at all available statistical data to reach a determination of what the values of the local variables used to generate the queryplan should be, instead of looking at the specific parameter values that were passed to the query by the application.

Full documentation of optimizer hints can be found here:

https://msdn.microsoft.com/en-us/library/ms181714(SQL.100).aspx

Example:

@p1=1, @p2=9998,

SELECT * FROM t WHERE col > @p1 or col2 > @p2 ORDER BY col1 OPTION (OPTIMIZE FOR (@p1 UNKNOWN, @p2 UNKNOWN))

In this example, even though the parameters were passed they will not be used to create the queryplan. However, they will of course be used as part of the query execution that will return the appropriate rows.

Using this new optimizer hint option has allowed the ISV to generate queries that can utilize the benefits of parameterization; such as plan reuse, while eliminating the problems caused by the caching of queryplans that were created using non-typical initially passed parameter values.

NOTE: This new optimizer hint option, like all optimizer hints, should be used only by experienced developers and database administrators in cases where SQL Server cannot create an optimal plan.

REPRO:

It’s best to run each step below individually (that is; between the comments). This repro has 1,000,000 rows in the table not 10,000,000 as described above, however the same end-results are demonstrated.

--create the table and the data

--!!!!Please be aware that the data creation

--!!!!may take more than a few minutes

--On my laptop it took approximately 20 minutes to create the data

CREATE TABLE t (col1 int, col2 int, col3 int)

GO

DECLARE @i INT

DECLARE @i2 INT

DECLARE @id INT

DECLARE @id2 INT

SET @i = 1

SET @i2 = 0

SET @id = 0

SET @id2 = 0

WHILE @i < 1001

BEGIN

SET @id2 = @id2 + 1

WHILE @i2 < 1000

BEGIN

SET @id = @id + 1

SET @i2 = @i2 + 1

INSERT INTO t VALUES (@id,@id2,73)

END

SET @i = @i + 1

SET @i2 = 0

END

GO

--create the indexes

CREATE CLUSTERED INDEX clind ON t(col1)

GO

CREATE INDEX ind1 ON t(col2)

GO

--create the stored procedure

CREATE PROCEDURE AllFromT

@p1 int,

@p2 int

AS

SELECT * FROM t WHERE col1 > @p1 or col2 > @p2 ORDER BY col1

GO

--!!!! Make sure you turn on ‘Include actual execution plan’

--!!!! before running the stored procedure

--run the stored procedure with non-typical values

EXEC AllFromT 1670000,1000

--Now look at the actual execution plan

--Now run the stored procedure with more-typical values

EXEC AllFromT 500000,589

--Now look at the actual execution plan,

--it will be the same as above because it is using the cached plan

--Now lets reverse the order, but first lets clear the cache

DBCC FREEPROCCACHE

GO

EXEC AllFromT 500000,589

--Now look at the actual execution plan, it should be very different

--Now run the stored procedure with non-typical values

EXEC AllFromT 1670000,1000

--Now look at the actual execution plan, it should be the same

--create the stored procedure with the OPTIMZE FOR UNKNOWN hint

DROP PROCEDURE AllFromT

GO

CREATE PROCEDURE AllFromT

@p1 int,

@p2 int

AS

SELECT * FROM t WHERE col1 > @p1 or col2 > @p2 ORDER BY col1

OPTION (OPTIMIZE FOR (@p1 UNKNOWN, @p2 UNKNOWN))

GO

--Now, let’s run the stored procedure with non-typical values again

EXEC AllFromT 1670000,1000

--Now look at the actual execution plan it should be a more optimal plan

--for most parameter values based on the data and statistics.

--NOTE: in this example a table scan is the optimal way to get the data

Comments

  • Anonymous
    December 02, 2015
    I just stumbled across this and thought I'd modify it a little.  It should not take 20 minutes to create 1 million rows.  try this, it takes 2 seconds CREATE TABLE t (col1 int, col2 int, col3 int) GO DECLARE @Rows INT = 1000;  -- Ten million rows ;WITH Tally (n) AS (    SELECT TOP (@Rows) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))    FROM sys.all_columns a CROSS JOIN sys.all_columns b ) insert into t(Col1, Col2, Col3) SELECT a.n, b.n , 73 FROM Tally a    cross apply tally b;