Condividi tramite


Boosting Update Statistics performance with SQL 2014 & SQL 2016

With SQL 2005 end of life, many customers are planning their upgrades to SQL 2014. One of the recommended post upgrade tasks is updating the statistics of all the tables and databases in latest compatibility level. The updated statistics ensures the query optimizer in the latest release has most up to date data to produce the best plans. For VLDB scenarios, running update statistics on the entire database can take considerable time to complete depending on the volume of data. As applications are required to run 24/7 with desired performance, DBAs are increasingly under pressure to keep the maintenance window as low as possible. One such improvement introduced in SQL 2014 SP1 CU6 and SQL 2016 CU1 is targeted towards optimizing and reducing execution times for UPDATE STATISTICS runs is detailed in this blog post.

In scenarios where the execution time of update statistics job is critical, the execution time can be improved by running update statistics for tables in the database in parallel.

Let me illustrate this with an example. Consider a hypothetical scenario where a database has 4 tables viz T1, T2, T3 and T4. Table T1 has 8 statistics in total combining the index and column statistics. Table T2 has 4 statistics, table T3 has 2 statistics and table T4 has 2 statistics. Let's say updating each statistics takes 1 sec each.

In this case, if the statistics is updated serially using a single job, the total time taken to update the statistics for all the tables in the database is (8+4+2+2) = 16 seconds as shown in the figure below

If the statistics is updated in parallel using 4 parallel jobs (One job per table), the overall statistics gets updated for the entire database in 8
seconds as the shown in the figure below

Like any parallel algorithm, you can gain maximum throughput when there is even distribution of workloads across all the execution units. When executing update statistics in parallel like previous example, there is a possibility one table has many statistics and large number of records as table T1 in this case which is refreshed by the same thread or job. The worker thread or job running the update statistics on this table is likely to delay the overall execution while the other threads are idle having completed their workload. This uneven distribution of workload can reduce the overall efficiency and throughput of the UPDATE STATISTICS run. The parallel execution of UPDATE STATISTICS job can be improved by running the update statistics on individual statistics in parallel rather than table to distribute the workload evenly across threads.

In the above scenario, if the update statistics is run in parallel on individual statistics, the statistics can be completed in 4 seconds as shown in the figure below

Until SQL 2014 SP1 CU6, if you try to run the update statistics job in parallel on individual statistics as shown above, the parallel threads updating the statistics on the same table will be blocked due to the (exclusive) X LOCK taken by the first update statistics job on the UPDSTATS metadata resource of the table as shown below. Due to the blocking, the overall execution time of the UPDATE STATISTICS can be higher as shown below.

With SQL 2014 SP1 CU6, a new Trace Flag 7471 has been introduced to address this blocking scenario. Trace Flag 7471 is documented in the KB 3156157. Trace Flag 7471 changes the locking behavior of update statistics command such that SQL Server engine no longer acquires X LOCK on UPDSTATS resource on the table. Rather the engine acquires an (update) U LOCK on each individual statistics object being updated which doesn't block other parallel threads on the same table running on different statistics. This behavior improves the overall concurrency of the update statistics job run executing in parallel and improves its performance.

Our empirical testing shows TF 7471 can increase the possibility of deadlock especially when creation of new statistics and updating of existing statistics are executed simultaneously. So the recommended best practice is to use TF 7471 only as a part of maintenance when update statistics jobs are run in parallel and should be turned off otherwise.

To conclude, starting SQL 2014 SP1 CU6, the performance and execution times of update statistics can be improved by running the job in parallel on individual statistics and enabling Trace flag 7471.

Parikshit Savjani
Senior Program Manager (@talktosavjani)

Comments

  • Anonymous
    May 02, 2016
    Upgrade to 2014? I'm eagerly awaiting 2016 RTM! :)
    • Anonymous
      June 01, 2016
      2016 has arrived, don't cut yourself :)
  • Anonymous
    July 05, 2016
    Leveraging this trace flag may of course weigh heavily on your server - updating eight stats on one large table in parallel will still require eight scans of the table, which seems a little unnecessary. For large tables the default sample size is relatively tiny, so why not scan the table once and extract a single sample (all relevant columns) to then generate all eight statistics from? This approach will scale better to larger tables, and tables with more statistics. Because it's so efficient you will probably allow yourself the luxury of a larger sample size. Customers have been able to do this since 2008R2, and I'm surprised it's not more widely implemented.
  • Anonymous
    December 02, 2016
    Is this feature available in Azure SQL Database
    • Anonymous
      February 27, 2017
      Nope, not available in Azure SQL Database