Düzenle

Aracılığıyla paylaş


Accelerated database recovery

Applies to: SQL Server 2019 (15.x) and later versions Azure SQL Database Azure SQL Managed Instance SQL database in Microsoft Fabric

Accelerated database recovery (ADR) improves database availability, especially in the presence of long-running transactions, by redesigning the database engine recovery process.

ADR was introduced in SQL Server 2019 (15.x) and improved in SQL Server 2022 (16.x). ADR is also available in Azure SQL Database, Azure SQL Managed Instance, Azure Synapse Analytics (dedicated SQL pool only), and SQL database in Microsoft Fabric.

Note

ADR is always enabled in Azure SQL Database, Azure SQL Managed Instance, and SQL database in Fabric.

This article provides an overview of ADR. To work with ADR, review Manage accelerated database recovery.

For more information about transaction log and database recovery, see SQL Server transaction log architecture and management guide and Restore and recovery overview (SQL Server).

Overview

The primary benefits of ADR are:

  • Fast and consistent database recovery

    Long-running transactions don't affect the overall recovery time, enabling fast and consistent database recovery irrespective of the number of active transactions in the system or their size.

  • Instantaneous transaction rollback

    Transaction rollback is instantaneous, irrespective of the time that the transaction has been active or the number of updates that have been made.

  • Aggressive log truncation

    The transaction log is aggressively truncated, even in the presence of active long-running transactions, which prevents it from growing out of control.

The traditional database recovery process

Without ADR, database recovery follows the ARIES recovery model and consists of three phases, which are illustrated and explained in more detail in the following diagram:

Diagram of current recovery process.

  • Analysis phase

    The database engine performs a forward scan of the transaction log from the beginning of the last successful checkpoint (or the oldest dirty page log sequence number (LSN)) until the end, to determine the state of each transaction at the time the engine stopped.

  • Redo phase

    The database engine performs a forward scan of the transaction log from the oldest uncommitted transaction to the end. This process redoes all committed operations to restore the database to its state at the time of the crash.

  • Undo phase

    For each transaction that was active at the time of the crash, the database engine traverses the log backward, undoing the operations that this transaction performed.

  • With the traditional database recovery process, recovery after a crash can take a long time if a long-running transaction was active.

    The time for the database engine to recover from an unexpected restart is (roughly) proportional to the size of the longest active transaction in the system at the time of the crash. Recovery requires a rollback of all incomplete transactions. The length of time required is proportional to the work that the transaction has performed and the time it has been active.

  • Canceling, or rolling back, a large transaction can take a long time, because it uses the same undo recovery phase as described previously.

  • The database engine can't truncate the transaction log when there are long-running transactions because their corresponding log records are needed for the recovery and rollback processes. As a result, the transaction log can grow very large and consume a lot of storage space.

The accelerated database recovery process

ADR addresses previous issues with the traditional recovery model by completely redesigning the database engine recovery process to:

  • Make the recovery time constant since there is no longer a need to scan the transaction log from the beginning of the oldest active transaction. With ADR, the transaction log is only processed from the last successful checkpoint (or oldest dirty page LSN). As a result, recovery time isn't affected by long-running transactions and is typically instantaneous.

  • Minimize the required transaction log space since there's no longer a need to retain the log for the whole transaction. As a result, the transaction log can be truncated aggressively as checkpoints and backups occur.

At a high level, ADR achieves fast database recovery by versioning all physical database modifications and only undoing nonversioned operations, which are limited and can be undone almost instantly. Any transactions that were active at the time of a crash are marked as aborted and, therefore, any versions generated by these transactions can be ignored by concurrent user queries.

The ADR recovery process has the same three phases as the traditional recovery process. How these phases operate with ADR is illustrated in the following diagram:

Diagram of ADR recovery process.

  • Analysis phase

    The process remains the same as the traditional recovery model with the addition of reconstructing the secondary log stream (SLOG) and copying log records for nonversioned operations.

  • Redo phase

    Broken into two subphases

    • Subphase 1

      Redo from SLOG (oldest uncommitted transaction up to the last checkpoint). Redo is a fast operation as it might only need to process a few records from the SLOG.

    • Subphase 2

      Redo from transaction log starts from last successful checkpoint (instead of oldest uncommitted transaction).

  • Undo phase

    The undo phase with ADR completes almost instantaneously by using SLOG to undo nonversioned operations and persistent version store (PVS) using logical revert to perform row level version-based undo.

For an explanation of accelerated database recovery, watch this eight-minute video:

ADR recovery components

The four key components of ADR are:

  • Persistent version store (PVS)

    The persistent version store (PVS) is a database engine mechanism for persisting row versions in the database itself instead of in the traditional version store in the tempdb database. PVS enables resource isolation and improves availability of readable secondaries.

  • Logical revert

    Logical revert is the asynchronous process responsible for performing row-level version-based undo - providing instant transaction rollback and undo for all versioned operations.

    • Keeps track of all aborted transactions
    • Performs rollback using PVS for all user transactions
    • Releases all locks immediately after transaction abort
  • SLOG

    The SLOG is a secondary in-memory log stream that stores log records for nonversioned operations (such as metadata cache invalidation, lock acquisitions, and so on). The SLOG is:

    • Low volume and in-memory
    • Persisted on disk during the checkpoint process
    • Periodically truncated as transactions commit
    • Accelerates redo and undo by processing only nonversioned operations
    • Enables aggressive transaction log truncation by preserving only the required log records
  • Cleaner

    The cleaner is the asynchronous process that wakes up periodically and cleans obsolete row versions from PVS.

Workloads that benefit from ADR

ADR is particularly beneficial for workloads that have:

  • Long-running transactions.
  • Active transactions that cause the transaction log to grow significantly.
  • Long periods of database unavailability due to long running recovery (such as from unexpected service restart or manual transaction rollback).

Best practices for ADR

  • Avoid unnecessary long-running transactions. Though ADR speeds up database recovery even with long-running transactions, such transactions can delay version cleanup and increase the size of the PVS.

  • Avoid large transactions that include DDL operations. ADR uses the secondary log stream (SLOG) mechanism to track DDL operations used in recovery. SLOG is only used while the transaction is active. SLOG is checkpointed, so avoiding large transactions that use SLOG can help the overall performance. These scenarios can cause the SLOG to take up more space:

    • Many DDLs are executed in one transaction. For example, in one transaction, rapidly creating and dropping temp tables.
    • A table has very large number of partitions/indexes that are modified. For example, a DROP TABLE operation on such table would require a large reservation of SLOG memory, which would delay truncation of the transaction log and delay undo/redo operations. As a workaround, drop the indexes individually and gradually, then drop the table.

    For more information about SLOG, see ADR recovery components.

  • Prevent or reduce unnecessary aborted transactions. A high transaction abort rate puts pressure on the PVS cleaner and lower ADR performance. The aborts might come from a high rate of deadlocks, duplicate keys, constraint violations, or query timeouts. The sys.dm_tran_aborted_transactions DMV shows all aborted transactions on the database engine instance. The nested_abort column indicates that the transaction committed but there are portions that aborted (savepoints or nested transactions) which can also delay the PVS cleanup process.

  • Ensure there's sufficient space in the database to account for PVS usage. If the database doesn't have enough room for PVS to grow, ADR might fail to generate versions, causing DML statements to fail.

  • When ADR is enabled with write-intensive workloads, transaction log generation rate might increase substantially because row versions written to PVS are logged. This might increase the size of transaction log backups.

  • When you use transactional replication, snapshot replication, or change data capture (CDC), the aggressive log truncation behavior of ADR is disabled to allow the log reader to collect changes from the transaction log. Make sure that the transaction log is sufficiently large.

    In Azure SQL Database, you might need to increase your service tier or compute size to ensure that sufficient transaction log space is available for the needs of all your workloads. Similarly, in Azure SQL Managed Instance you might need to increase your instance maximum storage size.

  • For SQL Server, isolate the PVS version store on a filegroup on higher tier storage, such as high-end SSD or advanced SSD or Persistent Memory (PMEM), sometimes referred to as Storage Class Memory (SCM). For more information, see Change the location of the PVS to a different filegroup.

  • For SQL Server, monitor the error log for PreallocatePVS entries. If PreallocatePVS entries are present, you might need to increase the ADR ability to preallocate pages using a background task. Preallocating PVS pages in background improves ADR performance by reducing more expensive foreground PVS allocations. You can use the ADR Preallocation Factor server configuration to increase this amount. For more information, see Server configuration: ADR Preallocation Factor.

  • For SQL Server 2022 (16.x) and later, consider enabling multi-threaded PVS cleanup if the single-threaded cleaner performance is insufficient. For more information, see Server configuration: ADR Cleaner Thread Count.

  • If you observe issues such as high database space usage by PVS or slow PVS cleanup, see Troubleshoot accelerated database recovery.

ADR improvements in SQL Server 2022

There are several improvements to address persistent version store (PVS) storage and improve overall scalability. For more information about new features of SQL Server 2022 (16.x), see What's new in SQL Server 2022.

The same improvements are also available in Azure SQL Database, Azure SQL Managed Instance, Azure Synapse Analytics (dedicated SQL pool only), and SQL database in Microsoft Fabric.

  • User transaction cleanup

    Clean pages that can't be cleaned by the regular process due to lock acquisition failure.

    This feature allows user transactions to run cleanup on pages that couldn't be addressed by the regular cleanup process due to table level lock conflicts. This improvement helps ensure that the ADR cleanup process doesn't fail indefinitely because user workloads can't acquire table level locks.

  • Reduce memory footprint for PVS page tracker

    This improvement tracks PVS pages at the extent level, in order to reduce the memory footprint needed to maintain versioned pages.

  • PVS cleaner improvements

    PVS cleaner has improved version cleanup efficiency to improve how the database engine tracks and records row versions for aborted transactions. This leads to improvements in memory and storage usage.

  • Transaction-level persistent version store (PVS)

    This improvement allows ADR to clean up versions that belong to committed transactions independent of whether there are aborted transactions in the system. With this improvement, PVS pages can be deallocated, even if the cleanup can't complete a successful sweep to trim the aborted transaction map.

    The result of this improvement is reduced PVS growth even if PVS cleanup is slow or fails.

  • Multi-threaded version cleanup

    In SQL Server 2019 (15.x), the cleanup process is single threaded within a database engine instance.

    Beginning with SQL Server 2022 (16.x), multi-threaded version cleanup is supported. This allows multiple databases on the same database engine instance to be cleaned in parallel, or a single database to be cleaned faster. For more information, see Server configuration: ADR Cleaner Thread Count.

    A new extended event, tx_mtvc2_sweep_stats, has been added for monitoring of the ADR PVS multi-threaded version cleaner.