Cleanup Job
This section provides information about how the change data capture cleanup job works.
Structure of the Cleanup Job
Change data capture uses a retention based cleanup strategy to manage change table size. The cleanup mechanism consists of a SQL Server Agent Transact-SQL job that is created when the first database table is enabled. A single cleanup job handles cleanup for all database change tables and applies the same retention value to all defined capture instances.
The cleanup job is initiated by running the parameterless stored procedure sp_MScdc_cleanup_job. This stored procedure starts by extracting the configured retention and threshold values for the cleanup job from msdb.dbo.cdc_jobs. The retention value is used to compute a new low watermark for the change tables. The specified number of minutes is substracted from the maximum tran_end_time value from the cdc.lsn_time_mapping table to obtain the new low water mark expressed as a datetime value. The CDC.lsn_time_mapping table is then used to convert this datetime value to a corresponding lsn value. If the same commit time is shared by multiple entries in the table, the lsn that corresponds to the entry that has the smallest lsn is chosen as the new low watermark. This lsn value is passed to sp_cdc_cleanup_change_tables to remove change table entries from the database change tables.
Note
The advantage of using the commit time of the recent transaction as the base for computing the new low watermark is that it lets the changes remain in change tables for the specified time. This happens even when the capture process is running behind. All entries that have the same commit time as the current low watermark continue to be represented within the change tables by choosing the smallest lsn that has the shared commit time for the actual low watermark.
When a cleanup is performed, the low watermark for all capture instances is initially updated in a single transaction. It then tries to remove obsolete entries from the change tables and the cdc.lsn_time_mapping table. The configurable threshold value limits how many entries are deleted in any single statement. Failure to perform the delete on any individual table will not prevent the operation from being attempted on the remaining tables.
Cleanup Job Customization
For the cleanup job, the possibility for customization is in the strategy used to determine which change table entries are to be discarded. The only supported strategy in the delivered cleanup job is a time-based one. In that situation, the new low watermark is computed by subtracting the allowed retention period from the commit time of the last transaction processed. Beacuse the underlying cleanup procedures are based on lsn instead of time, any number of strategies can be used to determine the smallest lsn to keep in the change tables. Only some of these are strictly time-based. Knowledge about the clients, for example, could be used to provide a failsafe if downstream processes that require access to the change tables cannot run. Also, although the default strategy applies the same lsn to clean up all the databases’ change tables, the underlying cleanup procedure, can also be called to clean up at the capture instance level.