4 - Perseverance, Secret of All Triumphs: Using the Transient Fault Handling Application Block

patterns & practices Developer Center

Note

Important: Recent versions of SDKs for both Azure Storage and Azure Service Bus natively support retries. It is recommended to use these instead of the Transient Fault Handling Application Block.

What Are Transient Faults?

When an application uses a service, errors can occur because of temporary conditions such as intermittent service, infrastructure-level faults, network issues, or explicit throttling by the service; these types of error occur more frequently with cloud-based services, but can also occur in on-premises solutions. If you retry the operation a short time later (maybe only a few milliseconds later) the operation may succeed. These types of error conditions are referred to as transient faults. Transient faults typically occur very infrequently, and in most cases, only a few retries are necessary for the operation to succeed.

Unfortunately, there is no easy way to distinguish transient from non-transient faults; both would most likely result in exceptions being raised in your application. If you retry an operation that fails because of a non-transient fault (for example, a "file not found" error), you will most likely get the same exception again.


There is no intrinsic way to distinguish between transient and non-transient faults unless the developer of the service explicitly isolated transient faults into a specified subset of exception types or error codes.

For example, with SQL Database, one of the important considerations is how you should handle client connections. This is because SQL Database can use throttling when a client attempts to establish connections to a database or run queries against it. SQL Database throttles the number of database connections for a variety of reasons, such as excessive resource usage, long-running transactions, and possible failover and load balancing actions. This can lead to the termination of existing client sessions or the temporary inability to establish new connections while the transient conditions persist. SQL Database can also drop database connections for a variety of reasons related to network connectivity between the client and the remote data center: quality of network, intermittent network faults in the client's LAN or WAN infrastructure and other transient technical reasons.

Jana says:
Throttling can occur with Microsoft Azure storage if your client exceeds the scalability targets. For more information, see "Microsoft Azure Storage Abstractions and their Scalability Targets."

What Is the Transient Fault Handling Application Block?

The Transient Fault Handling Application Block makes your application more robust by providing the logic for handling transient faults. It does this in two ways.

First, the block includes logic to identify transient faults for a number of common cloud-based services in the form of detection strategies. These detection strategies contain built-in knowledge that is capable of identifying whether a particular exception is likely to be caused by a transient fault condition.

Carlos says:
Although transient error codes are documented, determining which exceptions are the result of transient faults for a service requires detailed knowledge of and experience using the service. The block encapsulates this kind of knowledge and experience for you.

The block includes detection strategies for the following services:

  • SQL Database
  • Azure Service Bus
  • Azure Storage Service
  • Azure Caching Service

Note

The Azure Storage detection strategy works with version 2 of the Azure Storage Client Library and with earlier versions. It automatically detects which version of the Storage Client Library you are using and adjusts its detection strategy accordingly.
However, if you are using version 2 of the Azure Storage Client Library, it is recommended that you use the built-in retry policies in the Azure Storage Client Library in preference to the retry policies for the Azure Storage Service in the Transient Fault Handling Application Block.
The Azure Caching detection strategy works with both Azure Caching and Azure Shared Caching.
The Azure Service Bus detection strategy works with both Azure Service Bus and Service Bus for Windows Server.

Second, the application block enables you to define your retry strategies so that you can follow a consistent approach to handling transient faults in your applications. The specific retry strategy you use will depend on several factors; for example, how aggressively you want your application to perform retries, and how the service typically behaves when you perform retries. Some services can further throttle or even block client applications that retry too aggressively. A retry strategy defines how many retries you want to make before you decide that the fault is not transient or that you cannot wait for it to be resolved, and what the intervals should be between the retries.

Jana says:
This kind of retry logic is also known as "conditional retry" logic.

The built-in retry strategies allow you to specify that retries should happen at fixed intervals, at intervals that increase by the same amount each time, and at intervals that increase exponentially but with some random variation. The following table shows examples of all three strategies.

Retry strategy                          Example (intervals between retries in seconds)
Fixed interval                          2, 2, 2, 2, 2, 2
Incremental intervals                   2, 4, 6, 8, 10, 12
Random exponential back-off intervals   2, 3.755, 9.176, 14.306, 31.895

Note

All retry strategies specify a maximum number of retries after which the exception from the last attempt is allowed to bubble up to your application.

In many cases, you should use the random exponential back-off strategy to gracefully back off the load on the service. This is especially true if the service is throttling client requests.
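To make the three interval patterns more concrete, the following self-contained sketch computes example interval sequences. It does not use the application block: the class and method names are hypothetical, and the jitter formula (a random factor between 0.8 and 1.2 applied to an exponentially growing delta, capped at a maximum) is an assumption chosen for illustration rather than a statement of the block's exact algorithm.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only; not part of the application block.
public static class RetryIntervalSketch
{
    // Fixed interval: the same delay before every retry.
    public static IEnumerable<TimeSpan> Fixed(int retryCount, TimeSpan interval) =>
        Enumerable.Repeat(interval, retryCount);

    // Incremental: the delay grows by the same increment on each retry.
    public static IEnumerable<TimeSpan> Incremental(
        int retryCount, TimeSpan initial, TimeSpan increment) =>
        Enumerable.Range(0, retryCount)
                  .Select(i => initial + TimeSpan.FromTicks(increment.Ticks * i));

    // Random exponential back-off (assumed formula): the delay grows roughly
    // as a power of two, with random variation, and never exceeds max.
    public static IEnumerable<TimeSpan> RandomExponential(
        int retryCount, TimeSpan min, TimeSpan max, TimeSpan delta, Random random)
    {
        for (int i = 0; i < retryCount; i++)
        {
            double jitter = 0.8 + 0.4 * random.NextDouble(); // factor in [0.8, 1.2)
            double growthMs = (Math.Pow(2, i) - 1) * delta.TotalMilliseconds * jitter;
            double delayMs = Math.Min(min.TotalMilliseconds + growthMs,
                                      max.TotalMilliseconds);
            yield return TimeSpan.FromMilliseconds(delayMs);
        }
    }
}
```

With a 2-second interval and increment, Fixed and Incremental yield the first two rows of the table. The random exponential sequence starts at the minimum back-off and differs from run to run, which helps to prevent many clients from retrying at the same moment.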

Carlos says:
High throughput applications should typically use an exponential back-off strategy. However, for user-facing applications such as websites you may want to consider a linear back-off strategy to maintain the responsiveness of the UI.

You can define your own custom detection strategies if the built-in detection strategies included with the application block do not meet your requirements. The application block also allows you to define your own custom retry strategies that define additional patterns for retry intervals.

Markus says:
In many cases, retrying immediately may succeed without the need to wait. By default, the block performs the first retry immediately before using the retry intervals defined by the strategy.

Figure 1 illustrates how the key elements of the Transient Fault Handling Application Block work together to enable you to add the retry logic to your application.

Figure 1 - The Transient Fault Handling Application Block

A retry policy combines a detection strategy with a retry strategy. You can then use one of the overloaded versions of the ExecuteAction method to wrap the call that your application makes to one of the services.

Jana says:
You must select the appropriate detection strategy for the service whose method you are calling from your Azure application.

Historical Note

The Transient Fault Handling Application Block is a product of the collaboration between the Microsoft patterns & practices team and the Azure Customer Advisory Team. It is based on the initial detection and retry strategies, and the data access support from the "Transient Fault Handling Framework for SQL Database, Microsoft Azure Storage, Service Bus & Cache." The new application block now includes enhanced configuration support, enhanced support for wrapping asynchronous calls, and provides integration of the application block's retry strategies with the Azure storage retry mechanism. The Transient Fault Handling Application Block supersedes the Transient Fault Handling Framework and is now the recommended approach to handling transient faults in Azure applications.

Using the Transient Fault Handling Application Block

This section describes, at a high level, how to use the Transient Fault Handling Application Block. It is divided into the following main subsections. The order of these sections reflects the order in which you would typically perform the associated tasks.

  • Adding the Transient Fault Handling Application Block to your Visual Studio Project. This section describes how you can prepare your Visual Studio solution to use the block.
  • Defining a retry strategy. This section describes the ways that you can define a retry strategy in your application.
  • Defining a retry policy. This section describes how you can define a retry policy in your application.
  • Executing an operation with a retry policy. This section describes how to execute actions with a retry policy to handle any transient faults.

Note

A retry policy is the combination of a retry strategy and a detection strategy. You use a retry policy when you execute an operation that may be affected by transient faults.

For detailed information about configuring the Transient Fault Handling Application Block and writing code that uses the Transient Fault Handling Application Block, see the topic "The Transient Fault Handling Application Block" in the Enterprise Library Reference Documentation.

Adding the Transient Fault Handling Application Block to Your Visual Studio Project

As a developer, before you can write any code that uses the Transient Fault Handling Application Block, you must configure your Visual Studio project with all of the necessary assemblies, references, and other resources that you'll need.

Markus says:
NuGet makes it very easy for you to configure your project with all of the prerequisites for using the Transient Fault Handling Application Block.

Instantiating the Transient Fault Handling Application Block Objects

There are two basic approaches to instantiating the objects from the application block that your application requires. In the first approach, you can explicitly instantiate all the objects in code, as shown in the following code snippet:

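// Requires the Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling namespace.
// Retry up to 5 times, starting with a 1 second interval that grows by
// 2 seconds on each subsequent retry.
var retryStrategy = new Incremental(5, TimeSpan.FromSeconds(1), 
  TimeSpan.FromSeconds(2));

// Combine the retry strategy with the SQL Database detection strategy.
var retryPolicy =
  new RetryPolicy<SqlDatabaseTransientErrorDetectionStrategy>(retryStrategy);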

Note

If you instantiate the RetryPolicy object using new, you cannot use the default strategies defined in the configuration.

In the second approach, you can instantiate and configure the objects from configuration data as shown in the following code snippet:

// Load policies from the configuration file.
// SystemConfigurationSource is defined in 
// Microsoft.Practices.EnterpriseLibrary.Common.
using (var config = new SystemConfigurationSource())
{
  var settings = RetryPolicyConfigurationSettings.GetRetryPolicySettings(config);

  // Initialize the RetryPolicyFactory with a RetryManager built from the 
  // settings in the configuration file.
  RetryPolicyFactory.SetRetryManager(settings.BuildRetryManager());

  // Look up the policy by the retry strategy name used in the configuration file.
  var retryPolicy = RetryPolicyFactory.GetRetryPolicy
    <SqlDatabaseTransientErrorDetectionStrategy>("Incremental Retry Strategy");
  ...
  // Use the policy to handle the retries of an operation.
}

Defining a Retry Strategy

There are three considerations in defining retry strategies for your application: which retry strategy to use, where to define the retry strategy, and whether to use default retry strategies.

In most cases, you should use one of the built-in retry strategies: fixed interval, incremental, or random exponential back off. You configure each of these strategies using custom sets of parameters to meet your application's requirements; the parameters specify when the strategy should stop retrying an operation, and what the intervals between the retries should be. The choice of retry strategy will be largely determined by the specific requirements of your application and the characteristics of the service invoked. For more details about the parameters for each retry strategy, see the topic "Source Schema for the Transient Fault Handling Application Block" in the Enterprise Library Reference Documentation.

You can define your own custom retry strategy. For more information, see the topic "Implementing a Custom Retry Strategy."

You can define your retry policies either in code or in the application configuration file. Defining your retry policies in code is most appropriate for small applications with a limited number of calls that require retry logic. Defining the retry policies in configuration is more useful if you have a large number of operations that require retry logic, because it makes it easier to maintain and modify the policies.

For more information about how to define your retry strategy in code, see the topic "Specifying Retry Strategies in Code."

For more information about how to define your retry strategies in a configuration file, see the topic "Specifying Retry Strategies in the Configuration."

If you define your retry strategies in the configuration file for the application, you can also define default retry strategies. The block allows you to specify default retry strategies at two levels. You can specify a default retry strategy for each of the following operation categories: SQL connection operations, SQL command operations, Azure Service Bus operations, Azure Caching, and Azure Storage operations. You can also specify a global default retry strategy. For more information, see the topic "Entering Configuration Information."

Defining a Retry Policy

A retry policy is the combination of a retry strategy and a detection strategy that you use when you execute an operation that may be affected by transient faults. The RetryManager class includes methods that enable you to create retry policies by explicitly identifying the retry strategy and detection strategy, or by using default retry strategies defined in the configuration file.
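As a rough illustration of creating policies through a RetryManager, the following sketch builds a manager in code rather than from the configuration file, then requests one policy by strategy name and one that uses the manager's default strategy. It assumes the Enterprise Library Transient Fault Handling package is referenced. The TimeoutDetectionStrategy class, the RetryManagerSketch class, and the retry counts and intervals are hypothetical and exist only to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling;

// Hypothetical detection strategy: treats only timeouts as transient.
public class TimeoutDetectionStrategy : ITransientErrorDetectionStrategy
{
    public bool IsTransient(Exception ex) => ex is TimeoutException;
}

public static class RetryManagerSketch
{
    public static RetryManager BuildManager()
    {
        // Named strategies, equivalent to entries you might otherwise
        // define in the configuration file.
        var strategies = new List<RetryStrategy>
        {
            new FixedInterval("Fixed Interval Retry Strategy", 3,
                TimeSpan.FromSeconds(1)),
            new Incremental("Incremental Retry Strategy", 5,
                TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2))
        };

        // The second argument names the default retry strategy.
        return new RetryManager(strategies, "Fixed Interval Retry Strategy");
    }

    // Explicitly identify both the retry strategy and the detection strategy.
    public static RetryPolicy GetNamedPolicy(RetryManager manager) =>
        manager.GetRetryPolicy<TimeoutDetectionStrategy>("Incremental Retry Strategy");

    // Use the default retry strategy with the chosen detection strategy.
    public static RetryPolicy GetDefaultPolicy(RetryManager manager) =>
        manager.GetRetryPolicy<TimeoutDetectionStrategy>();
}
```

Keeping the strategy names in one place, whether in code as here or in configuration, means that call sites refer to a strategy by name and do not repeat retry counts and intervals.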

Markus says:
If you are using Azure Storage and you are already using the retry policies mechanism in the Microsoft.WindowsAzure.StorageClient namespace, then you can use retry strategies from the application block and configure the Azure Storage client API to take advantage of the extensible retry functionality provided by the application block.

For more information about using the retry policies, see the topic "Key Scenarios" on MSDN.

For more information about the RetryPolicy delegate in the Microsoft.WindowsAzure.StorageClient namespace, see the blog post "Overview of Retry Policies in the Microsoft Azure Storage Client Library."

Executing an Operation with a Retry Policy

The RetryPolicy class includes several overloaded versions of the ExecuteAction method. You use the ExecuteAction method to wrap the synchronous calls in your application that may be affected by transient faults. The different overloaded versions enable you to wrap the following types of calls to a service.

  • Synchronous calls that return void.
  • Synchronous calls that return a value.

The RetryPolicy class includes several overloaded versions of the ExecuteAsync method. You use the ExecuteAsync method to wrap the asynchronous calls in your application that may be affected by transient faults. The different overloaded versions enable you to wrap the following types of calls to a service.

  • Asynchronous calls that return a Task.
  • Asynchronous calls that return a Task<T>.

There are also overloaded versions of the ExecuteAsync method that include a cancellation token parameter that enables you to cancel the retry operations on the asynchronous methods after you have invoked them.

The ExecuteAction and ExecuteAsync methods automatically apply the configured retry strategy and detection strategy when they invoke the specified action. If no transient fault manifests itself during the invocation, your application continues as normal, as if there was nothing between your code and the action being invoked. If a transient fault does manifest itself, the block will initiate the recovery by attempting to invoke the specified action multiple times as defined in the retry strategy. As soon as a retry attempt succeeds, your application continues as normal.

If the block does not succeed in executing the operation within the number of retries specified by the retry strategy, then the block rethrows the exception to your application. Your application must still handle this exception properly.

Sometimes, the operation to retry might be more than just a single method in your object. For example, if you are trying to send a message using Service Bus, you cannot try to resend a failed brokered message in a retry; you must create a new message for each retry and ensure that messages are properly disposed. Typically, you would wrap this behavior in your own method to use with the block, but this may affect the design of your API, especially if you have a component that sends messages with a retry policy on behalf of other code in your application.
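The following sketch shows one way the ExecuteAction and ExecuteAsync methods described above might be used. It assumes the Enterprise Library Transient Fault Handling package is referenced. The FlakyCall method simulates a service call that fails twice before succeeding, and the TimeoutIsTransient strategy, the ExecuteSketch class, and the retry count and interval are hypothetical choices made for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling;

// Hypothetical detection strategy: treats only timeouts as transient.
public class TimeoutIsTransient : ITransientErrorDetectionStrategy
{
    public bool IsTransient(Exception ex) => ex is TimeoutException;
}

public static class ExecuteSketch
{
    private static int attempts;

    // Simulated service call: the first two attempts fail with a
    // "transient" fault, and the third attempt succeeds.
    private static string FlakyCall()
    {
        if (Interlocked.Increment(ref attempts) <= 2)
        {
            throw new TimeoutException("Simulated transient fault.");
        }
        return "ok";
    }

    private static RetryPolicy CreatePolicy() =>
        new RetryPolicy<TimeoutIsTransient>(
            new FixedInterval(3, TimeSpan.FromMilliseconds(10)));

    // Synchronous call that returns a value.
    public static string RunSync()
    {
        attempts = 0;
        return CreatePolicy().ExecuteAction<string>(() => FlakyCall());
    }

    // Asynchronous call that returns a Task<T>; the token lets the caller
    // cancel the retry operations after they have started.
    public static Task<string> RunAsync(CancellationToken token)
    {
        attempts = 0;
        return CreatePolicy().ExecuteAsync<string>(
            () => Task.Run(() => FlakyCall()), token);
    }
}
```

If FlakyCall threw an exception that the detection strategy does not classify as transient, or kept failing beyond the three permitted retries, the exception would reach the caller, so the calling code still needs its own exception handling.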

Note

The Transient Fault Handling Application Block is not a substitute for proper exception handling. Your application must still handle any exceptions that are thrown by the service you are using. You should consider using the Exception Handling Application Block described in "Chapter 3 - Error Management Made Exceptionally Easy."

Markus says:
You can use the Retrying event to receive notifications in your application about the retry operations that the block performs.
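As a minimal sketch of subscribing to the Retrying event, the following code records one log entry for each retry the block performs. It assumes the Enterprise Library Transient Fault Handling package is referenced. The TimeoutOnlyStrategy and RetryingEventSketch classes, the simulated failures, and the message format are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling;

// Hypothetical detection strategy: treats only timeouts as transient.
public class TimeoutOnlyStrategy : ITransientErrorDetectionStrategy
{
    public bool IsTransient(Exception ex) => ex is TimeoutException;
}

public static class RetryingEventSketch
{
    public static List<string> RunAndCollectRetryLog()
    {
        var log = new List<string>();
        var policy = new RetryPolicy<TimeoutOnlyStrategy>(
            new FixedInterval(3, TimeSpan.FromMilliseconds(10)));

        // Raised before each retry; the event arguments describe the retry.
        policy.Retrying += (sender, args) =>
            log.Add(string.Format("Retry {0} after {1} ms: {2}",
                args.CurrentRetryCount,
                args.Delay.TotalMilliseconds,
                args.LastException.Message));

        // Simulated operation: fails twice, then succeeds.
        int calls = 0;
        policy.ExecuteAction(() =>
        {
            if (++calls <= 2)
            {
                throw new TimeoutException("Simulated transient fault.");
            }
        });

        return log;
    }
}
```

In a real application you would more likely write these notifications to your logging infrastructure than collect them in a list; a high retry rate can be an early sign of throttling.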

In addition, the application block includes classes that wrap many common SQL Database operations with a retry policy for you. Using these classes minimizes the amount of code you need to write.

Markus says:
If you are working with SQL Database, the application block includes classes that provide direct support for SQL Database, such as the ReliableSqlConnection class. These classes will help you reduce the amount of code you need to write. However, you are then using a specific class from the block instead of the standard ADO.NET class. Also, these classes don’t work with Entity Framework.

For more information about executing an operation with a retry policy, see the topic "Key Scenarios" in the Enterprise Library Reference Documentation.

When Should You Use the Transient Fault Handling Application Block?

This section describes three scenarios in which you should consider using the Transient Fault Handling Application Block in your Azure solution.

You are Using an Azure Service

If your application uses any of the Microsoft Azure services supported by the Transient Fault Handling Application Block (SQL Database, Azure Storage, Azure Caching, Azure Service Bus), then you can make your application more robust by using the application block. Additionally, by using the block you will be following the documented, recommended practices for these services: published information about the error codes from these services is prescriptive and indicates how you should attempt to retry operations. Any Azure application that uses these services may occasionally encounter transient faults with these services. Although you could add your own detection logic to your application, the application block's built-in detection strategies will handle a wider range of transient faults. It is also quicker and easier to use the application block instead of developing your own solution; this is especially true of asynchronous operations, which can appear to be complex.

The Azure Storage Client Library version 2 includes retry policies for Azure Storage Services and it is recommended that you use these built-in policies in preference to the Transient Fault Handling Application Block.

For more information about retries in Azure storage, see "Overview of Retry Policies in the Microsoft Azure Storage Client Library."

You are Using Service Bus for Windows Server

If you are using Service Bus for Windows Server in your solution, you can use the Azure Service Bus detection strategy to detect any transient faults when you use the service bus. You can use the Azure Service Bus detection strategy with Service Bus for Windows Server in exactly the same way that you use it with Azure Service Bus.

You Are Using a Custom Service

If your application uses a custom service, it can still benefit from using the Transient Fault Handling Application Block. You can author a custom detection strategy for your service that encapsulates your knowledge of which transient exceptions may result from a service invocation. The Transient Fault Handling Application Block then provides you with the framework for defining retry policies and for wrapping your method calls so that the application block applies the retry logic.
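A custom detection strategy for such a service might look like the following sketch. It assumes the Enterprise Library Transient Fault Handling package is referenced and that the custom service is called over HTTP. The class name and the choice of which errors count as transient (timeouts, connection failures, and HTTP 503 responses) are hypothetical; the right set depends on how your own service reports temporary conditions.

```csharp
using System;
using System.Net;
using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling;

// Hypothetical detection strategy for a custom HTTP-based service.
public class MyServiceTransientErrorDetectionStrategy
    : ITransientErrorDetectionStrategy
{
    public bool IsTransient(Exception ex)
    {
        // A timeout may succeed if the call is tried again.
        if (ex is TimeoutException)
        {
            return true;
        }

        var webException = ex as WebException;
        if (webException == null)
        {
            // Anything else (for example, "file not found") is assumed
            // to be non-transient, so retrying would not help.
            return false;
        }

        // Network-level conditions that are often temporary.
        if (webException.Status == WebExceptionStatus.Timeout ||
            webException.Status == WebExceptionStatus.ConnectFailure)
        {
            return true;
        }

        // Assumption: the service signals throttling or overload with 503.
        var response = webException.Response as HttpWebResponse;
        return response != null &&
               response.StatusCode == HttpStatusCode.ServiceUnavailable;
    }
}
```

You could then combine this strategy with a retry strategy in the usual way, for example new RetryPolicy<MyServiceTransientErrorDetectionStrategy>(new ExponentialBackoff(5, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(2))), where the retry count and back-off values are again only illustrative.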

More Information

All links in this book are accessible from the book's online bibliography on MSDN at https://aka.ms/el6biblio.

For detailed information about configuring the Transient Fault Handling Application Block and writing code that uses the Transient Fault Handling Application Block, see the topic "The Transient Fault Handling Application Block" in the Enterprise Library Reference Documentation.

For information about how you can use NuGet to prepare your Visual Studio project to work with the Transient Fault Handling Application Block, see the topic "Adding the Transient Fault Handling Application Block to Your Solution."

For more information about throttling in Azure, see "Microsoft Azure Storage Abstractions and their Scalability Targets" on MSDN.

There is an additional approach that is provided for backward compatibility with the "Transient Fault Handling Application Framework" that uses the RetryPolicyFactory class.

For more details about the parameters for each retry strategy, see the topic "Source Schema for the Transient Fault Handling Application Block" in the Enterprise Library Reference Documentation.

You can define your own custom retry strategy. For more information, see the topic "Implementing a Custom Retry Strategy."

For more information about how to define your retry strategy in code, see the topic "Specifying Retry Strategies in Code."

For more information about how to define your retry strategies in a configuration file, see the topic "Specifying Retry Strategies in the Configuration."

If you define your retry strategies in the configuration file for the application, you can also define default retry strategies and a global default retry strategy. For more information, see the topic "Entering Configuration Information."

For more information about using the retry policies, see the topic "Key Scenarios" on MSDN.

For more information about the RetryPolicy delegate in the Microsoft.WindowsAzure.StorageClient namespace, see the blog post "Overview of Retry Policies in the Microsoft Azure Storage Client Library."

For more information about retries in Azure storage, see "Overview of Retry Policies in the Microsoft Azure Storage Client Library" and "Microsoft Azure Storage Client Library 2.0 Breaking Changes & Migration Guide."

The Transient Fault Handling Application Block is a product of the collaboration between the Microsoft patterns & practices team and the Azure Customer Advisory Team. It is based on the initial detection and retry strategies, and the data access support from the "Transient Fault Handling Framework for SQL Database, Microsoft Azure Storage, Service Bus & Cache" on MSDN.
