Retry patterns in applications and services

Tomasz Dobrowolski
Apr 22, 2022

When your services or cloud applications are unavailable for some reason (99.999% SLA, anyone?), you might consider an approach where the call or message is retried once the service comes back online.

For the more common transient failures, the period between retries should be chosen to spread requests from multiple instances of the application as evenly as possible. This reduces the chance of a busy service continuing to be overloaded. If many instances of an application are continually overwhelming a service with retry requests, it’ll take the service longer to recover.
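One way to get that spread, hinted at in the comments of the snippet below, is exponential backoff with random jitter. Here is a rough C# sketch of such a delay calculation; the class and method names are my own illustration, not part of any library:

using System;

static class RetryDelays
{
    private static readonly Random Jitter = new Random();

    // The delay grows as baseDelay * 2^attempt, is capped at maxDelay,
    // and is randomized so multiple instances don't retry in lockstep.
    public static TimeSpan WithJitter(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        double exponentialMs = baseDelay.TotalMilliseconds * Math.Pow(2, attempt);
        double cappedMs = Math.Min(exponentialMs, maxDelay.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(Jitter.NextDouble() * cappedMs);
    }
}

Using something like WithJitter(currentRetry, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(30)) in place of a fixed delay keeps retries from different instances from hitting the recovering service at the same moment.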

There are a few patterns that help tackle the retry problem, and some of them can be defined in metadata without the need for a custom implementation.

Retry a number of times — the application will repeat a call a fixed number of times before failing, throwing an exception and logging an error.

private int retryCount = 3;
private readonly TimeSpan delay = TimeSpan.FromSeconds(5);

public async Task OperationWithBasicRetryAsync()
{
    int currentRetry = 0;

    for (;;)
    {
        try
        {
            // Call external service.
            await TransientOperationAsync();

            // Return or break.
            break;
        }
        catch (Exception ex)
        {
            Trace.TraceError("Operation Exception");

            currentRetry++;
            if (currentRetry > this.retryCount || !IsTransient(ex))
            {
                // If this isn't a transient error or we shouldn't retry,
                // rethrow the exception.
                throw;
            }
        }

        // Wait to retry the operation.
        // Consider calculating an exponential delay here and
        // using a strategy best suited for the operation and fault.
        await Task.Delay(delay);
    }
}

// Async method that wraps a call to a remote service (details not shown).
private async Task TransientOperationAsync()
{
    ...
}
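IsTransient is not shown above either; it decides whether an exception is worth retrying at all. A minimal example of what it might check is shown below; the exact exception types depend on the client library you are calling, and this version assumes plain WebException-based networking (it needs using System.Net):

// Example only: decide whether an exception represents a transient fault
// that is worth retrying. Adjust the exception types to your client library.
private bool IsTransient(Exception ex)
{
    if (ex is TimeoutException)
        return true;

    var webException = ex as WebException;
    if (webException != null)
    {
        // Typical transient network conditions.
        return webException.Status == WebExceptionStatus.ConnectionClosed ||
               webException.Status == WebExceptionStatus.Timeout ||
               webException.Status == WebExceptionStatus.RequestCanceled;
    }

    return false;
}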

Retry on schedule — the application will repeat the call at timed intervals; for example, it will try to invoke the method or service every 10 minutes for another hour. For function apps you can define this in metadata, and the following snippet retries every 10 seconds. Be careful though: if the problem is more permanent, you might end up with many costly calls.

[FunctionName("EventHubTrigger")
[FixedDelayRetry(5, "00:00:10")]
public static async Task Run([EventHubTrigger("myHub", Connection = "EventHubConnection")] EventData[] events, ILogger log)
{
// ...
}]

Combine that with a maximum number of retries (the first argument of FixedDelayRetry above) and you have a solid retry system.
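The error-handling documentation linked in the references also describes an exponential backoff variant of the same metadata-driven policy. Assuming the same placeholder hub and connection names as above, it could look like this, with five retries spread out between 4 seconds and 15 minutes:

[FunctionName("EventHubTriggerWithBackoff")]
[ExponentialBackoffRetry(5, "00:00:04", "00:15:00")]
public static async Task Run([EventHubTrigger("myHub", Connection = "EventHubConnection")] EventData[] events, ILogger log)
{
    // ...
}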

Circuit breaker pattern — this is the most complicated and one of the most useful patterns. To implement it, you create a proxy to the original service that acts like a gateway. You can then track the health of the service; if it's down, there's no point in repeated calls, so you want to fail and throw an exception immediately.

In the above pattern there are three minimal states you need in order to implement it (a rough sketch of them follows the list below).

Open — this denotes that the service is down, and the application gets an exception immediately instead of being allowed to retry.

Closed — this state denotes that the service is healthy and the application can go ahead and connect to it.

Half-Open — when the service is back up and running, this state allows a limited number of requests to pass through. It works like a litmus test that checks whether the requests that pass through are successful.
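Below is a bare-bones, single-threaded sketch of that state machine, assuming a simple failure threshold and a fixed cool-down period. The class and its members are my own illustration; in production you would typically reach for a library such as Polly instead of hand-rolling this:

using System;
using System.Threading.Tasks;

public enum CircuitState { Closed, Open, HalfOpen }

public class CircuitBreaker
{
    private readonly int failureThreshold;
    private readonly TimeSpan openDuration;
    private int failureCount;
    private DateTime openedAt;

    public CircuitState State { get; private set; } = CircuitState.Closed;

    public CircuitBreaker(int failureThreshold, TimeSpan openDuration)
    {
        this.failureThreshold = failureThreshold;
        this.openDuration = openDuration;
    }

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> operation)
    {
        if (State == CircuitState.Open)
        {
            // While open, fail fast until the cool-down period has elapsed.
            if (DateTime.UtcNow - openedAt < openDuration)
                throw new InvalidOperationException("Circuit is open; failing fast.");

            // Cool-down elapsed: let one trial request through.
            State = CircuitState.HalfOpen;
        }

        try
        {
            T result = await operation();

            // Success: reset the breaker back to Closed.
            failureCount = 0;
            State = CircuitState.Closed;
            return result;
        }
        catch
        {
            failureCount++;
            if (State == CircuitState.HalfOpen || failureCount >= failureThreshold)
            {
                // Trip (or re-trip) the breaker and start the cool-down again.
                State = CircuitState.Open;
                openedAt = DateTime.UtcNow;
            }
            throw;
        }
    }
}

Wrapping a call such as await breaker.ExecuteAsync(() => httpClient.GetStringAsync(url)) means that once the service has failed a few times in a row, callers fail fast instead of piling more retries onto an endpoint that is already down.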

References

https://docs.microsoft.com/en-us/azure/architecture/patterns/retry

https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-error-pages?tabs=csharp
