Skip to content
reisenberger edited this page Mar 10, 2018 · 12 revisions

Retry with jitter

A well-known retry strategy is exponential backoff, allowing retries to be made initially quickly, but then at progressively longer intervals: for example, after 2, 4, 8, 15, then 30 seconds.

In high-throughput scenarios, it can also be beneficial to add jitter to wait-and-retry strategies, to prevent retries bunching into further spikes of load.

The problem

Consider a given call path experiencing hundreds of calls per second where the underlying system suddenly fails. A fixed-progression exponential backoff can still generate peaks of load.

If the underlying system fails at time t, for example, and you have a fixed-progression backoff strategy generating retries after 2, 4, 8, 15 and 30 second intervals, you can expect to generate further specific jumps in load at t + 2, t + 6, t + 14 etc (or similar; if calls are timing out, real-world timings will be affected by the timeout).

Jitter is a decorrelation strategy which adds randomness to retry intervals to spread out load and thus avoid spikes.

Jitter is not provided as an in-built configuration overload within Polly, as a number of different jitter strategies are possible, and jitter is relatively easy to implement with existing overloads which allow you to dynamically calculate the duration to wait before retrying.

Simple jitter

Jitter can be achieved using any of the .WaitAndRetry(...) overloads which allow you to specify a Func<..., TimeSpan> for calculating the duration of wait before retrying. (That link shows a simple overload, but more complex variants exist, and similar overloads exist for async retry policies.)

A simple jitter can be achieved by adding a randomly-varying extra delay to the wait before retrying:

Random jitterer = new Random(); 
Policy
  .Handle<HttpRequestException>() // etc
  .WaitAndRetry(5,  
      retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt))  // exponential back-off: 2, 4, 8 etc
                    + TimeSpan.FromMilliseconds(jitterer.Next(0, 1000)) // plus some jitter: up to 1 second
  );

Note however that this jitter doesn't scale with increasing retries.

More complex jitter

A comparison of Jitter approaches by the AWS architecture team highlights a number of more complex strategies. These typically combine an element of backoff with also scaling the amount of jitter in relation to the try number or previous delay.

The following code implements the 'Decorrelated Jitter' strategy discussed in that article, for Polly.

The core work is to implement a function which will return a jittered delay based on a maximum number of retries, a seed delay and a max delay:

public static IEnumerable<TimeSpan> DecorrelatedJitter(int maxRetries, TimeSpan seedDelay, TimeSpan maxDelay)
{
    Random jitterer = new Random();
    int retries = 0;

    double seed = seedDelay.TotalMilliseconds;
    double max = maxDelay.TotalMilliseconds;
    double current = seed;

    while (++retries <= maxRetries)
    {
        current = Math.Min(max, Math.Max(seed, current * 3 * jitterer.NextDouble())); // adopting the 'Decorrelated Jitter' formula from https://www.awsarchitectureblog.com/2015/03/backoff.html.  Can be between seed and previous * 3.  Mustn't exceed max.
        yield return TimeSpan.FromMilliseconds(current);
    }
}

A policy using this strategy can be defined as follows:

Policy retryWithDecorrelatedJitter = Policy
    .Handle<WhateverException>()
    .WaitAndRetry(DecorrelatedJitter(maxRetries, seedDelay, maxDelay));

Parameters

The parameters are:

maxRetries: the maximum number of retries that will be made.

seed: should represent the minimum wait-before-any-retry that you want.

  • This is used as a seed in calculations, but note that it does not fix the initial retry delay to this value (a fixed first retry interval would cause a spike).

max: the maximum wait-before-any-retry that you want.

  • max >= 4 * seed works well (and bigger ratios are fine). If seed and max are any closer, values will tend to bunch at the min and max.

How it works

  • The algorithm will produce delay values all of which fall between seed and max.
  • The formula will tend, over several iterations, to produce values with an increasing delay (that is, there is an in-built element of backoff). However, the formula can also (intentionally) produce a shorter delay for one retry than the previous. (The formula current * 3 * jitterer.NextDouble() has a 2-in-3 chance of producing a greater value than previously, and a 1-in-3 chance of lower.)

Working example

A full working example as a Console app, illustrating the typical delay values the strategy generates, can be found here.

Clone this wiki locally