Circuit Breaker Pattern — [Notes]

Tarun Jain
May 12, 2022


The circuit breaker is a design pattern used in software development. It detects failures and encapsulates the logic that prevents a failure from constantly recurring during maintenance, temporary external system failures, or unexpected system difficulties.

Figure: Thread pool unusable

It’s common for software systems to make remote calls, whether they use a microservice architecture or another one. Unlike in-memory calls, remote calls can fail, or hang without a response until a timeout is reached. When enough callers hang this way, the thread pool is exhausted and becomes unusable. The figure below shows an example scenario.

To shield the caller from the anomalous behaviour of an external service, a circuit breaker is used.

Figure: Cascade failure

When OPEN, a circuit breaker can take one of the following approaches:

  1. Return an error
  2. Return a cached or default response (sometimes it’s fine to return a stale response)
  3. Call a fallback service (internal or external)
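
The three approaches above can be sketched in plain Java. This is only an illustration: the class `OpenStateResponder`, its `Strategy` enum, and the method names are hypothetical and do not come from any library.

```java
import java.util.function.Supplier;

/** Hypothetical helper: decides what a caller receives while the breaker is OPEN. */
final class OpenStateResponder<T> {
    enum Strategy { RETURN_ERROR, CACHED_OR_DEFAULT, FALLBACK_SERVICE }

    private final Strategy strategy;
    private final T defaultValue;              // used when nothing has been cached yet
    private final Supplier<T> fallbackService; // stand-in for an internal or external fallback
    private volatile T lastGoodValue;          // possibly stale; updated on each success

    OpenStateResponder(Strategy strategy, T defaultValue, Supplier<T> fallbackService) {
        this.strategy = strategy;
        this.defaultValue = defaultValue;
        this.fallbackService = fallbackService;
    }

    /** Call this after every successful remote call so a cached value is available. */
    void rememberSuccess(T value) {
        lastGoodValue = value;
    }

    /** Produces the response for a call that is rejected because the breaker is OPEN. */
    T respondWhileOpen() {
        switch (strategy) {
            case RETURN_ERROR:
                throw new IllegalStateException("circuit breaker is OPEN");
            case CACHED_OR_DEFAULT:
                return lastGoodValue != null ? lastGoodValue : defaultValue;
            case FALLBACK_SERVICE:
                return fallbackService.get();
            default:
                throw new AssertionError("unknown strategy: " + strategy);
        }
    }
}
```

Which strategy fits depends on the caller: returning an error is the simplest, while a cached or default response only makes sense when slightly stale data is acceptable.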

The following settings control when the circuit breaker trips:

  • count-based sliding window: aggregates the outcome of the last N calls. For instance, if the count window size is 10 and the failure threshold is 50%, the circuit breaker changes from CLOSED to OPEN when it detects 5 failures in the last 10 calls.
  • time-based sliding window: aggregates the outcome of the calls made in the last N seconds. For instance, if the time window size is 10 seconds and the failure threshold is 50%, the circuit breaker changes from CLOSED to OPEN when at least half of the calls recorded in the last 10 seconds have failed.
  • failure rate threshold: the circuit breaker changes from CLOSED to OPEN when the failure rate is equal to or greater than a configurable threshold. For instance, when 50% or more of the recorded calls have failed.
  • slow call rate threshold: the circuit breaker changes from CLOSED to OPEN when the percentage of slow calls is equal to or greater than a configurable threshold. For instance, when 50% or more of the recorded calls took longer than 5 seconds. This helps reduce the load on an external system before it actually becomes unresponsive.
  • minimum number of calls: the failure rate and slow call rate can only be calculated once a minimum number of calls has been recorded. For instance, if the minimum number of required calls is 10, then at least 10 calls must be recorded before the failure rate can be calculated. If only 9 calls have been evaluated, the circuit breaker will not trip open even if all 9 calls have failed.
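
To make the interplay of window size, failure rate threshold, and minimum number of calls concrete, here is a minimal sketch of a count-based sliding window. The class `CountBasedWindow` and its methods are hypothetical, and this is not how resilience4j implements it internally. Slow calls are left out to keep the example short.

```java
/** Hypothetical count-based sliding window over the last N call outcomes. */
final class CountBasedWindow {
    private final boolean[] failed;            // ring buffer: true means the call failed
    private final int minimumNumberOfCalls;
    private final double failureRateThreshold; // percentage, e.g. 50.0
    private int recorded;                      // outcomes currently in the window (<= size)
    private int next;                          // slot that the next outcome overwrites
    private int failures;                      // failed outcomes currently in the window

    CountBasedWindow(int size, int minimumNumberOfCalls, double failureRateThreshold) {
        this.failed = new boolean[size];
        this.minimumNumberOfCalls = minimumNumberOfCalls;
        this.failureRateThreshold = failureRateThreshold;
    }

    /** Records one outcome and reports whether the breaker should trip to OPEN. */
    boolean record(boolean callFailed) {
        if (recorded == failed.length) {
            if (failed[next]) {
                failures--;                    // the oldest outcome leaves the window
            }
        } else {
            recorded++;
        }
        failed[next] = callFailed;
        if (callFailed) {
            failures++;
        }
        next = (next + 1) % failed.length;
        return shouldTrip();
    }

    /** No decision is made until the minimum number of calls has been recorded. */
    boolean shouldTrip() {
        return recorded >= minimumNumberOfCalls
                && failureRatePercent() >= failureRateThreshold;
    }

    double failureRatePercent() {
        return recorded == 0 ? 0.0 : 100.0 * failures / recorded;
    }
}
```

With a window size of 10, a minimum of 10 calls, and a threshold of 50, nine failures in a row do not trip the breaker, but the tenth does.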
resilience4j:
  circuitbreaker:
    instances:
      order:
        failureRateThreshold: 50
        slowCallRateThreshold: 50
        slowCallDurationThreshold: 500ms
        permittedNumberOfCallsInHalfOpenState: 4
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 10
        minimumNumberOfCalls: 4
        waitDurationInOpenState: 30s
  • We use a COUNT_BASED sliding window of size 10 and set both the failure rate and slow call rate thresholds to 50%.
  • This means that if at least 50% of the last 10 calls fail, or at least 50% take longer than 500 milliseconds (slowCallDurationThreshold), the circuit breaker transitions from CLOSED to OPEN. Because minimumNumberOfCalls is 4, these rates are evaluated once at least 4 calls have been recorded.
  • After 30 seconds have elapsed in the OPEN state, the circuit breaker transitions to HALF_OPEN and permits 4 calls.
  • If the failure rate or slow call rate of these calls is equal to or greater than the threshold, the state changes back to OPEN. Otherwise, the state changes back to CLOSED.
  • When the circuit breaker trips OPEN, we can make it return a default response instead of throwing an exception. To do this we use the fallback mechanism: we add fallbackMethod to the @CircuitBreaker annotation and create a method with that name in the same class. It must have the same return type and parameters as the protected method, plus one extra exception parameter.

The above implementation wraps a protected function call in a circuit breaker object (refer to step 5), which monitors for failures. Once the failures reach a certain threshold, the circuit breaker trips and the library calls the fallback function instead. The above logic assumes that exponential backoff is applied when the protected function is tried again.
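
The wrap-and-monitor behaviour described above can be sketched in plain Java as follows. This is a simplified, hypothetical `SimpleCircuitBreaker`, not the resilience4j implementation. It assumes the breaker trips after a fixed number of consecutive failures instead of using a sliding window, it is not thread-safe, and it takes an injected clock so the wait in the OPEN state can be simulated.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical minimal breaker showing CLOSED -> OPEN -> HALF_OPEN transitions. */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int consecutiveFailuresToOpen; // simplification: no sliding window
    private final int permittedHalfOpenCalls;
    private final double failureRateThreshold;   // percentage, applied in HALF_OPEN
    private final long waitInOpenMillis;
    private final LongSupplier clockMillis;      // injected so time can be simulated

    private State state = State.CLOSED;
    private int consecutiveFailures;
    private int halfOpenCalls;
    private int halfOpenFailures;
    private long openedAtMillis;

    SimpleCircuitBreaker(int consecutiveFailuresToOpen, int permittedHalfOpenCalls,
                         double failureRateThreshold, long waitInOpenMillis,
                         LongSupplier clockMillis) {
        this.consecutiveFailuresToOpen = consecutiveFailuresToOpen;
        this.permittedHalfOpenCalls = permittedHalfOpenCalls;
        this.failureRateThreshold = failureRateThreshold;
        this.waitInOpenMillis = waitInOpenMillis;
        this.clockMillis = clockMillis;
    }

    /** Returns the state, moving from OPEN to HALF_OPEN once the wait has elapsed. */
    State currentState() {
        if (state == State.OPEN && clockMillis.getAsLong() - openedAtMillis >= waitInOpenMillis) {
            state = State.HALF_OPEN;
            halfOpenCalls = 0;
            halfOpenFailures = 0;
        }
        return state;
    }

    /** Runs the protected call unless OPEN; uses the fallback when rejected or failed. */
    <T> T call(Supplier<T> protectedCall, Supplier<T> fallback) {
        if (currentState() == State.OPEN) {
            return fallback.get();               // short-circuit: remote call is skipped
        }
        try {
            T result = protectedCall.get();
            record(false);
            return result;
        } catch (RuntimeException e) {
            record(true);
            return fallback.get();
        }
    }

    private void record(boolean failed) {
        if (state == State.HALF_OPEN) {
            halfOpenCalls++;
            if (failed) {
                halfOpenFailures++;
            }
            if (halfOpenCalls >= permittedHalfOpenCalls) {
                double rate = 100.0 * halfOpenFailures / halfOpenCalls;
                if (rate >= failureRateThreshold) {
                    open();
                } else {
                    close();
                }
            }
        } else {
            consecutiveFailures = failed ? consecutiveFailures + 1 : 0;
            if (consecutiveFailures >= consecutiveFailuresToOpen) {
                open();
            }
        }
    }

    private void open() {
        state = State.OPEN;
        openedAtMillis = clockMillis.getAsLong();
        consecutiveFailures = 0;
    }

    private void close() {
        state = State.CLOSED;
        consecutiveFailures = 0;
    }
}
```

A caller would wrap each remote call, for example `breaker.call(() -> client.fetchOrder(id), () -> defaultOrder)`, where `client`, `fetchOrder`, and `defaultOrder` are placeholder names.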

Let’s look at step 5 in more detail. Given that we are using Lambda functions, maintaining the state in memory won’t fit our use case, as it would for most monolithic applications. The circuit state needs to be persisted outside the Lambda functions. This state should be accessible across a distributed network architecture and provide strong consistency. ElastiCache (Redis) or DynamoDB with DAX can be used to maintain the circuit breaker state with minimal impact on performance, since both stores are designed for low-latency access.
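
As a rough sketch of what externally persisted state could look like, the example below puts a small interface between the handler and the store. Everything here is hypothetical: `CircuitStateStore`, `InMemoryCircuitStateStore`, and `OrderHandler` are illustrative names, the in-memory map only stands in for Redis or DynamoDB so the code runs locally, and the circuit trips on a single failure to keep the example short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical abstraction over an external store such as Redis or DynamoDB. */
interface CircuitStateStore {
    String load(String circuitName);              // "CLOSED" when nothing is stored
    void save(String circuitName, String state);
}

/** In-memory stand-in so the sketch runs locally; a real one would call the store. */
final class InMemoryCircuitStateStore implements CircuitStateStore {
    private final Map<String, String> states = new ConcurrentHashMap<>();

    @Override
    public String load(String circuitName) {
        return states.getOrDefault(circuitName, "CLOSED");
    }

    @Override
    public void save(String circuitName, String state) {
        states.put(circuitName, state);
    }
}

/** Stateless handler: each invocation reads the circuit state from the shared store. */
final class OrderHandler {
    private static final String CIRCUIT = "order";
    private final CircuitStateStore store;

    OrderHandler(CircuitStateStore store) {
        this.store = store;
    }

    String handle(Supplier<String> remoteCall) {
        if ("OPEN".equals(store.load(CIRCUIT))) {
            return "fallback";                    // another instance already tripped it
        }
        try {
            return remoteCall.get();
        } catch (RuntimeException e) {
            store.save(CIRCUIT, "OPEN");          // simplification: trip on first failure
            return "fallback";
        }
    }
}
```

Because the handler keeps no state of its own, two separate instances that share the same store see the same circuit state, which is the property in-memory state cannot give across function invocations.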

