Circuit Breaker Pattern — [Notes]

5 min read · May 12, 2022

The circuit breaker is a software design pattern. It detects failures and encapsulates the logic that prevents a failing operation from being retried constantly during maintenance, temporary external system failures, or unexpected system difficulties.

Thread pool unusable

Software systems commonly make remote calls, whether they use a microservice architecture or another one. Unlike in-memory calls, remote calls can fail, or hang without a response until a timeout is reached. If enough callers hang this way, they exhaust the thread pool and render it unusable. The image below shows an example scenario.

To make sure the anomalous behaviour of an external service does not impact our own system, a circuit breaker is used.

Cascade failure

A circuit breaker can take one of the following approaches when it is OPEN:

Return an error
Return a cached/default response (sometimes it’s fine to return a stale response)
Call a fallback service (internal/external)
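The three options above can be sketched as a small helper. This is only an illustration: the names `respond_when_open` and `CircuitOpenError` are hypothetical, and the order of preference (fallback first, then cached value, then error) is an assumption made for the example, not something the pattern prescribes.

```python
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker is OPEN and nothing else can be returned."""


def respond_when_open(
    cached: Optional[str] = None,
    fallback: Optional[Callable[[], str]] = None,
) -> str:
    """Choose a response for a call rejected by an OPEN breaker."""
    if fallback is not None:
        # Option 3: delegate to a fallback service (internal or external).
        return fallback()
    if cached is not None:
        # Option 2: return a cached/default value, possibly stale.
        return cached
    # Option 1: fail fast with an error instead of calling the remote system.
    raise CircuitOpenError("circuit is OPEN; call rejected")
```

Whichever option is chosen, the remote system is never contacted while the breaker is OPEN, which is what gives it time to recover.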

count-based sliding window: aggregates the outcome of the last N calls. For instance, if the count window size is 10 and the failure threshold is 50%, the circuit breaker changes from CLOSED to OPEN when it detects 5 failures among the last 10 calls.
time-based sliding window: aggregates the outcome of the calls made in the last N seconds. For instance, if the time window size is 10 seconds and the failure threshold is 50%, the circuit breaker changes from CLOSED to OPEN when at least half of the calls recorded in the last 10 seconds have failed.
failure rate threshold: The state of the circuit breaker changes from CLOSED to OPEN when the failure rate is equal to or greater than a configurable threshold. For instance, when 50% or more of the recorded calls have failed.
slow call rate threshold: The circuit breaker changes from CLOSED to OPEN when the percentage of slow calls is equal to or greater than a configurable threshold. For instance, when 50% or more of the recorded calls took longer than 5 seconds. This helps reduce the load on an external system before it actually becomes unresponsive.
minimum number of calls: The failure rate and slow call rate can only be calculated once a minimum number of calls has been recorded. For instance, if the minimum number of required calls is 10, then at least 10 calls must be recorded before the failure rate can be calculated. If only 9 calls have been evaluated, the circuit breaker will not trip open even if all 9 calls have failed.
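To make the interplay between window size, failure rate threshold and minimum number of calls concrete, here is a minimal count-based sketch. It is not how any particular library implements it; the class name `CountBasedWindow` and its parameter names are made up for illustration, and slow calls are left out for brevity.

```python
from collections import deque
from typing import Optional


class CountBasedWindow:
    """Tracks the outcome of the last `size` calls (True means failure)."""

    def __init__(self, size: int = 10, failure_rate_threshold: float = 50.0,
                 minimum_number_of_calls: int = 10) -> None:
        # deque with maxlen drops the oldest outcome automatically.
        self.outcomes = deque(maxlen=size)
        self.failure_rate_threshold = failure_rate_threshold
        self.minimum_number_of_calls = minimum_number_of_calls

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)

    def failure_rate(self) -> Optional[float]:
        # No rate is reported until enough calls have been recorded.
        if len(self.outcomes) < self.minimum_number_of_calls:
            return None
        return 100.0 * sum(self.outcomes) / len(self.outcomes)

    def should_open(self) -> bool:
        rate = self.failure_rate()
        return rate is not None and rate >= self.failure_rate_threshold


window = CountBasedWindow()
for _ in range(9):
    window.record(True)          # 9 failures, still below the minimum of 10
still_closed = not window.should_open()
window.record(False)             # 10th call recorded: 9 of 10 failed
now_open = window.should_open()
```

With these defaults, `still_closed` and `now_open` both end up `True`: nine straight failures do not trip the breaker, but the tenth recorded call does because the rate can finally be computed.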

resilience4j:
  circuitbreaker:
    instances:
      order:
        failureRateThreshold: 50
        slowCallRateThreshold: 50
        slowCallDurationThreshold: 500ms
        permittedNumberOfCallsInHalfOpenState: 4
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 10
        minimumNumberOfCalls: 4
        waitDurationInOpenState: 30s

We use the COUNT_BASED sliding window with a size of 10 and set both the failure rate and slow call rate thresholds to 50.
This means that if at least 50% of the last 10 calls fail, or at least 50% of them take longer than 500 milliseconds, the circuit breaker transitions from CLOSED to OPEN. Because minimumNumberOfCalls is 4, the rates are first evaluated once 4 calls have been recorded.
After 30 seconds have elapsed in the OPEN state, the circuit breaker transitions to HALF_OPEN and permits 4 calls.
If the failure rate or slow call rate of these calls is equal to or greater than the threshold, the state changes back to OPEN. Otherwise, the state changes back to CLOSED.
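The transitions described above can be sketched as a tiny state machine. This is a simplified, single-threaded illustration rather than Resilience4j's implementation: the class `SimpleCircuitBreaker` and its parameter names are hypothetical, only the failure rate is tracked (slow calls are omitted), and the clock is injectable so the 30 second wait can be simulated.

```python
import time
from collections import deque
from typing import Callable, List


class SimpleCircuitBreaker:
    def __init__(self, window_size: int = 10, minimum_calls: int = 4,
                 failure_rate_threshold: float = 50.0,
                 wait_in_open_seconds: float = 30.0,
                 permitted_half_open_calls: int = 4,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.state = "CLOSED"
        self.window = deque(maxlen=window_size)   # True means failure
        self.minimum_calls = minimum_calls
        self.threshold = failure_rate_threshold
        self.wait = wait_in_open_seconds
        self.permitted = permitted_half_open_calls
        self.clock = clock
        self.opened_at = 0.0
        self.trial: List[bool] = []

    def allow_request(self) -> bool:
        # OPEN -> HALF_OPEN once the wait duration has elapsed.
        if self.state == "OPEN" and self.clock() - self.opened_at >= self.wait:
            self.state = "HALF_OPEN"
            self.trial = []
        if self.state == "HALF_OPEN":
            return len(self.trial) < self.permitted
        return self.state == "CLOSED"

    def record(self, failed: bool) -> None:
        if self.state == "HALF_OPEN":
            self.trial.append(failed)
            if len(self.trial) == self.permitted:
                self._evaluate(self.trial)
        elif self.state == "CLOSED":
            self.window.append(failed)
            if len(self.window) >= self.minimum_calls:
                self._evaluate(self.window)

    def _evaluate(self, outcomes) -> None:
        rate = 100.0 * sum(outcomes) / len(outcomes)
        if rate >= self.threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
        else:
            self.state = "CLOSED"
        if self.state == "OPEN" or outcomes is self.trial:
            self.window.clear()  # start a fresh window after a transition
```

A caller would check `allow_request()` before each remote call and report the outcome with `record(...)` afterwards. Real implementations also have to deal with concurrency, which this sketch ignores.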
When the circuit breaker trips OPEN, we can make it return a default response instead of throwing an exception. To do that, we use the fallback mechanism to handle exceptions. To implement a fallback, we add fallbackMethod to the @CircuitBreaker annotation and create a method with that name in the same class. It takes the same parameters as the protected method plus one extra exception parameter, and has the same return type.
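The idea behind a fallback method can be shown outside Java as well. The following is a rough Python analogue, not the Resilience4j API: `with_fallback`, `get_order` and `get_order_fallback` are hypothetical names, and the decorator simply catches any exception and passes it to the fallback after the original arguments.

```python
import functools
from typing import Any, Callable


def with_fallback(fallback: Callable[..., Any]) -> Callable:
    """Call `fallback(*args, exc)` when the wrapped function raises."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any) -> Any:
            try:
                return func(*args)
            except Exception as exc:
                # Same arguments as the protected call, plus the exception.
                return fallback(*args, exc)
        return wrapper
    return decorator


def get_order_fallback(order_id: str, exc: Exception) -> dict:
    # Default response returned instead of propagating the error.
    return {"id": order_id, "status": "UNKNOWN", "reason": type(exc).__name__}


@with_fallback(get_order_fallback)
def get_order(order_id: str) -> dict:
    # Stand-in for a remote call that is currently failing.
    raise ConnectionError("order service unavailable")
```

Calling `get_order("42")` here returns the default dictionary built by `get_order_fallback` rather than raising `ConnectionError`.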

The above implementation wraps a protected function call in a circuit breaker object (refer to step 5), which monitors for failures. Once the failures reach a certain threshold, the circuit breaker trips and the library calls the fallback function. The logic above assumes that exponential backoff is applied when trying the protected function again.
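For readers unfamiliar with exponential backoff, a minimal sketch of the delay schedule follows. The function name `backoff_delays` and the base, factor and cap values are assumptions chosen for illustration; the source does not specify them.

```python
from typing import List


def backoff_delays(base_seconds: float = 1.0, factor: float = 2.0,
                   max_seconds: float = 30.0, attempts: int = 5) -> List[float]:
    """Delay before each retry: base * factor**attempt, capped at max_seconds."""
    return [min(base_seconds * factor ** i, max_seconds) for i in range(attempts)]


delays = backoff_delays()  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

In practice a random jitter is often added to each delay so that many clients do not retry at the same moment.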

Let’s look at step 5 in more detail. Given that we are using Lambda functions, keeping the state in memory won’t fit our use case, although it may for the majority of monolithic applications. The circuit state needs to be persisted outside the Lambda functions. This state should be accessible across a distributed architecture and should be read consistently by every invocation. ElastiCache (Redis) or DynamoDB with DAX can be used to maintain the circuit breaker state with minimal impact on performance, since both offer low-latency reads and writes.
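A rough sketch of what externalising the state means: each invocation reads the circuit state from a shared store before calling the protected function. Everything here is hypothetical. `StateStore`, `InMemoryStore` and `invoke` are illustrative names, the in-memory dictionary merely stands in for Redis or DynamoDB, and the breaker trips on the first failure to keep the example short.

```python
from typing import Callable, Dict, Protocol


class StateStore(Protocol):
    def get(self, key: str) -> str: ...
    def put(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Stand-in for an external store such as Redis or DynamoDB."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> str:
        return self._data.get(key, "CLOSED")  # unknown circuits start CLOSED

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


def invoke(store: StateStore, circuit: str, call: Callable[[], str]) -> str:
    """One stateless invocation; all breaker state lives in `store`."""
    if store.get(circuit) == "OPEN":
        return "fallback"             # skip the remote call entirely
    try:
        return call()
    except Exception:
        store.put(circuit, "OPEN")    # simplification: trip on first failure
        return "fallback"
```

Because the function keeps nothing between calls, any number of concurrent invocations see the same circuit state as long as they share the store.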

References/Followup reads →

Circuit Breaker Pattern

In this article we will talk about why we need circuit breaker pattern and how it works. And then, we will implement a…

medium.com

Circuit Breaker Pattern (Design Patterns for Microservices)

In a distributed system we have no idea how other components would fail. Network issues could occur, components could…

medium.com

Written by Tarun Jain