Circuit Breaking In Spring Cloud Gateway With Resilience4J

In the newest version of Spring Cloud Gateway (2.2.1) we can take advantage of a new circuit breaker implementation built on top of the Resilience4J project (https://github.com/resilience4j/resilience4j). Resilience4J has been selected as a replacement for Netflix’s Hystrix, which has been moved to maintenance mode. Of course, you can still use Hystrix as the circuit breaker implementation, but it is deprecated and probably won’t be available in future versions of Spring Cloud. The new implementation is simply called Spring Cloud Circuit Breaker.
You can find another interesting example of using Spring Cloud Gateway components in one of my previous articles, where I described how to implement Redis-based rate limiting: Rate Limiting In Spring Cloud Gateway With Redis. In this article I’m using the same GitHub repository as before: sample-spring-cloud-gateway. I’m going to show some sample scenarios of using Spring Cloud Circuit Breaker with Spring Cloud Gateway, including a fallback pattern.

1. Dependencies

To successfully test some scenarios of using the circuit breaker pattern with Spring Cloud Gateway, we need to include the reactive version of Spring Cloud Circuit Breaker, since the gateway runs on a reactive Netty server. We will simulate the downstream service using MockServer, started through the Testcontainers framework. It is provisioned inside the test by a mock client written in Java.

<dependency>
   <groupId>org.springframework.cloud</groupId>
   <artifactId>spring-cloud-starter-gateway</artifactId>
</dependency>
<dependency>
   <groupId>org.springframework.cloud</groupId>
   <artifactId>spring-cloud-starter-circuitbreaker-reactor-resilience4j</artifactId>
</dependency>
<dependency>
   <groupId>org.projectlombok</groupId>
   <artifactId>lombok</artifactId>
</dependency>
<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-test</artifactId>
   <scope>test</scope>
</dependency>
<dependency>
   <groupId>org.testcontainers</groupId>
   <artifactId>mockserver</artifactId>
   <version>1.12.3</version>
   <scope>test</scope>
</dependency>
<dependency>
   <groupId>org.mock-server</groupId>
   <artifactId>mockserver-client-java</artifactId>
   <version>3.10.8</version>
   <scope>test</scope>
</dependency>
<dependency>
   <groupId>com.carrotsearch</groupId>
   <artifactId>junit-benchmarks</artifactId>
   <version>0.7.2</version>
   <scope>test</scope>
</dependency>

2. Enabling Spring Cloud Gateway Circuit Breaker with Resilience4J

To enable the circuit breaker built on top of Resilience4J we need to declare a Customizer bean that is passed a ReactiveResilience4JCircuitBreakerFactory. This very simple configuration uses the default circuit breaker settings and defines a timeout duration using TimeLimiterConfig. For the first test I decided to set a 200-millisecond timeout.

@Bean
public Customizer<ReactiveResilience4JCircuitBreakerFactory> defaultCustomizer() {
    return factory -> factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
        .circuitBreakerConfig(CircuitBreakerConfig.ofDefaults())
        .timeLimiterConfig(TimeLimiterConfig.custom().timeoutDuration(Duration.ofMillis(200)).build())
        .build());
}
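
To make the effect of the 200-millisecond limit concrete, here is a minimal plain-Java sketch of a time limit around a call. It does not show how Resilience4J implements its TimeLimiter internally; the class TimeLimitSketch and the methods callWithTimeout and delayed are hypothetical names used only for this illustration.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Illustrative only: a hand-rolled time limit, not the Resilience4J TimeLimiter.
final class TimeLimitSketch {

    // Runs the call on another thread and waits at most `timeout` for its result.
    // Returns the result, or `onTimeout` if the time limit is exceeded.
    static String callWithTimeout(Supplier<String> call, Duration timeout, String onTimeout) {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(call);
        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Give up on the result; CompletableFuture does not interrupt the running task.
            future.cancel(true);
            return onTimeout;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Simulates a downstream call that needs `millis` milliseconds to answer.
    static Supplier<String> delayed(long millis, String body) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return body;
        };
    }

    public static void main(String[] args) {
        Duration limit = Duration.ofMillis(200);
        System.out.println(callWithTimeout(delayed(10, "fast"), limit, "TIMEOUT"));
        System.out.println(callWithTimeout(delayed(1000, "slow"), limit, "TIMEOUT"));
    }
}
```

In the gateway the same idea applies to every proxied request: a response that does not arrive within the configured duration is treated as a failure by the circuit breaker.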

3. Building Test Class

In the next step we create a test class. Before running the test it starts and provisions an instance of the mock server. We define two endpoints. The second of them, /2, adds a delay of 200 milliseconds, so its responses do not arrive within the 200-millisecond timeout defined in the circuit breaker configuration.
We also configure a Spring Cloud Gateway route that points to the currently started instance of the mock server. To enable the circuit breaker for our route we have to define a CircuitBreaker filter with a given name. The test is repeated 200 times. It calls the delayed and the non-delayed endpoint in a 50/50 proportion. Here’s the Spring Cloud Gateway test class.

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.DEFINED_PORT)
@RunWith(SpringRunner.class)
public class GatewayCircuitBreakerTest {

    private static final Logger LOGGER = LoggerFactory.getLogger(GatewayCircuitBreakerTest.class);

    @Rule
    public TestRule benchmarkRun = new BenchmarkRule();

    @ClassRule
    public static MockServerContainer mockServer = new MockServerContainer();

    @Autowired
    TestRestTemplate template;
    int i = 0;

    @BeforeClass
    public static void init() {
        System.setProperty("spring.cloud.gateway.routes[0].id", "account-service");
        System.setProperty("spring.cloud.gateway.routes[0].uri", "http://" + mockServer.getContainerIpAddress() + ":" + mockServer.getServerPort());
        System.setProperty("spring.cloud.gateway.routes[0].predicates[0]", "Path=/account/**");
        System.setProperty("spring.cloud.gateway.routes[0].filters[0]", "RewritePath=/account/(?<path>.*), /$\\{path}");
        System.setProperty("spring.cloud.gateway.routes[0].filters[1].name", "CircuitBreaker");
        System.setProperty("spring.cloud.gateway.routes[0].filters[1].args.name", "exampleSlowCircuitBreaker");
        MockServerClient client = new MockServerClient(mockServer.getContainerIpAddress(), mockServer.getServerPort());
        client.when(HttpRequest.request()
            .withPath("/1"))
            .respond(response()
                .withBody("{\"id\":1,\"number\":\"1234567890\"}")
                .withHeader("Content-Type", "application/json"));
        client.when(HttpRequest.request()
            .withPath("/2"))
            .respond(response()
                .withBody("{\"id\":2,\"number\":\"1234567891\"}")
                .withDelay(TimeUnit.MILLISECONDS, 200)
                .withHeader("Content-Type", "application/json"));
    }

    @Test
    @BenchmarkOptions(warmupRounds = 0, concurrency = 1, benchmarkRounds = 200)
    public void testAccountService() {
        int gen = 1 + (i++ % 2);
        ResponseEntity<Account> r = template.exchange("/account/{id}", HttpMethod.GET, null, Account.class, gen);
        LOGGER.info("{}. Received: status->{}, payload->{}, call->{}", i, r.getStatusCodeValue(), r.getBody(), gen);
    }

}

Here’s the result of the test. With the default settings the circuit opens after processing 100 requests with a 50% error rate. The logs below include the sequence number of the request, the HTTP response status code, the response body and the called endpoint.

[Image: circuit-breaking-spring-cloud-gateway-resilience4j-3]

We may change the default settings a little. To do that we should define a custom CircuitBreakerConfig. One of the properties we can customize is slidingWindowSize. It defines how many call outcomes are recorded while the circuit breaker is closed. Assuming we have the same test endpoints, what will happen if we change this value to 10 as shown below?

@Bean
public Customizer<ReactiveResilience4JCircuitBreakerFactory> defaultCustomizer() {
    return factory -> factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
        .circuitBreakerConfig(CircuitBreakerConfig.custom()
            .slidingWindowSize(10)
            .build())
        .timeLimiterConfig(TimeLimiterConfig.custom().timeoutDuration(Duration.ofMillis(200)).build()).build());
}

Here’s the result. The circuit opens right after processing 10 requests, because at least 50% of them timed out.

[Image: resilience4j-2.PNG]

Moreover, we may change failureRateThreshold. This property configures the failure rate threshold as a percentage. If the failure rate is equal to or greater than the threshold, the circuit breaker switches to open and starts short-circuiting calls. It is not difficult to predict what will happen if we change it to 66.6F for our current scenario: since only 50% of the calls fail, the circuit will never open.

@Bean
public Customizer<ReactiveResilience4JCircuitBreakerFactory> defaultCustomizer() {
    return factory -> factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
        .circuitBreakerConfig(CircuitBreakerConfig.custom()
            .slidingWindowSize(10)
            .failureRateThreshold(66.6F)
            .build())
        .timeLimiterConfig(TimeLimiterConfig.custom().timeoutDuration(Duration.ofMillis(200)).build()).build());
}
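
The three outcomes described so far (opening after 100 calls, opening after 10 calls, never opening) can be checked with a small model of a count-based sliding window. This is a simplified sketch, not Resilience4J's actual code: the class SlidingWindowSketch and the method firstOpenAt are hypothetical, and it assumes that the failure rate is only evaluated once the window is full, which is consistent with the results reported in this article.

```java
// Simplified model of a count-based sliding window; not Resilience4J's actual code.
final class SlidingWindowSketch {

    // Feeds alternating outcomes (success, failure, success, ...) into a window of the
    // given size. Returns the 1-based number of the call after which the failure rate
    // first reaches the threshold, or -1 if that never happens within maxCalls.
    static int firstOpenAt(int windowSize, float failureRateThreshold, int maxCalls) {
        boolean[] window = new boolean[windowSize]; // true = failed call
        int failures = 0;
        for (int call = 1; call <= maxCalls; call++) {
            boolean failed = call % 2 == 0; // every second call hits the delayed endpoint
            int slot = (call - 1) % windowSize;
            if (call > windowSize && window[slot]) {
                failures--; // the oldest outcome leaves the window
            }
            window[slot] = failed;
            if (failed) {
                failures++;
            }
            // Assumption: the rate is evaluated only once the window is full.
            if (call >= windowSize) {
                float rate = 100.0f * failures / windowSize;
                if (rate >= failureRateThreshold) {
                    return call;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstOpenAt(100, 50.0f, 200)); // 100
        System.out.println(firstOpenAt(10, 50.0f, 200));  // 10
        System.out.println(firstOpenAt(10, 66.6f, 200));  // -1, the circuit never opens
    }
}
```

With every second call failing, the failure rate in a full window of even size stays at exactly 50%, which reaches a 50% threshold but never a 66.6% one.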

4. Spring Cloud Gateway Circuit Breaker Customization

We started with really basic samples. Let’s do something more interesting! First, we will set a really small sliding window size: 5. Thanks to that we will be able to observe the full result of the current test scenario after processing only a few requests. The next step is to modify the rules defined on the mock server. Now we will delay only the first 5 requests sent to the /2 endpoint. After receiving 5 requests it starts to work fine, without adding any delay. Thanks to that our circuit breaker will be able to return from the OPEN state to the CLOSED state after some time. But first things first, here is the code defining the mock endpoints for the current test.

MockServerClient client = new MockServerClient(mockServer.getContainerIpAddress(), mockServer.getServerPort());
client.when(HttpRequest.request()
   .withPath("/1"))
   .respond(response()
      .withBody("{\"id\":1,\"number\":\"1234567890\"}")
      .withHeader("Content-Type", "application/json"));
client.when(HttpRequest.request()
   .withPath("/2"), Times.exactly(5))
   .respond(response()
      .withBody("{\"id\":2,\"number\":\"1234567891\"}")
      .withDelay(TimeUnit.MILLISECONDS, 200)
      .withHeader("Content-Type", "application/json"));
client.when(HttpRequest.request()
   .withPath("/2"))
   .respond(response()
      .withBody("{\"id\":2,\"number\":\"1234567891\"}")
      .withHeader("Content-Type", "application/json"));

As I mentioned before, slidingWindowSize is now equal to 5. If there are 3 timeouts during the last 5 calls, the circuit is switched to the OPEN state. We can configure how long the circuit should stay in the OPEN state without trying to process any request. The parameter responsible for that, waitDurationInOpenState, has been set to 30 milliseconds. Therefore, after 30 milliseconds the circuit is switched to the HALF_OPEN state, which means that incoming requests are processed again. We can also configure the number of permitted calls in the HALF_OPEN state. The property permittedNumberOfCallsInHalfOpenState is set to 5 instead of the default value of 10. In these five attempts we get only 2 timeouts, since the mock server delays only 5 requests and the first 3 timeouts occurred at the beginning, before the circuit opened. Here’s our current configuration of Spring Cloud Circuit Breaker.

@Bean
public Customizer<ReactiveResilience4JCircuitBreakerFactory> defaultCustomizer() {
   return factory -> factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
      .circuitBreakerConfig(CircuitBreakerConfig.custom()
         .slidingWindowSize(5)
         .permittedNumberOfCallsInHalfOpenState(5)
         .failureRateThreshold(50.0F)
         .waitDurationInOpenState(Duration.ofMillis(30))
         .build())
      .timeLimiterConfig(TimeLimiterConfig.custom().timeoutDuration(Duration.ofMillis(200)).build()).build());
}

The following diagram illustrates our scenario.

[Image: circuit-breaker-spring-cloud-gateway-resilience4j-scenario.png]

And here’s the result of our current test. The circuit opened after processing 6 requests. Three incoming requests were rejected during the 30 milliseconds it spent in the OPEN state. After that time it switched to the HALF_OPEN state and finally moved back to the CLOSED state.

[Image: circuit-breaking-spring-cloud-gateway-resilience4j-4]

What will happen if we increase the number of delayed requests in this scenario to 20?

client.when(HttpRequest.request()
   .withPath("/2"), Times.exactly(20))
   .respond(response()
      .withBody("{\"id\":2,\"number\":\"1234567891\"}")
      .withDelay(TimeUnit.MILLISECONDS, 200)
      .withHeader("Content-Type", "application/json"));

The circuit will keep switching between the OPEN and HALF_OPEN states for as long as the downstream service keeps delaying its responses.

[Image: resilience4j-5]
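
The CLOSED, OPEN and HALF_OPEN transitions from this section can be replayed with a small state machine. This is only a sketch under stated assumptions, not Resilience4J's implementation: the class CircuitStateSketch and its methods are hypothetical, and the time-based waitDurationInOpenState is replaced by a fixed count of 3 rejected calls, the number observed in the first scenario above. In a real run the number of rejected calls depends on timing.

```java
// Simplified model of the CLOSED -> OPEN -> HALF_OPEN -> CLOSED cycle; not Resilience4J code.
final class CircuitStateSketch {

    enum State { CLOSED, OPEN, HALF_OPEN }

    // Replays the alternating /1 and /2 calls of the test. Returns the 1-based number of
    // the call after which the circuit goes back to CLOSED, or -1 if it never opened or
    // never closed again within maxCalls.
    static int closedAgainAt(int delayedResponses, int maxCalls) {
        final int windowSize = 5;        // slidingWindowSize
        final int halfOpenCalls = 5;     // permittedNumberOfCallsInHalfOpenState
        final float threshold = 50.0f;   // failureRateThreshold
        final int rejectedWhileOpen = 3; // assumption: stands in for waitDurationInOpenState

        State state = State.CLOSED;
        boolean[] window = new boolean[windowSize]; // true = timed-out call
        int recorded = 0;
        int remainingDelays = delayedResponses;
        int rejected = 0;
        int halfOpenSeen = 0;
        int halfOpenFailures = 0;

        for (int call = 1; call <= maxCalls; call++) {
            if (state == State.OPEN) {
                // Short-circuited: the request never reaches the mock server.
                rejected++;
                if (rejected == rejectedWhileOpen) {
                    state = State.HALF_OPEN;
                    rejected = 0;
                    halfOpenSeen = 0;
                    halfOpenFailures = 0;
                }
                continue;
            }
            // Even calls go to /2, which is delayed only a limited number of times.
            boolean failed = false;
            if (call % 2 == 0 && remainingDelays > 0) {
                remainingDelays--;
                failed = true;
            }
            if (state == State.CLOSED) {
                window[recorded % windowSize] = failed;
                recorded++;
                if (recorded >= windowSize && failureRate(window) >= threshold) {
                    state = State.OPEN;
                }
            } else { // HALF_OPEN: judge the circuit on the permitted trial calls only
                halfOpenSeen++;
                if (failed) {
                    halfOpenFailures++;
                }
                if (halfOpenSeen == halfOpenCalls) {
                    if (100.0f * halfOpenFailures / halfOpenCalls >= threshold) {
                        state = State.OPEN;
                    } else {
                        return call;
                    }
                }
            }
        }
        return -1;
    }

    static float failureRate(boolean[] window) {
        int failures = 0;
        for (boolean failed : window) {
            if (failed) {
                failures++;
            }
        }
        return 100.0f * failures / window.length;
    }

    public static void main(String[] args) {
        System.out.println(closedAgainAt(5, 200));  // 14
        System.out.println(closedAgainAt(20, 200)); // 54
    }
}
```

With 5 delayed responses the model opens after call 6, rejects calls 7 to 9 and closes after the five trial calls 10 to 14, two of which time out. With 20 delayed responses each trial round sees 3 timeouts out of 5 until the delays are used up, so the circuit keeps returning to OPEN for much longer.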

5. Adding Fallback

As you have probably noticed, if a request to the downstream service ends with a timeout the gateway returns HTTP 504 - Gateway Timeout. Moreover, if the circuit is open the gateway returns HTTP 503 - Service Unavailable. To avoid returning an error status code from the gateway we may enable a fallback endpoint for our route. To do that we have to set the fallbackUri property using the forward: scheme. Here’s the current configuration of the test route. I set the endpoint /fallback/account as the fallback URI.


System.setProperty("spring.cloud.gateway.routes[0].id", "account-service");
System.setProperty("spring.cloud.gateway.routes[0].uri", "http://" + mockServer.getContainerIpAddress() + ":" + mockServer.getServerPort());
System.setProperty("spring.cloud.gateway.routes[0].predicates[0]", "Path=/account/**");
System.setProperty("spring.cloud.gateway.routes[0].filters[0]", "RewritePath=/account/(?<path>.*), /$\\{path}");
System.setProperty("spring.cloud.gateway.routes[0].filters[1].name", "CircuitBreaker");
System.setProperty("spring.cloud.gateway.routes[0].filters[1].args.name", "exampleSlowCircuitBreaker");
System.setProperty("spring.cloud.gateway.routes[0].filters[1].args.fallbackUri", "forward:/fallback/account");

The fallback endpoint is exposed on the gateway. I defined a simple controller class that implements a single fallback method.

@RestController
@RequestMapping("/fallback")
public class GatewayFallback {

    @GetMapping("/account")
    public Account getAccount() {
        Account a = new Account();
        a.setId(2);
        a.setNumber("123456");
        return a;
    }

}

Assuming we have exactly the same scenario as in the previous section, the current test returns only HTTP 200 instead of HTTP 5xx responses, as shown below.

[Image: circuit-breaking-spring-cloud-gateway-resilience4j-6]
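
The status codes seen by the client in this section can be summarised as a small decision function. This is only a model of the behaviour described in this article, not gateway code; the class GatewayStatusSketch, the enum Outcome and the method statusFor are hypothetical names.

```java
// Model of the status codes described in this article; not Spring Cloud Gateway code.
final class GatewayStatusSketch {

    enum Outcome { OK, TIMEOUT, CIRCUIT_OPEN }

    // Returns the HTTP status the client receives for a given call outcome.
    static int statusFor(Outcome outcome, boolean fallbackConfigured) {
        if (outcome == Outcome.OK) {
            return 200;
        }
        if (fallbackConfigured) {
            // The request is forwarded to /fallback/account, which answers normally.
            return 200;
        }
        // Without a fallback: 504 for a timeout, 503 for a short-circuited call.
        return outcome == Outcome.TIMEOUT ? 504 : 503;
    }

    public static void main(String[] args) {
        System.out.println(statusFor(Outcome.TIMEOUT, false));      // 504
        System.out.println(statusFor(Outcome.CIRCUIT_OPEN, false)); // 503
        System.out.println(statusFor(Outcome.TIMEOUT, true));       // 200
    }
}
```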

6. Handling Slow Responses

In all the previous examples we set a short timeout on the response, which results in HTTP 504 - Gateway Timeout or a fallback. However, we don’t have to time out the requests; we can instead set a duration threshold and a rate threshold for slow responses. There are two parameters responsible for that: slowCallDurationThreshold and slowCallRateThreshold.

@Bean
public Customizer<ReactiveResilience4JCircuitBreakerFactory> defaultCustomizer() {
    return factory -> factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
        .circuitBreakerConfig(CircuitBreakerConfig.custom()
           .slidingWindowSize(5)
           .permittedNumberOfCallsInHalfOpenState(5)
           .failureRateThreshold(50.0F)
           .waitDurationInOpenState(Duration.ofMillis(50))
           .slowCallDurationThreshold(Duration.ofMillis(200))
           .slowCallRateThreshold(50.0F)
           .build())
        .build());
}

Now the delayed responses no longer end with a timeout, but the circuit breaker still records them as slow calls. When the slow call rate threshold is reached, the circuit breaker opens, as shown below.

[Image: circuit-breaking-spring-cloud-gateway-resilience4j-7]
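
How the slow call rate is derived from response times can be sketched in a few lines. This is a simplified model, not Resilience4J's code: the class SlowCallSketch and the method slowCallRate are hypothetical, the sample durations are invented for illustration, and the sketch assumes a call counts as slow when its duration is strictly above slowCallDurationThreshold.

```java
// Simplified model of the slow call rate; not Resilience4J's actual code.
final class SlowCallSketch {

    // Returns the percentage of calls among the last `windowSize` recorded durations
    // that took longer than the slow-call threshold.
    static float slowCallRate(long[] durationsMillis, int windowSize, long slowThresholdMillis) {
        int from = Math.max(0, durationsMillis.length - windowSize);
        int considered = durationsMillis.length - from;
        if (considered == 0) {
            return 0.0f;
        }
        int slow = 0;
        for (int i = from; i < durationsMillis.length; i++) {
            // Assumption: a call is slow when it is strictly above the threshold.
            if (durationsMillis[i] > slowThresholdMillis) {
                slow++;
            }
        }
        return 100.0f * slow / considered;
    }

    public static void main(String[] args) {
        // Invented sample: fast calls to /1 alternate with delayed calls to /2.
        long[] durations = {20, 230, 25, 240, 20, 250};
        float rate = slowCallRate(durations, 5, 200);
        System.out.println(rate);          // 60.0
        System.out.println(rate >= 50.0f); // true: slowCallRateThreshold is reached
    }
}
```

Since the mock server adds a 200-millisecond delay and the request still has to travel through the gateway, the delayed calls end up above the 200-millisecond threshold and are counted as slow.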

6 COMMENTS

deluxeuo

Piotr how are you? Thanks for the article, there is very little spring gateway content with resilence4j. I downloaded your project and ran it without any changes. But all executions return 200, (I did not notice the fallback call). So I increased the time from 200ms to 1000ms, so I got an error message.

21:14:19.961 --- [ parallel-2] : CircuitBreaker 'exampleSlowCircuitBreaker' recorded an exception as failure:
java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 200ms in 'circuitBreaker' (and no fallback has been configured)

can you help me?

Szympi

My engineering thesis is saved 😀

    piotr.minkowski

    🙂

Leroy

Thank you, Piotr. Really interesting!

    piotr.minkowski

    Thanks 🙂
