Circuit Breaking In Spring Cloud Gateway With Resilience4J


In the newest version of Spring Cloud Gateway (2.2.1) we can take advantage of a new circuit breaker implementation built on top of the Resilience4J project (https://github.com/resilience4j/resilience4j). Resilience4J was selected as a replacement for Netflix’s Hystrix, which has been moved to maintenance mode. Of course, you can still use Hystrix as the circuit breaker implementation, but it is deprecated and probably won’t be available in future versions of Spring Cloud. The new implementation is simply called Spring Cloud Circuit Breaker.
You can find another interesting example of using Spring Cloud Gateway components in one of my previous articles. I have already described how to implement rate limiting based on Redis here: Rate Limiting In Spring Cloud Gateway With Redis. In the current article I’m using the same GitHub repository as before: sample-spring-cloud-gateway. I’m going to show some sample scenarios of using Spring Cloud Circuit Breaker with Spring Cloud Gateway, including the fallback pattern.

1. Dependencies

To successfully test some scenarios of using the circuit breaker pattern with Spring Cloud Gateway we need to include the reactive version of Spring Cloud Circuit Breaker, since the gateway runs on the reactive Netty server. We will simulate the downstream service using MockServer provided by the Testcontainers framework. It is provisioned inside the test by the MockServer client written in Java.

<dependency>
	<groupId>org.springframework.cloud</groupId>
	<artifactId>spring-cloud-starter-gateway</artifactId>
</dependency>
<dependency>
	<groupId>org.springframework.cloud</groupId>
	<artifactId>spring-cloud-starter-circuitbreaker-reactor-resilience4j</artifactId>
</dependency>
<dependency>
	<groupId>org.projectlombok</groupId>
	<artifactId>lombok</artifactId>
</dependency>
<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-test</artifactId>
	<scope>test</scope>
</dependency>
<dependency>
	<groupId>org.testcontainers</groupId>
	<artifactId>mockserver</artifactId>
	<version>1.12.3</version>
	<scope>test</scope>
</dependency>
<dependency>
	<groupId>org.mock-server</groupId>
	<artifactId>mockserver-client-java</artifactId>
	<version>3.10.8</version>
	<scope>test</scope>
</dependency>
<dependency>
	<groupId>com.carrotsearch</groupId>
	<artifactId>junit-benchmarks</artifactId>
	<version>0.7.2</version>
	<scope>test</scope>
</dependency>

2. Enabling Resilience4J Circuit Breaker

To enable the circuit breaker built on top of Resilience4J we need to declare a Customizer bean that is passed a ReactiveResilience4JCircuitBreakerFactory. This very simple configuration uses the default circuit breaker settings and defines the timeout duration using TimeLimiterConfig. For the first test I decided to set a timeout of 200 milliseconds.

@Bean
public Customizer<ReactiveResilience4JCircuitBreakerFactory> defaultCustomizer() {
    return factory -> factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
        .circuitBreakerConfig(CircuitBreakerConfig.ofDefaults())
        .timeLimiterConfig(TimeLimiterConfig.custom().timeoutDuration(Duration.ofMillis(200)).build())
        .build());
}
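
To get a feeling for what the 200 milliseconds time limit means for a single call, here is a minimal, library-free sketch of the same idea based on plain CompletableFuture (Java 9 or newer). It is only an illustration under my own assumptions and not how Resilience4J implements its time limiter. The class TimeLimitSketch and its methods are hypothetical names introduced just for this example.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper, not part of Resilience4J or Spring Cloud.
class TimeLimitSketch {

    // Runs the call asynchronously and fails it with a TimeoutException
    // if no result arrives within the given limit.
    static <T> CompletableFuture<T> callWithLimit(Supplier<T> call, Duration limit) {
        return CompletableFuture.supplyAsync(call)
                .orTimeout(limit.toMillis(), TimeUnit.MILLISECONDS);
    }

    // Simulates a downstream service that answers after delayMillis.
    static String slowService(long delayMillis) {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "ok";
    }

    // Returns true when the call was cut off by the time limit.
    static boolean timesOut(long delayMillis, Duration limit) {
        try {
            callWithLimit(() -> slowService(delayMillis), limit).join();
            return false;
        } catch (CompletionException e) {
            return e.getCause() instanceof TimeoutException;
        }
    }

    public static void main(String[] args) {
        Duration limit = Duration.ofMillis(200);
        System.out.println("fast call timed out: " + timesOut(0, limit));
        System.out.println("slow call timed out: " + timesOut(1000, limit));
    }
}
```

In the gateway a call that is cut off this way is recorded by the circuit breaker as a failure, which is what the tests in the next sections rely on.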

3. Building Test Class

In the next step we create a test class. Before running the test it starts and provisions an instance of the mock server. We define two endpoints. The second of them, /2, adds a delay of 200 milliseconds, which exceeds the timeout defined in the circuit breaker configuration.
We also configure a Spring Cloud Gateway route that points to the currently started instance of the mock server. To enable the circuit breaker for our route we have to define a CircuitBreaker filter with a given name. The test is repeated 200 times. It calls the delayed and the non-delayed endpoint in a 50/50 proportion. Here’s the Spring Cloud Gateway test class.

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.DEFINED_PORT)
@RunWith(SpringRunner.class)
public class GatewayCircuitBreakerTest {

    private static final Logger LOGGER = LoggerFactory.getLogger(GatewayCircuitBreakerTest.class);

    @Rule
    public TestRule benchmarkRun = new BenchmarkRule();

    @ClassRule
    public static MockServerContainer mockServer = new MockServerContainer();

    @Autowired
    TestRestTemplate template;
    int i = 0;

    @BeforeClass
    public static void init() {
        System.setProperty("spring.cloud.gateway.routes[0].id", "account-service");
        System.setProperty("spring.cloud.gateway.routes[0].uri", "http://192.168.99.100:" + mockServer.getServerPort());
        System.setProperty("spring.cloud.gateway.routes[0].predicates[0]", "Path=/account/**");
        System.setProperty("spring.cloud.gateway.routes[0].filters[0]", "RewritePath=/account/(?<path>.*), /$\\{path}");
        System.setProperty("spring.cloud.gateway.routes[0].filters[1].name", "CircuitBreaker");
        System.setProperty("spring.cloud.gateway.routes[0].filters[1].args.name", "exampleSlowCircuitBreaker");
        MockServerClient client = new MockServerClient(mockServer.getContainerIpAddress(), mockServer.getServerPort());
        client.when(HttpRequest.request()
            .withPath("/1"))
            .respond(response()
                .withBody("{\"id\":1,\"number\":\"1234567890\"}")
                .withHeader("Content-Type", "application/json"));
        client.when(HttpRequest.request()
            .withPath("/2"))
            .respond(response()
                .withBody("{\"id\":2,\"number\":\"1234567891\"}")
                .withDelay(TimeUnit.MILLISECONDS, 200)
                .withHeader("Content-Type", "application/json"));
    }

    @Test
    @BenchmarkOptions(warmupRounds = 0, concurrency = 1, benchmarkRounds = 200)
    public void testAccountService() {
        int gen = 1 + (i++ % 2);
        ResponseEntity<Account> r = template.exchange("/account/{id}", HttpMethod.GET, null, Account.class, gen);
        LOGGER.info("{}. Received: status->{}, payload->{}, call->{}", i, r.getStatusCodeValue(), r.getBody(), gen);
    }

}
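
A short side note on the RewritePath filter used in the route. It relies on a regular expression with a named capturing group called path, written as (?<path>.*), and on the replacement /$\{path}, where $\ is just an escaped $ for the configuration. The sketch below mimics only that regex step with the standard java.util.regex support. RewritePathSketch is a hypothetical class name, and the gateway itself does more than this single replacement.

```java
// Hypothetical demo class; it only mimics the regex step of the RewritePath filter.
class RewritePathSketch {

    static final String REGEX = "/account/(?<path>.*)";
    // The replacement as it is written in the route configuration: /$\{path}
    static final String CONFIGURED_REPLACEMENT = "/$\\{path}";

    static String rewrite(String requestPath) {
        // "$\" is only an escape used in configuration, so turn it back into "$".
        String replacement = CONFIGURED_REPLACEMENT.replace("$\\", "$");
        // ${path} refers to the named group captured by the regular expression.
        return requestPath.replaceAll(REGEX, replacement);
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/account/1")); // prints /1
        System.out.println(rewrite("/account/2")); // prints /2
    }
}
```

So a request sent to /account/2 on the gateway ends up as /2 on the mock server.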

Here’s the result of the test discussed above. With the default settings the circuit opens after processing 100 requests with a 50% error rate. The logs visible below include the sequence number of the request, the HTTP response status code, the response body and the id of the called endpoint.

circuit-breaking-spring-cloud-gateway-resilience4j-3

We may change the default settings a little. To do that we should define a custom CircuitBreakerConfig. One of the properties we can customize is slidingWindowSize. It defines how many call outcomes are recorded and aggregated while the circuit breaker is closed. Assuming we have the same test endpoints, what will happen if we change this value to 10 as shown below?

@Bean
public Customizer<ReactiveResilience4JCircuitBreakerFactory> defaultCustomizer() {
    return factory -> factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
        .circuitBreakerConfig(CircuitBreakerConfig.custom()
            .slidingWindowSize(10)
            .build())
        .timeLimiterConfig(TimeLimiterConfig.custom().timeoutDuration(Duration.ofMillis(200)).build()).build());
}

Here’s the result. The circuit opens right after processing 10 requests, since at least 50% of them have timed out.

circuit-breaking-spring-cloud-gateway-resilience4j-2.PNG

Moreover, we may change failureRateThreshold. This property configures the failure rate threshold as a percentage. If the failure rate is equal to or greater than the threshold, the circuit breaker switches to open and starts short-circuiting calls. It is not difficult to predict what happens if we change it to 66.6F for our current scenario. Since only every second call times out, the failure rate stays at 50% and the circuit will never be opened.

@Bean
public Customizer<ReactiveResilience4JCircuitBreakerFactory> defaultCustomizer() {
    return factory -> factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
        .circuitBreakerConfig(CircuitBreakerConfig.custom()
            .slidingWindowSize(10)
            .failureRateThreshold(66.6F)
            .build())
        .timeLimiterConfig(TimeLimiterConfig.custom().timeoutDuration(Duration.ofMillis(200)).build()).build());
}
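
The arithmetic behind both results can be checked with a few lines of plain Java. This is only a simplified model of a count-based sliding window under my own assumptions, not the Resilience4J implementation. FailureRateSketch and opensCircuit are hypothetical names. The sketch replays requests that alternate between the fast and the delayed endpoint, exactly as the test does, and compares the failure rate of a full window with the threshold.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of a count-based sliding window.
class FailureRateSketch {

    // Replays `calls` requests that alternate between success (/1) and timeout (/2)
    // and returns true if the failure rate of a full window ever reaches the threshold.
    static boolean opensCircuit(int calls, int windowSize, float thresholdPercent) {
        Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
        int failures = 0;
        for (int i = 0; i < calls; i++) {
            boolean failed = (i % 2 == 1); // every second call hits the delayed endpoint
            window.addLast(failed);
            if (failed) {
                failures++;
            }
            if (window.size() > windowSize) {
                boolean evicted = window.removeFirst(); // drop the oldest outcome
                if (evicted) {
                    failures--;
                }
            }
            if (window.size() == windowSize) {
                float rate = 100.0f * failures / windowSize;
                if (rate >= thresholdPercent) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("threshold 50.0 opens: " + opensCircuit(200, 10, 50.0f)); // true
        System.out.println("threshold 66.6 opens: " + opensCircuit(200, 10, 66.6f)); // false
    }
}
```

With a window of 10 there are always 5 failures inside it, so the rate is exactly 50%. That is enough for the default threshold of 50, but not for 66.6.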

4. Circuit Breaker Customization

We started with really basic samples. Let’s do something more interesting! First, we will set a really small sliding window size. It is set to 5. Thanks to that we will be able to observe the full result of the current test scenario after processing only a few requests. The next step is to modify the rules defined on the mock server. Now, we will delay only the first 5 requests sent to the /2 endpoint. After receiving 5 requests it starts to work fine without adding any delay. Thanks to that our circuit breaker will be able to move back from the OPEN state to CLOSED after some time. But first things first, here is the code defining mock endpoints for the current test.

MockServerClient client = new MockServerClient(mockServer.getContainerIpAddress(), mockServer.getServerPort());
client.when(HttpRequest.request()
	.withPath("/1"))
	.respond(response()
		.withBody("{\"id\":1,\"number\":\"1234567890\"}")
		.withHeader("Content-Type", "application/json"));
client.when(HttpRequest.request()
	.withPath("/2"), Times.exactly(5))
	.respond(response()
		.withBody("{\"id\":2,\"number\":\"1234567891\"}")
		.withDelay(TimeUnit.MILLISECONDS, 200)
		.withHeader("Content-Type", "application/json"));
client.when(HttpRequest.request()
	.withPath("/2"))
	.respond(response()
		.withBody("{\"id\":2,\"number\":\"1234567891\"}")
		.withHeader("Content-Type", "application/json"));

As I mentioned before, the slidingWindowSize is now equal to 5. If there are 3 timeouts during the last 5 calls, the circuit is switched to the OPEN state. We can configure how long the circuit should stay in the OPEN state without trying to process any request. The parameter waitDurationInOpenState, which is responsible for that, has been set to 30 milliseconds. Therefore, after 30 milliseconds the circuit is switched to the HALF_OPEN state, which means that incoming requests are processed again. We can also configure the number of permitted calls in the HALF_OPEN state. The property permittedNumberOfCallsInHalfOpenState is set to 5 instead of the default value 10. In these five attempts we get only 2 timeouts, since we set 5 repeats for the delayed service on the mock server and the first 3 timeouts occurred at the beginning, before the circuit was opened. Here’s our current configuration of Spring Cloud Circuit Breaker.

@Bean
public Customizer<ReactiveResilience4JCircuitBreakerFactory> defaultCustomizer() {
	return factory -> factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
		.circuitBreakerConfig(CircuitBreakerConfig.custom()
			.slidingWindowSize(5)
			.permittedNumberOfCallsInHalfOpenState(5)
			.failureRateThreshold(50.0F)
			.waitDurationInOpenState(Duration.ofMillis(30))
			.build())
		.timeLimiterConfig(TimeLimiterConfig.custom().timeoutDuration(Duration.ofMillis(200)).build()).build());
}

The following diagram illustrates our scenario.

circuit-breaker-spring-cloud-gateway-resilience4j-scenario.png

And here’s the result of our current test. The circuit was opened after processing 6 requests. There were 3 incoming requests that were not processed during the 30 milliseconds spent in the open state. After that time it was switched to the half open state and finally it moved back to the closed state.

circuit-breaking-spring-cloud-gateway-resilience4j-4

What will happen if we increase the number of delayed requests in this scenario to 20?

client.when(HttpRequest.request()
	.withPath("/2"), Times.exactly(20))
	.respond(response()
		.withBody("{\"id\":2,\"number\":\"1234567891\"}")
		.withDelay(TimeUnit.MILLISECONDS, 200)
		.withHeader("Content-Type", "application/json"));

The circuit will keep switching between the OPEN and HALF_OPEN states as long as the downstream service is delaying the responses.

circuit-breaking-spring-cloud-gateway-resilience4j-5
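
The transitions described in this section can be replayed with a small, deterministic simulation. Treat it as a sketch under stated assumptions and not as the Resilience4J implementation: BreakerSketch and all of its members are hypothetical, and the call costs (200 ms for a timed out call, 10 ms for every other call, including a rejected one) are made-up numbers chosen only to drive a manual clock.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, heavily simplified circuit breaker driven by a manual clock.
class BreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    final int windowSize = 5;             // slidingWindowSize
    final int halfOpenCalls = 5;          // permittedNumberOfCallsInHalfOpenState
    final float thresholdPercent = 50.0f; // failureRateThreshold
    final long waitInOpenMillis = 30;     // waitDurationInOpenState

    State state = State.CLOSED;
    final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    long openedAt;

    // Returns false when the call is short-circuited because the circuit is OPEN.
    boolean tryAcquire(long now) {
        if (state == State.OPEN) {
            if (now - openedAt < waitInOpenMillis) {
                return false;
            }
            state = State.HALF_OPEN; // wait time elapsed, let trial calls through
            outcomes.clear();
        }
        return true;
    }

    // Records the outcome of a permitted call and updates the state.
    void record(boolean failed, long now) {
        int limit = (state == State.HALF_OPEN) ? halfOpenCalls : windowSize;
        outcomes.addLast(failed);
        if (outcomes.size() > limit) {
            outcomes.removeFirst();
        }
        if (outcomes.size() < limit) {
            return; // not enough recorded calls yet
        }
        long failures = outcomes.stream().filter(Boolean::booleanValue).count();
        if (100.0f * failures / limit >= thresholdPercent) {
            state = State.OPEN;
            openedAt = now;
            outcomes.clear();
        } else if (state == State.HALF_OPEN) {
            state = State.CLOSED;
            outcomes.clear();
        }
    }

    // Replays the test: odd calls hit /2, which is delayed for the first delayedResponses hits.
    static List<State> simulate(int totalCalls, int delayedResponses) {
        BreakerSketch cb = new BreakerSketch();
        List<State> states = new ArrayList<>();
        long now = 0;
        int remainingDelayed = delayedResponses;
        for (int i = 0; i < totalCalls; i++) {
            if (!cb.tryAcquire(now)) {
                now += 10; // assumed cost of a rejected call
            } else {
                boolean failed = (i % 2 == 1) && remainingDelayed > 0;
                if (failed) {
                    remainingDelayed--;
                }
                now += failed ? 200 : 10; // assumed call durations
                cb.record(failed, now);
            }
            states.add(cb.state);
        }
        return states;
    }

    public static void main(String[] args) {
        System.out.println("5 delayed responses:  " + simulate(14, 5));
        System.out.println("20 delayed responses: " + simulate(14, 20));
    }
}
```

Under these assumptions the first run opens the circuit on the 6th call, rejects the next 3 calls, and closes the circuit again after 5 half-open calls with only 2 timeouts. The second run goes back to OPEN at the end of the half-open phase, because 3 of the 5 trial calls still time out.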

5. Adding Fallback

As you probably noticed, if the request to the downstream service finished with a timeout, the gateway returned HTTP 504 - Gateway Timeout. Moreover, if the circuit is open the gateway returns HTTP 503 - Service Unavailable. To avoid returning an error status code from the gateway we may enable a fallback endpoint for our route. To do that we have to set the property fallbackUri using the forward: scheme. Here’s the current configuration of the test route. I included the endpoint /fallback/account as the fallback URI.

System.setProperty("spring.cloud.gateway.routes[0].id", "account-service");
System.setProperty("spring.cloud.gateway.routes[0].uri", "http://192.168.99.100:" + mockServer.getServerPort());
System.setProperty("spring.cloud.gateway.routes[0].predicates[0]", "Path=/account/**");
System.setProperty("spring.cloud.gateway.routes[0].filters[0]", "RewritePath=/account/(?<path>.*), /$\\{path}");
System.setProperty("spring.cloud.gateway.routes[0].filters[1].name", "CircuitBreaker");
System.setProperty("spring.cloud.gateway.routes[0].filters[1].args.name", "exampleSlowCircuitBreaker");
System.setProperty("spring.cloud.gateway.routes[0].filters[1].args.fallbackUri", "forward:/fallback/account");

The fallback endpoint is exposed on the gateway. I defined a simple controller class that implements a single fallback method.

@RestController
@RequestMapping("/fallback")
public class GatewayFallback {

    @GetMapping("/account")
    public Account getAccount() {
        Account a = new Account();
        a.setId(2);
        a.setNumber("123456");
        return a;
    }

}

Assuming we have exactly the same scenario as in the previous section, the current test returns only HTTP 200 instead of HTTP 5xx responses, as shown below.

circuit-breaking-spring-cloud-gateway-resilience4j-6
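
The idea behind the fallback can also be shown without the gateway. The following sketch, written with plain CompletableFuture (Java 9 or newer), replaces a timed out call with a canned response. It is an illustration only and says nothing about how the CircuitBreaker filter forwards to fallbackUri internally. FallbackSketch and its methods are hypothetical names, and the response bodies are simply copied from the examples above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of "timeout, then fallback"; not the gateway implementation.
class FallbackSketch {

    static final String FALLBACK_BODY = "{\"id\":2,\"number\":\"123456\"}";

    // Simulates the downstream account service answering after delayMillis.
    static String downstream(long delayMillis) {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "{\"id\":2,\"number\":\"1234567891\"}";
    }

    // Returns the downstream body, or the fallback body if the call fails or times out.
    static String callWithFallback(long delayMillis, long limitMillis) {
        return CompletableFuture.supplyAsync(() -> downstream(delayMillis))
                .orTimeout(limitMillis, TimeUnit.MILLISECONDS)
                .exceptionally(error -> FALLBACK_BODY)
                .join();
    }

    public static void main(String[] args) {
        System.out.println("fast: " + callWithFallback(0, 200));
        System.out.println("slow: " + callWithFallback(1000, 200));
    }
}
```

The caller always gets a valid body, which corresponds to the gateway answering with HTTP 200 from the fallback endpoint instead of HTTP 5xx.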

6. Handling Slow Responses

In all the previous examples we set a short timeout on the response, which results in HTTP 504 - Gateway Timeout or a fallback. However, we don’t have to time out the requests. Instead, we can set a duration threshold above which a call is considered slow, and a rate threshold for such slow calls. There are two parameters responsible for that: slowCallDurationThreshold and slowCallRateThreshold.

@Bean
public Customizer<ReactiveResilience4JCircuitBreakerFactory> defaultCustomizer() {
    return factory -> factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
        .circuitBreakerConfig(CircuitBreakerConfig.custom()
           .slidingWindowSize(5)
           .permittedNumberOfCallsInHalfOpenState(5)
           .failureRateThreshold(50.0F)
           .waitDurationInOpenState(Duration.ofMillis(50))
           .slowCallDurationThreshold(Duration.ofMillis(200))
           .slowCallRateThreshold(50.0F)
           .build())
        .build());
}

Now the delayed responses no longer end with a timeout, but the circuit breaker still records them as slow calls. When the slow call rate threshold is reached, the circuit breaker opens, as shown below.

circuit-breaking-spring-cloud-gateway-resilience4j-7
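
To make the slow call rate more tangible, here is a tiny calculation in plain Java. It assumes that a call counts as slow when it takes longer than slowCallDurationThreshold, and the durations in the arrays are made-up sample values, not measurements from my test. SlowCallRateSketch is a hypothetical class name.

```java
// Hypothetical helper that computes the slow call rate for one window of calls.
class SlowCallRateSketch {

    // Percentage of calls whose duration exceeds the slow call threshold.
    static float slowCallRate(long[] durationsMillis, long thresholdMillis) {
        int slow = 0;
        for (long duration : durationsMillis) {
            if (duration > thresholdMillis) {
                slow++;
            }
        }
        return 100.0f * slow / durationsMillis.length;
    }

    public static void main(String[] args) {
        long threshold = 200;                    // slowCallDurationThreshold in ms
        long[] first = {12, 215, 9, 208, 11};    // made-up durations: 2 of 5 are slow
        long[] second = {215, 9, 208, 11, 221};  // made-up durations: 3 of 5 are slow
        System.out.println(slowCallRate(first, threshold));  // 40.0, below slowCallRateThreshold 50
        System.out.println(slowCallRate(second, threshold)); // 60.0, circuit would open
    }
}
```

With slidingWindowSize set to 5 and slowCallRateThreshold set to 50, the second window is the one that would open the circuit.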

2 thoughts on “Circuit Breaking In Spring Cloud Gateway With Resilience4J”

  1. Piotr how are you? Thanks for the article, there is very little spring gateway content with resilence4j. I downloaded your project and ran it without any changes. But all executions return 200, (I did not notice the fallback call). So I increased the time from 200ms to 1000ms, so I got an error message.

    21:14:19.961 — [ parallel-2] : CircuitBreaker ‘exampleSlowCircuitBreaker’ recorded an exception as failure:
    java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 200ms in ‘circuitBreaker’ (and no fallback has been configured)

    can you help me?
