Best Practices For Microservices on Kubernetes

There are several best practices for building a microservices architecture properly. You may find many articles about it online. One of them is my previous article Spring Boot Best Practices For Microservices. There, I focused on the most important aspects that should be considered when running microservice applications built on top of Spring Boot in production. I didn't assume there was any platform used for orchestration or management, but just a group of independent applications. In this article, I'm going to extend the list of already introduced best practices with some new rules dedicated especially to microservices deployed on the Kubernetes platform.
The first question is whether it makes any difference when you deploy your microservices on Kubernetes instead of running them independently without any platform. Well, actually yes and no… Yes, because now you have a platform that is responsible for running and monitoring your applications, and it enforces some rules of its own. No, because you still have a microservices architecture, a group of loosely coupled, independent applications, and you should not forget about it! In fact, many of the previously introduced best practices are still valid, while some of them need to be slightly redefined. There are also some new, platform-specific rules that should be mentioned.
One more thing needs to be explained before proceeding. This list of Kubernetes microservices best practices is built based on my experience in running microservices-based architecture on cloud platforms like Kubernetes. I didn't copy it from other articles or books. In my organization, we have already migrated our microservices from Spring Cloud (Eureka, Zuul, Spring Cloud Config) to OpenShift. We are continuously improving this architecture based on our experience in maintaining it.

Example

The sample Spring Boot application that implements currently described Kubernetes microservices best practices is written in Kotlin. It is available on GitHub in repository sample-spring-kotlin-microservice under branch kubernetes: https://github.com/piomin/sample-spring-kotlin-microservice/tree/kubernetes.

1. Allow platform to collect metrics

I have also put a similar section in my article about best practices for Spring Boot. However, metrics are also one of the important Kubernetes microservices best practices. Previously, we were using InfluxDB as a target metrics store. Since our approach to gathering metrics data has changed after the migration to Kubernetes, I redefined the title of this point to Allow platform to collect metrics. The main difference between the current and the previous approach is the way of collecting data. We now use Prometheus, because that process may be managed by the platform. InfluxDB is a push-based system, where your application actively pushes data into the monitoring system. Prometheus is a pull-based system, where the server periodically fetches metrics values from the running application. So, our main responsibility at this point is to provide an endpoint on the application side for Prometheus.
Fortunately, it is very easy to provide metrics for Prometheus with Spring Boot. You need to include Spring Boot Actuator and a dedicated Micrometer library for integration with Prometheus.

<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
   <groupId>io.micrometer</groupId>
   <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

We should also enable exposing Actuator HTTP endpoints outside the application. You can enable only the single endpoint dedicated to Prometheus or just expose all Actuator endpoints, as shown below.

management.endpoints.web.exposure.include: '*'

After starting the application, the metrics endpoint is available by default under the path /actuator/prometheus.

[Image: Prometheus-format metrics returned by the /actuator/prometheus endpoint]

Assuming you run your application on Kubernetes, you need to deploy and configure Prometheus to scrape metrics from your pods. The configuration may be delivered as a Kubernetes ConfigMap. The prometheus.yml file should contain a scrape_configs section with the path of the endpoint serving metrics and the Kubernetes discovery settings. Prometheus locates all application pods through Kubernetes Endpoints. The application should be labeled with app=sample-spring-kotlin-microservice and have a port named http exposed outside the container (a sample Service matching these requirements is shown after the ConfigMap below).

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus
  labels:
    name: prometheus
data:
  prometheus.yml: |-
    scrape_configs:
      - job_name: 'springboot'
        metrics_path: /actuator/prometheus
        scrape_interval: 5s
        kubernetes_sd_configs:
        - role: endpoints
          namespaces:
            names:
              - default

        relabel_configs:
          - source_labels: [__meta_kubernetes_service_label_app]
            separator: ;
            regex: sample-spring-kotlin-microservice
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            separator: ;
            regex: http
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_namespace]
            separator: ;
            regex: (.*)
            target_label: namespace
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_pod_name]
            separator: ;
            regex: (.*)
            target_label: pod
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_name]
            separator: ;
            regex: (.*)
            target_label: service
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_service_name]
            separator: ;
            regex: (.*)
            target_label: job
            replacement: ${1}
            action: replace
          - separator: ;
            regex: (.*)
            target_label: endpoint
            replacement: http
            action: replace
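
For reference, here is a minimal sketch of a Service that satisfies the discovery rules above. It is an assumption based on the labels, port name and container port used elsewhere in this article, not a manifest taken from the sample repository: the Service selects pods labeled app=sample-spring-kotlin-microservice and exposes a port named http.

apiVersion: v1
kind: Service
metadata:
  name: sample-spring-kotlin-microservice
  labels:
    app: sample-spring-kotlin-microservice
spec:
  selector:
    app: sample-spring-kotlin-microservice
  ports:
    - name: http
      port: 8080
      targetPort: 8080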

The last step is to deploy Prometheus on Kubernetes. You should attach the ConfigMap with the Prometheus configuration to the Deployment as a mounted volume. After that, you may set the location of the configuration file using the config.file parameter: --config.file=/prometheus2/prometheus.yml.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          args:
            - "--config.file=/prometheus2/prometheus.yml"
            - "--storage.tsdb.path=/prometheus/"
          ports:
            - containerPort: 9090
              name: http
          volumeMounts:
            - name: prometheus-storage-volume
              mountPath: /prometheus/
            - name: prometheus-config-map
              mountPath: /prometheus2/
      volumes:
        - name: prometheus-storage-volume
          emptyDir: {}
        - name: prometheus-config-map
          configMap:
            name: prometheus

Now you can verify whether Prometheus has discovered your application running on Kubernetes by accessing the /targets endpoint of the Prometheus UI.

[Image: the application target listed on the Prometheus /targets page]
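
Once the target is up, you can also query the collected metrics in the Prometheus UI. The following query is a hypothetical example, assuming Micrometer's standard http_server_requests metric and a Service named sample-spring-kotlin-microservice, which the relabeling rules above map to the service label.

rate(http_server_requests_seconds_count{service="sample-spring-kotlin-microservice"}[5m])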

2. Prepare logs in the right format

The approach to collecting logs is pretty similar to collecting metrics. Our application should not handle the process of sending logs by itself. It should just take care of properly formatting the logs it sends to the output stream. Since Docker has a built-in logging driver for Fluentd, it is very convenient to use it as a log collector for applications running on Kubernetes. This means no additional agent is required on the container to push logs to Fluentd. Logs are shipped directly from STDOUT to the Fluentd service, and no additional log file or persistent storage is required. Fluentd tries to structure data as JSON to unify logging across different sources and destinations.
In order to format our logs as JSON readable by Fluentd, we may include the Logstash Logback Encoder library in our dependencies.

<dependency>
   <groupId>net.logstash.logback</groupId>
   <artifactId>logstash-logback-encoder</artifactId>
   <version>6.3</version>
</dependency>

Then we just need to set a default console log appender for our Spring Boot application in the file logback-spring.xml.

<configuration>
    <appender name="consoleAppender" class="ch.qos.logback.core.ConsoleAppender">
        <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
    </appender>
    <logger name="jsonLogger" additivity="false" level="DEBUG">
        <appender-ref ref="consoleAppender"/>
    </logger>
    <root level="INFO">
        <appender-ref ref="consoleAppender"/>
    </root>
</configuration>

The logs are printed into STDOUT in the format visible below.

[Image: a JSON-formatted log line printed to STDOUT]
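
A single log line produced by LogstashEncoder looks roughly like the following. The field values here are made up for illustration, and the exact set of fields may differ depending on the encoder configuration.

{"@timestamp":"2020-04-01T12:00:00.000+02:00","@version":"1","message":"Tomcat started on port(s): 8080 (http)","logger_name":"org.springframework.boot.web.embedded.tomcat.TomcatWebServer","thread_name":"main","level":"INFO","level_value":20000}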

It is very simple to install Fluentd, Elasticsearch and Kibana on Minikube. The disadvantage of this approach is that we install rather old versions of these tools.

$ minikube addons enable efk
* efk was successfully enabled
$ minikube addons enable logviewer
* logviewer was successfully enabled

After enabling efk and logviewer addons Kubernetes pulls and starts all the required pods as shown below.

[Image: EFK and logviewer pods running on the cluster]

Thanks to the logstash-logback-encoder library we may automatically create logs compatible with Fluentd, including MDC fields. Here's a screenshot from Kibana that shows logs from our test application.

[Image: application logs displayed in Kibana]
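
Regarding the MDC fields mentioned above, here is a minimal sketch of how a value put into MDC ends up as a separate JSON field in the shipped logs. The PersonService class and the personId field are hypothetical, introduced only for illustration.

import org.slf4j.Logger
import org.slf4j.LoggerFactory
import org.slf4j.MDC

class PersonService {

    val logger: Logger = LoggerFactory.getLogger(PersonService::class.java)

    fun process(personId: Int) {
        // values stored in MDC are added by LogstashEncoder as separate JSON fields
        MDC.put("personId", personId.toString())
        try {
            logger.info("Processing person")
        } finally {
            // always clean up, since threads are reused by the container
            MDC.remove("personId")
        }
    }

}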

Optionally, you can add my library for logging requests/responses in a Spring Boot application.

<dependency>
   <groupId>com.github.piomin</groupId>
   <artifactId>logstash-logging-spring-boot-starter</artifactId>
   <version>1.2.2.RELEASE</version>
</dependency>

3. Implement both liveness and readiness health checks

It is important to understand the difference between liveness and readiness probes in Kubernetes. If these probes are not implemented carefully, they can degrade the overall operation of a service, for example by causing unnecessary restarts. A liveness probe is used to decide whether to restart the container or not. If an application is unavailable for any reason, restarting the container can sometimes make sense. On the other hand, a readiness probe is used to decide if a container can handle incoming traffic. If a pod has been recognized as not ready, it is removed from load balancing. A failed readiness probe does not result in a pod restart. The most typical liveness or readiness probe for web applications is realized via an HTTP endpoint.
In a typical web application running outside a platform like Kubernetes, you won't distinguish between liveness and readiness health checks. That's why most web frameworks provide only a single built-in health check implementation. For a Spring Boot application you may easily enable a health check by including Spring Boot Actuator in your dependencies. The important thing about the Actuator health check is that it may behave differently depending on the integrations between your application and third-party systems. For example, if you define a Spring data source for connecting to a database or declare a connection to a message broker, the health check may automatically include such validation through auto-configuration. Therefore, if you set the default Spring Actuator health check implementation as the liveness probe endpoint, it may result in unnecessary restarts if the application is unable to connect to the database or message broker. Since such behavior is not desired, I suggest implementing a very simple liveness endpoint that just verifies the availability of the application without checking connections to other external systems.
Adding a custom implementation of a health check is not very hard with Spring Boot. There are a few different ways to do that; one of them is shown below. We are using the mechanism provided by Spring Boot Actuator. It is worth noting that we don't override the default health check, but add another, custom implementation. The following implementation just checks whether the application is able to handle incoming requests.

@Component
@Endpoint(id = "liveness")
class LivenessHealthEndpoint {

    // always returns UP – this liveness check deliberately does not verify external systems
    @ReadOperation
    fun health() : Health = Health.up().build()

    @ReadOperation
    fun name(@Selector name: String) : String = "liveness"

    @WriteOperation
    fun write(@Selector name: String) {

    }

    @DeleteOperation
    fun delete(@Selector name: String) {

    }

}

In turn, the default Spring Boot Actuator health check may be the right solution for a readiness probe. Assuming your application connects to a Postgres database and a RabbitMQ message broker, you should add the following dependencies to your Maven pom.xml.

<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-amqp</artifactId>
</dependency>
<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
<dependency>
   <groupId>org.postgresql</groupId>
   <artifactId>postgresql</artifactId>
   <scope>runtime</scope>
</dependency>

Now, just for illustration, add the following property to your application.yml. It enables displaying detailed information in the auto-configured Actuator /health endpoint.

management:
  endpoint:
    health:
      show-details: always

Finally, let’s call /actuator/health to see the detailed result. As you see in the picture below, a health check returns information about Postgres and RabbitMQ connections.

[Image: detailed /actuator/health response including Postgres and RabbitMQ status]
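
Once both endpoints are in place, you can reference them in the Kubernetes Deployment. The following fragment is a minimal sketch, assuming the container listens on port 8080, the custom /actuator/liveness endpoint shown above serves as the liveness probe, and the default /actuator/health endpoint serves as the readiness probe.

spec:
  containers:
  - name: sample-spring-kotlin-microservice
    image: piomin/sample-spring-kotlin-microservice
    ports:
    - containerPort: 8080
      name: http
    livenessProbe:
      httpGet:
        path: /actuator/liveness
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /actuator/health
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10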

There is another aspect of using liveness and readiness probes in your web application. It is related to thread pooling. In a standard web container like Tomcat, each request is handled by a thread from the HTTP thread pool. If you process each request in the main thread and you have some long-running tasks in your application, you may block all available HTTP threads. If your liveness probe then fails several times in a row, the application pod will be restarted. Therefore, you should consider handling long-running tasks in another thread pool. Here's an example of an HTTP endpoint implementation using DeferredResult and Kotlin coroutines.

@PostMapping("/long-running")
fun addLongRunning(@RequestBody person: Person): DeferredResult<Person> {
   // return immediately and complete the result from a coroutine,
   // so the HTTP thread is not blocked by the long-running task
   val result: DeferredResult<Person> = DeferredResult()
   GlobalScope.launch {
      logger.info("Person long-running: {}", person)
      delay(10000L)
      result.setResult(repository.save(person))
   }
   return result
}

4. Consider your integrations

Our application is hardly ever able to exist without external solutions like databases, message brokers, or just other applications. There are two aspects of integration with third-party applications that should be carefully considered: connection settings and auto-creation of resources.
Let's start with connection settings. As you probably remember, in the previous section we were using the default implementation of the Spring Boot Actuator /health endpoint as a readiness probe. However, if you leave the default connection settings for Postgres and RabbitMQ, each call of the readiness probe takes a long time when they are unavailable. That's why I suggest decreasing these timeouts to lower values, as shown below.

spring:
  application:
    name: sample-spring-kotlin-microservice
  datasource:
    url: jdbc:postgresql://postgres:5432/postgres
    username: postgres
    password: postgres123
    hikari:
      connection-timeout: 2000
      initialization-fail-timeout: 0
  jpa:
    database-platform: org.hibernate.dialect.PostgreSQLDialect
  rabbitmq:
    host: rabbitmq
    port: 5672
    connection-timeout: 2000

Besides properly configured connection timeouts, you should also guarantee auto-creation of the resources required by the application. For example, if you use a RabbitMQ queue for asynchronous messaging between two applications, you should guarantee that the queue is created on startup if it does not exist. To do that, first declare the queue – usually on the listener side.

@Configuration
class RabbitMQConfig {

    @Bean
    fun myQueue(): Queue {
        return Queue("myQueue", false)
    }

}

Here’s a listener bean with receiving method implementation.

@Component
class PersonListener {

    val logger: Logger = LoggerFactory.getLogger(PersonListener::class.java)

    @RabbitListener(queues = ["myQueue"])
    fun listen(msg: String) {
        logger.info("Received: {}", msg)
    }

}
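
For completeness, here is a minimal sketch of the sending side using Spring AMQP's RabbitTemplate with the same queue name. The PersonSender class is hypothetical, introduced only for illustration.

import org.springframework.amqp.rabbit.core.RabbitTemplate
import org.springframework.stereotype.Service

@Service
class PersonSender(private val rabbitTemplate: RabbitTemplate) {

    // the message is routed through the default exchange directly to myQueue
    fun send(msg: String) {
        rabbitTemplate.convertAndSend("myQueue", msg)
    }

}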

A similar case is database integration. First, you should ensure that your application starts even if the connection to the database fails. That's why I explicitly declared PostgreSQLDialect – it is required when the application is not able to connect to the database. Moreover, each change in the entity model should be applied to the database tables before application startup.
Fortunately, Spring Boot has auto-configured support for popular tools for managing database schema changes: Liquibase and Flyway. To enable Liquibase we just need to include the following dependency in Maven pom.xml.

<dependency>
   <groupId>org.liquibase</groupId>
   <artifactId>liquibase-core</artifactId>
</dependency>

Then you just need to create a changelog and put it in the default location db/changelog/db.changelog-master.yaml. Here's a sample Liquibase changelog YAML file for creating the table person.

databaseChangeLog:
  - changeSet:
      id: 1
      author: piomin
      changes:
        - createTable:
            tableName: person
            columns:
              - column:
                  name: id
                  type: int
                  autoIncrement: true
                  constraints:
                    primaryKey: true
                    nullable: false
              - column:
                  name: name
                  type: varchar(50)
                  constraints:
                    nullable: false
              - column:
                  name: age
                  type: int
                  constraints:
                    nullable: false
              - column:
                  name: gender
                  type: smallint
                  constraints:
                    nullable: false
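
On the application side, the table above maps to a JPA entity. The following sketch is an assumption of how it could look – the actual class in the sample repository may differ, e.g. it may map gender to an enum. It assumes Spring Boot 2.x (javax.persistence) and the kotlin-jpa compiler plugin, which generates the no-arg constructor required by JPA.

import javax.persistence.Entity
import javax.persistence.GeneratedValue
import javax.persistence.GenerationType
import javax.persistence.Id

@Entity
data class Person(
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    var id: Int? = null,
    var name: String = "",
    var age: Int = 0,
    // stored in the smallint column created by the Liquibase changeSet
    var gender: Int = 0
)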

5. Use Service Mesh

If you are building a microservices architecture outside Kubernetes, mechanisms like load balancing, circuit breaking, fallbacks, or retries are realized on the application side. Popular cloud-native frameworks like Spring Cloud simplify the implementation of these patterns in your application and reduce it to adding a dedicated library to your project. However, if you migrate your microservices to Kubernetes, you should no longer use these libraries for traffic management – it is becoming something of an anti-pattern. Traffic management in communication between microservices should be delegated to the platform. This approach on Kubernetes is known as Service Mesh. One of the most important Kubernetes microservices best practices is to use dedicated software for building a service mesh.
Since Kubernetes was not originally dedicated to microservices, it does not provide any built-in mechanism for advanced management of traffic between many applications. However, there are some additional solutions dedicated to traffic management, which may be easily installed on Kubernetes. One of the most popular of them is Istio. Besides traffic management, it also solves problems related to security, monitoring, tracing and metrics collection.
Istio can be easily installed on your cluster or on standalone development instances like Minikube. After downloading it just run the following command.

$ istioctl manifest apply

The Istio sidecar needs to be injected into the application pods – either manually into the deployment manifest with istioctl kube-inject, or automatically, e.g. by labeling the target namespace with istio-injection=enabled. After that, we can define traffic rules using YAML manifests. Istio gives many interesting configuration options. The following example shows how to inject faults into an existing route. It can be either delays or aborts. We can define the percentage of affected requests using the percent field for both types of fault. In the VirtualService resource below I have defined a 2-second delay for every single request sent to the Service account-service.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: account-service
spec:
  hosts:
    - account-service
  http:
  - fault:
      delay:
        fixedDelay: 2s
        percent: 100
    route:
    - destination:
        host: account-service
        subset: v1

Besides VirtualService we also need to define DestinationRule for account-service. It is really simple – we have just defined the version label of the target service.

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: account-service
spec:
  host: account-service
  subsets:
  - name: v1
    labels:
      version: v1

6. Be open for framework-specific solutions

There are many interesting tools and solutions around Kubernetes, which may help you in running and managing applications. However, you should also not forget about some interesting tools and solutions offered by the framework you use. Let me give you some examples. One of them is Spring Boot Admin. It is a useful tool designed for monitoring Spring Boot applications registered in a single discovery service. Assuming you are running microservices on Kubernetes, you may also install Spring Boot Admin there.
There is another interesting project within Spring Cloud – Spring Cloud Kubernetes. It provides some useful features that simplify integration between a Spring Boot application and Kubernetes. One of them is discovery across all namespaces. If you use that feature together with Spring Boot Admin, you may easily create a powerful tool which is able to monitor all Spring Boot microservices running on your Kubernetes cluster. For more implementation details you may refer to my article Spring Boot Admin on Kubernetes.
Sometimes you may use Spring Boot integrations with third-party tools to easily deploy such a solution on Kubernetes without building a separate Deployment. You can even build a cluster of multiple instances. This approach may be used for products that can be embedded in a Spring Boot application, for example RabbitMQ or Hazelcast (a popular in-memory data grid). If you are interested in more details about running a Hazelcast cluster on Kubernetes using this approach, please refer to my article Hazelcast with Spring Boot on Kubernetes.

7. Be prepared for a rollback

Kubernetes provides a convenient way to roll back an application to an older version based on ReplicaSet and Deployment objects. By default, Kubernetes keeps the 10 previous ReplicaSets and lets you roll back to any of them. However, one thing needs to be pointed out: a rollback does not include configuration stored inside ConfigMaps and Secrets. Sometimes it is desirable to roll back not only the application binaries, but also the configuration.
Fortunately, Spring Boot gives us really great possibilities for managing externalized configuration. We may keep configuration files inside the application and also load them from an external location. On Kubernetes we may use ConfigMaps and Secrets for defining Spring configuration files. The following ConfigMap definition creates the Spring configuration file application-rollbacktest.yml containing only a single property. This configuration is loaded by the application only if the Spring profile rollbacktest is active.

apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-spring-kotlin-microservice
data:
  application-rollbacktest.yml: |-
    property1: 123456

The ConfigMap is made available to the application through a mounted volume.

spec:
  containers:
  - name: sample-spring-kotlin-microservice
    image: piomin/sample-spring-kotlin-microservice
    ports:
    - containerPort: 8080
      name: http
    volumeMounts:
    - name: config-map-volume
      mountPath: /config/
  volumes:
    - name: config-map-volume
      configMap:
        name: sample-spring-kotlin-microservice

We also have application.yml on the classpath. The first version contains only a single property.

property1: 123

In the second version we activate the rollbacktest profile. Since a profile-specific configuration file has higher priority than application.yml, the value of property1 is overridden with the value taken from application-rollbacktest.yml.

property1: 123
spring.profiles.active: rollbacktest

Let’s test the mechanism using a simple HTTP endpoint that prints the value of the property.

@RestController
@RequestMapping("/properties")
class TestPropertyController(@Value("\${property1}") val property1: String) {

    @GetMapping
    fun printProperty1(): String  = property1
    
}

Let's take a look at how we can roll back to a previous version of the Deployment. First, let's see how many revisions we have.

$ kubectl rollout history deployment/sample-spring-kotlin-microservice
deployment.apps/sample-spring-kotlin-microservice
REVISION  CHANGE-CAUSE
1         <none>
2         <none>
3         <none>

Now, we call the /properties endpoint of the current deployment, which returns the value of property1. Since the rollbacktest profile is active, it returns the value from application-rollbacktest.yml.

$ curl http://localhost:8080/properties
123456

Let’s roll back to the previous revision.


$ kubectl rollout undo deployment/sample-spring-kotlin-microservice --to-revision=2
deployment.apps/sample-spring-kotlin-microservice rolled back

As you see below, revision 2 is no longer visible on the list – it has been redeployed as the newest revision 4.

$ kubectl rollout history deployment/sample-spring-kotlin-microservice
deployment.apps/sample-spring-kotlin-microservice
REVISION  CHANGE-CAUSE
1         <none>
3         <none>
4         <none>

In this version of the application the rollbacktest profile wasn't active, so the value of property1 is taken from application.yml.

$ curl http://localhost:8080/properties
123
