Handle Traffic Bursts with Ephemeral OpenShift Clusters
This article will teach you how to handle temporary traffic bursts with ephemeral OpenShift clusters provisioned in the public cloud, in a fully automated way. When we face an unexpected or sudden peak in network traffic, we want to forward part of that traffic to another cluster. Such a cluster is called “ephemeral” because it runs only for a limited period, until the unexpected situation ends. Of course, we want the ephemeral OpenShift cluster to be usable as soon as possible after the event occurs. On the other hand, we don’t want to pay for it when it is not needed.
In this article, I’ll show how to achieve all of this with the GitOps (Argo CD) approach and several tools around OpenShift/Kubernetes, such as Kyverno and Red Hat Service Interconnect (the open-source Skupper project). We will also use Advanced Cluster Management for Kubernetes (ACM) to create and manage the “ephemeral” OpenShift clusters. If you need an introduction to the GitOps approach in a multicluster OpenShift environment, read the following article. It is also worth familiarizing yourself with the idea behind multicluster communication with the Skupper project; for that, you can read the article about multicluster load balancing with Skupper on my blog.
Source Code
If you would like to try it yourself, you can always take a look at my source code. In order to do that, you need to clone my GitHub repository. It contains several YAML manifests that allow us to manage OpenShift clusters in a GitOps way. For this exercise, we will use the manifests under the clusterpool directory. It contains two subdirectories: hub and managed. The manifests inside the hub directory should be applied to the management cluster, while the manifests inside the managed directory should be applied to the managed cluster. In our traffic bursts scenario, a single OpenShift cluster acts as both the hub and a managed cluster, and it creates another managed (ephemeral) cluster.
Prerequisites
In order to start the exercise, we need a running OpenShift cluster that acts as the management cluster. It will create and configure the ephemeral cluster on AWS used to handle traffic volume peaks. In the first step, we need to install two operators on the management cluster: “OpenShift GitOps” and “Advanced Cluster Management for Kubernetes”.
After that, we have to create the MultiClusterHub object, which runs and configures ACM:
kind: MultiClusterHub
apiVersion: operator.open-cluster-management.io/v1
metadata:
  name: multiclusterhub
  namespace: open-cluster-management
spec: {}
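Assuming the manifest above is saved in a file (the name multiclusterhub.yaml below is just an example), we can apply it and watch the hub status with standard commands:

$ oc apply -f multiclusterhub.yaml
$ oc get multiclusterhub -n open-cluster-management

The installation takes a few minutes; the hub is ready once its status reports the Running phase.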
We also need to install Kyverno. Since there is no official operator for it, we have to leverage the Helm chart. Firstly, let’s add the following Helm repository:
$ helm repo add kyverno https://kyverno.github.io/kyverno/
Then, we can install the latest version of Kyverno in the kyverno namespace using the following command:
$ helm install my-kyverno kyverno/kyverno -n kyverno --create-namespace
By the way, the OpenShift Console provides built-in support for Helm. In order to use it, you need to switch to the Developer perspective. Then, click the Helm menu and choose the Create -> Repository option. Once you do that, you will be able to create a new Helm release of Kyverno.
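Regardless of the installation method, it is worth verifying that the Kyverno controllers are up before proceeding:

$ helm list -n kyverno
$ oc get pods -n kyverno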
Using OpenShift Cluster Pool
With ACM we can create a pool of OpenShift clusters. That pool contains running or hibernated clusters. While a running cluster is ready to work immediately, a hibernated cluster first needs to be resumed by ACM. We define the pool size and the number of running clusters inside that pool. Once we create the ClusterPool object, ACM starts to provision new clusters on AWS. In our case, the pool size is 1, but the number of running clusters is 0. The object declaration also contains everything required to create a new cluster, such as the installation template (the aws-install-config Secret) and a reference to the AWS account credentials (the aws-aws-creds Secret). Each cluster within the pool is automatically assigned to the interconnect ManagedClusterSet. The cluster set approach allows us to group multiple OpenShift clusters.
apiVersion: hive.openshift.io/v1
kind: ClusterPool
metadata:
  name: aws
  namespace: aws
  labels:
    cloud: AWS
    cluster.open-cluster-management.io/clusterset: interconnect
    region: us-east-1
    vendor: OpenShift
spec:
  baseDomain: sandbox449.opentlc.com
  imageSetRef:
    name: img4.12.36-multi-appsub
  installConfigSecretTemplateRef:
    name: aws-install-config
  platform:
    aws:
      credentialsSecretRef:
        name: aws-aws-creds
      region: us-east-1
  pullSecretRef:
    name: aws-pull-secret
  size: 1
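After applying this manifest, Hive starts provisioning the pool in the background. We can observe the progress with standard Hive resources (each pool member gets its own ClusterDeployment in a generated namespace):

$ oc get clusterpool -n aws
$ oc get clusterdeployment -A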
So, as a result, there is only one cluster in the pool. ACM keeps that cluster in the hibernated state, which means that all the VMs with master and worker nodes are stopped. In order to resume the hibernated cluster, we need to create a ClusterClaim object that refers to the ClusterPool. It is similar to clicking the Claim cluster link visible below. However, we don’t want to create that object directly, but as a reaction to a Kubernetes event.
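For reference, the claim that Kyverno will generate later in this article is equivalent to applying a manifest like this one by hand (a minimal sketch matching the pool above):

apiVersion: hive.openshift.io/v1
kind: ClusterClaim
metadata:
  name: aws
  namespace: aws
spec:
  clusterPoolName: aws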
Before we proceed, let’s take a look at the list of virtual machines on AWS related to our cluster. As you can see, they are not running.
Claim Cluster From the Pool on Scaling Event
Now, the question is: what kind of event should result in claiming a cluster from the pool? A single app could rely on the scaling event. Once the number of deployment pods reaches the assumed threshold, we will resume a hibernated cluster and run the app there. With Kyverno, we can react to such scaling events by creating a ClusterPolicy object. As you can see, our policy monitors the Deployment/scale resource. The assumed maximum number of pods allowed for our app on the main cluster is 4. We need to put that value in the preconditions, together with the Deployment name. Once all the conditions are met, we generate a new Kubernetes resource: a ClusterClaim that refers to the ClusterPool created in the previous section. It will result in getting the hibernated cluster from the pool and resuming it.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: aws
spec:
  background: true
  generateExisting: true
  rules:
    # a rule name is required by Kyverno; any descriptive name works here
    - name: claim-cluster-on-scale
      generate:
        apiVersion: hive.openshift.io/v1
        data:
          spec:
            clusterPoolName: aws
        kind: ClusterClaim
        name: aws
        namespace: aws
        synchronize: true
      match:
        any:
          - resources:
              kinds:
                - Deployment/scale
      preconditions:
        all:
          - key: '{{request.object.spec.replicas}}'
            operator: Equals
            value: 4
          - key: '{{request.object.metadata.name}}'
            operator: Equals
            value: sample-kotlin-spring
  validationFailureAction: Audit
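Before relying on the policy, we can confirm that Kyverno accepted it and reports it as ready:

$ oc get clusterpolicy aws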
Kyverno requires additional permissions to create the ClusterClaim object. We can easily achieve this by creating a properly labeled ClusterRole, which Kyverno aggregates into the permissions of its background controller:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: kyverno:create-claim
  labels:
    app.kubernetes.io/component: background-controller
    app.kubernetes.io/instance: kyverno
    app.kubernetes.io/part-of: kyverno
rules:
  - verbs:
      - create
      - patch
      - update
      - delete
    apiGroups:
      - hive.openshift.io
    resources:
      - clusterclaims
Once the cluster is ready, we are going to assign it to the interconnect group represented by the ManagedClusterSet object. This group of clusters is managed by our instance of Argo CD from the openshift-gitops namespace. In order to achieve that, we need to apply the following objects to the management OpenShift cluster:
apiVersion: cluster.open-cluster-management.io/v1beta2
kind: ManagedClusterSetBinding
metadata:
  name: interconnect
  namespace: openshift-gitops
spec:
  clusterSet: interconnect
---
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: interconnect
  namespace: openshift-gitops
spec:
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchExpressions:
            - key: vendor
              operator: In
              values:
                - OpenShift
---
apiVersion: apps.open-cluster-management.io/v1beta1
kind: GitOpsCluster
metadata:
  name: argo-acm-importer
  namespace: openshift-gitops
spec:
  argoServer:
    argoNamespace: openshift-gitops
    cluster: openshift-gitops
  placementRef:
    apiVersion: cluster.open-cluster-management.io/v1beta1
    kind: Placement
    name: interconnect
    namespace: openshift-gitops
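To check which clusters the Placement has actually selected for Argo CD, we can inspect the PlacementDecision that ACM generates for it:

$ oc get placementdecision -n openshift-gitops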
After applying the manifests visible above, you should see that the Argo CD instance in openshift-gitops is managing the interconnect cluster group.
Automatically Sync Configuration for a New Cluster with Argo CD
In Argo CD, we can define an ApplicationSet with the “Cluster Decision Resource Generator” (1). You can read more details about that type of generator in the Argo CD docs. It will create an Argo CD Application for each OpenShift cluster in the interconnect group (2). Then, the newly created Argo CD Application will automatically apply the manifests responsible for creating our sample Deployment. Of course, those manifests are available in the same repository, inside the clusterpool/managed directory (3).
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-init
  namespace: openshift-gitops
spec:
  generators:
    - clusterDecisionResource: # (1)
        configMapRef: acm-placement
        labelSelector:
          matchLabels:
            cluster.open-cluster-management.io/placement: interconnect # (2)
        requeueAfterSeconds: 180
  template:
    metadata:
      name: 'cluster-init-{{name}}'
    spec:
      ignoreDifferences:
        - group: apps
          kind: Deployment
          jsonPointers:
            - /spec/replicas
      destination:
        server: '{{server}}'
        namespace: interconnect
      project: default
      source:
        path: clusterpool/managed # (3)
        repoURL: 'https://github.com/piomin/openshift-cluster-config.git'
        targetRevision: master
      syncPolicy:
        automated:
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
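Once the ApplicationSet is synced, we can list the Applications it generated for the clusters in the interconnect group:

$ oc get applications -n openshift-gitops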
Here’s the YAML manifest that contains the Deployment object and the OpenShift Route definition. Pay attention to the three skupper.io/* annotations. We will let Skupper generate the Kubernetes Service used to load balance between all running pods of our app. Finally, it will allow us to load balance between the pods spread across two OpenShift clusters.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/instance: sample-kotlin-spring
  annotations:
    skupper.io/address: sample-kotlin-spring
    skupper.io/port: '8080'
    skupper.io/proxy: http
  name: sample-kotlin-spring
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sample-kotlin-spring
  template:
    metadata:
      labels:
        app: sample-kotlin-spring
    spec:
      containers:
        - image: 'quay.io/pminkows/sample-kotlin-spring:1.4.39'
          name: sample-kotlin-spring
          ports:
            - containerPort: 8080
          resources:
            limits:
              cpu: 1000m
              memory: 1024Mi
            requests:
              cpu: 100m
              memory: 128Mi
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  labels:
    app: sample-kotlin-spring
    app.kubernetes.io/component: sample-kotlin-spring
    app.kubernetes.io/instance: sample-spring-kotlin
  name: sample-kotlin-spring
spec:
  port:
    targetPort: port8080
  to:
    kind: Service
    name: sample-kotlin-spring
    weight: 100
  wildcardPolicy: None
Let’s check how it works. I won’t simulate real traffic bursts on OpenShift. However, you can easily imagine that our app is autoscaled with the HPA (Horizontal Pod Autoscaler) and is therefore able to react to a traffic volume peak. I will just manually scale the app up to 4 pods:
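The scaling itself can be done from the console or with the CLI. A minimal example, assuming the app runs in the interconnect namespace created by Argo CD; the second command shows the HPA-based alternative:

$ oc scale deployment/sample-kotlin-spring -n interconnect --replicas=4
$ oc autoscale deployment/sample-kotlin-spring -n interconnect --min=2 --max=6 --cpu-percent=80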
Now, let’s switch to the All Clusters view. As you can see, Kyverno sent a cluster claim to the aws ClusterPool. The claim stays in the Pending status until the cluster is resumed. In the meantime, ACM creates a new cluster to fill up the pool.
Once the cluster is ready, you will see it in the Clusters view.
ACM automatically adds the cluster claimed from the aws pool to the interconnect group (ManagedClusterSet). Therefore, Argo CD sees the new cluster and adds it as a managed one.
Finally, Argo CD generates the Application for the new cluster to automatically install all required Kubernetes objects.
Using Red Hat Service Interconnect
In order to enable Skupper for our apps, we first need to install the Red Hat Service Interconnect operator. We can also do it in the GitOps way, by defining the Subscription object shown below (1). The operator has to be installed on both the hub and the managed cluster. Once the operator is installed, we need to enable Skupper in a particular namespace. In order to do that, we define a ConfigMap named skupper-site there (2). Those manifests are also applied by the Argo CD Application described in the previous section.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: skupper-operator
  namespace: openshift-operators
  annotations:
    argocd.argoproj.io/sync-wave: "2"
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: skupper-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: skupper-site
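After the sync, the operator reacts to the skupper-site ConfigMap by deploying the Skupper infrastructure into the application namespace. A quick sanity check (exact pod names may differ slightly between Skupper versions):

$ oc get pods -n interconnect
# expect the skupper-router and skupper-service-controller pods next to the app pods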
Here’s the result of synchronization for the managed cluster.
We can switch to the OpenShift Console of the new cluster. The Red Hat Service Interconnect operator is ready.
We have now reached the final phase of our exercise. Both our clusters are running, and we have already installed the sample app and the Skupper operator on both of them. Now, we need to link the apps running on different clusters into a single Skupper network. In order to do that, we need to let Skupper generate a connection token. Here’s the Secret object responsible for that. It doesn’t contain any data – just the skupper.io/type label with the connection-token-request value. Argo CD has already applied it to the management cluster in the interconnect namespace.
apiVersion: v1
kind: Secret
metadata:
  labels:
    skupper.io/type: connection-token-request
  name: token-req
  namespace: interconnect
As a result, Skupper fills the Secret object with certificates and a private key. It also overrides the value of the skupper.io/type label.
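We can inspect the processed token on the management cluster to confirm that Skupper populated it:

$ oc get secret token-req -n interconnect -o yaml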
So, now our goal is to copy that Secret to the managed cluster. We won’t do that in the GitOps way directly, since the object was dynamically generated on OpenShift. However, we may use the SelectorSyncSet object provided by ACM. It can copy secrets from the hub cluster to managed clusters.
apiVersion: hive.openshift.io/v1
kind: SelectorSyncSet
metadata:
  name: skupper-token-sync
spec:
  clusterDeploymentSelector:
    matchLabels:
      cluster.open-cluster-management.io/clusterset: interconnect
  secretMappings:
    - sourceRef:
        name: token-req
        namespace: interconnect
      targetRef:
        name: token-req
        namespace: interconnect
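To confirm that the SyncSet did its job, we can check whether the Secret showed up on the ephemeral cluster (a sketch assuming you downloaded its kubeconfig, e.g. from the ACM console; the file name below is hypothetical):

# run against the managed (ephemeral) cluster
$ oc --kubeconfig=managed-cluster-kubeconfig get secret token-req -n interconnect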
Once the token is copied into the managed cluster, that cluster connects to the Skupper network existing on the main cluster. We can verify that everything works fine with the skupper CLI. The command shown below prints all the pods from the Skupper network. As you can see, we have 4 pods on the main (local) cluster and 2 pods on the managed (linked) cluster.
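The exact output depends on the Skupper version, but commands along these lines display the linked sites and the exposed service with its pod targets (assuming the skupper CLI is installed locally and the current context points at the interconnect namespace of the main cluster):

$ skupper network status
$ skupper service status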
Let’s display the route of our service:
$ oc get route sample-kotlin-spring
Now, we can make the final test. Here’s the siege command for my route and cluster domain. It will send 10k requests via the Route. After running it, you can verify the logs to see whether the traffic reaches all six pods spread across our two clusters.
$ siege -r 1000 -c 10 http://sample-kotlin-spring-interconnect.apps.jaipxwuhcp.eastus.aroapp.io/persons
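To verify the distribution, we can tail the application logs on each cluster using the label from the Deployment shown earlier (run the same command against both clusters):

$ oc logs -n interconnect -l app=sample-kotlin-spring --tail=20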
Final Thoughts
Handling traffic bursts is one of the more interesting scenarios for a hybrid-cloud environment with OpenShift. With the approach described in this article, we can dynamically provision clusters and redirect traffic from on-prem to the cloud, in a fully automated, GitOps-based way. The features and tools around OpenShift allow us to cut down cloud costs and speed up cluster startup. This also reduces system downtime in case of failures or other unexpected situations.