Handle Traffic Bursts with Ephemeral OpenShift Clusters
This article will teach you how to handle temporary traffic bursts with ephemeral OpenShift clusters provisioned in the public cloud, in a fully automated way. When we face an unexpected or sudden peak in network traffic, we want to forward part of that traffic to another cluster. Such a cluster is called “ephemeral” because it runs only for a limited period, until the unexpected situation ends. Of course, we want the ephemeral OpenShift cluster to be usable as soon as possible after the event occurs. On the other hand, we don’t want to pay for it when it is not needed.
In this article, I’ll show how to achieve all of this with the GitOps (Argo CD) approach and several tools around OpenShift/Kubernetes, such as Kyverno and Red Hat Service Interconnect (the open-source Skupper project). We will also use Advanced Cluster Management for Kubernetes (ACM) to create and manage the “ephemeral” OpenShift clusters. If you need an introduction to the GitOps approach in a multicluster OpenShift environment, read the following article. It is also worth familiarizing yourself with the idea behind multicluster communication with the Skupper project; for that, you can read the article about multicluster load balancing with Skupper on my blog.
Source Code
If you would like to try it yourself, you can always take a look at my source code. In order to do that, you need to clone my GitHub repository. It contains several YAML manifests that allow us to manage OpenShift clusters in a GitOps way. For this exercise, we will use the manifests under the clusterpool directory. It contains two subdirectories: hub and managed. The manifests inside the hub directory should be applied to the management cluster, while the manifests inside the managed directory should be applied to the managed cluster. In our traffic bursts scenario, a single OpenShift cluster acts as both the hub and a managed cluster, and it creates another managed (ephemeral) cluster.
Prerequisites
In order to start the exercise, we need a running OpenShift cluster that acts as the management cluster. It will create and configure the ephemeral cluster on AWS used to handle traffic volume peaks. In the first step, we need to install two operators on the management cluster: “OpenShift GitOps” and “Advanced Cluster Management for Kubernetes”.
After that, we have to create the MultiClusterHub object, which runs and configures ACM:
kind: MultiClusterHub
apiVersion: operator.open-cluster-management.io/v1
metadata:
  name: multiclusterhub
  namespace: open-cluster-management
spec: {}
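Assuming the manifest above is saved in a file (the name multiclusterhub.yaml below is just an example), we can apply it and watch the hub status with standard commands:

$ oc apply -f multiclusterhub.yaml
$ oc get multiclusterhub -n open-cluster-management

The installation takes a few minutes; the hub is ready once its status reports the Running phase.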
We also need to install Kyverno. Since there is no official operator for it, we have to leverage the Helm chart. Firstly, let’s add the following Helm repository:
$ helm repo add kyverno https://kyverno.github.io/kyverno/
Then, we can install the latest version of Kyverno in the kyverno namespace using the following command:
$ helm install my-kyverno kyverno/kyverno -n kyverno --create-namespace
By the way, the OpenShift Console provides built-in support for Helm. In order to use it, you need to switch to the Developer perspective. Then, click the Helm menu and choose the Create -> Repository option. Once you do that, you will be able to create a new Helm release of Kyverno.
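Regardless of the installation method, it is worth verifying that the Kyverno controllers are up before proceeding:

$ helm list -n kyverno
$ oc get pods -n kyverno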
Using OpenShift Cluster Pool
With ACM we can create a pool of OpenShift clusters. That pool contains running or hibernated clusters. While a running cluster is ready to work immediately, a hibernated cluster first needs to be resumed by ACM. We define the pool size and the number of running clusters inside that pool. Once we create the ClusterPool object, ACM starts to provision new clusters on AWS. In our case, the pool size is 1, but the number of running clusters is 0. The object declaration also contains everything required to create a new cluster, such as the installation template (the aws-install-config Secret) and a reference to the AWS account credentials (the aws-aws-creds Secret). Each cluster within the pool is automatically assigned to the interconnect ManagedClusterSet. The cluster set approach allows us to group multiple OpenShift clusters.
apiVersion: hive.openshift.io/v1
kind: ClusterPool
metadata:
  name: aws
  namespace: aws
  labels:
    cloud: AWS
    cluster.open-cluster-management.io/clusterset: interconnect
    region: us-east-1
    vendor: OpenShift
spec:
  baseDomain: sandbox449.opentlc.com
  imageSetRef:
    name: img4.12.36-multi-appsub
  installConfigSecretTemplateRef:
    name: aws-install-config
  platform:
    aws:
      credentialsSecretRef:
        name: aws-aws-creds
      region: us-east-1
  pullSecretRef:
    name: aws-pull-secret
  size: 1
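After applying this manifest, Hive starts provisioning the pool in the background. We can observe the progress with standard Hive resources (each pool member gets its own ClusterDeployment in a generated namespace):

$ oc get clusterpool -n aws
$ oc get clusterdeployment -A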
So, as a result, there is only one cluster in the pool. ACM keeps that cluster in the hibernated state, which means that all the VMs with master and worker nodes are stopped. In order to resume the hibernated cluster, we need to create a ClusterClaim object that refers to the ClusterPool. It is similar to clicking the Claim cluster link visible below. However, we don’t want to create that object directly, but as a reaction to a Kubernetes event.
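For reference, the claim that Kyverno will generate later in this article is equivalent to applying a manifest like this one by hand (a minimal sketch matching the pool above):

apiVersion: hive.openshift.io/v1
kind: ClusterClaim
metadata:
  name: aws
  namespace: aws
spec:
  clusterPoolName: aws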
Before we proceed, let’s take a look at the list of virtual machines on AWS related to our cluster. As you can see, they are not running.
Claim Cluster From the Pool on Scaling Event
Now, the question is: what kind of event should result in claiming a cluster from the pool? A single app could rely on the scaling event. Once the number of deployment pods reaches the assumed threshold, we will resume a hibernated cluster and run the app there. With Kyverno, we can react to such scaling events by creating a ClusterPolicy object. As you can see, our policy monitors the Deployment/scale resource. The assumed maximum number of pods allowed for our app on the main cluster is 4. We need to put that value in the preconditions, together with the Deployment name. Once all the conditions are met, we generate a new Kubernetes resource: a ClusterClaim that refers to the ClusterPool created in the previous section. It will result in getting the hibernated cluster from the pool and resuming it.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: aws
spec:
  background: true
  generateExisting: true
  rules:
    # a rule name is required by Kyverno; any descriptive name works here
    - name: claim-cluster-on-scale
      generate:
        apiVersion: hive.openshift.io/v1
        data:
          spec:
            clusterPoolName: aws
        kind: ClusterClaim
        name: aws
        namespace: aws
        synchronize: true
      match:
        any:
          - resources:
              kinds:
                - Deployment/scale
      preconditions:
        all:
          - key: '{{request.object.spec.replicas}}'
            operator: Equals
            value: 4
          - key: '{{request.object.metadata.name}}'
            operator: Equals
            value: sample-kotlin-spring
  validationFailureAction: Audit
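Before relying on the policy, we can confirm that Kyverno accepted it and reports it as ready:

$ oc get clusterpolicy aws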
Kyverno requires additional permissions to create the ClusterClaim object. We can easily achieve this by creating a properly labeled ClusterRole, which Kyverno aggregates into the permissions of its background controller:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: kyverno:create-claim
  labels:
    app.kubernetes.io/component: background-controller
    app.kubernetes.io/instance: kyverno
    app.kubernetes.io/part-of: kyverno
rules:
  - verbs:
      - create
      - patch
      - update
      - delete
    apiGroups:
      - hive.openshift.io
    resources:
      - clusterclaims
Once the cluster is ready, we are going to assign it to the interconnect group represented by the ManagedClusterSet object. This group of clusters is managed by our instance of Argo CD from the openshift-gitops namespace. In order to achieve that, we need to apply the following objects to the management OpenShift cluster:
apiVersion: cluster.open-cluster-management.io/v1beta2
kind: ManagedClusterSetBinding
metadata:
  name: interconnect
  namespace: openshift-gitops
spec:
  clusterSet: interconnect
---
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: interconnect
  namespace: openshift-gitops
spec:
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchExpressions:
            - key: vendor
              operator: In
              values:
                - OpenShift
---
apiVersion: apps.open-cluster-management.io/v1beta1
kind: GitOpsCluster
metadata:
  name: argo-acm-importer
  namespace: openshift-gitops
spec:
  argoServer:
    argoNamespace: openshift-gitops
    cluster: openshift-gitops
  placementRef:
    apiVersion: cluster.open-cluster-management.io/v1beta1
    kind: Placement
    name: interconnect
    namespace: openshift-gitops
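To check which clusters the Placement has actually selected for Argo CD, we can inspect the PlacementDecision that ACM generates for it:

$ oc get placementdecision -n openshift-gitops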
After applying the manifests visible above, you should see that the Argo CD instance in openshift-gitops is managing the interconnect cluster group.
Automatically Sync Configuration for a New Cluster with Argo CD
In Argo CD, we can define an ApplicationSet with the “Cluster Decision Resource Generator” (1). You can read more details about that type of generator in the Argo CD docs. It will create an Argo CD Application for each OpenShift cluster in the interconnect group (2). Then, the newly created Argo CD Application will automatically apply the manifests responsible for creating our sample Deployment. Of course, those manifests are available in the same repository, inside the clusterpool/managed directory (3).
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-init
  namespace: openshift-gitops
spec:
  generators:
    - clusterDecisionResource: # (1)
        configMapRef: acm-placement
        labelSelector:
          matchLabels:
            cluster.open-cluster-management.io/placement: interconnect # (2)
        requeueAfterSeconds: 180
  template:
    metadata:
      name: 'cluster-init-{{name}}'
    spec:
      ignoreDifferences:
        - group: apps
          kind: Deployment
          jsonPointers:
            - /spec/replicas
      destination:
        server: '{{server}}'
        namespace: interconnect
      project: default
      source:
        path: clusterpool/managed # (3)
        repoURL: 'https://github.com/piomin/openshift-cluster-config.git'
        targetRevision: master
      syncPolicy:
        automated:
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
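Once the ApplicationSet is synced, we can list the Applications it generated for the clusters in the interconnect group:

$ oc get applications -n openshift-gitops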
Here’s the YAML manifest that contains the Deployment object and the OpenShift Route definition. Pay attention to the three skupper.io/* annotations. We will let Skupper generate the Kubernetes Service used to load balance between all running pods of our app. Finally, it will allow us to load balance between the pods spread across two OpenShift clusters.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/instance: sample-kotlin-spring
  annotations:
    skupper.io/address: sample-kotlin-spring
    skupper.io/port: '8080'
    skupper.io/proxy: http
  name: sample-kotlin-spring
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sample-kotlin-spring
  template:
    metadata:
      labels:
        app: sample-kotlin-spring
    spec:
      containers:
        - image: 'quay.io/pminkows/sample-kotlin-spring:1.4.39'
          name: sample-kotlin-spring
          ports:
            - containerPort: 8080
          resources:
            limits:
              cpu: 1000m
              memory: 1024Mi
            requests:
              cpu: 100m
              memory: 128Mi
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  labels:
    app: sample-kotlin-spring
    app.kubernetes.io/component: sample-kotlin-spring
    app.kubernetes.io/instance: sample-spring-kotlin
  name: sample-kotlin-spring
spec:
  port:
    targetPort: port8080
  to:
    kind: Service
    name: sample-kotlin-spring
    weight: 100
  wildcardPolicy: None
Let’s check how it works. I won’t simulate real traffic bursts on OpenShift. However, you can easily imagine that our app is autoscaled with the HPA (Horizontal Pod Autoscaler) and is therefore able to react to a traffic volume peak. I will just manually scale the app up to 4 pods:
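The scaling itself can be done from the console or with the CLI. A minimal example, assuming the app runs in the interconnect namespace created by Argo CD; the second command shows the HPA-based alternative:

$ oc scale deployment/sample-kotlin-spring -n interconnect --replicas=4
$ oc autoscale deployment/sample-kotlin-spring -n interconnect --min=2 --max=6 --cpu-percent=80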
Now, let’s switch to the All Clusters view. As you can see, Kyverno sent a cluster claim to the aws ClusterPool. The claim stays in the Pending status until the cluster is resumed. In the meantime, ACM creates a new cluster to fill up the pool.
Once the cluster is ready, you will see it in the Clusters view.
ACM automatically adds the cluster claimed from the aws pool to the interconnect group (ManagedClusterSet). Therefore, Argo CD sees the new cluster and adds it as a managed one.
Finally, Argo CD generates the Application for the new cluster to automatically install all required Kubernetes objects.
Using Red Hat Service Interconnect
In order to enable Skupper for our apps, we first need to install the Red Hat Service Interconnect operator. We can also do it in the GitOps way, by defining the Subscription object shown below (1). The operator has to be installed on both the hub and the managed cluster. Once the operator is installed, we need to enable Skupper in a particular namespace. In order to do that, we define a ConfigMap named skupper-site there (2). Those manifests are also applied by the Argo CD Application described in the previous section.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: skupper-operator
  namespace: openshift-operators
  annotations:
    argocd.argoproj.io/sync-wave: "2"
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: skupper-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: skupper-site
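After the sync, the operator reacts to the skupper-site ConfigMap by deploying the Skupper infrastructure into the application namespace. A quick sanity check (exact pod names may differ slightly between Skupper versions):

$ oc get pods -n interconnect
# expect the skupper-router and skupper-service-controller pods next to the app pods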
Here’s the result of synchronization for the managed cluster.
We can switch to the OpenShift Console of the new cluster. The Red Hat Service Interconnect operator is ready.
We have now reached the final phase of our exercise. Both our clusters are running, and we have already installed the sample app and the Skupper operator on both of them. Now, we need to link the apps running on different clusters into a single Skupper network. In order to do that, we need to let Skupper generate a connection token. Here’s the Secret object responsible for that. It doesn’t contain any data – just the skupper.io/type label with the connection-token-request value. Argo CD has already applied it to the management cluster in the interconnect namespace.
apiVersion: v1
kind: Secret
metadata:
  labels:
    skupper.io/type: connection-token-request
  name: token-req
  namespace: interconnect
As a result, Skupper fills the Secret object with certificates and a private key. It also overrides the value of the skupper.io/type label.
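We can inspect the processed token on the management cluster to confirm that Skupper populated it:

$ oc get secret token-req -n interconnect -o yaml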
So, now our goal is to copy that Secret to the managed cluster. We won’t do that in the GitOps way directly, since the object was dynamically generated on OpenShift. However, we may use the SelectorSyncSet object provided by ACM. It can copy secrets from the hub cluster to managed clusters.
apiVersion: hive.openshift.io/v1
kind: SelectorSyncSet
metadata:
  name: skupper-token-sync
spec:
  clusterDeploymentSelector:
    matchLabels:
      cluster.open-cluster-management.io/clusterset: interconnect
  secretMappings:
    - sourceRef:
        name: token-req
        namespace: interconnect
      targetRef:
        name: token-req
        namespace: interconnect
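To confirm that the SyncSet did its job, we can check whether the Secret showed up on the ephemeral cluster (a sketch assuming you downloaded its kubeconfig, e.g. from the ACM console; the file name below is hypothetical):

# run against the managed (ephemeral) cluster
$ oc --kubeconfig=managed-cluster-kubeconfig get secret token-req -n interconnect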
Once the token is copied into the managed cluster, that cluster connects to the Skupper network existing on the main cluster. We can verify that everything works fine with the skupper CLI. The command shown below prints all the pods from the Skupper network. As you can see, we have 4 pods on the main (local) cluster and 2 pods on the managed (linked) cluster.
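The exact output depends on the Skupper version, but commands along these lines display the linked sites and the exposed service with its pod targets (assuming the skupper CLI is installed locally and the current context points at the interconnect namespace of the main cluster):

$ skupper network status
$ skupper service status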
Let’s display the route of our service:
$ oc get route sample-kotlin-spring
Now, we can make the final test. Here’s the siege command for my route and cluster domain. It will send 10k requests via the Route. After running it, you can verify the logs to see whether the traffic reaches all six pods spread across our two clusters.
$ siege -r 1000 -c 10 http://sample-kotlin-spring-interconnect.apps.jaipxwuhcp.eastus.aroapp.io/persons
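To verify the distribution, we can tail the application logs on each cluster using the label from the Deployment shown earlier (run the same command against both clusters):

$ oc logs -n interconnect -l app=sample-kotlin-spring --tail=20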
Final Thoughts
Handling traffic bursts is one of the more interesting scenarios for a hybrid-cloud environment with OpenShift. With the approach described in this article, we can dynamically provision clusters and redirect traffic from on-prem to the cloud, in a fully automated, GitOps-based way. The features and tools around OpenShift allow us to cut down cloud costs and speed up cluster startup. This also reduces system downtime in case of failures or other unexpected situations.