Manage Kubernetes Operators with ArgoCD

In this article, you will learn how to automatically install and configure operators on Kubernetes with Argo CD. A Kubernetes operator is a method of packaging, deploying, and managing applications on Kubernetes. It has its own lifecycle managed by the Operator Lifecycle Manager (OLM). It also uses custom resources (CRs) to manage applications and their components. A Kubernetes operator watches a CR object and takes actions to ensure the current state matches the desired state of that resource. Since we want to manage our Kubernetes cluster in the GitOps way, we should keep the list of operators, their configuration, and the CR object definitions in a Git repository. This is where Argo CD comes in.

In this article, I’m describing several more advanced Argo CD features. If you are looking for the basics, you can find many other articles about Argo CD on my blog. For example, you can read about Kubernetes CI/CD with Tekton and ArgoCD in the following article.

Introduction

The main goal of this exercise is to run the scenario, in which we can automatically install and use operators on Kubernetes in the GitOps way. Therefore, the state of the Git repository should be automatically applied to the target Kubernetes cluster. We will define a single Argo CD Application that performs all the required steps. In the first step, it will trigger the operator installation process. It may take some time since we need to install the controller application and Kubernetes CRDs. Then we may define some CR objects to run our apps on the cluster.

We cannot create a CR object before installing the operator that defines it. Fortunately, Argo CD lets us divide the sync process into multiple separate phases. This Argo CD feature is called sync waves. In order to proceed to the next phase, Argo CD first needs to finish the previous sync wave. It evaluates the health checks of all objects created during a particular phase; if all of those checks report success, the phase is considered finished. Argo CD provides built-in health check implementations for several standard Kubernetes types. However, in this exercise, we will have to override the health check for the main operator CR: the Subscription object.
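As a minimal sketch (with hypothetical resource names), annotating two resources with different waves makes Argo CD apply and health-check the first before starting the second:

```yaml
# Wave 0: the namespace is created and becomes healthy first
apiVersion: v1
kind: Namespace
metadata:
  name: demo
  annotations:
    argocd.argoproj.io/sync-wave: "0"
---
# Wave 1: only synced after everything in wave 0 is healthy
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-config
  namespace: demo
  annotations:
    argocd.argoproj.io/sync-wave: "1"
```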

Source Code

If you would like to try it yourself, you can always take a look at my source code. In order to do that, you need to clone my GitHub repository. After that, go to the global directory. Then you should just follow my instructions. Let’s begin.

Prerequisites

Before starting the exercise, you need a running Kubernetes cluster with Argo CD and Operator Lifecycle Manager (OLM) installed. You can install Argo CD using the Helm chart or with the operator. For the installation details, please refer to the Argo CD docs.
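For illustration, on OpenShift the Argo CD operator itself can be installed through OLM with a Subscription similar to the one below. Treat this as a sketch: the channel name may differ between cluster versions.

```yaml
# Sketch: installing the OpenShift GitOps (Argo CD) operator via OLM
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-gitops-operator
  namespace: openshift-operators
spec:
  channel: latest
  installPlanApproval: Automatic
  name: openshift-gitops-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
```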

Install Operators with Argo CD

In the first step, we will define the templates responsible for installing operators. If you have OLM installed on the Kubernetes cluster, that process comes down to creating a Subscription object (1). In some cases, we also have to create an OperatorGroup object. It provides multitenant configuration to OLM-installed operators: an Operator group selects target namespaces in which to generate the required RBAC access for its members. Before installing an operator in a namespace other than openshift-operators, we have to create an OperatorGroup in that namespace (2). We use the argocd.argoproj.io/sync-wave annotation to configure the sync phases (3). The lower the value of that parameter, the higher the priority of the object (before the OperatorGroup we need to create the namespace).

{{- range .Values.subscriptions }}
apiVersion: operators.coreos.com/v1alpha1 # (1)
kind: Subscription
metadata:
  name: {{ .name }}
  namespace: {{ .namespace }}
  annotations:
    argocd.argoproj.io/sync-wave: "2" # (3)
spec:
  channel: {{ .channel }}
  installPlanApproval: Automatic
  name: {{ .name }}
  source: {{ .source }}
  sourceNamespace: openshift-marketplace
---
{{- if ne .namespace "openshift-operators" }}
apiVersion: v1
kind: Namespace
metadata:
  name: {{ .namespace }}
  annotations:
    argocd.argoproj.io/sync-wave: "1" # (3)
---
apiVersion: operators.coreos.com/v1alpha2 # (2)
kind: OperatorGroup
metadata:
  name: {{ .name }}
  namespace: {{ .namespace }}
  annotations:
    argocd.argoproj.io/sync-wave: "2" # (3)
spec: {}
---
{{- end }}
{{- end }}

I’m using Helm for templating the YAML manifests. Thanks to that, we can apply several Subscription and OperatorGroup objects at once. Our Helm template iterates over the subscriptions list. In order to define the list of operators, we just need to provide a configuration like the one in the values.yaml file visible below. The following operators are installed in this example: Kiali, Service Mesh (Istio), AMQ Streams (Strimzi Kafka), Patch Operator, and Serverless (Knative).

subscriptions:
  - name: kiali-ossm
    namespace: openshift-operators
    channel: stable
    source: redhat-operators
  - name: servicemeshoperator
    namespace: openshift-operators
    channel: stable
    source: redhat-operators
  - name: amq-streams
    namespace: openshift-operators
    channel: stable
    source: redhat-operators
  - name: patch-operator
    namespace: patch-operator
    channel: alpha
    source: community-operators
  - name: serverless-operator
    namespace: openshift-serverless
    channel: stable
    source: redhat-operators
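For example, for the patch-operator entry, which uses a custom namespace, the template above should render roughly to the following three documents:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: patch-operator
  namespace: patch-operator
  annotations:
    argocd.argoproj.io/sync-wave: "2"
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: patch-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
---
# Rendered only because the namespace differs from openshift-operators
apiVersion: v1
kind: Namespace
metadata:
  name: patch-operator
  annotations:
    argocd.argoproj.io/sync-wave: "1"
---
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: patch-operator
  namespace: patch-operator
  annotations:
    argocd.argoproj.io/sync-wave: "2"
spec: {}
```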

Override Argo CD Health Check

As I mentioned before, we need to override the default Argo CD health check for the Subscription CR. Normally, Argo CD just creates the Subscription object and doesn’t wait until the operator is installed on the cluster. To change that, we need to verify the value of the status.state field. If it equals AtLatestKnown, the operator has been successfully installed, and we can set the Argo CD health status to Healthy. We can also override the default health check message to display the current version of the operator (the status.currentCSV field). If you installed Argo CD using the Helm chart, you can provide your health check implementation directly in the argocd-cm ConfigMap.

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-cm
    app.kubernetes.io/part-of: argocd
data:
  resource.customizations: |
    operators.coreos.com/Subscription:
      health.lua: |
        hs = {}
        hs.status = "Progressing"
        hs.message = ""
        if obj.status ~= nil then
          if obj.status.state ~= nil then
            if obj.status.state == "AtLatestKnown" then
              hs.message = obj.status.state .. " - " .. obj.status.currentCSV
              hs.status = "Healthy"
            end
          end
        end
        return hs

For those of you who installed Argo CD using the operator (like me), there is another way to override the health check: we need to provide it inside the extraConfig field of the ArgoCD CR.

apiVersion: argoproj.io/v1alpha1
kind: ArgoCD
metadata:
  name: openshift-gitops
  namespace: openshift-gitops
spec:
  ...
  extraConfig:
    resource.customizations: |
      operators.coreos.com/Subscription:
        health.lua: |
          hs = {}
          hs.status = "Progressing"
          hs.message = ""
          if obj.status ~= nil then
            if obj.status.state ~= nil then
              if obj.status.state == "AtLatestKnown" then
                hs.message = obj.status.state .. " - " .. obj.status.currentCSV
                hs.status = "Healthy"
              end
            end
          end
          return hs

With the steps described so far, we have achieved two things. We divided our sync process into multiple phases with the Argo CD sync waves feature, and we forced Argo CD to wait until the operator installation process is finished before going to the next phase. Let’s proceed to the next step: defining CR objects.

Create Custom Resources with Argo CD

In the previous steps, we successfully installed Kubernetes operators with Argo CD. Now, it is time to use them. We will do everything in a single synchronization process. In the previous phase (wave=2), we installed the Kafka operator (Strimzi). In this phase, we will run a Kafka cluster using the Kafka CRD provided by the Strimzi project. To be sure that we apply the Kafka CR after the Strimzi operator installation, we do it in the third phase (1). That’s not all. Since the CRD is created by the operator, it is not part of the sync process. By default, Argo CD tries to find the CRD during the sync dry run and fails with the error "the server could not find the requested resource". To avoid that, we skip the dry run for missing resource types (2) during the sync.

apiVersion: v1
kind: Namespace
metadata:
  name: kafka
  annotations:
    argocd.argoproj.io/sync-wave: "1"
---
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
  namespace: kafka
  annotations:
    argocd.argoproj.io/sync-wave: "3" # (1)
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true # (2)
spec:
  kafka:
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
      inter.broker.protocol.version: '3.2'
    storage:
      type: persistent-claim
      size: 5Gi
      deleteClaim: true
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    version: 3.2.3
    replicas: 3
  entityOperator:
    topicOperator: {}
    userOperator: {}
  zookeeper:
    storage:
      type: persistent-claim
      deleteClaim: true
      size: 2Gi
    replicas: 3

We can also install Knative Serving on our cluster, since we previously installed the Knative operator. As before, we set wave=3 and skip the dry run on missing resource types during the sync.

apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
  annotations:
    argocd.argoproj.io/sync-wave: "3"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec: {}
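Analogously, once Knative Serving is ready, application workloads could go into a later wave. Here’s a hypothetical Knative Service using the Knative sample image:

```yaml
# Hypothetical example: a Knative Service applied after KnativeServing is healthy
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
  namespace: default
  annotations:
    argocd.argoproj.io/sync-wave: "4"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```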

Finally, let’s create the Argo CD Application that manages all the defined manifests and automatically applies them to the Kubernetes cluster. We need to define the source Git repository and the directory containing our YAML manifests (global).

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster-config
spec:
  destination:
    server: 'https://kubernetes.default.svc'
  project: default
  source:
    path: global
    repoURL: 'https://github.com/piomin/openshift-cluster-config.git'
    targetRevision: HEAD
    helm:
      valueFiles:
        - values.yaml
  syncPolicy:
    automated:
      selfHeal: true

Helm Unit Testing

Just to ensure that we defined all the Helm templates properly, we can include some unit tests. We will use helm-unittest for that, placing the test sources inside the global/tests directory. Here’s our test, defined in the subscription_tests.yaml file:

suite: test main
values:
  - ./values/test.yaml
templates:
  - templates/subscriptions.yaml
chart:
  version: 1.0.0+test
  appVersion: 1.0
tests:
  - it: subscription default ns
    template: templates/subscriptions.yaml
    documentIndex: 0
    asserts:
      - equal:
          path: metadata.namespace
          value: openshift-operators
      - equal:
          path: metadata.name
          value: test1
      - equal:
          path: spec.channel
          value: ch1
      - equal:
          path: spec.source
          value: src1
      - isKind:
          of: Subscription
      - isAPIVersion:
          of: operators.coreos.com/v1alpha1
  - it: subscription custom ns
    template: templates/subscriptions.yaml
    documentIndex: 1
    asserts:
      - equal:
          path: metadata.namespace
          value: custom-ns
      - equal:
          path: metadata.name
          value: test2
      - equal:
          path: spec.channel
          value: ch2
      - equal:
          path: spec.source
          value: src2
      - isKind:
          of: Subscription
      - isAPIVersion:
          of: operators.coreos.com/v1alpha1
  - it: custom ns
    template: templates/subscriptions.yaml
    documentIndex: 2
    asserts:
      - equal:
          path: metadata.name
          value: custom-ns
      - isKind:
          of: Namespace
      - isAPIVersion:
          of: v1

We also need to define the test values (in the values/test.yaml file referenced by the suite):

subscriptions:
  - name: test1
    namespace: openshift-operators
    channel: ch1
    source: src1
  - name: test2
    namespace: custom-ns
    channel: ch2
    source: src2

We can also prepare a build process for our repository. Here’s a sample CircleCI configuration for that. If you are interested in more details about Helm unit testing and releasing, please refer to my other article.

version: 2.1

orbs:
  helm: circleci/helm@2.0.1

jobs:
  build:
    docker:
      - image: cimg/base:2023.04
    steps:
      - checkout
      - helm/install-helm-client
      - run:
          name: Install Helm unit-test
          command: helm plugin install https://github.com/helm-unittest/helm-unittest
      - run:
          name: Run unit tests
          command: helm unittest global

workflows:
  helm_test:
    jobs:
      - build

Synchronize Configuration with Argo CD

Once we create the new Argo CD Application responsible for synchronization, the process starts. In the first step, Argo CD creates the required namespaces. Then, it proceeds to the operator installation phase, which may take some time.

Once Argo CD installs all the Kubernetes operators, we can verify their health checks. Here’s the value of a health check during the installation phase.

[Image: kubernetes-operators-argocd-healthcheck]

Here’s the result after successful installation.

Now, Argo CD proceeds to the CR creation phase. It runs the Kafka cluster and enables Knative Serving. Let’s switch to the OpenShift cluster console and display the list of installed operators:

[Image: kubernetes-operators-argocd-operators]

We can also verify that the Kafka cluster is running in the kafka namespace.

Final Thoughts

With Argo CD we can manage the whole Kubernetes cluster configuration. Argo CD supports Helm charts, but there is another way of installing apps on Kubernetes: operators. I focused on the features and approach that allow us to install and manage operators in the GitOps way. I showed a practical example of how to use sync waves and how to apply CRs not managed directly by Argo CD. With all these mechanisms, we can easily handle Kubernetes operators with Argo CD.
