Running Tekton Pipelines on Kubernetes at Scale

In this article, you will learn how to configure and run CI pipelines at scale on Kubernetes with Tekton. Tekton is a Kubernetes-native solution for building CI/CD pipelines. It provides a set of Kubernetes Custom Resource Definitions (CRDs) that allow us to define reusable building blocks for our pipelines. You can find several articles about Tekton on my blog. If you don’t have previous experience with the tool, you can read my introduction to CI/CD with Tekton and Argo CD to understand the basic concepts.

Today, we will consider performance issues related to running Tekton pipelines at scale. We will run several different pipelines at the same time, or the same pipeline several times simultaneously. This results in a long history of previous runs to maintain. In order to handle it successfully, Tekton provides a special module configured with the TektonResult CRD. It can also clean up selected resources using a Kubernetes CronJob.

Source Code

This time we won’t work much with source code. However, if you would like to try it out yourself, you can always take a look at my source code. In order to do that, you need to clone my GitHub repository. After that, you should follow my further instructions.

Install Tekton on Kubernetes

We can easily install Tekton on Kubernetes using the operator. We need to apply the following YAML manifest:

$ kubectl apply -f https://storage.googleapis.com/tekton-releases/operator/latest/release.yaml

After that, we can choose between several installation profiles: lite, basic, and all. Let’s choose the all profile:

$ kubectl apply -f https://raw.githubusercontent.com/tektoncd/operator/main/config/crs/kubernetes/config/all/operator_v1alpha1_config_cr.yaml
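
Once the operator reconciles that resource, we can verify the installation. A quick check, assuming the default targetNamespace (tekton-pipelines) on a vanilla Kubernetes cluster:

$ kubectl get tektonconfig config
$ kubectl get pods -n tekton-pipelines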

On OpenShift, we can do it using the web UI. The OpenShift Console provides the OperatorHub section, where we can find the “Red Hat OpenShift Pipelines” operator. This operator installs Tekton and integrates it with OpenShift. Once you install it, you can, e.g., create, manage, and run pipelines in the OpenShift Console.

[Image: tekton-kubernetes-operator]

The OpenShift Console offers a dedicated menu section for managing Tekton pipelines.

We can also install the tkn CLI on the local machine to interact with Tekton Pipelines running on the Kubernetes cluster. For example, on macOS, we can do it using Homebrew:

$ brew install tektoncd-cli
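
To confirm that the CLI is available, print its version:

$ tkn version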

How It Works

Create a Tekton Pipeline

Firstly, let’s discuss some basic concepts around Tekton. We can run the same pipeline several times simultaneously. We can trigger that process by creating the PipelineRun object directly, or indirectly, e.g. via the tkn CLI or the graphical dashboard. However, each time, the PipelineRun object must be created somehow. A Tekton pipeline consists of one or more tasks. Each task is executed in a separate pod. In order to share data between those pods, we need to use a persistent volume. An example of such data is the app source code cloned from the Git repository. We need to attach such a PVC (Persistent Volume Claim) as the pipeline workspace in the PipelineRun definition. The following diagram illustrates that scenario.

[Image: tekton-kubernetes-pipeline]

Let’s switch to the code. Here’s the YAML manifest with our sample pipeline. The pipeline consists of three tasks. It refers to the tasks from Tekton Hub: git-clone, s2i-java, and openshift-client. With these three simple tasks, we clone the Git repository with the app source code, build the image using the source-to-image approach, and deploy it on the OpenShift cluster. As you see, the pipeline defines a workspace named source-dir. Both git-clone and s2i-java share that workspace. The s2i-java task tags the image with the branch name, which is set as a pipeline input parameter.

apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: sample-pipeline
spec:
  params:
    - description: Git branch name
      name: branch
      type: string
    - description: Target namespace
      name: namespace
      type: string
  tasks:
    - name: git-clone
      params:
        - name: url
          value: 'https://github.com/piomin/sample-spring-kotlin-microservice.git'
        - name: revision
          value: $(params.branch)
      taskRef:
        kind: ClusterTask
        name: git-clone
      workspaces:
        - name: output
          workspace: source-dir
    - name: s2i-java
      params:
        - name: IMAGE
          value: image-registry.openshift-image-registry.svc:5000/$(params.namespace)/sample-spring-kotlin-microservice:$(params.branch)
      runAfter:
        - git-clone
      taskRef:
        kind: ClusterTask
        name: s2i-java
      workspaces:
        - name: source
          workspace: source-dir
    - name: openshift-client
      params:
        - name: SCRIPT
          value: oc process -f openshift/app.yaml -p namespace=$(params.namespace) -p version=$(params.branch) | oc apply -f -
      runAfter:
        - s2i-java
      taskRef:
        kind: ClusterTask
        name: openshift-client
      workspaces:
        - name: manifest-dir
          workspace: source-dir
  workspaces:
    - name: source-dir

Run a Pipeline Several Times Simultaneously

Now, let’s consider the scenario where we run the pipeline several times with the code from different Git branches. Here’s the updated diagram illustrating it. As you see, we need to attach a dedicated volume to each pipeline run. Each volume stores the code from one of the source branches.

[Image: tekton-kubernetes-pipeline-runs]

In order to start the pipeline, we can create the PipelineRun object. The PipelineRun definition must satisfy the previous requirement for a dedicated volume per run. Therefore, we need to define the volumeClaimTemplate, which automatically creates the volume and binds it to the pods within the pipeline. Here’s a sample PipelineRun object for the master branch:

apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: sample-pipeline-
  labels:
    tekton.dev/pipeline: sample-pipeline
spec:
  params:
    - name: branch
      value: master
    - name: namespace
      value: app-master
  pipelineRef:
    name: sample-pipeline
  taskRunTemplate:
    serviceAccountName: pipeline
  workspaces:
    - name: source-dir
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
          volumeMode: Filesystem
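
Note that the manifest uses generateName instead of a fixed name, so Kubernetes appends a unique suffix for every run. That also means we must use kubectl create rather than kubectl apply (the file name below is just an assumption):

$ kubectl create -f pipelinerun.yaml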

With the following bash script, we can run our pipeline for every branch in the source repository whose name starts with the feature prefix. It uses the tkn CLI to interact with Tekton.

#!/bin/bash

for OUTPUT in $(git branch -r)
do
  # strip the "origin/" prefix from the remote branch name
  branch=$(echo "$OUTPUT" | sed -e "s/^origin\///")
  if [[ $branch == feature* ]]
  then
    echo "Running the pipeline: branch=$branch"
    tkn pipeline start sample-pipeline -p branch="$branch" -p namespace="app-$branch" -w name=source-dir,volumeClaimTemplateFile=pvc.yaml
  fi
done
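
The volumeClaimTemplateFile option points to a file with a PersistentVolumeClaim template, which tkn uses to create a dedicated workspace volume for each run. A minimal sketch of what pvc.yaml may contain, mirroring the volumeClaimTemplate from the PipelineRun manifest above:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi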

The script is available in the sample GitHub repository under the openshift directory. If you want to reproduce these steps, you need to clone the repository and then execute the run.sh script against your OpenShift cluster.

$ git clone https://github.com/piomin/sample-spring-kotlin-microservice.git
$ cd sample-spring-kotlin-microservice/openshift
$ ./run.sh
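
Once the script triggers all the runs, we can track their progress with the tkn CLI:

$ tkn pipelinerun list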

The PipelineRun object is responsible not only for starting a pipeline. We can also use it to browse the history of runs, with the detailed logs generated by each task. However, there is also the other side of the coin: the more times we run the pipeline, the more objects we store on the Kubernetes cluster.

[Image: tekton-kubernetes-openshift-pipelines]

Tekton creates a dedicated PVC for each PipelineRun. Such a PVC exists on Kubernetes until we delete the parent PipelineRun.
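
Since a PVC created from a volumeClaimTemplate carries an owner reference to its PipelineRun, deleting the run is enough to garbage-collect the volume as well:

$ kubectl delete pipelinerun sample-pipeline-run-2m4rq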

Pruning Old Pipeline Runs

I just ran the sample-pipeline six times using different feature-* branches. However, you can imagine many more previous runs. It results in many PipelineRun and PersistentVolumeClaim objects existing on Kubernetes. Fortunately, Tekton provides an automatic mechanism for removing objects left over from previous runs. It installs a global CronJob responsible for pruning the PipelineRun objects. We can override the default CronJob configuration in the TektonConfig CRD. I’ll change the CronJob execution frequency from one day to 10 minutes for testing purposes.

apiVersion: operator.tekton.dev/v1alpha1
kind: TektonConfig
metadata:
  name: config
spec:
  # other properties ...
  pruner:
    disabled: false
    keep: 100
    resources:
      - pipelinerun
    schedule: '*/10 * * * *'
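
Instead of editing the whole TektonConfig manifest, we can also patch just the pruner section in place. A sketch using a merge patch against the default cluster-scoped instance named config:

$ kubectl patch tektonconfig config --type merge \
    -p '{"spec":{"pruner":{"keep":100,"schedule":"*/10 * * * *"}}}'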

We can customize the behavior of the Tekton pruner per namespace. Thanks to that, it is possible to set different configurations, e.g. for the “production” and the “development” pipelines. In order to do that, we need to annotate the namespace with some Tekton parameters. For example, instead of keeping a specific number of previous pipeline runs, we can set a time criterion. The operator.tekton.dev/prune.keep-since annotation allows us to retain resources based on their age. Let’s set it to 1 hour. The annotation requires that time in minutes, so the value is 60. We will also override the default pruning strategy with keep-since, which enables removal by age.

kind: Namespace
apiVersion: v1
metadata:
  name: tekton-demo
  annotations:
    operator.tekton.dev/prune.keep-since: "60"
    operator.tekton.dev/prune.strategy: "keep-since"
spec: {}
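
Alternatively, we can add the same annotations to an existing namespace directly from the command line:

$ kubectl annotate namespace tekton-demo \
    operator.tekton.dev/prune.keep-since=60 \
    operator.tekton.dev/prune.strategy=keep-since \
    --overwrite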

The CronJob exists in the Tekton operator installation namespace.

$ kubectl get cj -n openshift-pipelines
NAME                           SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
tekton-resource-pruner-ksdkj   */10 * * * *   False     0        9m44s           24m

As you see, the job runs every ten minutes.

$ kubectl get job -n openshift-pipelines
NAME                                    COMPLETIONS   DURATION   AGE
tekton-resource-pruner-ksdkj-28524850   1/1           5s         11m
tekton-resource-pruner-ksdkj-28524860   1/1           5s         75s

There are no PipelineRun objects older than 1 hour in the tekton-demo namespace.

$ kubectl get pipelinerun -n tekton-demo
NAME                        SUCCEEDED   REASON      STARTTIME   COMPLETIONTIME
sample-pipeline-run-2m4rq   True        Succeeded   55m         51m
sample-pipeline-run-4gjqw   True        Succeeded   55m         53m
sample-pipeline-run-5sxcf   True        Succeeded   55m         51m
sample-pipeline-run-667mb   True        Succeeded   34m         30m
sample-pipeline-run-6jqvl   True        Succeeded   34m         32m
sample-pipeline-run-8slfx   True        Succeeded   34m         31m
sample-pipeline-run-bvjq6   True        Succeeded   34m         30m
sample-pipeline-run-d87kn   True        Succeeded   55m         51m
sample-pipeline-run-lrvm2   True        Succeeded   34m         30m
sample-pipeline-run-tx4hl   True        Succeeded   55m         51m
sample-pipeline-run-w5cq8   True        Succeeded   55m         52m
sample-pipeline-run-wn2xx   True        Succeeded   34m         30m

This approach works fine. It minimizes the number of Kubernetes objects stored on the cluster. However, after removing the old objects, we can no longer access the full history of pipeline runs. In some cases, that history can be useful. Can we do it better? Yes! We can enable Tekton Results.

Using Tekton Results

Install and Configure Tekton Results

Tekton Results is a feature that allows us to archive the complete information for every pipeline run and task run. After pruning the old PipelineRun or TaskRun objects, we can still access the full history through the Tekton Results API. It archives all the required information in the form of results and records stored in a database. Before we enable it, we need to prepare several things. In the first step, we need to generate a certificate for exposing the Tekton Results REST API over HTTPS. Let’s generate a public/private key pair with the following openssl command:

$ openssl req -x509 \
    -newkey rsa:4096 \
    -keyout key.pem \
    -out cert.pem \
    -days 365 \
    -nodes \
    -subj "/CN=tekton-results-api-service.openshift-pipelines.svc.cluster.local" \
    -addext "subjectAltName = DNS:tekton-results-api-service.openshift-pipelines.svc.cluster.local"

Then, we can use the key.pem and cert.pem files to create the Kubernetes TLS Secret in the Tekton operator namespace.

$ kubectl create secret tls tekton-results-tls \
    -n openshift-pipelines \
    --cert=cert.pem \
    --key=key.pem

We also need to generate credentials for the Postgres database in the form of a Kubernetes Secret. By default, Tekton Results uses a PostgreSQL database to store data. We can choose between an external instance of that database and an instance managed by the Tekton operator. We will use the internal Postgres installed on our cluster.

$ kubectl create secret generic tekton-results-postgres \
    -n openshift-pipelines \
    --from-literal=POSTGRES_USER=result \
    --from-literal=POSTGRES_PASSWORD=$(openssl rand -base64 20)

Tekton Results also requires a persistent volume for storing the logs from pipeline runs.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tekton-logs
  namespace: openshift-pipelines 
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Finally, we can proceed to the main step. We need to create the TektonResult object. I won’t get into the details of that object. You can just create it “as is” on your cluster.

apiVersion: operator.tekton.dev/v1alpha1
kind: TektonResult
metadata:
  name: result
spec:
  targetNamespace: openshift-pipelines
  logs_api: true
  log_level: debug
  db_port: 5432
  db_host: tekton-results-postgres-service.openshift-pipelines.svc.cluster.local
  logs_path: /logs
  logs_type: File
  logs_buffer_size: 32768
  auth_disable: true
  tls_hostname_override: tekton-results-api-service.openshift-pipelines.svc.cluster.local
  db_enable_auto_migration: true
  server_port: 8080
  prometheus_port: 9090
  logging_pvc_name: tekton-logs

Archive Pipeline Runs with Tekton Results

After applying the TektonResult object to the cluster, Tekton runs three additional pods in the openshift-pipelines namespace: a pod with the Postgres database, a pod with the Tekton Results API, and a watcher responsible for monitoring and archiving existing PipelineRun objects.
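
We can quickly verify that those pods are up (the exact names will differ between installations):

$ kubectl get pods -n openshift-pipelines | grep tekton-results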

If you run Tekton on OpenShift, you will also see an additional “Overview” menu in the “Pipelines” section. It displays a summary of pipeline runs for the selected namespace.

[Image: tekton-kubernetes-overview]

However, the best thing about this mechanism is that we can still access the old pipeline runs through Tekton Results even though the PipelineRun objects have been deleted. Tekton Results integrates smoothly with the OpenShift Console. An archived pipeline run is marked with a special icon. We can still access the logs and the results of every single task in that pipeline.

If we now query the cluster for PipelineRun objects, we get back just a single one: the run currently in progress. All the older runs exceeded the one-hour retention and were therefore removed by the pruner.

$ kubectl get pipelinerun
NAME                     SUCCEEDED   REASON      STARTTIME   COMPLETIONTIME
sample-pipeline-yiuqhf   Unknown     Running     30s

Consequently, there is also a single PersistentVolumeClaim object.

$ kubectl get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
pvc-0f16a64031   Bound    pvc-ba6ea9ef-4281-4a39-983b-0379419076b0   1Gi        RWO            ocs-external-storagecluster-ceph-rbd   41s
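
Since we disabled authentication in the TektonResult spec (auth_disable: true), we can call the Results REST API directly to list the archived runs. A sketch that port-forwards the API service locally; the -k flag skips verification of our self-signed certificate:

$ kubectl port-forward -n openshift-pipelines \
    service/tekton-results-api-service 8080:8080
$ curl -k https://localhost:8080/apis/results.tekton.dev/v1alpha2/parents/tekton-demo/results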

Of course, we can still access the details and logs of archived pipeline runs via the OpenShift Console.

Final Thoughts

Tekton is a Kubernetes-native tool for CI/CD pipelines. This approach brings many advantages, but it may also lead to some challenges. One of them is running pipelines at scale. In this article, I focused on showing you the Tekton features that address some concerns around intensive pipeline usage. Features like pipeline run pruning or the Tekton Results archive work fine and integrate smoothly with, e.g., the OpenShift Console. Tekton gradually adds new useful features. It is becoming a really interesting alternative to more popular CI/CD tools like Jenkins, GitLab CI, or CircleCI.
