Claude Code on OpenShift with vLLM and Dev Spaces

This article explains how to run Claude Code on OpenShift as a VSCode plugin and then integrate it with AI models deployed on OpenShift using vLLM. vLLM supports the Anthropic Messages API, which Claude Code uses by default to communicate with Anthropic’s servers. Claude Code can be installed in several different ways; the VSCode extension is the one most relevant to this article. You can run VSCode on OpenShift as a container using OpenShift Dev Spaces (based on the Eclipse Che community project). OpenShift, in turn, relies heavily on vLLM for running AI models. This article aims to provide a complete recipe for using OpenShift tools to configure a development environment in which Claude Code and the AI models it uses run on the same cluster.
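To make that integration concrete, here is a hedged sketch of the request shape involved: Claude Code sends Anthropic Messages API requests (`POST /v1/messages`), and vLLM can serve that same endpoint. The Service URL and the `gpt-oss` model name below come from the deployment described later in this article.

```shell
# Build a minimal Anthropic Messages API request body. The model name
# "gpt-oss" matches the --served-model-name flag we pass to vLLM later on.
cat <<'EOF' > /tmp/messages-request.json
{"model": "gpt-oss", "max_tokens": 64,
 "messages": [{"role": "user", "content": "Hello"}]}
EOF
cat /tmp/messages-request.json

# Once the model is running, POST the body to the in-cluster Service URL:
# curl -s http://gpt-oss-rhaiis.ai.svc.cluster.local:8000/v1/messages \
#   -H 'content-type: application/json' -d @/tmp/messages-request.json
```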

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, clone my sample GitHub repository and then follow the instructions below. The repository contains several branches, each with an application generated from the same prompt using a different model. This article shows how to generate code using the gpt-oss model running on vLLM on OpenShift, so switch to the starting branch: dev.

claude-code-openshift-repo

The repository version located in the dev branch contains the necessary configuration for VSCode and Claude Code to work correctly in the OpenShift environment.

Prerequisites

For this exercise, you need an AWS account with an OpenShift cluster created in it. You must also have the resources and permissions required to create an OpenShift node with a GPU. Of course, you can repeat a very similar exercise on infrastructure other than AWS.

The following article explains how to install and configure OpenShift AI to run nodes with NVIDIA GPU support and how to deploy AI models on those nodes. In this exercise, I will not run the model through OpenShift AI, but simply use the vLLM server on a node with a GPU. If you want to automate the installation of the operators required to serve GPUs to AI models on OpenShift, just clone the following repository with Terraform scripts.

Enable GPU Support in OpenShift

The article mentioned above describes in detail the steps involved in adding a GPU node to OpenShift, so I will only briefly mention a few key points here, along with a few details that need updating. We will run the gpt-oss model published by RedHatAI on Hugging Face. This model was post-trained with MXFP4 quantization, so it requires a suitable GPU to run properly. In my case, the g5.12xlarge machine type in AWS is enough. So, we should create a machine pool on OpenShift with at least one g5.12xlarge node.
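For example, on a ROSA cluster the machine pool can be created with the rosa CLI (a sketch; the cluster name is a placeholder, and on a self-managed cluster you would instead create a MachineSet with the same instance type):

```shell
# Add one GPU worker node backed by a g5.12xlarge instance
rosa create machinepool --cluster <your-cluster-name> \
  --name gpu-pool \
  --replicas 1 \
  --instance-type g5.12xlarge
```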

Then, you must install and configure the NVIDIA GPU operator. Create the ClusterPolicy object using default values and verify its status.
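Assuming the name the console wizard proposes by default, you can check the ClusterPolicy status with oc; the state field should eventually report ready once all GPU operator components are deployed:

```shell
# ClusterPolicy is cluster-scoped; adjust the name if you changed it
oc get clusterpolicy gpu-cluster-policy \
  -o jsonpath='{.status.state}'
```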

After that, you must install the Node Feature Discovery operator and create the NodeFeatureDiscovery object. Once again, you can create it in the OpenShift console with the default values, or just use my Terraform script.
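If you prefer YAML over the console wizard, a minimal NodeFeatureDiscovery object relying on operator defaults looks roughly like this (a sketch; some operator versions require the operand image to be set explicitly, which the console wizard fills in for you):

```yaml
apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance
  namespace: openshift-nfd
spec: {}
```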

Run Model on vLLM

You can use the vLLM server directly to run an AI model. It is pretty straightforward. I’m using the latest image from the Red Hat repository with NVIDIA GPU support: registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.3. It is important to use exactly this version or a newer one, because support for the Anthropic Messages API is a relatively new feature in vLLM (2). The g5.12xlarge machine provides 4 GPUs, so I will use all available resources for the best possible performance (1). As I mentioned earlier, I use the RedHatAI/gpt-oss-20b model (3). It is also important to set the name under which vLLM serves the model, as we will use it later in API calls (4). Finally, don’t forget to insert your Hugging Face token value (5).

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: ai
  name: gpt-oss-rhaiis
spec:
  selector:
    matchLabels:
      app: gpt-oss-rhaiis
  replicas: 1
  template:
    metadata:
      labels:
        app: gpt-oss-rhaiis
    spec:
      containers:
        - resources:
            limits:
              cpu: '16'
              memory: 30Gi
              nvidia.com/gpu: '4'
            requests:
              cpu: '1'
              memory: 10Gi
              nvidia.com/gpu: '4' # (1)
          name: vllm
          image: registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.3 # (2)
          command:
            - python
            - '-m'
            - vllm.entrypoints.openai.api_server
          args:
            - '--port=8000'
            - '--model=RedHatAI/gpt-oss-20b' # (3)
            - '--served-model-name=gpt-oss' # (4)
            - '--tensor-parallel-size=4'
            - '--enforce-eager'
          ports:
            - containerPort: 8000
              protocol: TCP
          env:
            - name: HF_HUB_OFFLINE
              value: '0'
            - name: HUGGING_FACE_HUB_TOKEN
              value: <YOUR_TOKEN_TO_HUGGING_FACE> # (5)
YAML

Let’s create a Kubernetes Service for that model:

apiVersion: v1
kind: Service
metadata:
  name: gpt-oss-rhaiis
  namespace: ai
spec:
  selector:
    app: gpt-oss-rhaiis
  ports:
    - protocol: TCP
      port: 8000
      targetPort: 8000
YAML

The simplest way to expose the model API outside a cluster is via OpenShift Route. However, we will access the model internally, from a container in which VSCode will be running. So, just in case, here’s the command that creates a Route for the gpt-oss-rhaiis Service.

oc expose svc/gpt-oss-rhaiis
ShellSession
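Once the pod is ready, a quick sanity check is to list the served models (shown here against the in-cluster Service address; via the Route, replace the host with the Route URL). vLLM also exposes the OpenAI-compatible API, so the response should contain the name set with --served-model-name:

```shell
# Expect a model entry named "gpt-oss" in the JSON response
curl -s http://gpt-oss-rhaiis.ai.svc.cluster.local:8000/v1/models
```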

Let’s verify if our pod with the AI model is running. Note which node this pod is running on.

$ oc get pods -n ai -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP             NODE
gpt-oss-rhaiis-779d94b8fc-8jtgr   1/1     Running   0          24h   10.128.4.112   ip-10-0-20-154.us-east-2.compute.internal
ShellSession

Now, let’s take a moment to look at the detailed description of our node. As you can see, the current request for the GPU (nvidia.com/gpu) is 4.
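The check above can be done with oc describe, using the node name from the pod listing (the "Allocated resources" section shows the current nvidia.com/gpu requests):

```shell
oc describe node ip-10-0-20-154.us-east-2.compute.internal \
  | grep -A 8 'Allocated resources'
```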

Enable Claude Code in OpenShift Dev Spaces

Finally, we can move on to installing OpenShift Dev Spaces and configuring the Claude Code plugin in VSCode. First, find the right operator and install it as shown below. Then, create the devspaces project (namespace) and, while in that namespace, click the Red Hat OpenShift Dev Spaces instance Specification link.

Then click the Create CheCluster button. You can leave the default values everywhere except for the spec.components.pluginRegistry.openVSXURL field. It must contain the https://open-vsx.org address.

apiVersion: org.eclipse.che/v2
kind: CheCluster
metadata:
  name: devspaces
  namespace: devspaces
spec:
  components:
    pluginRegistry:
      openVSXURL: 'https://open-vsx.org'
  containerRegistry: {}
  devEnvironments: {}
  gitServices: {}
  networking: {}
YAML

Within a few minutes, Dev Spaces should be available on your cluster.
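You can verify the rollout with oc (adjust the namespace if you created the CheCluster elsewhere); the status also exposes the dashboard URL:

```shell
oc get checluster devspaces -n devspaces \
  -o jsonpath='{.status.chePhase} {.status.cheURL}'
```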

Now we can move on to configuring Claude Code. The entire configuration is available in our sample repository. We need to create two configuration files in the repository root: .vscode/extensions.json and .claude/settings.local.json. The extensions.json file contains a list of recommended extensions for VSCode. Interestingly, all recommended extensions are automatically installed in OpenShift Dev Spaces on startup 🙂 Therefore, we simply list the Claude Code extension there.

{
  "recommendations": [
    "Anthropic.claude-code"
  ]
}
.vscode/extensions.json

The .claude/settings.local.json file specifies Claude Code settings for the current repository. First of all, we must override the default Anthropic API server address with the internal OpenShift URL of our AI model Service. To do that, we use the ANTHROPIC_BASE_URL environment variable. Our model doesn’t require an API key (this is the simplest demo installation), but we still need to set ANTHROPIC_API_KEY. By default, Claude Code tries to sign you in to your Anthropic account. That is unnecessary here and, in Dev Spaces, it resulted in an endless login loop. Fortunately, we can skip it using the CLAUDE_CODE_SKIP_AUTH_LOGIN environment variable.

{
  "permissions": {
    "allow": [
      "Bash(mvn:*)"
    ]
  },
  "env": {
    "ANTHROPIC_BASE_URL": "http://gpt-oss-rhaiis.ai.svc.cluster.local:8000",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "gpt-oss",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "gpt-oss",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "gpt-oss",
    "ANTHROPIC_API_KEY": "dummy",
    "CLAUDE_CODE_SKIP_AUTH_LOGIN": "1"
  }
}
.claude/settings.local.json

Use Claude Code with VSCode

Finally, we can run an OpenShift Dev Spaces instance with our sample codebase. Provide the address of the sample Git repository, and don’t forget to use the dev branch of my repository.

claude-code-openshift-dev-spaces

After a few moments, Dev Spaces starts VSCode in the web browser with our sample repository’s source code and automatically installs the Claude Code plugin. Then you can just start using Claude Code to generate your source code. You can repeat the exact same exercise I described in my article about Claude Code on Ollama.

claude-code-openshift-ide

Below is a screenshot from the battlefield 🙂

claude-code-openshift-ide-vscode

Conclusion

Claude Code is currently having its moment. From OpenShift’s perspective, it is important that, in this case, the entire development environment can be contained within a Red Hat cluster and Red Hat products. With vLLM, we can run various AI models on OpenShift. In turn, we use Eclipse Che (OpenShift Dev Spaces) to install and configure an IDE for developers. Claude Code can be easily run and configured on top of those tools.
