Claude Code on OpenShift with vLLM and Dev Spaces
This article explains how to run Claude Code on OpenShift as a VSCode plugin and integrate it with AI models deployed on OpenShift using vLLM. vLLM supports the Anthropic Messages API, which Claude Code uses by default to communicate with Anthropic's servers. Claude Code can be installed in several different ways; the VSCode extension is the one most relevant to this article. You can run VSCode on OpenShift as a container using OpenShift Dev Spaces (based on the Eclipse Che community project). OpenShift, in turn, relies heavily on vLLM for running AI models. This article aims to provide a complete recipe for using OpenShift tools to configure a development environment that runs Claude Code and AI models on the same cluster.
Source Code
Feel free to use my source code if you'd like to try it out yourself. To do that, clone my sample GitHub repository and follow the instructions below. The repository contains several branches, each with an application generated from the same prompt using a different model. This article shows how to generate code using the gpt-oss model running on vLLM on OpenShift. So switch to the starting branch: dev.

The repository version located in the dev branch contains the necessary configuration for VSCode and Claude Code to work correctly in the OpenShift environment.
Prerequisites
For this exercise, you must have an AWS account with an OpenShift cluster created there, along with the resources and permissions required to create an OpenShift node with a GPU. Of course, you can repeat a very similar exercise on infrastructure other than AWS.
The following article explains how to install and configure OpenShift AI to run nodes with NVIDIA GPU support and how to deploy AI models on those nodes. In this exercise, I will not run the model on OpenShift AI, but simply use the vLLM server on a node with a GPU. If you want to automate the installation of the operators required to serve GPUs to AI models on OpenShift, just clone the following repository with Terraform scripts.
Enable GPU Support in OpenShift
The article mentioned above describes in detail the steps involved in adding a GPU node to OpenShift, so I will only briefly recap a few key points; a few details differ this time, though. We will run the gpt-oss model published by RedHatAI on Hugging Face. This model was post-trained with MXFP4 quantization, so it requires a suitable GPU to run properly. In my case, the g5.12xlarge machine type in AWS is enough. So, we should create a machine pool on OpenShift with at least one g5.12xlarge node.
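If you manage machines directly rather than through the console's machine-pool UI, the usual approach is to copy one of the existing worker MachineSet objects and change the instance type. Below is a heavily trimmed sketch, not a complete manifest — the name is a placeholder, and in practice you should copy a full worker MachineSet (including its selector and providerSpec) and edit only instanceType and replicas:

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: my-cluster-gpu-us-east-2a   # placeholder; derive from your infra ID
  namespace: openshift-machine-api
spec:
  replicas: 1
  template:
    spec:
      providerSpec:
        value:
          # The only field that really changes vs. a stock worker MachineSet:
          instanceType: g5.12xlarge
```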

Then, you must install and configure the NVIDIA GPU operator. Create the ClusterPolicy object using default values and verify its status.
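For reference, creating the ClusterPolicy from the console with defaults generates a large spec. Here is a trimmed sketch of the object — the component defaults are omitted and filled in by the operator, so the console path (or my Terraform script) is the reliable one:

```yaml
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy   # the name the console suggests
spec:
  # The console-generated spec enables the driver, container toolkit,
  # device plugin, and monitoring components; their defaults are
  # omitted here for brevity.
  driver:
    enabled: true
  toolkit:
    enabled: true
```

Once created, you can watch its status with `oc get clusterpolicy gpu-cluster-policy -o jsonpath='{.status.state}'` and wait until the state reports ready.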

After that, you must install the Node Feature Discovery operator and create the NodeFeatureDiscovery object. Once again, you can create it in the OpenShift console with the default values, or just use my Terraform script.
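A trimmed sketch of that object is shown below — the console pre-fills the operand image and worker configuration, so the defaults are fine as-is (openshift-nfd is the namespace the operator installation suggests):

```yaml
apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance       # default name suggested by the console
  namespace: openshift-nfd
spec: {}                   # console defaults omitted for brevity
```

Once NFD is running, nodes with NVIDIA hardware receive labels such as `feature.node.kubernetes.io/pci-10de.present: "true"` (10de is NVIDIA's PCI vendor ID), which is what the GPU operator uses to find them.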

Run Model on vLLM
You can use the vLLM server directly to run an AI model. It is pretty straightforward. I'm using the latest image from the Red Hat repository with NVIDIA GPU support: registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.3. It is important to use exactly this version or a newer one, because support for the Anthropic Messages API is a relatively new feature in vLLM (2). The g5.12xlarge machine provides 4 GPUs, so I request all of them for the pod (1). As I mentioned earlier, I use the RedHatAI/gpt-oss-20b model (3). For vLLM, it is also important to set the name under which the model is served, as we will use it later in API calls (4). Finally, don't forget to insert your Hugging Face token value (5).
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: ai
  name: gpt-oss-rhaiis
spec:
  selector:
    matchLabels:
      app: gpt-oss-rhaiis
  replicas: 1
  template:
    metadata:
      labels:
        app: gpt-oss-rhaiis
    spec:
      containers:
        - resources:
            limits:
              cpu: '16'
              memory: 30Gi
              nvidia.com/gpu: '4'
            requests:
              cpu: '1'
              memory: 10Gi
              nvidia.com/gpu: '4' # (1)
          name: vllm
          image: registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.3 # (2)
          command:
            - python
            - '-m'
            - vllm.entrypoints.openai.api_server
          args:
            - '--port=8000'
            - '--model=RedHatAI/gpt-oss-20b' # (3)
            - '--served-model-name=gpt-oss' # (4)
            - '--tensor-parallel-size=1'
            - '--enforce-eager'
          ports:
            - containerPort: 8000
              protocol: TCP
          env:
            - name: HF_HUB_OFFLINE
              value: '0'
            - name: HUGGING_FACE_HUB_TOKEN
              value: <YOUR_TOKEN_TO_HUGGING_FACE> # (5)

Let's create a Kubernetes Service for that model:
apiVersion: v1
kind: Service
metadata:
  name: gpt-oss-rhaiis
  namespace: ai
spec:
  selector:
    app: gpt-oss-rhaiis
  ports:
    - protocol: TCP
      port: 8000
      targetPort: 8000

The simplest way to expose the model API outside a cluster is via an OpenShift Route. However, we will access the model internally, from the container in which VSCode runs. So, just in case, here's the command that creates a Route for the gpt-oss-rhaiis Service.
oc expose svc/gpt-oss-rhaiis

Let's verify that our pod with the AI model is running. Note which node this pod is running on.
$ oc get pods -n ai -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP             NODE
gpt-oss-rhaiis-779d94b8fc-8jtgr   1/1     Running   0          24h   10.128.4.112   ip-10-0-20-154.us-east-2.compute.internal

Now, let's take a moment to look at the detailed description of our node. As you can see, the current request for the GPU (nvidia.com/gpu) is 4.
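Before wiring up Claude Code, it's worth smoke-testing the Anthropic-compatible endpoint directly. A sketch, assuming the Service above and the /v1/messages path exposed by vLLM's Anthropic frontend — the payload is built and validated locally first, so you can check it before sending:

```shell
# Build the kind of Anthropic Messages API payload Claude Code will send.
# "gpt-oss" must match the --served-model-name flag from the Deployment.
cat > /tmp/messages-request.json <<'EOF'
{
  "model": "gpt-oss",
  "max_tokens": 128,
  "messages": [
    {"role": "user", "content": "Reply with a single word: pong"}
  ]
}
EOF

# Sanity-check that the payload is well-formed JSON.
python3 -m json.tool /tmp/messages-request.json

# From any pod inside the cluster, POST it to the vLLM endpoint:
# curl -s http://gpt-oss-rhaiis.ai.svc.cluster.local:8000/v1/messages \
#      -H 'Content-Type: application/json' \
#      -d @/tmp/messages-request.json
```

A successful response confirms the model name and the Anthropic API support are configured correctly before any Claude Code settings come into play.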

Enable Claude Code in OpenShift Dev Spaces
Finally, we can move on to installing OpenShift Dev Spaces and configuring the Claude Code plugin in VSCode. First, find the right operator and install it as shown below. Then, create the devspaces project (namespace) and click the Red Hat OpenShift Dev Spaces instance Specification link when you are in this namespace.

Then click the Create CheCluster button. You can leave the default values everywhere except for the spec.components.pluginRegistry.openVSXURL field. It must contain the https://open-vsx.org address.
apiVersion: org.eclipse.che/v2
kind: CheCluster
metadata:
  name: devspaces
  namespace: ai
spec:
  components:
    pluginRegistry:
      openVSXURL: 'https://open-vsx.org'
  containerRegistry: {}
  devEnvironments: {}
  gitServices: {}
  networking: {}

Within a few minutes, Dev Spaces should be available on your cluster.

Now we can move on to configuring Claude Code. The entire configuration is available in our sample repository. We need to create two configuration files in the repository root: .vscode/extensions.json and .claude/settings.local.json. The extensions.json file contains a list of recommended extensions for VSCode. Interestingly, all recommended extensions are automatically installed in OpenShift Dev Spaces on startup 🙂 Therefore, we recommend the Claude Code extension there.
{
  "recommendations": [
    "Anthropic.claude-code"
  ]
}

The .claude/settings.local.json file specifies Claude Code configuration settings for the current repository. First of all, we must override the default Anthropic API server address with the internal OpenShift URL of our AI model Service. To do that, we use the ANTHROPIC_BASE_URL environment variable. Our model doesn't require an API key (it's the simplest demo installation), but we still need to set ANTHROPIC_API_KEY. By default, Claude Code tries to sign in to your Anthropic account. That sign-in is unnecessary here, and in Dev Spaces it forced me to log in over and over. Fortunately, we can skip it using the CLAUDE_CODE_SKIP_AUTH_LOGIN environment variable.
{
  "permissions": {
    "allow": [
      "Bash(mvn:*)"
    ]
  },
  "env": {
    "ANTHROPIC_BASE_URL": "http://gpt-oss-rhaiis.ai.svc.cluster.local:8000",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "gpt-oss",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "gpt-oss",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "gpt-oss",
    "ANTHROPIC_API_KEY": "dummy",
    "CLAUDE_CODE_SKIP_AUTH_LOGIN": "1"
  }
}

Use Claude Code with VSCode
Finally, we can run an OpenShift Dev Spaces instance with our sample codebase. Provide the address of the sample Git repository. Don’t forget you should use the dev branch in my repository.

After a few moments, Dev Spaces starts VSCode in the web browser with our sample repository source code and automatically installs the Claude Code plugin. Then you can just start using Claude to generate your source code. You can repeat the exact same exercise I described in my article about Claude Code on Ollama.

Below is a screenshot from the battlefield 🙂

Conclusion
Claude Code is having a moment right now. From OpenShift's perspective, what matters is that the entire development environment can be contained within the cluster using Red Hat products. With vLLM, we can run various AI models on OpenShift. In turn, we use Eclipse Che (OpenShift Dev Spaces) to install and configure an IDE for developers. Claude Code can be easily run and configured on top of those tools.
