Native Java with GraalVM and Virtual Threads on Kubernetes
In this article, you will learn how to use virtual threads, build a native image with GraalVM, and run such a Java app on Kubernetes. Currently, native compilation (GraalVM) and virtual threads (Project Loom) are probably the hottest topics in the Java world. They improve the overall performance of your app, including memory usage and startup time. Since startup time and memory usage have always been a pain point for Java, expectations for native images and virtual threads are really high.
Of course, we usually consider such performance issues in the context of microservices or serverless apps. They should not consume many OS resources and should be easy to autoscale, and Kubernetes lets us control resource usage easily. If you are interested in Java virtual threads, you can read my previous article about using them to create an HTTP server, available here. For more details about Knative as a serverless platform on Kubernetes, you can refer to the following article.
Introduction
Let’s start with the plan for today’s exercise. In the first step, we will create a simple Java web app that uses virtual threads to process incoming HTTP requests. Before we run the sample app, we will install Knative on Kubernetes to quickly test autoscaling based on HTTP traffic. We will also install Prometheus on Kubernetes. This monitoring stack allows us to compare the performance of the app with and without GraalVM and virtual threads. Then, we can proceed with the deployment. In order to easily build and run our native app on Kubernetes, we will use Cloud Native Buildpacks. Finally, we will perform some load tests and compare the metrics.
Source Code
If you would like to try it yourself, you can always take a look at my source code. In order to do that, you need to clone my GitHub repository and then follow my instructions.
Create Java App with Virtual Threads
In the first step, we will create a simple Java app that acts as an HTTP server and handles incoming requests. In order to do that, we can use the HttpServer object from the core Java API. Once we create the server, we can override the default thread executor with the setExecutor method. In the end, we will compare the app using standard threads with the same app using virtual threads. Therefore, we allow overriding the type of executor with the THREAD_TYPE environment variable. If you want to enable virtual threads, you need to set its value to virtual. Here’s the main method of our app.
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.concurrent.Executors;

public class MainApp {

   public static void main(String[] args) throws IOException {
      HttpServer httpServer = HttpServer
         .create(new InetSocketAddress(8080), 0);
      httpServer.createContext("/example",
         new SimpleCPUConsumeHandler());
      // choose the executor based on the THREAD_TYPE env variable
      if (System.getenv("THREAD_TYPE").equals("virtual")) {
         httpServer.setExecutor(
            Executors.newVirtualThreadPerTaskExecutor());
      } else {
         httpServer.setExecutor(Executors.newFixedThreadPool(200));
      }
      httpServer.start();
   }
}
In order to process incoming requests, the HTTP server uses a handler that implements the HttpHandler interface. In our case, the handler is implemented inside the SimpleCPUConsumeHandler class shown below. It consumes a lot of CPU, since it creates an instance of BigInteger using a constructor that performs heavy computations under the hood (it generates a probable prime). That also takes some time, so it simulates the processing time in the same step. As a response, we simply return the next number in the sequence with the Hello_ prefix.
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;

import java.io.IOException;
import java.io.OutputStream;
import java.math.BigInteger;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

public class SimpleCPUConsumeHandler implements HttpHandler {

   Logger LOG = Logger.getLogger("handler");
   AtomicLong i = new AtomicLong();
   final Integer cpus = Runtime.getRuntime().availableProcessors();

   @Override
   public void handle(HttpExchange exchange) throws IOException {
      // CPU- and time-consuming operation: generate a probable prime
      new BigInteger(1000, 3, new Random());
      String response = "Hello_" + i.incrementAndGet();
      LOG.log(Level.INFO, "(CPU->{0}) {1}",
         new Object[] {cpus, response});
      exchange.sendResponseHeaders(200, response.length());
      OutputStream os = exchange.getResponseBody();
      os.write(response.getBytes());
      os.close();
   }
}
In order to use virtual threads in Java 19, we need to enable preview mode during compilation. With Maven, we enable preview features in the maven-compiler-plugin as shown below.
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <version>3.10.1</version>
  <configuration>
    <release>19</release>
    <compilerArgs>
      <arg>--enable-preview</arg>
    </compilerArgs>
  </configuration>
</plugin>
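Before moving to Kubernetes, you may want to run the app locally to verify that it starts and responds correctly. Here’s a minimal sketch of that step, assuming the build produces a runnable JAR (the artifact name below is just an example, check your target/ directory):
$ mvn clean package
# jar name is an example; adjust it to your build output
$ THREAD_TYPE=virtual java --enable-preview -jar target/sample-java-concurrency.jar
$ curl http://localhost:8080/example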
Install Knative on Kubernetes
This and the next step are not required to run the native application on Kubernetes. We will use Knative to easily autoscale the app in reaction to the volume of incoming traffic. In the next section, I’ll describe how to run a monitoring stack on Kubernetes.
The simplest way to install Knative on Kubernetes is with the kubectl command. We just need the Knative Serving component without any additional features. The Knative CLI (kn) is not required, since we will deploy the application from a YAML manifest using Skaffold.
First, let’s install the required custom resources with the following command:
$ kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.8.3/serving-crds.yaml
Then, we can install the core components of Knative Serving by running the command:
$ kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.8.3/serving-core.yaml
In order to access Knative services from outside the Kubernetes cluster, we also need to install a networking layer. We will use Kourier, a lightweight ingress maintained by the Knative project. We can install the Kourier controller by running the following command.
$ kubectl apply -f https://github.com/knative/net-kourier/releases/download/knative-v1.8.1/kourier.yaml
Finally, let’s configure Knative Serving to use Kourier with the following command:
$ kubectl patch configmap/config-network \
  --namespace knative-serving \
  --type merge \
  --patch '{"data":{"ingress-class":"kourier.ingress.networking.knative.dev"}}'
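At this point, it is worth making sure all Knative Serving components are up before moving on. A simple (optional) check is to list the pods in the knative-serving namespace and wait until they are all Running:
$ kubectl get pods -n knative-serving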
If you don’t have an external domain configured or you are running Knative on a local cluster, you need to configure DNS. Otherwise, you would have to run curl commands with a host header. Knative provides a Kubernetes Job that sets sslip.io as the default DNS suffix.
$ kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.8.3/serving-default-domain.yaml
The generated URL contains the name of the service, the namespace, and the address of your Kubernetes cluster. Since I’m running my service on a local Kubernetes cluster in the demo-sless namespace, it will be available at http://sample-java-concurrency.demo-sless.127.0.0.1.sslip.io.
But before we deploy the sample app on Knative, let’s prepare a few other things.
Install Prometheus Stack on Kubernetes
As I mentioned before, we can also install a monitoring stack on Kubernetes.
The simplest way to install it is with the kube-prometheus-stack Helm chart. The package contains Prometheus and Grafana. It also includes all the required rules and dashboards to visualize the basic metrics of your Kubernetes cluster. First, let’s add the Helm repository containing our chart:
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
Then we can install the kube-prometheus-stack Helm chart in the prometheus namespace with the following command:
$ helm install prometheus-stack prometheus-community/kube-prometheus-stack \
-n prometheus \
--create-namespace
If everything goes fine, you should see a similar list of Kubernetes services:
$ kubectl get svc -n prometheus
NAME                                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                       ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   11s
prometheus-operated                         ClusterIP   None             <none>        9090/TCP                     10s
prometheus-stack-grafana                    ClusterIP   10.96.218.142    <none>        80/TCP                       23s
prometheus-stack-kube-prom-alertmanager     ClusterIP   10.105.10.183    <none>        9093/TCP                     23s
prometheus-stack-kube-prom-operator         ClusterIP   10.98.190.230    <none>        443/TCP                      23s
prometheus-stack-kube-prom-prometheus       ClusterIP   10.111.158.146   <none>        9090/TCP                     23s
prometheus-stack-kube-state-metrics         ClusterIP   10.100.111.196   <none>        8080/TCP                     23s
prometheus-stack-prometheus-node-exporter   ClusterIP   10.102.39.238    <none>        9100/TCP                     23s
We will analyze Grafana dashboards with memory and CPU statistics. We can enable port-forward to access Grafana locally on a chosen port, for example 9080:
$ kubectl port-forward svc/prometheus-stack-grafana 9080:80 -n prometheus
The default username for Grafana is admin and the password is prom-operator.
Running environment
Personally, I’m using a local Kubernetes cluster on Docker Desktop for this exercise. It doesn’t provide any simplified way of running Prometheus or Knative. However, you can use any other Kubernetes distribution. For example, in OpenShift we can do it with a single click from the UI dashboard thanks to operator support.
We will create two panels in a custom Grafana dashboard. The first of them shows the memory usage per single pod in the demo-sless namespace:
sum(container_memory_working_set_bytes{namespace="demo-sless"} / (1024 * 1024)) by (pod)
The second one shows the average CPU usage per single pod in the same namespace:
rate(container_cpu_usage_seconds_total{namespace="demo-sless"}[3m])
You can import both of these panels directly into Grafana from the k8s/grafana-dasboards.json file in the GitHub repo.
Prometheus Staleness
By default, Prometheus keeps returning the last sample of a metric for 5 minutes if no new value arrives. For example, if a pod is killed, you will still see its memory and CPU usage metrics for up to 5 minutes. To change this behavior, set the value `prometheus.prometheusSpec.query.lookbackDelta` to e.g. `1m` during the kube-prometheus-stack chart installation.
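For example, assuming the chart exposes this setting under the value path mentioned above, it could be passed during installation like this:
$ helm install prometheus-stack prometheus-community/kube-prometheus-stack \
  -n prometheus \
  --create-namespace \
  --set prometheus.prometheusSpec.query.lookbackDelta=1m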
Build and Deploy a native Java Application
We have already created the sample app and configured the Kubernetes environment. Now, we may proceed to the deployment phase. Our goal here is to simplify the process of building a native image and running it on Kubernetes as much as possible. Therefore, we will use Cloud Native Buildpacks and Skaffold. With Buildpacks, we don’t need anything installed on our laptop besides Docker. Skaffold can be easily integrated with Buildpacks to automate the whole process of building and running the app on Kubernetes. You just need to install the skaffold CLI on your machine.
For building a native image of a Java application, we may use Paketo Buildpacks. It provides a dedicated buildpack for GraalVM called the Paketo GraalVM Buildpack. We should include it in the configuration under the paketo-buildpacks/graalvm name. Since Skaffold supports Buildpacks, we set all the properties inside the skaffold.yaml file. We need to override some default settings with environment variables. First of all, we have to set the Java version to 19 and enable preview features (virtual threads). The Kubernetes deployment manifest is available under the k8s/deployment.yaml path.
apiVersion: skaffold/v2beta29
kind: Config
metadata:
  name: sample-java-concurrency
build:
  artifacts:
    - image: piomin/sample-java-concurrency
      buildpacks:
        builder: paketobuildpacks/builder:base
        buildpacks:
          - paketo-buildpacks/graalvm
          - paketo-buildpacks/java-native-image
        env:
          - BP_NATIVE_IMAGE=true
          - BP_JVM_VERSION=19
          - BP_NATIVE_IMAGE_BUILD_ARGUMENTS=--enable-preview
  local:
    push: true
deploy:
  kubectl:
    manifests:
      - k8s/deployment.yaml
Knative simplifies not only autoscaling, but also Kubernetes manifests. Here’s the manifest for our sample app, available in the k8s/deployment.yaml file. We need to define a single Service object containing the details of the application container. We will change the autoscaling target from the default 200 concurrent requests to 80. It means that if a single instance of the app processes more than 80 requests simultaneously, Knative will create a new instance of the app (or a pod, to be more precise). In order to enable virtual threads for our app, we also need to set the THREAD_TYPE environment variable to virtual.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: sample-java-concurrency
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "80"
    spec:
      containers:
        - name: sample-java-concurrency
          image: piomin/sample-java-concurrency
          ports:
            - containerPort: 8080
          env:
            - name: THREAD_TYPE
              value: virtual
            - name: JAVA_TOOL_OPTIONS
              value: --enable-preview
Assuming you have already installed Skaffold, the only thing you need to do is run the following command:
$ skaffold run -n demo-sless
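Note that the demo-sless namespace has to exist before the deployment. If you haven’t created it yet, you can do that with:
$ kubectl create namespace demo-sless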
Or you can just deploy a ready image from my registry on Docker Hub. However, in that case, you need to change the image tag in the deployment.yaml manifest to virtual-native.
Once you deploy the app, you can verify the list of Knative Services. The name of our target service is sample-java-concurrency. The address of the service is returned in the URL field.
$ kn service list -n demo-sless
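You can then call the service to make sure it responds. Assuming the default sslip.io domain on a local cluster (as configured earlier), the request looks like this and should return the next Hello_ value from the counter:
$ curl http://sample-java-concurrency.demo-sless.127.0.0.1.sslip.io/example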
Run a non-native app
You can also build and deploy a non-native app from my repo using Skaffold and Paketo Buildpacks. Just use the paketo-buildpacks/java buildpack instead of paketo-buildpacks/graalvm in the Skaffold configuration file.
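For reference, here’s a sketch of how the build section of skaffold.yaml could look for the non-native (JVM) variant; it simply swaps the GraalVM-related buildpacks for the standard Java one (the exact settings in my repo may differ slightly):
build:
  artifacts:
    - image: piomin/sample-java-concurrency
      buildpacks:
        builder: paketobuildpacks/builder:base
        buildpacks:
          # sketch: standard JVM build instead of a native image
          - paketo-buildpacks/java
        env:
          - BP_JVM_VERSION=19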
Load Testing
We will run three test scenarios today. In the first of them, we will test standard compilation with a standard fixed thread pool (200 threads, as configured in the code). In the second, we will test standard compilation with virtual threads. The final test will check native compilation in conjunction with virtual threads. In all these scenarios, we will set the same autoscaling target: 80 concurrent requests. I’m using the k6 tool for load tests. Each test scenario consists of four steps of 2 minutes each. In the first step, we simulate 50 users.
$ k6 run -u 50 -d 120s k6-test.js
Then, we are simulating 100 users.
$ k6 run -u 100 -d 120s k6-test.js
Finally, we run the test for 200 users twice. So, in total, there are four tests with 50, 100, 200, and 200 users, which takes 8 minutes.
$ k6 run -u 200 -d 120s k6-test.js
Let’s verify the results. By the way, here is our test script for the k6 tool, written in JavaScript.
import http from 'k6/http';
import { check } from 'k6';

export default function () {
  const res = http.get(`http://sample-java-concurrency.demo-sless.127.0.0.1.sslip.io/example`);
  check(res, {
    'is status 200': (res) => res.status === 200,
    'body size is > 0': (r) => r.body.length > 0,
  });
}
Test for Standard Compilation and Threads
The diagram visible below shows memory usage in each phase of the test scenario. After we start simulating 200 users, Knative scales up the number of instances. Theoretically, it should do that already during the 100-user test, but Knative measures incoming traffic at the level of the sidecar container inside the pod. The memory usage for the first instance is around ~900 MB (this also includes the sidecar container).
Here’s a similar view, but for CPU usage. The highest consumption, at the level of ~1.2 cores, occurred before autoscaling kicked in. Then, depending on the number of instances, it ranges from ~0.4 to ~0.7 cores. As I mentioned before, we are using a time-consuming BigInteger constructor to simulate CPU usage under heavy load.
Here are the test results for 50 users. The application was able to process ~105k requests in 2 minutes. The highest processing time was ~3 seconds.
Here are the test results for 100 users. The application was able to process ~130k requests in 2 minutes with an average response time of ~90 ms.
Finally, we have the results of the 200-user test. The application was able to process ~135k requests in 2 minutes with an average response time of ~175 ms. The failure rate was at the level of 0.02%.
Test for Standard Compilation and Virtual Threads
The same as in the previous section, here’s the diagram that shows memory usage in each phase of the test scenario. After we start simulating 100 users, Knative scales up the number of instances. Theoretically, it should run a third instance of the app for 200 users. The memory usage for the first instance is around ~850 MB (this also includes the sidecar container).
Here’s a similar view, but for CPU usage. The highest consumption, ~1.1 cores, occurred before autoscaling kicked in. Then, depending on the number of instances, it ranges from ~0.3 to ~0.7 cores.
Here are the test results for 50 users. The application was able to process ~105k requests in 2 minutes. The highest processing time was ~2.2 seconds.
Here are the test results for 100 users. The application was able to process ~115k requests in 2 minutes with an average response time of ~100 ms.
Finally, we have the results of the 200-user test. The application was able to process ~135k requests in 2 minutes with an average response time of ~180 ms. The failure rate was at the level of 0.02%.
Test for Native Compilation and Virtual Threads
The same as in the previous section, here’s the diagram that shows memory usage in each phase of the test scenario. After we start simulating 100 users, Knative scales up the number of instances. Theoretically, it should run a third instance of the app for 200 users (the third pod visible on the diagram was in fact in the Terminating phase for some time). The memory usage for the first instance is around ~50 MB.
Here’s a similar view, but for CPU usage. The highest consumption, ~1.3 cores, occurred before autoscaling kicked in. Then, depending on the number of instances, it ranges from ~0.3 to ~0.9 cores.
Here are the test results for 50 users. The application was able to process ~75k requests in 2 minutes. The highest processing time was ~2 seconds.
Here are the test results for 100 users. The application was able to process ~85k requests in 2 minutes with an average response time of ~140 ms.
Finally, we have the results of the 200-user test. The application was able to process ~100k requests in 2 minutes with an average response time of ~240 ms. Moreover, there were no failures during the second 200-user attempt.
Summary
In this article, I tried to compare the behavior of a Java app using GraalVM native compilation and virtual threads on Kubernetes with the standard approach. There are several conclusions after running all the described tests: