Java 21 Virtual Threads in Practice

Thread handling has always sat at the center of Java's platform tradeoffs. Platform threads are easy to understand because they map directly to operating-system threads. That directness made Java simple to reason about for years, but it also imposed a fixed cost every time a request blocked on I/O. When a thread waits on a database, a remote API, or a file system call, it still occupies memory and scheduler attention even though it is not doing useful work.

As systems became more network-heavy, that cost pushed developers toward reactive frameworks, callback chains, and asynchronous orchestration. Those tools are powerful, but they also move complexity into the application model. Java 21 virtual threads are best understood as an attempt to keep the simple coding style while removing the scarcity that made blocking code expensive.

Background

The threading model Java grew up with

For a long time, the default Java concurrency story was the thread-per-request model: create a platform thread, do the work, block when needed, and let the operating system schedule it. That model was never elegant at massive scale, but it was practical and familiar. The main pain points were straightforward. More threads meant more memory, more scheduling overhead, and more time spent in context switches rather than in application work.

In practice, the problem was not raw thread count alone. The real issue was blocked time. A server could hold dozens or hundreds of threads in a waiting state while upstream services responded. Once the pool filled, the queue grew, latency rose, and throughput hit a ceiling. That is why the industry built compensating systems around the JVM: asynchronous clients, reactive pipelines, message queues, and elaborate executor tuning.

Java 21

What virtual threads actually change

Virtual threads come from Project Loom and became a final feature in Java 21 (JEP 444) after two preview rounds in Java 19 and 20. The key idea is simple: the JVM, not the operating system, schedules these lightweight threads. When a virtual thread blocks, it does not monopolize its carrier thread; the runtime parks it, frees the carrier for other work, and resumes the virtual thread later.
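As a minimal sketch of that idea, the snippet below starts many virtual threads that each block in Thread.sleep and waits for all of them. The class and method names (VirtualThreadBasics, runBlockingTasks) are hypothetical and chosen only for illustration; Thread.ofVirtual(), Thread.sleep(Duration), and isVirtual() are standard Java 21 APIs.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class VirtualThreadBasics {

    // Starts `count` virtual threads that each block for `sleepMillis`,
    // joins them all, and returns how many of them reported isVirtual().
    static int runBlockingTasks(int count, long sleepMillis) {
        AtomicInteger virtualCount = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread t = Thread.ofVirtual().name("demo-" + i).start(() -> {
                try {
                    // Blocking here parks the virtual thread; the carrier is free to run others.
                    Thread.sleep(Duration.ofMillis(sleepMillis));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                if (Thread.currentThread().isVirtual()) {
                    virtualCount.incrementAndGet();
                }
            });
            threads.add(t);
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return virtualCount.get();
    }

    public static void main(String[] args) {
        // 10,000 concurrently sleeping threads would be costly as platform threads.
        System.out.println(runBlockingTasks(10_000, 100) + " virtual threads completed");
    }
}
```

The code is ordinary blocking Java; only the way the thread is created differs from the platform-thread version.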

In day-to-day code, that means you can keep writing direct, blocking Java. The API remains familiar: the same Thread abstraction, the same imperative style, and the same request-oriented code paths. In exchange, the runtime handles scheduling in a way that makes far higher concurrency practical for I/O-bound work. Virtual threads are not magic, though. They are still sensitive to pinned sections, native blocking, and external bottlenecks. They also do not make CPU-bound work faster. They simply make waiting cheaper.

They let many more blocked tasks coexist without a huge thread pool.
They reduce the pressure to rewrite blocking code into callbacks.
They still depend on carrier threads underneath, so the runtime is not free of limits.
They shine most when the workload is mostly waiting on external I/O.
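The point about pinned sections can be illustrated with a small sketch. In Java 21, a virtual thread that blocks inside a synchronized block stays pinned to its carrier, and JEP 444 suggests guarding long blocking operations with java.util.concurrent.locks.ReentrantLock instead (later JDK releases relax this restriction). The class GuardedResource and its methods are hypothetical names, and Thread.sleep stands in for real blocking I/O.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

class GuardedResource {
    private final ReentrantLock lock = new ReentrantLock();
    private int calls;

    // Blocking while holding a ReentrantLock lets the virtual thread unmount.
    // In Java 21, the same body inside a synchronized block would pin the carrier.
    int slowCall(long sleepMillis) {
        lock.lock();
        try {
            Thread.sleep(sleepMillis); // stands in for a blocking I/O call
            return ++calls;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return calls;
        } finally {
            lock.unlock();
        }
    }

    // Runs `tasks` guarded calls on virtual threads and returns the final call count.
    static int demo(int tasks, long sleepMillis) {
        GuardedResource resource = new GuardedResource();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.execute(() -> resource.slowCall(sleepMillis));
            }
        } // close() waits for all submitted tasks to finish
        return resource.calls;
    }

    public static void main(String[] args) {
        System.out.println("guarded calls completed: " + demo(20, 5));
    }
}
```

On Java 21, running with -Djdk.tracePinnedThreads=full is one way to find places where a virtual thread blocks while pinned.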

Benchmark

Why this comparison matters

To make the difference visible, I used a benchmark built around a blocking REST path rather than a synthetic micro task. The idea is simple: keep the workload close to the kind of service that spends most of its time waiting on something else, then increase concurrency until the threading model starts to show its limits.

In practice, that means comparing how classic platform threads and virtual threads behave when the same request path is under load: not just at a comfortable rate, but under enough pressure to expose the point where queueing, latency, and failures begin to diverge.

Method

How the benchmark was built

The benchmark was built as a three-JVM setup so the client, the proxy server, and the slow dependency stay isolated. That matters because it avoids mixing load generation and server execution in the same process. The downstream server is deliberately simple: it simulates 200 ms of latency by sleeping before returning a response. The server under test then proxies that request. One version uses a fixed pool of 200 platform threads. The other uses Executors.newVirtualThreadPerTaskExecutor().
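The difference between the two executor choices can be sketched in a single process. This is not the benchmark's server code: it replaces the HTTP proxy call with Thread.sleep, and the class ExecutorComparison and method elapsedMillis are hypothetical names used only for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ExecutorComparison {

    // Submits `tasks` jobs that each block for `delayMillis` and returns
    // the wall-clock time in milliseconds until all of them have finished.
    static long elapsedMillis(ExecutorService executor, int tasks, long delayMillis) {
        long start = System.nanoTime();
        try (executor) {
            for (int i = 0; i < tasks; i++) {
                executor.execute(() -> {
                    try {
                        Thread.sleep(delayMillis); // stands in for the slow downstream call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } // close() waits for all submitted tasks to finish
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long classicMs = elapsedMillis(Executors.newFixedThreadPool(200), 1_000, 200);
        long virtualMs = elapsedMillis(Executors.newVirtualThreadPerTaskExecutor(), 1_000, 200);
        System.out.println("fixed pool of 200 threads: " + classicMs + " ms");
        System.out.println("virtual thread per task:   " + virtualMs + " ms");
    }
}
```

With 1,000 tasks and a 200-thread pool, the fixed pool has to work through the tasks in about five waves, while the virtual-thread executor can let all of them wait at once.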

The benchmark client keeps a constant number of workers active for 20 seconds and repeatedly sends HTTP requests to the proxy. Each worker measures request latency around the full client call, so the results include the proxy, the downstream delay, and any queueing or timeout behavior that appears along the way. The exported CSV records requests, successes, failures, throughput, and latency percentiles from the same run.
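A rough sketch of such a client loop follows. It is an assumption about the shape of the client, not the benchmark's actual source: the class LoadSketch, the Result record, and the use of virtual threads for the workers are all hypothetical, the request is passed in as a plain Runnable, and the p95 uses the nearest-rank method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

class LoadSketch {
    record Result(long successes, long failures, double throughputPerSec, long p95Millis) {}

    // Nearest-rank percentile over an ascending list of latencies.
    static long percentile(List<Long> sorted, double p) {
        if (sorted.isEmpty()) {
            return 0;
        }
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return sorted.get(Math.max(0, rank - 1));
    }

    // Keeps `workers` loops busy until the deadline; each loop times one full call.
    static Result run(int workers, long durationMillis, Runnable call) {
        ConcurrentLinkedQueue<Long> latencies = new ConcurrentLinkedQueue<>();
        AtomicLong failures = new AtomicLong();
        long deadline = System.nanoTime() + durationMillis * 1_000_000;
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int w = 0; w < workers; w++) {
                executor.execute(() -> {
                    while (System.nanoTime() < deadline) {
                        long start = System.nanoTime();
                        try {
                            call.run(); // in the real client this would be the HTTP request
                            latencies.add((System.nanoTime() - start) / 1_000_000);
                        } catch (RuntimeException e) {
                            failures.incrementAndGet(); // timeouts and errors count as failures
                        }
                    }
                });
            }
        } // close() waits for every worker to pass the deadline
        List<Long> sorted = new ArrayList<>(latencies);
        Collections.sort(sorted);
        double throughput = sorted.size() / (durationMillis / 1000.0);
        return new Result(sorted.size(), failures.get(), throughput, percentile(sorted, 95));
    }
}
```

A call such as LoadSketch.run(1_000, 20_000, request) would correspond to one row of the results: 1,000 workers held constant for 20 seconds.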

This is an I/O-bound benchmark, which is exactly where virtual threads are supposed to make the biggest difference.

Results

What the numbers say

The first thing the CSV makes clear is that low concurrency does not tell the story. At 50 and 200 concurrent clients, the classic pool and the virtual-thread server are essentially tied. That is expected: with no more than 200 requests in flight, the 200-thread pool still has a free thread for every blocked request.

The interesting behavior starts at 1,000 concurrent clients. At that point, the classic thread pool has effectively run out of runway. Throughput stays near 1,000 requests per second, which is about what 200 threads can deliver when each request holds a thread for roughly 200 ms, and p95 latency jumps above one second because requests begin waiting in line. Virtual threads keep the same blocking code path alive, but they remove the pool as the immediate ceiling. Throughput rises to about 4,808 requests per second while p95 latency stays close to the downstream delay.

Concurrency	Classic throughput	Virtual throughput	Classic p95	Virtual p95	Classic failures	Virtual failures
50	232.54 req/s	231.24 req/s	222 ms	224 ms	0	0
200	943.52 req/s	946.78 req/s	220 ms	220 ms	0	0
1,000	964.06 req/s	4,807.93 req/s	1,058 ms	212 ms	0	0
2,000	957.50 req/s	9,569.33 req/s	2,088 ms	214 ms	0	0
5,000	990.43 req/s	11,046.78 req/s	5,002 ms	588 ms	15,509	849

Throughput comparison chart: Throughput stays flat on the classic pool and climbs sharply with virtual threads as concurrency increases.

p95 latency comparison chart: Latency is stable in the virtual-thread run until the system hits a different bottleneck at the highest concurrency level.

Failure rate comparison chart: The classic pool begins failing under extreme pressure, while the virtual-thread server keeps failures low for much longer.

At 2,000 concurrent clients, the gap is no longer subtle. The classic server remains trapped around 958 requests per second, with p95 latency above two seconds. The virtual-thread server reaches roughly 9,569 requests per second and still keeps p95 latency close to 214 ms. In other words, the latency of the virtual-thread version stays close to the downstream delay itself, which means the queueing penalty has been pushed out of the hot path.

The run at 5,000 concurrent clients is the most revealing one. The classic pool hits saturation hard: throughput remains under 1,000 requests per second, and more than 15,000 requests fail. Virtual threads still process more than 263,000 successful requests, but p95 latency rises to 588 ms and a small number of failures appear. That does not mean virtual threads stopped working. It means the benchmark has finally found the next bottleneck, most likely in the downstream path, connection pressure, or timeout boundaries.

The shape of the results matters as much as the raw totals. Classic threads show a plateau. Virtual threads show continued growth until the surrounding system becomes the limiting factor. That is the practical promise of Java 21: blocking code can scale much further before the architecture has to become more complex.
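The plateau can be sanity-checked with a back-of-envelope model based on Little's Law (throughput is roughly requests in flight divided by time per request). The sketch below is an idealized estimate, not part of the benchmark: it assumes a constant 200 ms service time and ignores proxy overhead, and the class and method names are hypothetical.

```java
class CapacityEstimate {

    // Little's Law: throughput is about requests in flight / seconds per request.
    static double throughputCeiling(int inFlight, double serviceSeconds) {
        return inFlight / serviceSeconds;
    }

    // With a fixed pool, `clients` requests share `poolSize` threads, so each
    // request spends about clients / poolSize service times in the system.
    static double queuedLatencySeconds(int clients, int poolSize, double serviceSeconds) {
        return Math.max(1.0, (double) clients / poolSize) * serviceSeconds;
    }

    public static void main(String[] args) {
        double service = 0.2; // the simulated downstream delay
        System.out.printf("classic pool ceiling: %.0f req/s%n", throughputCeiling(200, service));
        for (int clients : new int[] {1_000, 2_000, 5_000}) {
            System.out.printf("%d clients: virtual ceiling %.0f req/s, classic latency about %.1f s%n",
                    clients,
                    throughputCeiling(clients, service),
                    queuedLatencySeconds(clients, 200, service));
        }
    }
}
```

The model gives a ceiling of 1,000 req/s for the 200-thread pool and classic latencies of about 1 s and 2 s at 1,000 and 2,000 clients, in line with the measured 964 req/s, 1,058 ms, and 2,088 ms. For virtual threads it gives 5,000 and 10,000 req/s, close to the measured 4,808 and 9,569 req/s. At 5,000 clients the ideal figure would be 25,000 req/s, well above the measured 11,047 req/s, which is consistent with a different bottleneck taking over.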

Conclusions

What this benchmark concludes

Virtual threads are not a cosmetic API change. They change the economics of waiting.
For blocking I/O workloads, the classic thread-pool ceiling arrives quickly and predictably.
Virtual threads keep the direct, imperative programming style while scaling far beyond a fixed pool.
They do not remove bottlenecks; they move them to the real pressure point in the system.
At very high concurrency, the surrounding stack still needs tuning, especially downstream services and timeouts.

The broader lesson is that Java 21 virtual threads are best viewed as a capacity unlock, not a miracle. They allow conventional server code to absorb far more concurrent blocking work without forcing teams into reactive complexity. For many services, that is a substantial engineering win: simpler code, better throughput, and a much more forgiving latency profile under pressure.