Java Concurrency Series, Part 8: Virtual Threads & Project Loom
What changes when threads become cheap? Understand carrier threads, continuations, pinning, StructuredTaskScope, and how virtual threads flip the economics of I/O-bound Java services.
In Part 1, we established that platform threads are expensive: ~30 µs to create and ~512KB of stack. For a service with 10,000 concurrent requests, that’s 5 GB of stack memory and prohibitive creation overhead. This is why we use thread pools (Part 6) — reusing threads rather than creating new ones.
Java 21 (JEP 444) changes the economics entirely. Virtual threads are cheap: millions can coexist, creation costs ~1 µs, and they consume kilobytes of heap, not megabytes of native stack. In most cases, a blocking call in a virtual thread doesn't block an OS thread.
Key question: What changes when threads become cheap?
Virtual Thread Architecture
In the platform thread model (Part 1), there is no virtual layer; each Java thread is a thin wrapper over one OS thread:
Platform Thread (JVM) ──── OS Thread (1:1)
Virtual threads introduce a new layer:
Virtual Thread (JVM-managed, millions)
│
mounts onto
│
Carrier Thread (platform thread, small pool)
│
is an
│
OS Kernel Thread (1:1 with carrier)
- Carrier threads: a small ForkJoinPool (default parallelism: N_cpus, i.e. Runtime.availableProcessors(); tunable with -Djdk.virtualThreadScheduler.parallelism) that runs virtual threads
- Virtual threads: JVM-managed, backed by continuations (stackful coroutines)
- Mounting/unmounting: when a virtual thread blocks (I/O, sleep, lock), it unmounts from its carrier, and the carrier is freed to run another virtual thread
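A quick way to see the mount relationship is to ask a running virtual thread about itself. The sketch below (the class name CarrierPeek is illustrative) starts one virtual thread and records Thread.isVirtual() together with its toString(). On JDK 21 the toString() of a mounted virtual thread happens to include the carrier's name, something like VirtualThread[#21]/runnable@ForkJoinPool-1-worker-1, but that format is not a specified API, so treat it as a debugging aid only.

```java
import java.util.concurrent.atomic.AtomicReference;

public class CarrierPeek {
    // Starts one virtual thread and returns "<isVirtual> <toString>" as seen from inside it.
    static String describeFromInside() throws InterruptedException {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread vt = Thread.ofVirtual().start(() -> {
            Thread self = Thread.currentThread();
            // While mounted, JDK 21's toString() appends "@<carrier name>" (unspecified format).
            seen.set(self.isVirtual() + " " + self);
        });
        vt.join(); // join() also makes the value written by the virtual thread visible here
        return seen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("available processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println(describeFromInside());
    }
}
```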
Creating Virtual Threads
// Direct creation
Thread vt = Thread.ofVirtual().name("my-vt").start(() -> {
System.out.println("Hello from virtual thread");
});
// Via factory
ThreadFactory factory = Thread.ofVirtual().name("worker-", 0).factory();
// Via executor (simplest replacement for pools); close() at the end of try-with-resources waits for submitted tasks
try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
    executor.submit(() -> handleRequest(request)); // one virtual thread per task
}
With newVirtualThreadPerTaskExecutor(), every submitted task gets its own new virtual thread: no pool sizing, no queue tuning. The JVM manages the carrier pool automatically.
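To make "millions can coexist" concrete on a small scale, here is a sketch that uses the named factory from above to start 10,000 virtual threads, each blocking briefly in Thread.sleep, and then joins them all. The class and method names, the thread count, and the 10 ms sleep are arbitrary choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class ManyVirtualThreads {
    // Starts `count` virtual threads from a named factory, each doing a short blocking wait,
    // then joins them all and returns how many completed.
    static int runMany(int count) throws InterruptedException {
        ThreadFactory factory = Thread.ofVirtual().name("worker-", 0).factory(); // worker-0, worker-1, ...
        AtomicInteger ran = new AtomicInteger();
        List<Thread> threads = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            Thread t = factory.newThread(() -> { // factory threads are created unstarted
                try {
                    Thread.sleep(10); // parks the virtual thread; no carrier is held while sleeping
                    ran.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }
        for (Thread t : threads) {
            t.join();
        }
        return ran.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runMany(10_000)); // 10000
    }
}
```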
What Happens on a Blocking Call
// In a virtual thread:
InputStream in = socket.getInputStream();
byte[] buf = new byte[1024];
in.read(buf); // I/O operation — blocks
Sequence:
- Virtual thread calls read()
- JVM detects the blocking operation
- Virtual thread unmounts from carrier; its continuation (stack state) is saved to heap
- Carrier thread is freed and picks up another virtual thread
- When I/O completes, virtual thread is remounted on any available carrier
- Execution resumes after read()
Carrier C1: [VT-1 running] → VT-1 blocks on I/O → [VT-2 running] → VT-1's I/O completes → [VT-1 resumed]
Stack state for VT-1 is saved to heap during the block.
This is why BlockingQueue.take(), Thread.sleep(), and socket-based calls such as HTTP requests and JDBC queries release their carrier while they wait, as long as nothing pins the virtual thread (see Pinning below).
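BlockingQueue.take() can be checked the same way. In this sketch (names and sizes are illustrative), 5,000 virtual consumers park in take() on a small ArrayBlockingQueue while the main thread feeds them one item each; parked consumers do not hold carriers, so a handful of carriers is enough. Closing the executor in try-with-resources waits for every consumer to finish.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class QueueConsumers {
    // `consumers` virtual threads each block in take() until an item arrives; the main thread
    // then puts the values 1..consumers. Returns the sum of everything consumed.
    static long consumeAll(int consumers) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16); // small capacity, so put() blocks too
        AtomicLong sum = new AtomicLong();
        try (ExecutorService ex = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < consumers; i++) {
                ex.submit(() -> {
                    sum.addAndGet(queue.take()); // parks the virtual thread until an item is available
                    return null;
                });
            }
            for (int i = 1; i <= consumers; i++) {
                queue.put(i);
            }
        } // close() waits until every consumer has taken its item
        return sum.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(consumeAll(5_000)); // 12502500 = 5000 * 5001 / 2
    }
}
```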
Benchmark: Platform Threads vs Virtual Threads
A JMH benchmark that simulates 1,000 concurrent I/O-bound tasks (a 50 ms sleep stands in for a network call):
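import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)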
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Thread)
public class VirtualThreadBenchmark {
static final int TASKS = 1_000;
static final int SLEEP_MS = 50;
@Benchmark
public void platformThreadPool() throws Exception {
ExecutorService pool = Executors.newFixedThreadPool(200); // typical pool
List<Future<?>> futures = new ArrayList<>();
for (int i = 0; i < TASKS; i++) {
futures.add(pool.submit(() -> {
Thread.sleep(SLEEP_MS);
return null;
}));
}
for (Future<?> f : futures) f.get();
pool.shutdown();
}
@Benchmark
public void virtualThreads() throws Exception {
ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
List<Future<?>> futures = new ArrayList<>();
for (int i = 0; i < TASKS; i++) {
futures.add(executor.submit(() -> {
Thread.sleep(SLEEP_MS);
return null;
}));
}
for (Future<?> f : futures) f.get();
executor.shutdown();
}
}
Results:
Benchmark                                  Mode  Cnt   Score    Error  Units
VirtualThreadBenchmark.platformThreadPool  avgt   10   254.3 ± 12.1    ms
VirtualThreadBenchmark.virtualThreads      avgt   10    52.4 ±  1.8    ms
Platform thread pool (200 threads): 1,000 tasks / 200 threads = 5 sequential batches, and 5 × 50 ms ≈ 250 ms.
Virtual threads: all 1,000 tasks sleep at the same time, and sleeping virtual threads are unmounted, so the small carrier pool (~N_cpus threads) is barely used. Wall time ≈ one sleep ≈ 52 ms.
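Both numbers follow from a simple wave model: with at most c tasks in flight, T tasks run in ⌈T / c⌉ waves, and each wave costs one sleep. The sketch below encodes that idealized model (it ignores thread start-up and scheduling overhead, and the method name is made up), and also shows what a pool sized to 8 CPUs would cost for the same workload.

```java
public class WallTimeModel {
    // Idealized wave model: with at most `concurrency` tasks in flight, `tasks` tasks need
    // ceil(tasks / concurrency) waves, and each wave costs one sleep. Ignores all overhead.
    static long expectedWallMillis(int tasks, int concurrency, long sleepMillis) {
        long waves = (tasks + concurrency - 1L) / concurrency; // integer ceiling division
        return waves * sleepMillis;
    }

    public static void main(String[] args) {
        System.out.println(expectedWallMillis(1_000, 200, 50));   // 250: fixed pool of 200, 5 waves
        System.out.println(expectedWallMillis(1_000, 1_000, 50)); // 50: one virtual thread per task
        System.out.println(expectedWallMillis(1_000, 8, 50));     // 6250: a pool sized to 8 CPUs
    }
}
```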
StructuredTaskScope: Structured Concurrency
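Virtual threads pair naturally with structured concurrency (JEP 453): subtask lifetimes are scoped to a block, like try-with-resources. It is a preview API in Java 21 and was still in preview as of JDK 25, so it needs --enable-preview. The examples below use the JDK 21–24 API; JDK 25 (JEP 505) replaced ShutdownOnFailure and ShutdownOnSuccess with StructuredTaskScope.open() and Joiner policies.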
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
Subtask<User> userTask = scope.fork(() -> fetchUser(userId));
Subtask<Account> accountTask = scope.fork(() -> fetchAccount(userId));
scope.join(); // wait for both
scope.throwIfFailed(); // propagate exceptions
return new Profile(userTask.get(), accountTask.get());
}
ShutdownOnFailure: if any subtask fails, cancel the others immediately.
ShutdownOnSuccess: return the first successful result, cancel the rest:
try (var scope = new StructuredTaskScope.ShutdownOnSuccess<String>()) {
scope.fork(() -> fetchFromPrimary());
scope.fork(() -> fetchFromReplica());
scope.join();
return scope.result(); // result of the first subtask to complete successfully
}
No manual Future.cancel(). No leaked tasks. The scope guarantees all subtasks are done (or cancelled) before the block exits.
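Because StructuredTaskScope is a preview API whose shape changed between JDK 21 and JDK 25, here is a non-preview approximation that compiles on a plain JDK 21. It is not structured concurrency: fetchUser, fetchAccount, and the two replica lambdas are hypothetical stand-ins; the fan-out version does not cancel the sibling when one task fails; and only the ExecutorService.invokeAny variant comes close to ShutdownOnSuccess (it returns one successful result and cancels the remaining tasks).

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FanOutSketch {
    // Hypothetical domain types and fetchers standing in for real remote calls.
    record User(String id) {}
    record Account(String owner) {}
    record Profile(User user, Account account) {}

    static User fetchUser(String id) throws InterruptedException { Thread.sleep(30); return new User(id); }
    static Account fetchAccount(String id) throws InterruptedException { Thread.sleep(20); return new Account(id); }

    // Both results needed. Unlike ShutdownOnFailure, a failure here does NOT cancel the sibling task.
    static Profile loadProfile(String userId) throws Exception {
        try (ExecutorService ex = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<User> user = ex.submit(() -> fetchUser(userId));
            Future<Account> account = ex.submit(() -> fetchAccount(userId));
            return new Profile(user.get(), account.get()); // get() rethrows failures as ExecutionException
        } // close() waits for any task still running before the method returns
    }

    // First success wins: invokeAny returns one successful result and cancels the rest.
    static String fetchFastest() throws Exception {
        try (ExecutorService ex = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Callable<String>> replicas = List.of(
                    () -> { Thread.sleep(200); return "primary"; },
                    () -> { Thread.sleep(10); return "replica"; });
            return ex.invokeAny(replicas);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(loadProfile("u-42"));
        System.out.println(fetchFastest()); // replica
    }
}
```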
Scoped Values: Replacing ThreadLocal
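ThreadLocal doesn't break with virtual threads technically; it breaks ergonomically. Virtual threads are created per task rather than pooled, so a ThreadLocal used as a per-thread cache is rebuilt for every task, and with millions of threads each copy adds heap. Passing mutable ThreadLocal values from a parent thread down to child tasks is also tricky to get right.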
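One concrete cost: a ThreadLocal used as a per-thread cache is initialized once per thread, and a thread-per-task executor never reuses threads. This sketch (names are illustrative; the StringBuilder stands in for an expensive object) counts initializations for 1,000 tasks on a fixed pool of 4 platform threads versus one virtual thread per task.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalCacheCost {
    // Counts how often a ThreadLocal "cache" is initialized when `tasks` tasks each use it once.
    static int initializations(ExecutorService executor, int tasks) {
        AtomicInteger inits = new AtomicInteger();
        ThreadLocal<StringBuilder> cache = ThreadLocal.withInitial(() -> {
            inits.incrementAndGet(); // stands in for building an expensive per-thread object
            return new StringBuilder();
        });
        try (executor) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> cache.get().setLength(0)); // first get() on a thread initializes
            }
        } // close() waits for all tasks to finish
        return inits.get();
    }

    public static void main(String[] args) {
        // Pooled platform threads reuse their copy: at most 4 initializations.
        System.out.println(initializations(Executors.newFixedThreadPool(4), 1_000));
        // A fresh virtual thread per task: one initialization per task.
        System.out.println(initializations(Executors.newVirtualThreadPerTaskExecutor(), 1_000)); // 1000
    }
}
```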
Scoped values (JEP 446: preview in Java 21, finalized in JDK 25 by JEP 506) are the modern replacement for immutable per-request context:
static final ScopedValue<User> CURRENT_USER = ScopedValue.newInstance();
// Bind the value for a scope
ScopedValue.where(CURRENT_USER, user).run(() -> {
processRequest(); // CURRENT_USER is accessible here
});
// Read from anywhere within the scope
void processRequest() {
User user = CURRENT_USER.get(); // bound value inside the scope; throws NoSuchElementException if unbound
}
Properties:
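- Immutable within scope: can't be changed after binding (unlike ThreadLocal.set())
- Inherited automatically by subtasks forked in a StructuredTaskScope (threads started directly with Thread.ofVirtual() do not inherit the binding)
- Bounded lifetime: the binding is gone when the scope exits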
Pinning: The One Thing That Breaks Virtual Threads
Virtual threads unmount on blocking operations — unless they’re pinned to their carrier. Pinned threads cannot unmount, and their carrier is held for the duration.
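On JDK 21–23 there are two causes of pinning (JDK 24, via JEP 491, lets virtual threads unmount inside synchronized, which removes the first cause for most code):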
1. synchronized blocks with blocking I/O inside
// This PINS the carrier — don't do this with virtual threads
synchronized (lock) {
result = jdbcStatement.executeQuery(); // blocks while pinned!
}
The carrier is blocked waiting for the DB query. No other virtual thread can use it.
Fix: Replace synchronized with ReentrantLock (Part 4):
lock.lock(); // lock is a java.util.concurrent.locks.ReentrantLock
try {
result = jdbcStatement.executeQuery(); // virtual thread unmounts while waiting
} finally {
lock.unlock();
}
2. Native frames in the call stack
If a virtual thread is inside a native method or a foreign function when it blocks (for example, a JNI call that calls back into Java code that then blocks), it cannot unmount. JNI calls pin.
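The ReentrantLock fix for cause 1 can be exercised end to end. In this sketch, slowQuery() is a hypothetical stand-in for jdbcStatement.executeQuery() (a short Thread.sleep), and 100 virtual threads take turns holding the lock while they block. Virtual threads waiting in lock() park instead of pinning a carrier.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

public class LockedResource {
    private final ReentrantLock lock = new ReentrantLock();
    private int calls = 0; // guarded by lock

    // Hypothetical stand-in for jdbcStatement.executeQuery(): a short blocking wait.
    private String slowQuery() throws InterruptedException {
        Thread.sleep(2);
        return "row";
    }

    String query() throws InterruptedException {
        lock.lock(); // contending virtual threads park here
        try {
            calls++;
            return slowQuery(); // blocking while holding a ReentrantLock does not pin the carrier
        } finally {
            lock.unlock();
        }
    }

    int calls() {
        lock.lock();
        try {
            return calls;
        } finally {
            lock.unlock();
        }
    }

    // Runs `tasks` queries, one virtual thread each, and returns how many went through the lock.
    static int runConcurrently(int tasks) {
        LockedResource db = new LockedResource();
        try (ExecutorService ex = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                ex.submit(() -> db.query());
            }
        } // close() waits for every query to finish
        return db.calls();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(100)); // 100
    }
}
```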
Detect pinning on JDK 21–23 (JDK 24 removed this system property; use JFR instead):
java -Djdk.tracePinnedThreads=full MyApp
Output when pinned (abridged):
Thread[#27,ForkJoinPool-1-worker-1,5,CarrierThreads]
    com.example.SlowLock.compute(SlowLock.java:42) <== monitors:1
Or use JFR:
jcmd <pid> JFR.start name=pinning settings=profile duration=30s filename=pinning.jfr
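Look for jdk.VirtualThreadPinned events (recorded when a pinned virtual thread blocks longer than the default 20 ms threshold), for example with jfr print --events jdk.VirtualThreadPinned pinning.jfr.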
What to Migrate and What Not to
Good candidates for virtual threads
- HTTP servers handling many concurrent requests (one virtual thread per request)
- gRPC/REST clients making many concurrent outbound calls
- Database access through connection pools: virtual threads block on JDBC, unmounting cleanly with ReentrantLock-based pools (HikariCP works well)
- Message queue consumers: blocking poll() on Kafka or similar
Poor candidates
- CPU-bound tasks: virtual threads don't make computation faster; carrier threads still do the work. Use ForkJoinPool for parallelism.
- Code with heavy synchronized + I/O: on JDK 21–23, pinning will negate the benefits until you migrate to ReentrantLock
- Work that uses ThreadLocal for mutable per-task state: migrate to ScopedValue first
Migration Guide: Thread Pool → Virtual Threads
Before:
ExecutorService pool = Executors.newFixedThreadPool(200);
// ...
pool.submit(() -> handleRequest(request));
After:
ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor();
// ...
pool.submit(() -> handleRequest(request));
// No pool sizing needed — one virtual thread per task
That's often the entire migration. The real work is:
- Auditing synchronized blocks that contain I/O: replace with ReentrantLock
- Auditing ThreadLocal usage for mutable state: migrate to ScopedValue or explicit parameters
- Checking library dependencies for pinning: a driver or client library that blocks inside synchronized pins the carrier just as your own code would
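One thing the old pool size did implicitly was cap how many requests hit a downstream resource at once. With one virtual thread per task that cap disappears, so if a database or API can only take N concurrent calls, bound the call site rather than the thread count. A java.util.concurrent.Semaphore is one simple way to do that; in this sketch the limit of 10, the 5 ms simulated call, and the class name are assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedDownstream {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxInFlight = new AtomicInteger();

    BoundedDownstream(int limit) {
        this.permits = new Semaphore(limit);
    }

    // Hypothetical downstream call (e.g. a database query), at most `limit` at a time.
    void call() throws InterruptedException {
        permits.acquire(); // virtual threads beyond the limit park here, releasing their carriers
        int now = inFlight.incrementAndGet();
        try {
            maxInFlight.accumulateAndGet(now, Math::max); // record the peak concurrency seen
            Thread.sleep(5);                              // simulated I/O
        } finally {
            inFlight.decrementAndGet();
            permits.release();
        }
    }

    record Result(int completed, int maxInFlight) {}

    // One virtual thread per task, but never more than `limit` concurrent downstream calls.
    static Result run(int tasks, int limit) {
        BoundedDownstream downstream = new BoundedDownstream(limit);
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService ex = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                ex.submit(() -> {
                    downstream.call();
                    completed.incrementAndGet();
                    return null;
                });
            }
        } // close() waits for all tasks
        return new Result(completed.get(), downstream.maxInFlight.get());
    }

    public static void main(String[] args) {
        System.out.println(run(1_000, 10));
    }
}
```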
Virtual Thread Memory Model
Virtual threads follow the same JMM rules as platform threads. volatile, synchronized, ReentrantLock — all have the same semantics. The JMM doesn’t change.
What changes:
- Creating millions of virtual threads doesn’t exhaust native memory
- Blocking on I/O doesn’t block OS threads
- No need to tune pool size for I/O-bound concurrency
What doesn’t change:
- Data races still exist — use the same synchronization primitives
- Atomicity requirements are the same
- Lock-free vs lock trade-offs are the same
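To illustrate "data races still exist": in this sketch, 10,000 virtual threads increment both a plain static int and an AtomicInteger. The plain counter may lose updates because its read-modify-write is not atomic, while the atomic one always reaches the expected total. The class name is illustrative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class StillNeedsSync {
    static int plain;                                        // unsynchronized shared counter
    static final AtomicInteger atomic = new AtomicInteger(); // atomic shared counter

    // Each of n virtual threads increments both counters once; returns {plain, atomic}.
    static int[] incrementFromVirtualThreads(int n) {
        plain = 0;
        atomic.set(0);
        try (ExecutorService ex = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < n; i++) {
                ex.submit(() -> {
                    plain++;                  // data race: read-modify-write that can lose updates
                    atomic.incrementAndGet(); // atomic read-modify-write, never loses updates
                });
            }
        } // close() waits for all tasks
        return new int[] { plain, atomic.get() };
    }

    public static void main(String[] args) {
        int[] r = incrementFromVirtualThreads(10_000);
        System.out.println("plain=" + r[0] + " atomic=" + r[1]); // atomic is always 10000
    }
}
```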
Observing Virtual Threads
# jcmd <pid> Thread.print lists only platform threads; this dump includes virtual threads
jcmd <pid> Thread.dump_to_file threads.txt
# Virtual threads appear in threads.txt as entries like:
# #31 "" virtual
# followed by the thread's stack frames
Or with JFR: jdk.VirtualThreadStart and jdk.VirtualThreadEnd (disabled by default, so enable them explicitly) plus jdk.VirtualThreadPinned give full lifecycle visibility.
Summary
| Concept | Key Point |
|---|---|
| Virtual thread | JVM-managed; mounts/unmounts on carrier threads |
| Carrier pool | Small ForkJoinPool (~N_cpus); shared by all virtual threads |
| Blocking I/O | Unmounts virtual thread; carrier is freed |
| Pinning | On JDK 21–23, synchronized + I/O holds the carrier; replace with ReentrantLock |
| StructuredTaskScope | Scoped lifetime for concurrent subtasks (preview API) |
| ScopedValue | Immutable alternative to ThreadLocal, inherited by forked subtasks |
| Migration | Replace fixed pools with newVirtualThreadPerTaskExecutor; audit synchronized blocks |
Series Complete
You’ve now traced the full spectrum of Java concurrency — from the CPU cache coherence model that motivates volatile, through the JVM’s lock inflation machinery in synchronized, up to virtual threads that make millions of concurrent I/O operations practical.
The key insight: every tool in this series exists because the one below it has a limitation. volatile doesn’t give atomicity → use synchronized. synchronized serializes all access → use ReadWriteLock or StampedLock. Locks have overhead → use atomics. Threads are expensive → use thread pools. Thread pools have sizing constraints → use virtual threads.
Understanding the whole stack means you can pick the right tool — and know when to switch.
Start of series: Overview — Why Concurrent Java Is Hard