
Java Virtual Threads Ate My Memory: A Web Crawler's Tale of Speed vs. Memory

koito17

> Without any back-pressure, the program kept stuffing memory with pending results.

The key phrase in the whole article.

This is the reason I am not fully confident in the "pervasive async" mindset overtaking web development. In JavaScript, pervasive async makes sense, because all functions eventually return control to a central event loop anyway. In environments with real threading, the move from OS threads to "green threads and async everywhere" implicitly gives up the tried-and-tested scheduler of your OS and its mechanisms to handle back-pressure. Developers must now take full responsibility for avoiding buffer bloat.

> they fundamentally change how we think about concurrency limits and resource management.

Virtual threads make concurrency cheaper, but nothing about them (or any other concurrency abstraction) eliminates the need for flow control. By switching to virtual threads, you traded off the safeguards once provided by your (bounded!) thread pool and OS scheduler. It's now your responsibility to put safeguards back in.

As an aside: in Clojure's core.async, channel limits (like 1024 pending puts) are intentionally there as a warning sign that your async pipeline isn't handling back-pressure correctly. This is exactly the "hyperactive downloader with no brakes" problem described in the article. With core.async, you would have eventually hit an exception related to excessive puts on a channel, rather than silently expanding buffers until resource exhaustion.

pron

> By switching to virtual threads, you traded off the safeguards once provided by your (bounded!) thread pool and OS scheduler.

The OS scheduler provides no safeguards here (you can use the thread-per-task executor with OS threads - you'll just get an OOME much sooner), and as the virtual thread adoption guide says, you should replace bounded thread pools with a semaphore when needed because it's just a better construct. It allows you to limit concurrency for different resources in an easy and efficient way even when different threads make use of different resources.
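The semaphore approach from the adoption guide can be sketched in a few lines (class name, task counts, and the simulated I/O are made up for illustration): a Semaphore caps how many virtual threads may perform the guarded operation at once, and everyone else just blocks cheaply.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class SemaphoreGuard {
    /** Runs `tasks` simulated downloads, at most `limit` concurrently; returns the peak concurrency seen. */
    static int run(int tasks, int limit) throws InterruptedException {
        Semaphore permits = new Semaphore(limit);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                exec.submit(() -> {
                    permits.acquire();              // back-pressure: the cheap virtual thread simply blocks here
                    try {
                        peak.accumulateAndGet(active.incrementAndGet(), Math::max);
                        Thread.sleep(1);            // stand-in for the guarded I/O call
                        return null;
                    } finally {
                        active.decrementAndGet();
                        permits.release();
                    }
                });
            }
        } // close() waits for all submitted tasks to finish
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak concurrency = " + run(1_000, 10));
    }
}
```

Note how the limit applies to the operation, not to thread creation: you can start a million virtual threads, but only `limit` of them ever touch the guarded resource at a time.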

In this case, by switching to virtual threads you get to use more appropriate concurrency constructs without needing to manage threads directly, and without sharing them among tasks.

never_inline

Stupid question: Why not provide a threadpool-like construct which may not necessarily keep threads around but limits their number?

pron

What you usually want to limit isn't the number of threads but the number of threads doing a particular operation, such as accessing some service, so it's more flexible (and efficient) to just guard that particular operation with a semaphore.

The only place where you may want to limit the number of threads is at the entry to the system, but since the semaphore is the construct for limiting the number of anything, you might as well just use that there, too. For example, if the number of requests currently being processed is too high (above the semaphore's number of leases), you don't accept new connections.

andersmurphy

Good question. The simple answer is that such a construct already exists on the JVM: the semaphore. People just forget it exists. I wrote an article about it a while back[1].

1. Managing throughput with virtual threads (it's in Clojure, but it's using the underlying Java APIs)

https://andersmurphy.com/2024/05/06/clojure-managing-through...

jeroenhd

The number of actual carrier threads is limited, by default to the number of CPU cores the system has. The problem isn't the number of threads itself, but the fact that the threadpool will keep taking on more work, and that it's efficient enough to blow past the CPU+I/O bottleneck. You can run into the same problem with a threadpool if it is efficient enough.

Standard concurrency tools like semaphores are capable of reducing the amount of work being done at the same time. You could also simulate classic thread pools by using concurrent lists/sets/etc to replicate traditional work queues, but you'd probably lose any advantage you'd gained from switching to green threads in the first place.

708145_

Of course it is possible to limit the number of virtual threads. A web server can have a limit on the number of virtual threads too, and queue incoming requests before dispatching to workers (virtual threads).

As others have said, this can be achieved with a semaphore and the bulkhead pattern. You can also limit the number of connections.

I would expect any "production ready" web server using virtual threads to have some sort of limiting. That must be the case, right?

ndriscoll

Wouldn't there be significant overhead in waking tasks from IO just to put them back to sleep on a semaphore vs. there being fewer threads servicing the IO ring for a given type of task?

mdavidn

Presumably the semaphore would be used to limit the number of concurrent virtual threads.

Incoming tasks can be rejected proactively if a system would overflow its limit on concurrent tasks. This approach uses a semaphore’s non-waiting acquire and release operations. See the “bulkhead pattern.”
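A minimal sketch of that non-waiting bulkhead variant (the class and method names are my own, not from any particular library): `tryAcquire` rejects the task up front instead of letting it queue.

```java
import java.util.concurrent.Semaphore;

public class Bulkhead {
    private final Semaphore permits;

    public Bulkhead(int limit) {
        this.permits = new Semaphore(limit);
    }

    /** Starts the task on a new virtual thread if a permit is free; otherwise rejects it immediately. */
    public boolean trySubmit(Runnable task) {
        if (!permits.tryAcquire()) {
            return false;                    // shed load proactively instead of queueing it
        }
        Thread.startVirtualThread(() -> {
            try {
                task.run();
            } finally {
                permits.release();
            }
        });
        return true;
    }
}
```

A caller that gets `false` back can drop the request, return 429, or retry later, which is the point of the bulkhead: overflow is an explicit decision, not silent buffering.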

Alternatively, if a system has a fixed number of threads ingesting tasks, these would respond to backpressure by waiting for available permits before creating more virtual threads.

pron

It should be the other way around. The concurrency-limited resource is usually accessed by IO (some service), so a thread would wait and then acquire the semaphore before doing the IO.

tombert

At my last job I made a thing that took in tens of millions of records from Kafka, did some processing, plopped stuff into Postgres, and moved along. It wasn't hard, but it was difficult to achieve my throughput goals with regular blocking JDBC.

Eventually I moved to an async-non-blocking model, and for my initial tests it went great, stuff returned almost immediately and I could grab the next one. The problem is that, sort of by definition, non-blocking stuff doesn't really know when it's going to complete right away, so it just kept enqueuing stuff until it ate all my memory.

I figured out pretty early on what was happening, but the other people on my team really seemed to struggle with the concept of backpressure, I think because they weren't really used to a non-blocking model. You kind of get backpressure for free with blocking code.

Eventually I was able to solve my problem using a BlockingQueue.
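That fix can be sketched like this (a made-up stand-in for the Kafka-to-Postgres pipeline): a bounded ArrayBlockingQueue makes the producer block on put() whenever the consumer falls behind, which is exactly the back-pressure that blocking code gives you "for free".

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPipeline {
    static final String POISON = "EOF";

    /** Feeds `records` items through a queue bounded at `capacity`; returns how many were consumed. */
    static int run(int records, int capacity) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger consumed = new AtomicInteger();

        Thread consumer = Thread.startVirtualThread(() -> {
            try {
                while (!queue.take().equals(POISON)) {
                    consumed.incrementAndGet();   // stand-in for the slow Postgres write
                }
            } catch (InterruptedException ignored) {
            }
        });

        for (int i = 0; i < records; i++) {
            queue.put("record-" + i);             // blocks here once the queue is full
        }
        queue.put(POISON);
        consumer.join();
        return consumed.get();
    }
}
```

The producer can never get more than `capacity` records ahead of the consumer, so memory use stays bounded no matter how fast Kafka delivers.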

PaulKeeble

It's not really any different to the limitations of memory that we have always dealt with, it just consumes it faster than we might expect. The techniques we use for ensuring we don't run out of memory also apply to anything that can make threads and we have to put limitations on how many can exist, especially since they consume quite a bit of memory even in their virtual thread variant.

wahern

The OOM error is the back pressure. This is why some engineers working on highly concurrent frameworks insist on pervasive OOM recovery.

I haven't done any Java in years, but I always thought OOM errors were recoverable in the JVM. Or is this just a case where the author never thought to catch OOM exceptions? My instinct would be to catch OOM early (i.e. in entry function) in a virtual thread and re-queue the task. In theory re-queueing might fail, too, I guess, but in practice probably not.

This is why I like to code in C (and Lua) for this kind of thing, as in C my core data structures like lists and trees don't require additional allocations for insert. My normal pattern here would be to have, say, three job lists--todo, pending, complete; a job would always exist in one of those lists (presuming it could be allocated and initialized, of course), and no allocations would be required to move a job back from pending to todo on an OOM error during processing.

koito17

In Java, there are two kinds of Throwable instances[1]: Error and Exception. As the name suggests, OutOfMemoryError is a subclass of Error. In contrast to Exception, an Error "indicates serious problems that a reasonable application should not try to catch"[2]. For this reason, it's considered bad practice in Java to catch all Throwable instances (or catch Error instances explicitly).

> My instinct would be to catch OOM early

OutOfMemoryError subclasses VirtualMachineError, and when a VirtualMachineError is thrown, your program seems to be firmly in undefined behavior territory. Quoting the JVM specification [3]:

  A Java Virtual Machine implementation throws an object that is an instance of a subclass of the class VirtualMachineError when an internal error or resource limitation prevents it from implementing the semantics described in this chapter. This specification cannot predict where internal errors or resource limitations may be encountered and does not mandate precisely when they can be reported. 
For context: "this chapter" refers to the whole chapter describing the behavior of each JVM instruction. The specification seems to suggest that all bets are off on any guarantees the JVM makes by the time a VirtualMachineError is thrown.

[1] https://docs.oracle.com/en/java/javase/21/docs/api/java.base...

[2] https://docs.oracle.com/en/java/javase/21/docs/api/java.base...

[3] https://docs.oracle.com/javase/specs/jvms/se21/html/jvms-6.h...

fulafel

Is this a hole in the safety guarantees?

unscaled

You can catch OutOfMemoryErrors in Java, but it wouldn't be as simple as you describe. In a real-world program, you may be doing other things besides downloading URLs and feeding the URLs to the download scheduler. Even though the URL downloader is what's causing the memory pressure, an OutOfMemoryError may very well be thrown by an allocation anywhere else. _You only have one shared heap_, so unless there is only one single thing being allocated on the heap, you cannot use an OutOfMemoryError as a back-pressure signal for the one task that caused it.

Coming from C, you might think you can just avoid heap allocations for anything that you don't manage with backpressure, but that doesn't work in Java, since Java can only store primitives and references on the stack (at least until project Valhalla is delivered[1]). Almost everything you do triggers a heap allocation.

Virtual Threads make this even worse, since their "stack" is also stored on the heap, and it can grow dynamically and require more heap allocations even if you just push a primitive value onto the stack!

tl;dr: No, you cannot rely on catching OutOfMemoryError in Java in anything but the simplest scenarios and things break down when you have multiple threads and especially virtual threads.

[1] https://openjdk.org/projects/valhalla/

wyago

This feels analogous to saying that static memory management might be superior, because if dynamic memory is available developers can trivially produce an OOM.

It's possible we'll start borrowing more patterns from Erlang/OTP since it's been living in a post-greenthreads world for as long as many languages have existed, and has developed patterns around it.

jandrewrogers

Proper async architecture requires designing a scheduler appropriate for your workload. Many (most?) async applications ignore or skip this part and try to delegate it to a runtime or the OS, usually with mediocre results. Designing a robust scheduler usually requires excellent visibility into the real-time distribution of available resources so that you can dynamically schedule load away from the resources under the most pressure at every instant in time. If you aren’t going to do that work, async may not be the best architecture choice.

pron

First, this is true in any architecture. Second, in Java you can do just that with the use of semaphores guarding different resources with a different number of leases.

The one resource you can't easily schedule well is CPU, but that's not a big problem because interactive servers don't work well when the CPU needs to be scheduled (i.e. when it's at 100%). Again, this is true regardless of architecture.

Back when we first released virtual threads, many asked us why we didn't make the default virtual thread scheduler offer time-sharing (with or without priorities). The answer was that we wanted to, but were unable to find workloads for the intended domain of the default scheduler (interactive servers) where any kind of CPU scheduling could help (we said that when somebody finds such real workloads, we'll add fairer CPU scheduling to the default scheduler, assuming there was a good algorithm that would address such workloads).

jandrewrogers

This is kind of what I am talking about though. The whole point of async scheduling is that semaphores, locking, etc. are unnecessary because you strictly control the CPU schedule and can dynamically rewrite the schedule such that the need for such mechanisms largely doesn't exist. This provides a qualitative performance boost for quite a few analytics and other workloads, which is the only reason we do it (since it is considerably more complex to implement).

If you aren't bothering to write a real scheduler then it becomes less clear why you would bother with true async. Lots of hassle for not much payoff.

3cats-in-a-coat

Everything, always, returns to a main event loop, even the actual hardware threads on your CPU. So JavaScript developers don't get a pass on this.

The problem is you keep creating async requests, they need to be stored somewhere, and that's in memory. Thread or no thread, it's in memory.

And backpressure is essential, yes. Except for basic tasks.

But there is another solution, which is coalescing requests. For example, in my current project, I combine thousands of operations, each of which could've been its own async request, into a single "unit of work" which then results in a single async request for all operations combined.

If we treat IO seriously, we won't just fire off requests into the dark by the hundreds. We'd think about it, and batch things together. This can be done manually, but it's best done at a lower level, so you don't have to think about it at the business logic level.
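The coalescing idea can be sketched roughly like this (the `Batcher` class and its methods are hypothetical, just to show the shape): operations accumulate until a full batch exists, and only then is one combined request issued.

```java
import java.util.ArrayList;
import java.util.List;

public class Batcher {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private int requestsIssued = 0;

    public Batcher(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Queues one operation; a combined request goes out only when a full batch has accumulated. */
    public void add(String operation) {
        pending.add(operation);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Issues one "unit of work" for everything pending (stand-in for a single async request). */
    public void flush() {
        if (pending.isEmpty()) return;
        requestsIssued++;   // a real implementation would send `pending` as one combined request here
        pending.clear();
    }

    public int requestsIssued() {
        return requestsIssued;
    }
}
```

A thousand operations batched by 100 become ten in-flight requests instead of a thousand, which is the point: the number of pending async results shrinks by the batch factor.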

Copenjin

Async, green threads, etc... can be useful tools but are never the out of the box solution for not having to think about concurrency and program flow. Even if people keep thinking they are. Learn how to reason about concurrency, there is no way to avoid it.

fjasdfwa

> Virtual Threads Ate My Memory

From what I see, it's a developer trade-off.

If you want to save on memory then you need to drop down to callbacks, promise chains, or async-await sugar that is a compiler transform to a state machine.

But if you do that, then you will write a blog complaining about the boilerplate!

I think Zig has an ability to kind of give you the best of both worlds. A Zig expert would need to correct me on this.

written-beyond

I thought Rust's async compiles into a state machine.

hocuspocus

Jox provides buffered channels with a bounded capacity by default, if you want a more high-level API to implement this kind of thing using virtual threads.

https://github.com/softwaremill/jox

jsd1982

The obvious solution to me is to implement streaming/buffered processing of the content while downloading it instead of downloading the entire content into memory to be processed in a single contiguous byte[].

Buffered IO would have the CPU processing acting as a natural backpressure mechanism to prevent downloading too much content and also prevent unbounded memory allocation. Each CPU worker only needs to allocate a single small buffer for what it processes and it can refill that same buffer with the next IO request. Your memory usage becomes entirely predictable and will only scale with how many concurrent threads you can actually execute at once.

Also, no matter how you artificially rate limit the virtual thread scheduling (e.g. via semaphore), if you still insist on downloading the entire content into memory before starting processing then obviously you cannot process any single piece of content larger than what can fit into available memory at any given time.
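A sketch of that streaming approach (names are illustrative): each worker reads the body through one small, reusable buffer, so per-worker memory is constant regardless of content size.

```java
import java.io.IOException;
import java.io.InputStream;

public class StreamingProcessor {
    /** Consumes a response body chunk by chunk with one small, reusable buffer. */
    static long process(InputStream body) throws IOException {
        byte[] buffer = new byte[8 * 1024];   // fixed memory cost per worker, regardless of content size
        long totalBytes = 0;
        int n;
        while ((n = body.read(buffer)) != -1) {
            totalBytes += n;                  // stand-in for real per-chunk processing (parsing, hashing, ...)
        }
        return totalBytes;
    }
}
```

Because `read` only fills the buffer as fast as the processing loop drains it, the TCP receive window itself pushes back on the server, giving you the natural back-pressure described above.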

kburman

It's a design problem, not a virtual threads problem.

CactusRocket

Yes, I am a bit surprised too. The author changed their solution from a bounded resource limit to unbounded. Of course you're going to run into problems sooner or later then. Regardless of how lightweight the (virtual) threads implementation is.

kburman

Exactly. It's Virtual Thread, not Magic Thread. Just because they’re lightweight doesn’t mean you can throw out all notions of flow control or resource management.

geodel

I used VThreads recently with good results, I should say. In my case I limited concurrency with a blocking queue of concurrent tasks, so the number of virtual threads did not keep rising with the millions of input records I was processing.

I found that keeping the number of VThreads at about 100x the number of concurrent calls to external resources was the sweet spot for overall throughput.

marginalia_nu

I think it's a mistake to think of virtual threads as "threads, but magically better". They have real drawbacks, and aren't suitable in every scenario.

A crawler is a textbook case where you get more out of a bounded (but large) threadpool than virtual threads, because the boundedness is a feature. Even with infinite RAM, crawling from as many connections as possible leads to a ton of network congestion, and ironically, will slow down your crawling speed.

unscaled

This article focuses on memory, but even if memory weren't an immediate issue (e.g. if you had smaller pages and didn't restrict the heap size to 1GB), you would still want backpressure in this case.

What really happens here is that we've got two operations:

1. An I/O-bound operation (downloading URLs).

2. A CPU-bound operation (processing downloaded URLs).

Memory is a limited resource that requires you to introduce backpressure into the system, but so is the CPU. If downloads were fast and memory usage was relatively low (compared to the heap size), this program would have run into another issue: Thousands of lightweight threads all competing for the same small number of CPU cores when processing their downloaded data.

In practice, in most cases CPU-bound code in virtual threads just introduces a new bottleneck without proper back pressure. If the processing doesn't involve any further I/O operations, the virtual threads will keep queuing up until the physical threads finish processing everything (Java virtual threads, unlike goroutines in Go, cannot be preempted by the scheduler). This just creates more opportunities for an OOM, with virtual threads that keep piling up and waiting to be processed.

But even if you introduce a backpressure mechanism like a semaphore or BlockingQueue when queuing up the URLs, things can go bad for you here if your app has to do other things besides processing URLs.

For instance, assume you've got a batch of 3000 URLs to process and each of these URLs takes 10 seconds to process. Let's say you're running everything on a CPU with 8 logical cores. Even fully utilizing all of your CPU cores, you'll need about an hour to process everything. During that time any other I/O-bound task that you do may be starved, because each time a URL processing task gets hold of a carrier thread (the actual OS thread running virtual threads), it blocks it until processing is finished, and no other virtual threads can be scheduled on it! So in the scenario described here, you might quickly run out of carrier threads, and all other I/O-bound operations would starve waiting for available carrier threads.

If you've got many long-running CPU-bound tasks, it's often better to keep a traditional dedicated thread pool (ExecutorService.newFixedThreadPool()) just for these tasks.
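A sketch of that split (the class and helper names are made up; `download` and `parse` are stand-ins): cheap virtual threads do the blocking I/O, then hand the CPU-heavy step to a bounded platform-thread pool sized to the cores.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MixedPools {
    // CPU-bound parsing goes to a bounded platform-thread pool sized to the cores...
    static final ExecutorService CPU =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    // ...while cheap virtual threads handle the blocking I/O.
    static final ExecutorService IO = Executors.newVirtualThreadPerTaskExecutor();

    static int crawl(String url) throws Exception {
        return IO.submit(() -> {
            byte[] page = download(url);                    // blocking I/O: fine on a virtual thread
            return CPU.submit(() -> parse(page)).get();     // long CPU work can't hog a carrier thread
        }).get();
    }

    static byte[] download(String url) { return url.getBytes(); } // stand-in for real I/O
    static int parse(byte[] page) { return page.length; }         // stand-in for CPU-heavy processing
}
```

The fixed pool doubles as the back-pressure point for the CPU stage: at most one parse per core runs at a time, and waiting virtual threads cost little while parked on the `Future`.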

swsieber

Good timing. I was actually reading the official docs the other day on virtual threads, and it has a big section dedicated to rate-limiting virtual threads:

"The hardest thing to internalize about virtual threads is that, while they have the same behavior as platform threads they should not represent the same program concept."

...

"Sometimes there is a need to limit the concurrency of a certain operation. For example, some external service may not be able to handle more than ten concurrent requests. Because platform threads are a precious resource that is usually managed in a pool, thread pools have become so ubiquitous that they're used for this purpose of restricting concurrency, "

...

"But restricting concurrency is only a side-effect of thread pools' operation. Pools are designed to share scarce resources, and virtual threads aren’t scarce and therefore should never be pooled!

"When using virtual threads, if you want to limit the concurrency of accessing some service, you should use a construct designed specifically for that purpose: the Semaphore class."

...

"Simply blocking some virtual threads with a semaphore may appear to be substantially different from submitting tasks to a fixed thread pool, but it isn't. Submitting tasks to a thread pool queues them up for later execution, but the semaphore internally (or any other blocking synchronization construct for that matter) creates a queue of threads that are blocked on it that mirrors the queue of tasks waiting for a pooled thread to execute them. Because virtual threads are tasks, the resulting structure is equivalent"

"Even though you can think of a pool of platform threads as workers processing tasks that they pull from a queue and of virtual threads as the tasks themselves, blocked until they may continue, the underlying representation in the computer is virtually identical. _Recognizing the equivalence between queued tasks and blocked threads will help you make the most of virtual threads._" (emphasis mine)

So it was cool seeing him arrive at the same conclusion (or he read the docs). Either way it was a nice timely article.

Link to the docs: https://docs.oracle.com/en/java/javase/21/core/virtual-threa...

geodel

Well 6 hours of debugging can save 5 min of reading the docs.

palmfacehn

There's also the comfort which comes exclusively from knowing with certainty how your tools work, how they are intended to be used, and the conditions under which they will fail. This sentiment is lost on the AI coding proponents.

ivolimmen

Nice that people can now switch to this threading model so that programs can easily process things in parallel, but if you do a lot of memory-intensive processing it is not that useful. Focus on less memory usage first, and then use more threads.
