Strings Just Got Faster
188 comments
·May 1, 2025
stevoski
I find the entire history of improvements to Java’s String class enjoyable to read about.
Over the years, the implementation of Java’s String class has been improved again and again, offering performance improvements and memory usage reductions. And we Java developers get these improvements with no work required other than updating the JRE we use.
All the low-hanging fruit was taken years ago, of course. These days, I’m sure most Java apps would barely get any noticeable improvement from further String improvements, such as the one in the article we’re discussing.
neuroelectron
When I started my career in software development as an SDE, and soon advanced to SRE, I hated Java. The extreme OOP paradigm made enterprise-class situations impossible to understand. But after a few short years, I began to appreciate it as a real, battle-hardened ecosystem. Now, I consider it much better than modern trends such as Rust and Python.
These kinds of niche optimizations are still significant, and the OOP model allows them to be implemented with much less fanfare. This is in the context of billion-dollar platforms: with some basic performance testing and API replays, we're saving thousands of dollars a day. Nobody gets a pat on the back. Maybe some pizza on Friday.
davnicwil
The mind-blowing moment for me with Java came about 5 years into using it, when I encountered the idea - via some smart colleagues - that none of the extra 'stuff' is intrinsic to the language; rather, it's self-imposed.
Turns out you can write Java without the stuff. No getters and setters, no interfaces or dependency injection, no separate application server (just embed one in your jar). No inheritance. Indeed, no OOP (just data classes and static methods).
Just simple, C-like code with the amazing ecosystem of libraries, the incredibly fast marvel that is the JVM, and (though this is less of a deal now with LLM autocomplete) a simple built-in type system that makes the code practically write itself with autocomplete.
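Concretely, that style can be as small as this (a throwaway sketch, all names made up):
import java.util.List;
// A data carrier plus static functions over it - no getters/setters,
// no inheritance, no framework.
record Invoice(String customer, long cents) {}
final class Invoices {
    private Invoices() {}
    static long totalCents(List<Invoice> invoices) {
        return invoices.stream().mapToLong(Invoice::cents).sum();
    }
}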
It's truly an awesome dev experience if you just have the power / culture to ignore the forces pressuring you to use the 'stuff'.
concerndc1tizen
I love Java in the same way.
But that free-thinking definition of Java clashes with the mainstream beliefs in the Java ecosystem, and you'll get a lot of opposition at workplaces.
So I gave up on Java, not because of the language, but because of the people, and the forced culture around it.
90s_dev
Someone's tweet from about 15 years ago (paraphrased from memory):
"We rewrote 200k line Java codebase in only 50k lines. Guess what language we used?"
The follow-up tweet a day later:
"It was Java."
specialist
> No inheritance.
Yup. Prefer composition over inheritance.
Such a hard sell during the heyday of OOAD. UML, Fusion, RUP, blahblahblah.
Having previous experience with LISP, it just seemed obvious, barely worth mentioning. Definitely set me apart from my colleagues.
FWIW: Our local design pattern study group tackled Arthur J. Riel's Object-Oriented Design Heuristics [1996] (https://archive.org/details/objectorientedde0000riel, https://www.amazon.com/Object-Oriented-Design-Heuristics-Art...), which might be the earliest published pushback against all that overwrought enterprisey ThoughtWorks-style brainrot.
> No ... dependency injection
Yes, and: mocks (mocking?) are a code stench. Just flip the ownership relationship(s). As Riel, and surely many others, have laboriously explained to a mostly uncaring world.
jimmaswell
I've seen Java described as made for companies to be able to rotate out mediocre programmers as efficiently as possible without letting them mess things up easily, and it makes a lot of sense from that perspective. Barebones semantics to the point of being Spartan (can't even define your own operator overloads), easy to take another class and copy it with small modifications but not mess it up for anyone else (inheritance).
Then there's C# which most anyone who's enthusiastic about software dev will find far nicer to work with, but it's probably harder for bargain basement offshore sweatshops to bang their head against.
atomicnumber3
I really don't think this stance has aged well, even if it was closer to true way back when. IMO the spartan language is now Go, and Java has ended up the boring open-source workhorse. The JVM is very performant compared to many stacks these days (Python, Ruby, Node) while still having a very compelling concurrent programming story, and it has gained a lot of nice language features from 8 onwards. Lambdas and streams were Java 8's big ones, but I think virtual threads growing up, and even newer things like scoped values, are really compelling reasons to build a new thing in Java right now.
nradov
The lack of operator overloading is a bit annoying but in practice seldom a real problem. An operator is just a funny looking method. So what.
There are worse fundamental problems in Java. For example, the lack of a proper numeric tower. Or the need to rely on annotations to indicate something as basic as nullability.
SkiFire13
> easy to take another class and copy it with small modifications but not mess it up for anyone else (inheritance)
That sounds like a recipe for disaster though, as it generally makes code much harder to read.
ivan_gammel
I remember a time on one of the professional forums when there were lots of questions about architecture in the C# section and almost none in the Java section. An abundance of tools creates an abundance of possibilities to get confused about what's right. In Java, many design decisions converged on a dominant design a long time ago, so you no longer think about them and can focus on the business. Sometimes it's as bad as getter verbosity (thankfully record style is gaining traction), but in most cases it's just fine.
SkiFire13
Did you actually start to appreciate the same OOP that made class situations impossible to understand, or did you gradually switch to a simpler OOP, made up mostly of interfaces and classes that implement them (as opposed to extending other classes)?
In my experience OOP is actually pretty pleasant to work with if you avoid extending classes as much as possible.
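Concretely, a toy sketch of that style - implement interfaces and wrap, rather than extend:
interface Storage {
    void put(String key, byte[] value);
}
// Adds behaviour by composition: it wraps any Storage instead of
// extending a concrete class.
final class LoggingStorage implements Storage {
    private final Storage inner;
    LoggingStorage(Storage inner) { this.inner = inner; }
    @Override
    public void put(String key, byte[] value) {
        System.out.println("put " + key);
        inner.put(key, value);
    }
}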
> These kinds of niche optimizations are still significant. The OOP model allows them to be implemented with much less fanfare.
If you're referring to the optimization in the article posted, then I would argue an OOP model is not needed for it; just having encapsulation is enough.
Etheryte
I'm not sure if the argument that OOP is pleasant so long as you avoid any OOP is a very sturdy one.
gigatexal
> Did you actually start to appreciate the same OOP that made class situations impossible to understand, or did you gradually switch to a simpler OOP, made up mostly of interfaces and classes that implement them (as opposed to extending other classes)?
My thoughts exactly. Give me more classes with shallower inheritance hierarchies. This is where I think Go's approach makes sense.
neuroelectron
No. Things only get more complicated with more technical debt. The best you can do is manage it with yet another abstraction.
threeseed
You can use the JVM without needing to use OOP, e.g. Scala, Clojure, Python, JavaScript, Ruby, etc.
Then you get to benefit from Java's unparalleled ecosystem of enterprise-hardened libraries, monitoring, etc.
neuroelectron
It's not really your decision in a corporate environment.
bradhe
> no work required other than updating the JRE we use
Have you tried updating production usage of a JRE before??
znpy
Yes. I moved a few repositories from Java 8 up to Java 21.
Java 8 -> 9 is the largest source of annoyances; past that, it's essentially painless.
You just change a line (the version of the JRE) and you get a faster JVM with better GC.
And with ZGC nowadays garbage collection is essentially a solved problem.
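Opting in is literally one JVM flag (the jar name here is made up):
java -XX:+UseZGC -jar your-app.jar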
I worked on a piece of software serving almost 5 million requests per second on a single (albeit fairly large) box off a single JVM, and I was still seeing sub-millisecond GC pauses (~800 µs p99 stop-the-world pauses) despite the very high allocation rate (~60 GB/sec).
The JVM is a marvel of software engineering.
microflash
I have done it multiple times for different versions of Java with nominal effort. Of course, difficulty may vary depending on the project.
With projects like OpenRewrite [1] and good LLMs, things are a lot easier these days.
niuzeta
I'd love to hear more about this, especially the historical context, but I don't have good Java writeups/articles on this. Would you mind sharing some suggestions/pointers? I'd very much appreciate it.
stevoski
A good starting point is Joshua Bloch’s Effective Java. He shares some stories there from Java’s early days, and - at least in passing - mentions some aspects of the String class’s history.
niuzeta
Ah, I certainly remember those anecdotes! What other resources (even tidbits) would you recommend for more modern Java? Original articles like this one should be treasured.
wging
String compression was one. tl;dr: the JVM supports Unicode for strings, but uses 1-byte chars for strings where possible (previously it was UTF-16), even though it's not actually doing UTF-8.
Depending on what sort of document you're looking for, you might like either the JEP: https://openjdk.org/jeps/254
or Shipilev's slides (pdf warning): https://shipilev.net/talks/jfokus-Feb2016-lord-of-the-string...
Shipilev's website (https://shipilev.net/#lord-of-the-strings), and links from the JEP above to other JEPS, are both good places to find further reading.
(I think I saw a feature article about the implementation of the string compression feature, but I'm not sure who wrote it or where it was, or if I'm thinking about something else. Actually I think it might've been https://shipilev.net/blog/2015/black-magic-method-dispatch/, despite the title.)
niuzeta
Absolutely love it. Thanks a lot. A fancy hit me yesterday, and I've been looking through the JDK's String commit history for little tidbits to grab.
Shipilev's website looks like a fascinating resource. I appreciate the pointer!
cempaka
This is a good video that goes over a ton of the optimizations, especially around concatenation: https://youtu.be/z3yu1kjtcok?si=mOdZ5sh5rp8LNyap
niuzeta
I appreciate it! I will take a look this weekend.
DaiPlusPlus
> Would you mind sharing some suggestions/pointers?
I would, but unfortunately I got a NullPointerException.
I suggest you try Rust instead; its borrow checker will ensure you can't share pointers in an unsafe manner.
gf000
I know it's a tongue-in-cheek reply, but...
You can't share "pointers" in an unsafe manner in Java. Even data races are completely memory safe.
paulddraper
The flip side is that for years Java developers have been dealing with suboptimal strings with nothing they could do about it.
gavinray
This post makes mention of a new JEP I hadn't heard of before: "Stable Values"
https://cr.openjdk.org/~pminborg/stable-values2/api/java.bas...
I don't understand the functional difference between the suggested StableValue and Records, or Value Classes.
They define a StableValue as:
> "A stable value is a holder of contents that can be set at most once."
Records were defined as:
> "... classes that act as transparent carriers for immutable data. Records can be thought of as nominal tuples."
And Value Objects/Classes as:
> "... value objects, class instances that have only final fields and lack object identity."
Both Records and Value Objects are immutable, and hence can only have their contents set upon creation or static initialization.
layer8
Record fields cannot be lazily initialized. The point of StableValue is lazy initialization, meaning that their value is stable if and only if they carry a non-default value (i.e. after initialization). If you don’t need lazy initialization, you can just use a regular final field. For a non-final field, without StableValue the JIT optimizer can’t tell if it is stable or not.
The implementation of a value object will be able to use StableValue internally for lazy computation and/or caching of derived values.
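In sketch form, using the draft API from the link above (orElseGet naming per that draft; java.util.logging just stands in for whatever you'd lazily create):
import java.util.logging.Logger;
class OrderService {
    // The field itself is final, so the JIT can trust the reference;
    // the content is set at most once, on first access.
    private final StableValue<Logger> logger = StableValue.of();
    Logger getLogger() {
        // Computed on the first call, treated as a constant afterwards.
        return logger.orElseGet(() -> Logger.getLogger("orders"));
    }
}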
whartung
I don't know, these are mostly uninformed guesses, but the distinction between Records and Value objects is that the contents lack object identity.
Which, to me, means, potentially, two things.
One, that the JVM can de-dup "anything", like, in theory, it can with Strings now. VOs that are equal are the same, rather than relying on object identity.
But, also, two, it can copy the contents of the VO to consolidate them into a single unit.
Typically, Java Objects and records are blobs of pointers. Each field pointing to something else.
With Value Objects that may not be the case. Instead of acting as a collection of pointers, a VO with VOs in it may be more like a C struct containing structs itself -- a single, contiguous block of memory.
So, an Object is a collection of pointers. A Record is a collection of immutable pointers. A Value Object is (may be) a cohesive, contiguous block of memory to represent its contents.
sagacity
Handwavy explanation: A stable value is used as a static constant, with the difference being that you can initialize it at runtime. Once initialized it is treated as fully constant by the JVM. It's similar to something like lateinit in Kotlin, except on the JVM level.
Records are also immutable, but you can create them and delete them throughout your application like you would a regular class.
w10-1
> used as a static constant
Yes, but remind people it's not static in the sense of being associated with the class, nor constant for compile-time purposes.
Perhaps better to say: A stable value is lazy, set on first use, resulting in pre- and post- initialization states. The data being set once means you cannot observe a data change (i.e., appears to be immutable), but you could observe reduction in resource utilization when comparing instances with pre-set or un-set values -- less memory or time or other side-effects of value initialization.
So even if data-immutable, a class with a stable value ends up with behavior combinations of two states for each stable value. Immutable records or classes without stable values have no such behavior changes.
But, writ large, we've always had this with the JVM's hotspot optimizations.
For String, it becomes significant whether hashcode is used when calculating equals (as a fast path to negative result). If not, one would have two equal instances that will behave differently (though producing the same data), at least for one hashcode-dependent operation.
owlstuffing
Right. Oracle should reconsider the naming here: stable -> lazy
gavinray
But this is also achievable with static init methods on records and value classes, right?
record Rational(int num, int denom) {
    Rational {
        int gcd = gcd(num, denom);
        num /= gcd;
        denom /= gcd;
    }
}
sagacity
How would you do the same thing with the late initialization of, say, a HashMap where you don't control the constructor?
pkulak
So every time you take the hash of a string you leak 4 bytes of memory???
I assume it's static in the context of its containing object. So it will be collected when its string is collected.
nimrody
No. The string hash is stored as part of the String object. It is initialized to 0 but gets set to the real hash of the string on first call to hashCode()
(which is why it will be computed over and over again if your special string happens to hash to 0)
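The idiom looks roughly like this (simplified from older versions of java.lang.String; the wrapper class is mine):
final class SketchString {
    private final char[] value;
    private int hash; // defaults to 0, meaning "not computed yet"
    SketchString(char[] value) { this.value = value; }
    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) { // 0 doubles as the sentinel, hence the hash-to-0 quirk
            for (char c : value)
                h = 31 * h + c;
            hash = h;
        }
        return h;
    }
}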
leksak
> It's similar to something like lateinit in Kotlin, except on the JVM level.
What level are you suggesting lateinit happens at if not on the JVM?
Tmpod
I assume they mean this feature is built into the JVM itself, whereas Kotlin's lateinit more or less "just" desugars into code you could otherwise write yourself.
drob518
A stable value, as I understand it, is either not-yet-computed or is constant. In other words, once computed it’s constant and the JIT can therefore treat it as constant.
ajkjk
It's a much-needed idea but... such an awkward way to do it. Only Java would be okay with an actual language feature using words like "orElseGet". And personally I suspect there's an inverse correlation between feature usage and how semantically awkward it is... it just feels like a mistake to even consider using an inelegant feature unless it's out of complete necessity.
It should really be something like
public stable logger = () -> new Logger(/* .. */).
Where the JDK hides the details of making sure the value is only created once, basically like the class-holder idiom but under the hood. I'm sure there are reasons why they're not doing it that way, but ... it's definitely what the language needs to be able to do.
Incidentally, I've always appreciated how Python PEPs list all of the obvious complaints about an issue and explain methodically why each was determined not to work. The JEPs don't seem to reach quite the same candor.
_old_dude_
In Java, constants are declared as static final:
static final Complex CONSTANT = new Complex(1, 2);
If you want a lazily initialized constant, you want a stable value:
static final StableValue<Complex> STABLE_VALUE = StableValue.of();
Complex getLazyConstant() {
    return STABLE_VALUE.orElseGet(() -> new Complex(1, 2));
}
If you want the fields of the constant to be constant too, Complex has to be declared as a record.
elric
- StableValue is about defining when and how to lazily initialize a field, with strong "exactly-once" semantics. The kinds of things we would do with safe double-checked locking before, but more convenient. Using a record doesn't address that, though you could certainly put a record in a StableValue.
Additionally, having to define a record FooHolder(Foo foo) simply to hold a Foo would be a lot more cumbersome than just saying StableValue<Foo> fooHolder = StableValue.of(); There's no need for an extra type.
- Value classes have no identity, which means they can't have synchronized methods and don't have an object monitor. While it would be possible to store a value object inside a StableValue, there are plenty of use cases for an identity object inside a StableValue, such as the Logger example inside the JEP: one could easily imagine a fictional logger having a synchronized method to preserve ordering of logs.
I wouldn't say these are all entirely orthogonal concerns, but they are different concepts with different purposes.
int_19h
I'm rather surprised that string hashes aren't randomized in JVM - or at least that's what the article implies. These days, it's a fairly common mitigation for DDoS attacks based on hash collisions. Does Java not do it to preserve backwards compatibility because too much existing code relies on a specific hashing algorithm?
ncruces
The hashing algorithm is specified in the docs:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
Per Hyrum's law, there's no changing it now.
https://docs.oracle.com/javase/8/docs/api/java/lang/String.h...
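In code form:
int h = 0;
for (int i = 0; i < s.length(); i++)
    h = 31 * h + s.charAt(i); // == s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]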
ivan_gammel
The Java hashCode contract is about optimizing hash calculation for performance, not for collision-search resistance. Its sole purpose is use in collections. It must not be used in situations where you need cryptographic properties.
tialaramex
So, the problem here is that you're thinking of "cryptographic properties" as only the "cryptographic hashes" such as those built with the Merkle–Damgård construction (SHA-512/256 being the one you might reasonably pick today)
But, it's actually desirable to have some cryptographic properties in a faster one way function for making hash tables. Read about SipHash to see why.
Because Java didn't (and as others have discussed, now can't) choose otherwise the hash table structures provided must resist sabotage via collision, which isn't necessary if your attacker can't collide the hash you use.
jillyboel
Java hashcodes are just 4 bytes; there will always be collisions.
ivan_gammel
There's no reason to add extra properties and associated complexity to the hash used in collections because of one exotic use case. To be able to execute hash flooding, the server must accumulate arbitrary user inputs in a hash map, which is problematic design anyway. Even if it made sense, what kind of function could work as a 32-bit hash? You mention SipHash as an example: it is a poor example anyway, because this function requires a secret key - meaning Java would have to do the key management behind the scenes (just for strings? for Number subclasses?) or impose key management on API users. What's the point? The case is so rare that it's easier for developers of vulnerable applications to use a wrapper with the desired hash properties.
andrewaylett
Even in collections, an unstable hash is desirable -- to avoid denial of service attacks caused by attacker-controlled hash collisions.
For example, Python back in 2012: https://nvd.nist.gov/vuln/detail/cve-2012-1150.
pfdietz
Or the hash table implementation should be resistant to collisions, falling back to another data structure in that case (as described below, using trees instead of lists in buckets with sufficiently many occupants.)
ivan_gammel
There exists CVE-2012-5373 for Java, and it is not fixed because it is not considered a risk worth taking care of.
GolDDranks
The thing is, even a hash used just in collections can lead to DoS, if the attacker can control the string contents and selectively choose strings so that your hash table stops being a hash table and turns into a linked list.
ivan_gammel
That's clear, and it is not a reason to have it in general-purpose collections or String::hashCode. If your app is vulnerable to this sort of attack, just use a wrapper for keys and a specialized collection (you may want to limit its maximum size too).
sedatk
Java uses a tree instead of a linked list for collided items, so search performance degrades more gracefully (O(log N) instead of O(N)).
77
To be more precise, Java initially uses a linked list for nodes within a bin. If the number of items inside the bin crosses TREEIFY_THRESHOLD (which is 8), then that specific bin is converted into a RB tree.
This is detailed in implementation notes comment here: https://github.com/openjdk/jdk/blob/56468c42bef8524e53a929dc...
jillesvangurp
Depends on which Map / Set implementation you use. There are multiple, each with different and interesting properties.
jillesvangurp
No, it would break the semantics of equals and hashCode, which dictate that two objects that are equal must also have the same hash code. So hash codes for objects must be deterministic, which in turn is an important property for sets, hash tables, etc. It's unlikely to ever be changed because it would break an enormous amount of stuff.
For things that need to be secure, there are dedicated libraries, standard APIs, etc. that you probably should be using. For everything else, this is pretty much a non issue that just isn't worth ripping up this contract for. It's not much of an issue in practice and easily mitigated by just picking things that are intended for whatever it is you are trying to do.
xmcqdpt2
Languages that choose to fix this problem at the hash code level have hash codes that are still deterministic within a given program execution. They are not deterministic between executions, cf
https://docs.python.org/3/reference/datamodel.html#object.__...
yxhuvud
Nothing stops the JVM from caching hashes even if the hashes are unique per process invocation.
PaulHoule
They aren't, and it's quite unfortunate that the empty string hashes to 0, so it will have to be recomputed every time - although presumably it is quick to compute the hash of the empty string.
theanonymousone
It seems stringly typing is not as bad an anti-pattern as it used to be :)
This will be very impactful work; I'm excited to see it. Probably even a 1% improvement in String::hashCode will have an impact on the global carbon footprint or so.
daxfohl
Cool that we're still seeing perf improvements after all these years! I'm confused by some of the details in the example. Like, would we see similar 8x improvement in a simpler example like a string hashset lookup? Is there something special about MethodHandle or immutable maps here that accentuates the improvement?
> Computing the hash code of the String “malloc” (which is always -1081483544)
Makes sense. Very cool.
> Probing the immutable Map (i.e., compute the internal array index which is always the same for the malloc hashcode)
How would this work? "Compute" seems like something that would be unaffected by the new attribute. Unless it's stably memoizing, but then I don't quite see what it would be memoizing here: it's already a hash map.
> Retrieving the associated MethodHandle (which always resides on said computed index)
Has this changed? Returning the value in a hash map once you've identified the index has always been zero overhead, no?
> Resolving the actual native call (which is always the native malloc() call)
Was this previously "lazyinit" also? If so, makes sense, though would be nice if this was explained in the article.
twic
> How would this work? "Compute" seems like something that would be unaffected by the new attribute.
The index is computed from the hashcode and the size of the array. Now that the hash code can be treated as a constant, and the size of the array is already a constant, the index can be worked out at compile time. The JVM can basically inline all the methods involved in creating and probing the map, and eliminate it entirely.
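Something like the article's setup, sketched with the FFM API (the class name and the exact malloc descriptor are my guesses):
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.invoke.MethodHandle;
import java.util.Map;
import static java.lang.foreign.ValueLayout.ADDRESS;
import static java.lang.foreign.ValueLayout.JAVA_LONG;
class NativeCalls {
    private static final Linker LINKER = Linker.nativeLinker();
    // Immutable map + constant key + @Stable String.hash:
    // every step of the lookup below is a candidate for folding.
    private static final Map<String, MethodHandle> HANDLES = Map.of(
        "malloc", LINKER.downcallHandle(
            LINKER.defaultLookup().find("malloc").orElseThrow(),
            FunctionDescriptor.of(ADDRESS, JAVA_LONG)));
    static MethodHandle mallocHandle() {
        return HANDLES.get("malloc"); // constant-foldable on JDK 25
    }
}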
daxfohl
The bucket is computed from the hash code and the size of the array, but that's not necessarily the index. If there are no bucket collisions, then index==bucket and this works out. But if there are bucket collisions then the index will be different from the bucket. So you still need some computation IIUC. And there's no way to memoize that result, since memoization would require a hashmap that has the exact same characteristics as the original hashmap.
I guess a @Stable attribute on the array underlying the map would allow for the elimination of one redirection: in a mutable map the underlying array can get resized so its pointer isn't stable. With an annotated immutable map it could be (though IDK whether that'd work with GC defrag etc). But that seems like relatively small potatoes? I don't see a way to "pretend the map isn't even there".
vault
Why is a website about programming not writing code blocks in ASCII? I'm referring to the curly quotes and other undesirable symbols.
ashvardanian
Has anyone done/shared a recent benchmark comparing JNI call latency across Java runtimes? I’m exploring the idea of bringing my strings library to the JVM ecosystem, but in the past, JNI overhead has made this impractical.
cempaka
Java has replaced JNI with Project Panama's FFM API, which depending on your use case might perform quite a bit better than JNI used to. The Vector API is stuck in incubator and still a bit rough around the edges though, so SIMD might be a bit trickier.
throwaway2037
Can you share a link to your "strings library"? I am curious about what it can do that a Java String cannot.
ashvardanian
At this point, it doesn’t provide much novel functionality, but it should be faster than the standard libraries of most (or maybe all) programming languages.
delusional
The example is entirely unconvincing. Why would you store those calls in a map and not just a variable?
Even if the map is crucial for some reason, why not have the map take a simple value (like a uint64) and require the caller to convert their string into a slot before looking up the function pointer? That way the cost of converting the string becomes obvious to the reader of the code.
I struggle to find a use case where this would optimize good code. I can think of plenty of bad code usecases, but are we really optimizing for bad code?
koolba
> I struggle to find a use case where this would optimize good code. I can think of plenty of bad code usecases, but are we really optimizing for bad code?
The most common such usage in modern web programming is storing and retrieving a map of HTTP headers, parsed query parameters, or deserialized POST bodies. Every single web app, which arguably is most apps, would take advantage of this.
delusional
> storing and retrieving a map of HTTP headers.
I don't have the profiling data for this, so this is pure theoretical speculation. But by the time you're shoving HTTP headers - dynamic data that has to be read at runtime - into heap-allocated data structures inside the request handling, it kinda feels like doing a little XOR over your characters is a trivial computation.
I don't envision this making any meaningful difference to those HTTP handlers, because they were written without regard for performance in the first place.
joejev
Isn't the entire point of an optimizer to convert "bad code" into "good code"?
Your proposed solution is to have the user manually implement a hash table, but if you have a good optimizer, users can focus on writing clear code without bugs or logic errors and let the machine turn that into efficient code.
gf000
I think that's a pretty myopic view - this exact case might not appear as-is, but it might readily appear in completely sane and common code after inlining a few things and other optimisations.
f33d5173
>but are we really optimizing for bad code?
Yes
jbverschoor
So strings don't get hash codes at compile time?
At first I thought the article was describing something similar to Ruby’s symbols
chii
> So strings don’t get hash code at compile time?
only strings that are known at compile time could possibly be compile-time hashed?
But the article is talking about strings in a running program. The performance improvements can apply to strings that are constants but are created at run time.
layer8
It’s a bit unfortunate that the user code equivalent (JEP 502) comes at the cost of an extra object per “stable” field. Lazy initialization is often motivated by avoiding creating an object up-front, but with this new pattern you’ll have to create one anyway.
elric
Well, no. The JVM makes the wrapper object disappear. One of the design drivers for StableValue was performance.
I mean the developer has to create the StableValue field, but its access is optimized away.
layer8
After JIT compilation, yes (presumably), but for interpreted byte code I assume that a regular object is still allocated.
elric
This is addressed in the JEP:
> There is, furthermore, mechanical sympathy between stable values and the Java runtime. Under the hood, the content of a stable value is stored in a non-final field annotated with the JDK-internal @Stable annotation. This annotation is a common feature of low-level JDK code. It asserts that, even though the field is non-final, the JVM can trust that the field’s value will not change after the field’s initial and only update. This allows the JVM to treat the content of a stable value as a constant, provided that the field which refers to the stable value is final. Thus the JVM can perform constant-folding optimizations for code that accesses immutable data through multiple levels of stable values, e.g., Application.orders().getLogger().
> Consequently, developers no longer have to choose between flexible initialization and peak performance.
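i.e., conceptually something like this (paraphrasing the JEP; not the exact JDK source):
import jdk.internal.vm.annotation.Stable;
final class StableValueImpl<T> {
    @Stable
    private T contents; // non-final, but trusted to change at most once
}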
cempaka
The general thinking around this is that if you're still in interpreted code, you're not in the kind of context where the cost of an extra object allocation is worth worrying about.
Traubenfuchs
This all sounds very hard for me to grasp. Does this only work with Map.of? Would it also work with map.put?
What would be the performance improvement in average Java services?
Are there specific types of applications that would benefit a lot?
Does this make string.intern() more valuable? String caches?
amiga386
> Does this only work with Map.of? Would it also work with map.put?
It would be faster, but not as blindingly fast. Combined with an immutable map, what it means is that the JVM can directly replace your key with its value, as if the map is not even there - because the key's hashcode won't ever change, and the map won't ever change.
> Does this make string.intern() more valuable?
No, String.intern() does a different job: it's there to save you memory - if you know a string (e.g. an attribute name in an XML document) is used billions of times and parsed out of a stream, but you only want one copy of it and not a billion copies. The downside is that it puts the string into PermGen, which means if you start interning normal strings, you'll run out of memory quickly.
daxfohl
How would it directly replace your key with its value? What if there are bucket collisions? Do immutable maps expand until there aren't any? Moreover, what if there are hash key collisions? There needs to be some underlying mechanism to deal with these, I'd think. I don't see how replace-like-the-map-isn't-there could work. Or even how "@Stable" could be used to affect it. Would love to understand more deeply.
amiga386
> How would it directly replace your key with its value?
In the same way that if you wrote this C code:
const int x[] = {20, 100, 42};
int addten(int idx) { return x[idx] + 10; }
the C compiler would "just know" that anywhere you wrote x[2], it could substitute 42, because you signalled with the "const" that these values will never change. It could even replace addten(2) with 52 and not even make the call to addten(), or do the addition.
The same goes for Java's value-based classes: https://docs.oracle.com/en/java/javase/17/docs/api/java.base...
But it's a bit more magical than C, because _some_ code runs to initialise the value, and then once it's initialised, there can be further rounds of code compilation or optimisation, where the JVM can take advantage of knowing these objects are plain values that can participate in things like constant folding, constant propagation, dead-code elimination, and so on. And with @Stable it knows that if a function has been called once and didn't return zero, it can memoise it.
> What if there are bucket collisions? Do immutable maps expand until there aren't any? Moreover, what if there are hash key collisions?
I don't know the details, but you can't have an immutable map until it's constructed, and if there are problems with the keys or values, it can refuse to construct one by throwing a runtime exception instead.
Immutable maps make a lot of promises -- https://docs.oracle.com/en/java/javase/17/docs/api/java.base... -- but for the most part they're normal HashMaps that are just making semantic promises. They make enough semantic promises internally to the JVM that it can constant fold them, e.g. with x = Map.of(1, "hello", 2, "world") the JVM knows enough to replace x.get(1) with "hello" and x.get(2) with "world" without needing to invoke _any_ of the map internals more than once.
What wasn't working until now was strings as keys, because the JVM didn't see the String.hash field as stable. Now it does, and it can constant fold _all_ the steps, meaning you can also have y = Map.of("hello", 1, "world", 2) and the JVM can replace y.get("hello") with 1
Traubenfuchs
How does the JVM know the map is immutable?
But interned strings can also reuse their hashcode forever.
amiga386
Map.of() promises to return an immutable map. new HashMap<>() does not.
https://docs.oracle.com/en/java/javase/17/docs/api/java.base...
How it tells the JVM this? It uses the internal annotation @jdk.internal.ValueBased
https://github.com/openjdk/jdk/blob/jdk-17-ga/src/java.base/...
daxfohl
> Does this make string.intern() more valuable?
Probably depends on the use case, though I'm having trouble thinking of such a use case. If you were dynamically creating a ton of different sets that had different instances of the same strings, then, maybe? But then the overhead of calling `.intern` on all of them would presumably outweigh the overhead of calling `.hash` anyway. In fact, now that `.hash` is faster, that could ostensibly make `.intern` less valuable. I guess.
smcin
Notwithstanding HN's guideline to preserve the original title and not editorialize in any way, this would be less misleading if it had been titled 'Java Strings Just Got Faster in JDK 25'.