Tail Call Recursion in Java with ASM (2023)
48 comments
· March 30, 2025
fsckboy
ekimekim
In a weird way it kinda reminds me of `exec` in sh (which replaces the current process instead of creating a child process). Practically, there's little difference between these two scripts:
#!/bin/sh
foo
bar
vs
#!/bin/sh
foo
exec bar
And you could perhaps imagine a shell that does "tail process elimination" to automatically perform the latter when you write the former.
But the distinction can be important due to a variety of side effects, and if you could only achieve it through carefully following a pattern that the shell might or might not recognize, that would be very limiting.
nagaiaida
this is pretty much exactly how my "forth" handles tail call elimination, and it's the main thing that's added the quotes so far since it shifts the mental burden to being aware of this when writing code to manipulate the return stack.
as you imply towards the end, i'm not confident this is a trick you can get away with as easily without the constraints of concatenative programming to railroad you into it being an easily recognizable pattern for both the human and the interpreter.
LeFantome
One of the issues with Java is that it has two levels of language. You compile Java into Java bytecode, which is further compiled into native machine code. There is no concept of tail calls in Java bytecode, so it is difficult to propagate the semantics. It really has to be a programmer or compiler optimization that implements the tail call elimination in the generated intermediate bytecode before that is further compiled.
.NET is an interesting contrast. The equivalent of Java bytecode in .NET (CIL) does have the concept of tail calls. This allows a functional language like F# to be compiled to the intermediate form without losing the tail call concept. It is still up to the first-level compiler though. C#, for example, does not support tail calls even though its intermediate target (CIL) does.
ghoul2
Sigh. I have been kicking this horse forever as well: an "optimization" implies just a performance improvement.
Tail call elimination, if it exists in a language, allows coding certain (even infinite) loops as recursion - making loop data flow explicit, easier to analyze, and at least in theory, easier to vectorize/parallelize, etc
But if a language/runtime doesn't do tail call elimination, then you CAN'T code up loops as recursion, as you would be destroying your stack. So the WAY you code it, the way you structure it, must be different.
It's NOT an optimization.
I have no idea who even came up with that expression.
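To make the point above concrete, here is a minimal Java sketch. The class and method names (TailSumSketch, sumRecursive, sumLoop) are made up for illustration. The first method writes the loop as tail recursion; the second is roughly what tail call elimination would turn it into. On a JVM without tail call elimination, the recursive form still uses one stack frame per call, so it is only safe for small inputs.

```java
// Hypothetical example; names are illustrative, not from the article.
final class TailSumSketch {
    // Tail-recursive form: the recursive call is the last action of the method.
    // Without tail call elimination each call still gets its own stack frame,
    // so a very large n is expected to exhaust the stack on a typical JVM.
    static long sumRecursive(long n, long acc) {
        if (n == 0) {
            return acc;
        }
        return sumRecursive(n - 1, acc + n);
    }

    // Roughly what tail call elimination would produce: the parameters are
    // reassigned and control jumps back to the top, in constant stack space.
    static long sumLoop(long n, long acc) {
        while (n != 0) {
            acc = acc + n;
            n = n - 1;
        }
        return acc;
    }
}
```

Both methods compute the same value; the difference is only whether the input size is bounded by the available stack, which is why the choice affects how you have to structure the code.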
ameliaquining
I mean, in the particular case demonstrated in this blog post it can only be an optimization, because semantically guaranteeing it would require language features that Java doesn't have.
bradley13
Every compiler should recognize and optimize for tail recursion. It's not any harder than most other optimizations, and some algorithms are far better expressed recursively.
Why is this not done?
SkiFire13
In general, tail recursion destroys stacktrace information, e.g. if f calls g which tail calls h, and h crashes, you won't see g in the stacktrace, and this is bad for debuggability.
In lower level languages there are also a bunch of other issues:
- RAII can easily make functions that appear in a tail position not actually tail calls, due to destructors implicitly running after the call;
- there can be issues when reusing the stack frame of the caller, especially with caller-cleanup calling conventions;
- the compiler needs to prove that no pointers to the stack frame of the function being optimized have escaped, otherwise it would be reusing the memory of live variables which is illegal.
chowells
I'll believe destroying stacktrace information is a valid complaint when people start complaining that for loops destroy the entire history of previous values the loop variables have had. Tail recursion is equivalent to looping. People should stop complaining when it gives them the same information as looping.
roenxi
> I'll believe destroying stacktrace information is a valid complaint when people start complaining that for loops destroy the entire history of previous values the loop variables have had.
That is a common criticism. You're referring to the functional programmers. They would typically argue that building up state based on transient loop variables is a mistake. The body of a loop ideally should be (at the time any stack trace gets thrown) a pure function of constant values and a range that is being iterated over while being preserved. That makes debugging easier.
ameliaquining
I mean, if I were doing an ordinary non-recursive function call that just happened to be in tail position, and it got eliminated, and this caused me to not be able to get the full stack trace while debugging, I might be annoyed.
In a couple languages I've seen proposals to solve this problem with a syntactic opt-in for tail call elimination, though I'm not sure whether any mainstream language has actually implemented this.
vbezhenar
Some of the issues are partially alleviated by using a limited form of tail recursion optimization. You mark a function with a tailrec keyword, and the compiler verifies that this function calls itself as the last statement. You also wouldn't expect a complete stack trace from that function. At the same time, it probably helps with 90% of the recursive algorithms that would benefit from tail recursion.
LeFantome
That is what Clojure does I believe.
hyperbrainer
AFAIK Zig is the only somewhat-big and known low-level language with TCO. Obviously, Haskell/OCaml and the like support it and are decently fast too, but systems programming languages they are not.
vlovich123
For guarantee:
https://crates.io/crates/tiny_tco
As an optimization my understanding is that GCC and LLVM implement it so Rust, C, and C++ also have it implicitly as optimizations that may or may not apply to your code.
But yes, Zig does have a formal language syntax for guaranteeing that tail calls happen at the language level (which I agree is the right way to expose this optimization).
SkiFire13
Zig's tco support is not much different than Clang's `[[clang::musttail]]` in C++. Both have the big restriction that the two functions involved are required to have the same signature.
Thorrez
C++:
> All current mainstream compilers perform tail call optimisation fairly well (and have done for more than a decade)
https://stackoverflow.com/questions/34125/which-if-any-c-com... (2008)
johnisgood
Depends on what you mean by "systems programming", you can definitely do that in OCaml.
pjmlp
"Unix system programming in OCaml"
https://ocaml.github.io/ocamlunix/ocamlunix.html
MirageOS
House OS,
https://programatica.cs.pdx.edu/House/
Just saying.
vlovich123
My bigger issue with tail call optimization is that you really want it to be enforceable since if you accidentally deoptimize it for some reason then you can end up blowing up your stack at runtime. Usually failure to optimize some pattern doesn’t have such a drastic effect - normally code just runs more slowly. So tail call is one of those special optimizations you want a language annotation for so that if it fails you get a compiler error (and similarly you may want it applied even in debug builds).
_old_dude_
Parroting something I heard at a Java conference several years ago: tail recursion removes stack frames, but the security model is based on stack frames, so it has to be a JVM optimization, not a compiler optimization.
I've no idea if this still holds once the security manager is removed.
smarks
The security manager was removed (well, “permanently disabled”) in Java 24. As you note, the permissions available at any given point can depend on the permissions of the code on the stack, and TCO affects this. Removal of the SM thus removes one impediment to TCO.
However, there are other things still in the platform for which stack frames are significant. These are referred to as “caller sensitive” methods. An example is Class.forName(). This looks up the given name in the classloader of the class that contains the calling code. If the stack frames were shifted around by TCO, this might cause Class.forName() to use the wrong classloader.
No doubt there are ways to overcome this — the JVM does inlining after all — but there’s work to be done and problems to be solved.
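As a rough illustration of why stack frames matter to caller-sensitive code, the sketch below uses the standard StackWalker API (Java 9+) as a stand-in. The class names CallerSensitiveSketch and LibrarySketch are hypothetical, and this is not how Class.forName() is implemented internally; it only shows a method whose result depends on which frame called it.

```java
// Hypothetical example; StackWalker stands in for a caller-sensitive JDK method.
final class CallerSensitiveSketch {
    private static final StackWalker WALKER =
            StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE);

    // The result depends on who called this method, not on its arguments.
    static Class<?> whoCalledMe() {
        return WALKER.getCallerClass();
    }
}

final class LibrarySketch {
    // The call below is in tail position. If this frame were discarded before
    // the callee ran, a naive stack walk would no longer see LibrarySketch as
    // the immediate caller and would report whoever called lookup() instead.
    static Class<?> lookup() {
        return CallerSensitiveSketch.whoCalledMe();
    }
}
```

On a current JVM, LibrarySketch.lookup() returns LibrarySketch.class. Any tail call elimination in the platform would have to keep that answer stable.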
thfuran
Is there? As you say, there's already inlining, and I don't see how tco presents a harder case for that.
javier2
In theory, if all you do is implement algorithms, this sounds fine. But most apps implement horrible business processes, so what would one do with missing stacktraces? Maybe in languages that can mark functions as pure.
1932812267
Scala has been using this technique for years with its scala.annotation.tailrec annotation. Regardless, it's cool to see this implemented as a bytecode pass.
gavinray
Kotlin as well, with the "tailrec" keyword, e.g. "tailrec fun fibonacci()"
https://kotlinlang.org/docs/functions.html#tail-recursive-fu...
Kotlin also has a neat other tool, "DeepRecursiveFunction<T, R>" that allows defining deep recursion that is not necessarily tail-recursive.
Really useful if you wind up with a problem that is most cleanly solved with mutual recursion or similar:
https://kotlinlang.org/api/core/kotlin-stdlib/kotlin/-deep-r...
deepsun
Interesting, does it depend on the Kotlin compiler, or can it be implemented in Java as well?
gavinray
The "DeepRecursiveFunction<T,R>" could be implemented in Java. The Kotlin implementation leverages Kotlin's native coroutines and uses continuations.
It'd require a bit of engineering to get something working in native Java I'd imagine, even with the new JDK Structured Concurrency API offering you a coroutines alternative.
On the other hand, "tailrec" is a keyword and implemented as a compiler optimization.
The closest I've seen in Java is a neat IntelliJ plugin that has a transformation to convert recursive method calls into imperative loops with a stack frame.
This transformation and resulting tool was the result of someone's thesis, it's pretty cool:
https://github.com/andreisilviudragnea/remove-recursion-insp...
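For readers unfamiliar with the general idea of replacing recursion with a loop and an explicit stack, here is a small hand-written Java sketch. It is not the plugin's output, and the names (ExplicitStackSketch, Node, sumRecursive, sumIterative) are invented for illustration. The recursion here is not tail recursion, so it cannot become a plain loop; the pending work goes on a heap-allocated Deque instead of the call stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical example; not taken from the linked plugin.
final class ExplicitStackSketch {
    static final class Node {
        final int value;
        final Node left;
        final Node right;

        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Recursive version: stack depth grows with the height of the tree.
    static long sumRecursive(Node node) {
        if (node == null) {
            return 0;
        }
        return node.value + sumRecursive(node.left) + sumRecursive(node.right);
    }

    // Loop version: nodes still to be visited live on an explicit stack.
    static long sumIterative(Node root) {
        long total = 0;
        Deque<Node> pending = new ArrayDeque<>();
        if (root != null) {
            pending.push(root);
        }
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            total += node.value;
            // ArrayDeque rejects null elements, so children are checked first.
            if (node.left != null) {
                pending.push(node.left);
            }
            if (node.right != null) {
                pending.push(node.right);
            }
        }
        return total;
    }
}
```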
ncruces
It's been a long time since I've messed with Java bytecode [1], but shouldn't the private method call use INVOKESPECIAL?
In general I don't think you can do this to INVOKEVIRTUAL (or INVOKEINTERFACE) as it covers cases where your target is not statically resolved (virtual/interface calls). This transformation should be limited to INVOKESTATIC and INVOKESPECIAL.
You also need lots more checks to make sure you can apply the transformations, like ensure the call site is not covered by a try block, otherwise this is not semantics preserving.
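A small sketch of the try-block concern, with made-up names (TryBlockSketch, descend, cleanupsFor): the recursive call is the last statement of the method, yet every frame still owes a finally block after it returns or throws. Rewriting the call as a jump would run the cleanup once instead of once per level, which changes behavior.

```java
// Hypothetical example showing a call that only looks like a tail call.
final class TryBlockSketch {
    private static int cleanupsRun;

    static void descend(int n) {
        try {
            if (n == 0) {
                throw new IllegalStateException("bottom reached");
            }
            descend(n - 1); // last statement, but the finally block runs after it
        } finally {
            cleanupsRun++;
        }
    }

    // Resets the counter, runs the recursion, and reports how many finally
    // blocks executed while the exception unwound the stack.
    static int cleanupsFor(int depth) {
        cleanupsRun = 0;
        try {
            descend(depth);
        } catch (IllegalStateException expected) {
            // The exception is part of the demonstration.
        }
        return cleanupsRun;
    }
}
```

With a depth of 3 there are four active frames (n = 3, 2, 1, 0), so four cleanups run.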
lukaslalinsky
I never understood the need for tail recursion optimization in imperative languages. Sure, you need it in FP if you don't have loops and recursion is your only option, but what is the benefit of recursive algorithms that could benefit from tail optimization (i.e. recursive loops) in a language like Java?
cempaka
Very nice article demonstrating a neat use of ASM bytecode. The Java language devs are also working on Project Babylon (code reflection), which will bring additional techniques to manipulate the output from the Java compiler: https://openjdk.org/projects/babylon/articles/code-models
gavinray
This was delivered in JDK 24 as the "Class-File API"
algo_trader
Can this improve/replace AspectJ and similar instrumentations? We do lots of instruction level modifications
droideqa
Cool, now ABCL can have TCO!
1932812267
This isn't a _general_ tail call optimization--just tail recursion. The issue is that this won't support mutual tail recursion.
e.g.:
(defun func-a (x) (func-b (- x 34)))
(defun func-b (x) (cond ((<= 0 x) x) ('t (func-a (- x 3)))))
Because func-a and func-b are different (JVM) functions, you'd need an inter-procedural goto (i.e. a tail call) in order to natively implement this.
As an alternative, some implementations will use a trampoline. func-a and func-b return a _value_ which says what function to call (and what arguments) for the next step of the computation. The trampoline then calls the appropriate function. Because func-a and func-b _return_ instead of actually calling their sibling, the stack depth is always constant, and the trampoline takes care of the dispatch.
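A minimal Java sketch of the trampoline idea described above. All names (TrampolineSketch, Step, run, isEven, isOdd) are hypothetical, and isEven/isOdd are used in place of func-a/func-b just to have a pair that terminates for any non-negative input. Each function returns a Step, which is either a finished value or a thunk for the next call, and the run loop does the dispatch.

```java
import java.util.function.Supplier;

// Hypothetical example; a sketch of a trampoline, not any particular library's API.
final class TrampolineSketch {
    // A step is either a finished value (next == null) or a deferred call.
    static final class Step<T> {
        final T value;
        final Supplier<Step<T>> next;

        private Step(T value, Supplier<Step<T>> next) {
            this.value = value;
            this.next = next;
        }

        static <T> Step<T> done(T value) {
            return new Step<>(value, null);
        }

        static <T> Step<T> call(Supplier<Step<T>> next) {
            return new Step<>(null, next);
        }
    }

    // The trampoline: keeps invoking the next step until a value is produced.
    // Stack depth stays constant because each function returns before the
    // next one is called.
    static <T> T run(Step<T> step) {
        while (step.next != null) {
            step = step.next.get();
        }
        return step.value;
    }

    static Step<Boolean> isEven(long n) {
        if (n == 0) {
            return Step.done(true);
        }
        return Step.call(() -> isOdd(n - 1));
    }

    static Step<Boolean> isOdd(long n) {
        if (n == 0) {
            return Step.done(false);
        }
        return Step.call(() -> isEven(n - 1));
    }
}
```

Usage would look like TrampolineSketch.run(TrampolineSketch.isEven(1_000_000)). The cost is one small allocation per step in exchange for bounded stack use.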
knome
Sounds like a manual form of Clojure's recur.
1932812267
Clojure's loop/recur is specifically tail recursion like scala's tailrec or the optimization described in the blogpost. It doesn't use trampolines to enable tail calls that aren't tail recursion.
dapperdrake
Finally.
The ANTLR guys went through terrible contortions for their parsers.
Never felt like working those details out for ABCL.
curtisszmania
[dead]
the "lambda the ultimate" papers and the birth of scheme was a loong time ago, so it grates on my ears to hear this topic presented as "an optimization". Yes, it is sometimes an optimization a compiler can make, but the idea is much better presented as a useful semantic of a language.
in the same way that passing parameters to a subfunction "creates" a special set of local variables for the subfunction, the tail recursion semantic updates this set of local variables in an especially clean way for loop semantics, allowing "simultaneous assignment" from old values to new ones.
(yes, it would be confusing with side effected C/C++ operators like ++ because then you'd need to know order of evaluation or know not to do that, but those are already issues in those languages quite apart from tail recursion)
because it's the way I learned it, I tend to call the semantic "tail recursion" and the optimization "tail call elimination", but since other people don't do the same it's somewhat pointless; but I do like to crusade for awareness of the semantic beyond the optimization. If it's an optimization, you can't rely on it because you could blow the stack on large loops. If it's a semantic, you can rely on it.
(the semantic is not entirely "clean" either. it's a bit of a subtle point that you need to return straightaway the return value of the tail call or it's not a tail call. fibonacci is the sum of the current with the next so it's not a tail call unless you somewhat carefully arrange the values you pass/keep around. also worth pointing out that all "tail calls" are up for consideration, not just recursive ones)
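The fibonacci remark can be sketched in Java as follows, with hypothetical names (FibSketch, fibNaive, fibAcc, fib). The naive version adds after the recursive calls return, so neither call is a tail call. The accumulator version passes the two running values along, so the recursive call's result is returned straightaway. Note that Java itself does not eliminate either call; the sketch only shows the shape of the code.

```java
// Hypothetical example; illustrates tail position, not JVM behavior.
final class FibSketch {
    // Not tail calls: the addition still has to happen after both calls return.
    static long fibNaive(int n) {
        if (n < 2) {
            return n;
        }
        return fibNaive(n - 1) + fibNaive(n - 2);
    }

    // Tail-recursive shape: a and b hold two consecutive fibonacci numbers.
    // The arguments (b, a + b) are both computed from the old values, which is
    // the "simultaneous assignment" a loop would otherwise need a temporary for.
    static long fibAcc(int n, long a, long b) {
        if (n == 0) {
            return a;
        }
        return fibAcc(n - 1, b, a + b);
    }

    static long fib(int n) {
        return fibAcc(n, 0, 1);
    }
}
```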