Programming languages should have a tree traversal primitive
115 comments
·April 29, 2025aeonik
bts
Agreed; also Traversable in Haskell is a simpler abstraction than lenses and pretty directly addresses what they seem to be looking for: https://hackage.haskell.org/package/base-4.21.0.0/docs/Data-...
chowells
Traversable and lenses are very closely linked. If you go to the original paper leading to Traversable [1] and read through it, it feels basically identical to reading through the parts of the lens library that lay down the core abstractions and the laws implementations must follow if you want to be able to blindly manipulate them. In fact, the traverse function is a Traversal, and so fits trivially into the lens ecosystem.
[1] https://www.cs.ox.ac.uk/jeremy.gibbons/publications/iterator...
jasperry
These are what I think the author is looking for. But it shouldn't be a "primitive" in terms of code automatically generated by the compiler, but an interface or typeclass like your examples (in a language advanced enough to have them.)
The problem is that 'lens', 'monocle', etc. are famously abstract and difficult for people to apply to their actual problems. IMO, the solution would be for standard libraries to specify interfaces called 'BreadthFirstTraverse', 'DepthFirstTraverse', etc.
eru
Haskell has 'Traversable' (and 'Foldable' etc) which are a lot more approachable than the fully generalised lens library.
naasking
> These are what I think the author is looking for. But it shouldn't be a "primitive" in terms of code automatically generated by the compiler
I think people are often too enamored by general purpose languages that can express such abstractions natively. I don't see an issue with a language that provides this as a primitive without being able to express it itself, constraints can be useful for other properties. Once you can traverse trees, most programming problems can be tackled even in such constrained languages, eg. SQL with CTE.
rebeccaskinner
Don’t forget recursion schemes. The last example in the article was just asking for a hylomorphism.
akkartik
I still love the Scrap Your Boilerplate papers.
https://wiki.haskell.org/index.php?title=Research_papers/Gen...
larodi
how about we also get regex-parsable streams (IO::Async in perl has something like it, suboptimal perhaps) and regex-parsable treestructures (totally possible)? seems like just having the ~= work on structures (or whatever the API is called in other languages, this being Perl5)?
postepowanieadm
SQL recursive CTE maybe?
CyberDildonics
This seems like dramatically over complicated iterators. Do they really need to be called 'optics' and 'lenses' and 'prisms' ?
Think of these as programmable accessors and updaters.
How is iterating through something already not 'programmable' ?
Xmd5a
Tree traversal primitives (clojure.walk):
(defn walk [inner outer form]
(cond
(list? form) (outer (with-meta (apply list (map inner form)) (meta form)))
(instance? clojure.lang.IMapEntry form)
(outer (clojure.lang.MapEntry/create (inner (key form)) (inner (val form))))
(seq? form) (outer (with-meta (doall (map inner form)) (meta form)))
(instance? clojure.lang.IRecord form)
(outer (reduce (fn [r x] (conj r (inner x))) form form))
(coll? form) (outer (into (empty form) (map inner form)))
:else (outer form)))
(defn postwalk [f form]
(walk (partial postwalk f) f form))
(defn prewalk [f form]
(walk (partial prewalk f) identity (f form)))
Another reason why this perlisism holds: 9. It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.
"Let's move on."nimih
In addition, clojure.core has the handy tree-seq function:
(defn tree-seq
"Returns a lazy sequence of the nodes in a tree, via a depth-first walk.
branch? must be a fn of one arg that returns true if passed a node
that can have children (but may not). children must be a fn of one
arg that returns a sequence of the children. Will only be called on
nodes for which branch? returns true. Root is the root node of the
tree."
{:added "1.0"
:static true}
[branch? children root]
(let [walk (fn walk [node]
(lazy-seq
(cons node
(when (branch? node)
(mapcat walk (children node))))))]
(walk root)))
Xmd5a
(defn tree-seq-breadth
"Like tree-seq, but in breadth-first order"
[branch? children root]
(let [walk (fn walk [node]
(when (branch? node)
(let [cs (children node)]
(lazy-cat cs (mapcat walk cs)))))]
(cons root (walk root))))
MarkMarine
How about clojure zipper:
pgt
Nathan Marz' talk on Specter (Clojure Library that decouples navigation from transformation) is must-watch if you deal with data: https://www.youtube.com/watch?v=VTCy_DkAJGk
I use it in every project for data navigation and transformation, and it's more performant than standard Clojure data manipulation, while retaining types (instead of coercing back from seqs).
E.g. if you have a map and you want to increment every value in the map: (require '[com.rpl.specter :as S])
``` (->> {:a 5, :b 6, :c 7} (S/transform [S/MAP-VALS] inc)) => {:a 6, :b 7, :c 8} ```
^ try to write that in normal clojure.
Now let's say you have a map of vectors and want to increment all of those? (->> {:a 5, :b 6, :c 7} (S/transform [S/MAP-VALS S/ALL] inc)) ;; note the navigator juts got another level of nesting => {:a [2 3], :b [4 5], :c [6 7]}works for all clj data types, and of course it has navigators for recursive walking .
It took me a while to get good at Specter, but it was worth it. I hear Rama uses Specter navigators internally.
jerf
That's "just" a particular kind of fancy iterator that you should be able to implement in any language with iterator support. Here's one in Python:
# Expects:
# Tuple of iterateOn where iterateOn can be None
def fancyIt(init, *options):
if init != None:
yield(init)
for f in options:
newVal = f(init)
yield from fancyIt(newVal, *options)
class Tree:
def __init__(self, val, left = None, right = None):
self.val = val
self.left = left
self.right = right
def left(self):
return self.left
def right(self):
return self.right
myTree = Tree(
1,
Tree(2,
Tree(3),
Tree(4, None, Tree(5))),
Tree(6, None, Tree(7)))
for node in fancyIt(myTree, Tree.left, Tree.right):
print(node.val)
which prints the numbers 1 through 7 in order.Breadth-first is slightly trickier, but only slightly trickier one time.
imglorp
Yes, it seems easy to implement.
Yes, students should absolutely implement the classic algorithms to learn.
Yes, there are some occasions when you need to home grow one at $work.
BUT, in my opinion, most of the time, professional code should use a battle tested, vuln hardened library or builtin version. These things are VERY HARD to get exactly right. Jon Bently's Programming Pearls famously had a latent bug in its binary search for 20 years before someone caught it.
https://research.google/blog/extra-extra-read-all-about-it-n...
So yeah, it looks easy but don't do it. Stand on some giant's shoulders instead.
jerf
Your reply is not relevant to my reply. The original poster is asking for this functionality and appears to believe it is something other than an iterator and requires some sort of special language support. However, it is completely implementable as an iterator, in a reasonably usable manner, with no additional language support. My specific code is written only to show that fact off.
Anyone who copies and pastes it is welcome to both pieces when it breaks. Others have already alluded to possible improvements that could be made, and I already have my own analysis in a grandchild reply as to why I don't think this is a terribly pressing need or necessarily even a good idea.
The reason I provide code is that it gets past the "oh, you say it's just an iterator, but I still don't believe you, since you haven't spelled it out to the n'th degree". When code is provided, belief ceases to be an issue. It is clearly something an iterator can implement, in existing languages, with existing iterator support.
Unless you're going to claim it is somehow impossible to provide this functionality in a tested manner, you're completely changing the topic in an uninteresting direction, since it is always true that functionality generally needs testing and bits of code slammed into an HN conversation just to make a particular point probably shouldn't be copied wholesale into your production code.
Amadiro
Sure but all pre-made, battle-tested tree datastructures you'd use in production in all languages already come with some form of iterator that you can just for-loop over, so the original articles point is still moot.
thayne
Sure, but it can be done by a library. There's no reason it needs to be built into the language.
packetlost
Yeah, tree traversal is really easy and implementing it as an iterator is natural. Maybe don't use a recursive technique if you plan on working with non-toy datasets, Python's default stack limit is pretty small (1000), but this is otherwise a very flexible API.
While easy, I think bisect would be a good addition to every stdlib too.
jerf
I did the tree just to match the author's example. I would agree that a bespoke iterator for breadth- and depth-first iteration for any given tree is probably a better way to go. As long as we're in a language like Python, build in something that allows you to examine a branch and decline to descend into it while you're at it.
I don't think this is a large problem in practice because you shouldn't be using dozens of tree types in a given code base, so adding iterators to a tree is no big deal. In general there aren't enough types of iteration available to a given data structure that you need to describe how to iterate on it from the "outside". (Generally when you are doing that, it's too non-trivial to fit into this pattern anyhow; see the Visitor pattern in general.) This strikes me as maybe the sort of default tool you might slap in a library somewhere, but it should be a niche tool. If you're using it all the time you're probably doing something wrong. By default your data structures should be providing iteration packaged with them and it should generally be what you need. And your language should support aborting iteration, in whatever that looks like normally. I'm not sure I know a language that doesn't, it's a fairly basic element of iterator support when you get into implementation.
There are also many cases where a tree iterator will perform significantly better, including CPython. I don't have enough experience with PyPy to know if it could inline the Tree.left and Tree.right calls down to zero penalty at JIT time. Rust and C++ and the other static languages with sophisticated compilers might be able to get that down to fully inlined and zero-cost, but even if they can it's probably better not to push that on to the optimizer as the optimizers will eventually give up if this is composed with enough other stuff. Better to just have an efficient implementation in the first place.
Retr0id
I avoided the stack limit by using itertools.chain: https://github.com/DavidBuchanan314/millipds/blob/15727d474c...
(this is for iterating over nested JSON-like objects, which are just weird trees)
packetlost
CBOR objects, to be specific ;)
There are a lot of ways you could avoid the recursion, but that's a particularly nice way!
thrance
Rust has a bisect [1], which is surprising since the std is kept relatively small.
[1] https://doc.rust-lang.org/std/vec/struct.Vec.html#method.bin...
porphyra
Well, the whole point of the blog post is to argue for a new kind of syntactic sugar and the author explicitly mentions iterators.
> Well a range based for loop requires that your tree exist in memory AND that you have an iterator defined for your tree. With for_tree you could operate on an entirely imperative tree, without needing to define any iterators or generator functions. Here's an example where I'm checking every single string composed of "a", "b", and "c" of length 8 or less.
for_tree(string x = ""; x.size() <= 8; x : {x+"a", x+"b", x+"c"}){
print(x);
}
You could definitely find every string composed of "a", "b", and "c" of length 8 or less by defining a custom iterator but it would be a verbose and unpleasant way of writing it: class StringIterator {
public:
using iterator_category = std::forward_iterator_tag;
using value_type = std::string;
using difference_type = std::ptrdiff_t;
using pointer = const std::string*;
using reference = const std::string&;
StringIterator(bool begin = false) : is_end_(!begin) { if (begin) s_ = ""; }
const std::string& operator*() const {
if (is_end_) throw std::out_of_range("End iterator");
return s_;
}
StringIterator& operator++() {
if (is_end_) return *this;
if (s_.size() < 8) return s_.push_back('a'), *this;
while (!s_.empty() && s_.back() == 'c') s_.pop_back();
if (s_.empty()) is_end_ = true;
else s_.back() = s_.back() == 'a' ? 'b' : 'c';
return *this;
}
StringIterator operator++(int) { auto tmp = *this; ++(*this); return tmp; }
bool operator==(const StringIterator& other) const {
return is_end_ == other.is_end_ && (is_end_ || s_ == other.s_);
}
bool operator!=(const StringIterator& other) const { return !(*this == other); }
private:
std::string s_;
bool is_end_;
};
int main() {
StringIterator begin(true), end;
int count = 0;
for (auto it = begin; it != end; ++it) ++count;
std::cout << (count == 9841 ? "Pass" : "Fail") << std::endl;
return 0;
}
jerf
def itWithStop(init, stop, *options):
if init is not None and not stop(init):
yield(init)
for f in options:
newVal = f(init)
yield from itWithStop(newVal, stop, *options)
for s in itWithStop("",
lambda x: len(x) > 2,
lambda x: x + "a",
lambda x: x + "b",
lambda x: x + "c"):
print(s)
yields the combinations of 0 - 2 length strings with a, b, and c.Python has a number of ways to achieve this depending on exactly how you want to pass the arguments; multiple functions, optional arguments, etc. How nice the final call looks is more about your local language's closures look.
The main point here is that this will happily iterate on things that don't "exist".
module Tmp where
iter :: forall a. (a -> Bool) -> [a -> a] -> a -> [a]
iter p opts x = if p x then x:concatMap (iter p opts) (opts <*> [x]) else []
ghci> :l tmp.hs
[1 of 1] Compiling Tmp ( tmp.hs, interpreted )
Ok, one module loaded.
ghci> iter (\x -> length x < 3) [(++ "a"), (++ "b"), (++ "c")] ""
["","a","aa","ab","ac","b","ba","bb","bc",
"c","ca","cb","cc"]
(Since things are lazy in Haskell, functions that return lists effectively are iterators. There's probably something in the standard library somewhere for (opts <*> [x]) to avoid the wrapping x in an unnecessary list, but my Haskell is rusty.)porphyra
The iterator in my example will also happily iterate on things that don't exist. We all agree that it's possible to do in any language. But the main point is that the blog post is talking about syntactic sugar for an easier way to do it.
And yes, Haskell is amazing at this sort of thing.
null
jerf
Whoops, my edit window closed, but that first comment derives from a previous version that had a different signature for the functions in the "options" list. Ignore it.
cdrini
I'm not sure if it needs to be at the syntax level, but I think having built in standard library support for graphs (general trees) would help make graphs more common in programming. And seeing how powerful they are, I think that would be a good thing!
I explored this idea with gstd, a standard library for graphs inspired by the JS Array interface: https://github.com/cdrini/gstd/
yxhuvud
No, that seems like language bloat. Better to make your language have internal iterators, as then data structures can add their own logic that matches their need for iteration. Then special cases don't need additional syntax.
naasking
Trees aren't really a "special case". Sequences, trees and graphs are core abstractions to most programming, why try to flatten all trees and graphs to sequences?
smallnamespace
Because your code is actual running serially so no matter what there is an iteration order, and the order matters for performance even if your code does not care.
For example if you literally don’t care about the order then your code can map over a default iterator that is efficient (DFS).
naasking
Not true due to parallelism. Also, SQL demonstrates that you can describe the desired result declaratively without being concerned about iteration order. I see SQL's CTEs as a good example of the kind of primitive the article is talking about.
pmontra
I remember that I was using trees quite often last century, when I was writing programs in C. I seldom use tree or comparably complex data structures nowadays, when I almost only write web apps (mostly backend in Ruby or Python but some frontend in JS.) I'm bet that both Ruby and Python have plenty of trees written in C inside their interpreters. I do everything with arrays (lists) and hash tables (dicts) in Ruby and Python. Maybe a language construct for trees would be nice for lower level languages and almost out of scope for higher level ones.
The same from another angle: there are a lot of trees in the indices of SQL databases (example [1]) but we don't zoom in to that level of detail very often when defining our tables.
thesz
User interface widgets form a forest - windows contain layouts that contain widgets which also can be layouts, etc, etc.
To implement Brown's algorithm to optimize class-based language models I had to implement a complex forest (DAG, actually) in Python using lists of fixed length. That was not especially nice to work with.
ivanjermakov
I think functional languages handle this nicely with Foldable or Traversable typeclasses: https://hackage.haskell.org/package/base-4.21.0.0/docs/Data-...
dtech
Folding and traversing can be trivially done in all relevant programming languages if you have an iterator
trealira
I feel like programming an iterator like this akin to a state machine is just not that convenient or trivial, though. Below is pseudo-Java.
class InOrderTreeIterator {
Stack stack;
TreeNode cursor;
InOrderTreeIterator(TreeNode root) {
cursor = root;
s = new Stack;
}
bool hasNext() {
return cursor != null || !stack.empty();
}
TreeNode next() {
if (cursor != null) {
while (cursor.left != null) {
stack.push(cursor);
cursor = cursor.left;
}
} else if (!stack.empty()) {
cursor = stack.pop();
} else {
throw new NoSuchElementException();
}
TreeNode ret = cursor;
cursor = cursor.right
return ret;
}
}
munificent
It's much easier if the language has built-in support for generators. Here's an in-order tree iterator in Dart:
Iterable<TreeNode> inOrder(TreeNode node) sync* {
if (node.left != null) yield* inOrder(node.left!);
yield node;
if (node.right != null) yield* inOrder(node.right!);
}
munchler
Agreed. It’s amusing to see procedural programmers slowly rediscovering what functional programmers have known for years. Paul Graham called this “The Blub Paradox”.
taeric
Tree traversal is an odd one to want to try and make primitive. There are a lot of hidden questions that you need to consider when traversing a tree. Would you want your syntax to indicate what order traversal? Should it indicate a way to control the extra memory that you allow for the common stack/queue used in basic traversal? If you have a threaded tree, should the syntax work without any extra memory usage?
hinkley
I think Java did the right thing by making tree shaped collections, since they’re iterable and that covers the for loop situation, but it doesn’t expose the tree structure, so you can’t tell if two nodes are siblings or relatives or indeed if they are even in the same tree.
But I think grafting those two together is the right answer, not inventing a new loop construct.
Trees are easy to write iterators for. DAGs are a bit harder and full graphs are an advanced interview question.
Retr0id
> "Why not just use recursive functions"
One great reason not to use recursive functions for traversing trees is that you can allocate your own stack data structure rather than relying on the call stack itself. In most languages/runtimes, the call stack has a maximum depth which limits the depth of trees you can process, usually on the order of thousands of stack frames.
Managing your own stack usually produces weirder looking code (personally I find "naive" recursive approaches more readable) - but having it as a first-class language feature could solve that!
cobbal
If we're fixing the language, may as well fix the real problem and not artificially limit stack space.
Retr0id
The language doesn't get much of a say, stack limits are usually inherited from the OS itself. It's fixable by not using the OS-provided stack but that's a much more invasive change than a new syntax/stdlib feature.
monkeyelite
This is solved by iterators in C++. The idea of an iterators is to generalize the concept of a pointer —- something which refers to a location in your data structure.
For example the most basic operations of a pointer are to advance and dereference.
std::map is actually implemented as a tree. To iterator its members you can do
for (cost auto &pair : map)
The only requirement for your custom data structure to work is to implement begin() and end() which return iterators - “pointer like” objects.
I agree with the author, we need better primitives, if you need functionality now:
Major tools that exist today for partial structure traversal and focused manipulation:
- Optics (Lenses, Prisms, Traversals)
- Zippers - Query Languages (for semantic traversal and deep search)