Programming languages should have a tree traversal primitive

210 comments

·April 29, 2025

aeonik

I agree with the author, we need better primitives, if you need functionality now:

Major tools that exist today for partial structure traversal and focused manipulation:

- Optics (Lenses, Prisms, Traversals)

  Elegant, composable ways to zoom into, modify, and rebuild structures.

  Examples: Haskell's `lens`, Scala's Monocle, Clojure's Specter.

  Think of these as programmable accessors and updaters.

- Zippers

  Data structures with a "focused cursor" that allow local edits without manually traversing the whole structure.

  Examples: Huet’s original Zipper (1997), Haskell’s `Data.Tree.Zipper`, Clojure’s built-in zippers.

- Query Languages (for semantic traversal and deep search)

  When paths aren't enough and you need semantic conditionals:

    - SPARQL (semantic web graph querying)
    - Datalog (logic programming and query over facts)
    - Cypher (graph traversal in Neo4j)
    - Prolog (pure logic exploration)

  These approaches let you declaratively state what you want instead of manually specifying traversal steps.

bts

Agreed; also Traversable in Haskell is a simpler abstraction than lenses and pretty directly addresses what they seem to be looking for: https://hackage.haskell.org/package/base-4.21.0.0/docs/Data-...

chowells

Traversable and lenses are very closely linked. If you go to the original paper leading to Traversable [1] and read through it, it feels basically identical to reading through the parts of the lens library that lay down the core abstractions and the laws implementations must follow if you want to be able to blindly manipulate them. In fact, the traverse function is a Traversal, and so fits trivially into the lens ecosystem.

[1] https://www.cs.ox.ac.uk/jeremy.gibbons/publications/iterator...

Akronymus

Arent haskell traversables different in that they preserve the structure if you were to map over them, as compared to the solution posted in the article, where they get flattened to what amounts to a list?

jasperry

These are what I think the author is looking for. But it shouldn't be a "primitive" in terms of code automatically generated by the compiler, but an interface or typeclass like your examples (in a language advanced enough to have them.)

The problem is that 'lens', 'monocle', etc. are famously abstract and difficult for people to apply to their actual problems. IMO, the solution would be for standard libraries to specify interfaces called 'BreadthFirstTraverse', 'DepthFirstTraverse', etc.

T-R

I definitely agree for traversals, but Lenses need some sort of primitive support - even in Haskell they're mostly generated with TemplateHaskell, and the language developers have spent a long time trying to make the `record.field` accessor syntax overloadable enough to work with lenses[1][2]. Hopefully someday we'll be free from having to memorize all the lens operators.

Optics are famously abstract in implementation, but I don't think people have trouble applying them - people seem to like JQuery/CSS selectors, and insist on `object.field` syntax; it's kind of wild that no mainstream language has a first-class way to pass around the description of a location in an arbitrary data structure.

[1] https://ghc-proposals.readthedocs.io/en/latest/proposals/002...

[2] https://ghc-proposals.readthedocs.io/en/latest/proposals/015...

aozgaa

Like offsetof[1]?

[1] https://en.cppreference.com/w/cpp/types/offsetof

eru

Haskell has 'Traversable' (and 'Foldable' etc) which are a lot more approachable than the fully generalised lens library.

naasking

> These are what I think the author is looking for. But it shouldn't be a "primitive" in terms of code automatically generated by the compiler

I think people are often too enamored by general purpose languages that can express such abstractions natively. I don't see an issue with a language that provides this as a primitive without being able to express it itself, constraints can be useful for other properties. Once you can traverse trees, most programming problems can be tackled even in such constrained languages, eg. SQL with CTE.

Avshalom

To point out a prolog thing which is also applicable to other languages with good patter matching: the break/return/prune examples are all ergonomic to implement as recursion in a way that fails in C++ style type based dispatch.

rebeccaskinner

Don’t forget recursion schemes. The last example in the article was just asking for a hylomorphism.

moomin

Honestly when I read the article I was, "Do you have a minute to talk about our Lord and Saviour, derive (Functor) and how her son hylo saved us all?"

jll29

In Rust, you can extend the syntax a bit within the language itself.

For C++, you could define yourself a template that expands to the two functions you listed.

For any language, you could write yourself a pre-processor that adds for_tree notation and expands it, either with pre-processor semantics or working on the abstract syntax tree (which is more "proper" but also more work"). I would recommend the former to test notations, and then you can live with them for a while to experiment with them and see if and how often you really need the construct (recall that is how C++ evolved from C via "C with classes" - it was first a pre-processor). Once you and others are 100% convinced of the new syntax, go to your prefered language's working group/ISO committee and propose inclusion.

My own feeling is that this is not something for which I need more than recursion; calling inside traverse() traverse(left) and traverse(right) for binary trees or using a normal for loop to iterate over all this->children() from 0 to this->children.size() is something that occurs in graph libraries and parsers/compilers once in a while but not often enough to warrant its own notation. Rather, when I look at languages like C++, I'd have a language that is simpler, cleaner, more homogeneous and more orthogonal; C++ is complicated, convoluted, too large to implement it correctly by a single person in reasonable time (compared to beauties like C, Pascal, Scheme), so I stand on the side of minimalism.

timeninja

The "syntax extension" thing is precisely the reason I have an interest in (and advocacy for) Common Lisp.

Be that as it may, for C++, Eric Neibler's [Boost.Proto](https://www.boost.org/doc/libs/1_84_0/doc/html/proto.html) could likely help conveniently connect syntax to implementation to achieve something similar to what the author is taking about.

postepowanieadm

SQL recursive CTE maybe?

adamgordonbell

Even just traverse from scala, haskell, kotlin, etc will work iterating over trees.

larodi

how about we also get regex-parsable streams (IO::Async in perl has something like it, suboptimal perhaps) and regex-parsable treestructures (totally possible)? seems like just having the ~= work on structures (or whatever the API is called in other languages, this being Perl5)?

melagonster

Does this mean something like $tree_object =~ m/regex/ will work and return a sub tree?

larodi

Yeah and if you think about it is totally doable. The FSM or whatever automata is just picks and exhausts branches that provide the correct value.

Definitely can be done to streaming data… protobufs in a way is this, given it’s a sort of BNF from a bird’s eye.

Xmd5a

Tree traversal primitives (clojure.walk):

    (defn walk [inner outer form]
      (cond
       (list? form) (outer (with-meta (apply list (map inner form)) (meta form)))
       (instance? clojure.lang.IMapEntry form)
       (outer (clojure.lang.MapEntry/create (inner (key form)) (inner (val form))))
       (seq? form) (outer (with-meta (doall (map inner form)) (meta form)))
       (instance? clojure.lang.IRecord form)
         (outer (reduce (fn [r x] (conj r (inner x))) form form))
       (coll? form) (outer (into (empty form) (map inner form)))
       :else (outer form)))
    
    (defn postwalk [f form]
      (walk (partial postwalk f) f form))
    
    (defn prewalk [f form]
      (walk (partial prewalk f) identity (f form)))

Another reason why this perlisism holds:

    9. It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.

"Let's move on."

nimih

In addition, clojure.core has the handy tree-seq function:

    (defn tree-seq
      "Returns a lazy sequence of the nodes in a tree, via a depth-first walk.
       branch? must be a fn of one arg that returns true if passed a node
       that can have children (but may not).  children must be a fn of one
       arg that returns a sequence of the children. Will only be called on
       nodes for which branch? returns true. Root is the root node of the
      tree."
      {:added "1.0"
       :static true}
       [branch? children root]
       (let [walk (fn walk [node]
                    (lazy-seq
                     (cons node
                      (when (branch? node)
                        (mapcat walk (children node))))))]
         (walk root)))

Xmd5a

    (defn tree-seq-breadth
      "Like tree-seq, but in breadth-first order"
      [branch? children root]
      (let [walk (fn walk [node]
                   (when (branch? node)
                     (let [cs (children node)]
                       (lazy-cat cs (mapcat walk cs)))))]
        (cons root (walk root))))

MarkMarine

How about clojure zipper:

https://clojuredocs.org/clojure.zip/zipper

billfruit

Didn't Wirth(may be it was someone else) say that it is better to have a complex data structure and simple algorithm/code that works on them than having simple data structures and complex code.

Complex data structures absorb lot of the complexity of the problem and reduce the complexity of the rest of the code.

Vosporos

Linus Torvalds also spoke in favour of using adequate data structures instead of the code.

parrit

It is even better to have 100 functions work on 100 data structures. Powerful programming languages like Lisp and Haskell give you that. Generics gives you most of that.

If every ounce of performance matters, e.g. in a database, you want 10000 functions, 100 for each data structure.

MathMonkeyMan

Lisp gives you an infinite amount of functions that operate on one data structure: the cons cell. :P

I guess all you really need are dynamically allocated arrays. A cons cell is an array of two. A struct with N fields is an array of N. Everything else is built on that.

pgt

Nathan Marz' talk on Specter (Clojure Library that decouples navigation from transformation) is must-watch if you deal with data: https://www.youtube.com/watch?v=VTCy_DkAJGk

I use it in every project for data navigation and transformation, and it's more performant than standard Clojure data manipulation, while retaining types (instead of coercing back from seqs).

E.g. if you have a map and you want to increment every value in the map: (require '[com.rpl.specter :as S])

``` (->> {:a 5, :b 6, :c 7} (S/transform [S/MAP-VALS] inc)) => {:a 6, :b 7, :c 8} ```

^ try to write that in normal clojure.

Now let's say you have a map of vectors and want to increment all of those? (->> {:a 5, :b 6, :c 7} (S/transform [S/MAP-VALS S/ALL] inc)) ;; note the navigator juts got another level of nesting => {:a [2 3], :b [4 5], :c [6 7]}works for all clj data types, and of course it has navigators for recursive walking .

It took me a while to get good at Specter, but it was worth it. I hear Rama uses Specter navigators internally.

erichocean

Meander is also useful if you need to map from one structure to another, and Malli if you need to write coercisions that check the validity of structures at runtime (aka "schema checking").

https://www.metosin.fi/blog/transforming-data-with-malli-and...

jerf

That's "just" a particular kind of fancy iterator that you should be able to implement in any language with iterator support. Here's one in Python:

    # Expects:
    # Tuple of iterateOn where iterateOn can be None
    def fancyIt(init, *options):
        if init != None:
            yield(init)
            for f in options:
                newVal = f(init)
                yield from fancyIt(newVal, *options)

    class Tree:
        def __init__(self, val, left = None, right = None):
            self.val = val
            self.left = left
            self.right = right

        def left(self):
            return self.left
        def right(self):
            return self.right

    myTree = Tree(
        1,
        Tree(2,
             Tree(3),
             Tree(4, None, Tree(5))),
        Tree(6, None, Tree(7)))

    for node in fancyIt(myTree, Tree.left, Tree.right):
        print(node.val)

which prints the numbers 1 through 7 in order.

Breadth-first is slightly trickier, but only slightly trickier one time.

imglorp

Yes, it seems easy to implement.

Yes, students should absolutely implement the classic algorithms to learn.

Yes, there are some occasions when you need to home grow one at $work.

BUT, in my opinion, most of the time, professional code should use a battle tested, vuln hardened library or builtin version. These things are VERY HARD to get exactly right. Jon Bently's Programming Pearls famously had a latent bug in its binary search for 20 years before someone caught it.

https://research.google/blog/extra-extra-read-all-about-it-n...

So yeah, it looks easy but don't do it. Stand on some giant's shoulders instead.

jerf

Your reply is not relevant to my reply. The original poster is asking for this functionality and appears to believe it is something other than an iterator and requires some sort of special language support. However, it is completely implementable as an iterator, in a reasonably usable manner, with no additional language support. My specific code is written only to show that fact off.

Anyone who copies and pastes it is welcome to both pieces when it breaks. Others have already alluded to possible improvements that could be made, and I already have my own analysis in a grandchild reply as to why I don't think this is a terribly pressing need or necessarily even a good idea.

The reason I provide code is that it gets past the "oh, you say it's just an iterator, but I still don't believe you, since you haven't spelled it out to the n'th degree". When code is provided, belief ceases to be an issue. It is clearly something an iterator can implement, in existing languages, with existing iterator support.

Unless you're going to claim it is somehow impossible to provide this functionality in a tested manner, you're completely changing the topic in an uninteresting direction, since it is always true that functionality generally needs testing and bits of code slammed into an HN conversation just to make a particular point probably shouldn't be copied wholesale into your production code.

Amadiro

Sure but all pre-made, battle-tested tree datastructures you'd use in production in all languages already come with some form of iterator that you can just for-loop over, so the original articles point is still moot.

thayne

Sure, but it can be done by a library. There's no reason it needs to be built into the language.

packetlost

Yeah, tree traversal is really easy and implementing it as an iterator is natural. Maybe don't use a recursive technique if you plan on working with non-toy datasets, Python's default stack limit is pretty small (1000), but this is otherwise a very flexible API.

While easy, I think bisect would be a good addition to every stdlib too.

jerf

I did the tree just to match the author's example. I would agree that a bespoke iterator for breadth- and depth-first iteration for any given tree is probably a better way to go. As long as we're in a language like Python, build in something that allows you to examine a branch and decline to descend into it while you're at it.

I don't think this is a large problem in practice because you shouldn't be using dozens of tree types in a given code base, so adding iterators to a tree is no big deal. In general there aren't enough types of iteration available to a given data structure that you need to describe how to iterate on it from the "outside". (Generally when you are doing that, it's too non-trivial to fit into this pattern anyhow; see the Visitor pattern in general.) This strikes me as maybe the sort of default tool you might slap in a library somewhere, but it should be a niche tool. If you're using it all the time you're probably doing something wrong. By default your data structures should be providing iteration packaged with them and it should generally be what you need. And your language should support aborting iteration, in whatever that looks like normally. I'm not sure I know a language that doesn't, it's a fairly basic element of iterator support when you get into implementation.

There are also many cases where a tree iterator will perform significantly better, including CPython. I don't have enough experience with PyPy to know if it could inline the Tree.left and Tree.right calls down to zero penalty at JIT time. Rust and C++ and the other static languages with sophisticated compilers might be able to get that down to fully inlined and zero-cost, but even if they can it's probably better not to push that on to the optimizer as the optimizers will eventually give up if this is composed with enough other stuff. Better to just have an efficient implementation in the first place.

Retr0id

I avoided the stack limit by using itertools.chain: https://github.com/DavidBuchanan314/millipds/blob/15727d474c...

(this is for iterating over nested JSON-like objects, which are just weird trees)

packetlost

CBOR objects, to be specific ;)

There are a lot of ways you could avoid the recursion, but that's a particularly nice way!

thrance

Rust has a bisect [1], which is surprising since the std is kept relatively small.

[1] https://doc.rust-lang.org/std/vec/struct.Vec.html#method.bin...

null

[deleted]

porphyra

Well, the whole point of the blog post is to argue for a new kind of syntactic sugar and the author explicitly mentions iterators.

> Well a range based for loop requires that your tree exist in memory AND that you have an iterator defined for your tree. With for_tree you could operate on an entirely imperative tree, without needing to define any iterators or generator functions. Here's an example where I'm checking every single string composed of "a", "b", and "c" of length 8 or less.

    for_tree(string x = ""; x.size() <= 8; x : {x+"a", x+"b", x+"c"}){
     print(x);
    }

You could definitely find every string composed of "a", "b", and "c" of length 8 or less by defining a custom iterator but it would be a verbose and unpleasant way of writing it:

    class StringIterator {
    public:
        using iterator_category = std::forward_iterator_tag;
        using value_type = std::string;
        using difference_type = std::ptrdiff_t;
        using pointer = const std::string*;
        using reference = const std::string&;

        StringIterator(bool begin = false) : is_end_(!begin) { if (begin) s_ = ""; }

        const std::string& operator*() const {
            if (is_end_) throw std::out_of_range("End iterator");
            return s_;
        }

        StringIterator& operator++() {
            if (is_end_) return *this;
            if (s_.size() < 8) return s_.push_back('a'), *this;
            while (!s_.empty() && s_.back() == 'c') s_.pop_back();
            if (s_.empty()) is_end_ = true;
            else s_.back() = s_.back() == 'a' ? 'b' : 'c';
            return *this;
        }

        StringIterator operator++(int) { auto tmp = *this; ++(*this); return tmp; }

        bool operator==(const StringIterator& other) const {
            return is_end_ == other.is_end_ && (is_end_ || s_ == other.s_);
        }

        bool operator!=(const StringIterator& other) const { return !(*this == other); }

    private:
        std::string s_;
        bool is_end_;
    };

    int main() {
        StringIterator begin(true), end;
        int count = 0;
        for (auto it = begin; it != end; ++it) ++count;
        std::cout << (count == 9841 ? "Pass" : "Fail") << std::endl;
        return 0;
    }

jerf

    def itWithStop(init, stop, *options):
        if init is not None and not stop(init):
            yield(init)
            for f in options:
                newVal = f(init)
                yield from itWithStop(newVal, stop, *options)

    for s in itWithStop("",
                       lambda x: len(x) > 2,
                       lambda x: x + "a",
                       lambda x: x + "b",
                       lambda x: x + "c"):
        print(s)

yields the combinations of 0 - 2 length strings with a, b, and c.

Python has a number of ways to achieve this depending on exactly how you want to pass the arguments; multiple functions, optional arguments, etc. How nice the final call looks is more about your local language's closures look.

The main point here is that this will happily iterate on things that don't "exist".

    module Tmp where

    iter :: forall a. (a -> Bool) -> [a -> a] -> a -> [a]
    iter p opts x = if p x then x:concatMap (iter p opts) (opts <*> [x]) else []


    ghci> :l tmp.hs
    [1 of 1] Compiling Tmp              ( tmp.hs, interpreted )
    Ok, one module loaded.
    ghci> iter (\x -> length x < 3) [(++ "a"), (++ "b"), (++ "c")] "" 
    ["","a","aa","ab","ac","b","ba","bb","bc",
     "c","ca","cb","cc"]

(Since things are lazy in Haskell, functions that return lists effectively are iterators. There's probably something in the standard library somewhere for (opts <*> [x]) to avoid the wrapping x in an unnecessary list, but my Haskell is rusty.)

porphyra

The iterator in my example will also happily iterate on things that don't exist. We all agree that it's possible to do in any language. But the main point is that the blog post is talking about syntactic sugar for an easier way to do it.

And yes, Haskell is amazing at this sort of thing.

samus

Same thing, but the iterator is instead hidden inside the language implementation. `foreach` is already a quite general construct, and iterators are the way to extend them. However, I can see the benefit of designing a library to help implement more intricate iterators.

Ceterum censeo this would be a family of simple macros in LISP.

abbeyj

It seems like it should be possible to factor out most of the iterator boilerplate into a helper class. Then each place where you want to iterate you can construct an instance of that helper class and supply a lambda that specifies how to descend into children. If you're doing the same iteration in several places then you can use a named function instead of a lambda, which means less typing for each `for` loop. Here's a rough sketch: https://godbolt.org/z/x94WY77rv

bawolff

I kind of agree, but at the same time, for loops are just a fancy control structure that any student should be able to implement with goto.

IshKebab

I mean people do use iterator instead of for loops quite a lot.

IMO the thing that would be really nice is if control flow like `for` was actually the same as using an iterator. This would really help in Rust too where handling errors inside iterator callbacks is a right pain.

I've seen a few languages try this but it seems to not be very popular. I think it can get a bit confusing how control flow keywords like `return` and `break` work if you turn `if` into syntactic sugar for a function call taking a closure etc.

charrondev

In what way is for different than an iterator?

In PHP you loop through an iterator with the foreach control structure.

In JavaScript you use for of.

In rust it’s for in.

What am I missing?

bawolff

That's why i said "goto". Iterators are already a high level cintrol structure.

jerf

Whoops, my edit window closed, but that first comment derives from a previous version that had a different signature for the functions in the "options" list. Ignore it.

monkeyelite

This is solved by iterators in C++. The idea of an iterators is to generalize the concept of a pointer —- something which refers to a location in your data structure.

For example the most basic operations of a pointer are to advance and dereference.

std::map is actually implemented as a tree. To iterator its members you can do

    for (cost auto &pair : map)

The only requirement for your custom data structure to work is to implement begin() and end() which return iterators - “pointer like” objects.

rerdavies

I'm surprised nobody's brought up C++ generators yet (which avoid potential problems with recursion stack depth).

    for (const auto&node: await depth_first_tree(node))

And generators have the added advantage that walking trees is just a special case of the more general case of traversing a directed graph of elements, which is equally easy.

Are generators not a thing in other languages?

trealira

I didn't even know C++ had generators, but you're right, generators make functions that walking trees much more obviously correct and convenient to write.

The site cppreference has an example of walking trees in C++ using them.

https://en.cppreference.com/w/cpp/coroutine/generator

But I have a question: why is it that in your example, you write `await` before the generator function, but it's not in the example given on cppreference? Also, did you mean `co_await`?

rerdavies

To answer your question: I have used them, but I was too lazy to look up the correct syntax. Yes. It should be co_await. The library I used was layered on top of the C++20 co-routine implementation, not c++23, and used co_await there (but probably should not have).

Do NOT use the c++20 co-routine APIs (a half-implemented nightmare of an API, although what is there does miraculously work, contrary to expectations). Probably better to wait for c++23 generators, which are so far available as an experimental feature on GCC 14.0 (which makes the feature unusable in any of my projects).

All of which, I guess, answers my question about why nobody has brought up c++ generators yet. C# has nice generators.

Sorry I brought it up. :-(

monkeyelite

> (which avoid potential problems with recursion stack depth).

Iterators for trees are not implemented with call stack recursion.

> I'm surprised nobody's brought up C++ generators yet

You cannot modify, insert, or use standard algorithms with a generator.

yxhuvud

No, that seems like language bloat. Better to make your language have internal iterators, as then data structures can add their own logic that matches their need for iteration. Then special cases don't need additional syntax.

naasking

Trees aren't really a "special case". Sequences, trees and graphs are core abstractions to most programming, why try to flatten all trees and graphs to sequences?

smallnamespace

Because your code is actual running serially so no matter what there is an iteration order, and the order matters for performance even if your code does not care.

For example if you literally don’t care about the order then your code can map over a default iterator that is efficient (DFS).

naasking

Not true due to parallelism. Also, SQL demonstrates that you can describe the desired result declaratively without being concerned about iteration order. I see SQL's CTEs as a good example of the kind of primitive the article is talking about.

lmm

> Sequences, trees and graphs are core abstractions to most programming, why try to flatten all trees and graphs to sequences?

Because "trees" and "graphs" aren't actually single concepts, they're families of concepts that can vary along many dimensions. Most programming languages offer one or a small handful of sequence types, because most sequence use patterns are the same. Most languages don't offer a tree or graph type, and as soon as you try to implement one you see why: there are multiple different kinds of trees and graphs, and multiple different representations that are appropriate for different use cases, and any one you might pick would be unsuitable for most programs people want to write.

naasking

> Most languages don't offer a tree or graph type, and as soon as you try to implement one you see why: there are multiple different kinds of trees and graphs, and multiple different representations that are appropriate for different use cases

I think it's more because the abstract interfaces for trees and graphs that can support multiple representations aren't as well known [1]. An iterator/sequence interface has a simple abstract structure that everyone immediately understands, but the structure needed to abstract over graphs and trees are trickier.

[1] https://www.cs.tufts.edu/~nr/cs257/archive/andrey-mokhov/alg...

yxhuvud

The point wasn't that it should be treated as a sequence of not, but that with good base abstractions you can move the decision of what to do, and how, into the data structure itself, rather than having to have the language decide for it.

ivanjermakov

I think functional languages handle this nicely with Foldable or Traversable typeclasses: https://hackage.haskell.org/package/base-4.21.0.0/docs/Data-...

dtech

Folding and traversing can be trivially done in all relevant programming languages if you have an iterator

trealira

I feel like programming an iterator like this akin to a state machine is just not that convenient or trivial, though. Below is pseudo-Java.

  class InOrderTreeIterator {
      Stack stack;
      TreeNode cursor;

      InOrderTreeIterator(TreeNode root) {
          cursor = root;
          s = new Stack;
      }

      bool hasNext() {
          return cursor != null || !stack.empty();
      }

      TreeNode next() {
          if (cursor != null) {
              while (cursor.left != null) {
                  stack.push(cursor);
                  cursor = cursor.left;
              }
          } else if (!stack.empty()) {
              cursor = stack.pop();
          } else {
              throw new NoSuchElementException();
          }
          TreeNode ret = cursor;
          cursor = cursor.right
          return ret;
      }
  }

munificent

It's much easier if the language has built-in support for generators. Here's an in-order tree iterator in Dart:

    Iterable<TreeNode> inOrder(TreeNode node) sync* {
      if (node.left != null) yield* inOrder(node.left!);
      yield node;
      if (node.right != null) yield* inOrder(node.right!);
    }

ookdatnog

Don't iterators necessarily "forget" the structure of the tree? I can see how you can write an iterator to, for example, compute the size of a tree, or generate the set of all variables that occur in an expression tree.

But it's not obvious to me how you'd write a generic iterator that supports something like finding all free variables in an expression tree, or even just express a tree mapping function that constructs a new tree with the same structure (for example, "make all variables in the expression uppercase"). It's been a while since I worked with iterators so correct me if I'm wrong and this is in fact easy with iterators.

With something like Haskell's recursion-schemes library [0] these operations are all just a few lines long, guaranteed to terminate, and don't need to be updated if your data structure changes (for example you add new expression nodes). I'm not aware of any non-functional programming language that can do that. See for example the freeVars function at the bottom of the linked recursion-schemes doc.

[0] https://hackage.haskell.org/package/recursion-schemes-5.2.3

null

[deleted]

agnishom

This becomes even better with Recursion Schemes https://hackage.haskell.org/package/recursion-schemes

trealira

I do think recursion schemes are pretty cool, particularly the idea that a list fold could be generalized to the concept of catamorphisms, which could be mechanically derived upon defining a data structure, because the shape of the recursion that's defined is the same as the shape of the recursion of the data constructors. It makes me realize that programming languages still could be higher-level; what if a programming language automatically derived the equivalent of the "fold" and "map" functions for any recursive data type defined by the programmer?

At the same time, the way it's implemented in Haskell's recursion-schemes library might be hard to wrap one's head around at first, kind of like how the list functions "foldr" and "foldl" are also often confusing to newbies, even though they're like the go-to, default way to make list functions in Haskell.

s-zeng

Haskell already supports deriving Functor and Foldable

munchler

Agreed. It’s amusing to see procedural programmers slowly rediscovering what functional programmers have known for years. Paul Graham called this “The Blub Paradox”.

pmontra

I remember that I was using trees quite often last century, when I was writing programs in C. I seldom use tree or comparably complex data structures nowadays, when I almost only write web apps (mostly backend in Ruby or Python but some frontend in JS.) I'm bet that both Ruby and Python have plenty of trees written in C inside their interpreters. I do everything with arrays (lists) and hash tables (dicts) in Ruby and Python. Maybe a language construct for trees would be nice for lower level languages and almost out of scope for higher level ones.

The same from another angle: there are a lot of trees in the indices of SQL databases (example [1]) but we don't zoom in to that level of detail very often when defining our tables.

[1] https://www.postgresql.org/docs/current/btree.html

thesz

User interface widgets form a forest - windows contain layouts that contain widgets which also can be layouts, etc, etc.

To implement Brown's algorithm to optimize class-based language models I had to implement a complex forest (DAG, actually) in Python using lists of fixed length. That was not especially nice to work with.

scotty79

Nested dicts are trees.

walleeee

Do you keep your dicts fastidiously flat?

pmontra

Usually yes. Web apps are simple. There isn't much to do inside a request params to db to response call. But sometimes, not every year, there is something to really reason about. I still remember some project from 10 or 20 years ago.

Retr0id

> "Why not just use recursive functions"

One great reason not to use recursive functions for traversing trees is that you can allocate your own stack data structure rather than relying on the call stack itself. In most languages/runtimes, the call stack has a maximum depth which limits the depth of trees you can process, usually on the order of thousands of stack frames.

Managing your own stack usually produces weirder looking code (personally I find "naive" recursive approaches more readable) - but having it as a first-class language feature could solve that!

cobbal

If we're fixing the language, may as well fix the real problem and not artificially limit stack space.

Retr0id

The language doesn't get much of a say, stack limits are usually inherited from the OS itself. It's fixable by not using the OS-provided stack but that's a much more invasive change than a new syntax/stdlib feature.

worthless-trash

The compiler/language absolutely gets a say in this, you can do tail call recursion without stack saturation.

swiftcoder

To some extent, this seems like it's just a symptom of "writing iterators in C++ is harder than it ought to be". You don't need to have a tree in memory to build an iterator over it - folks write iterators over implicit data all the time.

Here's the range based for loop counterexample from the blog post, as a python generator:

  import itertools

  def iterate_tree():
      for length in range(1, 8):
          for combo in itertools.product("abc", repeat=length):
              yield ''.join(combo)

  for s in iterate_tree():
      print(s)

uzerfcwn

C++ also has generators[1], but the author is perhaps unaware of them.

[1] https://en.cppreference.com/w/cpp/coroutine/generator

hiccuphippo

"yield from" is also very useful to iterate trees in python.

soegaard

Pick a language that allows users to define their own control structures and test your idea in practise.

Candidates: Racket, Scheme, Rust.

k__

You don't even have to got that far.

Defining your own iterator would be enough for most cases.

scotty79

The point is not to do that.

scotty79

I think Scala is also close. It has really flexible syntax.

lutusp

The article's thesis relies on the idea that a genuinely primitive traversal action exists, in the way that a for-loop is primitive and widely applicable, or adding two floats is common enough to justify building it into the language (or processor).

But tree traversal doesn't have this universal property. There are too many methods and purposes for traversing a tree, sufficient that IMHO no single primitive embodiment could materially improve a language. Also, modern compilers efficiently break down high-level traversal code so well that expressing the idea at a high level incurs no serious penalty compared to having a primitive for that purpose, or a series of them.

fngjdflmdflg

I read a similar argument for graphs in general,[0] where this is more obviously true.

[0] https://www.hillelwayne.com/post/graph-types/ and https://news.ycombinator.com/item?id=39592444