
Show HN: Zig Topological Sort Library for Parallel Processing

44 comments · April 1, 2025

I believe the best way to learn a language is by doing an in-depth project. This is my first Zig project, intended to teach me the ropes of publishing a Zig package. It turned out to be quite solid and performant, if perhaps a bit over-engineered.

This little library is packed with the following features:

  - Building a dependency graph from dependency data.
  - Performing topological sort on the dependency graph.
  - Generating dependence-free subsets for parallel processing.
  - Cycle detection and cycle reporting.

Cieric

I've been enjoying Zig myself quite a bit, and I'm fairly confident I could do some larger projects in it (excluding comptime, since it's missing features I sorely need for some of my current projects). I like it a bit more than C/C++ in a lot of cases, but I need it to be pushed just a tiny bit further before I can really dedicate effort towards large projects in it. I was even curious whether I could implement the features I need myself (there is even a proposal already), but got the answer "Just don't." (I don't blame Andrew; he already has a lot on his plate and I'm a stranger to him.) So I'm at the point of either waiting for the feature or forking, though I haven't decided what I'm going to do.

More on topic for the project though: did you have any project ideas for this? I think I could use it for my OpenCV node editor; I just did the naive thing of marking all outputs dirty as their input nodes got reprocessed. I assume this would fix the potential problem of recomputing a node twice? I also see you mention "Cycle detection and cycle reporting." but what specifically happens if I do have a cycle? Does it just give up, or is it something like best effort?

ww520

> More on topic for the project though, did you have any project ideas for this?

I implemented topological sort for bringing up and shutting down OS services in order some 20 years ago, but that was a quick-and-dirty job, never formally published as a library. I did it this time because it's a well-understood topic, so I could concentrate on learning the ropes of Zig and package publishing while doing something useful. And I went the extra mile to find dependence-free subsets for parallel processing and to detect cycles.

> I assume this would fix the potential problem of recomputing a node twice?

Yes. Topological sort is perfect for laying out the order of computation tasks so nothing gets redone.
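
A minimal sketch of the idea in TypeScript rather than Zig (the names here are illustrative, not this library's API): walk the nodes in topological order, and each dirty node is evaluated exactly once, after all of its inputs.

    // Hypothetical node-editor recompute pass: `order` is a topological
    // ordering of the node graph, so each dirty node is evaluated exactly
    // once, and only after all of its inputs are up to date.
    function recompute(
      order: string[],
      dirty: Set<string>,
      dependents: Map<string, string[]>,
      evaluate: (node: string) => void,
    ): void {
      for (const node of order) {
        if (!dirty.has(node)) continue;
        evaluate(node);                       // compute this node once
        dirty.delete(node);
        for (const d of dependents.get(node) ?? []) dirty.add(d);
      }
    }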

> "Cycle detection and cycle reporting." but what specifically happens if I do have a cycle? Does it just give up or is it something like best effort?

It will keep going as a best effort when cycles are detected: it produces a partial topologically sorted result plus a set of the cyclic nodes.
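
For the curious, here is a sketch of the usual Kahn-style technique in TypeScript for illustration (the general idea, not this library's actual code): any node whose in-degree never drops to zero is on, or downstream of, a cycle.

    // Kahn's algorithm as a best-effort sort: nodes whose in-degree never
    // drops to zero are involved in (or downstream of) a cycle.
    // `deps` maps each node to the nodes that depend on it.
    function sortWithCycles(nodes: string[], deps: Map<string, string[]>) {
      const indegree = new Map<string, number>();
      for (const n of nodes) indegree.set(n, 0);
      for (const targets of deps.values())
        for (const t of targets) indegree.set(t, indegree.get(t)! + 1);

      const ready = nodes.filter(n => indegree.get(n) === 0);
      const sorted: string[] = [];
      while (ready.length > 0) {
        const n = ready.pop()!;
        sorted.push(n);
        for (const t of deps.get(n) ?? []) {
          const d = indegree.get(t)! - 1;
          indegree.set(t, d);
          if (d === 0) ready.push(t);
        }
      }
      // Whatever never became ready forms the reported cycle set.
      const cyclic = nodes.filter(n => indegree.get(n)! > 0);
      return { sorted, cyclic };
    }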

klysm

I was surprised to see this as a library at all. Isn't it trivial, especially for small collections?

ww520

Yes. The core algorithm is like 20 lines of code in the run_algorithm() function. But making it easier to use, handling different types of input, reporting and querying output, etc. takes much more. That's the difference between an idea and a product, in a loose sense. It gives purpose to my learning process anyway.

kreco

Could you briefly describe which features you are sorely missing?

I like the language's intent, but I can't get past the syntax.

Cieric

For me it's all comptime stuff, and it's kind of arbitrary: for example, parsing out the type information of a function doesn't include the names of the function parameters, while basically everything else that has a name has that information present in its info structure. The other thing is tags: being able to tag things that I can parse at compile time. I'm making something close to a database ORM (specifically for SpacetimeDB; I thought it'd be fun to use Zig with it). But information about things like primary keys, auto-increments, constraints, and the like all has to live in a separate structure completely untied to the original struct or function. I'd like to be able to tie those things together easily to avoid mistakes and confusion. I have different workarounds that I've tried, but nothing that's universal for all my test cases.

For syntax, there are a few things I'm iffy on, but nothing I'd consider a deal breaker. I found it very easy to read right out of the gate, which is basically the one thing I need to really learn a new language (and probably the only reason I haven't learned Rust yet).

kreco

Thanks for the reply.

I totally understand how those two features could be useful.

For the parameter-name feature, I can't imagine a strong reason for not implementing it (apart from "we have other stuff to prioritize").

For the tags, I could see an attribute system like C++'s [0].

On a tangential topic, I believe that's exactly the Pandora's box of meta-programming.

[0] https://en.cppreference.com/w/cpp/language/attributes#Explan...

airstrike

Just wanted to say that Rust may look strange early on but very, very quickly becomes entirely natural, so don't let that be the reason you haven't learned it. That's my unsolicited input.

klysm

Syntax is so much less important than semantics that it isn't even really worth talking about, in my opinion.

codethief

Readability (in the sense of "How fast can the average developer parse code in the given language?") and proneness to errors are a thing, though.

Consider, e.g., how in TypeScript object literals ({ a: b, c: d, e }), object types ({ a: b; c: d; }) and object destructuring ({ a, c, e } = … ) can all look very similar. Same thing for lambdas ((a: b) => b) and function types ((a: b) => b). Also, additional parentheses are needed to prevent the compiler from mistaking an object literal ({ … }) for a function body ({ … }) when it is returned in a lambda expression. In short: Some of TypeScript's syntactic constructs are heavily overloaded and their meaning depends on the context.
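
To make the last pitfall concrete, here is a minimal snippet you can paste into any TypeScript playground (illustrative only):

    // A lambda body that starts with `{` is parsed as a block, so this
    // returns undefined (`value:` here is a statement label, not a key):
    const broken = (a: number) => { value: a };
    // Extra parentheses force the object-literal reading:
    const fixed = (a: number) => ({ value: a });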

For an example of proneness to errors, consider that in Nix function calls (<function name> <param1> <param2> …) and lists ([<item1> <item 2> …]) look almost the same and it's very easy to confuse the two and mess up, e.g.

    let foo = [ item1 myFunction "arg1" "arg2" item3 ]

defines a list with 5 items, not 3, due to the missing parentheses around the function call.

jitl

For something like task scheduling in a build system, if you use an up-front partition of the tasks like this, a task may have all its dependencies fulfilled (because tasks can have different durations) and yet remain unscheduled.

For example, with this input:

    $ cat data.file
    root: parentA parentB
    parentA: C D
    parentB: E F
Asking the package tool for the sorted sets gives me:

    $ ./zig-out/bin/toposort-cli --data data.file
    Processing succeeded.
      Topologically sorted sets: [ { C D E F }  { parentA parentB }  { root }  ]
      Topologically sorted list: [ C D E F parentA parentB root ]
      Nodes: [ root parentA parentB C D E F ]
      Dependency tree:
      [ C D E F ]
        C -> [ parentA ]
          parentA -> [ root ]
            root ->
        D -> [ parentA ]
          parentA -> [ root ]
            root ->
        E -> [ parentB ]
          parentB -> [ root ]
            root ->
        F -> [ parentB ]
          parentB -> [ root ]
            root ->
        
If we follow the README's advice to process the sets one at a time in parallel, we can easily have starvation: once `C` and `D` are finished, we are ready to start `parentA`, even if `E` and `F` are still in progress.

How would I use the API of this package to detect and start the task `parentA` as soon as `C` and `D` are finished? I guess when a task finishes, I can ask for the list of dependent tasks and, for each of them, check whether all of their dependencies are finished, and if so start it. But this real-world need feels unmet; it seems odd to me to focus on the sorted sets instead of more practical scheduling.

That is kind of doing Kahn's algorithm iteratively during the build. It would be cool to try to optimize that for maximum performance.
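
A sketch of that incremental scheme in TypeScript (the names are assumptions for illustration, not this package's API): keep a counter of unfinished dependencies per task, and start a task the moment its counter hits zero.

    // Event-driven Kahn's: start a task the moment its last dependency
    // finishes, rather than waiting for a whole "set" to complete.
    type Task = string;

    class Scheduler {
      private remaining = new Map<Task, number>();   // unfinished deps per task
      private dependents = new Map<Task, Task[]>();  // task -> tasks waiting on it

      constructor(deps: Map<Task, Task[]>, private start: (t: Task) => void) {
        for (const [task, prereqs] of deps) {
          this.remaining.set(task, prereqs.length);
          for (const p of prereqs) {
            if (!this.dependents.has(p)) this.dependents.set(p, []);
            this.dependents.get(p)!.push(task);
            if (!this.remaining.has(p)) this.remaining.set(p, 0);
          }
        }
        for (const [task, n] of this.remaining)
          if (n === 0) this.start(task);             // roots run right away
      }

      // Call when a task finishes; any dependent whose last prerequisite
      // just completed is started immediately.
      onFinished(task: Task): void {
        for (const d of this.dependents.get(task) ?? []) {
          const left = this.remaining.get(d)! - 1;
          this.remaining.set(d, left);
          if (left === 0) this.start(d);
        }
      }
    }

With the data.file example above, `parentA` starts as soon as both `C` and `D` finish, regardless of whether `E` and `F` are still running.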

ww520

That's a great point! Unfortunately topological sort generates a linear order, forcing nodes to run one after another. This library attempts to bring some parallel processing into the picture by grouping dependence-free nodes together, producing linear batches:

  {C D E F}, {parentA parentB}, {root} 
Within a batch, nodes can run in parallel. Between batches, they still need to run one after another.

What you're asking for is to partition the dependency graph according to node dependency with minimum span.

  { {C D} parentA } \
  { {E F} parentB } - {root}
Or with a shared leader.

  { {C D S} parentA } \
  { {E F S} parentB } - {root}
And on top of that, add node weight into consideration. That's hard.

For now, you can send a notification to a dependent when a leading task finishes. E.g. when C finishes, it notifies parentA; when D finishes, it notifies parentA. Once parentA sees that all of its leading tasks are done, it can start.

The library can help by maintaining the dependency relationships and letting a task query its leaders and dependents.

Task running itself would belong in a separate library, built on top of TopoSort and specifically geared toward scheduling tasks.

genewitch

Did you use this library on Advent of Code 2024? I'd never heard of topological sorting prior to that problem, and it came up real early in the game.

rk06

Topological sorting is just "depth-first traversal" in a trench coat. I have implemented it thrice in my day job.

It is probably implemented more often in practice than any other algorithm from a CS course.

ww520

I didn’t take part in Advent of Code. Topological sorting is a really old algorithm; anything dealing with dependencies needs it, like makefiles.

emmanueloga_

For those that have not implemented toposort or don't remember it: 1) only directed graphs without cycles (DAGs) can be topologically sorted; 2) there can be more than one topological order; and 3) reverse post-order of a depth-first traversal from every unvisited node yields a topological order.

In JavaScript:

    function toposort(graph = { a: ['b', 'c'], b: ['d'] }) {
      const order = [];
      const visited = new Map();
      
      function dfs(node) {
        const status = visited.get(node);

        if (status === 1) throw new Error("Cycle found.");
        if (status === 2) return;

        visited.set(node, 1); // In progress.
        const adjacent = graph[node] ?? [];
        for (const neighbor of adjacent) dfs(neighbor);
        visited.set(node, 2); // Done.

        order.unshift(node); // Reverse post order.
      }

      for (const node in graph) {
        if (!visited.has(node)) dfs(node);
      }

      return order;
    }

chubot

Hm, this reminds me that the Python stdlib has grown a module for topological sorting of a graph:

https://docs.python.org/3/library/graphlib.html

I haven't used it yet; I'd be curious if anyone has.

nerdponx

I used it to build a (now permanently unfinished) lightweight DAG runner that uses only the Python standard library, with the intention that you can just copy the .py file into a project and use it without installing anything other than Python on your system. I think it might be of niche use to some people, but I personally wasn't even dogfooding it, I was just scratching an itch.

The purpose of adding it to the standard library was to implement linear MRO for classes: https://bugs.python.org/issue17005

jessekv

I built one too! I started with networkx but migrated to graphlib with python 3.9.

I run it in production for many customers. It's great to have this in the std lib.

paulddraper

It’s a common algorithmic need.

Not as common as array sort. But still common.

tediousgraffit1

I really like this; it's the perfect-size project for exploring a new piece of tech. I especially like that you implemented an actual CLI and not just tests.

ww520

Yes. The CLI is important. It's great for running tests and driving development, and it forces me to dog-food my own library and put myself in the shoes of its users. In fact, a number of features were added due to shortcomings exposed while implementing the CLI.

duped

> Generating dependence-free subsets for parallel processing.

Unsure how this is defined (*), but graph-cutting approaches to concurrent task scheduling are both pessimistic (poor utilization of available resources) and (IIRC) NP-hard, so you pay a big cost upfront.

On the other hand, if you know the in-degree/out-degree of each node at the time it is visited (meaning the graph is static), you can run Kahn's algorithm concurrently and put each node into a shared work queue. You can optimize that further with per-thread work queues and work stealing between them. Depending on what the nodes represent, there are even more optimizations and heuristics; concurrent task scheduling is a hard problem.

* imagine the graph

(a, b) (a, c) (b, d) (c, d)

Is it possible to get nodes b and c in parallel "subsets" in your library?

ww520

Yes. It would produce dependence-free subsets. I just ran your sample (assuming "a, b" means a depends on b):

  Topologically sorted sets: [ { d }  { b c }  { a }  ]
  Topologically sorted list: [ d b c a ]
  Nodes: [ a b c d ]
  Dependency tree:
  [ d ]
    d -> [ b c ]
      b -> [ a ]
        a ->
      c -> [ a ]
        a ->
The dependence-free subset finding is probably not exhaustive or optimal; I haven't gone through a formal proof. It's opportunistic and best-effort for now.

duped

How are the subsets defined?

ww520

At every round of the algorithm, all nodes with in-degree 0 (i.e. nodes that do not depend on anything) are collected as a dependence-free subset.

They serve as the root set for the rest of the graph in that round. Each depending node reachable from the root set has its in-degree decremented; when a node's in-degree reaches 0, it is added to the next root set.

I'm using double-buffering to maintain the current root set for processing and to collect the next root set for the following round, instead of using a queue as in Kahn's algorithm. At the end of a round, I simply swap the two buffers, which is very efficient. When the next root set comes up empty, all nodes have been processed.
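
A sketch of that double-buffered variant in TypeScript (illustrative names, not the actual Zig implementation): two arrays stand in for the buffers, and each swap emits one dependence-free subset.

    // Layered Kahn's with two swapped buffers instead of a single queue.
    // `deps` maps each node to the nodes that depend on it.
    function sortedSets(nodes: string[], deps: Map<string, string[]>): string[][] {
      const indegree = new Map<string, number>();
      for (const n of nodes) indegree.set(n, 0);
      for (const targets of deps.values())
        for (const t of targets) indegree.set(t, indegree.get(t)! + 1);

      let current = nodes.filter(n => indegree.get(n) === 0); // root set
      let next: string[] = [];
      const sets: string[][] = [];

      while (current.length > 0) {
        sets.push(current);                  // one dependence-free subset
        for (const n of current)
          for (const t of deps.get(n) ?? []) {
            const d = indegree.get(t)! - 1;
            indegree.set(t, d);
            if (d === 0) next.push(t);       // ready for the next round
          }
        [current, next] = [next, []];        // swap the double buffers
      }
      return sets;
    }

On the `(a, b) (a, c) (b, d) (c, d)` example above, this yields `[ [d], [b, c], [a] ]`, matching the sets shown earlier.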

hbbio

Congrats! I like Zig a lot, even though I've never implemented a full project with it.

FWIW we had to build a related lib in TypeScript: https://github.com/okcontract/graph as part of the runtime of https://github.com/okcontract/cells

ww520

Nice! I like how it generates DOT data. Maybe I'll add support for that in the future.

sharmasachin98

Curious: have you benchmarked it against any similar tools in other languages (like Go or Rust) for comparison?

ww520

No, I haven't had the chance to benchmark against others. Running the benchmarks included in the library, I could process 1 million dependency pairs in tens of milliseconds on my 5-year-old laptop:

  Testing 1000000 items, 1-to-1 chaining dependency.
    Add dependency 1000000 items. Time: 93ms, 10645885 items/s, 93 ns/item
              Sort 1000000 items. Time: 113ms, 8795778 items/s, 113 ns/item

  Testing 1000000 items, 1-to-4 chaining dependency.
    Add dependency 1000000 items. Time: 87ms, 11428323 items/s, 87 ns/item
              Sort 1000000 items. Time: 44ms, 22508986 items/s, 44 ns/item

  Testing 1000000 items, 1-to-10 chaining dependency.
    Add dependency 1000000 items. Time: 102ms, 9748793 items/s, 102 ns/item
              Sort 1000000 items. Time: 31ms, 31707077 items/s, 31 ns/item

  Testing 1000000 items, 1-to-10 chaining dependency, with max_range set.
    Add dependency 1000000 items. Time: 25ms, 39460028 items/s, 25 ns/item
              Sort 1000000 items. Time: 31ms, 31633556 items/s, 31 ns/item

jedisct1

Good job! This is really cool!