
Ovld – Efficient and featureful multiple dispatch for Python

linschn

On the one hand, this is pretty cool: the API is Pythonic and makes quite a lot of sense.

On the other hand, I can't stop myself from thinking about "Greenspun's tenth rule":

> Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

This doesn't apply directly here, as the features are intentional and it seems they are not bug-ridden at all. But I get a nagging feeling of wanting to shout 'just use Lisp!' when reading this.

https://wiki.c2.com/?MultiMethods

upghost

It's actually more like "just use Haskell".

Having written one of these[1] a decade ago and inflicting it (with the best of intentions) upon production code in anger, I can tell you this often leads to completely unmaintainable code. It is impossible to predict the effect of changing a method, tracing a method, debugging a method (where do I put the breakpoint??).

The code reads beautifully though. Pray you never have to change it.

The reason I say "just use Haskell" instead of Lisp is because Lisp generics suffer from this same problem.

Btw if anyone has a solution to this "generic/multidispatch maintainability in a dynamically typed language" problem I would love to hear it.

[1]: https://github.com/jjtolton/naga/blob/master/naga/tools.py

Fr0styMatt88

I’ve been doing the once-every-few-years deep dive into Smalltalk that I tend to do out of curiosity and this kind of question is one that always comes up for me. The answer there seems to be “the whole environment and philosophy is geared to work that way”; a dynamic language needs a lot of tooling support and interactivity.

igouy

> this kind of question

Which kind of question? "where do I put the breakpoint??"

hasley

I am thinking more about Julia here - which I would use if Python were not so common in several communities.

pansa2

Is it common in Julia to use multiple-dispatch on 3 or more arguments, or just double-dispatch?

Julia definitely made the right choice to implement operators in terms of double-dispatch - it’s straightforward to know what happens when you write `a + b`. Whereas in Python, the addition is turned into a complex set of rules to determine whether to call `a.__add__(b)` or `b.__radd__(a)` - and it can still get it wrong in some fairly simple cases, e.g. when `type(a)` and `type(b)` are sibling classes.
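For concreteness, a minimal sketch of the sibling-class case (class names hypothetical): Python only tries `b.__radd__` first when `type(b)` is a proper subclass of `type(a)`, so between siblings the left operand's `__add__` always wins, even if the right operand's `__radd__` was the intended specialization.

```python
class Base:
    pass

class A(Base):
    def __add__(self, other):
        # A has a broad rule that accepts any Base
        if isinstance(other, Base):
            return "A.__add__"
        return NotImplemented

class B(Base):
    def __radd__(self, other):
        # B's more specific rule never gets a chance: B is a
        # sibling of A, not a subclass, so Python consults
        # A.__add__ first and it claims the call.
        if isinstance(other, Base):
            return "B.__radd__"
        return NotImplemented

print(A() + B())  # prints A.__add__, not B.__radd__
```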

I wonder whether Python would have been better off implementing double-dispatch natively (especially for operators) - could it get most of the elegance of Julia without the complexity of full multiple-dispatch?

ChrisRackauckas

It's not uncommon to dispatch on 3 or more arguments. Linear algebra specializations are one case where I tend to do this a lot: for example, when specializing on structured matrix types (block banded matrices) against non-standard vectors (GPU arrays), you then need to specialize on the output vector to make it non-ambiguous in many cases.

StefanKarpinski

The paper "Julia: Dynamism and Performance Reconciled by Design" [1] (work largely by Jan Vitek's group at Northeastern, with collaboration from Julia co-creators, myself included) has a really interesting section on multiple dispatch, comparing how different languages with support for it make use of it in practice. The takeaway is that Julia has a much higher "dispatch ratio" and "degree of dispatch" than other systems: it really does lean into multiple dispatch harder than any other language. As to why this is the case: in Julia, multiple dispatch is not opt-in, it's always on, and it has no runtime cost, so there's no reason not to use it. Anecdotally, once you get used to using multiple dispatch everywhere, going back to a language without it feels like programming in a straitjacket.

Double dispatch feels like kind of a hack, tbh, but it is easier to implement and would certainly be an improvement over Python's awkward `__add__` and `__radd__` methods.

[1] https://janvitek.org/pubs/oopsla18b.pdf
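For illustration, a hand-rolled sketch (not any language's actual mechanism; names hypothetical) of what table-based double dispatch on a binary operator could look like: one symmetric lookup on the pair of operand types, walking each MRO so subclasses inherit their parents' rules.

```python
# Registry keyed by the exact (left, right) type pair.
_ADD_RULES = {}

def add_rule(ta, tb):
    def register(fn):
        _ADD_RULES[(ta, tb)] = fn
        return fn
    return register

def add(a, b):
    # Walk both MROs so e.g. bool falls back to the int rule.
    for ta in type(a).__mro__:
        for tb in type(b).__mro__:
            fn = _ADD_RULES.get((ta, tb))
            if fn is not None:
                return fn(a, b)
    raise TypeError(f"no add rule for ({type(a).__name__}, {type(b).__name__})")

@add_rule(int, int)
def _(a, b):
    return a + b

@add_rule(int, str)
def _(a, b):
    return str(a) + b

assert add(2, 3) == 5
assert add(2, "nd") == "2nd"
assert add(True, 3) == 4  # bool -> int fallback via the MRO walk
```

A single symmetric lookup replaces the `__add__`/`__radd__` negotiation; a clash between two applicable rules becomes an explicit resolution-order question instead of a silent left-operand-wins default.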

ethagnawl

I will forever think of this talk by Peter Seibel (author of Practical Common Lisp) whenever I hear about multiple dispatch: https://youtu.be/VeAdryYZ7ak?si=wY3RmcRnW96jxQQm

cdrini

That's a very long video you've linked to :P could you talk about why multiple dispatch reminds you of that talk?

ethagnawl

Seibel spends a fair amount of time talking about how powerful and flexible CL's dynamic dispatch mechanism is and how/why it was still a novel feature at the time (~15 years ago).


ziihrs

I'm curious about what makes this implementation faster than the alternatives.

Also, if you care about function call performance, I guess you'd use PyPy. Have you tried to run the benchmarks (with appropriate warmup) on PyPy to see if the results carry over?

breuleux

Mainly codegen. Most other libraries do something like `return dispatch_dictionary[tuple(map(type, args))](*args)`, whereas ovld generates a function specialized to the set of actual signatures of the overloads (still with a dictionary, but you remove a surprising amount of overhead just from removing varargs). The code is registered in the linecache, so you can step into it with pdb if you want to look at it. After that I became a bit obsessive about pushing it further and further (because it's fun). So... when dependent types are involved, it will actually generate custom dispatch code: if you define an @ovld with x:Literal[0], for example, it will generate an `if x == 0` in the int dispatch path (and you can define custom codegens for new types like Regexp).
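A rough sketch of the difference being described (names hypothetical, not ovld's actual generated code): the naive version pays for `*args` packing and a `tuple(map(type, args))` allocation on every call, while a function emitted for a known arity does a direct keyed lookup.

```python
# Naive pattern used by many dispatch libraries:
dispatch_dictionary = {
    (int, int): lambda a, b: a + b,
    (str, str): lambda a, b: a + " " + b,
}

def naive_call(*args):
    # varargs packing + map + tuple construction on every call
    return dispatch_dictionary[tuple(map(type, args))](*args)

# What a function "generated" for the known 2-argument case can
# look like: no varargs, no map, just one keyed lookup.
def generated_call(a, b):
    return dispatch_dictionary[type(a), type(b)](a, b)

assert naive_call(1, 2) == generated_call(1, 2) == 3
assert generated_call("a", "b") == "a b"
```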

Regarding PyPy, I did run some benchmarks recently (I use pytest-benchmark). The advantage over other libraries remains and the magnitude is similar, but when I compare with custom if/isinstance code, that code is optimized a lot more aggressively and it gains a significant edge (let's say ~5x). Now, since I'm into codegen, part of me feels like meeting the challenge and figuring out a way to generate optimal code from a set of signatures, but I think I've spent enough time as it is haha.

ziihrs

Very interesting, thanks for the explanation!

paddy_m

This looks really cool. I get multiple dispatch, multimethods, and CLOS method specialization confused in my head. I can understand what this does... but when do you use it?

Can people list some times when they actually used multimethods to solve a real problem? How did it work out? Would you do it again?

SatvikBeri

My company has used Julia for about 5 years now. 90% of the time in application code you only need single dispatch, same as OOP.

One case where I actually rely on multiple dispatch is conversion to more or less structured data. Inspired by Clojure, I have a function `conform(OutputDataType, input_data::SomeType)` and specific implementations depend on both types involved.

Multiple dispatch is also really helpful when two libraries don't quite do the same thing. In Python, pandas, numpy, and pytorch all have slightly different implementations of x.std() (standard deviation) with slightly different APIs. This means you can't write code that's generic across a numpy array and a pytorch tensor. In Python this could only easily be fixed if the library authors coordinated, but with multiple dispatch you can just fix it in your own code.
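A stdlib-only sketch of that user-side fix, using `functools.singledispatch` (single dispatch suffices here since only one argument varies); `MyArray` is a hypothetical stand-in for a third-party array type with its own `.std()` convention.

```python
import statistics
from functools import singledispatch

class MyArray:                      # hypothetical third-party type
    def __init__(self, data):
        self.data = data
    def std(self):                  # population std (ddof=0)
        return statistics.pstdev(self.data)

@singledispatch
def std(x):
    raise TypeError(f"no std rule for {type(x).__name__}")

@std.register
def _(x: list):                     # sample std (ddof=1)
    return statistics.stdev(x)

@std.register
def _(x: MyArray):                  # normalize to the same convention
    return statistics.stdev(x.data)

# std() now means the same thing for both types:
assert std([1.0, 2.0, 3.0]) == std(MyArray([1.0, 2.0, 3.0])) == 1.0
```

The point is that the normalization lives in your own code: neither library has to change for `std(x)` to mean one thing everywhere.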

paddy_m

That makes sense for writing your own generic libraries. Do you frequently have to convert from pytorch to numpy? I would think that for a given execution unit (pipeline step, single program, model running on a server) the data would stay the same type after some initial conversion.

SatvikBeri

I'd say we have about 60% library code and 40% application code internally, so making it easier to write libraries is really nice. E.g. you don't want to have to write different statistical calculations for every type you use.

In particular, the fact that these types would silently give you different answers if you called `.std()` was a big headache.

It was very common for us to want to be generic over pandas series and numpy arrays. A bit less so with pytorch tensors, but that was because we just aggressively converted them to numpy arrays. Fundamentally these three are all very similar types so it's frustrating to have to treat them differently.

breuleux

I use it quite a lot (having made it), most often when I need to recursively process complex heterogeneous data structures. Merging two data structures, treemap, processing a Python AST, etc. Also in any situation where you want to be able to register new behavior.
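A minimal treemap in that spirit, sketched with stdlib `functools.singledispatch` rather than ovld: recursion is driven by the type of the value, and registering a new container type adds a case without touching the others.

```python
from functools import singledispatch

@singledispatch
def treemap(x, fn):
    # default: anything without a specific rule is a leaf
    return fn(x)

@treemap.register
def _(x: dict, fn):
    return {k: treemap(v, fn) for k, v in x.items()}

@treemap.register
def _(x: list, fn):
    return [treemap(v, fn) for v in x]

data = {"a": [1, 2], "b": {"c": 3}}
assert treemap(data, lambda v: v * 10) == {"a": [10, 20], "b": {"c": 30}}
```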

One case where I'm finding it extremely useful is that I'm currently using ovld to implement a serialization library (https://github.com/breuleux/serieux, but I haven't documented anything yet). The arguments for deserialization are (declared_type, value, context). With multimethods, I can define

* deserialize(type[int], str, Any) to deserialize a string into an int

* deserialize(type[str], Regexp[r"^\$"], Environment) to deserialize $XYZ as an environment variable lookup but only if the context has the Environment type

* deserialize(type[Any], Path, WorkingDirectory) to deserialize from a file

That's one situation where ovld's performance and codegen capabilities come in handy: the overhead is low enough to remain in the same performance ballpark as Pydantic v2.

perlgeek

A use case is when a sub or method needs to do different things based on the data type of the argument.

Example: turning a data structure into a JSON string, solved with multi dispatch: https://github.com/moritz/json/blob/master/lib/JSON/Tiny.pm#...

Another common useful example is constructors. A Date class might have constructors that accept

   Date(Int, Int where 1..12, Int where 1..31) # Y, M, D
   Date(Str where /^\d4-\d2-\d2$/) # YYYY-MM-DD
   Date(DateTime) # extract the date of a DateTime object
etc. to make it very intuitive to use.
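A hand-rolled Python sketch of the same idea (guard predicates standing in for Raku's `where` clauses; names hypothetical, not ovld's API): each constructor candidate pairs a guard with an implementation, tried in order.

```python
import re
from datetime import date, datetime

_DATE_RULES = []

def date_rule(guard):
    def register(fn):
        _DATE_RULES.append((guard, fn))
        return fn
    return register

def make_date(*args):
    for guard, fn in _DATE_RULES:
        if guard(args):
            return fn(*args)
    raise TypeError(f"no Date rule for {args!r}")

@date_rule(lambda a: len(a) == 3 and all(isinstance(x, int) for x in a)
                     and 1 <= a[1] <= 12 and 1 <= a[2] <= 31)
def _(y, m, d):                       # Date(Int, Int where 1..12, ...)
    return date(y, m, d)

@date_rule(lambda a: len(a) == 1 and isinstance(a[0], str)
                     and re.fullmatch(r"\d{4}-\d{2}-\d{2}", a[0]))
def _(s):                             # Date(Str where /YYYY-MM-DD/)
    return date.fromisoformat(s)

@date_rule(lambda a: len(a) == 1 and isinstance(a[0], datetime))
def _(dt):                            # Date(DateTime)
    return dt.date()

assert make_date(2024, 5, 1) == make_date("2024-05-01") == date(2024, 5, 1)
```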

nbadg

Also, dependent types, i.e. value-based dispatch, which can be incredibly useful when dealing with enums. That alone is enough to make me curious to try it!

paddy_m

Value-based dispatch is a much better name for it than dependent types. I have seen the term dependent types and just glossed over it because I thought it was a more complex topic.

breuleux

Yeah, I suppose. I wrote "dependent" because I think that's the term of art for it, but you're making me think I should probably change the header to something more intuitive to people unfamiliar with type theory.

blarg1

Makes naming functions easier, e.g. multiply(mat3, vec3), multiply(mat3, mat3), etc.

If the language is dynamically typed, you can use it for polymorphism.

photios

Lord, have mercy on the souls of those developers that inherit such overload-infested codebases.

Iwan-Zotow

AI has no soul

rahulprasath

Why don’t mainstream languages like Python just support multimethods directly?

perlgeek

Probably because typing in Python is optional and kinda bolted-on after the fact, and you really need a good type system to get the most out of multimethods.

In particular, type declarations, as they were first introduced in Python, did not have a run-time effect, and they very much do have a run-time effect with multimethods.

(Dataclasses are kind of an exception to this rule, but one that feels far less fundamental than multi dispatch would).

RadiozRadioz

Many of them do, e.g. Java. Why do _some_ mainstream languages like Python not support it? Entirely a design preference, usually because they want less emphasis on the type system.

pdhborges

Java supports it? Are you conflating method overload (which is statically determined) with multimethods?

blu3h4t

Why is it made to look like it's named after the nickname of a prominent Perl hacker? :D

perlgeek

It also looks similar to multi dispatch in Raku, formerly Perl 6.

See for example https://docs.raku.org/language/functions#Multi-dispatch

shawn_w

Not just a prominent perl hacker, but one who's been working on a major new OO system for the language.

blu3h4t

Yeah, I know, right? Before, I had a joke about Perl like this: Perl is like nineties music, it's simply the best of all times. :) But a few days ago I made a new one up: Perl is a Schrödinger's programming language. :D

shawn_w

>Perl is like nineties music, it’s simply the best of all times.

You called that a joke, but I think it's just the truth.

olejorgenb

I assume this does not play ball with typehints? I don't think a decorator can "output" an @overload?

mpeg

That's both genius and a massive hack, love it.

For those who are wondering: they alias @typing.overload as @ovld only for type checkers.

Normally, @typing.overload is a way to define multiple function signatures with empty bodies so your type checker can work nicely with functions that, for example, only return a certain type when called with a certain signature. The actual implementation is left to you and usually involves a bunch of runtime type checks to branch your logic the right way.
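For reference, the standard `@typing.overload` pattern described above: stub signatures for the type checker, then one real implementation that branches at runtime (function names hypothetical).

```python
from typing import overload

@overload
def parse(x: int) -> str: ...
@overload
def parse(x: str) -> int: ...

def parse(x):
    # the one real implementation: runtime isinstance branching
    if isinstance(x, int):
        return str(x)
    if isinstance(x, str):
        return int(x)
    raise TypeError(type(x).__name__)

assert parse(7) == "7"
assert parse("7") == 7
```

The hack, as described here, is that ovld presents @ovld to type checkers as this decorator while doing the runtime dispatch itself.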

paddy_m

What is the return type of an overloaded function? I don't see that in any of the examples.

I think it would make sense to have each function return either the exact same type, or a generic.

  @ovld(int) #all of these overloads should return an int
  len(a:list) -> int
  len(a:str) -> int

  #the effective type of len is
  len(a:Union[int, str]) -> int
and then a generic example

  T = TypeVar('T')
  @ovld(T)
  add(a:np.int8, b:np.int8) -> np.int8

  @ovld(T)
  add(a:np.int16, b:np.int16) -> np.int16
In the second example, I don't know how you would connect the input params to the generic T. If you put concrete types in, I see how ovld works.

Am I missing something?

breuleux

It works, ish. Type checkers have a rule that you must have a non-stub, non-decorated overload, so if using ty for example, you need to --ignore invalid-overload. mypy doesn't seem to recognize the hack, although I could've sworn it did? Anyway, it'd be nice if we could mark a decorator as functioning like @overload but without the need for stubs.

est

how about dispatch for both sync and async?

I did this: https://news.ycombinator.com/item?id=43982570