
Decorator JITs: Python as a DSL

21 comments · February 3, 2025

6gvONxR4sf7o

I've had a lot of fun with tracing decorators in Python, but the limitation of data-dependent control flow (e.g. an if statement, a for loop) always ends up being more painful than I'd hoped. It's a shame, since it's such a great pattern otherwise.

Can anyone think of a way to get a nice smooth gradation of tracing-based transformations, scaled by the effort required? I'd love to say 'okay, in this case I'm willing to put in a bit more effort' and somehow get data-dependent if statements working, while still not supporting data-dependent loops. All I know of now is either tracing with zero data-dependent control flow, or going all the way to writing a Python compiler with whatever set of semantics you want to support, failing hard on whatever you don't.
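One existing point on that gradation, sketched below with JAX (my example; the parent doesn't name a framework): tracing frameworks can support data-dependent branches through structured control-flow primitives without becoming full Python compilers. A plain `if` on traced data still fails, but `jax.lax.cond` stages the branch into the compiled function:

```python
import jax
import jax.numpy as jnp

@jax.jit
def f(x):
    # A plain `if x > 0:` would raise a tracer error under jit, because
    # the predicate depends on a traced value. lax.cond stages the
    # data-dependent branch into the compiled function instead.
    return jax.lax.cond(x > 0, lambda v: v * 2.0, lambda v: -v, x)

print(f(jnp.float32(3.0)))   # 6.0
print(f(jnp.float32(-3.0)))  # 3.0
```

Data-dependent loops get the same treatment via `lax.while_loop` / `lax.scan`, so the effort scales roughly with how much of your control flow you're willing to rewrite in primitive form.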

On a different note, some easy pdb integration for decorator DSLs would be an incredible enabler for these kinds of things. My coworkers are always trying to write little 'engine' DSLs for one thing or another, and it sucks that whenever you implement your own execution engine, you completely lose all language tooling. As I understand it, in compiler tooling you always carry some burden of shepherding around maps of which part of the source a given thing corresponds to. Ditto for Python decorator DSLs, except nobody bothers, meaning you get the equivalent of a 1960s developer experience in that DSL.

sega_sai

I hope this is the future for Python: write in pure Python, but where needed, have the code JIT-compiled (or not) into something faster (provided your code doesn't rely too much on low-level Python machinery, such as dunder (__) methods).

sevensor

The misleading thing about this approach is that the decorated function is no longer Python at all. It’s another language with Python syntax. Which is a neat way to get a parser for free, but it’s going to set up expectations about semantics that are bound to be incorrect.
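A concrete instance of that semantic drift, assuming numba as the decorator (my example, not the parent's): Python integers are arbitrary precision, but a function compiled for a fixed-width type silently wraps on overflow.

```python
from numba import njit, int64

@njit(int64(int64))
def double_it(x):
    # Compiled as a fixed-width int64 multiply, which wraps on overflow.
    return x * 2

print(double_it(2**62))  # -9223372036854775808 (wrapped)
print(2**62 * 2)         #  9223372036854775808 (Python bigint)
```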

Scene_Cast2

If this is the way forward, I'd love for a better developer experience.

I'm currently wrangling with torch.compile (flex attention doesn't like bias terms; see issues 145869 and 144511 if curious). As much as I love speed and optimization, JIT (at least the PyTorch flavor) currently has weird return types that break VS Code's IntelliSense, weird stack traces, limitations around printing, random issues like the bias term, and missing support for sparsity.

Speaking of PyTorch JIT workflows: what's a nice way of having a flag to turn off compilation?
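One common pattern, sketched under my own naming (the flag name and wrapper are hypothetical; only `torch.compile` itself is the real API): gate the decorator behind an environment variable so compilation can be switched off when debugging.

```python
import os
import torch

# Hypothetical flag name: set USE_TORCH_COMPILE=0 to run eagerly.
USE_COMPILE = os.environ.get("USE_TORCH_COMPILE", "1") == "1"

def maybe_compile(fn):
    return torch.compile(fn) if USE_COMPILE else fn

@maybe_compile
def step(x):
    return x * 2 + 1
```

Recent PyTorch versions also accept `torch.compile(fn, disable=True)`, which keeps the call site intact while skipping compilation.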

dleeftink

> random issues like the bias term

I'd like to know more about this!

dec0dedab0de

I would rather have a JIT built into the reference implementation. A JIT would help far more programs than removing the GIL, but everyone thinks the GIL affects them for some reason.

masklinn

They serve different use cases. Since the function-JIT pattern is a manual opt-in, it can be much more aggressive, supporting only restricted language patterns rather than... the entire thing. It can also use bespoke annotations for better codegen: e.g. you can tell numba to only codegen for i32 -> i32 -> i32, rather than lazily codegen for any a -> b -> c.

svilen_dobrev

I needed to make the "tracing" part - which I called "explain" - without JITs, back in 2007-8, using a combination of operator overloading, variable "declaring", and bytecode hacks [0].

Applied over a set of (constrained) functions, the result was a well-formed trace of which var got what value because of what expression over what values.

So can these ~hacks be avoided now - or not really?

[0] https://github.com/svilendobrev/svd_util/blob/master/tracer....
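A toy sketch of the operator-overloading half of that approach (not the linked tracer itself): proxy objects record which value came from which expression as the code runs.

```python
class Traced:
    """Proxy that records the expression that produced each value."""
    def __init__(self, name, value):
        self.name, self.value = name, value

    def __add__(self, other):
        out = Traced(f"({self.name} + {other.name})", self.value + other.value)
        print(f"{out.name} = {out.value}")
        return out

    def __mul__(self, other):
        out = Traced(f"({self.name} * {other.name})", self.value * other.value)
        print(f"{out.name} = {out.value}")
        return out

x, y = Traced("x", 2), Traced("y", 3)
z = x + y * y  # prints: (y * y) = 9, then (x + (y * y)) = 11
```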

est

Aha, does anyone remember psyco from the Python 2.x era?

https://psyco.sourceforge.net/psycoguide/node8.html

p.s. The psyco folks then went in another direction, called pypy.

p.p.s. There's also a pypy-based decorator, but it limits the parameters to basic types. Sadly I forgot the GitHub link.

rented_mule

Yes! I used psyco in production for a while, and the transition to psyco resulted in some interesting learning...

I had written a C extension to speed up an is-point-in-polygon function that was called multiple times during every mouse move in a Python-based graphical application (the pure Python version of the function resulted in too much lag on early 2000s laptops). When psyco came out, I tried moving the function back to Python to see how close its speed came to the C extension. I was shocked to see that psyco was significantly faster.

How could it be faster? In the C extension, I specified everything as doubles, because I called it with doubles in some places. It turns out the vast majority of the calls were working with ints. The C extension, as written, had to cast those ints to doubles and then do everything in floating point, even though none of the calculations would have had fractional parts. Psyco did specialization - it produced a version of the function for every type signature it was called with. So it had an all-int version and an all-double version. Psyco's all-int version was much faster than the all-double version I'd written in C, and it was what was being called 95% of the time.

If I'd spent enough time profiling, I could have made two C functions and split my calls between them. But psyco discovered this for me. As an experiment, I tried making two versions of the C functions. Unsurprisingly, that was faster than psyco. I shipped the psyco version as it was more than fast enough, and much simpler to maintain.

My conclusion... JITs have more information to use for optimization than compilers do (e.g., runtime data types, runtime execution environment, etc.), so they have the potential to produce faster code than compilers in some cases if they exploit that added information through techniques like specialization.
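A toy illustration of that specialization idea (`compile_for_types` below is a hypothetical stand-in, not psyco's real mechanism): keep one compiled variant per tuple of argument types, generated lazily on first use.

```python
def specializing_jit(fn):
    variants = {}  # one compiled variant per argument-type tuple

    def wrapper(*args):
        key = tuple(type(a) for a in args)
        if key not in variants:
            variants[key] = compile_for_types(fn, key)
        return variants[key](*args)

    return wrapper

# Stand-in so the sketch runs; a real JIT would emit type-specialized code.
def compile_for_types(fn, key):
    return fn
```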

svilen_dobrev

It was very good. But there was a win only if you could avoid the overhead of function calls, which are the slowest thing in Python - an order of magnitude more than anything else (well, apart from exception throwing, which is even slower, but rare). In my case, the speedup in calculations was lost to the slowdown from function calls, so I ended up grouping and jamming most calculations into one big func(TM), and then that got psyco-assembly-ized.

btw function calls are still the slowest thing: somedict.get(x) is almost 2x slower than (x in somedict and somedict[x]). In my attempt last year at optimizing the transit-protocol lib [0], bundling/copying a few one-line calls into one 5-line func was the biggest win - that, and not doing some things at all.

[0] https://github.com/svilendobrev/transit-python3/blob/master/...
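The claim is easy to check with timeit; exact ratios vary by CPython version and hardware, so treat this as a sketch of the measurement, not a fixed number.

```python
import timeit

d = {i: i for i in range(1000)}

# Method call (attribute lookup + call) vs. two cheap inline operations.
print(timeit.timeit("d.get(500)", globals=globals()))
print(timeit.timeit("500 in d and d[500]", globals=globals()))
```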


agumonkey

Less complex libraries do Python AST analysis wrapped in decorators, to ensure purity of code for instance.

It's a fun foot-in-the-door trick for getting into compilation.
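A hedged sketch of that pattern: a decorator parses the function's source into an AST and rejects one simple marker of impurity. Real libraries check far more; this only shows the shape of the trick.

```python
import ast
import inspect
import textwrap

def pure(fn):
    # Parse the decorated function's source; reject global/nonlocal
    # writes as one (very partial) purity check.
    tree = ast.parse(textwrap.dedent(inspect.getsource(fn)))
    for node in ast.walk(tree):
        if isinstance(node, (ast.Global, ast.Nonlocal)):
            raise ValueError(f"{fn.__name__} touches outer state; not pure")
    return fn

@pure
def add(a, b):
    return a + b
```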

bjourne

Great, but AFAICT it's not a JIT. It uses LLVM to AOT-compile Python code: decorators are called when their respective functions are defined, not when they are called.

dec0dedab0de

A decorator runs when the function is defined, and its output replaces the decorated function. That replacement function can then JIT at runtime. I don't know if that's what's happening here, but using a decorator doesn't mean it can't also be a JIT.
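A minimal sketch of that point (`compile_to_native` is a hypothetical backend): the decorator itself runs once, when `def` executes, but the wrapper it returns can defer compilation to the first call - which is what makes it just-in-time.

```python
def jit(fn):
    compiled = None

    def wrapper(*args):
        nonlocal compiled
        if compiled is None:
            # First call: compile now, with whatever backend you have.
            compiled = compile_to_native(fn)  # hypothetical backend
        return compiled(*args)

    return wrapper
```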

masklinn

FWIW numba at least supports both cases, depending on how the decorator is used (sketch below):

- if you just `@jit`, it creates a megamorphic function, with specialisations generated at runtime

- if you pass a signature to `@jit`, it compiles a monomorphic function when the module loads
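A sketch of those two modes, following numba's documented `@jit` usage:

```python
from numba import jit, int32

@jit(nopython=True)  # lazy: a specialization is generated per
def add_any(a, b):   # argument-type combination, at call time
    return a + b

@jit(int32(int32, int32), nopython=True)  # eager: one int32 signature,
def add_i32(a, b):                        # compiled when the decorator runs
    return a + b
```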

eliben

JIT is something different people define in different ways.

In this sample, when the function itself is called (not when it's decorated), analysis runs, followed by LLVM codegen and execution. The examples in the blog post are minimal, but they can be trivially extended to cache the JIT step, specialize on runtime argument types or values, etc.

If this isn't a JIT, then what do you consider to be one?
