
Superfunctions: A universal solution against sync/async fragmentation in Python

PaulHoule

It strikes me as a worst-of-all-worlds solution, where it is awkward to write superfunctions (gotta write all those conditionals) and awkward to use them (gotta call fn.something()).

For async to be useful you've got to be accessing the network, a pipe, or a sleep, so you have some context. You might as well encapsulate this context in an object, and if you're doing that, the object is going to look like:

  class base_class:
      @maybe_async
      def api_call(self, parameters):
          # ... transform the parameters for an http call ...
          response = maybe_await(self.http_call(**http_parameters))
          # ... transform the response to a result ...
          return result
Almost every time I wished I could have sync and async generated from the same source, it was some kind of wrapper for an HTTP API where everything had the same structure. These wrappers can be a huge amount of code, because the HTTP API has a huge number of calls, but the code all has the same structure.

zelphirkalt

Why is it only "maybe async" and only "maybe await"? Isn't it clear at the time of writing the code which one it is, async or sync?

jcranmer

There is some code that wants to be generic over its executor. This is particularly true of something like parsing code, where you logically have a "get more data" call and you really don't care about the details of the call otherwise. So if the user provided a synchronous "get more data" source, you'd want the parse method to be synchronous; if it were asynchronous, you'd want the parse method to be asynchronous.

That's basically the genesis of the idea of maybe-async. I've cooled tremendously on the idea, personally, because it turns out that a lot of code has rather different designs throughout the entire stack if you're relying on sync I/O versus async I/O, and this isn't all that useful in practice.
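
The generic-executor pattern described here is often implemented "sans-I/O" style. A minimal sketch (my own illustration, not any library's API): the parser is a generator that yields whenever it needs more bytes, and a sync or async driver feeds it, so the same parsing logic serves both worlds.

```python
import asyncio

def parse_lines():
    """Yields to request more data; returns the accumulated lines."""
    buf = b""
    while True:
        chunk = yield  # the driver sends bytes here, or b"" at EOF
        if not chunk:
            return buf.split(b"\n")
        buf += chunk

def drive_sync(read):
    gen = parse_lines()
    next(gen)  # prime the generator up to its first yield
    while True:
        try:
            gen.send(read())
        except StopIteration as done:
            return done.value

async def drive_async(aread):
    gen = parse_lines()
    next(gen)
    while True:
        try:
            gen.send(await aread())
        except StopIteration as done:
            return done.value

# The same parser runs under both drivers.
chunks = [b"a\nb", b"\nc", b""]
sync_result = drive_sync(iter(chunks).__next__)

async def aread(it=iter(chunks)):
    return next(it)

async_result = asyncio.run(drive_async(aread))
print(sync_result, async_result)  # both: [b'a', b'b', b'c']
```

The cost, as noted above, is that real codebases rarely stay this neatly separated from their I/O layer.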

PaulHoule

For an http API (say something like boto3 or a client for arangodb) you might want to use the API from either a sync or async application. Since the code is almost all the same you can code generate a version of the API for both sync and async which is particularly easy if you use

https://www.python-httpx.org/

since you can use basically the same http client for both sides. One way to do it is write code like the sample I showed and use

https://docs.python.org/3/library/ast.html

to scan the tree and either remove the maybe_await() or replace it with an await accordingly. You could either do this transformation when the application boots or have some code that builds both sync and async packages and packs the code up in PyPi. There are lots of ways to do it.
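
A hedged sketch of that rewrite, assuming the hypothetical maybe_await marker from the earlier comment: an ast.NodeTransformer either unwraps the call for the sync build or turns it into a real await for the async build (a full async build would also need to convert each FunctionDef into an AsyncFunctionDef).

```python
import ast

class MaybeAwait(ast.NodeTransformer):
    """Rewrite maybe_await(x) -> x (sync build) or await x (async build)."""
    def __init__(self, make_async):
        self.make_async = make_async

    def visit_Call(self, node):
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == "maybe_await":
            inner = node.args[0]
            return ast.Await(value=inner) if self.make_async else inner
        return node

src = (
    "def api_call(parameters):\n"
    "    response = maybe_await(http_call(parameters))\n"
    "    return response\n"
)
sync_tree = ast.fix_missing_locations(MaybeAwait(make_async=False).visit(ast.parse(src)))
print(ast.unparse(sync_tree))  # the maybe_await wrapper is gone
```

ast.unparse gives back source you can write out as the sync variant of the package, matching the "build both and publish to PyPI" workflow described above.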

zelphirkalt

Why not just decide to query an API async and be done with it, instead of having something be "maybe"? Since this whole topic is about having a good way to have one's cake and eat it too, assuming one has such a way, I don't see any downside to choosing async for API calls.

operator-name

Certainly an interesting approach compared to asgiref or synchronicity but I have doubts about the approach.

Does this not add further function colors: that of a transfunction, a tilde superfunction, and a non-tilde superfunction? Now every time you call one from another you need to use both the context managers and know which variant you are calling.

asgiref provides the simple wrappers sync_to_async() and async_to_sync(). Easy to understand and easy to transition slowly. The caveat is the performance impact if overused.
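
What asgiref's sync_to_async() and async_to_sync() do can be approximated with the stdlib alone. This is a rough sketch of the idea only, not asgiref's actual implementation, which also handles thread-sensitivity and event-loop reuse:

```python
import asyncio
import time

def blocking_fetch(x):
    time.sleep(0.01)  # stand-in for blocking I/O
    return x * 2

async def async_fetch(x):
    await asyncio.sleep(0.01)
    return x * 2

async def from_async_world():
    # sync_to_async direction: run blocking code in a worker thread
    # so the event loop stays free.
    return await asyncio.to_thread(blocking_fetch, 21)

def from_sync_world():
    # async_to_sync direction: drive a coroutine to completion
    # from ordinary synchronous code.
    return asyncio.run(async_fetch(21))

print(asyncio.run(from_async_world()), from_sync_world())  # 42 42
```

The performance caveat comes from the thread hops: every sync_to_async call pays for a thread handoff, which adds up if it sits in a hot path.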

synchronicity uses a different approach - write 100% async code and expose both a sync and async interface. async def foo() becomes def foo() and async def foo.aio().
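
A minimal sketch of that pattern (my own toy decorator, not synchronicity's actual machinery): write the function async once, expose a blocking foo() and an awaitable foo.aio(). The real library runs a dedicated background event loop, so it also works from inside an already-running loop, which naive asyncio.run does not.

```python
import asyncio
import functools

def expose_both(async_fn):
    """Toy decorator: blocking call by default, async via .aio()."""
    @functools.wraps(async_fn)
    def sync_wrapper(*args, **kwargs):
        return asyncio.run(async_fn(*args, **kwargs))
    sync_wrapper.aio = async_fn  # the original coroutine function
    return sync_wrapper

@expose_both
async def foo(x):
    await asyncio.sleep(0)
    return x + 1

print(foo(1))                   # 2: sync callers just call it
print(asyncio.run(foo.aio(2)))  # 3: async callers await foo.aio(...)
```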

https://github.com/django/asgiref

https://github.com/modal-labs/synchronicity

nilslindemann

"async" is a misnomer. What we call "async" is actually "chaotic async" or "time-optimized async" or "switchable async" and what we call "sync" is actually "ordered async" or "unoptimized async", and what we call "parallel" (a good name) is actually "sync".

Because nomen est omen, everything done now will just result in a growing pile of complexity (see also the "class" misnomer for types), until someone looks again and gives the proper name, or operator, to the concept.

I imagine a future where we have a single operator on a line, like a ".", which says: do everything from the last point to here in parallel or async, however you want, in any order you want, but here is the point where everything has to be done ("joined") before proceeding.
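
Python doesn't have that operator, but the closest existing spelling of such a join point is an explicit gather: everything started before it may run in any order, and nothing after it runs until all of it is done. A small sketch:

```python
import asyncio

async def step(n):
    await asyncio.sleep(0)  # stand-in for independent async work
    return n * n

async def main():
    # Everything inside gather() may interleave in any order; the
    # await is the join point where all of it must be finished.
    return await asyncio.gather(step(1), step(2), step(3))

print(asyncio.run(main()))  # [1, 4, 9]
```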

7bit

> "async" is a misnomer. What we call "async" is actually "chaotic async" or "time-optimized async" or "switchable async" and what we call "sync" is actually "ordered async" or "unoptimized async", and what we call "parallel" (a good name) is actually "sync".

That's just semantic nitpicking. Everybody knows what async/sync means. The terms have been established for a very long time.

nilslindemann

Using a logical fallacy like "everybody knows ..." indicates you are not sure about your argument, or are being intentionally dishonest.

pomponchik

Many old Python libraries got mirrored async reflections after asynchrony became popular. As a result, the entire Python ecosystem was duplicated. Superfunctions are the first solution to this problem that lets you partially eliminate the code duplication by giving the client the choice (similar to how it is done in Zig) of whether to use the regular or the asynchronous version of the function.

nine_k

But AFAICT in Zig you don't have to have async and sync versions. Instead, the runtime may choose to interpret `try` as an async or a synchronous call; the latter is equivalent to a future/promise that resolves before the next statement [1]. This is a sane approach.

Having separate sync / async versions that look like the same function is a great way to introduce subtle bugs.

[1]: https://kristoff.it/blog/zig-new-async-io/

RS-232

I don’t see the utility here. You’re still duplicating code inside the template function with context managers.

The decorator would be a lot more useful if it abstracted all that away automagically. I/O bound stuff could be async and everything else would be normal.

rsyring

  ~my_superfunction()
  #> so, it's just usual function!
  
> Yes, the tilde syntax simply means putting the ~ symbol in front of the function name when calling it.

There's a way to work around that but...

> The fact is that this mode uses a special trick with a reference counter, a special mechanism inside the interpreter that cleans up memory. When there is no reference to an object, the interpreter deletes it, and you can link your callback to this process. It is inside such a callback that the contents of your function are actually executed. This imposes some restrictions on you:

> - You cannot use the return values from this function in any way...

> - Exceptions will not work normally inside this function...

Ummm...I'm maybe not the target audience for this library. But...no. Just no.
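
The quoted mechanism can be approximated in a few lines (a toy illustration, not the library's actual code), which also makes the documented restrictions obvious: a finalizer has nowhere to return a value to, and exceptions raised inside it are reported but never propagate to the caller.

```python
class RunOnDrop:
    """Toy version of the trick: run a callback when the object dies."""
    def __init__(self, fn):
        self.fn = fn
    def __del__(self):
        # Whatever fn returns is discarded, and exceptions raised here
        # are written to stderr but never reach the calling code.
        self.fn()

ran = []
RunOnDrop(lambda: ran.append("executed"))  # no reference kept
# In CPython the refcount hits zero immediately, so it has already run:
print(ran)  # ['executed']
```

Note this timing is a CPython refcounting detail; on a tracing GC the callback could fire much later, which is another reason to be wary of the technique.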

operator-name

I think that only applies to tilde_syntax=False, but the way it is written isn't very clear.

OutOfHere

I advise users to just abandon async in Python for long-term success, because the future of Python is free-threaded, and async is inherently single-threaded. Even if you don't need multiple threads now, your CPU will thank you later when you do, saving you a full rewrite. With Python 3.14, free-threading is an established optional feature of Python.

zbentley

The two don't really compete, because async/await is primarily about parallelizing IO.

If I want to (say) probe a dozen URLs for liveness in parallel, or write data to/from thousands of client sockets from my webserver, doing that with threads (especially free-threaded Python threads, which are still quite lock-happy inside the interpreter, GIL or not) has a very quickly noticeable performance and resource cost.
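
A sketch of that fan-out with asyncio, using simulated probes instead of real network calls (with e.g. aiohttp the shape would be the same): a dozen 100 ms probes complete in roughly 100 ms total, not 1.2 s.

```python
import asyncio
import time

async def probe(url):
    await asyncio.sleep(0.1)  # stand-in for one network round trip
    return url, True

async def probe_all(urls):
    # Start every probe at once; gather collects all the results.
    return dict(await asyncio.gather(*(probe(u) for u in urls)))

urls = [f"https://example.com/{i}" for i in range(12)]
start = time.perf_counter()
results = asyncio.run(probe_all(urls))
elapsed = time.perf_counter() - start
print(f"{len(results)} probes in {elapsed:.2f}s")  # ~0.1s, not ~1.2s
```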

Async/await's primary utility is a capable interface for making I/O concurrent (and parallel as well, in many cases), regardless of whether threads are in use.

Hell, even golang multiplexes concurrent goroutines onto concurrent IO schedulers behind the scenes, as do Java's NIO, Erlang/BEAM, and many similar systems.

PaulHoule

People who talk about there being a hard line between parallelism and concurrency are always writing code with race conditions that they deny exist or writing code with performance bottlenecks they can't understand because they deny they exist.

I like working in Java because you can use the same threading primitives for both and have systems that work well in both IO-dominated and CPU-dominated regimes which sometimes happen in the same application under different conditions.

Personally, there are enough details to work out that we might be up to Python 3.24 before you can really count on all your dependencies being thread-safe. One of the reasons Java has been successful is its extreme xenophobia (not to mention the painful JNI), which meant we re-implemented stuff to be thread-safe in pure Java, as opposed to sucking in a lot of C/C++ code that will never be thread-safe.

zelphirkalt

Using multiple OS-level threads is not very efficient when the only problem is waiting for some IO and wanting to do something else in the meantime. A more lightweight concurrency primitive is needed. I am not saying that async is necessarily it.

amelius

Async is nice in theory, but then some manager asks to do not only IO but also some computation, and there goes the entire plan. Better to use threads from the start because they can be used to manage both types of resource (IO and CPU) in a uniform way.

OutOfHere

That is precisely what happens. People then jump to absurdities like microservices, which severely complicate the architecture.

OutOfHere

Go will happily use multiple cores if the load calls for it, multiple cores are available, and GOMAXPROCS is not restricted to 1.

With Python's asyncio, yes, you can read from many client sockets on one core, but you can't really do much work with them on that core, or even the IO will slow down. It is not future-proof at all.
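
That failure mode is easy to demonstrate: one coroutine doing blocking or CPU-bound work stalls every other task on the same loop. A small sketch with a deliberately blocking call standing in for the CPU work:

```python
import asyncio
import time

async def ticker(ticks, start):
    # Wants to record a tick every 10 ms.
    for _ in range(3):
        await asyncio.sleep(0.01)
        ticks.append(time.perf_counter() - start)

async def hog():
    time.sleep(0.3)  # blocking work: the loop cannot switch away from it

async def main():
    ticks = []
    start = time.perf_counter()
    await asyncio.gather(ticker(ticks, start), hog())
    return ticks

ticks = asyncio.run(main())
print(ticks)  # the first tick lands after ~0.3s, not ~0.01s
```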

zbentley

Not for I/O. If many goroutines all request a socket read, the golang runtime shoves them all onto an IO multiplexing event loop very similar to Python's asyncio stdlib event loops: https://www.sobyte.net/post/2022-01/go-netpoller/

And sure, Golang's much better than a cooperative concurrency system at giving work out to multiple cores once the IO finishes, no argument there.

But again... async/await in Python (and JavaScript, Java NIO, and many more) is not about using multiple cores for computations; it's about efficiently making IO concurrent (and, if possible, parallel) for unpredictably "IO-wide" workloads, of which there are many.

anon291

So much work to implement the monad type class

adamwk

I don’t think this implements anything monad shaped

codebje

Async is monad shaped. Not-async is monad-shaped, for a degenerate monad. Writing a function that works in both async and not-async contexts just means writing a function that works for any monad.

zbentley

Not really; the problem is that languages with IO monads often provide a runtime that can schedule IO-ful things concurrently (or, in Haskell's case, lazily) based on the type. Python has no such scheduler; users have to run their own in the form of an async-capable event loop or a sequential (threadpool/processpool) executor for blocking code.

Because of that missing runtime for scheduling and evaluating IO-ful things, tools like superfunctions are necessary.

In other words: IO monads are only as useful as the thing that evaluates them; Python doesn't have a built-in way to do that, so people have to make code that looks "upward" to determine what kind of IO behavior (blocking/nonblocking/concurrent/lazy/etc.) is needed.

codebje

If you want a function to be usable in both an async and a non-async environment, the monads in question are ones for async and identity, not IO. The choice between a true concurrent runtime and a single threaded cooperative coroutine runtime is up to you in GHC Haskell.

Monad-agnostic functions are exactly looking upwards to allow the calling context to determine behaviour.

operator-name

You frame that as if Python doesn't have a choice, but it chose to have explicit syntax.

There’s no reason a python-like language couldn’t have deeper semantics for async, and language level implementation.

zbentley

Well, there aren't reasons why Python couldn't have that, but there certainly are good reasons why the Python maintainers decided that type of runtime was not appropriate for the language. Not least among them: maintaining a scheduling/pre-empting runtime (the only approach I'm aware of that works here without basically turning Python into an entirely unrelated language) is very labor-intensive. There's a reason there are no usable alternative implementations of ERTS/BEAM, a reason why gccgo is maintained in fits and starts, and a reason why the Python GIL has been around for so long. Getting that type of system right is very, very hard.