
Show HN: Pyper – Concurrent Python Made Simple


10 comments · January 12, 2025

Hello and happy new year!

We're excited to introduce the Pyper package for concurrency & parallelism in Python. Pyper is a flexible framework for concurrent / parallel data processing, following the functional paradigm.

Source code can be found on [GitHub](https://github.com/pyper-dev/pyper)

Key features:

Intuitive API: Easy to learn, easy to think about. Implements clean abstractions to seamlessly unify threaded, multiprocessed, and asynchronous work.

Functional Paradigm: Python functions are the building blocks of data pipelines. Lets you write clean, reusable code naturally.

Safety: Hides the heavy lifting of underlying task execution and resource clean-up. No more worrying about race conditions, memory leaks, or thread-level error handling.

Efficiency: Designed from the ground up for lazy execution, using queues, workers, and generators.

Pure Python: Lightweight, with zero sub-dependencies.
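As a rough illustration of the lazy queue/worker/generator model the feature list describes, here is a stdlib-only sketch (the `stage` helper and the `double`/`halve` stages are hypothetical illustrations of the general pattern, not Pyper's actual internals):

```python
import queue
import threading

def stage(fn, source):
    """Apply fn to items from source in a worker thread; yield results lazily."""
    q = queue.Queue(maxsize=4)   # bounded queue provides backpressure
    _end = object()              # sentinel marking the end of the stream

    def worker():
        for item in source:
            q.put(fn(item))
        q.put(_end)

    threading.Thread(target=worker, daemon=True).start()
    while (item := q.get()) is not _end:
        yield item

# Hypothetical stages composed into a two-step pipeline
double = lambda x: x * 2
halve = lambda x: x / 2
print(list(stage(halve, stage(double, range(5)))))  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Because each stage pulls from the previous one through a bounded queue, nothing is computed until the final consumer asks for it.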

We'd love to hear any feedback on this project!

solidasparagus

Nice work! There is a gap when it comes to writing single-machine, concurrent CPU-bound python code. Ray is too big, pykka is threads only, builtins are poorly abstracted. The syntax is also very nice!

But I'm not sure I can use this even though I have a specific use-case that feels like it would work well (high-performance pure Python downloading from cloud object storage). The examples are a bit too simple and I don't understand how I can do more complicated things.

I chunk up my work, run it in parallel and then I need to do a fan-in step to reduce my chunks - how do you do that in Pyper?
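For reference, the fan-out/fan-in shape being asked about looks like this with only the stdlib (`process_chunk` is a hypothetical stand-in for real per-chunk work; how Pyper itself expresses a reduce step is a question for its docs):

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def process_chunk(chunk):
    """Hypothetical per-chunk work; a real version might download each chunk."""
    return sum(chunk)

def fan_out_fan_in(data, n_chunks=4):
    chunks = [data[i::n_chunks] for i in range(n_chunks)]    # split the work
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(process_chunk, chunks))     # fan-out in parallel
    return reduce(lambda acc, x: acc + x, partials)          # fan-in: reduce

print(fan_out_fan_in(list(range(100))))  # 4950
```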

Can the processes have state? Pure functions are nice, but if I'm reaching for multiprocess, I need performance and if I need performance, I'll often want a cache of some sort (I don't want to pickle and re-instantiate a cloud client every time I download some bytes for instance).
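The per-process cache this question asks about is conventionally handled with a pool initializer, which runs once per worker process so expensive objects are built once rather than pickled per task (a stdlib sketch with a hypothetical `download`; whether Pyper exposes an equivalent hook is not answered here):

```python
import multiprocessing as mp

_client = None  # per-process global, populated once per worker

def init_worker():
    """Runs once in each worker process: build the expensive object here.
    (A real version would construct a cloud-storage client.)"""
    global _client
    _client = {"connected": True}  # hypothetical stand-in for a client

def download(key):
    # _client is reused for every task this process handles; it is never
    # pickled and re-instantiated per item
    assert _client is not None
    return f"{key}:ok"

if __name__ == "__main__":
    with mp.Pool(processes=2, initializer=init_worker) as pool:
        print(pool.map(download, ["a", "b", "c"]))  # ['a:ok', 'b:ok', 'c:ok']
```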

How do exceptions work? Observability? Logs/prints?

Then there's stuff that is probably asking too much of this project, but that I get if I write my own Python pipeline, so it matters to me: rate limiting, WIP limits, cancellation, progress bars.

But if some of these problems are/were solved and it offers an easy way to use multiprocessing in python, I would probably use it!

globular-toast

Do you really need to reinvent the wheel every time for parallel workloads? Just learn GNU parallel and write single-threaded code.

Concurrency in general isn't about parallelism. It's just about doing multiple things at the same time.
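That distinction can be made concrete with a single-threaded asyncio example: two waits overlap in time (concurrent) while only one thread ever runs (not parallel). This is an illustrative sketch, not anything specific to Pyper:

```python
import asyncio
import time

async def wait_for(name, delay):
    # Simulates I/O: the event loop is free to run other tasks while we wait
    await asyncio.sleep(delay)
    return name

async def main():
    start = time.perf_counter()
    # Both waits overlap on a single thread: concurrent, but not parallel
    results = await asyncio.gather(wait_for("a", 0.2), wait_for("b", 0.2))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results)  # ['a', 'b'], in roughly 0.2s rather than 0.4s
```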

halfcat

> I don't want to pickle and re-instantiate a cloud client every time I download some bytes for instance

Have you tried multiprocessing.shared_memory to address this?
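For context, `multiprocessing.shared_memory` shares a raw byte buffer between processes by name; it does not transparently share Python objects like a client, which is why it may not fit this use case directly:

```python
from multiprocessing import shared_memory

# Create a named 16-byte block; any process that knows the name can attach
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"

# A second process would run: shared_memory.SharedMemory(name=shm.name)
attached = shared_memory.SharedMemory(name=shm.name)
print(bytes(attached.buf[:5]))  # b'hello'

attached.close()
shm.close()
shm.unlink()  # release the block once every handle is closed
```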

solidasparagus

I haven't played with that much! It isn't really a problem for my usual approach to this sort of code: when I use multiprocessing, I use a Process subclass, or a worker function with a setup step followed by a while loop that pulls from a work/control queue. But in the Pyper functional-programming world, it would be a concern.

IIRC multiprocessing.shared_memory is a much lower-level abstraction than most Python stuff, so I think I'd need to figure out how to make the client use the shared memory, and I'm not sure I could.
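The setup-then-loop worker pattern described above can be sketched like this with the stdlib (the `client` dict is a hypothetical stand-in for the expensive object built once per process):

```python
import multiprocessing as mp

def worker(work_q, result_q):
    """Setup-then-loop pattern: expensive setup once, then pull from a queue."""
    client = {"connected": True}   # hypothetical expensive setup, done once
    while True:
        item = work_q.get()
        if item is None:           # None doubles as the shutdown signal
            break
        result_q.put(item * 2)     # stand-in for real work using `client`

if __name__ == "__main__":
    work_q, result_q = mp.Queue(), mp.Queue()
    p = mp.Process(target=worker, args=(work_q, result_q))
    p.start()
    for i in [1, 2, 3]:
        work_q.put(i)
    work_q.put(None)               # tell the worker to stop
    print(sorted(result_q.get() for _ in range(3)))  # [2, 4, 6]
    p.join()
```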

rtpg

You really should dive more into the `multiprocess` support option and highlight how this gets around issues with the GIL. This feels like a major value add, and "does this help with CPU-bound work" being "yes" is a big deal!

I don't really need pipelining that much, but pipelining along with a certain level of durability and easy multiprocessing support? Now we're talking

t43562

...although Python 3.13 can be built without the GIL, and it really does make threading useful. I did some comparisons with and without.

I suppose one excellent thing about this would be if you could just change one parameter and switch from multiprocessing to threading.
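With the stdlib, that one-parameter switch is exactly what `concurrent.futures` offers, since thread and process pools share an interface (a generic sketch; whether Pyper exposes the same switch is for its docs to say):

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def square(x):
    return x * x

def run_all(fn, items, executor_cls=ThreadPoolExecutor, workers=4):
    """Swap threads for processes by changing executor_cls alone."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(fn, items))

if __name__ == "__main__":
    print(run_all(square, range(5)))                       # [0, 1, 4, 9, 16]
    print(run_all(square, range(5), ProcessPoolExecutor))  # same result, separate processes
```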

minig33

This is cool - I’ve been looking for something like this. I really liked the syntax of Prefect v1, but it got overcomplicated with execution configuration in subsequent versions. I just want something to help me run async pipelines and prevent AsyncIO weirdness - going to test this out.

grandma_tea

Nice! I'm looking forward to trying it out. This seems very similar to https://github.com/cgarciae/pypeln/

kissgyorgy

Very simple and elegant API!