
The future of Python web services looks GIL-free

rogerbinns

C code needs to be updated to be safe in a GIL-free execution environment. It is a lot of work! The pervasive problem is that mutable data structures (lists, dicts, etc.) could change at any arbitrary point while the C code is working with them, and the reference count of other objects could drop to zero while *anyone* is using a borrowed reference (common for performance in CPython APIs). Previously the GIL constrained where those changes could happen. In simple cases the fix is adding a critical section, but often there are multiple data structures in play. As an example, these are the changes that had to be made to the standard library json module:

https://github.com/python/cpython/pull/119438/files#diff-efe...
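The same class of hazard can be demonstrated in pure Python, where the interpreter at least detects the mutation and raises instead of crashing. A minimal sketch (the C-level failure mode is silent corruption or a crash, not a tidy exception):

```python
# A dict mutated while being iterated: CPython detects this and raises
# RuntimeError. C extension code holding internal pointers into the same
# structure gets no such safety net, which is why audits like the json.c
# one are needed.
d = {"a": 1, "b": 2}

try:
    for key in d:
        d["c"] = 3  # mutation during iteration
except RuntimeError as e:
    print("caught:", e)
```

Under the GIL, such mutations could only happen at bytecode boundaries; in free-threaded builds another thread can mutate the structure at any point, which is what the critical sections in the json.c diff guard against.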

This is how much of the standard library has been audited:

https://github.com/python/cpython/issues/116738

The json changes above are in Python 3.15, not the just released 3.14.

The consequences of not making the C changes are crashes and corruption when unexpected mutation or object freeing happens. Web services are exposed to adversarial input, so be *very* careful.

It would be a big help if CPython released a tool that could at least scan a C code base to detect free threaded issues, and ideally verify it is correct.

dehrmann

I think Java got this mostly right. On the threading front, very little is thread-safe or atomic (x += 1 is not thread-safe), so as soon as you expose something to threads, you have to think about safe access. For interacting with C code, your choices are either shared buffers or copying data between C and Java. It's painful, but it's needed for memory safety.
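The non-atomicity of `x += 1` holds in Python as well: it is a read-modify-write that can interleave between threads. A minimal sketch of the safe-access discipline described above, using an explicit lock (names are illustrative):

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        # Without the lock, the read-modify-write of `counter += 1`
        # can interleave between threads and lose increments.
        with lock:
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000 with the lock; without it, possibly fewer
```

The lock makes the result deterministic regardless of whether the interpreter has a GIL.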

westurner

> It would be a big help if CPython released a tool that could at least scan a C code base to detect free threaded issues, and ideally verify it is correct.

Create or extend a list of answers to:

What heuristics predict that code will fail in CPython's nogil "free threaded" mode?

rogerbinns

Some of that is already around, but scattered across multiple locations. For example there is a list in the Python doc:

https://docs.python.org/3/howto/free-threading-extensions.ht...

And a dedicated web site:

https://py-free-threading.github.io/

But as an example, neither includes PySequence_Fast, which appears in the json.c changes I pointed to. The folks auditing the stdlib have an idea of what they are looking for, and so would be best suited to keep a list (and tool) up to date with what is needed.

Zsfe510asG

Accessing a global object, as the simplest benchmark that actually exercises the locks, still shows a massive slowdown that is not offset by the moderate general speedups since 3.9:

  x = 0

  def f():
      global x
      for i in range(100000000):
          x += i

  f()
  print(x)

Results:

  3.9:        7.1s
  3.11:       5.9s
  3.14:       6.5s
  3.14-nogil: 8.4s

That is a nogil slowdown of 18% compared to 3.9, 42% compared to 3.11 and 29% compared to 3.14. These numbers are in line with previous attempts at GIL removal that were rejected because they didn't come from Facebook.

Please do not complain about the global object. Using a pure function would obviously be a useless benchmark for locking and real world Python code bases have far more intricate access patterns.
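For reference, the pure-function variant being set aside here would keep the accumulator local, avoiding shared-object locking entirely. A sketch for contrast, not a claim about real workloads:

```python
def f(n=100_000_000):
    total = 0           # local: no cross-thread visibility, no lock traffic
    for i in range(n):
        total += i
    return total

# f() computes the same result as the global version
# without ever touching shared state.
```

The benchmark's point is precisely that real code bases mix both patterns, and the shared-state path is where the fine-grained locking costs show up.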

wenc

Globals are not commonly used except in retrofit cases of legacy code or in beginner code.

I haven’t seen or used a global more than once in my 20 years of writing Python.

nilamo

Flask's request context is a global object. And since this is specifically about web services, flask seems highly relevant to the conversation.

treyd

That's correct. Flask has a global request context object, so by design it can only safely handle a single request at a time per Python interpreter. If you want to parallelize multiple Flask servers, you spin up multiple interpreters.

Web services in Python that want to handle multiple concurrent requests in the same interpreter should use a web framework that is designed around that expectation and doesn't use a global request context object, such as FastAPI.
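Frameworks that do handle concurrent requests in one interpreter typically scope "current request" state per task or thread rather than via a plain module global. A minimal stdlib sketch of that pattern using `contextvars` (the `Request` class and handler names are illustrative, not any framework's actual API):

```python
import contextvars
from dataclasses import dataclass

@dataclass
class Request:
    path: str

# Each context (thread or async task) sees its own value,
# unlike a plain module-level global.
current_request = contextvars.ContextVar("current_request")

def handle(request):
    current_request.set(request)
    return view()

def view():
    # "Global-looking" access that is actually isolated per context.
    return f"handling {current_request.get().path}"

ctx_a = contextvars.copy_context()
ctx_b = contextvars.copy_context()
print(ctx_a.run(handle, Request("/a")))  # handling /a
print(ctx_b.run(handle, Request("/b")))  # handling /b
```

Two contexts handling different requests never see each other's `current_request`, which is what makes concurrent handling in one interpreter safe.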

NeutralForest

As mentioned in the article, others might have different constraints that make the GIL worth it for them; since both versions of Python are available anyways, it's a win in my book.

greatgib

Sad that you did not test with 3.12, which is a lot better than 3.11!

adamzwasserman

I will complain about the global object.

Even though technically, everything in Python is an object, I feel strongly that programmers should avoid OOP in Python like the plague. Every object is a petri dish for state corruption.

There is a very solid list of reasons to use pure functions with explicit passing wherever humanly possible, and I personally believe there is no comparable list of reasons to use OOP:

* Stack-allocated primitives need no refcounting
* Immutable structures reduce synchronization
* Data locality improves when you pass arrays/structs rather than object graphs
* Pure functions can be parallelized without locks

logicchains

>Please do not complain about the global object. Using a pure function would obviously be a useless benchmark for locking and real world Python code bases have far more intricate access patterns.

Just because there's a lot of shit Python code out there, doesn't mean people who want to write clean, performant Python code should suffer for it.


immibis

> Using a pure function would obviously be a useless benchmark for locking

But isn't that the point? Previously, pure functions would all lock, so you could only run one pure function at a time. Now they don't.
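That is indeed the headline win: CPU-bound pure functions that previously serialized on the GIL can run in parallel threads. A minimal sketch (the speedup only materializes on a free-threaded build; on a GIL build the code is still correct, just serialized):

```python
from concurrent.futures import ThreadPoolExecutor

def pure_sum(n):
    # No shared state touched: safe to run in many threads at once.
    return sum(range(n))

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(pure_sum, [10, 100, 1000]))

print(results)  # [45, 4950, 499500]
```

No locks are needed because each call works entirely on its own local data.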

byroot

I haven’t seriously used Python in over 15 years, but I assume the comparison is against using a preforking server with 1+ process per core.

The question is whether 1+ thread per core with GIL free Python perform as well as 1+ process per core with GIL.

My understanding is that this global is just a way to demonstrate that the fine-grained locking in the GIL-free version may mean that preforking servers are still more performant.

mynewaccount00

you realize this is not a concern because nobody in the last 20 years uses global?

kg

An incrementing global counter is a pretty common scenario if your goal is to have guaranteed-unique IDs assigned to objects within a process, especially if you want them to be sequential too. I've got counters like that in various parts of code I've shipped, typically incremented using atomics.
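A sketch of that counter pattern in Python, with an explicit lock so the IDs stay unique and sequential with or without the GIL (the class name is illustrative; CPython has no atomic-integer primitive in the stdlib, so a lock stands in for the atomics mentioned above):

```python
import itertools
import threading

class IdAllocator:
    """Process-wide sequential ID allocator (illustrative name)."""
    def __init__(self):
        self._counter = itertools.count(1)
        self._lock = threading.Lock()  # explicit lock: safe with or without the GIL

    def next_id(self):
        with self._lock:
            return next(self._counter)

alloc = IdAllocator()
print(alloc.next_id())  # 1
print(alloc.next_id())  # 2
```

On GIL builds `next()` on a shared iterator happens to be atomic; relying on that in free-threaded builds is exactly the kind of assumption the audits above are checking, hence the explicit lock.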

sillythrowaway7

It is a pretty common scenario if you don’t follow almost two decades worth of best practices and suggested alternatives and advice

pphysch

Is that global counter really the only dependency? No database connections or anything that would warrant some proper dependency injection?

Waterluvian

We already have PyPy and PyPI, so I think we are cosmically required to call Python 3.14 PyPi

jyscao

PiPy is more apt, no?

coldtea

PeePee hee hee

mythrwy

You are getting kicked out of the donglegate conference now.

btbuilder

This is fantastic progress for CPython. I had almost given up hope that CPython would overcome the GIL after first hitting its limitations over 10 years ago.

That being said, I strongly believe that, because of the sharp edges of async-style code compared to proper coroutine-based user threads like goroutines and Java virtual threads, Python is still far behind optimal parallelism patterns.


rowanG077

Aren't go-routines the worst of all worlds? Sharp edges, undefined behavior galore? At least that was my takeaway when I last used them about 5 or 6 years ago. Did they fix go-routines in the meantime?

NeutralForest

Nice that someone takes the time to crunch the numbers, thanks! I know there's some community effort in how to use free-threaded Python: https://py-free-threading.github.io/

I've found debugging Python quite easy in general, I hope the experience will be great in free-threaded mode as well.

shdh

Hadn’t heard of Granian before, thinking about upgrading to 3.14 for my services and running them threaded now

natdempk

Really great, just waiting on library support / builds for free threading.

Have people had any/good experiences running Granian in prod?

alex_hirner

It was and is a life saver. Our Django app suffered from runaway memory leaks (quite a story). We were not able to track down the root cause exactly. There are numerous similar issues with uvicorn and other webservers. Granian contained these problems. Multi-process management is also reliable.

callamdelaney

Python gets more bloated weekly in my view.

Spivak

> On asynchronous protocols like ASGI, despite the fact the concurrency model doesn't change that much – we shift from one event loop per process, to one event loop per thread – just the fact we no longer need to scale memory allocations just to use more CPU is a massive improvement.

It's nice that someone else recognizes that event loop per thread is the way. I swear if you said this online any time in the past few years people looked at you like you insulted their mother. It's so much easier to manage even before the performance improvements.
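A minimal sketch of the loop-per-thread layout being described: each worker thread owns its own asyncio event loop, so no loop object is ever shared across threads (thread count and the stand-in coroutine are illustrative):

```python
import asyncio
import threading

async def serve(name):
    # Stand-in for an ASGI app's per-loop work.
    await asyncio.sleep(0)
    return f"{name} served on {threading.current_thread().name}"

results = []
results_lock = threading.Lock()

def worker(name):
    # asyncio.run creates and tears down a loop owned by this thread only.
    result = asyncio.run(serve(name))
    with results_lock:
        results.append(result)

threads = [threading.Thread(target=worker, args=(f"app-{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))
```

Because each loop lives and dies inside one thread, there is no cross-loop synchronization to reason about; only genuinely shared application state (like `results` here) needs a lock.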

mkoubaa

I thought WSGI already used subinterpreters that each have their own GIL


ZiiS

This needs a lot of RAM given the speed and core counts of modern CPUs. A GIL-free multi-threaded ASGI server scales much further.

theandrewbailey

Subinterpreters are part of the Python standard library as of 3.13 (I think?).