Python performance myths and fairy tales
68 comments
August 6, 2025
game_the0ry
I know I am going to get some hate for this from the "Python-stans", but "Python" and "performance" should never be associated with each other, and the same goes for any scripting/interpreted language, especially one with a global interpreter lock.
While performance (however you define it) is always a worthy goal, you may need to question your choice of language if you start hitting performance ceilings.
As the saying goes, "use the right tool for the job." Use case should dictate tech choices, with few exceptions.
Ok, now that I have said my piece, you can downvote me :)
ahoka
Have you read the fine article?
danielrico
That's used by some people as an excuse to write the most inefficient code.
Ok, you are not competing with C++, but you also shouldn't be redoing all the calculations because you haven't figured out the data access pattern.
btown
I think an important bit of context here is that computers are very, very good at speculative happy-path execution.
The examples in the article seem gloomy: how could a JIT possibly do all the checks to make sure the arguments aren’t funky before adding them together, in a way that’s meaningfully better than just running the interpreter? But in practice, a JIT can create code that does these checks, and modern processors will branch-predict the happy path and effectively run it in parallel with the checks.
JavaScript, too, has complex prototype chains and common use of boxed objects - but v8 has made common use cases extremely fast. I’m excited for the future of Python.
jerf
That makes it so that in absolute terms, Python is not as slow as you might naively expect.
But we don't measure programming language performance in absolute terms. We measure it in relative terms, generally against C. And while your Python interpreter is speculating about how this Python object will be unboxed, where its methods are, how to unbox its parameters, what methods will be called on those, etc., compiled code is speculating on actual code the programmer has written, running that in parallel, such that by the time the Python interpreter has successfully speculated its way through resolving one method call on actual objects, the compiled language is already done with ~50 lines of code.
There's no way around it. You can spend your amazingly capable speculative parallel CPU on churning through Python interpretation or you can spend it on doing real work, but you can't do both.
After all, the interpreter is just C code too. It's not like it gets access to special speculation opcodes that no other program does.
fpoling
Although JS supports prototype mutations, the with statement, and other constructs that make optimization harder, typical JS code does not use them. Thus the JIT can add a few checks for the presence of problematic constructs to direct execution to a slow path, while optimizing a not-particularly-large set of common patterns. And the JS JIT does not need to care much about calling arbitrary native code, since the browser internals can be adjusted/refactored to suit the JIT's needs.
With Python that does not work. There are simply more optimization-unfriendly constructs, and popular libraries use them. And Python calls arbitrary C libraries through a fixed ABI.
So optimizing Python is inherently more difficult.
DanielHB
The main problem is when the optimizations silently fail because of seemingly innocent changes and suddenly your performance tanks 10x. This is a problem with any language really (CPU cache misses are a thing after all, and many non-dynamic languages have boxed objects), but it is much, much worse in dynamic languages like Python, JS and Ruby.
Most of the time it doesn't matter: most high-throughput Python code just invokes C/C++, where these concerns are not as big of a problem. Most JS code just invokes C/C++ browser DOM objects. As long as the hot path is not in those languages, you are not at such high risk of an innocent change tanking performance.
Even server-side, most JS/Python/Ruby code is just simple HTTP stack handlers invoking databases and shuffling data around. And often a large part of the process of handling a request (encoding JSON/XML/etc., parsing HTTP messages, etc.) can be written in lower-level languages.
nxobject
To be slightly flip, we could say that the Lisp Machine CISC-supports-language full-stack design philosophy lives on in how massive M-series reorder buffers and ILP support JavaScriptCore.
nu11ptr
The primary focus here is good and something I hadn't considered: Python's memory being so dynamic leads to poor cache locality. Makes sense. I will leave that to others to dig into.
That aside, I was expecting some level of a pedantic argument, and wasn't disappointed by this one:
"A compiler for C/C++/Rust could turn that kind of expression into three operations: load the value of x, multiply it by two, and then store the result. In Python, however, there is a long list of operations that have to be performed, starting with finding the type of p, calling its __getattribute__() method, through unboxing p.x and 2, to finally boxing the result, which requires memory allocation. None of that is dependent on whether Python is interpreted or not, those steps are required based on the language semantics."
The problem with this argument is the user isn't trying to do these things, they are trying to do multiplication, so the fact that the language has to do all these things in the end DOES mean it is slow. Why? Because if these things weren't done, the end result could still be achieved. They are pure overhead, for no value in this situation. In other words, if Python had a sufficiently intelligent compiler/JIT, these things could be optimized away (in this use case, but certainly not all). The argument is akin to: "Python isn't slow, it is just doing a lot of work". That might be true, but you can't leave it there. You have to ask if this work has value, and in this case, it does not.
By the same argument, someone could say that any interpreted language that is highly optimized is "fast" because the interpreter itself is optimized. But again, this is the wrong way to think about this. You always have to start by asking "What is the user trying to do? And (in comparison to what is considered a fast language) is it fast to compute?". If the answer is "no", then the language isn't fast, even if it meets the expected objectives. Playing games with things like this is why users get confused on "fast" vs "slow" languages. Slow isn't inherently "bad", but call a spade a spade. In this case, I would say the proper way to talk about this is to say: "It has a fast interpreter". The last word tells any developer with sufficient experience what they need to know (since they understand statically compiled/JIT and interpreted languages are in different speed classes and shouldn't be directly compared for execution speed).
ActivePattern
A “sufficiently smart compiler” can’t legally skip Python’s semantics.
In Python, p.x * 2 means dynamic lookup, possible descriptors, big-int overflow checks, etc. A compiler can drop that only if it proves they don’t matter or speculates and adds guards—which is still overhead. That’s why Python is slower on scalar hot loops: not because it’s interpreted, but because its dynamic contract must be honored.
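You can see that contract in the bytecode; even this tiny function keeps every step dynamic (a quick sketch; exact opcodes vary by CPython version):

    import dis

    def f(p):
        return p.x * 2

    dis.dis(f)
    # Roughly:
    #   LOAD_FAST   p
    #   LOAD_ATTR   x    <- dynamic lookup, may go through __getattribute__/descriptors
    #   LOAD_CONST  2
    #   BINARY_OP   *    <- dispatches to __mul__/__rmul__, boxes the result
    #   RETURN_VALUE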
andylei
The previous paragraph is
> Another "myth" is that Python is slow because it is interpreted; again, there is some truth to that, but interpretation is only a small part of what makes Python slow.
He concedes it's slow, he's just saying it's not related to how interpreted it is.
nu11ptr
I would argue this isn't true. It is a big part of what makes it slow. The fastest interpreted languages are one to two orders of magnitude slower than for example C/C++/Rust. If your language does math 20-100 times slower than C, it isn't fast from a user perspective. Full stop. It might, however, have a "fast interpreter". Remember, the user doesn't care if it is a fast for an interpreted language, they are just trying to obtain their objective (aka do math as fast as possible). They can get cache locality perfect, and Python would still be very slow (from a math/computation perspective).
dgan
"Rewrite the hot path in C/C++" is also a landmine because how inefficient the boundary crossing is. so you really need "dispatch as much as possible at once" instead of continuously calling the native code
aragilar
Isn't this just a specific example of the general rule of pulling out repeated use of the same operation in a loop? I'm not sure calls out to C are specifically slow in CPython (given many operations are really just calling C underneath).
Twirrim
The serialisation cost of translating data representations between Python and C (or whatever compiled language you're using) is notable. Instead of having the compiled code sit in the centre of a hot loop, it's significantly better to have the loop in the compiled code and call it once.
https://pythonspeed.com/articles/python-extension-performanc...
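A rough illustration with NumPy (assuming numpy is available):

    import numpy as np

    data = np.random.rand(1_000_000)

    # Crossing the Python/C boundary once per element: every iteration pulls a
    # scalar out of the array, boxes it, and goes back through the interpreter.
    total = 0.0
    for x in data:
        total += x

    # Crossing the boundary once for the whole loop: the summation runs
    # entirely inside NumPy's compiled code.
    total = data.sum()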
KeplerBoy
The key is to move the entire loop to a compiled language instead of just the inner operation.
dgan
They are specifically slow. There was a project that measured FFI cost across different languages, and Python's was awfully bad.
mrkeen
I didn't read with 100% focus, but this LWN account of the talk seemed to confirm those myths instead of debunking them.
postexitus
A more careful reading of the article is required.
The first myth is "Python is not slow" - it is debunked, it is slow.
The second myth is "it's just a glue language / you just need to rewrite the hot parts in C/C++" - it is debunked, just rewriting stuff in C/Rust does not help.
The third myth is "Python is slow because it is interpreted" - it is debunked, it is not slow merely because it is interpreted.
diegocg
Yep, for me it confirms all the reasons why I think Python is slow and not a good language for anything that goes beyond a script. I work with it every day, and I have learned that I can't even trust tooling such as mypy because it's full of corner cases - it turns out that not having a clear type design in a language is not something that can be fundamentally fixed by external tools. Tests are the only thing that can make me trust code written in this language.
teo_zero
I don't know Python well enough to propose any meaningful contribution, but it seems to me that most issues would be mitigated by a sort of "final" statement or qualifier that prohibits any further changes to the underlying data structure, thus enabling all the nice optimizations, tricks, and shortcuts that compilers and interpreters can't afford when data is allowed to change shape under their feet.
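For what it's worth, the closest existing knob is probably __slots__, which fixes an instance's attribute set up front (a small sketch):

    class Point:
        # Declaring __slots__ gives instances fixed attribute storage instead
        # of a per-object __dict__ that can grow or change shape arbitrarily.
        __slots__ = ("x", "y")

        def __init__(self, x, y):
            self.x = x
            self.y = y

    p = Point(1.0, 2.0)
    try:
        p.z = 3.0  # the shape can no longer change under the interpreter's feet
    except AttributeError as e:
        print(e)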
quantumspandex
So we are paying 99% of the performance just for the 1% of cases where it's nice to code in.
Why do people think it's a good trade-off?
Krssst
Performance is worthless if the code isn't correct. It's easier to write correct code reasonably quickly in Python in simple cases (integers don't overflow like in C, don't wrap around like in C#, no absurd implicit conversions like in other scripting languages).
Also you don't need code to be fast a lot of the time. If you just need some number crunching that is occasionally run by a human, taking a whole second is fine. Pretty good replacement for shell scripting too.
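For example:

    # Integers have arbitrary precision, so they neither overflow nor wrap:
    print(2 ** 100)   # 1267650600228229401496703205376

    # And mixing types raises instead of silently coercing:
    try:
        "1" + 1
    except TypeError as e:
        print(e)      # can only concatenate str (not "int") to str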
nromiun
Because it's nice to code in. Not everything needs to scale or be fast.
Personally I think it is more crazy that you would optimize 99% of the time just to need it for 1% of the time.
BlackFly
It isn't an either-or choice. The people interested in optimizing performance are typically different people than those interested in implementing syntactic sugar. It is certainly true that growing the overall codebase risks introducing tensions for some feature sets, but that is just a consideration you take on when diligently adding to the language.
pjmlp
Because many never used Smalltalk, Common Lisp, Self, Dylan, ... so they think CPython is the only way there is. Plus their computer resources are already wasted by tons of Electron apps anyway, so they hardly question CPython's performance, or lack thereof.
bluGill
Most of the time you are waiting on a human, or at least on something other than the CPU. Most of the time, more time is spent by the programmer writing the code than by all the users combined waiting for the program to run.
Between those two, performance is most often just fine to trade off.
lmm
Because computers are more than 100x faster than they were when I started programming, and they were already fast enough back then? (And meanwhile my coding ability isn't any better, if anything it's worse)
dr-detroit
[dead]
jonathrg
It's much more than 1%, it is what enables commonly used libraries like pytest and Pydantic.
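For example, Pydantic-style validation leans on exactly that dynamism, inspecting annotations at runtime (a toy sketch, nothing like the real implementation):

    from typing import get_type_hints

    class Model:
        # Validate constructor arguments against the subclass's type
        # annotations, discovered at runtime via introspection.
        def __init__(self, **kwargs):
            for name, typ in get_type_hints(type(self)).items():
                value = kwargs[name]
                if not isinstance(value, typ):
                    raise TypeError(f"{name} must be {typ.__name__}")
                setattr(self, name, value)

    class User(Model):
        name: str
        age: int

    u = User(name="Ada", age=36)      # ok
    # User(name="Ada", age="36")      # would raise TypeError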
Hilift
It isn't. There are many tasks Python isn't up to. However, it has been around forever, and in some influential niche verticals like cyber security Python was as useful as, or more useful than, native tooling, and it works on multiple platforms.
dgfitz
I can say with certainty I’ve never paid a penny. Have you?
robmccoll
Python as a language will likely never have a "fast" implementation and still be Python. It is way too dynamic to be predictable from the code alone, or even from an execution stream, in a way that allows you to simplify the actual code that will be executed at runtime, either through AOT compilation or a JIT. The language itself is also quite large in terms of syntax and built-in capability at this point, which makes new feature-complete implementations that don't make major trade-offs quite challenging. Given how capable LLMs are at translating code, it seems like the perfect time to build a language with similar syntax, but better-scoped behavior, stricter rules around typing, and tooling to make porting code and libraries automated and relatively painless. What would existing candidates be, and why won't they work as a replacement?
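A small example of the kind of dynamism that makes this so hard: nothing proves at compile time that a class won't be rewritten while the program runs.

    class Greeter:
        def greet(self):
            return "hello"

    g = Greeter()

    # Any code, anywhere, can swap the method out at runtime, so a compiler
    # cannot safely assume g.greet() resolves to the definition above.
    Greeter.greet = lambda self: "bonjour"
    print(g.greet())  # "bonjour"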
BlackFly
The secret, as stated, is the complexity of a JIT. That dynamism just isn't used much in practice, and in particular not in optimization targets. The JIT analyses the code paths, sees that no writes to the target are possible, and so treats it as a constant.
Java has similar levels of dynamism (with invokedynamic especially, but already with plain dynamic dispatch), yet in practice the JIT monomorphises call sites to a single class, even though classes default to non-final in Java and there may even be multiple implementations known to the JVM when it monomorphises. Such is the strength of the knowledge that a JIT has compared to a local compiler.
pjmlp
Self and Smalltalk enter the room.
As for the language with similar syntax, do you want Nim, Mojo or Scala 3?
acmj
PyPy is 10x faster and is compatible with most CPython code. IMHO it was a big mistake not to adopt a JIT during the 2-to-3 transition.
NeutralForest
Cool article. I think a lot of those issues are not Python-specific, so it's a good overview of what others can learn from a now 30-year-old language! I think we'll probably go down the JS/TS route, where another compiler (PyPy or mypyc or something else) will work alongside CPython, but I don't see Python 4 happening.
tweakimp
I thought we would never see the GIL go away and yet, here we are. Never say never. Maybe Python 4 is Python with another compiler.
pjmlp
It required Facebook and Microsoft to change the point of view on it, and now the Microsoft team is no more.
So let's see what remains of the CPython performance efforts.
nromiun
I really hope PyPy gets more popular so that I don't have to argue Python is pretty fast for the nth time.
Even if you have to stick to CPython, Numba, Pythran, etc. can give you amazing performance for minimal code changes.
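For example, with Numba (assuming it is installed) the change is often just a decorator:

    from numba import njit

    @njit  # JIT-compiles the function to machine code on first call
    def total(n):
        s = 0.0
        for i in range(n):
            s += i * 0.5
        return s

    total(10_000_000)  # first call pays the compile cost; later calls run at native speed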
ic_fly2
It’s a good article on speed.
But honestly the thing that makes any of my programs slow is network calls. And there a nice async setup goes a long way. And then k8s for the scaling.
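For example, fanning the I/O-bound calls out concurrently instead of awaiting them one by one (with a sleep standing in for the real network call):

    import asyncio

    async def fetch(i):
        await asyncio.sleep(0.1)  # stand-in for an HTTP request or DB query
        return i

    async def main():
        # All 100 "calls" overlap, so the batch takes ~0.1s instead of ~10s.
        return await asyncio.gather(*(fetch(i) for i in range(100)))

    asyncio.run(main())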
nicolaslem
This. I maintain an ecommerce platform written in Python. Even with Python being slow, less than 30% of our request time is spent executing code, the rest is talking to stuff over the network.
Ulti
Feel like Mojo is worth a shoutout in this context: https://www.modular.com/mojo It addresses this by being a superset of Python in syntax, where functions declared with "fn" instead of "def" are assumed to be statically typed and compilable, with Numba-style optimisations.
_aavaa_
Mojo NOT being open-source is a complete non-starter.
Ulti
It's more a question of /will/ Mojo eventually be entirely open source; chunks of it already are. The intent from Modular is that eventually it will be, just not everything all at once and not whilst they're internally doing loads of dev for their own commercial entity. Which seems fair enough to me. Importantly, they have open sourced lots of the stdlib, which is probably what anyone external would contribute to or want to change anyway? https://www.modular.com/blog/the-next-big-step-in-mojo-open-...
_aavaa_
When it has become open source I will consider building up expertise and a product on it. Until it has happened there are no guarantees that it will.
alankarmisra
Genuinely curious; while I understand why we would want a language to be open-source (there's plenty of good reasons), do you have anecdotes where the open-sourceness helped you solve a problem?
yupyupyups
Not the OP, but I have needed to patch Qt due to bugs that couldn't be easily worked around.
I have also been frustrated while trying to interoperate with expensive proprietary software because documentation was lacking, and the source code was unavailable.
In one instance, a proprietary software had the source code "exposed", which helped me work around its bugs and use it properly (also poorly documented).
There are of course other advantages to having that transparency, like being able to independently audit the code for vulnerabilities or unacceptable "features", and fix those.
Open source is oftentimes a prerequisite for us to be able to control our software.
_aavaa_
It has helped prevent problems. I am not worried about Python suddenly adding a clause stating that I can't release an ML framework…