15,000 lines of verified cryptography now in Python

165 comments

·April 18, 2025

chrisrodrigue

There's no mention of what Python version this is actually in.

After some digging, it looks like the answer is 3.14 [0], so we won't be seeing this until October [1].

One could argue that this is a security fix (just read the first sentence of the blog post) and should be included in all the currently supported versions (>=3.9) of Python [2].

[0] https://github.com/python/cpython/blob/main/Doc/whatsnew/3.1...

[1] https://peps.python.org/pep-0745/

[2] https://devguide.python.org/versions/

ashishb

Many famous libraries like Spacy do not support Python 3.13 and are stuck on Python 3.12 (Oct 2023).

So, even if this comes out in Python 3.14, any non-trivial project will have to wait till Oct 2026 (or Oct 2027) to be able to use it.

1 - https://github.com/explosion/spaCy/issues/13658

arp242

Good grief that issue is a clusterfuck of bozos.

Sometimes I wish there was a GitHub with entry exam. "A library you use has a bug, you find a find a 3 month old bug report for your exact issue. Do you 1) add a comment with "me too", 2) express your frustration this issue hasn't been fixed yet, 3) demand the maintainers fix it as soon as possible as it's causing issues for you, or 4) do nothing".

Only half joking.

Hasnep

The Refined GitHub extension [1] automatically hides comments that add nothing to the discussion. [2]

[1] https://github.com/refined-github/refined-github [2] https://github-production-user-asset-6210df.s3.amazonaws.com...

rowanG077

I actually find that hugely helpful that so many people are actually actively expressing they are hitting this. It's not easy to be able to get an idea what issues people are actually hitting with anything you have made. An issue being bumped with essentially "me too" is a highly valuable signal.

raverbashing

Geez honestly

This seems to be the issue https://github.com/explosion/spaCy/issues/13658#issuecomment...

And you depend on opinionated libraries that break with newer versions. Why? Well because f you that's why! Because our library is not just a tool, it's a lifestyle

Though it seems that Pydantic 1x does support 3.13 https://docs.pydantic.dev/1.10/changelog/#v11020-2025-01-07

__mharrison__

What is the passing answer?

bagels

You're missing

5) Do 1, 2, and 3

6) Submit a a tested pull request with a fix

sitkack

I have been getting paid to write Python since the late 90s and it amazes me how it consistently has these needless own goals, yet still keeps on going in spite of itself. Way to go Python!

spaCy should make Cython optional

hard fork Cython to not used stringitized annotations

stay on Python 3.12 forever and then skip to 3.15

It is like have a crowd of people trying to outdo each other on how much self harm they can induce.

ChrisMarshallNY

> It is like have a crowd of people trying to outdo each other on how much self harm they can induce.

See: Lemmings[0]

[0] https://en.wikipedia.org/wiki/Lemmings_(National_Lampoon)

martinky24

What exactly is the own goal here…? They’re making the language better in the upcoming release. This is how normal software works.

In no world is this an “own goal”. God forbid they take on a big task for the betterment of the future language.

rhdunn

I hit this when upgrading to Ubuntu 25.04 as that upgraded to Python 3.13. I'm running the various python projects I want in a venv. For the projects stuck on 3.12 I ended up building and installing it from source to my local directory (removing the unprefixed links for compatibility) as the ppa for Python ports doesn't (didn't?) support the latest Ubuntu.

I dislike using something like docker or conda as I don't want to install/use a separate distro just to be able to use a different version of Python. My setup is working well so far.

turbocon

Everyone has there own preferences, but I'd look into uv if I were you. It allows you to specify the python version, and for scripts you can even specify the python version as part of the shebang

rtpg

I was going to write something glib about getting things fixed but that thread looks gnarly!

To be honest I know so many people who use Pydantic and so many people who seem to get stuck because of Pydantic 2. I’m glad I have minimal exposure to that lib, personally.

I suppose the biggest issue really is type annotation usage by libs being intractable

bsoles

Does that mean that a point release of Python has breaking changes? If true, that sounds crazy.

aftbit

Yes, every Python 3 release for me (at least since 3.6) has had a breaking change or two that affects a library I use. Most of the time, the fix is pretty trivial, but not always.

dymk

Yes, things like format string syntax change between 3.10 and 3.11 and it’s incredibly frustrating

pbronez

Just ran into something similar with Great Expectations. Python 3.12 is the newest I can run.

0cf8612b2e1e

uv seems to have reverted to defaulting to 3.12 instead of 3.13. Which I fully endorse owing to how many packages are not yet compatible.

ashishb

Exactly, many compiled languages like Java and Go do not suffer from this issue.

devrandoom

> One could argue

How?

chrisrodrigue

From https://github.com/python/cpython/issues/99108#issue-1436673...:

> As evidenced by the recent SHA3 buffer overflow, cryptographic primitives are tricky to implement correctly. There might be issues with memory management, exceeding lengths, incorrect buffer management, or worse, incorrect implementations in corner cases.

This is a proactive fix for zero days that may be lurking in the wild.

bgwalter

Reading the article, they took a verified C library generated from F* from Microsoft, vendored the code in CPython and wrote a C extension.

And during the process they discovered that the original library did not handle allocation failures?

What is the big deal about Python here? It's just another wrapped C library.

quantumgarbage

A seamless, drop-in replacement, for all python users is actually a pretty good feat tbh

anon6362

In general, it's good practice to use best components available, regardless of source. Cryptographic components especially shouldn't be DIY.

It would be cool if Ruby did something similar for stdlib/openssl, but it could be also be done in a gem too.

nine_k

> just another wrapped C library.

I suppose you mean "just another top-notch library available in the Python ecosystem"? :)

bolognafairy

What part of how this is being conveyed do you disagree with? Specifically.

This isn’t a language war.

neuroelectron

The headline makes it sound like it was written in Python which is a lot more readable than 15,000 lines of C

frumplestlatz

The 15k lines of C are generated. The implementations are written in (and formally verified using) F*, and then the C is generated from the F* code using KaRaMeL.

One should be able to trust the proofs of correctness and never look at (or maintain) the generated C directly.

- https://fstar-lang.org

- https://github.com/FStarLang/karamel

- https://github.com/hacl-star/hacl-star

nxpnsv

How is that bad in any way?

Retr0id

Will this bring support for "streaming" output from SHAKE?

Here's the (recently closed!) issue for pyca/cryptography regarding this feature, although I can't find an equivalent issue for the python stdlib: https://github.com/pyca/cryptography/issues/9185

Edit: Found a relevant issue, "closed as not planned": https://github.com/python/cpython/issues/82198

IlikeKitties

Modern, ubiquitous cryptography is now practically unbreakable (even for the NSA) and widely used. The crypto wars of the 90s seem rather quaint. Any thoughts on the effects on society this now has?

xyzzy123

It's cool that we can largely "strike out" link level wiretapping from our threat models but it just means attackers move on to the endpoints. I have a wonderfully private link to google, my bank and my crypto exchange but all of those are coerced to report everything I do.

TacticalCoder

> ... but it just means attackers move on to the endpoints.

Yup but this doesn't scale anywhere near as well for the attackers.

FabHK

Sorry, if there are N clients, and M servers, then there are N+M endpoints, but N*M links, which is a lot more.

artursapek

Most internet traffic is cosolidated through a small number of providers like Cloudflare and AWS.

pyfon

Except for countries who hist companies who hold all the endpoints.

Analemma_

I think modern cryptography is basically unbreakable if used correctly, but there is still a lot of work to do re: developer ergonomics and avoiding footguns in implementations. This is much better than it used to be, thanks to things like libsodium, moving away from RSA, and newer protocols that de-emphasize cipher negotiation, but there is still more to do, even with “good” crypto primitives. For example, AES used perfectly is probably unbreakable but AES-GCM has ways to bite the unwary; ideally we should think about an even newer symmetric block cipher or increasing awareness of better AES modes without GCM’s drawbacks.

jakeogh

I just got bit by device attestation. Tried to install the latest ebay app version via the Aurora Store (on GrapheneOS), and instead of presenting me with the option of using my ebay login, it wanted a google account at a play store login with no way to bypass. I was able to downgrade to the previous version which does not require the Integrity API, but the walls are closing in. Only took 7 months: https://news.ycombinator.com/item?id=41517159 (I did get ebay on the phone and filed an issue, hopefully others do the same)

matheusmoreira

> Any thoughts on the effects on society this now has?

I have observed two effects.

They are constantly trying to make it illegal for the common man to use cryptography. Constantly. Cryptography is subversive. It has the power to defeat judges, armies, nations. Authorities don't want normal people to have access to this. It will increasingly become a tool of corporations and governments.

Authorities are bypassing cryptography instead by attacking the end points. It's easier to crack other parts of the system and exfiltrate data after it's been decrypted than to break the cryptography. Easier to hack a phone and get the stored messages than to intercept end-to-end encrypted ciphertext and decrypt it.

gpcz

For now. If someone makes a practical quantum computer, pretty much every asymmetric primitive we use at the start of a cryptographic protocol to make a shared secret for symmetric encryption will be breakable.

dist-epoch

The switch to post-quantum encryption already started - Signal, Chrome, iMessage

anon6362

This is so vague as to be meaningless because body of research (attacks and hw-accelerated implementations), details, implementations, uses, and adversary/ies budget(s) matter. Furthermore, the obsession with optimizing for "fast" cryptographic algorithms/implementations undermine themselves when it comes to the cost of CPU-bound brute force attack which become cheaper and more attainable over time.

imiric

Why are you so certain of this? The NSA has a long history of attempting to insert backdoors in cryptographic systems. Most recently they bribed RSA to make their compromised PRNG the default, which shipped in software as late as 2014, possibly even later.

And these are just the attempts we know about. What makes you think that they haven't succeeded and we just don't know about it?

IlikeKitties

We know from the Snowden Leaks that they couldn't crack PGP at the time. There's been some talks about cracking "export grade" cryptography and how that is done. I'm pretty confident that the newer hardened crypto libraries are pretty secure and since even the NSA recommends signal encryption now because the infrastructure in the US has so many holes the Chinese are in the entire mobile internet infrastructure, that's a pretty strong lead that it's very hard if not impossible to crack signal, at least for the Chinese.

It's also very likely that even if they can attack crypto in ways we don't know about, they can at best reduce the required time it takes to crack a given key. Chosing extra long keys and changing them frequently should protect you from lots of attacks.

mmooss

> It's also very likely that even if they can attack crypto in ways we don't know about, they can at best reduce the required time it takes to crack a given key.

Why do you say that? Very often the vulnerabilities are not in the mathematics but in the implementation.

If Signal works, as was pointed out during the recent scandal, it only protects messages in transit. The application on your phone is not especially secure and anyone who can access your phone can access your Signal messages (and thus any of your correspondents' messages that they share with you).

> that's a pretty strong lead that it's very hard if not impossible to crack signal, at least for the Chinese.

The NSA does not recommend Signal for classified communication.

The NSA thinks use of Signal is the best interest of the US government, as the NSA perceives those interests (every institution will have its own bias). It could be that Signal is the least insecure of the easily available options or that that the NSA believes that only they can crack Signal.

mmooss

> What makes you think that they haven't succeeded and we just don't know about it?

Yes, afaik they also have a history of not revealing exploits they discover. With a budget in the tens of billions and tens of thousands of employees, they have the resources to discover quite a bit.

jart

I think it was pretty obvious to anyone who looked at the SHA3 code Python had that it was insane and not to be trusted. Everyone smart moved on to blake2b. How they managed to frameworkify a hash function and push it through standardization, I'll never know.

andai

My first thought was "in the Python language", but it's a C extension. Is it even possible to write crypto code in memory managed languages? Due to the need for everything to run in constant time.

martinky24

IIRC Go has pretty decent crypto implemented in Go: https://pkg.go.dev/crypto

wolf550e

Plenty of cryptography in Java, C# and Go.

EasyMark

rust is a type of memory managed language and I don't see any issue with writing crypto there?

dymk

c++ is a (poorly) memory managed language by that logic (std::unique_ptr, etc)

EasyMark

Actually c++ is also excellent if you use modern c++20 and greater. Crypto algos tend to be fairly small compared to a lot of other software and if you can't keep your coders disciplined enough to use memory safe code there then the company or project should probably go under because of skills issues.

hall0ween

I'm cryptographically ignorant. What does this mean for python?

otterley

https://en.wikipedia.org/wiki/Formal_verification

Provable correctness is important in many different application contexts, especially cryptography. The benefit for Python (and for any other user of the library in question) is fewer bugs and thus greater security assurance.

If you look at the original bug that spawned this work (https://github.com/python/cpython/issues/99108), the poster reported:

"""As evidenced by the recent SHA3 buffer overflow, cryptographic primitives are tricky to implement correctly. There might be issues with memory management, exceeding lengths, incorrect buffer management, or worse, incorrect implementations in corner cases.

The HACL* project https://github.com/hacl-star/hacl-star provides verified implementations of cryptographic primitives. These implementations are mathematically shown to be:

    memory safe (no buffer overflows, no use-after-free)
    functionally correct (they always compute the right result)
    side-channel resistant (the most egregious variants of side-channels, such as memory and timing leaks, are ruled out by construction)."""

hathawsh

Here is how I read it: CPython has a hashlib module that has long been a wrapper around certain functions exported from OpenSSL, but since a SHA3 buffer overflow was discovered in OpenSSL, the CPython project has now decided to wrap a library called HACL*, which uses formal verification to ensure there are no buffer overflows and other similar bugs.

aseipp

Extreme ELI5 TL;DR: Your Python programs using the cpython interpreter and its built in cryptographic modules will now be using safer and faster, with no need for you to do anything.

kccqzy

Who uses its built-in cryptographic modules though? I wrote a fair bit of cryptographic code in Python more than five years ago and most people recommended the cryptography package https://pypi.org/project/cryptography/ over whatever that's built-in.

Retr0id

I'm a big fan of pyca/cryptography and I use it for any serious project, but if I just need hashing I tend to use the standard library hashlib - it saves a somewhat heavy dependency, and the API is less verbose.

Also, pyca/cryptography uses OpenSSL. OpenSSL is fine, but has the same "problem" as the previous python stdlib implementation. (Personally I think it's an acceptable risk. If anything, swapping in 15,000 lines of new code is the greater risk, even if it's "verified")

null

[deleted]

odo1242

Is it faster? I’m pretty sure the main goal of this effort is just the “safer” part.

protz

the performance was pretty much identical, however, as an added benefit, Blake2 got quite a bit faster due a combination of 1) our code being slightly more optimized, and 2) python's blake2 integration not actually doing runtime cpu detection, meaning that unless you compiled with -march=native (like, Gentoo), at the time, you were not getting the AVX2 version of Blake2 within Python -- my PR fixed that and added code for CPU autodetection

bear in mind that performance is tricky -- see my comment about NEON https://github.com/python/cpython/pull/119316#issuecomment-2...

aseipp

The goal is to make things safer, yes, but speed is absolutely a major priority for the project and a requirement for production deployment, because the difference in speed for optimized designs vs naive ones might be an order of magnitude or more. It's quite speedy IME. To be fair to your point, though, it's also a moving target; "which is faster" can change as improvements trickle in. Maybe "shouldn't be much slower" is a better statement, but I was speaking generally :)

(You could also make the argument that if previous implementations that were replaced were insecure that their relative performance is ultimately irrelevant, but I'm ignoring that since it hinges on a known unknown.)

drewcoo

And safer is often slower to avoid timing attacks.

jessekv

More context here.

https://github.com/python/cpython/issues/99108#issuecomment-...

tsecurity

How much of the development of this was verified, and what does that consist of?

I worry a little when I read that things are verified and were hard.

aseipp

https://eprint.iacr.org/2017/536.pdf is the relevant paper that introduces the project and its broad design. Figure 1 on page 3 is a good place to look.

rtkwe

The first two shouldn't matter because the entirety of the code is verified and anyone can check the verification. The last is an issue with any cryptography but verification doesn't try to address that only that the code does precisely and only what it's supposed to; ie it should guarantee that there are no exploits possible against that library.

badmonster

how reusable is the generic streaming verification framework beyond cryptographic hashes?

pluto_modadic

does this mean anything that imports the cryptography modules needs to include G++ or something weird, or is it compiled into cpython itself?

__s

The latter. Most of CPython is C you don't need compiler to run

AndyMcConachie

Thank you!

thebeardisred

Lines of code is a pretty poor measurement. Doubly so when you're boasting a large number in the context of cryptographic code.

bbeonx

This isn't really a metric. It's a description to help the reader understand the magnitude of effort that went into this project. SLoC is a bad metric for plenty of things, but I think it's fine for quickly conveying the scope of a project to a blog reader.

odyssey7

Lines of code is also a poor indicator for “magnitude of effort.”

Tangent: generally I’m more inclined to believe quality is improved when someone claims 1000s of lines reduced rather than 1000s of lines written.

tgv

Do you remember writing the proof for quicksort, which is say 0.1k lines? 15k lines of verified code is a pretty good indication of effort.

But the problem may come from the headline, which is somewhat clickbaity. HN forbids changing it, and then part of the discussion focuses on the literal content of the headline, which is, as you rightly hint, not the best summary of what's interesting here.

chaitimes

From my experience, pre LLMs, it was a valid proxy metric for effort

xeromal

See: AI generating 1000s of lines

lionkor

It's a metric of complexity

aseipp

It isn't boasting about anything, it's a straightforward description of the 2.5 years of effort they went through for this project, and some of the more annoying "production integration" bits.

bbstats

the 2nd sentence:

"As of last week, this issue is now closed, and every single hash and HMAC algorithm exposed by default in Python is now provided by HACL*, the verified cryptographic library."

it's describing coverage.

titaphraz

> Lines of code is a pretty poor measurement

But let's say at least that it's abused, exploited, perverted and a molested measurement before we call it a poor one. It not all that bad when you have context.

daquisu

Which better measurement do you propose?

null

[deleted]