Serving 200M requests per day with a CGI-bin
109 comments · July 6, 2025 · kragen
mjw1007
The Python maintainers are removing the module _named_ cgi, but they're not removing the support for implementing CGI scripts, which is CGIHTTPRequestHandler in the http.server module.
All that was in the cgi module was a few functions for parsing HTML form data.
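For the simple (non-multipart) case, that parsing never needed the cgi module at all. A minimal sketch using only urllib.parse, which covers what `cgi.FieldStorage` did for urlencoded GET and POST data:

```python
#!/usr/bin/env python3
# Minimal CGI form parsing with only the standard library's
# urllib.parse, no cgi module required. Handles query strings
# and urlencoded POST bodies (not multipart file uploads).
import os
import sys
from urllib.parse import parse_qs

def read_form():
    """Return the request's form data as a dict of lists."""
    if os.environ.get("REQUEST_METHOD") == "POST":
        length = int(os.environ.get("CONTENT_LENGTH") or 0)
        data = sys.stdin.read(length)
    else:
        data = os.environ.get("QUERY_STRING", "")
    return parse_qs(data)

if __name__ == "__main__":
    form = read_form()
    name = form.get("name", ["world"])[0]
    sys.stdout.write("Content-Type: text/plain\r\n\r\n")
    sys.stdout.write(f"Hello, {name}\n")
```

Multipart uploads are the part that genuinely needs a replacement library (e.g. multipart or the legacy-cgi package mentioned below).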
kragen
It would be very difficult indeed to make it impossible to implement CGI scripts in Python; you'd have to remove its ability to either read environment variables or perform stdio, crippling it for many other purposes, so I didn't think they had done that. Even if they removed the whole http package, you could just copy its contents into your codebase. It's not about making Python less powerful.
As a side note, though, CGIHTTPRequestHandler is for launching CGI programs (perhaps written in Rust) from a Python web server, not for writing CGI programs in Python, which is what the cgi module is for. And CGIHTTPRequestHandler is slated for removal in Python 3.15.
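To illustrate the distinction: CGIHTTPRequestHandler is the launcher side. A sketch of its use (deprecated since 3.13, so this only runs on interpreters that still ship it):

```python
# Sketch: CGIHTTPRequestHandler serves static files, but requests
# under /cgi-bin are forked and executed as CGI programs, which can
# be written in any language. Deprecated; removed in Python 3.15.
from http.server import CGIHTTPRequestHandler, ThreadingHTTPServer

def make_cgi_server(port=0):
    # port=0 lets the OS pick a free port; use e.g. 8000 in practice.
    return ThreadingHTTPServer(("127.0.0.1", port), CGIHTTPRequestHandler)

# To actually serve: make_cgi_server(8000).serve_forever()
```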
The problem is gratuitous changes that break existing code, so you have to debug your code base and fix the new problems introduced by each new Python release. It's usually fairly straightforward and quick, but it means you can't ship the code to someone who has Python installed but doesn't know it (they're dependent on you for continued fixes), and you can't count on being able to run code you wrote yourself on an earlier Python version without a half-hour interruption to fix it, which may in turn break it on the older Python version.
simonw
Here's the justification for removing cgi - https://peps.python.org/pep-0594/#cgi
Amusingly that links to https://peps.python.org/pep-0206/ from 14th July 2000 (25 years ago!) which, even back then, described the cgi package as "designed poorly and are now near-impossible to fix".
Looks like the https://github.com/jackrosenthal/legacy-cgi package provides a drop-in replacement for the standard library module.
kragen
That fails pretty hard at providing a rationale. Basically it says that CGI is an inefficient interface because it involves creating a new process! Even if that were true, "You shouldn't want to do such an inefficient thing" is very, very rarely a reasonable answer to a technical question like "How do I write a CGI script in Python?" or "How do I parse a CSV file in Python?"
There are certainly some suboptimal design choices in the cgi module's calling interface, things you did a much better job of in Django, but what made them "near-impossible to fix" was that at the time everyone reading and writing PEPs considered backwards compatibility to be not a bad thing, or even a mildly good thing, but an essential thing that was worth putting up with pain for. Fixing a badly designed interface is easy if you know what it should look like and aren't constrained by backwards compatibility.
pjmlp
Not to mention that if efficiency is a goal, Python probably isn't the right language either, so it is a very strange argument coming from Python developers.
riedel
Moving stuff out of the standard library seems like a reason. However, I think this is all a weird mix of arguments. IMHO new process spawning is a feature, not a bug, in the use cases where CGI is used. Most of it is low-traffic config interfaces or remotely invocable scripts. There was this trend of moving stuff to FastCGI; we had tons of cases of memory leaks in long-running but really seldom-used stuff like mailing-list servers. To me CGI is the poor man's alternative to serverless. However, I also do not completely understand why a standard library has to support it. I have bash scripts running under the Apache CGI mod.
simonw
The main rationale is earlier in the PEP: https://peps.python.org/pep-0594/#rationale
pjmlp
I'd rather stick with PHP or JS, due to having a JIT in the box for such cases.
Since I learnt Python starting in version 1.6, it has mostly been for OS scripting stuff.
Too many hard learnt lessons with using Tcl in Apache and IIS modules, continuously rewriting modules in C, back in 1999 - 2003.
int0x29
I don't think the JIT will help that much, as each request will need to be JITed again, unless Node and PHP are caching JIT output.
ChocolateGod
[delayed]
WD-42
I don't get it. Having a complaint about Python removing CGI from the stdlib is well and fine. But then you say you'd rather consider JS, which doesn't even have a std lib? Lua doesn't have a CGI module in stdlib either.
rollcat
> Lua doesn't have a CGI module in stdlib either.
Lua barely has any stdlib to speak of, most notably in terms of OS interfaces. I'm not even talking about chmod or sockets; there's no setenv or readdir.
You have to install C modules for any of that, which kinda kills it for having a simple language for CGI or scripting.
Don't get me wrong, I love Lua, but you won't get far without scaffolding.
kragen
Right, you need something more specific than Lua to actually write most complete programs in. The LuaJIT REPL does provide a C FFI by default, for example, so you don't need to install C modules.
But my concern is mostly not about needing to bring my own batteries; it's about instability of interfaces resulting from evaporating batteries.
kragen
I think it's fine to not have functionality in the standard library if it can be implemented by including some code in my project. It's not fine to have a standard library that stuff disappears from over time.
spockz
Whenever code is removed from the Java standard library it is announced ages ahead of time and then typically it becomes available in a separate artefact so you can still use it if you depended on it.
getdoneist
Ruby has been removing stuff from its stdlib for some time now. But "moving" is the correct word, because it is simply moved to a stand-alone gem, and with the packaging situation in Ruby being so good, it feels completely seamless.
bravesoul2
Node.js provides the de facto standard library for JS backends, and it's got a good feature set.
That said these days I'd rather use Go.
kragen
Golang seems pretty comfortable from the stuff I've done in it, but it's not as oriented toward prototyping. It's more oriented toward writing code that's long-term maintainable even if that makes it more verbose, which is bad for throwaway code. And it's not clear how you'd use Golang to do the kind of open-ended exploration you can do in a Jupyter notebook, for example. How would you load new code into a Golang program that's already running?
Admittedly Python is not great at this either (reload has interacted buggily with isinstance since the beginning), but it does attempt it.
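The reload/isinstance interaction mentioned above is easy to demonstrate: objects created before `importlib.reload()` belong to the old class object, so isinstance checks against the reloaded class fail. A small sketch (the module name `mod_demo` is a throwaway for the demonstration):

```python
# Demonstrates the reload/isinstance pitfall: reload() re-executes
# the module, rebinding its classes to brand-new class objects, while
# existing instances still point at the old ones.
import importlib
import pathlib
import sys

pathlib.Path("mod_demo.py").write_text("class Thing:\n    pass\n")
sys.path.insert(0, ".")
import mod_demo

obj = mod_demo.Thing()          # instance of the *original* class object
importlib.reload(mod_demo)      # re-executes the module; Thing is rebound
print(isinstance(obj, mod_demo.Thing))  # False
pathlib.Path("mod_demo.py").unlink()
```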
kqr
Consider Perl. It's not quite as batteries-included as Python, but it is preinstalled almost everywhere and certainly more stable than JS and Lua. (And Python.)
kragen
You may be interested in an article I wrote in ;login: 23 years ago: https://www.usenix.org/publications/login/june-2002-volume-2...
At the time Perl was the thing I used in the way I use Python now. I spent a couple of years after that working on a mod_perl codebase using an in-house ORM. I still occasionally reach for Perl for shell one-liners. So, it's not that I haven't considered it.
Lua is in a sense absolutely stable unless your C compiler changes under it, because projects just bundle whatever version of Lua they use. That's because new versions of Lua don't attempt backwards compatibility at all. But there isn't the kind of public shaming problem that the Python community has where people criticize you for using an old version.
JS is mostly very good at backwards compatibility, retaining compatibility with even very bad ideas like dynamically-typed `with` statements. I don't know if that will continue; browser vendors also seem to think that backwards compatibility with boring technology like FTP is harmful.
kqr
Ha, fun bit of history! Many of the listed problems with Perl can be configured away these days. I don't have time for a full list, but as two early examples:
- `perl -de 0` provides a REPL. With a readline wrapper, it gives you history and command editing. (I use comint-mode for this, but there are other alternatives.)
- syscalls can automatically raise exceptions if you `use autodie`.
Why is this not the default? Because Perl maintainers value backward compatibility. Improvements will always sit behind a line of config, preventing your scripts from breaking if you accidentally rely on functionality that later turns out to be a mistake.
rollcat
What's the landscape like, for when you need to scale your project up? As in: your project needs more structure, third-party integrations, atomic deployments, etc - not necessarily more performance.
Python has Werkzeug, Flask, or at the heavier end Django. With Werkzeug, you can translate your CGI business logic one small step at a time - it's pretty close to speaking raw HTTP, but has optional components like a router or debugger.
bravesoul2
Yeah high performance web used to be an art. Now it's find what you are doing that's stupidly wasteful that you did to ship fast, and stop doing that thing.
Your app could add almost no latency beyond storage if you try.
tonyhart7
Yeah, but 400ms is unacceptable these days.
cenamus
If the whole site takes 5 seconds to fully hydrate and load its 20megs of JS I'll gladly take a server side rendered page that has finished loading in a second.
kragen
That's intended as an unreasonably high upper bound. On my cellphone, in Termux, python3 -m cgi takes 430–480ms. On my laptop it takes 90–150ms. On your server it probably takes less.
I agree that tens of milliseconds of latency is significant to the user experience, but it's not always the single most important consideration. My ping time to news.ycombinator.com is 162–164ms because I'm in Argentina, and I do unfortunately regularly have the experience of web page loads taking 10 seconds or more because of client-side JS.
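Those startup numbers are easy to reproduce; one rough way, timing a fresh interpreter the same way a CGI exec would pay for it:

```python
# Rough measurement of cold interpreter startup: spawn a fresh
# `python -c ''` several times and take the best run.
import subprocess
import sys
import time

def startup_ms(runs=5):
    best = float("inf")
    for _ in range(runs):
        t0 = time.perf_counter()
        subprocess.run([sys.executable, "-c", ""], check=True)
        best = min(best, time.perf_counter() - t0)
    return best * 1000

print(f"best-of-{5} interpreter startup: {startup_ms():.0f} ms")
```

A real CGI script pays this plus the cost of its own imports, which usually dominates.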
nurettin
<PHP> I would like to have a word.
Tractor8626
2400 rps on this hardware for a hello-world application: isn't that kind of bad?
And what are we trading performance for, exactly? The code certainly didn't become any simpler.
kqr
It's not great, but it is enough for many use cases. It should even handle an HN hug of death.
Tractor8626
But why? What advantages are we getting?
kqr
Hypothetically, strong modularisation, ease of deployment and maintenance, testability, compatibility with virtually any programming language.
In practice I'm not convinced -- but I would love to be. Reverse proxying a library-specific server or fiddling with FastCGI and its alternatives always feels unnecessarily difficult to me.
slyall
It's only bad if you need to get more than 2000 rps
Which is only a small proportion of sites out there.
Tractor8626
If there are no other advantages, it is just bad.
masklinn
> It's only bad if you need to get more than 2000 rps
Or if you don't want to pay for an 8/16 for the sort of throughput you can get on a VPS with half a core.
gred
I'd rather not pay for 8 cores / 16 threads, though...
withinboredom
Depends on where you are shopping. I pay €211 every month for 96 threads and 384 GB of RAM (clustered) -- disks are small (around 1 TB each), but I'm still nowhere near 50% utilization there.
kqr
I'd argue it's bad even if you get more than 1000 Bq of requests. You never want to approach 100 % utilisation, and I'd aim to stay clear of 50 %.
simonw
Also discussed yesterday: https://news.ycombinator.com/item?id=44464272
jarofgreen
Had a similar chat with someone recently after I used Apache for a side project, in part because of its .htaccess feature.
This lets you drop .htaccess files anywhere, and Apache will load them on each request for additional server config: https://httpd.apache.org/docs/2.4/howto/htaccess.html
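For illustration, a minimal .htaccess that enables CGI execution for the directory it sits in might look like this (assuming the server's AllowOverride settings permit Options and handler overrides):

```apacheconf
# Enable CGI execution in this directory
Options +ExecCGI
AddHandler cgi-script .cgi .pl .py
```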
One big reason to avoid them was performance; they required extra disk access on every request, and it was always better to put the configuration in the main config file if possible.
But now? When most servers have an SSD and probably spare RAM that Linux will use to cache the file system?
OK, performance is still slightly worse, as Apache has to parse the config on every request rather than once, but again, now that most servers have more powerful CPUs? In many use cases you can live with that.
[ Side project is very early version but I'm already using it: https://github.com/StaticPatch/StaticPatch/tree/main ]
rollcat
Quoting Rasmus Lerdorf:
> I'm not a real programmer. I throw together things until it works then I move on. The real programmers will say "Yeah it works but you're leaking memory everywhere. Perhaps we should fix that." I’ll just restart Apache every 10 requests.
PHP got a very long way since then, but a huge part of that was correcting the early mistakes.
> PHP 8 is significantly better because it contains a lot less of my code.
faizshah
I've also thought about this, more so as part of a workflow for quickly prototyping stuff. For a lot of the modern JIT languages, I believe startup times will be dominated by your imports unless you go with a FastCGI model. This came up as I started adopting the h2o web server for local scripts, since it has clean, quick-to-write config files with mruby and FastCGI handlers and is also crazy fast: https://h2o.examp1e.net/configure/fastcgi_directives.html
Another place this can be useful is for allowing customers to extend a local software with their own custom code. So instead of having to use say MCP to extend your AI tool they can just implement a certain request structure via CGI.
dolmen
An MCP frontend to CGI programs would not be a bad idea for an end-user environment.
This makes me wonder if an MCP service couldn't also be implemented as CGI: an MCP framework might expose its features as a program that supports both execution modes. I have to dig into the specs.
phplovesong
The way "old" stacks like PHP work also makes it impossible to do stateful stuff, like websockets.
There are workarounds, but usually it's a better idea to ditch PHP for a technology better suited to the modern web.
johnisgood
Better technology? Please, do tell.
And for your information, you can have stateful whatnots in PHP. Hell, you can have it in CSS as I have demonstrated in my earlier comments.
petesergeant
> The nascent web community quickly learned that this was a bad idea, and invented technologies like PHP
Well ackshually... the technology that was important here was mod_php; PHP itself was no different from Perl in how it was run, but the design of mod_php, as compared to mod_perl, was why PHP scripts could just be dumped on the server and run fast, whereas you needed a small amount of thinking and magic to get mod_perl working.
fcatalan
At that time I was developing with a friend what later was called a Learning Management System: It had content management, assignment uploads, event calendar, grade management, real time chat, forums... It was all plain C via CGI and it was hell to work with.
What almost brought us to tears the day we learned about PHP was how everything we had been painstakingly programming ourselves from scratch reading RFCs or reverse engineering HTTP was just a simple function call in PHP. No more debugging our scuffed urlencode implementation or losing a day to a stray carriage return in an HTTP header...
simonw
Right, but mod_php was an early addition to the PHP ecosystem and quickly became the default way of deploying it - I believe the first version of the Apache module was for PHP/FI Version 2.0 in 1996: https://www.php.net/manual/phpfi2.php#module
petesergeant
It was indeed, and I spent much time wailing and gnashing my teeth as a Perl programmer that nothing similar existed in Perl.
AdieuToLogic
> It was indeed, and I spent much time wailing and gnashing my teeth as a Perl programmer that nothing similar existed in Perl.
mod_perl2[0] provides the ability to incorporate Perl logic within Apache httpd, if not other web servers. I believe this is functionally equivalent to the cited PHP Apache module documentation:
Running PHP/FI as an Apache module is the most efficient
way of using the package. Running it as a module means that
the PHP/FI functionality is combined with the Apache
server's functionality in a single program.
0 - https://perl.apache.org/docs/2.0/index.html
xnx
Indeed. Perl was better in many ways, but not in the one that mattered to its continued viability.
slashdave
Maybe my memory is bad, but I don't remember people jumping ship from CGI-bin because of performance. I do remember a lot of security problems.
kragen
I remember performance being the main reason people jumped ship from CGI in the period 01995–02002. The switch didn't solve security problems itself (except Shellshock, if you wrote CGI scripts in bash, but Shellshock wasn't publicly known until much later) but it sometimes came with a less slapdash approach to building web services which did solve security problems. On the other hand, it often instead came with a move to PHP, which had just unbelievable levels of security problems.
It's possible that your experience with people switching was later, when performance was no longer such a pressing concern.
sitharus
The main security issue I recall from CGI was caused by the web server having to execute the binary. This meant either executing as www-data, running the web server as root so it can call setuid, or using setuid binaries which have their own issues.
These were real issues on multi-user hosts, but as most of the time we don’t use shared hosting like that anymore it’s not an issue.
There were also some problems with libraries parsing the environment variables with the request data wrong, but that's no different from a badly implemented HTTP stack these days. I vaguely recall some issues with excessively long requests overflowing environment variables, but I can't remember if that was a security problem or a DoS.
aaronblohowiak
Ehhhhhhh. I believe but cannot cite that fork() got a lot cheaper over the last 30 years as well (independent of machine stats, I believe the Linux implementation is inherently cheaper now, but I can't remember the details). CGI-bin works really well if you don't have to pay for SSL or TCP connections to databases or other services, but you can maybe run something like Istio if you need that. I have long thought that (Fast)CGI is a better model than proprietary "lambda"/"FaaS", but the languages du jour and vendor lock-in didn't favor a standards-based approach here.
AdieuToLogic
> I believe but cannot cite that fork() got a lot cheaper over the last 30 years as well ...
The fork[0] system call has been a relatively quick operation for the entirety of its existence. Where latency is introduced is in the canonical use of the execve[1] equivalent in the newly created child process.
> ... cgi bin works really well if you don’t have to pay for ssl or tcp connections to databases or other services, but you can maybe run something like istio if you need that.
Istio[2] is specific to Kubernetes and thus unrelated to CGI.
0 - https://man.freebsd.org/cgi/man.cgi?query=fork&apropos=0&sek...
1 - https://man.freebsd.org/cgi/man.cgi?query=execve&sektion=2&a...
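The fork-versus-exec cost split described above can be seen directly: a bare fork that exits without exec'ing anything is cheap compared to the cold interpreter startup measured elsewhere in this thread. A Unix-only sketch:

```python
# Unix-only: measure the cost of fork()+exit alone, without the
# execve() of a fresh interpreter that CGI would normally add on top.
import os
import time

def ms_per_fork(n=100):
    t0 = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)         # child: exit immediately, no exec
        os.waitpid(pid, 0)
    return (time.perf_counter() - t0) / n * 1000

print(f"fork+exit: {ms_per_fork():.3f} ms per child")
```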
seabrookmx
You should check out OpenFaaS. It uses a very CGI-inspired architecture for self hosting "functions" (in the lambda sense) on more conventional infra.
jekwoooooe
But why? A lot of modern stacks aren’t just about performance but development speed and stuff
petesergeant
Tangentially related, I wrote a brief history of CGI and the evolution away from it here (as conference slides):
"A brief, incomplete and largely inaccurate history of dynamic webpages"
https://www.slideshare.net/slideshow/psgi-and-plack-from-fir...
Even with things like Python, CGI is pretty fast these days. If your CGI script takes a generous 400 milliseconds of CPU to start up and your server has 64 cores, you can serve 160 requests per second, which is 14 million hits per day per server. That's a high-traffic site.
That is, if your web service struggles to handle single-digit millions of requests per day, not counting static "assets", CGI process startup is not the bottleneck.
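The capacity arithmetic above, spelled out:

```python
# Back-of-the-envelope capacity from the figures in the text:
# 400 ms of CPU per request, 64 cores, requests spread across cores.
startup_s = 0.4                    # generous per-request startup cost
cores = 64
rps = cores / startup_s            # sustained requests per second
hits_per_day = rps * 86_400        # seconds in a day
print(rps, int(hits_per_day))      # 160 rps, ~13.8M hits/day
```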
A few years ago I would have said, "and of course it's boring technology that's been supported in the Python standard library forever," but apparently the remaining Python maintainers are the ones who think that code stability and backwards compatibility with boring technology are actively harmful things, so they've been removing modules from the standard library if they are too boring and stable. I swear I am not making this up. The cgi module is removed in 3.13.
I'm still in the habit of using Python for prototyping, since I've been using it daily for most of the past 25 years, but now I regret that. I'm kind of torn between JS and Lua.