Smallest Possible Files

46 comments

·May 27, 2025

arexxbifs

The 42 byte transparent GIF saw ample use in web development a quarter century ago, when it was used to create pixel perfect <table> layouts. Some things have changed for the better.

https://x42.com/test/gifdot.shtml?abcdef

JimDabell

The smallest GIF is still useful because it is the smallest possible valid favicon. This means you can stuff it into a data: URI to prevent useless requests showing up when you are working on something:

    <link rel="icon" href="data:image/gif;base64,R0lGODlhAQABAAAAADs=">
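To see what that data: URI actually carries, here is a small Python sketch that decodes the base64 payload from the snippet above and splits it into the fixed GIF header fields. The helper name `describe_gif` is made up for illustration, and the field layout assumes the GIF89a logical screen descriptor.

```python
import base64
import struct

# Payload copied from the data: URI in the comment above.
PAYLOAD = "R0lGODlhAQABAAAAADs="


def describe_gif(b64: str) -> dict:
    """Decode a tiny base64 GIF and unpack its header (hypothetical helper)."""
    data = base64.b64decode(b64)
    # Logical screen descriptor: width and height as little-endian 16-bit
    # integers, then packed flags, background colour index, aspect ratio.
    width, height, packed, background, aspect = struct.unpack("<HHBBB", data[6:13])
    return {
        "size": len(data),
        "signature": data[:6],
        "width": width,
        "height": height,
        "packed": packed,
        "rest": data[13:],  # whatever follows the screen descriptor
    }


info = describe_gif(PAYLOAD)
print(info)
```

In this payload, the only thing after the screen descriptor is the trailer byte `;`, so there is no image data at all. That may be why some browsers treat it as invalid.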

zamadatix

If you're just wanting to shut the request up and aren't actually trying to display a certain favicon you can do:

  <link rel=icon href=data:>

With the bonus that you've probably already memorized how to reconstruct this on demand just by reading this comment. It is "invalid" data, but so is your example in Safari and Firefox (as opposed to Chromium-based browsers). That doesn't matter much, because the problem is local and silent in the logs, unlike the request.

JimDabell

Thanks! I’m pretty sure I tried this ages ago and it didn’t work at the time, but I tried this again now and it does the job.

vbezhenar

You can also make an actually useful and readable SVG favicon this way:

    <link
      rel="shortcut icon"
      href='data:image/svg+xml,%3csvg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">%3ccircle cx="25" cy="50" r="20"/>%3ccircle cx="75" cy="50" r="20"/>%3c/svg>'
    />
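If you want to generate such an href rather than hand-escape it, a minimal Python sketch could look like this. The function name `svg_data_uri` and the choice of characters left unescaped are assumptions for illustration; the result contains double quotes, so it still has to sit inside a single-quoted attribute as in the snippet above.

```python
from urllib.parse import quote, unquote

# Same drawing as in the comment above: two filled circles.
SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    '<circle cx="25" cy="50" r="20"/><circle cx="75" cy="50" r="20"/></svg>'
)

PREFIX = "data:image/svg+xml,"


def svg_data_uri(svg: str) -> str:
    """Percent-encode an SVG string into a data: URI (hypothetical helper)."""
    # Leave a few readable characters alone; "<" and "#" are among
    # the characters that do get escaped.
    return PREFIX + quote(svg, safe=' "=:/>')


uri = svg_data_uri(SVG)
print(uri)
```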

JimDabell

Good to know! My goal is simply to stop a 404 popping up during development in the simplest way possible, so the smallest amount of code is best for me.

gudzpoz

A use case: https://news.ycombinator.com/s.gif (43 bytes) (used for comment indentation)

rollcat

It's kinda cool that HN looks OK even in simple browsers like Dillo:

<https://imgur.com/a/Seu8rYT>

However it's pretty bad on narrow screens. I wish there was some progressive enhancement via modern CSS, or at least just dark mode.

ayaros

Reminds me of https://github.com/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee/eeeeeeee...

adzm

I really appreciate the .gitignore file there https://github.com/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee/eeeeeeee...

DaSHacka

I love how even though the entire repo is essentially a shitpost, it still uses a CoC.

You know, to ensure cordiality in any of the various riveting PRs and discussions.

rollcat

Kinda. Empty files for so many languages, it would be interesting to see at least an exit(0) or so.

vitorfrois

yes what about the biggest possible files

jerf

Many of them are infinite, so you'd have to provide them as functions rather than files. There's obvious ones like plain text, but some less obvious ones, like, PNGs are defined as a series of chunks, but there's no chunk count in the header, so you can keep appending chunks forever: https://www.libpng.org/pub/png/spec/1.2/PNG-Structure.html
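As a rough sketch of "a file provided as a function", here is a Python generator that emits the PNG signature, an IHDR chunk, and then text chunks forever. The names `png_chunk` and `endless_png` and the chunk contents are made up for illustration. Since the stream never reaches IDAT or IEND, it is an ever-growing prefix and not a complete image.

```python
import itertools
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(chunk_type: bytes, data: bytes = b"") -> bytes:
    """Serialize one chunk: 4-byte length, 4-byte type, data, CRC-32 of type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def endless_png():
    """Yield the start of a PNG stream, then tEXt chunks without end."""
    yield PNG_SIGNATURE
    # IHDR for a 1x1, 8-bit greyscale image (values chosen arbitrarily).
    yield png_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    while True:
        # Nothing in the header says how many chunks follow.
        yield png_chunk(b"tEXt", b"Comment\x00still going")


# Only ever consume a bounded slice of the generator.
first_parts = list(itertools.islice(endless_png(), 4))
print(sum(len(part) for part in first_parts), "bytes so far")
```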

This sort of thing is not just a funny question, it's something you think about when you're writing scanners. For instance, another "biggest possible file" is the zip file that decompresses to itself[1], which is in some sense also an infinite file. Many a scanner has been written that will fill the disk then crash if presented with that file, which is actually more pathological behavior than would be experienced if the scanner isn't there.

[1]: https://research.swtch.com/zip
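One way a scanner might defend itself is to cap both the decompressed bytes and the nesting depth. The following is a minimal Python sketch under that assumption; `scan_zip`, `ScanLimitExceeded`, and the default limits are invented for illustration and are not taken from any real scanner.

```python
import io
import zipfile


class ScanLimitExceeded(Exception):
    """Raised when an archive is larger or more deeply nested than allowed."""


def scan_zip(blob: bytes, budget: int = 10_000_000, max_depth: int = 3, depth: int = 0) -> int:
    """Return the decompressed bytes seen, refusing to go past the limits."""
    if depth > max_depth:
        raise ScanLimitExceeded("archive nesting too deep")
    seen = 0
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        for info in archive.infolist():
            with archive.open(info) as member:
                # Read at most one byte past the remaining budget,
                # never the whole member unconditionally.
                content = member.read(budget - seen + 1)
            seen += len(content)
            if seen > budget:
                raise ScanLimitExceeded("decompressed size over budget")
            if zipfile.is_zipfile(io.BytesIO(content)):
                # Nested archives share what is left of the budget.
                seen += scan_zip(content, budget - seen, max_depth, depth + 1)
    return seen


def make_zip(name: str, payload: bytes) -> bytes:
    """Build a one-member zip in memory (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, payload)
    return buffer.getvalue()


zeros = make_zip("zeros.bin", b"\0" * 1_000_000)
print(len(zeros), "compressed bytes ->", scan_zip(zeros), "decompressed bytes")
```

The depth limit is what matters for a self-reproducing archive: each layer is small, so a size budget alone would take a long time to trip.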

user32489318

Reminded me of a major “data”/“AI” platform that stripped all empty files when deploying the code. Because of “security” you were not allowed to list files on the deployed instance, nor review the deployment pipeline code or logs (“it just works/batteries included”).

The most brilliant way to screw all Python developers I’ve ever seen.

Later learnt that the docker container ran the code as root, so basically you could destroy the platform from within. Good times.

RainyDayTmrw

For context, this is because Python uses __init__.py files to mark directories as regular packages. They can have contents, but quite often they are empty placeholders that still carry meaning. Removing them can break imports of the corresponding packages, or hide them from tooling that looks for __init__.py.
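A quick way to see what the empty file changes is to ask the import system how it classifies a directory with and without it. This is a sketch only; `describe_package` and the directory names are hypothetical. One nuance: on Python 3.3 and later a directory without __init__.py can still be found as a namespace package, so the exact breakage depends on how the code is imported and packaged.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def describe_package(with_init: bool) -> str:
    """Create a throwaway directory and report how Python would import it."""
    name = "demo_pkg_with_init" if with_init else "demo_pkg_without_init"
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, name))
        if with_init:
            # The empty placeholder file the platform would have stripped.
            open(os.path.join(root, name, "__init__.py"), "w").close()
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            spec = importlib.util.find_spec(name)
        finally:
            sys.path.remove(root)
    if spec is None:
        return "not importable"
    origin = spec.origin or ""
    if os.path.basename(origin) == "__init__.py":
        return "regular package"
    return "namespace package"


print(describe_package(with_init=True))
print(describe_package(with_init=False))
```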

ks2048

Pretty cool. But as everyone is pointing out, empty files aren't that interesting. 31/137.

    $ find . -name ".git" -prune -o -name "README.md" -prune -o -type f -print | wc -l
    137
    $ find . -name ".git" -prune -o -name "README.md" -prune -o -type f -empty -print | wc -l
    31

I suppose if you wanted minimal, non-empty examples, you'd end up with a "hello, world" collection, of which there are many, but nice that this handles file formats as well as programming languages.

aidenn0

The traditional minimal bourne-like shell script has a single ":" in it. This is because, when looking at an executable[1], bourne-alikes may try to detect if the file is binary to prevent executing a binary file. I don't know for a fact that some sh implementations will refuse to execute an empty file, but it seems likely.

1: If you try to run a program binary from a bourne-like shell and execl() signals ENOEXEC, then (if it believes it to be a text file) it will try to run it as a shell script; this makes shebangs optional for programs executed only from a shell. You can try it yourself (tested on bash, dash, ksh, fish, zsh, and osh):

  $ echo 'echo hi' > foo.sh
  $ chmod +x foo.sh
  $ ./foo.sh
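The same experiment can be run from outside a shell to see the difference. This is a rough Python sketch, assuming a Unix-like system with `sh` on the PATH and a temporary directory that allows execution; the function name is made up. Executing the file directly should fail with ENOEXEC, while handing it to a shell lets the shell fall back to interpreting it.

```python
import errno
import os
import shlex
import subprocess
import tempfile


def run_without_shebang():
    """Return (errno from a direct exec, stdout when run via sh)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "foo.sh")
        with open(script, "w") as handle:
            handle.write("echo hi\n")  # no "#!" line on purpose
        os.chmod(script, 0o755)
        try:
            # No shell involved: the kernel has to make sense of the file.
            subprocess.run([script], check=True)
            direct_errno = None
        except OSError as exc:
            direct_errno = exc.errno
        # Now let a Bourne-like shell try to execute the same path.
        via_shell = subprocess.run(
            ["sh", "-c", shlex.quote(script)], capture_output=True, text=True
        )
        return direct_errno, via_shell.stdout


direct_errno, shell_output = run_without_shebang()
print(errno.errorcode.get(direct_errno), repr(shell_output))
```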

LegionMammal978

Some of these files are very much nonstandard, even when the standard leaves no leeway (unlike HTML). E.g., every PDF standard requires an %%EOF, startxref offset, and an xref table (or an xref stream in the later versions), but this PDF file is missing those, among other oddities, like the page object missing a /Type and /MediaBox. Too bad the author doesn't specify which implementation these are supposed to work in.

dmd

For people who enjoy this sort of thing, vaguely related is this puzzle: https://dmd.3e.org/a-shell-puzzle/

xelxebar

Oh, you're the author! I didn't notice and sent you an email, but will repost here:

    $ for i in 3 4 5; do f=puzzle.$i; echo $f: $(head -1 $f | wc -c); tail -$((i-1)) $f; ./$f; done
    puzzle.3: 1
    futz
    futz
    ./puzzle.3: line 3: futz: command not found
    puzzle.4: 1
    futz
    futz
    futz
    ./puzzle.4: line 4: futz: command not found
    puzzle.5: 1
    futz
    futz
    futz
    futz
    ./puzzle.5: line 5: futz: command not found

Does this count?

RandallBrown

There must be some interesting code golf stuff hidden in here, but it seems like it's mostly empty files.

JimDabell

The linked blog post about the smallest possible valid (X)HTML documents is noteworthy, if only for the fact that a surprising amount of people adamantly refuse to believe that they are valid. Even when you think you have gotten through to them with specifications and validators, a lot of people will still think “yeah, but it’s relying on error handling though”. I’m not sure why “HTML explicitly permits this” will not be tolerated as a thought and somehow transforms into “HTML doesn’t permit this but browsers are lenient”. It’s a remarkably unshakeable position. And even the people who are eventually convinced that it’s valid still think that it is technically incorrect in some unspecified way.

jerf

"if only for the fact that a surprising amount of people adamantly refuse to believe that they are valid... And even the people who are eventually convinced that it’s valid still think that it is technically incorrect in some unspecified way."

Speaking from my personal experience, if your idea of "valid HTML" was created in the late 1990s or early 2000s, it's worth a spin through the current HTML standard. HTML has always de facto been permissive, but de jure it had certain requirements. However, HTML 5 essentially works by reifying a very, very well-specified algorithm for how to handle HTML "loosely" (even though it is very strictly specified), and then refactors away effectively every requirement it possibly can and defers them to that algorithm instead.

Technically speaking, as long as you put down the correct doctype, you can elide almost anything nowadays and get a functional document; for instance, "<!DOCTYPE html><title>Hello</title>" is fully standards compliant now (push it through [1]). The only thing the validator gives is a warning that you might like to specify a language (via a lang attribute on the html start tag). It isn't just "browsers will pretty much do the 'right thing'" with that, which has been true for a long time... that's actually standards-compliant HTML now.

What a lot of old hands don't understand is that HTML 5 was a seismic shift in how HTML is specified. Instead of specifying a rigid language and then pretending the world is complying and it's super naughty of them not to, it defines a standard for extracting a DOM tree from effectively any soup of characters you can throw at it, compliance is loosened as much as is practical, and even when things don't comply there's a specification on exactly how to pick up the pieces. HTML 5 has a completely different philosophy than HTML 4 and before.

(Relatedly, the answer to the frequently-asked question "What is the BeautifulSoup equivalent for $LANGUAGE", at least as far as parsing, is effectively now "Find an HTML 5-compliant parser", which they all have now. Beautiful Soup's parsing philosophy was enshrined into the standard.)

[1]: https://validator.w3.org/nu/#textarea

JimDabell

It’s fair to point out the big difference in parsing philosophy between HTML 2–4 and HTML 5, but what I’m talking about happened before HTML5 as well. Some people can’t handle the fact that HTML intentionally has implied elements.

> "<!DOCTYPE html><title>Hello</title>" is fully standards compliant now

Sure, but switch the doctype and put a <p> on the end, and it’s fully standards compliant HTML 4.01 Strict too. And yet so many people are adamant that it can’t be. That it’s invalid (even though a validator says it’s valid). That it’s relying on error handling (even though the spec. says otherwise). That some browsers parse it wrong (but they can never name one). That the DOM ends up broken (when browser dev tools show a normal DOM). That you need <html> and <body> elements (even though it already has both). That there’s something wrong with it at a technical level (even though they cannot describe what).

The concept “This is correct HTML that works everywhere with no error handling” is very difficult for some people to grasp, to a genuinely surprising degree.

currysausage

This is especially ironic, considering the same people will gladly use XML syntax and serve it as text/html. Historically, this has only worked because no relevant browser has ever implemented SGML (and NET [1], in particular), as required by HTML standards up to version 4 [2].

[1] https://en.wikipedia.org/wiki/Standard_Generalized_Markup_La...

[2] https://www.w3.org/TR/html401/conform.html#h-4.2

myfonj

> Historically, […] no relevant browser has ever implemented SGML […] NET

I can probably confirm the "relevant" part of this claim for the period starting in the first decade of the 2000s, but I am still desperately (in a way) seeking information on whether ANY application, even a niche and obscure one, that consumed "HTML" treated NET as specified back then. I am quite certain the W3C Validator did (Mathias' article proves that, after all) and that Amaya might have done so, since it was a reference implementation from the same spec body, IIRC, but I cannot swear to that.

Does anybody here have a clearer recollection of those times, or even some evidence?

I still find it strange that such a feature had such a prominent place in the specs back then, but practically nowhere else.

JimDabell

That’s not quite the whole story. Appendix C of the XHTML 1.0 specification provides HTML compatibility guidelines:

> This appendix summarizes design guidelines for authors who wish their XHTML documents to render on existing HTML user agents.

— https://www.w3.org/TR/xhtml1/#guidelines

And RFC 2854, which defines the text/html media type, explicitly states this is permissible to label as text/html:

> The text/html media type is now defined by W3C Recommendations; the latest published version is [HTML401]. In addition, [XHTML1] defines a profile of use of XHTML which is compatible with HTML 4.01 and which may also be labeled as text/html.

— https://datatracker.ietf.org/doc/html/rfc2854#section-2

However even browsers that support XHTML rendering use their HTML parser for XHTML 1.0 documents served as text/html, even though they should really be parsing them as XHTML 1.0.

But yes, that extra slash means something entirely different to the SGML formulation of HTML (HTML 2.0 to HTML 4.01). HTML5 ditched SGML though, so SHORTTAG NET is no longer a thing.

arexxbifs

The Python, Perl, Lua, etc. files are arguably valid quines.

eru

For the C examples, e.g., it depends a lot on which compiler you are using (and implicitly then also on which standard).

Wowfunhappy

...I feel like completely empty files shouldn't be allowed. Like, I realize the Python interpreter won't error if you feed it an empty file, but how can you really say that empty file represents a Python script if there is no script there?

However, I can't put my finger on what the correct rule would be.

ks2048

I guess if you can run `python myfile.py` and it finishes without error (return code 0), you could consider it valid.

By that measure, there are also 1 byte valid Python programs (e.g. "1").
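That test is easy to script. This is a minimal sketch in Python itself; `exit_code_for` is a made-up helper, and it uses whichever interpreter is running it, so results could differ across versions.

```python
import os
import subprocess
import sys
import tempfile


def exit_code_for(source: str) -> int:
    """Write source to myfile.py and return the interpreter's exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "myfile.py")
        with open(path, "w") as handle:
            handle.write(source)
        # stderr is discarded so a traceback does not clutter the output.
        result = subprocess.run([sys.executable, path], stderr=subprocess.DEVNULL)
        return result.returncode


# An empty file, a 1-byte program, and a deliberate syntax error.
for source in ("", "1", "1 +"):
    print(repr(source), exit_code_for(source))
```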

Wowfunhappy

But (at least for Python) that test also works on empty (0 byte) files, which is presumably why the repository says an empty file is the smallest possible Python program, but which feels wrong to me somehow.

ks2048

Yes, that was my point. And thus “also” for 1 byte programs

chasing

Okay, but what about the largest possible files?

afeezco

[flagged]

nivertech

File size of -∞ is the smallest

jotux

Not if the file size is -∞ - 1.