How hard would it be to display the contents of an image file on the screen?
88 comments
· January 16, 2025
genewitch
after having read the entire thing, i don't find the same amusement. In fact, you concatenated disparate quotes and put them out of order.
1. Image viewer [libraries] have silly bugs
   a. no one knows what they're doing?
2. Copy and paste code to get images on the screen
   a. not very accurately
   b. very slowly
3. I want to understand everything
   a. here's sRGB, here's Y'UV
   b. here's tone mapping
   c. etc.
the prose and overarching methodology reminds me of frinklang's data file[0] (https://frinklang.org/frinkdata/units.txt) which is... hilariously and educationally opinionated about units and measurements and conversions. The writing style comports with how i approach problems and may not jibe with other methodologies, maybe? If i have some problem i generally will try copying and pasting (or copilot.exe or whatever) just to see if the problem space has any state of the art. If that does what i want, great, i'm done. If i need to do what i want 10,000 times, i usually will dig in and speed it up, or add features (event handling, etc.)
I also like scathing commentary about annoyances in technology, and this article had that, for sure. the webp aside was biting!
[0] huh, this file has been updated massively since i last viewed it (mid 2010s) - it looks like earth has made some advancements in definitions of primitives - like the Ampere - and things might be more exact now. Hooray!
yoz
Thank you for reminding me of Frink. Alan Eliasen's work is a delightful rabbit hole, and I'm so glad he's still maintaining and improving[1] Frink.
metabagel
This comment exemplifies the best of HN - respectful and informative.
lmm
The scathing, self-righteous arrogance, coupled with the evident genuine insight and talent, is why I have to either laugh or cry when he goes down the route of building yet another image viewer that will very clearly come with its own share of issues, rather than putting in the legwork to actually improve the situation.
raincole
I guess we're reading two very different posts then.
Animats
He missed JPEG 2000.
It's used by the US National Map, the National Geospatial-Intelligence Agency, CAT scanners, and Second Life. JPEG 2000 lets you zoom in and access higher levels of detail without reading the whole file. Or not zoom in and read a low-res version from a small part of the file. So it's good for map data.
Deep zooming doesn't come up much in web usage, and browsers don't support JPEG 2000.
The JPEG 2000 decoder situation isn't good. OpenJPEG is slow and has had several reported vulnerabilities. There are better expensive decoders, and GPUs can be used to help with decoding, but the most popular decoders are slow.
Scaevolus
Arguably the largest usage of JPEG 2000 is as the format movies are distributed to theaters (Digital Cinema Package), and there's been recent work to make decoding faster with HTJ2K.
Animats
That's a different codec for a different application. "Unlike J2K-1, the HT coder is not fully embedded and hence quality scalability is largely sacrificed." [1] The outer file format with the metadata has some commonality, but that's about it.
[1] https://ds.jpeg.org/whitepapers/jpeg-htj2k-whitepaper.pdf
rasz
Also RED camera after some obfuscation.
aidenn0
PDFs support j2k, so every PDF renderer includes a j2k decoder. I used this fact for a while because j2k significantly outperforms most JPEG encoders on line art (e.g. comics). I switched back to JPEG recently though, as there is now a JPEG encoder that targets high-bpp uses and is only about 15% larger, while being fire-and-forget (as opposed to OpenJPEG, where I needed to adjust the quality factor depending on the source material).
move-on-by
What are your thoughts on JPEG XL? Since you mentioned browser support, JPEG XL is supported in Safari, but of course nothing else. My understanding is that it pretty much has all the same functionality as JPEG 2000.
astrange
From his footnotes:
> I was able to find a reasonable amount of AVIF HDR images, but HEIC HDR is nowhere to be found.
Anything taken by an iPhone camera is an HDR HEIF image… sort of. For backwards compatibility reasons it's an SDR image with an additional attached image called a gain map that HDRifies it. (This is mathematically worse for a reason I don't remember; it causes any picture taken at a concert with blacklights or blue spotlights to look bad. Once you see this you'll never unsee it.)
I believe the very newest Sony cameras can also save HEIF images, however I don't feel like spending $2500 to upgrade my second-to-newest A7 to a newest A7 to find out.
Lightroom also recently added HDR editing so maybe it can export them now?
alexbock
> (This is mathematically worse for a reason I don't remember; it causes any picture taken at a concert with blacklights or blue spotlights to look bad.
The iPhone camera sensor is prone to saturating and clipping the blue channel when strong light from a blue LED is in the frame. Once the blue channel clips at the maximum value, a typical HDR gain map won't do anything to restore more nuance to it because they're not designed to add high-frequency detail to a blob of clipped pixels with identical values in the base image.
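The core of the problem can be sketched in a couple of lines. This is a hedged illustration only: `apply_gain`, the formula, and the `headroom` default are simplifications of the real per-channel, log-encoded gain-map math, but they show why the reconstruction is multiplicative and therefore can't restore detail to clipped pixels.

```python
# Sketch: gain-map HDR reconstruction scales the SDR base pixel.
# Once the base clips to a flat maximum, identical inputs produce
# identical outputs no matter what the true scene looked like.
def apply_gain(sdr_linear, gain, headroom=4.0):
    # gain in [0, 1] scales the pixel toward full HDR headroom
    return sdr_linear * headroom ** gain
```

Two clipped neighbors with the same base value and the same gain come out identical, which is exactly the flat blue blob you see in concert photos.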
karlmedley
Lightroom seems determined not to support HEIC export, even though iOS has trouble with all of Lightroom’s other HDR formats. From my testing:
- JPEG with HDR gain map is not supported
- AVIF and JXL HDR files look fine on initial export, but do not survive being sent anywhere from the Photo Library.
So far I haven’t found a way to export an HDR file from Lightroom and then share it with anyone, iOS or Android.
sgarland
Every time I read a story involving terminals, I am amazed that they function in the first place. The sheer amount of absurd workarounds for backwards compatibility’s sake is mind-boggling.
olddustytrail
Actually, what's really amazing is that the wheel has been reinvented so many times except for terminals.
Seriously, make something new that works with either ssh or WireGuard and cement your name in fame.
dietr1ch
There's a lot of things that could be reworked in much better ways if you drop backwards compatibility and think a bit about usability, but the real problem is the time it takes versus how "just fine" things work once you finally understand how to improve on old tools.
To make things worse, tools only seem to get much better once you start thinking hard about re-implementing them. It seems it takes a lot of hatred to start such a project, and starting is the easy part, as it only needs the first 80% of the effort.
hnlmorg
Several options already exist:
- HTTP management interfaces
- RPCs such as what’s used in the Microsoft Windows ecosystem
- Agent-based configuration management / IaC tools such as Puppet
Each has their own strengths and weaknesses. But for all the criticisms people make about the terminal, and many of the complaints are completely justified, it’s often those weird eccentricities that also make the terminal such a powerful interface.
chungy
The reason terminal-land is like this, is precisely because the wheel has been reinvented a few hundred times. Basically XKCD 927, perpetually.
jdiff
Not really, because backwards compatibility with the most basic terminals has been maintained throughout it all. Instead of fragmentation, what we have is a very inconsistent soup.
tetris11
Idea: since most of us are looking at terminals through a framebuffer, why not just have a terminal image command that works as follows:
1. Prints a bunch of blank lines corresponding to the height of the image
2. Renders the image/video using the fbdev in the space provided.
3. Scrolling the terminal whilst viewing is handled by moving/cropping the fbdev-rendered media as appropriate?
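Step 2 of this idea could look roughly like the following. A hedged sketch: `blit` is a hypothetical helper that assumes a 32bpp framebuffer; real code must query the geometry and line stride via the FBIOGET_VSCREENINFO/FBIOGET_FSCREENINFO ioctls rather than hardcoding them.

```python
# Sketch: write raw 32bpp pixel rows into a Linux framebuffer device
# at a given character-cell-aligned pixel position.
def blit(fb_path, x, y, rows, stride):
    # rows: list of bytes objects, 4 bytes per pixel; stride: bytes per screen line
    with open(fb_path, "r+b") as fb:
        for dy, row in enumerate(rows):
            fb.seek((y + dy) * stride + x * 4)
            fb.write(row)
```

Step 3 (scrolling) is the hard part: the terminal doesn't tell the process when the cells it reserved have moved, which is exactly why protocols like sixel and kitty exist.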
ChrisMarshallNY
If you want an idea of how tough it is, write a universal TIFF[0] reader (not writer; writers are simple).
Fun stuff. BTDT. Got the T-shirt.
[0] https://www.itu.int/itudoc/itu-t/com16/tiff-fx/docs/tiff6.pd...
astrange
Isn't TIFF one of those file formats that can contain just about anything, making this just about impossible?
Like, DNG files are TIFFs, so now you need a raw camera decoder, which is basically subjective.
tialaramex
My favourite part of TIFF is what they do about y-ordering.
See, sometimes people think images start at the top, so the first data at y=0 is at the top of the image; other people think they start at the bottom, like a graph you draw in maths, where y=0 is clearly at the bottom of the image.
So TIFF says: That should be a parameter of the image.
Why? Well, TIFF came into existence because of scanners. When scanners were first invented, each scanner would have its own data format - you're not going to store all this data because that's expensive - who owns that much tape? But when it's scanned, clearly the bits you get have some sort of arrangement: maybe a bright white area is 1 and black is 0, maybe the opposite. That's kind of annoying, let's agree on a standard.
OK, so as Scanner Maker A my proposed standard is: Exactly what my popular A9 Scanner does
No! As scanner maker B, clearly the standard should be what our BZ-20 model does
No! Everybody at scanner maker C knows the obvious thing to do is derive the standard from the behaviour of our popular C5 and C10 scanners!
Result: The TIFF standard says all of the above are OK, just add header data explaining what's going on. Since some scanners would scan a page from the top, those say y=0 is at the top, those vendors whose scanner works the other way say y=0 is at the bottom!
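For a loader, the practical upshot is a one-time normalization at decode time. A sketch (`normalize_rows` and the boolean flag are illustrative names; TIFF actually encodes this, along with flips and rotations, in its Orientation tag):

```python
# Sketch: fold the file's y-ordering into the decoder so the rest of
# the program always sees top-down rows.
def normalize_rows(rows, y0_at_bottom):
    return rows[::-1] if y0_at_bottom else rows
```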
nayuki
> My favourite part of TIFF is what they do about y-ordering.
Windows BMP (DIB) also has the convention that y=0 is at the bottom. https://en.wikipedia.org/wiki/BMP_file_format
ChrisMarshallNY
Apple's QuickDraw GX did that as well (I believe; it's been a long time). I think it was because it was based on Display PostScript.
What a huge PITA.
ChrisMarshallNY
Yup. Couldn’t do the whole spec, even after about six months of continuous work.
I was young and stupid, back then.
I learned about not biting off more than I could chew. Important lesson in humility.
ChrisMarshallNY
Just to add some closure to this, the libTIFF library[0] comes fairly close to a universal reader. It's been around since the 1980s, so it has had time to sand off the rough edges. It's still being maintained and extended.
It's fairly "low-level," so could probably benefit from Façades.
edflsafoiewq
BMP is pretty gross too, though fortunately far less useful.
secondcoming
What's gross about BMP? It's one of the easiest image formats out there.
genewitch
my limited understanding is it's basically just raw RGB values as binary, with just enough metadata to put it on the screen. BMP is, like most things Microsoft from that era, just a memory dump of the structure/array holding the image in memory.
I wonder if grandparent meant something else, but i don't know enough this instant to guess at what other format?
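That understanding is close. A minimal 24-bit BMP writer is mostly raw pixel rows behind a fixed 54-byte header, with the wrinkles that channels are BGR, rows run bottom-up, and each row pads to a 4-byte boundary. A sketch (`write_bmp` is a hypothetical name; the header layout follows BITMAPFILEHEADER/BITMAPINFOHEADER):

```python
import struct

def write_bmp(path, pixels):  # pixels: rows of (r, g, b) tuples, top-down
    h, w = len(pixels), len(pixels[0])
    pad = (-w * 3) % 4                 # rows pad to 4-byte boundaries
    image_size = (w * 3 + pad) * h
    with open(path, "wb") as f:
        # BITMAPFILEHEADER: magic, file size, two reserved words, pixel-data offset
        f.write(struct.pack("<2sIHHI", b"BM", 54 + image_size, 0, 0, 54))
        # BITMAPINFOHEADER: size, dims, planes, bpp, compression, image size,
        # x/y pixels-per-meter, palette counts
        f.write(struct.pack("<IiiHHIIiiII", 40, w, h, 1, 24, 0,
                            image_size, 2835, 2835, 0, 0))
        for row in reversed(pixels):   # BMP rows run bottom-up by default
            for r, g, b in row:
                f.write(bytes((b, g, r)))  # channels are stored as BGR
            f.write(b"\x00" * pad)
```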
mananaysiempre
Is that even possible? I once tried to make a matched pyramid (JPEG-in-)TIFF reader/writer pair and to retain some compatibility with other people who do this sort of thing (GIS, medical images, etc.). Virtually nobody agrees on how you’re supposed to do it[1]: some prescribe storing smaller versions as siblings (a NextIFD chain started by the main image), some as children of the largest one (a NextIFD chain pointed to by the SubIFD link of the main image), some as children each other (a SubIFD chain started by the main image), some as both simultaneously (a chain of identical SubIFD and NextIFD fields started by the main image). And I mean, I could decide on something for my writer. But now I’m a reader and I get a TIFF file with some IFDs somehow linked by the NextIFD and/or SubIFD fields. WTF am I supposed to do with it? Is it a pyramid? Is it a multipage document? Is it a birdplane^W a sequence of pyramidal pages? I suppose I can walk the whole thing and construct a DAG, but again, how the hell can I tell what the DAG means?
(And don’t take this as a knock against TIFF in general—as far as I know, it’s one of the few image formats that takes the possibility of large and larger-than-memory images seriously. I think HEIF also does? But ISO paywalled it after first making it publicly available, so, hard pass.)
[1] Here’s a writeup that comes to similar conclusions: https://dpb587.me/entries/tiff-ifd-and-subifd-20240226
ChrisMarshallNY
We did a similar format, to allow editable "JPEGs." The IFD of a JFIF container was normal, except that we added a second entry, with the raw source of the image.
Traditional readers saw a JPEG (although an unusually obese one), but our software could access the second entry, which contained the raw source, and the control parameters for all the processing steps that resulted in the JPEG, so we could treat the image as nondestructive, and reversible.
It was never actually released, if I remember, but it may well be patented.
TIFF was originally designed to store drum scanner data in real time, so it uses strips, as opposed to tiles.
ChrisMarshallNY
This was in the ‘90s. About the time the ink was still wet on that spec.
It was in C++, and I couldn’t do 100%, but I probably got about 80% (but not so performant). The weirdest thing, if I remember correctly, was pixel data with different sizes between stored components.
aardvark179
Ah yes, that was always entertaining. All the different ways additional metadata could be encoded was so much fun if you were dealing with geographical data.
pornel
PNG and JPEG are simple enough that a single person can write a useful decoder for them from scratch in a weekend or two.
The newer formats achieve better compression by adding more and more stuff. They're year-long projects to reimplement, which makes almost everyone stick to their reference implementations.
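For a sense of scale, the outermost layer of a PNG decoder — the chunk walk — fits in a few lines. A sketch that assumes a well-formed file and skips CRC verification (`png_chunks` is a hypothetical name); the weekend goes into what's inside IDAT: inflate, then the per-scanline filters.

```python
import struct

def png_chunks(data):
    # Every PNG is an 8-byte signature followed by length/type/data/CRC chunks.
    assert data[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG"
    pos = 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
```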
tomrod
The newer ones are destined to failure by complexity then?
astrange
Newer image formats are based on video codecs, so if you already have the video codec around then theoretically it's not too bad.
userbinator
Same for GIF. I've written decoders for all 3.
qingcharles
PNG and JPEG both have ICC color profiles, which complicates things.
Even most Windows programs (including Windows Explorer thumbnails) don't display images correctly, which is infuriating.
pornel
ICC isn't too complex itself, but the bolted-on design of color profiles makes them annoying to handle, and easy to ignore.
You can't just handle pixels, you need to handle pixels in a context of input and output profiles. That's a pain like code-page based text encodings before Unicode, and we haven't established a Unicode equivalent for pixels yet.
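The per-pixel work involved can be sketched with just the sRGB transfer curve. This is only a sketch of one leg of the pipeline: a real color-managed path also converts primaries between the input and output profiles; the constants follow the published sRGB definition (IEC 61966-2-1).

```python
# Decode and re-encode the sRGB transfer function for one channel value.
def srgb_to_linear(c):  # c in [0.0, 1.0]
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c):
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055
```

Every blend, resize, or composite done directly on encoded values instead of linear ones is a small, usually-ignored error — which is exactly the "easy to ignore" trap.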
hnlmorg
A “Unicode for pixels” is still just pixels.
The problem colour profiles solve is how the monitor should display those colours. It's so that what you see on the screen is exactly the same shade as the CMYK output that gets printed.
It’s a big problem for magazine (and equivalent) publishing. Movies too. But much less of an issue for other media industries which are targeting end user devices like smart phones and laptops.
The equivalent in typefaces would be the font rasterisation itself (like Microsoft Clear Type) rather than code pages.
Vampiero
> It may be worth noting that Chafa uses more of the block symbols, but the printouts it makes look ugly to me, like a JPEG image compressed with very low quality.
It may be worth noting that Chafa can be configured to only use ASCII 219
capitainenemo
Yeah, that's one of many things I love about chafa. The charsets are very flexible, allowing me to get something decentish working even with the limited default fonts and no fallbacks in PuTTY on Windows, or to blacklist stuff that looked bad in my preferred Linux font.
Another nice thing about it is the author made an ffmpeg patch (playable using -c:v rawvideo -pix_fmt rgba -f chafa) that was pretty darn handy for quickly triaging videos on a remote server without having to relay them to a more usable terminal.
And it has sixel support too if you happen to be in a terminal that supports that.
And since he's delegating to imagemagick, it has loaded every image format I've thrown at it, including RAW.
pbmahol
imagemagick does not support RAW video formats.
capitainenemo
I think you misunderstood. The video is using ffmpeg. I was talking about RAW from a camera which my imagemagick delegates to ufraw. But running ldd on my copy of chafa I see it now links to a ton of graphic libs directly.
shiomiru
Sixel is a fun side quest as well, if you want to encode it on your own. You get to work on a color quantization and/or dithering algorithm in $CURRENT_YEAR.
And if you're doing text with images, integrating both Sixel and Kitty is especially painful because they have completely different display models.
(In particular, Sixel is cell based, so you get Z-ordering for free - with the caveat that writing text on top of images destroys the cells.
Meanwhile, Kitty has Z-ordering, but it's per-image, so e.g. to draw a menu on top of an image that partially covers text, you must send a new image with space for the menu erased...)
For those who are interested, the best resource I've found on this is the notcurses author's wiki: https://nick-black.com/dankwiki/index.php/Theory_and_Practic...
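For anyone tempted by the side quest, the core encoding itself is small. A minimal sketch for a 1-bit image with a hardcoded two-color palette (`to_sixel` is a hypothetical name; real encoders spend their effort on the quantization and dithering, and the terminal must actually support sixel):

```python
def to_sixel(bitmap):  # bitmap: list of rows of 0/1, top-down
    out = ["\x1bPq"]                          # DCS q: enter sixel mode
    out.append("#0;2;0;0;0#1;2;100;100;100")  # palette entries, RGB in percent
    height, width = len(bitmap), len(bitmap[0])
    for band in range(0, height, 6):          # one sixel char = 6 vertical pixels
        row = "#1"                            # draw this band with color 1
        for x in range(width):
            bits = 0
            for dy in range(6):
                y = band + dy
                if y < height and bitmap[y][x]:
                    bits |= 1 << dy
            row += chr(63 + bits)             # data chars are offset from '?' (63)
        out.append(row + "-")                 # "-": advance to the next band
    out.append("\x1b\\")                      # ST: leave sixel mode
    return "".join(out)
```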
HarHarVeryFunny
ASCII renditions of photos have been around basically forever. Certainly printing these out on a line printer (an 80x24 screen was too small) was a thing in the mid-'70s, and I'd bet they go back to the first scanners.
Maybe the first quantized-picture credit should go to Roman mosaics, which, quality-wise, are about the same as a low-res JPEG.
Theodores
The author questions whether anyone is using the modern web formats. As I see it, nobody should be using these formats directly; they should just be served by content delivery networks when JPG and PNG images are requested. The idea being that graphic artists and programmers work in JPG and PNG, and the browser requests those formats but actually gets webp, AVIF, or whatever is the latest thing.
Now, if you do right click, save image, it should then make another request with a different header saying that webp, AVIF, or whatever is not accepted, so that the original JPG, in high resolution with minimal compression, is downloaded.
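The negotiation described here hinges on the Accept request header. A sketch of the server/CDN side (`pick_format` is a hypothetical helper; real CDNs also have to vary the cache key on the Accept header so clients don't get a format they can't decode):

```python
def pick_format(accept_header):
    # Serve the most compact format the client advertises; fall back to JPEG.
    for fmt in ("image/avif", "image/webp"):
        if fmt in accept_header:
            return fmt
    return "image/jpeg"
```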
account42
And TFA instead just serves AVIF with no fallback :(
OhMeadhbh
I just use ImageMagick. I put the following function in my ~/.bash_aliases:
scat () {
    if [ -n "$1" ]
    then
        COLS=$(tput cols)
        if [ "$COLS" -gt 96 ] ; then COLS=96 ; fi
        convert "$1" -resize "$(( COLS * 9 ))x^200" -background white -flatten sixel:-
    fi
}
layer8
> [about the (R)IFF format] Having a generic container for everything doesn’t make sense. It’s a cool concept on paper, but when you have to write file loaders, it turns out that you have very specific needs, and none of those needs play well with “anything can happen lol”.
Well, it's not like PNG and SVG are any different.
HarHarVeryFunny
RIFF (used for .wav and .avi files) was just a pure container format. The actual payload content was compressed/represented by an open-ended set of codecs, as indicated by the "FOURCC" (four character code) present in the file.
layer8
I know, but what is the practical difference for the file loader? In both cases you have formats with open-ended extensions (PNG chunks and XML elements), and in both cases you have to make sure that the overall format matches what you expect.
jdiff
For SVG there are standards that define how to interact with unknown elements, and a set of standard elements which are actually part of the spec. With PNG, there's a minimum set of chunks that are required, and you can safely stick with those and get something that works for the vast majority of cases.
This is totally different from a container which can contain any type of data in any type of format. If you get a valid PNG, you can load something meaningful from it. If you get a generic container, you might be able to inspect it at a surface level but there's no guarantee that the contents are even conceptually meaningful to you.
HarHarVeryFunny
With RIFF you not only need to be able to handle the container format, but also the specific type of payload content, which for video varied from simple uncompressed YUV formats like Y41P to proprietary compressed ones like WMV1 (Windows Media Video). Being able to handle the RIFF format therefore had no bearing on whether you'd be able to extract data from it.
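This split is visible in code: the container walk is trivial, and all the real work hides behind each chunk's FOURCC. A sketch (`riff_chunks` is a hypothetical name; assumes a well-formed little-endian RIFF file):

```python
import struct

def riff_chunks(data):
    # RIFF header: "RIFF", total size, then a form type like b"WAVE" or b"AVI ".
    assert data[:4] == b"RIFF", "not a RIFF file"
    pos = 12
    while pos + 8 <= len(data):
        fourcc, size = struct.unpack("<4sI", data[pos:pos + 8])
        yield fourcc, data[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)  # chunk bodies are word-aligned
```

Walking the chunks gets you the FOURCCs; decoding what's inside each one is where the proprietary codecs come in.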
lmm
I'm amused by how quickly he shifts from "all existing image viewers have silly bugs, no-one knows what they're doing, I want to understand everything" to "here's some code I copy-pasted that seems to work". What a deeply quixotic approach to, uh, everything in this blog post.