Winners of the $10k ISBN visualization bounty

69 comments

February 25, 2025

matthberg

The winning submission [0] was discussed on HN recently [1]. It's highly impressive from both a technical and a graphic design viewpoint; it somehow elegantly visualizes 2 billion books (in a way that resembles a bookcase, no less).

[0]: https://phiresky.github.io/blog/2025/visualizing-all-books-i...

[1]: https://news.ycombinator.com/item?id=42897120

c-fe

I'm slightly surprised mine won 3rd place; I believe they liked its simplicity and visualisation. Hosted at https://isbnviz.pages.dev

But honestly, I find both of these better:

- https://bwv-1011.github.io/isbn-viewer/

- https://anna.candyland.page/map-sample.html

In particular, the one from bwv is technically similar but just all-around better than mine; it is what I would want mine to be.

abetusk

I'm also surprised that I got 3rd place.

But in terms of comparing yours to bwv's, I don't agree that bwv's is technically superior in every way. It lacks comparison, ISBN selection and link creation. bwv's main focus looks to be the one feature of highlighting rare books, without trying to meet the other requirements that AA wanted.

c-fe

Congrats to you too! Indeed, I think they could have improved the visual and comparison part; it's a bit dark and not too interesting to look at. But I am envious of how smooth their tiling is. My tiles are 4096x4096, which allows me to satisfy both the 20,000 file limit and the 20 MB maximum file size imposed by Cloudflare. I had some issues with smaller tiles, and wanting to host it on Cloudflare restricted me from doing 512x512 tiles, iirc. Also, I really like that they extracted the publisher information and put that in as a PMTiles vector layer; that's something I attempted but ultimately ran out of time with.
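
To make the tile-size trade-off concrete, here is a rough sketch of the file-count arithmetic. The 65536x65536 canvas is purely an assumption for illustration (the real image sizes are not stated here), and `pyramid_tile_count` is a hypothetical helper, not code from any of the entries. It counts one file per tile at every zoom level, so it is an upper bound if empty tiles are skipped.

```python
import math


def pyramid_tile_count(width: int, height: int, tile: int) -> int:
    """Count tiles over all zoom levels of an image pyramid.

    Each level halves the image until it fits inside a single tile.
    """
    total = 0
    while True:
        cols = math.ceil(width / tile)
        rows = math.ceil(height / tile)
        total += cols * rows
        if cols == 1 and rows == 1:
            return total
        width = math.ceil(width / 2)
        height = math.ceil(height / 2)


# ASSUMPTION: a square 65536x65536 canvas, chosen only for illustration.
CANVAS = 65536
for tile_size in (512, 4096):
    print(tile_size, pyramid_tile_count(CANVAS, CANVAS, tile_size))
```

Under that assumption, 512x512 tiles come to 21,845 files for a single layer, already past a 20,000 file limit, while 4096x4096 tiles come to 341.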

matsemann

What is it that makes yours and bwv's have a floating island with Spain/Italy/++ in addition to them being represented in the main blob?

c-fe

It's due to how those ISBN ranges were handed out. I think they probably gave a block like 978-53 (for example) to those countries, meaning the right to distribute ISBNs 978-530-000-000 to 978-539-999-999. Later they ran out, or had all sub-blocks distributed to publishers, and got a new block further away (so not 978-54 in my example). Those blocks are therefore not numerically close to each other, so they also form separate "islands" along the Hilbert curve.
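
A minimal sketch of why numerically distant blocks end up as separate islands, assuming a toy 16x16 grid rather than the real ISBN space. `d2xy` is the standard iterative index-to-coordinate conversion for a Hilbert curve; `bounding_box` and the two example blocks are hypothetical and only there for illustration.

```python
def d2xy(n: int, d: int) -> tuple[int, int]:
    """Map index d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so the sub-curves join up.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def bounding_box(n: int, start: int, count: int) -> tuple[int, int, int, int]:
    """Return (min_x, min_y, max_x, max_y) for indices start..start+count-1."""
    points = [d2xy(n, d) for d in range(start, start + count)]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


# Two hypothetical "country blocks" of 16 consecutive numbers each,
# handed out far apart in the numbering.
print(bounding_box(16, 16, 16))
print(bounding_box(16, 192, 16))
```

Each aligned block of 16 consecutive numbers fills its own compact 4x4 square, but the two squares sit in different parts of the grid, which is the island effect.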

matsemann

I see, thanks for explaining. Cool that your visualization then shows these idiosyncrasies!

highcountess

I’m glad you said that, because I was also surprised that bwv-1011 only made it to honorable mention even though its technical focus was on visualizing the rarity of books, which ostensibly was the primary objective of the whole effort.

gknoy

I really like that your page talks about _why_ a Hilbert curve is good. I don't remember ever learning about those before, and now hopefully if I'm ever trying to visualize 1D data, I might remember that :)

rahimnathwani

This is amazing.

One thing I found odd.

I searched for 'Stubborn Attachments' which worked.

On the same bookshelf there are several other Stripe Press books.

One of them is called Zero to One Hundred, by Stephanie Friedman.

When you search that book on Amazon, it has a different title, which I guess is reasonable as the book hasn't been published yet and they may not have finalized the decision: https://a.co/d/bQX5CNf

Here's where it gets weird:

- if you search for the book 'Zero to one hundred' (the title shown on the 'shelf') it doesn't come up

- if you search for the book by its ISBN, it does come up, but the name displayed in the search results is yet another alternate title. And the bookshelf displays that title. So the same part of the bookshelf looks different depending on what you searched for.

I haven't yet read the blog post about how this impressive visualization works, so I don't have an idea of why this is the case.

spondyl

I don't think it's the tool that's the issue, I think it's the book itself?

If you search the ISBN on the web, you'll get "Zero to One Hundred" with the cover of "Built to Grow" and vice versa.

There's also "Experiment, Build, Scale" which is the book that the visualisation shows, also with the same ISBN attributed to the previous two.

Experiment, Build, Scale seems to be the only book of Stephanie's that is in Google Books while Worldcat has "Zero to One Hundred" with the cover art for "Built to Grow".

Most of the online bookstore pages have this mess so I wouldn't blame the tool for what seems like an upstream data quality issue.

closewith

> Most of the online bookstore pages have this mess so I wouldn't blame the tool for what seems like an upstream data quality issue.

I think that's an uncharitable read of the GP's comment. I read it as curiosity about how the upstream data issues present in the tool, which also interests the part of my brain that likes to solve minor mysteries.

rahimnathwani

Sorry, I didn't mean to make it seem like I think the tool is at fault.

I just think it's interesting that the book title shows differently on the shelf depending on whether you reach it via an ISBN search, vs. if you discover it by panning from a nearby book.

TomK32

Fascinating. It allows for some interesting observations when you, as I did, zoom in on this one (sadly no direct links to coords/zoom level): https://archive.anarchy.cool/maps/isbn.html You can find publishers like Hueber Verlag[1] in the eastern part of the German language section. They spread their ISBN numbers in a pattern with something like 1360000 between them (I know, ISBNs having a checksum leads to gaps in the numbering), which generates a repetitive pattern with plenty of empty space. It is so wasteful of this huge chunk they have.

Are there no rules on how publishers have to assign their numbers? Just so they could hand back an unused block if they don't need it any longer.

[1] I can see how publishing learning material in 30 languages can give people "ideas" when assigning ISBN numbers https://de.wikipedia.org/wiki/Hueber_Verlag
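
On the checksum point: a short sketch of the ISBN-13 check digit, which is the standard weighted sum with alternating weights 1 and 3 over the first 12 digits. The function names are illustrative, and the sample number is the commonly used textbook example 978-0-306-40615-7, not one of Hueber's.

```python
def isbn13_check_digit(body12: str) -> int:
    """Compute the check digit for the first 12 digits of an ISBN-13."""
    if len(body12) != 12 or not body12.isdigit():
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... from the leftmost digit.
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(body12))
    return (10 - total % 10) % 10


def to_isbn13(body12: str) -> str:
    """Append the check digit to a 12-digit body."""
    return body12 + str(isbn13_check_digit(body12))


def is_valid_isbn13(isbn: str) -> bool:
    """Validate a 13-digit ISBN, ignoring hyphens."""
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


# Two consecutive 12-digit bodies and the 13-digit numbers they produce.
print(to_isbn13("978030640615"))
print(to_isbn13("978030640616"))
print(is_valid_isbn13("978-0-306-40615-7"))
```

Because the last digit is derived, consecutive 12-digit bodies do not produce consecutive 13-digit numbers, which is where the small gaps come from. The much larger spacing described above would be the publisher's own choice, not an effect of the checksum.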

bawolff

I feel like visualizations of large datasets which are viewer-directed (i.e. they want you to "explore" the data instead of trying to tell you something specific about it or communicate a narrative) are often "pretty" but never particularly enlightening. I feel like that holds true for these in particular.

pphysch

That's my issue with attempts to 3D-ify viz. Unless you are actually modeling a 3D volume, like medical imaging or CAD, the added "forced exploration" of 3D simply hides insights.

WillAdams

The thing is, ISBNs map to:

- publisher

- assigned title

- (roughly) order of publication

That's all that they communicate --- there is no hierarchy here to aid in discovery or to organize the content (and further complicating things, the same text may appear multiple times in a different binding --- a differentiation which is immaterial to an e-book).

The elephant in the room of course is the matter that "Anna's Archive" is not a legitimate book repository, but a piracy site, so what they are showcasing is how compleat (and brazen) their theft (and attendant lack of compensation) is.

This would be far more interesting if it were based on an hierarchical system such as LoC, and instead afforded an interface for accessing legitimately available books as are available from https://www.gutenberg.org/ or listed at: http://onlinebooks.library.upenn.edu/ or worked on at: https://www.wikibooks.org/

bawolff

> The thing is, ISBNs map to:
>
> - publisher
> - assigned title
> - (roughly) order of publication

I assume the task isn't just to visualize ISBNs literally. Presumably you are allowed to cross-reference with other data.

> The elephant in the room of course is the matter that "Anna's Archive" is not a legitimate book repository, but a piracy site,

I think it's pretty clear that the target audience doesn't care. I don't think the target audience holding differing political views is really a valid criticism of the project. It should be evaluated in the context and audience it was created for.

WillAdams

This is not a political stance, but one of basic questions of authorship and what compensation authors should receive and what control they should have over their work.

See arguments by Alexander Pope in Pope _V._ Curll.

zozbot234

> This would be far more interesting if it were based on an hierarchical system such as LoC, and instead afforded an interface for accessing legitimately available books

Isn't this exactly what Open Library does?

WillAdams

Given that "Textbooks" are separated out, that "Animals" and "Childrens' Books" and "Health & Wellness" are top-level categories, and that it mixes in books which are not available for download: not really.

The UI is not all that great either.

I would like to see:

- an hierarchical list with a hierarchy which actually makes sense and truly organizes knowledge

- of legitimately available downloadable books

- which has a nice UI

but it's far more important that LLMs have training data without consideration of recompense than any other consideration.

dylan604

I had a Pavlovian response to reach for the defrag program at first sight of the top image.

2Gkashmiri

Win 98 had the best animation. Pity everything beyond that was dogshit.

robingchan

This was great fun to enter nevertheless, congrats all involved.

My entry is still live for now for anyone curious:

https://d199hl4t3ts6d9.cloudfront.net/

ofou

My most sincere love to all shadow libraries out there, you're doing god's work.

xtracto

They do half of the work (which is a helluva lot)... the other half is done by the volunteers that digitize books.

I was looking at my country's "shelf" and it's so sad to see so many missing titles. I almost wanted to go to my local library and digitize some of them. The old ones that are out of print and impossible to acquire right now...

So much knowledge lost.

FabHK

To be fair, the authors of the books also contribute quite a bit.

franciscop

I'm curious why there's no clear "Spanish" in these ISBN visualizations; there are 2 slots for English, and one each for France, Germany, Japan, Soviet Union, China, etc., but no big one for Spain. Do we really have so few books in Spanish? Or is this a predominantly English distribution?

I say this as someone who grew up in Spanish libraries and book shops, surrounded and immersed in Spanish books, so it feels a bit strange to see the tiny bit we occupy in the world map here.

rsecora

The dataset consists of books from Anna's Archive, each identified by an ISBN. The ISBNs and titles are extracted from datasets [1], which include magazines and books primarily in Chinese, English, and French.

Example: Germany publishes five times more books than the Netherlands [2], and Spain publishes twice as many books as the Netherlands. However, in the visualizations, Germany appears similar to the Netherlands, while Spain and Mexico do not align with the high-level labels [3].

[1] https://annas-archive.li/datasets

[2] https://internationalpublishers.org/wp-content/uploads/2023/...

[3] https://software.annas-archive.li/AnnaArchivist/annas-archiv...

glenstein

>I'm curious why there's no clear "Spanish" in these ISBN visualizations

I had the exact same question, and I do have a completely unsupported theory. There's one large block that appears to be Argentina, or possibly Peru, although their titles are on the fringes of the large block. The block is otherwise unlabeled, with no name sitting at the center of the block like you see with the other major ones. I would be slightly surprised if it were entirely Argentina, but it would make a lot of sense if that block were Spanish.

bondant

The winning submission kind of reminds me of the Eagle Mode file manager, where you can zoom into a directory to see the files in it and keep zooming to access subdirectories.

https://eaglemode.sourceforge.net/emvideo.html

soneca

Where is the database from? How and how often is it updated?

I have two self-published books with ISBNs. Neither of them has its details in the 1st place submission (I assume they won’t be in any other either?).

One was published on Feb 23 and the other on Dec 24. I had hoped at least the older one would be there. Does anyone know why they are not?

The ISBNs:

- 9786500718836

- 9786501276830

ziddoap

From https://annas-archive.org/blog/all-isbns.html :

>We started mapping ISBNs two years ago with our scrape of ISBNdb. Since then, we have scraped many more metadata sources, such as Worldcat, Google Books, Goodreads, Libby, and more. A full list can be found on the “Datasets” and “Torrents” pages on Anna’s Archive. We now have by far the largest fully open, easily downloadable collection of book metadata (and thus ISBNs) in the world.

So, your books would need to be present in one of the databases that Anna's Archive scraped, at the time they scraped it.

layer8

Does Anna’s Archive track and account for duplicate ISBNs?

https://scis.edublogs.org/2017/09/28/the-dreaded-case-of-dup...