
Show HN: I scraped 3B Goodreads reviews to train a better recommendation model

48 comments

·November 5, 2025

Hi everyone,

For the past couple months I've been working on a website with two main features:

- https://book.sv - put in a list of books and get recommendations on what to read next from a model trained on over a billion reviews

- https://book.sv/intersect - put in a list of books and find the users on Goodreads who have read them all (if you don't want to be included in these results, you can opt out here: https://book.sv/remove-my-data)

Technical info available here: https://book.sv/how-it-works

Note 1: If you only provide one or two books, the model doesn't have a lot to work with and may include a handful of somewhat unrelated popular books in the results. If you want recommendations based on just one book, click the "Similar" button next to the book after adding it to the input book list on the recommendations page.

Note 2: This is uncommon, but if you get an unexpected non-English titled book in the results, it is probably not a mistake and it very likely has an English edition. The "canonical" edition of a book I use for display is whatever one is the most popular, which is usually the English version, but this is not the case for all books, especially those by famous French or Russian authors.

vessenes

OK, I just added books until you told me I had too many. Fun idea! I have a couple of suggestions:

* UI - once someone clicks "Add" you really should remove that item from the suggested list - it's very confusing to still see it.

* Beam search / diversification -- Your system threw like 100 books at me of which I'd read 95 and heard of 2 of the other 3, so it worked for me as a predictor of what I'd read, but not so well for discovery.

I'd be interested in recommendations that pushed me into a new area, or gave me a surprising read. This is easier to do if you have a fairly complete list of what someone's read, I know. But off the top of my head, I'm imagining finding my eigenfriends, then finding books that are either controversial (very wide rating differences amongst my fellow readers) or possibly ghettoized, that is, some portion of similar readers also read this X or Y subject, but not all.
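The "controversial among my fellow readers" idea can be sketched roughly as follows. This is only an illustration: the `ratings` layout, the `min_ratings` threshold, the placeholder titles, and the use of standard deviation as the disagreement score are all assumptions, not how book.sv works.

```python
from statistics import pstdev


def controversial_books(ratings, similar_readers, already_read, min_ratings=3):
    """Rank unread books by how much similar readers disagree about them.

    ratings: dict of reader id -> {book title: star rating}.
    This layout is hypothetical, not the site's schema.
    """
    by_book = {}
    for reader in similar_readers:
        for book, stars in ratings.get(reader, {}).items():
            if book not in already_read:
                by_book.setdefault(book, []).append(stars)

    # Population standard deviation as a simple disagreement score;
    # books with too few ratings are skipped because the score is noisy.
    scored = [
        (pstdev(stars), book)
        for book, stars in by_book.items()
        if len(stars) >= min_ratings
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [book for _, book in scored]


# Toy data with placeholder titles and made-up ratings.
ratings = {
    "ann": {"Book A": 5, "Book B": 5, "Book C": 1},
    "bob": {"Book A": 5, "Book B": 4, "Book C": 5},
    "cy": {"Book A": 4, "Book B": 5, "Book C": 2},
}
print(controversial_books(ratings, ["ann", "bob", "cy"], {"Book A"}))
# -> ['Book C', 'Book B']
```

Finding the similar readers in the first place (the "eigenfriends") is left out here; any nearest-neighbour lookup over reader vectors would do for a sketch like this.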

Anyway, thanks, this is fun! Hook up a VLM and let people take pictures of their bookshelf next.

mcbrit

I don't know. I entered, trying to be popular but at least slightly opinionated:

Tigana, Hyperion, A Fire Upon the Deep, Blindsight, Moby Dick

and I got a list. Sure, I had read all of that or wasn't interested for reasons, so I added (only Neuromancer was in the initial recommendations):

Neuromancer, VALIS, Quantum Thief, Towing Jehovah.

List did not get more interesting.

Book recommendations are still kind of difficult.

mcbrit

If I provide that list, a (real) person doesn't ask me if I've read The Hobbit.

teaearlgraycold

I don’t think past liked books are nearly enough information to provide a good book for you today. You need a lot more information about the state of someone’s mind.

mcbrit

You're talking to a dude. (in my case.) I mentioned 8 books.

I won't tell you exactly what to do, but one way to do it is to measure your surprise with me choosing each of those 8 books when you provide a recommendation back to me of what I should read next. I think I get kind of that experience talking to someone about books.
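One way to read "measure your surprise" is information-theoretic surprisal: a pick that few readers make says more about taste than a pick that everyone makes. The sketch below is only an interpretation of the comment. The function name, the `reader_counts` input, and the idea of using the result as per-book weights are assumptions.

```python
import math


def surprisal_weights(input_books, reader_counts, total_readers):
    """Weight each input book by -log2(share of readers who shelved it).

    reader_counts: dict of book title -> number of readers (hypothetical input).
    Returns weights normalised to sum to 1.
    """
    weights = {}
    for book in input_books:
        # Guard against a zero count so the logarithm stays defined.
        share = max(reader_counts[book], 1) / total_readers
        weights[book] = -math.log2(share)

    total = sum(weights.values())
    if total == 0:
        # Every book was shelved by everyone: fall back to equal weights.
        return {book: 1 / len(weights) for book in weights}
    return {book: w / total for book, w in weights.items()}


# Made-up counts: one book shelved by half of all readers, one by an eighth.
counts = {"Popular pick": 500, "Niche pick": 125}
print(surprisal_weights(["Popular pick", "Niche pick"], counts, 1000))
# -> {'Popular pick': 0.25, 'Niche pick': 0.75}
```

With weights like these, the niche pick would pull the recommendations three times as hard as the popular one.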

The algorithm didn't do that.

jimmoores

I unexpectedly liked this. I thought the recommendations were actually useful.

yoz-y

It works pretty well in the sense that after inputting only a few quite diverse books it gave me recommendations for a lot of books that I’ve already also read and enjoyed.

I would also really like a possibility to add negative signal. It did also recommend books that seemed interesting to me but I ultimately didn’t like.

Overall quite impressive.

nwhnwh

I entered "Alone Together: Why We Expect More from Technology and Less from Each Other" and I received books about Steve Jobs, Harry Potter and "The Subtle Art of Not Giving a F*ck". Like how???

costco

[delayed]

BeetleB

> Provide 3+ books for best results.

aj_hackman

Thank you! Because of this, "The Making of Prince of Persia: Journals 1985–1993" by Jordan Mechner is on its way to my house.

qingcharles

You definitely will not regret that purchase. It's a very enjoyable read.

varenc

I love this site, and the approach! Great seeing someone making good use of Goodreads data.

Sadly my experience with the book recommender isn't too great because of the 64 book limit. If I import either the most recent or least recent 64 books, 95% of the books it recommends to me are books I've read. Though it was helpful for spotting a few books I've read that I didn't log on Goodreads. Guess I'm pretty consistent.

costco

I think I will expand the input books limit (sadly requires retraining) and/or the output books limit of 30.

MattGrommes

This is cool but I'd love the option to filter out the author of the book you entered. I put in Shroud by Adrian Tchaikovsky and almost all the books are others by him, which is fine but doesn't really mix up the stuff I'm reading.
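A filter like this could run as a post-processing step on the model's output, so it would not need retraining. A minimal sketch, assuming each book is a dict with an `author` field (a hypothetical shape, not the site's data model):

```python
def drop_input_authors(recommendations, input_books):
    """Remove recommendations written by any author of an input book."""
    # casefold() makes the comparison case-insensitive.
    blocked = {book["author"].casefold() for book in input_books}
    return [
        rec for rec in recommendations
        if rec["author"].casefold() not in blocked
    ]


input_books = [{"title": "Shroud", "author": "Adrian Tchaikovsky"}]
recommendations = [
    {"title": "Children of Time", "author": "Adrian Tchaikovsky"},
    {"title": "Blindsight", "author": "Peter Watts"},
]
print([rec["title"] for rec in drop_input_authors(recommendations, input_books)])
# -> ['Blindsight']
```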

walthamstow

Works pretty well with cookbooks. Very cool work.

One suggestion would be to make the search less strict on diacritics. Searching for popular cook J. Kenji López-Alt was only successful if I entered the correct "ó".
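Diacritic-insensitive matching is commonly done by decomposing the text with Unicode normalization and dropping the combining marks before comparing. A minimal sketch using only the standard library; the `fold` and `matches` helper names are made up for illustration, and a real search backend may have its own setting for this:

```python
import unicodedata


def fold(text):
    """Strip diacritics and case so 'López' and 'lopez' compare equal."""
    # NFKD splits 'ó' into 'o' plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return stripped.casefold()


def matches(query, candidate):
    """Substring match that ignores diacritics and case."""
    return fold(query) in fold(candidate)


print(fold("J. Kenji López-Alt"))
# -> j. kenji lopez-alt
print(matches("lopez", "J. Kenji López-Alt"))
# -> True
```

Folding both the indexed names and the query the same way keeps the original spelling available for display.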

nsypteras

I'm impressed it recommended so many books I've already read and liked! I have a big reading backlog but once it's whittled down I will likely come back to this. One feature request would be to also show a "why this is recommended" for each recommendation so I can further narrow down the list for what I'm looking for.

NitpickLawyer

Interesting. I tested it with sci-fi, and it definitely recommends good books, but not sure how accurate it is at surfacing the sub genres / themes. For example for [aurora -ksr, seveneves, project hail mary, ender's game] it gave me dune. Which is a great book, but not in the "first-ish contact" style I hoped it would be.

Another thing I noticed is that it tends to recommend 2nd and 3rd books in a series, which is a bit so-so. If I add the first book in a series, I probably already read the whole series...
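If series metadata is available, one simple option is to hide recommendations from any series that already appears in the input list. A rough sketch, assuming each book carries an optional `series` field (hypothetical, not the site's schema):

```python
def drop_started_series(recommendations, input_books):
    """Hide recommendations from series the reader has already started."""
    started = {
        book["series"] for book in input_books if book.get("series")
    }
    return [
        rec for rec in recommendations
        if rec.get("series") not in started
    ]


input_books = [{"title": "Ender's Game", "series": "Ender's Saga"}]
recommendations = [
    {"title": "Speaker for the Dead", "series": "Ender's Saga"},
    {"title": "Dune", "series": "Dune"},
    {"title": "A standalone novel", "series": None},
]
print([rec["title"] for rec in drop_started_series(recommendations, input_books)])
# -> ['Dune', 'A standalone novel']
```

This is deliberately blunt: it also hides sequels the reader has not reached yet, so it would probably work better as an optional toggle than as a default.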

28304283409234

Came here to say this (recommending book 2 and 3 in a trilogy). Great app otherwise!

__alexander

Care to share the scraped data? I would love to play around with it.

costco

Not sure if I can. At the very least book descriptions most likely could not be distributed. There is an academic dataset with around 200M reviews though: https://cseweb.ucsd.edu/~jmcauley/datasets/goodreads.html

guelo

I'm surprised he got that much data. Goodreads uses several tricks to try to stop scrapers, for example pagination only works up to a few pages.

demaga

I am not sure about the legal side of things here, but a Kaggle dataset would be really cool.