Show HN: I scrape Steam data every month and it's yours to download for free
38 comments
·February 24, 2025
Apreche
Do you have data that https://steamdb.info/ doesn’t have?
noirscape
Steamdb lacks an API for one, and the devs officially have a policy that they'll never make one, saying you should just scrape Steam directly instead of bugging them about it[0].
It means that steamdb, while extraordinarily useful for casual prodding at what's stored on Valve's servers, isn't very good if you want to run data analysis or something like that on the metadata of Steam games at scale.
Not sure if it's legal to charge for the raw scrape when OP doesn't seem to be affiliated with Valve, but that's not up to me to figure out.
stared
Regarding Steam data, I am curious about how games are being played (hours spent) and, even more, about their co-occurrence (i.e., player X spent both time on game A and game B). I would love to make a visualization like https://p.migdal.pl/tagoverflow/?site=gaming&size=32, but for Steam data.
Also, for deeper insight than sales volumes (e.g., game design, general trends, demographics, types of players), such things would be crucial.
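For illustration, the co-occurrence idea above (player X spent time on both game A and game B) could be counted roughly like this. This is only a sketch: the `playtimes` structure, the sample players and games, and the `min_hours` cutoff are all made up for the example and do not describe any real Steam export.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(playtimes, min_hours=1.0):
    """Count how many players put at least `min_hours` into both games of a pair.

    `playtimes` is a hypothetical mapping: player -> {game -> hours played}.
    """
    pairs = Counter()
    for games in playtimes.values():
        # Keep only games the player actually spent meaningful time on.
        played = sorted(g for g, hours in games.items() if hours >= min_hours)
        # Every unordered pair of those games co-occurs once for this player.
        for a, b in combinations(played, 2):
            pairs[(a, b)] += 1
    return pairs


# Invented sample data, purely for demonstration.
sample = {
    "player_x": {"Game A": 12.0, "Game B": 3.5},
    "player_y": {"Game A": 40.0, "Game B": 0.2, "Game C": 8.0},
    "player_z": {"Game A": 2.0, "Game B": 5.0, "Game C": 1.0},
}
counts = cooccurrence(sample)
```

The resulting pair counts are the edge weights a graph visualization like the one linked above would need; weighting by hours instead of a yes/no cutoff would be an equally reasonable choice.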
ghfhghg
I guess the main differentiator over steamdb is getting the data in CSV?
Might be good to clarify in the FAQ because the people I know who would pay for this are not the most techy types.
kmfrk
In some instances I got answers that weren't specifically about my questions. As someone who's just trying out the free demo, it's not a big deal, but maybe you can provide a way for users to flag answers to redeem their credits? It would probably increase retention and help people chase down bugs.
ddxv
Hi, I'm interested in scraping steam too. Do you have the scraper code available open source or one you recommend?
lolinder
Have you looked over the data that OP is providing here and determined that it doesn't meet your needs?
Generally it's polite to avoid scraping if you can help it, so I'd start by considering whether OP is already providing what you are looking for.
netruk44
I wrote a simple scraper for a 'steam game semantic search' app I built a while ago.
It definitely won't fetch all the data that this person does though. It only fetches the current list of games on Steam, their store page information and some reviews for the game.
The code quality probably isn't amazing, but it might give you an idea of how to get started with your own scraper.
https://github.com/Netruk44/steam-embedding-search/blob/main...
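As a rough idea of what the "some reviews for the game" part of a scraper can look like, here is a minimal sketch built around Steam's public `appreviews` store endpoint. It is not taken from the linked repository; the helper names are invented, the sample payload is fabricated for the example, and the sketch only builds the URL and summarizes a response rather than making a network call.

```python
from urllib.parse import urlencode


def reviews_url(app_id, cursor="*", per_page=100):
    """Build a URL for one page of reviews from the store's appreviews endpoint."""
    params = {
        "json": 1,
        "filter": "recent",
        "language": "all",
        "cursor": cursor,          # "*" requests the first page
        "num_per_page": per_page,
    }
    return f"https://store.steampowered.com/appreviews/{app_id}?{urlencode(params)}"


def summarize(payload):
    """Pull the headline counts and review texts out of a decoded JSON response."""
    summary = payload.get("query_summary", {})
    return {
        "total_positive": summary.get("total_positive", 0),
        "total_negative": summary.get("total_negative", 0),
        "texts": [r.get("review", "") for r in payload.get("reviews", [])],
        "next_cursor": payload.get("cursor"),
    }


# Fabricated response, shaped like the endpoint's JSON, for demonstration only.
sample_payload = {
    "success": 1,
    "query_summary": {"total_positive": 90, "total_negative": 10},
    "reviews": [{"review": "Great game", "voted_up": True}],
    "cursor": "abc123",
}
result = summarize(sample_payload)
```

A real scraper would fetch `reviews_url(...)`, decode the JSON, pass `next_cursor` back in for the following page, and pause between requests to stay polite.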
ddxv
Thanks! That's perfect, just want somewhere to get started.
DrammBA
https://steamdb.info/faq/#how-are-we-getting-this-informatio...
I found this explanation from steamdb that points to the various projects and libraries they use to gather all the data they have. It's not a how-to, but it has very useful info.
m00dy
If you need to be a paid member to download csv file, then it is not free :) lol
xerox13ster
If you need to make an account and give this guy personal information (a digital commodity like oil) to see the data it's not free lmao
stronglikedan
> If you need to make an account and give this guy personal information
In this case, you don't. That's just to weed out people who can't figure out temporary emails. I just used one to create an account without turning over any PI.
bitbasher
When you use the chat, the credits used aren't updated immediately in the lower left.
bdd8f1df777b
It seems to be missing reviews? I have always thought about building my own recommendation engine from steam data, given how steam's own recommendation never works for me.
bloomingkales
Question for OP, or anyone that considered it:
Do you think Steam reviews are coordinated?
bluefirebrand
I think for basically any possible online discussion, from Facebook to Hacker News to Steam Reviews, you should always keep in mind that some portion of it is probably astroturfed, to some scale
Anything from a small indie game to a huge AAA title, you can bet that the creators got their friends and family to post some nice reviews early, just to give it that positive bump
bloomingkales
I was specifically alarmed by what looked like review bombing of an indie game. I just can't imagine it. I need to write a small LLM plugin that collapses coordinated/astroturfed reviews.
bluefirebrand
The smaller the scale the easier to astroturf, honestly
If there are only 20 reviews it's pretty easy for one person to review bomb on their own if they want to
It gets much harder when there are 2 million reviews
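To put rough numbers on the scale argument above: the same handful of extra negative reviews moves a 20-review game a lot and a 2-million-review game almost not at all. The starting split (90% positive) and the 10 added reviews are assumptions picked for the example.

```python
def share_after_bomb(positive, total, extra_negative):
    """Positive-review share after `extra_negative` new negative reviews arrive."""
    return positive / (total + extra_negative)


# Both games start at 90% positive (assumed); each then gets 10 negative reviews.
small = share_after_bomb(18, 20, 10)                # 18 / 30
large = share_after_bomb(1_800_000, 2_000_000, 10)  # 1,800,000 / 2,000,010

print(f"20 reviews:        90% -> {small:.1%}")
print(f"2 million reviews: 90% -> {large:.4%}")
```

The small game drops from 90% to 60% positive, while the large one is still at 90% to three decimal places.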
somenameforme
Out of curiosity, what formula did you end up using for reviews:sales? I've looked into this a bunch and it's a very tough problem!
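For context, the simplest kind of reviews:sales estimate is a flat multiplier applied to the review count. The sketch below shows only that shape. The multiplier bounds are placeholder assumptions, not OP's formula and not validated figures.

```python
def estimate_sales_range(review_count, low_multiplier=20, high_multiplier=60):
    """Return a (low, high) sales estimate from a review count.

    The default multipliers are placeholders; a sensible value likely depends
    on genre, price, and release year.
    """
    return review_count * low_multiplier, review_count * high_multiplier


low, high = estimate_sales_range(1_000)
print(f"1,000 reviews -> roughly {low:,} to {high:,} sales under these assumptions")
```

Returning a range rather than a single number keeps the uncertainty visible, which is a large part of why the problem is tough.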
aranw
Nice! It would be nice however to see more detail about the data you collect and what exactly you provide on top of it using AI or through aggregation etc
giancarlostoro
> Yeah, there's AI, but I added it because I found it easier to find answers I'm looking for. For the data scientists, you can download the CSV and go crazy.
This is kind of the only way I use AI really, to summarize things, and extract details, then review from the raw sources to make sure the LLM isn't misleading me. I find myself using this approach instead of Googling for things since Google crippled their search the last few years. It feels like every year it's harder to find things with Google. I miss 2007 Google...
dewey
Give Kagi a try, it's basically Google before it went to shit.
Yeah, there's AI, but I added it because I found it easier to find answers I'm looking for. For the data scientists, you can download the CSV and go crazy. Would love to know what discoveries or learnings can be found from it.
To download the raw scraped data you need to become a paid member but you don't really need it unless you're wanting to finesse a table of data for a particular need. The cost is mostly just an incentive to help me pay the bills for running the website.
The available CSV files contain a large amount of data, covering everything from tags, genres, and pricing to wishlists, estimated revenue, etc. It's what the AI is reading from.
Hope you find it useful :-)
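As a starting point for "go crazy" with the CSV, here is a minimal sketch that totals estimated revenue per genre using only the standard library. The column names and the sample rows are invented for the example; the real files may be laid out differently.

```python
import csv
import io
from collections import defaultdict

# Invented sample standing in for a downloaded CSV; column names are assumptions.
SAMPLE_CSV = """name,genre,price_usd,wishlists,estimated_revenue_usd
Example Game One,Roguelike,14.99,12000,250000
Example Game Two,Roguelike,9.99,3000,40000
Example Game Three,Puzzle,4.99,800,15000
"""


def revenue_by_genre(csv_file):
    """Sum the estimated revenue column for each genre."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row["genre"]] += float(row["estimated_revenue_usd"])
    return dict(totals)


totals = revenue_by_genre(io.StringIO(SAMPLE_CSV))
for genre, revenue in sorted(totals.items()):
    print(f"{genre}: ${revenue:,.0f}")
```

With a real download, `io.StringIO(SAMPLE_CSV)` would be replaced by `open(path, newline="", encoding="utf-8")`.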