Guy running a Google rival from his laundry room
93 comments
· September 10, 2025
renegat0x0
raybb
What a nice project. What inspired this initially?
FYI there's a broken link in your readme:
https://rumca-js.github.io/internet full internet search
hobs
Can't you just request ICANN's zone files and have the canonical list of the day?
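A rough sketch of what pulling a domain list out of a zone file could look like. It assumes the file is in the usual DNS master-file layout (owner, optional TTL and class, record type, data) and that a registered domain shows up as an owner name one label below the TLD with an NS record. The function name and the sample data are made up for illustration; this is not taken from any particular registry's format documentation.

```python
from typing import Iterable, Set

# Made-up sample in DNS master-file style; real zone files are far larger.
SAMPLE_ZONE = """\
; sample data, not a real zone file
com. 900 in soa a.gtld-servers.net. nstld.verisign-grs.com. 1 1800 900 604800 86400
example.com. 172800 in ns ns1.example.net.
example.com. 172800 in ns ns2.example.net.
ns1.example.com. 172800 in a 192.0.2.1
another-example.com. 172800 in ns ns1.example.net.
"""


def domains_from_zone(lines: Iterable[str], tld: str) -> Set[str]:
    """Collect owner names directly under `tld` that have an NS record."""
    suffix = "." + tld.lower().strip(".")
    found: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue  # need at least owner, type, data
        owner = fields[0].lower().rstrip(".")
        # The record type sits somewhere between the owner and the data.
        middle = [f.lower() for f in fields[1:-1]]
        if "ns" not in middle:
            continue
        # Keep only names exactly one label below the TLD.
        if owner.endswith(suffix) and owner.count(".") == suffix.count("."):
            found.add(owner)
    return found


if __name__ == "__main__":
    for name in sorted(domains_from_zone(SAMPLE_ZONE.splitlines(), "com")):
        print(name)
```

Note this only yields registered names for the TLDs you have zone access to; it says nothing about which of them actually serve a website.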
luizfelberti
I was trying to do this in 2023! The hardest part about building a search engine is not the actual searching, though. It is (as others here have pointed out) building your index and crawling the (extremely adversarial) internet, especially when you're running the thing from a single server in your own home without fancy rotating IPs.
I hope this guy succeeds and becomes another reference in the community like the marginalia dude. This makes me want to give my project another go...
mhitza
You might want to bookmark https://openwebsearch.eu/open-webindex/
While the index is currently not open source, it should be at some point. Maybe when they get out of the beta stage (?); the details are not yet clear.
moduspol
Is the common crawl usable for something like this?
giancarlostoro
Most likely it is, the issue then becomes being able to store and afford the storage for all the files.
wordpad
Why can't crawling be crowdsourced? It would solve IP rotation and spread the load.
Poomba
That’s how residential proxies work, in a perverse way
ge96
The IP thing is interesting. I was once trying to make a CSGO bot to scrape Steam's prices, and there are proxy services out there that you can rent. I tried at least one and it was blocked by Steam. So I wonder if people buy real IPs.
6510
The crawl seems hard, but the difference between having something and not having it is very obvious. Ordering the results is not. What should go on page 200, and do those results still count as having them?
ofrzeta
"The beefy CPU running this setup, a 32-core AMD EPYC 7532, underlines just how fast technology moves. At the time of its release in 2020, the processor alone would have cost more than $3,000. It can now be had on eBay for less than $200"
Why do I never get deals like that when I am shopping for the homelab on eBay?
progval
You need to spend a lot of time looking through badly labeled offers, and be willing to buy from sellers with no reputation.
robrtsql
I searched "AMD EPYC 7532" and there are a ton of listings for $150-$200. Are you just regretful that it wasn't like this when you were shopping parts for your homelab?
_fat_santa
Not for a CPU but earlier this year I bought a Thinkpad workstation off eBay for $500. It's a machine from 2020 and when it was new cost $5,700.
I see this for pretty much all hardware out on eBay, just go back 5 years and watch the price fall 10x.
saalweachter
Has eBay fixed their "and then they ship you a box of rocks" problem?
I feel like there was a five year span where everyone I talked to said buying or selling electronics on eBay was a nightmare, so I'm a little curious if I need to re-evaluate my priors.
buildbot
Yes, it’s extremely rare to be stuck with a broken/wrong/missing item as a buyer on eBay. Selling is quite risky in some ways because eBay will nearly always side with a buyer. Every missing or broken thing I have purchased has been refunded or replaced. On the other hand, 3 things I have sold were claimed to not arrive. The only case where eBay decided in my favor was when the buyer had signed for the package in a literal USPS office :)
apetresc
My understanding is that eBay sides with the buyer on all disputes, to the point of ridiculousness. So you should be fine.
The real issue is being a seller and solving the "and then the customer claims I shipped them a box of rocks" problem.
ThatMedicIsASpy
EPYC 7000 + MB + 256GB-512GB RAM (from China) usually starts at 800 euros + import tax
cheema33
I tried the search site at https://searcha.page/ by searching for something random and got the following message:
"An error has occurred building the search results."
authnopuz
Hug of death? I fear the temperature will get very high in his laundry room
DannyBee
I'm sure it depends on how much laundry he is doing - his dryer is probably heated entirely by servers.
He can then exhaust the remaining server heat through the dryer vent stack.
debo_
Keep going. I love dry humor.
ArekDymalski
Until the exhaust starts "Feeling leaky" I guess.
robofanatic
Might not even need a dryer :-)
ape4
Change it to a sauna?
HelloUsername
Yup; same at https://seek.ninja/s?q=beatles
BLKNSLVR
Great innovation plus cloud-skeptic self-hosting. There should be much much more of this!
evanjrowley
Search websites by Ryan Pearce:
- SearchaPage - Web Search Engine https://searcha.page/
- Seek Ninja - Stealthy Search Engine https://seek.ninja/
thm
I'm running one for news https://mozberg.com - not in my basement though.
317070
https://searcha.page/s?q=blog https://seek.ninja/s?q=blog
Both of them are erroring out right now?
ytrt54e
Crashed? The curse of Hacker News!
tolerance
The great thing about this is that with the decentralization/recentralization of the Web, it may become easier for certain people to roll their own search engines for their respective communities and crawl/index pages only according to their shared tastes.
The bad thing about this is...read above.
mooiedingen
Nothing new, as it has been done before. The concept is simple enough. Step 1: an indexer, Solr/Lucene. Step 2: a crawler, of which there are several FOSS options, or build one yourself. Or you just run YaCy, which is a combination of the above. Combine it with an old-school searx instance and you will be granted the title of seeker by the spirit of Fravia+, who was elder of the searchlores!!! Not only will you filter out crap made by machine learning models, but thou shalt find what thou seekest!
I refuse to call a 16-line for loop triggering tokenized data loaded in memory anything intelligent, where that data can be anything from a scientific paper hallucinated by a chatbot to a message between two lovers. It is not intelligence but a blob of tokenized fcking data in memory getting triggered for an output by a derp with a 16-line for loop!!!
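For what the indexer in step 1 boils down to, here is a toy inverted index. It is only a sketch to show the idea: it is not how Solr, Lucene or YaCy work internally, the class and function names are made up, and it has no ranking, stemming or persistence. The URLs are placeholders.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Maps each term to the set of URLs whose text contains it."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, url: str, text: str) -> None:
        # This is the part a crawler would feed with fetched pages.
        for term in tokenize(text):
            self.postings[term].add(url)

    def search(self, query: str) -> Set[str]:
        """Return URLs that contain every term in the query (AND search)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


if __name__ == "__main__":
    index = TinyIndex()
    index.add("https://example.org/a", "Laundry room search engine")
    index.add("https://example.org/b", "Search engine index")
    print(sorted(index.search("search engine")))
    print(sorted(index.search("laundry search")))
```

The hard parts people mention elsewhere in the thread (crawling at scale, ordering the results) are exactly what this leaves out.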
iam_saurabh
I love stories like this—tech history is full of scrappy beginnings. Even if this project doesn’t succeed, it reminds us that giant companies aren’t unshakable.
Well, I created my own domain index. I have not crawled every page inside domains, but it is not my goal.
I have 1542766 domains. Might not be much, but it is honest work.
It is available as a github repo, so anybody that wants to start crawling has some initial data to kick off.
Links
https://github.com/rumca-js/Internet-Places-Database