Guy running a Google rival from his laundry room

renegat0x0

Well, I created my own domain index. I have not crawled every page inside each domain, but that is not my goal.

I have 1542766 domains. It might not be much, but it is honest work.

It is available as a GitHub repo, so anybody who wants to start crawling has some initial data to kick off with.

Links

https://github.com/rumca-js/Internet-Places-Database

raybb

What a nice project. What inspired this initially?

FYI there's a broken link in your readme:

    https://rumca-js.github.io/internet full internet search

hobs

Can't you just request ICANN's zone files and have the canonical list of the day?
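
For what it's worth, here is a rough sketch of what that could look like once you have a zone file in hand. It assumes a TLD zone file in the usual text format (owner, TTL, class, type, data) and treats every second-level owner name with an NS record as a registered domain. `domains_from_zone` and the sample lines are made up for illustration; this is not an ICANN tool or API, and zone file access has to be requested and approved first.

```python
def domains_from_zone(lines, tld):
    """Collect unique second-level domains that have NS records.

    Assumption: each record sits on one line as
    "owner ttl class type data", e.g.
    "example.com. 172800 in ns ns1.example.net."
    """
    suffix = "." + tld.lower().strip(".") + "."
    domains = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(";"):
            continue
        fields = line.split()
        # Only delegations (NS records) are of interest here.
        if "NS" not in [f.upper() for f in fields[1:]]:
            continue
        owner = fields[0].lower()
        # Keep names directly under the TLD, not the TLD itself
        # and not deeper names such as glue hosts.
        if owner.endswith(suffix) and owner.count(".") == suffix.count("."):
            domains.add(owner.rstrip("."))
    return sorted(domains)


# Hypothetical sample data, not taken from a real zone file.
sample = [
    "; sample data",
    "com. 172800 in ns a.gtld-servers.net.",
    "example.com. 172800 in ns ns1.example.net.",
    "example.com. 172800 in ns ns2.example.net.",
    "ns1.example.com. 172800 in a 192.0.2.1",
    "another.com. 172800 in ns ns1.another.com.",
]
print(domains_from_zone(sample, "com"))
```

A real zone file runs to many gigabytes, so in practice you would stream it line by line from disk rather than hold it in a list.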

luizfelberti

I was trying to do this in 2023! The hardest part about building a search engine is not the actual searching, though. It is (like others here have pointed out) building your index and crawling the (extremely adversarial) internet, especially when you're running the thing from a single server in your own home without fancy rotating IPs.

I hope this guy succeeds and becomes another reference in the community like the marginalia dude. This makes me want to give my project another go...

mhitza

You might want to bookmark https://openwebsearch.eu/open-webindex/

While the index is currently not open source, it should be at some point. Maybe when they get out of the beta stage? Details are still unclear.

moduspol

Is the common crawl usable for something like this?

https://commoncrawl.org

giancarlostoro

Most likely it is. The issue then becomes being able to afford the storage for all the files.

wordpad

Why can't crawling be crowdsourced? It would solve IP rotation and spread the load.

Poomba

That’s how residential proxies work, in a perverse way

ge96

The IP thing is interesting. I was once trying to make a CSGO bot to scrape Steam's prices, and there are proxy services out there that you can rent. I tried at least one and it was blocked by Steam. So I wonder if people buy real IPs.

kccqzy

Yeah, people buy residential IPs on the black market. They are essentially infected home PCs and botnets.

6510

The crawl seems hard, but the difference between having something and not having it is very obvious. Ordering the results is not. What should go on page 200, and do those results still count as having them?

ofrzeta

"The beefy CPU running this setup, a 32-core AMD EPYC 7532, underlines just how fast technology moves. At the time of its release in 2020, the processor alone would have cost more than $3,000. It can now be had on eBay for less than $200"

Why do I never get deals like that when I am shopping for the homelab on eBay?

progval

You need to spend a lot of time looking through badly labeled offers, and be willing to buy from sellers with no reputation.

robrtsql

I searched "AMD EPYC 7532" and there are a ton of listings for $150-$200. Are you just regretful that it wasn't like this when you were shopping parts for your homelab?

_fat_santa

Not for a CPU but earlier this year I bought a Thinkpad workstation off eBay for $500. It's a machine from 2020 and when it was new cost $5,700.

I see this for pretty much all hardware out on eBay, just go back 5 years and watch the price fall 10x.

saalweachter

Has eBay fixed their "and then they ship you a box of rocks" problem?

I feel like there was a five year span where everyone I talked to said buying or selling electronics on eBay was a nightmare, so I'm a little curious if I need to re-evaluate my priors.

buildbot

Yes, it’s extremely rare to be stuck with a broken/wrong/missing item as a buyer on eBay. Selling is quite risky in some ways because eBay will nearly always side with a buyer. Every missing or broken thing I have purchased has been refunded or replaced. On the other hand, 3 things I have sold were claimed to not arrive. The only case where eBay decided in my favor was when the buyer had signed for the package in a literal USPS office :)

apetresc

My understanding is that eBay sides with the buyer on all disputes, to the point of ridiculousness. So you should be fine.

The real issue is being a seller and solving the "and then the customer claims I shipped them a box of rocks" problem.

ThatMedicIsASpy

An EPYC 7000 + motherboard + 256-512 GB RAM bundle (from China) usually starts at 800 euros plus import tax.

cheema33

I tried the search site at https://searcha.page/ by searching for something random and got the following message:

"An error has occurred building the search results."

authnopuz

Hug of death? I fear the temperature will get very high in his laundry room.

DannyBee

I'm sure it depends on how much laundry he is doing - his dryer is probably heated entirely by servers.

He can then exhaust the remaining server heat through the dryer vent stack.

debo_

Keep going. I love dry humor.

ArekDymalski

Until the exhaust starts "Feeling leaky" I guess.

robofanatic

Might not even need a dryer :-)

ape4

Change it to a sauna?

BLKNSLVR

Great innovation plus cloud-skeptic self-hosting. There should be much much more of this!

evanjrowley

Search websites by Ryan Pearce:

- SearchaPage - Web Search Engine https://searcha.page/

- Seek Ninja - Stealthy Search Engine https://seek.ninja/

thm

I'm running one for news https://mozberg.com - not in my basement though.

317070

https://searcha.page/s?q=blog https://seek.ninja/s?q=blog

Both of them are erroring out right now?

kitd

Were you trying them via Chrome, by any chance? ;)

jslakro

Firefox here, and it's not working.

ytrt54e

Crashed? The curse of Hacker News!

tolerance

The great thing about this is that with the decentralization/recentralization of the Web, it may become easier for certain people to roll their own search engines for their respective communities and crawl/index pages only according to their shared tastes.

The bad thing about this is...read above.

mooiedingen

Nothing new, as it has been done before. The concept is simple enough. Step 1: an indexer (Solr/Lucene). Step 2: a crawler, of which there are several FOSS options, or build one yourself. Or you just run YaCy, which is a combo of the above. Combine it with an old-school SearX instance and you will be granted the title of seeker by the spirit of Fravia+, who was elder of the searchlores!!! Not only will you filter crap made by machine learning models, but thou shalt find what thou seek!

I refuse to call a 16-line for loop, triggering tokenized data loaded in memory (where the data can be anything from a scientific paper hallucinated by a chatbot to a message between two lovers), anything intelligent. It is not intelligence but a blob of tokenized fcking data in memory getting triggered for an output by a derp with a 16-line for loop!!!
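
To make the indexer half of that recipe concrete, here is a toy in-memory inverted index. It is only a sketch of the idea, not how Solr, Lucene, or YaCy work internally; `tokenize`, `build_index`, `search`, and the example pages are hypothetical names and data, and the crawler step is assumed to have already produced a URL-to-text mapping.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Start with the first token's postings, then intersect the rest.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


# Hypothetical output of a crawler: URL -> page text.
pages = {
    "https://example.com/a": "Self-hosted search engine in a laundry room",
    "https://example.com/b": "Laundry tips and dryer vents",
}
index = build_index(pages)
print(search(index, "laundry"))
print(search(index, "search engine"))
```

This only answers "which pages contain these words". Ranking the matches, which others in the thread point out is the less obvious part, is not attempted here.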

iam_saurabh

I love stories like this—tech history is full of scrappy beginnings. Even if this project doesn’t succeed, it reminds us that giant companies aren’t unshakable.