Search My Site – open-source search engine for personal and independent websites

OuterVale

Also worth a mention is Wiby.

"The Wiby search engine is building a web of pages as it was in the earlier days of the internet."

Its main indexing requirements are:

- "Pages must be simple in design. Simple HTML, non-commerical sites are preferred."

- "Pages should not use much scripts/css for cosmetic effect. Some might squeak through."

- "Don't use ads that are intrusive (such as ads that appear overtop of content)."

- "Don't submit a page which serves primarily as a portal to other bloated websites."

https://wiby.me

lelanthran

This is most definitely not the same thing. The indexing requirement is not "This site must be an independent or personal site"; it's "This site must lean towards being a plain HTML document".

Search My Site, from what I can tell, has the goal of surfacing personal/independent websites, while Wiby has the goal of surfacing minimally styled documents.

Two different goals.

codetrotter

> In the early days of the web, pages were made primarily by hobbyists, academics, and computer savvy people about subjects they were personally interested in. Later on, the web became saturated with commercial pages that overcrowded everything else. All the personalized websites are hidden among a pile of commercial pages.

> […]

> The Wiby search engine is building a web of pages as it was in the earlier days of the internet.

https://wiby.me/about/

Sounds to me like Wiby is more similar to Search My Site than your comment makes it sound.

danlitt

I wonder if this is why they said "worth a mention" rather than "the same thing".

lelanthran

I dunno; something is worth a mention if it's in the same category being discussed. Wiby most certainly isn't.

OuterVale

I'm certainly not presenting Wiby as being the same thing, merely as something that is worth a mention due to likely being of interest to anyone interested in Search My Site.

It is relevant and has vaguely aligned intent.

notachatbot123

And that is ok. It is still contextually relevant to some people and a nice project to boost.

freetonik

Also https://minifeed.net/ which I maintain; soon reaching 1000 personal blogs indexed.

rumgewieselt

I love the simplicity of https://pagefind.app/

kilroy123

Me too! I'm a huge fan. I use it for all my static sites.

junto

This is what I’m using with my Astro personal blog. It’s awesome.

brontosaurusrex

Interesting, is that a more complete variation of fuse.js? (I just plugged fuse.js into my static Jekyll blog.)

7952

Pagefind uses an index that is created ahead of time and stored as numerous files on a static site. It then downloads just the part of the index needed to complete the search. This means that you can search vastly more data than could be loaded into a browser at once.
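
To make the "download only the part you need" idea concrete, here is a minimal sketch of a pre-built, sharded index. It is not Pagefind's actual file format or algorithm: the shard-per-first-character layout, the `shard_<x>.json` file names, and both function names are assumptions made purely for illustration.

```python
import json
import re
from collections import defaultdict
from pathlib import Path


def build_sharded_index(pages: dict[str, str], out_dir: Path) -> None:
    """Build step: write one small JSON file per first character of each word.

    Hypothetical layout, chosen only to illustrate index sharding.
    """
    shards: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        # Index each distinct alphanumeric word once per page.
        for word in sorted(set(re.findall(r"[a-z0-9]+", text.lower()))):
            shards[word[0]][word].append(url)
    for prefix, postings in shards.items():
        (out_dir / f"shard_{prefix}.json").write_text(json.dumps(postings))


def search(term: str, out_dir: Path) -> list[str]:
    """Query step: open only the single shard that could contain the term."""
    term = term.lower()
    if not re.fullmatch(r"[a-z0-9]+", term):
        return []
    shard_file = out_dir / f"shard_{term[0]}.json"
    if not shard_file.exists():
        return []
    postings = json.loads(shard_file.read_text())
    return postings.get(term, [])
```

In a browser the `read_text` call would be a fetch of one static file, so the cost of a query depends on the size of one shard rather than on the size of the whole site.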

wonger_

I think Pagefind is focused on the whole experience of searching pages, like with default UI widgets, easy page indexing, and handling larger sites. fuse.js seems to be a fuzzy-filter function on JS data, not handling the site integration.

ozornin

This is just what I wanted, thank you for that!

1dom

I like this, thank you! I just lost an hour of time to the exact sort of random but considered personal websites that I think made the Web great in the first place.

m-i-l

Thanks for the great feedback:-) This is what searchmysite.net is attempting to do - help make "surfing the web" a fun leisure activity once more. It is good to see more people seem to get that point now. When it was on HN nearly 3 years ago[0], many people saw a search box and thought it must be a Google replacement, but were disappointed to find it wasn't. And I guess now more than ever it is useful to have a way of finding content on the web which has been made by humans rather than AI.

[0] https://news.ycombinator.com/item?id=31395231

kreelman

Thanks for putting this together. I wonder, is Postgres a bit of a large DB if it's just a personal website search tool? I'll have to give it a go. We need more tools like this.

m-i-l

Postgres is just used for the site admin, i.e. keeping track of submissions, review status, subscriptions etc. The actual search index is in Apache Solr. In theory you could use Solr to store all the admin data, but it is generally not recommended to use a Solr style document store to master data. I guess something more lightweight like SQLite could be used, but it is intended to be deployed on servers and Postgres isn't too resource intensive.

saltysalt

I'd also suggest https://greppr.org

(Disclaimer: I built it).

unfixed

These kinds of projects are really good for finding interesting blogs and obscure sites.

My go-to choice is https://marginalia-search.com/

_puk

Great to see this.

Ironically, given Google's stronghold over the past decade, I strongly feel that one of the big winners in the AI space is going to be the backend search engine.

Modern web search has become so polluted, with so many tricks to get to the front page of Google, that a lot (most?) of the good content is lost.

Now that many of the big models are capable of calling out to the web, this bloat is appearing in AI search too. A proper data-first engine, without ads, with less focus on presentation and more on structured data, is what is needed.

ThinkBeat

I am a bit confused. Solr is the search engine.

An LLM model is loaded. What does the LLM model add to the solution?

m-i-l

The LLM was for an experiment in retrieval augmented generation, i.e. "a chat with your website" style interface, using Apache Solr as the vector store. Results (on a small self-hosted LLM to keep costs manageable) weren't good enough for the functionality to be fully rolled out, so the LLM has been disabled and is likely to be fully removed.
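
For anyone unfamiliar with the term, the retrieval half of retrieval augmented generation can be sketched in a few lines. This is only an illustration and not how searchmysite.net implemented it: the bag-of-words `embed` function stands in for a real embedding model, a plain Python list stands in for the Solr vector store, and all function names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str], k: int = 2) -> str:
    """Assemble the text that would be handed to the LLM."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The LLM only ever sees the retrieved passages plus the question, so answer quality depends on both the retrieval step and the model, which fits the comment above about a small self-hosted model not being good enough.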

nelsonfigueroa

This is awesome. I love anything that helps me discover new personal sites/blogs.

eviks

No basics like typo-resistance?

digiatl

> No results found for digiatl.

amanaplanacanal

In my imagination, I think I prefer a search engine which searches for what I ask, rather than one which tries to guess what I really want.

It's been so long since I had one that really worked that way that I might turn out to hate it though.

1dom

Best of both worlds:

> No results found for "digiatl". Did you mean to search for "digital" instead?
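
A rough sketch of that behaviour, using only Python's standard library: run the literal query first, and only when it returns nothing look for a close match among the indexed terms. The toy `index` dictionary, the function name, and the `0.75` cutoff are assumptions for illustration, not anything searchmysite.net or Solr actually does.

```python
from difflib import get_close_matches


def search_with_suggestion(query: str, index: dict[str, list[str]]):
    """Return (hits, suggestion): exact hits first, a suggestion only on a miss."""
    hits = index.get(query, [])
    if hits:
        return hits, None
    # No literal match: offer the closest indexed term, if any is close enough.
    close = get_close_matches(query, list(index), n=1, cutoff=0.75)
    return [], (close[0] if close else None)


# Hypothetical index mapping terms to page URLs.
index = {"digital": ["/post-1"], "search": ["/post-2"], "website": ["/post-3"]}

hits, suggestion = search_with_suggestion("digiatl", index)
if not hits and suggestion:
    print(f'No results found for "digiatl". Did you mean "{suggestion}"?')
```

The literal query always wins when it has results, so the guess only appears as a prompt and never silently replaces what was typed.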

m-i-l

At a big corporate, we had an Apache Solr based search with some reasonably clever lemmatization, stats analysis and spell check config to suggest alternative searches if not many results were found for the original query. One day someone reported an unfortunate edge case which caused a bit of a panic: if you searched "annual report" it returned "did you mean anal report?" (we were in the finance sector rather than the medical sector, but there were a lot more documents in the corpus containing words like analysts, analysis, analytics etc). Anyway, the point is yes, it is great to have that sort of functionality, but it does come at a cost, and a small project like this might prefer to keep it simple.

eviks

Most of the search engines you encounter fail here (press Ctrl+F in your browser and make a typo); it's web search that's different. Though even here it's easy to check without relying only on imagination: how often do you add quotes for literals?