Marginalia – A search engine that prioritizes non-commercial content
61 comments
January 27, 2025
Levitating
marginalia_nu
I'm using PPR for domain rankings, but it's a very weak factor. It mostly affects the physical ordering of the results in the index. Since queries only execute until a timeout, this means higher-ranking results are discovered first. Though in general, as I mentioned, this is a weak effect.
Explore2 and the website discovery tools now built into the search engine use cosine similarity of the incident link vectors. I wrote a blog post about the technique, called "Creepy Website Similarity", available here :-) https://www.marginalia.nu/log/69-creepy-website-similarity/
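For readers unfamiliar with the idea, here is a minimal sketch of cosine similarity over incident link vectors. It assumes each domain is represented by the set of domains linking to it (a binary vector), which is an illustrative simplification and not necessarily how Marginalia implements it. The `inlinks` data, the `.example` domains, and both function names are made up for the example.

```python
from math import sqrt


def cosine_similarity(inlinks_a: set[str], inlinks_b: set[str]) -> float:
    """Cosine similarity of two binary vectors stored as sets.

    For 0/1 vectors the dot product is the size of the intersection,
    and each vector's norm is the square root of its set size.
    """
    if not inlinks_a or not inlinks_b:
        return 0.0
    shared = len(inlinks_a & inlinks_b)
    return shared / sqrt(len(inlinks_a) * len(inlinks_b))


def most_similar(target: str, inlinks: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank every other domain by similarity to `target`, best first."""
    scores = [
        (domain, cosine_similarity(inlinks[target], sources))
        for domain, sources in inlinks.items()
        if domain != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: which domains link to each site.
inlinks = {
    "a.example": {"x.example", "y.example", "z.example"},
    "b.example": {"x.example", "y.example"},
    "c.example": {"w.example"},
}

ranking = most_similar("a.example", inlinks)
```

Two sites come out as similar when the same domains link to both of them, regardless of what the pages themselves say.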
_emacsomancer_
Where is the recent essay?
Levitating
It was for a university assignment, I am not sure if I could or should share it.
hooli_gan
If you don't share the task description or other materials provided by your university, I don't see the problem with sharing your essay. (Not a lawyer)
marginalia_nu
(Creator here) I recently moved the website from search.marginalia.nu to marginalia-search.com and gave it a bit of a visual touch up, on the basis that I felt it was working too well to be a weird subdomain outgrowth off my blog.
It's still the same search engine :-)
axiomdata316
It looks like the "random" button on the old version no longer works. Did you remove that feature?
alberth
Any reason why you don’t use a shorter domain name?
(Amazing what you built, btw)
marginalia_nu
Hard to find one that makes sense and isn't inscrutable like mxyzp.tlk.
If I ever do find one that's appropriate, I might set up a CNAME record to point to marginalia-search.com.
disqard
If somebody hasn't been following this, it's one person's passion project, and it doesn't aim to compete with Google/Bing/DDG.
It's more of a way to find interesting things on the "small web".
Please keep that in mind when you check this out.
(I'm not the creator)
lolinder
I don't know if something has changed since then, but in 2023 the author did quit their job to start working on the search engine full time [0]. So yes, it's a passion project, but it's also not just a side project.
eitland
> If somebody hasn't been following this, it's one person's passion project, and it doesn't aim to compete with Google/Bing/DDG.
That is correct.
But I used[1] to find that in certain niches (Linux and open source software, history) I'd often get significantly better results with Marginalia than with Google or DDG.
This seems to be related both to
- the input stupidifier that seems to be in use with mainstream search engines (if I search for anything unusual, it assumes I misspelled or misremembered and replaces my search with something I didn't search for before sending it to the backend)
- and mainstream engines going out of their way to prefer corporate media over original, authoritative content
> It's more of a way to find interesting things on the "small web".
Yes, he keeps saying that, but it was[2] still not just interesting but useful for me.
[1][2]: past tense because I switched to Kagi almost 3 years ago and now I don't have to maintain all the hacks (like separate search engines for separate niche topics) that I used to have. Full disclaimer: I know certain other people frequently want to contest this, saying they get great results with Google or bad results with Kagi. To that, all I have to say is that I have a documented history of Google consistently failing even simple queries going back more than a decade. If Google works for you, more power to you, but it hasn't worked reliably for me for over a decade and I am fed up.
AndrewStephens
This is great - as a site owner I especially appreciated that you publish discovered backlinks. I discovered a few that I didn't know about thanks to your service.
hatefulmoron
I don't have anything unique to say, but I love when this project comes up on HN. The project exemplifies what a patient and loving hand can do.
nostradumbasp
Same. I was thinking about doing something similar at some point, but this exceeds my expectations. Seriously, a great job. Let us know if there's any way to contribute.
marginalia_nu
Well, it's open source and contributions are welcome, though it's a fairly sprawling Java project that's probably still not the easiest to get into (through no lack of effort making it more accessible): https://github.com/MarginaliaSearch/MarginaliaSearch
If you have spare dollars but not time, you can also contribute to the war chest: https://about.marginalia-search.com/article/supporting/
dredmorbius
I'm surprised to see that, as of now at least, there's apparently no Marginalia bang search at DDG:
<https://duckduckgo.com/bangs?q=marginalia>
I've submitted it as a suggestion.
rossdavidh
Decided to repeat my last few Google searches ("celery", "Wardian case", and "ferry from Italy to Greece", if you're curious), and it came up with a reasonable answer as the top result in every case! I will give it a try.
benreesman
I don’t have much to add other than to say that this is the most pleasant and useful and wholesome thing I’ve seen on the web in a long time. The technology looks extremely solid and the experience is flawless.
To the author, thank you so much for this trip to the glory days of the Internet.
fsflover
Looks similar to https://wiby.me.
miki123211
I tried a few relatively simple terms in the areas I have an interest in, e.g. ("attention is all you need paper", "sip rfc", "klatt speech synthesizer", "crafting interpreters book", "Scott Alexander Substack"), and none of them actually produced the results I expected.
Marginalia seems to work okay-ish if you want to learn about something, but not if you want to find something.
If you want to read what people think about Scott Alexander's Substack, you will get some decent results, but not if you want to find the newsletter itself.
marginalia_nu
Yeah, navigational queries are a bit hit and miss, but I feel Google does them very well as is, so it's not a huge priority area. I've mostly been focusing my efforts on the stuff Google is bad at: finding human content about some topic.
black_puppydog
This is for English content only, right? I'm trying with some specifically French queries and it only gives me (relevant, but) English results.
dang
Much discussed under its previous domain (marginalia.nu):
Phrase matching in Marginalia Search - https://news.ycombinator.com/item?id=41696046 - Sept 2024 (24 comments)
Marginalia: 3 Years - https://news.ycombinator.com/item?id=39501061 - Feb 2024 (44 comments)
Interview with Viktor Lofgren from Marginalia Search - https://news.ycombinator.com/item?id=38470832 - Nov 2023 (21 comments)
Moving Marginalia to a new server - https://news.ycombinator.com/item?id=37800753 - Oct 2023 (39 comments)
Marginalia.nu API - https://news.ycombinator.com/item?id=35871186 - May 2023 (22 comments)
Marginalia: DIY search engine that focuses on non-commercial content - https://news.ycombinator.com/item?id=35611923 - April 2023 (193 comments)
Marginalia Search has received an NLNet grant - https://news.ycombinator.com/item?id=34945541 - Feb 2023 (17 comments)
A Theoretical Justification (2021) - https://news.ycombinator.com/item?id=32586273 - Aug 2022 (22 comments)
The Evolution of Marginalia's Crawling - https://news.ycombinator.com/item?id=32565052 - Aug 2022 (22 comments)
Marginalia Goes Open Source - https://news.ycombinator.com/item?id=31536626 - May 2022 (72 comments)
Uncertain Future for Marginalia Search - https://news.ycombinator.com/item?id=31200319 - April 2022 (37 comments)
Marginalia Search: 1 Year - https://news.ycombinator.com/item?id=30823481 - March 2022 (29 comments)
Show HN: Marginalia – Exploration Mode - https://news.ycombinator.com/item?id=30047455 - Jan 2022 (53 comments)
A search engine that favors text-heavy sites and punishes modern web design - https://news.ycombinator.com/item?id=28550764 - Sept 2021 (717 comments)
Funes-
Hell yeah. The very first one was submitted by yours truly. I love this project and hope to work on similar ones in the near future.
I recently wrote an essay about this search engine and its ranking algorithms.
Initially Marginalia used an interesting variant of PageRank discussed in the original paper, called Personalized PageRank.[1] Currently pages are ranked with BM25.
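As a rough illustration of what BM25 ranking means, here is a minimal sketch of the textbook Okapi BM25 formula. It is not Marginalia's code: the whitespace tokenizer, the parameter defaults `k1=1.2` and `b=0.75`, the function name, and the sample documents are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document against the query with textbook Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    doc_freq: Counter[str] = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by document length (b).
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical documents for illustration only.
docs = [
    "small web search engine",
    "commercial shopping site",
    "search search search results page for the web",
]
scores = bm25_scores("search engine", docs)
```

In this toy corpus the first document scores highest: it matches both query terms, and repeating "search" in the third document gives diminishing returns.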
I think Personalized PageRank is still used for a new feature of Marginalia which is ranking pages based on similarity. I think this is already integrated into the website but there used to only be this testing page: https://explore2.marginalia.nu/
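For context, Personalized PageRank differs from plain PageRank in that the random surfer teleports back to a chosen seed set instead of to a uniformly random page. The following is a minimal power-iteration sketch under that textbook definition; the link graph, the seed choice, the damping factor of 0.85, and the function name are illustrative assumptions and not taken from Marginalia.

```python
def personalized_pagerank(
    links: dict[str, list[str]],
    seeds: set[str],
    damping: float = 0.85,
    iterations: int = 50,
) -> dict[str, float]:
    """Power iteration where teleportation lands only on the seed nodes."""
    if not seeds:
        raise ValueError("at least one seed node is required")

    nodes = set(links) | {t for targets in links.values() for t in targets} | set(seeds)
    # Teleport distribution: uniform over the seeds, zero elsewhere.
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)

    for _ in range(iterations):
        next_rank = {n: (1 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly across its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    next_rank[target] += share
            else:
                # Dangling node: hand its rank back to the seeds.
                for n in nodes:
                    next_rank[n] += damping * rank[node] * teleport[n]
        rank = next_rank
    return rank


# Hypothetical link graph: two clusters that do not link to each other.
links = {
    "blog.example": ["wiki.example"],
    "wiki.example": ["blog.example", "forum.example"],
    "shop.example": ["ads.example"],
    "ads.example": ["shop.example"],
}
rank = personalized_pagerank(links, seeds={"blog.example"})
```

Because all teleportation goes to `blog.example`, the unconnected `shop.example` and `ads.example` cluster ends up with zero rank, which is the kind of bias toward a hand-picked neighborhood that the seed set is meant to produce.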
In any case I have a lot of respect for the creator. Marginalia has seen a lot of growth and it's been interesting reading the blogposts.[2]
[1]: https://www.marginalia.nu/log/26-personalized-pagerank/
[2]: https://www.marginalia.nu/log/