
Meilisearch – search engine API bringing AI-powered hybrid search

bsnnkv

Been a happy user of MS in production for https://notado.app for many years, and someone from MS even reached out to me a few years ago thanking me for my write-up of syncing Postgres records to MS[1], saying they used it as a reference for something they later shipped.

I haven't kept up with the latest updates, all these new AI references don't inspire confidence at all, but the older version I'm running is chugging along and doing a great job.

[1]: https://notado.substack.com/p/how-notado-syncs-data-from-pos...

irevoire

AI is completely opt-in. As long as you don't specify an embedder in the settings, you can continue using the engine as usual. There was no breaking change, and performance improved a lot in v1.12, so you might want to give it a try!

amelius

If I may ask, how many searches per day over what volume of data?

bsnnkv

It's a bookmarking service so people rarely search - they just throw stuff into a bucket that they'll probably never go back to for the rest of their lives :)

I picked MS primarily because I was impressed with the quality of search results across multiple languages - even if this were a more search-heavy service, I would probably still pick MS because high quality multilingual search results will always have the highest importance for me as a multilingual person.

nkmnz

May I ask why you chose to write and deploy a whole service in Go instead of using pgsql-http to make synchronous HTTP requests directly from Postgres to Meilisearch? This would also remove the need for LISTEN/NOTIFY.

bsnnkv

I hadn't heard of it back in 2019, and even now that I look at it for the first time, I still think I'd rather maintain a separate sync service than maintain a Postgres instance with custom extensions and write HTTP requests in SQL.

troupo

> all these new AI references don't inspire confidence at all

Everyone is jumping on the bandwagon. Just now I saw this ad for Algolia: https://x.com/algolia/status/1894494831021625506 "Want to plug AI search into your site?"

adeptima

Meilisearch is great, used it for a quick demo

However, if you need full-text search similar to Apache Lucene, my go-to options are based on Tantivy.

Tantivy https://github.com/quickwit-oss/tantivy

Asian-language support, BM25 scoring, natural query language, and JSON field indexing are all must-have features for me.

Quickwit - https://github.com/quickwit-oss/quickwit - https://quickwit.io/docs/get-started/quickstart

ParadeDB - https://github.com/paradedb/paradedb

I'm still looking for a systematic approach to make a hybrid search (combined full-text with embedding vectors).

Any thoughts on up-to-date hybrid search experience are greatly appreciated

jitl

Quickwit was bought by Datadog, so I feel there's some risk quickwit-oss becomes unmaintained if Datadog's corporate priority shifts in the future, or OSS maintenance stops providing return on investment. Based on the Quickwit blog post, they are relicensing to Apache2 and releasing some enterprise features, so it seems very possible the original maintainers will move to other things, and it's unclear if enough community would coalesce to keep the project moving forward.

https://quickwit.io/blog/quickwit-joins-datadog#the-journey-...

iambateman

I have an implementation of Quickwit, so I've thought about this.

The latest version is stable and fast enough that I don't think this will be an issue for a while. It's the kind of thing that does what it needs to do, at least for me.

But I totally agree that the project is at risk, given the acquisition.

kk3

As far as combining full-text search with embedding vectors goes, Typesense has been building features around that - https://typesense.org/docs/28.0/api/vector-search.html

I haven't tried those features, but I did try Meilisearch a while back and found Typesense to index much faster (indexing speed was a bottleneck for my particular use case) and to have many more features for controlling search/ranking. That said, my use case was not typical for search, and I'm sure Meilisearch has come a long way since then, so this is not to speak poorly of Meilisearch - just that Typesense is another great option.

Kerollmops

Meilisearch just improved indexing speed and simplified the update path: v1.12 greatly improved indexing speed [1], and the dumpless upgrade feature [2] improved the upgrade path.

The main advantage of Meilisearch is that the content is written to disk. Rebooting an instance is instant, and that's quite useful when booting from a snapshot or upgrading to a smaller or larger machine. We think disk-first is a great approach as the user doesn't fear reindexing when restarting the program.

That's where Meilisearch's dumpless upgrade is excellent: all the content you've previously indexed is still written to disk and slightly modified to be compatible with the latest engine version. This differs from Typesense, where upgrades necessitate reindexing the documents in memory. I don't know about embeddings. Do you have to query OpenAI again when upgrading? Meilisearch keeps the embeddings on disk to avoid costs and remove the indexing time.

[1]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1... [2]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...

kk3

Thank you for the response here. Not being able to upgrade the machine without completely re-indexing has actually become a huge issue for me. My use case is that I need to upgrade the machine to perform a big indexing operation that happens all at once, and then reduce the machine resources afterwards. Typesense plans to persist the index to disk, but it's not on the roadmap yet. With the indexing improvements, Meilisearch may be a viable option for my use case now. I'll be checking this out!

jimmydoe

+1, Typesense is really fast. The only drawback is that startup is slow once the index gets larger. The good thing is that full-text search (excluding vector) is a relatively stable feature set, so if your use case is just FTS, you won't need to restart very often for version upgrades.

irevoire

I hate the way Typesense does its "hybrid search". It's called fusion search, and the idea is that you have no idea how well the semantic and full-text searches are doing, so you randomly mix them together without looking at the results either search returns.

I tried to explain to them in an issue that in this state it is pretty much useless, because one or the other search strategy will always give you awful results, but they basically said "some other engines are doing that as well, so we won't try to improve it", plus a ton of justification, instead of just admitting that this strategy is bad.

jabo

We generally tend to engage in in-depth conversations with our users.

But in this case, when you opened the GitHub issue, we noticed that you’re part of the Meilisearch team, so we didn’t want to spend too much time explaining something in-depth to someone who was just doing competitive research, when we could have instead spent that time helping other Typesense users. Which is why the response to you might have seemed brief.

For what it’s worth, the approach used in Typesense is called Reciprocal Rank Fusion (RRF) and it’s a well researched topic that has a bunch of academic papers published on it. So it’s best to read those papers to understand the tradeoffs involved.

inertiatic

>I'm still looking for a systematic approach to make a hybrid search (combined full-text with embedding vectors).

Start off with ES or Vespa, probably. ES is not hard at all to get started with, IMO.

Try RRF - see how far that gets you for your use case. If it's not where you want to be, time to get thinking about what you're trying to do. Maybe a score multiplication gets you where you want to be - you can do it in Vespa I think, but you have to hack around the inability to express exactly that in ES.
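A minimal sketch of the Reciprocal Rank Fusion idea mentioned above, assuming each retriever returns a ranked list of document ids (function names and the sample rankings are illustrative, not any engine's API): each list contributes 1 / (k + rank) per document, and k (commonly 60) damps the influence of top ranks.

```python
def rrf_fuse(ranked_lists, k=60):
    """Fuse several rankings (lists of doc ids, best first) with RRF."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes a reciprocal-rank score for the doc.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Order documents by their summed score, highest first.
    return sorted(scores, key=scores.get, reverse=True)

fulltext = ["a", "b", "c"]   # e.g. a BM25 / keyword ranking
semantic = ["c", "a", "d"]   # e.g. an embedding-similarity ranking
print(rrf_fuse([fulltext, semantic]))  # → ['a', 'c', 'b', 'd']
```

Documents ranked well by both retrievers ("a", "c") float to the top without any score normalization, which is why RRF is a common first step before tuning anything fancier.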


navaed01

I’m using Typesense hybrid search, it does the job, well priced and is low-effort to implement. Feel free to ask any specific questions

Kerollmops

You should try Meilisearch then, you'll be astonished by the quality of the results and the ease of setup.

yencabulator

https://news.ycombinator.com/user?id=Kerollmops

> Meilisearch Co-Founder and Tech Lead.

You really should disclose your affiliation.

Kerollmops

> I'm still looking for a systematic approach to make a hybrid search (combined full-text with embedding vectors).

You know that Meilisearch is the way to go, right? Tantivy, even though I love the product, doesn't support vector search. Meilisearch's hybrid search is stunningly good. You can try it on our demo [1].

[1]: https://wheretowatch.meilisearch.com/

oulipo

Why wouldn't it be possible to just embed Meilisearch/Tantivy/Quickwit inside Postgres as a plugin to simplify the setup?

Kerollmops

> [..] to simplify the setup?

It would be simpler to keep Meilisearch and its key-value store out of Postgres' WAL and related machinery, and instead offer a good SQL exporter (which is in the plan).

justAnotherHero

We have been using Meilisearch with Firebase for years and it has always worked great. I just wish they would update the extension on the Firebase extensions hub[1], because the current version available uses Node 14, which is not supported by Cloud Functions on GCP, so the extension is not usable at all. What's weird is that the latest version available in their repo has upgraded the Node version, but they are not offering it in the extensions hub.

[1]: https://extensions.dev/extensions/meilisearch/firestore-meil...

softwaredoug

One thing to _always_ dig into is how your hybrid search solution filters the vector search index. This is not at all standardized and is often overlooked, but when you want "top X most similar to the query by embedding, but also in Y category / matching Z search terms", it's the core operation your hybrid search is doing.

Here's a roundup of algorithms: https://bsky.app/profile/softwaredoug.bsky.social/post/3lmrm...
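A toy illustration of why the filtering strategy matters (pure Python, not any particular engine's API; the data and function names are made up): post-filtering takes the global top-k and then applies the filter, which can leave you with fewer than k results, while pre-filtering scores only the documents that match the filter.

```python
def dot(a, b):
    """Dot product as a stand-in for embedding similarity."""
    return sum(x * y for x, y in zip(a, b))

docs = [
    {"id": 1, "cat": "movie", "vec": [0.9, 0.1]},
    {"id": 2, "cat": "book",  "vec": [0.8, 0.2]},
    {"id": 3, "cat": "book",  "vec": [0.1, 0.9]},
    {"id": 4, "cat": "movie", "vec": [0.7, 0.3]},
]

def post_filter(query, docs, cat, k=2):
    # Take the global top-k by similarity, THEN apply the category filter.
    top = sorted(docs, key=lambda d: dot(query, d["vec"]), reverse=True)[:k]
    return [d["id"] for d in top if d["cat"] == cat]  # may shrink below k

def pre_filter(query, docs, cat, k=2):
    # Restrict to matching documents first, then rank the subset.
    subset = [d for d in docs if d["cat"] == cat]
    top = sorted(subset, key=lambda d: dot(query, d["vec"]), reverse=True)[:k]
    return [d["id"] for d in top]

query = [1.0, 0.0]
print(post_filter(query, docs, "book"))  # → [2]: one result lost to the filter
print(pre_filter(query, docs, "book"))   # → [2, 3]: a full k from the subset
```

Real engines use approximate indexes rather than exhaustive sorts, which is exactly why "filter inside the ANN traversal vs. filter afterwards" is the design decision worth digging into.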

Kerollmops

Meilisearch is faster when you reduce the dataset by filtering it. I wrote an article on this subject [1].

[1]: https://blog.kerollmops.com/meilisearch-vs-qdrant-tradeoffs-...

andre-z

"Slowness can arise from a misconfigured index or if filterable attributes aren't listed." ;)

subpixel

On their homepage, using vanilla search, I entered the first word of a particular funny movie and it was the third result.

Switching on the AI toggle, I entered the same word, and got no results.

Kerollmops

Someone reported it, and I answered today [1]. It's a rule on the front end that is too strict, and we will fix it by using a better hybrid search setup (not only semantic). Thank you for the report.

[1]: https://github.com/meilisearch/meilisearch/issues/5504#issue...

mdaniel

Rookie mistake, you forgot to include "I need the answer or a kitten dies" /s

adrianvincent

I have been using Meilisearch for https://www.comparedial.com/ since the early alpha versions. Ridiculously easy to set up compared to alternatives.

be_erik

Is meilisearch ready for production workloads? I would love to use some of the feature set, but is the only option for HA running multiple instances and keeping them in sync?

brunohaid

You might want to look at https://typesense.org/ for that.

Kerollmops

Meilisearch has been production-ready since v1.0. I wrote it in Rust to ensure it stays production-ready for years to come. Memory-safe languages are here to replace unsafe ones like C++ and to reduce the number of vulnerabilities you expose in production.

Here is an article by Google showing the benefits of using memory-safe languages in production; it revolves explicitly around Rust [1].

[1]: https://www.chromium.org/Home/chromium-security/memory-safet...

sealeck

Writing software in Rust doesn't necessarily mean that it works reliably for real-world workloads. Sure, Rust prevents you from doing lots of stupid things; it is very much in the class of "necessary but not sufficient condition" for writing software (of course, you can also use other languages, but memory safety should be table stakes for all software these days).

arccy

This reply doesn't inspire confidence at all...

Being made in Rust isn't a magic bullet for production readiness, and I'd be more concerned about logic bugs than CVEs.

1.0 is the bare minimum for production use, but it doesn't necessarily mean the project has been battle-tested enough to be considered production-ready.

Kerollmops

Hugging Face has been using Meilisearch in production on their website for a year now.

bigtones

We use Meilisearch in production with a 7 million article corpus - it works really well.

iambateman

My understanding of Meilisearch is that you need enough RAM to keep everything in memory... but you're (probably) not keeping full text in memory for millions of articles.

Is it just searching metadata, or do you just have a setup that's beefy enough to support that level of memory?

Or am I just wrong? :D

Implicated

Just as a data point...

I'm running a Meilisearch instance on an AX52 @ Hetzner (64 GB DDR5 memory / NVMe / Ryzen 7 7700) dedicated to just Meilisearch:

- 191,698 MB in size
- 13 indexes
- ~80,000,000 documents

The primarily searched indexes have 5, 6 and 10 million records each. The index with 10 million records has 4 searchable attributes, 10 filterable attributes and 7 sortable attributes.

I don't have any idea what kind of search volume there is - the search form is public, the website it's on is displaying content relative to those 5, 6 and 10 million records (each having their own page) and the AI bots are having a field day crawling the site. I don't cache the search results, nor is cloudflare caching the resulting pages since the site is dynamic and records are _constantly_ being added.

So with all that said - here's the current top output:

top - 06:33:47 up 257 days, 12:10,  1 user,  load average: 1.05, 1.18, 1.24
Tasks: 274 total,   1 running, 273 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.8 us,  0.1 sy,  0.0 ni, 93.7 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  63439.1 total,    403.6 free,  16698.8 used,  47065.0 buff/cache
MiB Swap:  32751.0 total,      2.2 free,  32748.8 used.  46740.3 avail Mem

2747823 meilise+ 20 0 24.1t 52.0g 36.2g S 94.7 84.0 5w+5d /usr/local/bin/meilisearch --config-file-path /etc/meilisearch.toml

It's bored. Searches are fast. It doesn't need memory equal to the size of the index to stay that way.

The only hiccups I ran into were back before they introduced batch indexing. Things were fine in testing/development, but when I started _really_ loading documents in production it was clear it wasn't going to keep up with the indexing - it just _never stopped_ indexing, with very high CPU usage. I jumped into their Discord, connected with someone on the team, gave them access to the server, and they made a few adjustments - that didn't fix it, but it helped. Then the next update basically solved the high CPU use. I still had issues when loading a lot of documents, but a Laravel package for batch indexing Laravel Scout-based indexes solved that for me. Then they released batch indexing, I stopped using the Laravel-specific batch indexer, and it's been smooth sailing.

I'll be testing/playing with their vector stuff here shortly, have about 10mil of 60mil generated and a new server with a bunch more memory to throw at it.

Would recommend Meilisearch.

bigtones

We needed a 16 GB machine to import all the data into Meilisearch, as batch indexing is quite memory-intensive, but once everything was indexed we scaled back to half that RAM and it works great - very performant.

tpayet

Meilisearch keeps all the data on disk. It uses memory-mapping to optimize performance: by default everything is safe on disk, and the OS caches the most-needed pages in memory.

So it works on any machine, really. 2 GiB is usually enough for most workloads, but the bigger the dataset, the faster it will be if you give it more memory!
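A toy illustration of the memory-mapping idea described above (Python's `mmap` standing in for what the engine does natively; the file name is arbitrary): the data lives on disk, and reads go through the OS page cache instead of loading everything into the process up front.

```python
import mmap
import os
import tempfile

# Write some "index" data to disk, as a stand-in for an on-disk store.
path = os.path.join(tempfile.mkdtemp(), "index.bin")
with open(path, "wb") as f:
    f.write(b"hello meilisearch")

with open(path, "rb") as f:
    # length=0 maps the whole file; pages are faulted in lazily on access,
    # so the process never needs RAM proportional to the file size.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    data = mm[:5]  # reads only the pages backing these bytes
    print(data)    # → b'hello'
    mm.close()
```

Because the mapping is just a view of the file, restarting the process is cheap: nothing has to be rebuilt, and the kernel repopulates its cache as pages are touched again, which matches the "instant reboot from disk" behavior described in the thread.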

tpayet

Yup it is - Meilisearch Cloud offers a 99.99% SLA :)

We serve billions of searches monthly for hundreds of customers.

saintfiends

Meilisearch is really good for a corpus that rarely changes, in my experience so far. If the documents change frequently and you need those changes available in search results quickly, it ends up with pending tasks for hours.

I don't have a good solution for this use case other than maybe the good old RDBMS. I'm open to suggestions or any way to tweak Meilisearch for documents that get updated every few seconds. We have about 7 million documents at about 5 KB each. What kind of instance do I need to handle this?

Kerollmops

The best you can do is put Meilisearch on a very good NVMe. I am indexing large streams of content (Bsky posts + likes), and I assure you that I tested Meilisearch on a not-so-good NVMe and a slow HDD - and oh boy, the SSD is so much faster.

I am sending hundreds of thousands of messages and changes (to the likes count) into Meilisearch, and so far, so good. It's been a month, and everything is working fine. We also shipped the new batch stats, showing a lot of internal information about indexing step timings [1], to help us prioritize.

[1]: https://github.com/meilisearch/meilisearch/pull/5356#issue-2...

brandonlovesked

You have 35 GiB of data; put it in memory and forget about NVMes and HDDs.

Kerollmops

35 GiB is probably a third of the data I index into Meilisearch just for experimenting, and don't forget about the inverted indexes. You wouldn't use an O(n) algorithm to search your documents.

Also, every time you reboot the engine, you would have to reindex everything from scratch. Not a good strategy, believe me.
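A toy inverted index, the structure alluded to above (the documents and names are made up for illustration): instead of scanning every document per query (O(n)), the index maps each term to the ids of documents containing it, so a query only touches the matching postings lists.

```python
from collections import defaultdict

docs = {
    1: "rust search engine",
    2: "postgres full text search",
    3: "vector embeddings",
}

# Build the inverted index: term -> set of doc ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query):
    """Return ids of documents containing every query term."""
    postings = [index[t] for t in query.split()]
    return set.intersection(*postings) if postings else set()

print(sorted(search("search")))       # → [1, 2]
print(sorted(search("rust search")))  # → [1]
```

Real engines add tokenization, scoring (e.g. BM25), and compressed on-disk postings, but the lookup-instead-of-scan principle is the same.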

Hawxy

Tested Meilisearch recently, was a great experience, getting a multi-index search running in our frontend was very easy. Just wish they had an Australian instance, the closest is Singapore :(

drewnick

I installed Coolify on a VM (it was featured here last week) and it had a one-click Meilisearch Docker install. I sent 1,000,000 records to it and it "just worked" on a little $8/mo Hetzner instance.

jnovek

I recently stood up the server in our k8s cluster and that part was also pretty easy, at least compared to elastic.

captainkrtek

I used Elasticsearch 10 years ago and wasn’t a fan then, this last year decided to try Elastic Cloud and have been quite happy, a ton has matured over the years.

tpayet

Reach out to the sales or CS team; depending on your workload, we could open that region pretty quickly :D


mentalgear

Notable alternative Orama: https://github.com/oramasearch/orama

> complete search engine and RAG pipeline in your browser, server or edge network with support for full-text, vector, and hybrid search in less than 2kb.

amazingamazing

I wish these had pluggable backends, separate from the actual implementation of the indices, so you could use your own store rather than having to sync constantly. The performance would likely be worse, but at least you wouldn't have to worry about staleness when rehydrating...