Cohere Launches Embed 4
43 comments
· April 15, 2025
simonw
I have huge respect for Cohere and this embedding model looks like it could be best-in-class, but I find it hard to commit to a proprietary embedding model that's only available via an API when there are such good open weight models available.
I really like the approach Nomic take: their most recent models are available via their API or as open weights for non-commercial use only (unless you buy a license). They later relicense their older models under Apache 2.0 licenses.
This gives me confidence that I can continue to use my calculated vectors in the future even if Nomic's model is no longer available, because I can run the local one instead.
Nomic Embed Vision 1.5 for example started out as CC-BY-NC-4.0 but was later relicensed to Apache 2.0: https://www.nomic.ai/blog/posts/nomic-embed-vision
mahjongmen
Hey Simon,
Elliott here from Cohere.
We benchmarked against Nomic's models on our consortium of datasets spanning text-only, image-only, and mixed modalities. Without publishing additional benchmarks, I am confident in saying that our model is more performant.
At Cohere, we have not deprecated any of our embedding models since we started (I know because I've been there that long), and if we ever did, I would make sure our users still had a way of accessing those models.
One aspect that isn't factored in here is efficiency. Yes, there are strong open weight models, but if you're punching in the 7B+ weight class, your serving requirements are vastly different from a throughput perspective (and so is your query-inference speed).
All food for thought. That being said, if Nomic Embed Vision 1.5 is better than Embed-v4.0 for your use-case, I'm happy to hop on a call to discuss where the differential may be.
simonw
I don't doubt the new Cohere model is better - but one of the features I value most from an embedding model is having an escape hatch, so I can continue to calculate vectors using that same model far into the future if something happens to the hosting provider.
This matters for embedding models because I'm presumably building up a database of many millions of vectors for later similarity comparisons - so I need to know I'll be able to embed an arbitrary string in the future in order for that investment to still make sense.
Size doesn't matter much to me; I don't even need to be able to run the model. It's more about having an insurance policy for my own peace of mind.
(Even a covenant that says "in the event that Cohere goes out of business this model will be made available under license X" would address this itch for me.)
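To make the lock-in concrete, here's a minimal sketch; the embed() stub is hypothetical, standing in for any provider's hosted API:

```python
import hashlib
import numpy as np

def embed(text: str, model: str) -> np.ndarray:
    """Hypothetical stand-in for a hosted embedding API. Deterministic per
    (model, text), so vectors from different models are not comparable."""
    seed = int.from_bytes(hashlib.sha256(f"{model}:{text}".encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(128)
    return v / np.linalg.norm(v)

# Index time: a corpus embedded once, with one specific model.
docs = ["cohere launches embed 4", "nomic embed vision 1.5"]
index = np.stack([embed(d, "model-A") for d in docs])

# Query time, years later: this only works if "model-A" is still available.
# Swap in "model-B" here and every stored vector becomes useless.
query = embed("multimodal embeddings", "model-A")
print(docs[int(np.argmax(index @ query))])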
mahjongmen
I'll start off by saying I'm not one of our founders and REALLY wouldn't want to be publicly held accountable for policies or commitments until I've been able to get internal alignment on the things I say.
That being said, since I do manage our Search and Retrieval offering, if we were to deprecate any of our embedding models (which is generally the risk with closed-source models), I would make sure that there is an "escape hatch" for users.
Heard on what your concerns are though :)
andriym
Hey Elliott,
Andriy, co-founder at Nomic here! Congrats on Embed v4 - the more embeddings the merrier!
Nomic Embed Vision 1.5 is a 1.5-year-old model, though!
You should check out our latest comparable open-weights multimodal embedding model, which is designed for text, PDFs, and images! I can't say anything directly about performance relative to Embed v4, as you guys didn't publish evals on the Vidore-V2 open benchmark!
mahjongmen
Hey Andriy!
We actually did run benchmarks internally against your models, since they are open-weights - however, looking at the license on the 3bn multimodal model (https://huggingface.co/nomic-ai/nomic-embed-multimodal-3b/bl...), we're not permitted to include the results in the marketing of products/services. Rest assured, we know how our model stacks up against yours :)
In any case, Vidore-V2 isn't the only benchmark we didn't publish evals on, but we did benchmark on it internally.
throwup238
In my experience, a non-finetunable closed source API is a complete nonstarter for a large fraction of possible use cases, especially the higher value ones like law and engineering. Most of these embedding models are trained too much on colloquial use of language on the internet that has little overlap with how terms of art are used, and without the ability to fine tune it to a specific use case, they're only really useful for generic use cases and even then they can be limited.
mahjongmen
Hey!
Since we focus on enterprise use-cases, we made sure to include training data from domains like the ones you mentioned above. In very specific use-cases fine-tuning may be helpful, and we do offer that as a customization service (it's just not available via SaaS).
serjester
Have you looked at fine tuning linear adaptors to sit on top of the embedding models? This works with any model (proprietary or open) and I think in practice this is significantly easier to implement anyways.
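For anyone curious, here's a rough sketch of what that can look like in PyTorch. This is one common recipe (a contrastive loss with in-batch negatives), not necessarily the only way to do it; the query/document embeddings are assumed to be precomputed with whatever base model you're using:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 1024  # base model's embedding dimension (assumption)
adapter = nn.Linear(dim, dim, bias=False)  # the only trainable piece; base model stays frozen
opt = torch.optim.AdamW(adapter.parameters(), lr=1e-4)

def train_step(q: torch.Tensor, d: torch.Tensor) -> float:
    """q, d: (batch, dim) precomputed embeddings of matching query/doc pairs.
    Row i of q should match row i of d; other rows act as in-batch negatives."""
    q2 = F.normalize(adapter(q), dim=-1)
    d2 = F.normalize(adapter(d), dim=-1)
    logits = q2 @ d2.T / 0.05      # temperature-scaled cosine similarities
    labels = torch.arange(len(q))  # the diagonal entries are the positives
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

At query time you just push stored vectors through the adapter, which is why this works even when the base model is API-only.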
tiffanyh
Can someone help me understand what Cohere does?
Do they just host open source models - so you can get them up and going faster?
If so, what’s their moat?
What prevents AWS from doing the same thing?
xfalcox
No downloadable open weights?
Looks like I'll stay on [bge-m3](https://huggingface.co/BAAI/bge-m3)
podgietaru
I built a little RSS Reader / Aggregator that uses Cohere in order to do some arbitrary classification into different topics. I found it incredibly cheap to work with, and pretty good overall at classifying even with very limited inputs.
I also built this into a version of an open-source read-it-later app.
You can check it out here: https://github.com/aws-samples/rss-aggregator-using-cohere-e...
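The basic pattern is simple: embed each topic once, embed each article, and pick the nearest topic by cosine similarity. Roughly like this (a sketch of the general technique, not necessarily exactly what the repo does; the vectors would come from whatever embedding API you call):

```python
import numpy as np

def classify(article_vec: np.ndarray, topic_vecs: np.ndarray, topics: list[str]) -> str:
    """Nearest-topic classification over embedding vectors.
    topic_vecs: (n_topics, dim) array of topic-label embeddings."""
    a = article_vec / np.linalg.norm(article_vec)
    t = topic_vecs / np.linalg.norm(topic_vecs, axis=1, keepdims=True)
    return topics[int(np.argmax(t @ a))]  # highest cosine similarity wins
```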
mahjongmen
Cool project - I like the README, but it looks like your link is down: https://djwtmt1np1xe4.cloudfront.net/
lukebuehler
I just started to look into multi-modal embedding models recently, and I was surprised how few options there are.
For example, Google's model only supports 30 text tokens [1]!!
This is definitely a welcome addition.
Any pointers to similarly powerful embedding models? I'm looking specifically for text and images. I wish there were also one that could do audio and video, but I don't think that exists.
[1] https://cloud.google.com/vertex-ai/generative-ai/docs/embedd...
mahjongmen
Hey Luke,
Our model does exceptionally well on text and images, and in particular when text and images are mixed together. An example of where this works well is e-commerce, where you may have a product title, description, and multiple images of the product. When combining those into a single payload using our inputs parameter, we find that our model responds really well to adding more images (i.e. retrieval quality goes up as you add 1, 2, 3, ... N images).
As you pointed out with Google's multimodal model, most jointly trained multimodal embedding models will suffer in the text modality. Amazon used to have a multimodal embedding model, which also took only a very small text payload.
We're thinking about audio/video as well, but nothing for Q2 at least...
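Roughly, a mixed payload looks something like this - I'm sketching from memory, so treat the exact field names as assumptions and check our docs for the real request shape:

```python
import cohere

co = cohere.ClientV2(api_key="...")  # key elided

# One product = one embedding: title, description, and image in a single
# payload. Field names here are best-effort, not gospel.
res = co.embed(
    model="embed-v4.0",
    input_type="search_document",
    embedding_types=["float"],
    inputs=[{
        "content": [
            {"type": "text", "text": "Trail running shoe - lightweight, size 42"},
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
        ],
    }],
)
vector = res.embeddings.float_[0]  # accessor name may differ by SDK version
```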
pencildiver
I'm a huge fan of Cohere. We were highlighted in the launch post and use their V3 text embeddings in production: https://www.searchagora.com/
We're switching to V4 to store unified embeddings of our products. From the early tests we ran, this should help with edge-case relevancy (i.e. when a product's image and text mismatch, creating a greater need for multi-modal embeddings) and cut our search latency by ~100ms.
mahjongmen
Thank you sir! I appreciate you.
neom
Curious, for those in the industry: is there room for Cohere? Apparently they are doing very well in the enterprise, but recently I've found myself wondering what their long-term value prop is.
jeffchuber
Enterprise GTM has its own set of challenges and needs, and warrants someone really focused on it.
BrandiATMuhkuh
This is really great. I'll use it ASAP. I'm working with enterprise clients in the AEC space. Having a model that actually understands documents with messy data (drawings, floor plans, books, norms, ...) will be great.
The current approach of chunking and transforming is such a mess.
mahjongmen
That sounds really cool! Would love to better understand your use-case and make sure it works well for you!
Drop me an email at elliott@cohere.ai
moojacob
Seems to underperform voyage-3-large on the same benchmark. At the same time, I'm unsure how useful benchmarks are for embeddings.
SparkyMcUnicorn
I had the same thought, although voyage is 32k vs 128k for cohere 4.
Anecdotal evidence points to benchmarks correlating with result quality for data I've dealt with. I haven't spent a lot of time comparing results between models, because we were happy with the results after trying a few and tuning some settings.
Unless my dataset lines up really well with a benchmark's dataset, creating my own benchmark is probably the only way to know which model is "best".
CharlieDigital
Are people using 32k embeddings and no longer chunking?
It feels like embedding content that large -- especially in dense texts -- will lead to loss of fidelity/signal in the output vector.
SparkyMcUnicorn
My understanding is that long context models can create embeddings that are much better at capturing the overall meaning, and are less effective (without chunking) for documents that consist of short standalone sentences.
For example, "The configuration mentioned above is critical" now "knows" what configuration is being referenced, along with which project and anything else talked about in the document.
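As a sketch of the two indexing strategies (with a hypothetical embed() standing in for whatever model you use):

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for any embedding model."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(128)
    return v / np.linalg.norm(v)

# Short-context model: split with overlap, keep a chunk -> doc mapping.
# Each vector only "sees" its local window of text.
def index_chunked(doc: str, size: int = 2000, overlap: int = 200):
    step = size - overlap
    chunks = [doc[i:i + size] for i in range(0, len(doc), step)]
    return [(embed(c), doc) for c in chunks]

# Long-context model: one vector per document, so cross-references like
# "the configuration mentioned above" resolve inside the embedding.
def index_whole(doc: str):
    return [(embed(doc), doc)]
```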
mahjongmen
Which benchmark are you referring to?
Voyage-3-large is text-only and a much larger model than Embed-v4. If you want to unlock multimodality with Voyage-3-large, you'd have to either OCR your data (usually really bad results) or use a VLM to parse it into textual descriptions (this works alright, but the cost of using a VLM will jack up your data pre-processing costs).
serjester
I think anyone who cares enough about embedding performance to use niche models is probably parsing their PDFs into some sort of textual format. Otherwise you need to orient all your pipelines to handle images, which adds significant complexity (hybrid search, reranking, LLM calls, etc. - all way harder with images).
Not to mention an image is optimistically 50 KB, while the same page represented as markdown is maybe 2–5 KB. When you're talking about pulling in potentially hundreds of pages, that's a 10–20x increase in storage, memory usage, and network overhead.
I do wish they had a more head-to-head comparison with voyage. I think they're the de facto king of proprietary embeddings and with Mongo having bought them, I'd love to migrate away once someone can match their performance.
mahjongmen
Hey Serjester,
Email me at elliott@cohere.ai and let's arrange time to chat. We did head-to-head evals with Voyage Large / Voyage Multimodal, and I can share them with you if you are serious about moving your embeddings over. We tested configurations of top open-source, closed-source, multi-vector, and single-dense embedding models, but I can only choose so many to put on a graph, and I'm not in the business of giving Voyage free advertising haha.
I agree with you that there is some complexity in multi-modal reranking w.r.t. inference-time speeds as well as data transfer / network latency costs. Happy to talk more :)
moojacob
I messed up, I apologize.
I looked at the NDCG and thought that meant it was the same dataset, since Voyage and Cohere both used NDCG. I now realize they were separate benchmarks with the same evaluation metric.
esafak
Why? How do you pick an embedding model without benchmarks?
moojacob
The comment by SparkyMcUnicorn worded it better than I did.
You’re right, there’s no other way to compare embeddings than a benchmark.
It's just that what the benchmarks used by Voyage and Cohere track might not be relevant to your own needs.
cahaya
Wondering how this compares to the Gemini (preview) embeddings, as they seem to perform significantly better than OpenAI's text-embedding-3-large. I don't see any MTEB scores, so it's hard to compare.
mahjongmen
Hey Cahaya,
While we benchmarked internally on BEIR, we opted not to report our model on MTEB for the following reasons:
1) MTEB has been gamed - if you look at this model (https://huggingface.co/voyageai/voyage-3-m-exp) on the MTEB leaderboard, it's an intermediate checkpoint of Voyage-3-Large that they finetuned on datasets representative of the MTEB datasets.
2) If you look at the recent datasets in MMTEB, you'll find quite a lot of machine-translated or "weird" datasets that are noisy.
In general, for our Search models, we benchmark on these public academic datasets, but we definitely do not try to hill-climb in this direction, as we find it has little correlation with real use-cases.
moralestapia
A bit expensive but the benchmarks look quite good!
distantsounds
so which stolen properties were used to train this model?
mirekrusin
You picked the wrong post to mention it, copyright holders don't complain about embedding models.