Meta Segment Anything Model 3

trevorhlynn

This was front page for a while last week

https://news.ycombinator.com/item?id=45982073

the_duke

Side question: what are the current top go-to open models for image captioning and building image embedding DBs, with somewhat reasonable hardware requirements?

Glemkloksdjf

I would suggest YOLO. Depending on your domain, you might also fine-tune these models. It's relatively easy, as they are not big LLMs but either image classification or bounding-box models.

I would recommend bounding boxes.

smallerize

Which YOLO?

NitpickLawyer

Try any of the Qwen3-VL models. They have 8B, 4B and 2B models in this family.

Workaccount2

I do a test on multimodal LLMs where I show them a dog with 5 legs, and ask them to count how many legs the dog has. So far none of them can do it. They all say "4 legs".

Segment Anything, however, was able to segment all 5 dog legs when prompted to. Which means that Meta is doing something else under the hood here, and it may lend itself to a very powerful future LLM.

Right now some of the biggest complaints people have with LLMs stem from their incompetence at processing visual data. Maybe Meta is onto something here.

jampekka

Segmentation doesn't need to count legs. I'd guess something like YOLO could segment 5-legged dogs too.
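
To illustrate the point: a segmentation model only labels pixels, and any counting is a separate step run over its output. Here is a minimal sketch, assuming the output arrives as a 2D grid of 0/1 values. The toy mask and the `count_regions` helper are made up for illustration; they are not output or API from SAM or YOLO.

```python
from collections import deque


def count_regions(mask):
    """Count 4-connected regions of truthy cells in a 2D grid (hypothetical helper)."""
    rows = len(mask)
    cols = len(mask[0]) if mask else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Found an unvisited foreground cell: it starts a new region.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            # Breadth-first flood fill to mark the rest of this region.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# Toy "legs" mask (an assumption, not real model output): five separate strips.
toy_mask = [
    [1, 0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0, 1],
]
leg_count = count_regions(toy_mask)
print(leg_count)
```

The mask itself never "knows" there are five legs. The number only appears once something downstream counts the regions.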

chompychop

YOLO is not a segmentation model.

jampekka

https://docs.ultralytics.com/tasks/segment/

PunchTornado

I doubt that Gemini 3 cannot do it.
