
Qwen3-VL


23 comments

September 23, 2025

richardlblair

As I mentioned yesterday - I recently needed to process hundreds of low-quality images of invoices (for a construction project). I had a script that used PIL/OpenCV, pytesseract, and OpenAI as a fallback. It still has a staggering number of failures.

Today I tried a handful of the really poor quality invoices and Qwen spat out all the information I needed without an issue. What's crazier is it gave me the bounding boxes to improve tesseract.
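For anyone wanting to try the same thing, here is a minimal sketch of that kind of fallback, assuming Qwen3-VL is served behind an OpenAI-compatible endpoint; the endpoint URL, model id, and prompt are illustrative, not from the comment:

```python
# Hypothetical sketch: extract invoice fields plus bounding boxes from a scan
# using a Qwen3-VL model behind an OpenAI-compatible endpoint.
# The base_url and model id below are assumptions; adjust for your deployment.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def extract_invoice(path: str) -> str:
    # Encode the (possibly low-quality) invoice scan as a data URL.
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="qwen3-vl-235b-a22b-instruct",  # assumed model id
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": (
                    "Extract vendor, invoice number, date, line items, and total "
                    "as JSON. Include a bounding box [x1, y1, x2, y2] for each field."
                )},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

print(extract_invoice("invoice_scan.png"))
```

The returned boxes can then be passed back to tesseract as region hints, which is the improvement described above.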

deepdarkforest

The Chinese are doing what they have been doing to the manufacturing industry as well. Take the core technology and just optimize, optimize, optimize for 10x better cost/efficiency. As simple as that. Super impressive. These models might be benchmaxxed, but as another comment said, if so it's the most impressive benchmaxxing I've seen today, if not just a genuinely SOTA open-source model. They even released a closed-source 1 trillion parameter model today that is sitting at no. 3(!) on LM Arena. Even their 80B model is 17th; gpt-oss 120b is 52nd. https://qwen.ai/blog?id=241398b9cd6353de490b0f82806c7848c5d2...

jychang

They still suck at explaining which model they serve is which, though.

They also released Qwen3-VL Plus [1] today alongside Qwen3-VL 235B [2], and they don't tell us which one is better. Note that Qwen3-VL-Plus is a very different model compared to Qwen-VL-Plus.

Also, qwen-plus-2025-09-11 [3] vs qwen3-235b-a22b-instruct-2507 [4]. What's the difference? Which one is better? Who knows.

You know it's bad when OpenAI has a clearer naming scheme.

[1] https://modelstudio.console.alibabacloud.com/?tab=doc#/doc/?...

[2] https://modelstudio.console.alibabacloud.com/?tab=doc#/doc/?...

[3] https://modelstudio.console.alibabacloud.com/?tab=doc#/doc/?...

[4] https://modelstudio.console.alibabacloud.com/?tab=doc#/doc/?...

helloericsf

If you're in SF, you don't want to miss this. The Qwen team is making their first public appearance in the United States, with the VP of Qwen Lab speaking at the meetup below during SF tech week. https://partiful.com/e/P7E418jd6Ti6hA40H6Qm Rare opportunity to directly engage with the Qwen team members.

be7a

The biggest takeaway is that they claim SOTA for multi-modal tasks, even ahead of proprietary models, and still released it as open weights. My first tests suggest this might actually be true; will continue testing. Wow

Computer0

I feel like most Open Source releases regardless of size claim to be similar in output quality to SOTA closed source stuff.

ACCount37

Most multi-modal input implementations suck, and a lot of them suck big time.

Doesn't seem to be far ahead of existing proprietary implementations. But it's still good that someone's willing to push that far and release the results. Getting multimodal input to work even this well is not at all easy.

sergiotapia

Thank you Qwen team for your generosity. I'm already using their thinking model to build some cool workflows that automate boring tasks within my org.

https://openrouter.ai/qwen/qwen3-235b-a22b-thinking-2507

Now with this I will use it to identify and caption meal pictures and user pictures for other workflows. Very cool!
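As a rough sketch of that captioning workflow via OpenRouter's OpenAI-compatible API (the VL model slug below is an assumption; check openrouter.ai for the actual id):

```python
# Hypothetical sketch: caption meal/user photos with a Qwen3-VL model on OpenRouter.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter key
)

def caption(image_url: str) -> str:
    resp = client.chat.completions.create(
        model="qwen/qwen3-vl-235b-a22b-instruct",  # assumed slug for the VL release
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Identify the meal in this photo and write a one-sentence caption."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return resp.choices[0].message.content

print(caption("https://example.com/meal.jpg"))
```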

BUFU

The open source models are no longer catching up. They are leading now.

causal

That has got to be the most benchmarks I've ever seen posted with an announcement. Kudos for not just cherrypicking a favorable set.

esafak

We should stop reporting saturated benchmarks.

willahmad

China is winning the hearts of developers in this race so far. At least, they won mine already.

swyx

so.. why do you think they are trying this hard to win your heart?

michaelt

I can see how it would be in China's interest to make sure there was an LLM that produced cutting edge performance in Chinese-language conversations.

And some uses of LLMs are intensely political; think of a student using an LLM to learn about the causes of the civil war. I can understand a country wanting their own LLMs for the same reason they write their own history textbooks.

By releasing the weights they can get free volunteer help, win hearts and minds with their open approach, weaken foreign corporations, give their citizens robust performance in their native language, and exercise narrative control - all at the same time.

willahmad

They might have dozens of reasons, but they already did what they did.

Some of the reasons could be:

- mitigation of US AI supremacy

- Commoditize AI to push innovation forward and sell the platforms that run it, e.g. if the iPhone wins local intelligence, it benefits China, because China is manufacturing those phones

- talent war inside China

- soften the sentiment against China in the US

- they're just awesome people

- and many more

llllm

they aren’t even trying hard, it’s just that no one else is trying

brokencode

Maybe they just want to see one of the biggest stock bubble pops of all time in the US.

binary132

Surprising this is the first time I’ve seen anyone say this out loud.

protocolture

I know I do

drapado

Cool! Pity they are not releasing a smaller A3B MoE model

daemonologist

Their A3B Omni paper mentions that the Omni at that size outperformed the (unreleased, I guess) VL. Edit: I see now that there is no Omni-235B-A22B; disregard the following. ~~Which is interesting - I'd have expected the larger model to have more weights to "waste" on additional modalities and thus for the opposite to be true (or for the VL to outperform in both cases, or for both to benefit from knowledge transfer).~~

Relevant comparison is on page 15: https://arxiv.org/abs/2509.17765

jadbox

How does it compare to Omni?