
DeepSeek-v3.1 Release

10 comments

August 21, 2025

hodgehog11

For reference, here is the terminal-bench leaderboard:

https://www.tbench.ai/leaderboard

Looks like it doesn't get close to GPT-5, Claude 4, or GLM-4.5, but still does reasonably well compared to other open weight models. Benchmarks are rarely the full story though, so time will tell how good it is in practice.

YetAnotherNick

Depends on the agent. Ranks 5 and 15 are both Claude 4 Sonnet, and this lands close to 15th.

coliveira

My personal experience is that it produces high quality results.

amrrs

Any example or prompt behind that statement?

seunosewa

The DeepSeek R1 in that list is the old model that's been replaced.

yorwba

Yes, and 31.3% is given in the announcement as the performance of the new v3.1, which would put it in sixteenth place.

seunosewa

It's a hybrid reasoning model. It's good with tool calls and doesn't overthink everything, but it regularly falls back to outdated tool-call formats at random instead of the standard JSON format. I guess the V3 training set contains a lot of those.
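For context, a minimal sketch of the widely used OpenAI-style JSON tool-call shape the comment is contrasting against; the tool name, call id, and arguments here are purely illustrative, not from any DeepSeek output:

```python
import json

# Illustrative OpenAI-style tool call: a dict with a "function" entry
# whose "arguments" field is itself a JSON-encoded string.
tool_call = {
    "id": "call_0",          # hypothetical call id
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "arguments": json.dumps({"city": "Paris"}),
    },
}

# A well-formed call round-trips cleanly through the JSON layer;
# the "outdated formats" the comment mentions would fail this parse.
args = json.loads(tool_call["function"]["arguments"])
print(args["city"])  # → Paris
```

A client that expects this schema will typically reject or mis-route anything that doesn't parse as JSON at the `arguments` layer, which is why a model emitting older ad-hoc formats is painful in agent loops.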

esafak

It seems behind Qwen3 235B 2507 Reasoning (which I like) and gpt-oss-120B: https://artificialanalysis.ai/models/deepseek-v3-1-reasoning

Pricing: https://openrouter.ai/deepseek/deepseek-chat-v3.1

bigyabai

Those Qwen3 2507 models are the local crème de la crème right now. If you've got any sort of GPU and ~32 GB of RAM to play with, the A3B one is great for pair-programming tasks.

pdimitar

Do you happen to know if it can be run via an eGPU enclosure with, e.g., an RTX 5090 inside, under Linux?

I've been considering buying a Linux workstation lately, and I want it to be full AMD. But if I can just plug in an NVIDIA card via an eGPU enclosure for self-hosting LLMs, that would be amazing.