
SWE-Bench Pro

5 comments · September 22, 2025

gpt5

Slightly tangential question: they say they have protected the public test set with a strong copyleft license to prevent private models from training on it.

Does that actually work? Hasn't AI training so far simply ignored license and copyright restrictions entirely?

stri8ed

Not a chance. Even if American companies did abide by it, there is no reason Chinese companies would. And good luck definitively proving that a model was trained on it.

stephendause

This is a key question in my opinion. It's one of the things that make benchmarking the SWE capabilities of LLMs difficult. It's usually impossible to know whether the LLM has seen a problem before, and coming up with new, representative problem sets is time-consuming.
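
As a rough sketch of why contamination is hard to rule out (this is not from the Scale paper, and the problem/corpus text below is made up): about the only cheap signal available from the outside is textual n-gram overlap against whatever training text you can actually inspect, which tells you nothing about what a closed model was trained on.

    def ngrams(text, n=8):
        # Word-level n-grams of the text, lowercased.
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def overlap_ratio(problem_statement, public_corpus, n=8):
        # Fraction of the problem's n-grams that also occur in the corpus.
        # High overlap suggests, but never proves, that the text was seen.
        grams = ngrams(problem_statement, n)
        if not grams:
            return 0.0
        return len(grams & ngrams(public_corpus, n)) / len(grams)

    # Hypothetical usage; both strings are placeholders, not benchmark data.
    problem = ("Fix the race condition in the session cache when two "
               "workers refresh the same key at once")
    corpus = "whatever public pretraining text you can actually inspect"
    print(f"8-gram overlap: {overlap_ratio(problem, corpus):.2%}")

And that heuristic only flags verbatim reuse; it says nothing about paraphrased or translated copies, which is why a truly held-out private set is the more convincing control.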

ej88

https://scale.com/leaderboard/swe_bench_pro_commercial

I definitely trust the totally private dataset more.

siliconc0w

Looks like the associated article is: https://scale.com/research/swe_bench_pro (link in the repo is wrong)