SWE-Bench Pro
September 22, 2025 · gpt5
stri8ed
Not a chance. Even if American companies did abide by it, there is no reason Chinese companies would. And good luck definitively proving that a model was trained on it.
stephendause
This is a key question in my opinion. It's one of the things that make benchmarking the SWE capabilities of LLMs difficult. It's usually impossible to know whether the LLM has seen a problem before, and coming up with new, representative problem sets is time-consuming.
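(For what it's worth, the cheap heuristic people reach for is an n-gram overlap check between a benchmark problem and whatever slice of the training corpus is visible, which is both noisy and easy to defeat. A minimal sketch of that kind of check, purely illustrative and not anything from SWE-Bench Pro; the function names, the 5-gram size, and the example strings are made up:)

```python
# Naive word-level n-gram overlap check, sometimes used as a rough signal of
# benchmark contamination. High overlap is only weak evidence of leakage, and
# low overlap proves nothing: paraphrased or partially-seen data slips through.

def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams in `text`."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_ratio(problem_text: str, corpus_text: str, n: int = 5) -> float:
    """Fraction of the problem's n-grams that also appear in the corpus sample."""
    problem_grams = ngrams(problem_text, n)
    if not problem_grams:
        return 0.0
    return len(problem_grams & ngrams(corpus_text, n)) / len(problem_grams)

if __name__ == "__main__":
    # Hypothetical benchmark issue and a hypothetical crawled bug report.
    issue = "Fix the off-by-one error in the pagination helper when page size is zero"
    crawl = "bug report: off-by-one error in the pagination helper when page size is zero causes a crash"
    print(f"5-gram overlap: {overlap_ratio(issue, crawl):.2f}")
```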
ej88
https://scale.com/leaderboard/swe_bench_pro_commercial
I definitely trust the totally private dataset more.
siliconc0w
Looks like the associated article is: https://scale.com/research/swe_bench_pro (link in the repo is wrong)
Slightly tangential question: they say they have protected the public test set with a strong copyleft license to prevent training private models on it.
Does that actually work? Hasn't AI training so far simply ignored all license and copyright restrictions?