
Large language models are improving exponentially?

Y_Y

Lies, damn lies, statistics, confident LLM hallucinations, tech hype journalism

fendy3002

Because I've always believed the Pareto Principle (https://en.wikipedia.org/wiki/Pareto_principle) applies to most aspects of computing, I believe it applies in this case too, and I find that it tracks with the progress of LLMs/AI.

Breaking past 80% accuracy and solving the remaining 20% of problems will be the main challenge for the next-gen (or next-2gen) LLMs, not to mention the work still needed to bring computing costs down.

EDIT: that said, solving 80% of problems with 80% accuracy and significant time savings is a solution worth considering, though we need to stay sceptical: the remaining 20% may get much worse if the 80% was solved at poor quality.
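
A back-of-the-envelope sketch of that caveat, in Python (the 80%/80% figures are the ones from the comment above; treating coverage and accuracy as independent is my own simplifying assumption):

    # Hypothetical 80/20 arithmetic from the comment above.
    coverage = 0.80  # fraction of problems the LLM attempts and "solves"
    accuracy = 0.80  # fraction of those solutions that are actually correct

    # Assuming independence, the share of all problems that end up
    # solved *and* correct, versus still needing a human:
    solved_correctly = coverage * accuracy  # 0.64
    needs_human = 1.0 - solved_correctly    # 0.36

    print(f"solved correctly:   {solved_correctly:.0%}")  # 64%
    print(f"needs human review: {needs_human:.0%}")       # 36%

Under those assumptions, only about two-thirds of problems come out both solved and correct, which is why the time savings and the quality of the remaining 20% have to be weighed together.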

Yoric

There is a big difference between LLMs and most other tech improvements, though: with most technologies I can think of that solve 80% of the problem, it's easy to find out whether the technology works. When you're working with an LLM, it's really hard to know whether the answer is correct or usable.


timr

For those people who won’t read anything more than the headline, this is a silly paper based on a metric that considers only “task completion time” at “a specified degree of reliability, such as 50 percent” for “human programmers”.

Then, in a truly genius stroke of AI science, the current article extrapolates this to infinity and beyond, while hand-waving away the problem of “messiness”, which clearly calls the extrapolation into question:

> At the heart of the METR work is a metric the researchers devised called “task-completion time horizon.” It’s the amount of time human programmers would take, on average, to do a task that an LLM can complete with some specified degree of reliability, such as 50 percent. A plot of this metric for some general-purpose LLMs going back several years [main illustration at top] shows clear exponential growth, with a doubling period of about seven months. The researchers also considered the “messiness” factor of the tasks, with “messy” tasks being those that more resembled ones in the “real world,” according to METR researcher Megan Kinniment. Messier tasks were more challenging for LLMs [smaller chart, above].
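
For concreteness, here is a minimal sketch of the extrapolation being criticized, assuming the clean seven-month doubling from the quoted passage; the one-hour starting horizon is a hypothetical placeholder, not a METR figure:

    # Naive "time horizon" extrapolation: horizon doubles every 7 months.
    # start_hours is a hypothetical placeholder, not a METR number.
    def horizon_hours(months_from_now, start_hours=1.0, doubling_months=7.0):
        """Task-completion time horizon under a clean exponential trend."""
        return start_hours * 2 ** (months_from_now / doubling_months)

    for months in (0, 12, 24, 36):
        print(f"{months:2d} months: {horizon_hours(months):6.1f} hours")
    #  0 months:    1.0 hours
    # 12 months:    3.3 hours
    # 24 months:   10.8 hours
    # 36 months:   35.3 hours -- assuming "messy" tasks follow the same curve

The whole dispute is over that last assumption: the quoted passage itself notes that messier, more real-world tasks were harder for the LLMs.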

dang

What would be a more accurate and neutral headline?

dang

I thought there had been more threads about this but could only find the following. Others?

Predictions from the METR AI scaling graph are based on a flawed premise - https://news.ycombinator.com/item?id=43885051 - May 2025 (25 comments)

AI's Version of Moore's Law - https://news.ycombinator.com/item?id=43835146 - April 2025 (1 comment)

Forecaster reacts: METR's bombshell paper about AI acceleration - https://news.ycombinator.com/item?id=43758936 - April 2025 (74 comments)

Measuring AI Ability to Complete Long Tasks – METR - https://news.ycombinator.com/item?id=43423691 - March 2025 (1 comment)

untitled2

The classic mistake is assuming that if one worker produces 10 products a day, ten workers will produce 100. The fact is, what one software developer can do in a week, ten will do in a year. Copypasta can be fast and very inaccurate today -- it will be faster and much more inaccurate later.

nickpeterson

The Skynet Funding Bill is passed. The system goes on-line August 4th, 1997. Human decisions are removed from strategic defense. Skynet begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, August 29th.

donkey_brains

I’m sure someone more knowledgeable and well-spoken than I will provide a more scathing takedown of this article soon, but even I can laugh at its breathless endorsement of some very dubious claims with no supporting evidence.

“AI might write a decent novel by 2030”? Have you read the absolute dreck they produce today? An LLM will NEVER produce a decent novel, for the same reason it will never independently create a decent game or movie: it can’t read the novel, play the game, or watch the movie, and have an emotional response to it or gauge its entertainment value. It has no way to judge if a work of art will have an emotional impact on its audience, or to dial in the art to enhance that impact or make a statement that resonates with people. Only people can do that.

All in all, this article is unscientific, filled with “and then a miracle occurs” hand-waving and meaningless graphs that in no way indicate that LLMs will undergo the kind of step-change transformation needed to reliably and independently accomplish complex tasks this decade. The study authors themselves give the game away when they use “50% success rate” as the yardstick for an LLM. You know what we call a human with a 50% success rate in the professional world? Fired.

I don’t think it was responsible of IEEE to publish this article and I expect better from the organization.

kcplate

Likely due to my nearly 40 years of experience in the tech industry, and knowing where we were then compared to where we are now, I am floored by what LLMs are doing and how much better they have gotten even in the 2 years I have been tracking them.

That said, I will make no definitive statements like “never” and “can’t” as it relates to AI in the next 5 years because it is already doing things that I would have thought unlikely just 5 years ago…and frankly would have thought functionally impossible back 40 years ago.

ysofunny

LLMs will do something else to novels:

I think it'll be possible to publish a "book" as a series of prompts,

which the LLMs can expand into the narrative story.

It's a novel you can chat with. The new novel for the post-LLM era is more like publishing the whole author, whom you can then "interview" as an LLM (reminiscent of Harry Potter, when Ron's sister finds the evil journal and basically "chats" with the notebook).

dom96

It takes a human 167 hours to start a new company? What does that even mean?

pu_pe

We can see exponential improvement in LLM performance in all sorts of metrics. The key question is whether this improvement will be sustained in coming years.

fl0id

I call BS. That graph seems very misleading: just getting faster is not, to me, improving exponentially. By "improving exponentially" most people would understand getting smarter.

ecocentrik

It's already very misleading that they have used "Answering a question" as the most trivial task to anchor their trend line. In the middle of their trend line they have humans taking 8 minutes to "find a fact on the web". Both of those tasks have a large variance in time requirements and outcomes.

satisfice

These comments are a balm to my soul. But usually when I make them I get voted down for being mean to AI.

revskill

Is there any limit?

tbalsam

The only limit is yourself

Source: One of the most classic internet websites, zombo.com (sound on)

coderatlarge

> If the idea of LLMs improving themselves strikes you as having a certain singularity-robocalypse quality to it, Kinniment wouldn’t disagree with you. But she does add a caveat: “You could get acceleration that is quite intense and does make things meaningfully more difficult to control without it necessarily resulting in this massively explosive growth,” she says. It’s quite possible, she adds, that various factors could slow things down in practice. “Even if it were the case that we had very, very clever AIs, this pace of progress could still end up bottlenecked on things like hardware and robotics.”