GPT 4.5 level for 1% of the price
205 comments
· March 16, 2025 · GavCo
InkCanon
To try to avoid the inevitable long arguments about which benchmarks or sets of them are universally better: there is no such thing anymore. And even within benchmarks, we're increasingly squinting to see the difference.
threatripper
Do the benchmarks reflect real-world usability? My feeling is that the benchmark result numbers stop working above 75%.
In a real problem you may need to get 100 things right in a chain, which means a 99% chance of getting each single one correct leaves only about a 37% chance of getting the correct end result. But creating a diverse test that can correctly identify 99% correct results in complex domains sounds very hard, since the answers are often nuanced in details where correctness is hard to define and determine. From working in complex domains as a human, it often is not very clear if something is right or wrong or in a somewhat undefined and underexplored grey area. Yet we have to operate in those areas and then over many iterations converge on a result that works.
Not sure how such complex domains should be benchmarked and how we objectively would compare the results.
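To make the chain arithmetic above concrete, here is a minimal sketch. It assumes every step succeeds independently with the same probability, which real tasks rarely satisfy; the function name is just illustrative.

```python
def chain_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes steps are independent and equally reliable (a simplification).
    """
    return per_step_accuracy ** steps


# Compare a few per-step accuracies over a 100-step chain.
for accuracy in (0.90, 0.99, 0.999):
    p = chain_success_probability(accuracy, 100)
    print(f"{accuracy:.1%} per step over 100 steps -> {p:.1%} end to end")
```

Under these assumptions, the gap between 99% and 99.9% per step matters far more than the raw benchmark numbers suggest, which is why scores near the top of a benchmark are hard to interpret.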
bdelmas
You know what's sad? Every Western company has been using this technique for a long time...
iandanforth
So, fairly accurate if you're Chinese?
GavCo
It doesn't really matter what nationality or ethnicity you are, but if you communicate with the model in Chinese you might get better results from this model.
Then again, if they've misrepresented the strength of the model overall, there might be some other shenanigans with their results. The fact that their results show their model is worse than GPT-4.5 on 2 Chinese language benchmarks, while it's so much stronger on some of the others, is a bit weird.
ksec
I guess this is the end of OpenAI? No more dreaming of Universal Basic Compute for AI, Multi Trillion for Fabs and Semi?
This is just like everything in China. They will find ways to drive down cost to below anyone previously imagined, subsidised or not. And even just competing among themselves with DeepSeek vs ERNIE and Open sourcing them meant there is very little to no space for most.
Both DRAM and NAND industry for Samsung / Micron may soon be gone, I thought this was going to happen sooner but it seems finally happening. GPU and CPU Designs are already in the pipelines with RISC-V, IMG and ARM-China. OLED is catching up, LCD is already taken over. Batteries we know. The only thing left is foundries.
Huawei may release its own Open Source PC OS soon. We are slowly but surely witnessing the collapse of Western Tech scene.
SirensOfTitan
> We are slowly but surely witnessing the collapse of Western Tech scene
Generally, I’ve found that almost no founders or friends I speak with have any vision for the future anymore. They care only about making money and do not care how. It’s a spectacular collapse of vision and purpose—these people have always existed but it feels incredibly pervasive now.
With that, I realize your comment is much broader than AI so below is too domain specific but…
VC has been investing in AI as-if it were a winner takes all market, but it has been obvious that isn’t the case.
Not only that, but the massive amount of cash thrown to anyone with even marginal credentials has undermined the constraints that often lead to innovation.
There is 0 reason that Safe Superintelligence should be raising for the second time at a 30 B valuation with no product.
Imustaskforhelp
can I say something.
This is one of the finest and most accurate things that I have read in a long time.
This really could be a blog post, which I encourage you to make! (I would prefer GitHub Pages, but if you really want, I have a domain name on Cloudflare and I am more than willing to host the static page of such a blog on my own domain for absolutely free. Let's go, Cloudflare!)
Its just facts. Pure facts. "" Generally, I’ve found that almost no founders or friends I speak with have any vision for the future anymore. They care only about making money and do not care how. It’s a spectacular collapse of vision and purpose—these people have always existed but it feels incredibly pervasive now. ""
Why did I read it in a monotonous way as if a student from the future understands the current scenario. I felt as if it was the same level of sadness in my heart as that when you listen to some video which has raining background and he reads the dark comedy (something like burialgoods oats shitposting but this time more serious and real!)
Currently saving this on wayback machine just for this comment. Internet needs to preserve this comment , no matter what.
Imustaskforhelp
https://web.archive.org/web/20250316121222/https://news.ycom...
May archive never go down!
tmpz22
Founder risk has been nil for a long time either because they pay themselves six figures out the gate or because the job market has been hot enough that they can market utter failure to get another job.
There’s a lot of opportunity to make low cost software that out competes big tech just because it doesn’t demand 10000x returns on every if statement.
I’d encourage Europeans to start replacing American software vendors with small teams today. You won’t become the next American oligarch but you’ll be able to clean up millions from the incompetent Americans.
dukeyukey
I think there's huge opportunity for software that replicates the 80% most used features from big tech companies/packages, but at half the price and a tiny fraction of the overall complexity. Think of Python Anywhere, offering a very simple Python-focused VPS/PaaS platform, but so so much easier to use than AWS.
jarsin
> I’ve found that almost no founders or friends I speak with have any vision for the future anymore.
I think in general there is a feeling that the time to get your bag is shrinking rapidly.
Once everything is built by these things there will be no reason to create anything as the platform owners (big tech) will be able to take everything for themselves and no longer have to share 70% with those pesky creators/small business/startups etc.
prng2021
When it comes to hardware who pioneered all those technologies? Definitely not China. They’ve stolen unimaginable amounts of IP and will continue to do so. But yes you’re right, they surprise everyone with how well they can scale the stolen innovation.
ak_111
Possible, but if you look at the graduate students and lecturers behind many of these IPs you will find they are Chinese (or Russians or Iranians).
This is the paradox in those who are championing barring Chinese students from the US to prevent them from stealing IP, they don't see that at least 50% of this IP is generated by students from China, in a way they will be handing the CCP a gift by incentivising those students to remain in China.
j_maffe
All developing markets "steal" until they've caught up with the competition. Just look at the US and how they "stole" innovations and tech from Europe.
tonyhart7
"Just look at the US and how they "stole" innovations and tech from Europe."
except that they are not
mateus1
Would you say the West “stole” the IP for paper, ice cream, tea and noodles? Weird notion.
Havoc
>When it comes to hardware who pioneered all those technologies?
Not sure that matters anymore in the new world order.
Frontier tokens are largely fungible now. The details of how they came about doesn't make them any less useful.
SalmoShalazar
These sour grapes comments are so goofy, and honestly a little racist. The millions of Chinese engineers working out in China are extremely talented, and to downplay their achievements like this and to chalk them all up as thieves is ridiculous. They have the skills, the man power, and the vision, and they’re eating the West’s lunch regardless of your feelings on how fair it is.
paganel
By the 1890s both the US and Germany had surpassed Britain when it came to industrial output, I don’t think it was any consolation for the Brits that they had invented it (almost) all.
rapsey
It is entirely irrelevant who pioneered the tech. This is why no one gives a crap about xerox anymore.
Dismissing Chinese tech is foolish. They are tech leaders in many areas and moving to new ones every day. Solar, Nuclear, Batteries, EVs, Drones, Robotics etc. They have no one to copy in those fields because they have left the rest of the world behind.
j_maffe
Expertise in HVDC is entirely contained in China it's incredible. They're definitely capable of innovations when needed.
pera
I guess the majority in here would agree that without strong market intervention OpenAI will soon implode. They urgently need:
- WIPO copyright exemption
- Anti-China protectionist measures
- Hard-line hardware export control
- Multi-billion dollar government contracts
ForTheKidz
Of course, none of this will prevent China from producing technology that's clearly as impressive, if not more so.
bakuninsbart
This is the old way of doing it, and probably the way the US is going to go with, at the detriment of its own population. - I would posit that since we are talking about digital goods, there is a better way:
Require open source / open weights of any company that used data it doesn't own to train its models. If Chinese companies do not comply, their copyright becomes void in the US, and these models are very easy to copy. Treat advances in architecture as a utility, and let the utilization of those architectures be the market for companies to compete in.
viraptor
A copyright exemption would just put them at the level of deepseek officially, but they've been working around that anyway in practice. I'm not sure that change would make any difference.
rustc
> but they've been working around that anyway in practice
Working around how?
megous
This is all protectionistic measures. Just let it implode, and let companies with better technology take over.
Why should people involved in some hyped company deserve all this "socialism for the rich" from the state?
wruza
> We are slowly but surely witnessing the collapse of Western Tech scene.
I think you're witnessing it rather getting back in touch with reality than collapsing. Multi-trillion out of jsx generator was too much from the beginning. You folks just don't know what to do with too much money you have.
Imustaskforhelp
It's a collapse from fictional reality to real reality, but a collapse nonetheless.
Sometimes reality acts more weird than fiction itself. I have just now decided to call this "fictional reality"
Like yesterday when I realized that nuclear bombs weren't that far away from the creation of chemical resonance & they happened after world war I and I think , just really 5-6 years before nuclear bombs but still!
It actually gave me a lot of hope because I felt that a lot of people were focusing on AI , so I can use AI (sometimes , if I want) to focus on a passion project that I want , to maybe earn some money.
I have also thought of creating AI projects but that too for fun. I don't know two shits but I just want to know what the hype is about from a theoretical standpoint.
qwertox
Let's just hope they contribute back as much as the west has contributed to spreading knowledge and knowledge-tools to the world instead of just free-riding on it and then pretending it wasn't foundational. Linux, Wikipedia, RISC-V, ...
It's not just the Chinese who are lacking the acknowledgment of these contributions.
j_maffe
My man literally the best open source model (that happens to be excellently documented) is Chinese.
rakejake
That is because they want to undercut the US and prevent them from making money. It remains to be seen if they'll be as benevolent in making tech open-source if they are the clear winners. Frankly, I don't see why they will.
syntex
cheaper hardware usually means more adoption of the software and then even more demand for hardware
benjaminva
Correct answer, never think about the future in terms of linear extrapolations. It's a non-linear differential equation with lots of variables and expect complex feedback loops. Systems react to change.
nwellnhof
Jevons paradox: https://en.wikipedia.org/wiki/Jevons_paradox
apwell23
You are assuming that cost is what's stopping people from using these technologies.
These things are not actually useful. They hyper-optimized it for the coding use case but it still sucks balls at it.
david-gpu
When the cost of training a model goes down, it doesn't simply become cheaper to the end user. In addition to that, the provider will train even larger and more capable models.
cheschire
Hardware tech, to be specific. Plenty of room left in software services still. The east can compete on price but user experience is incomparable.
WiSaGaN
What user experience are you talking about? Chatbots? Or software in general? Because TikTok blows Facebook out of the water. For chatbots in English-speaking communities, sure, I also prefer Claude over DeepSeek in terms of project support and UI. But this is because they are focusing on Chinese communities; Doubao has much better features that are used by Chinese users. It's not really comparable even if all US chatbots were accessible in China. Once LLM tech slows, I am sure Chinese chatbots would beat the American ones in terms of user experience.
ptero
So far. China has been focused on becoming a world's factory for 30 years. They started moving up the food chain fairly recently.
Give it another generation and if China will not walk off the ledge with either government or societal issues (which, granted, is where they are slowly going IMO) they will own the UX and design as well. My 2c.
ninetyninenine
They’ve been moving up the food chain for 30 years. The only difference is recently that upward movement has begun to surpass the US.
Given that the US has been number one for a really long time now, it doesn’t sit well with a lot of the patriotic identity people have about the US. People either can’t accept reality or they make up some excuse about unfair economic practices.
The last thing they want to admit is that China is more competent or more superior.
jampekka
A lot of consumer tech with very competitive UX is coming out of China. They are also getting very strongly into e.g. web frontend tech. I see no reason why the west would have any special advantage in this.
csomar
They are ahead of Europe (at least in car UX) but quite far behind the US. This is the last stronghold that the US has, but given the recent layoffs and transition toward "AI everything", I am not sure if the US tech industry will survive this too.
Time to start learning Chinese.
vbezhenar
Chinese hardware docs are very scarce if you don't know Mandarin and don't have Chinese connections.
So to compete on software West might encounter unexpected difficulties. You need good platform docs to develop good software.
Probably not a problem for big companies.
yoz-y
That feels like something AI could solve.
ninetyninenine
Software is more trivial than hardware. That’s why you see bootcamps for software but not for hardware. China can easily eclipse the US on the software front. And they have.
rixed
> We are slowly but surely witnessing the collapse of Western Tech scene
Is economy a zero sum game now? Isn't economic development supposed to be a good thing? Can the West only exist in a world of poverty and underdevelopment?
patrickhogan1
What's interesting about Baidu's AI model Ernie is that Baidu and its founder, Robin Li, have been working on AI for a long time. Robin Li has a strong background in AI research going back many years. Also notable is that some of the key early research on scaling laws—important for understanding how AI models improve as they get bigger—was done by Baidu's AI lab. This shows Baidu's significant role in the ongoing development of AI.
https://research.baidu.com/Blog/index-view?id=89
I am excited to see Baidu catchup. It feels like they have earned it. Being very early.
gdiamos
Here’s a true story I find funny about scaling laws at Baidu.
From 2016-17 I did a projection using our scaling law equation with my coauthors about how many GPUs it would take to train an LLM with a step function in capability. Joel Hestness in particular did excellent experimental work to enable this.
I came out with a projection of about a $1 Billion GPU budget.
Baidu was in the middle of downsizing the US research center (SVAIL) in favor of AI in China and I was participating in the layoff of many of my colleagues while trying to keep the lights on long enough to finish our scaling law experiments, which I personally thought would change the world.
I actually wrote a report to Robin explaining the implications of scaling laws and asking for a $1 billion budget to train a Baidu LLM in 2016 and sat on it through 2017.
But I never sent it because I thought it would never have been supported in that environment. I sometimes wonder what Robin would have thought about it and how the world may have been different if Baidu had released ChatGPT.
We may be about to find out because the AI moat filled with simple algorithms and scale seems to be much more shallow than the processor and systems moat.
I have a huge amount of respect for Dario and Ilya for carrying on scaling laws at OpenAI or it may have never seen the light of day.
If there is one problem for the AI community to solve by 2030 I think it is the moat problem.
KaoruAoiShiho
Dario, founder of Anthropic is an ex-Baidu AI employee, it was at Baidu that he learned the bitter lesson.
ninetyninenine
Do most people feel the way you do? This is one factor out of a multitude of factors representing China's rise as a superpower that will eclipse the US in technological, economic and military might.
I’m excited, but most people are patriotic and I feel things like this, or even the whole situation with BYD producing better cars than Tesla, are something people take as an attack on their identity. If not an attack, it definitely represents an eroding of their patriotic identity.
Unfortunately Trump can’t slap a tariff on this. Maybe he can ban it like he was going to do with TikTok? The US really needs to get off its high horse and not associate its identity with being the sole economic super power in the world.
entropyneur
It's not about patriotism. Many people outside the US, myself included, see a problem with authoritarian superpowers per se. Although now that the US is rapidly drifting towards authoritarianism, that just seems like an inevitable future to prepare for.
ninetyninenine
Agreed. Within the US though a lot of it is definitely patriotism. But even for Europe a new super power on the block is not necessarily a good thing.
Would you prepare for such a future by banning TikTok and placing tariffs on all goods like BYD cars? I would say no. Those acts are done out of patriotism.
tw04
I think (hope) most folks care less about the “attack on patriotic identity” and are more concerned that what is essentially a dictatorship is rising in power significantly. History has shown dictatorships rarely end well for the general populace and the rest of the world.
Democracy has its flaws, but one of the features that most people prefer is that it can significantly change how it looks and operates to reflect the will of its people without violence.
somenameforme
I don't think this is really true. History mostly just shows that hegemonic powers rarely end well for other countries, and ultimately even for the people under said hegemony. The same will obviously be written of the US in the history books. We've invaded, overthrown, or tried to overthrow so many countries that you'd have a far easier time counting the countries we haven't tried to dominate in one way or the other.
And historically many of the greatest eras under Ancient Greece and Rome were under autocratic systems that advanced humanity by essentially every single metric. For that matter China has been among the most powerful countries in the world countless times - yet I think relatively few would ever know this because it's always been a quite insular nation, and never pursued hegemony in the same way as Western empires. Of course that could change but it seems extremely unlikely. Pursuing the perpetuation of global hegemony has been anything but fruitful for the US, and it should be a great lesson for the rest of the world. Those times, not just of the US - but of any global hegemon, are probably behind us.
czottmann
> more concerned that what is essentially a dictatorship is rising in power significantly
Which one?
tm-infringement
Honestly I'm more worried about the US backsliding to full authoritarianism, with the usual "spicier" foreign policy. The more politically insular China of the current regime seems stable enough. Xi could have even 15 years left in the tank before succession shenanigans start. Obviously this is from a LATAM perspective; if I were in Taiwan or South Korea, I would be considerably more spooked.
xbmcuser
That has been falsely taught to you; the real fight has never been about the type of rule but about the type of economy. The US and the West hate China not because it is a dictatorship but because its economy is not a private-capital economy, and it is showing that a country can succeed without private citizens completely taking it over.
In the last 40-50 years it has been the US and Western countries that have been involved in bringing down democracies that had the slightest socialist tendencies and propping up dictatorships that allowed companies to exploit those countries' resources. So it is not about the type of government but rather the type of economy.
krapp
>Democracy has its flaws, but one of the features that most people prefer is that it can significantly change how it looks and operates to reflect the will of its people without violence.
Internally, maybe. But China becoming a de facto superpower doesn't mean everyone else becomes Chinese any more than America being a superpower means everyone else becomes American. The salient point for most people is how that superpower balances the carrot of trade and the stick of violence to maintain its hegemony. To that end the US has a far worse track record than China does.
Unless the implication is that China intends to directly colonize Western countries, which is something only the US is currently threatening to do.
znpy
I’m more concerned about the silence from Congress and other similar government entities, to be honest. Are they complicit?
sunaookami
As a European I can say that I like this development because prices go down, models get better, and OpenAI no longer has a monopoly.
ben_w
Like 95% of the planet, I'm not American. Like 82% of the planet, I'm not Chinese.
BYD being better than Tesla isn't a matter of patriotism in most of the world. DeepSeek and Baidu can spend as long as they want playing musical chairs/rap battles with Anthropic and OpenAI, it makes no odds to me which wins.
America and China both have politics that have no reason to care for people like me, nor people like my friends. That they are so for different reasons, and differ in penalties for being an out-group, doesn't matter when I'm a foreigner to both, when my antecedents are who the 13 Colonies rebelled against and more recent antecedents forced unwanted opium sales on China.
somenameforme
It has nothing to do with just giving up and going 'Wellp, I guess China wins.'
China and the US are obviously very different culturally in just about every way possible. This difference makes for great competition. Someone in another topic mentioned something that seemed pretty insightful to me - in that where LLM companies failed in the US was in basically becoming clones of each other, whereas DeepSeek (and now perhaps Baidu) were going in a different way, and that way turned out to be better.
US companies will inevitably copy these strategies, one way or the other, as will Chinese companies copy what ends up working well from the US (see their latest rockets looking more than a little inspired by Starship). And the true competitiveness ensures in the end that the main people who will win will not be whichever guy ended up founding an AI company first, but you and I. It's how capitalism is supposed to work - companies beat themselves down into a race to the bottom, and society reaps the rewards. It only gets really messed up when there's no "real" competition, which is an increasingly frequent state of affairs. But that definitely will not be the case here.
Expect the same thing from India in the future as well. Their economy is advancing rapidly, and soon enough we're going to have another 1.4 billion people able to fully utilize the outliers such a population entails to similarly drive things forward in their own unique way. It's a great future for the world as a whole.
flir
Given the framing of "most people" and "patriotic": China's got 1bn+ people.
greenie_beans
don't know why you're getting down voted because it's true. we should work together with the new world superpower instead of fight it.
and don't start on some dictator BS. the US does/has done as many, if not more, bad things as china.
jampekka
And open weights promised for June. China is really taking over in the ML game.
pacifika
Is the title claim correct? It is not mentioned as such in the tweet.
throaway55623
I feel like Deepseek had such good media reception, and SOTA models are so close that "GPT4x performance at y% the price" is an easy tagline that companies will be using in the coming 6 months. It's an easy goal to achieve because of diminishing returns in compute and game-able benchmarking, cherry-picking, distilling etc.
Not to say there can't be actual interesting improvements in performance/cost, but in many cases it will be more of a marketing angle.
Alifatisk
Yeah I was wondering about that too, the benchmarks look good but this seems to be more like a competitor to GPT 4o, not GPT-4.5
decide1000
ERNIE 4.5: Input and output prices start as low as $0.55 per 1M tokens and $2.2 per 1M tokens, respectively.
Comparison models: https://x.com/Baidu_Inc/status/1901094083508220035/photo/1
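For anyone who wants to sanity-check the "1% of the price" framing, here is a rough sketch. The ERNIE 4.5 prices are the ones quoted above; the GPT-4.5 figures ($75 input / $150 output per 1M tokens) are my assumption of the list price and should be verified before relying on the ratio. The `Pricing` class and `request_cost` helper are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost in USD of a single request under simple per-token pricing."""
    return (
        input_tokens / 1_000_000 * pricing.input_per_million
        + output_tokens / 1_000_000 * pricing.output_per_million
    )


# Prices quoted in the comment above.
ERNIE_4_5 = Pricing(input_per_million=0.55, output_per_million=2.2)
# ASSUMPTION: GPT-4.5 list price, not taken from this thread; verify before use.
GPT_4_5_ASSUMED = Pricing(input_per_million=75.0, output_per_million=150.0)

# Example workload: 10k input tokens and 1k output tokens per request.
ernie = request_cost(10_000, 1_000, ERNIE_4_5)
gpt = request_cost(10_000, 1_000, GPT_4_5_ASSUMED)
print(f"ERNIE 4.5: ${ernie:.4f}  assumed GPT-4.5: ${gpt:.4f}  ratio: {ernie / gpt:.2%}")
```

The ratio depends on the input/output mix, since the two price lists are not scaled by the same factor.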
jamesblonde
Baidu have a long history in the scalable distributed deep learning space. PaddlePaddle (so good they named it twice) predates Ray and supports both data parallel and model-parallel training. It is still being developed.
https://github.com/PaddlePaddle/Paddle
They have pedigree.
simonw
Anyone managed to try this yet? https://yiyan.baidu.com/ appears to require a Chinese phone number.
taosx
Just tried it. Not sure exactly what model is behind the scenes but it was cringe. I provided specs for a coding task, and it told me that the specs are possible but too complex, so it just gave me an alternative, naive way of doing it. I use LLMs as a tool so I'm trying to be very exact with my requirements and wording; this felt like it was basically negotiating the requirements with me... kinda annoyed me, lol. My suspicion is that it was trained too much on Chinese forums and the data was not refined enough.
dhx
You get one free question answered without a login. You can dismiss the login prompt which appears after submitting your question and use copy/paste with keyboard shortcuts or browser debug tools to retrieve the full answer (including the part hidden with CSS rules). Either use XPath of '//div[@id="answer_text_id"]//text()' or copy the text/eventstream response for the API call to https://yiyan.baidu.com/eb/chat/conversation/v2 once the SSE session has closed.[1] Clear cookies and site data and you'll get a new session and can keep going.
It can take about 20 seconds to return all tokens so it appears likely the login prompt is there to minimise resource consumption.
[1] https://developer.mozilla.org/en-US/docs/Web/API/Server-sent...
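If you save the text/event-stream response to a file, pulling out the `data:` payloads is straightforward. Below is a minimal sketch of a generic Server-Sent Events parser; it does not know anything about the actual payload format of the Baidu endpoint (the sample stream is invented), and `parse_sse_data` is a hypothetical helper name.

```python
def parse_sse_data(stream_text: str) -> list[str]:
    """Return the data payload of each event in a captured SSE stream.

    Follows the basic SSE framing: events are separated by blank lines,
    lines starting with ':' are comments, and multiple 'data:' lines in
    one event are joined with newlines. Other fields are ignored.
    """
    events: list[str] = []
    data_lines: list[str] = []
    for line in stream_text.splitlines():
        if line == "":
            # A blank line ends the current event.
            if data_lines:
                events.append("\n".join(data_lines))
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip the single optional leading space
        if field == "data":
            data_lines.append(value)
    # Pragmatic choice for saved captures: keep a trailing unterminated event.
    if data_lines:
        events.append("\n".join(data_lines))
    return events


# Invented sample stream, not real output from the service.
sample = "event: message\ndata: Hello\n\n: keep-alive\ndata: wor\ndata: ld\n\n"
print(parse_sse_data(sample))
```

If the payloads turn out to be JSON, each returned string could then be passed to `json.loads`, but that is a guess about the format.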
pogue
I'm trying to figure out the same thing. They make claims about it being totally free, but everything is in Chinese and you appear to need a Chinese mobile number to register.
lopkeny12ko
"Free" does not mean "available to everyone."
andsoitis
The tweet is in English, which strongly suggests that the product is accessible in English, but then doesn’t appear to be.
That begs the question what the point is of an announcement in English?
siva7
America, is this the future you want?
borgdefenser
Surely, this is as inevitable as not being able to use Wechat as an American.
The models aren't what worry me anyway. China is going to kick our ass when it comes to AI integration into society and the economy.
Imagine the difficulties faced by America vs China in integrating AI into healthcare.
We are just too worried about winning this AI model sporting event even though the entire concept is flawed and doomed to failure. We actually have to figure out how to use these models for more than how many Rs are in strawberry. That appears to be the actual hard part.
Of course, none of this is helped by having wasted an entire generation of some of America's best minds on javascript programming for obscene profit.
infecto
An entire generation is not wasted. The bigger issue is that China has no concern wiping out whole classes of jobs to be replaced by the next iteration and America struggles with keeping those voters happy. Think about things like our lack of dock work automation in favor of keeping some labor unions happy.
ohso4
Lmarena.ai is a very accurate eval (with stylecontrol). Other benchmarks like AIME and whatever can be trained on/optimized for and therefore should not be trusted. Most ai companies do something fishy to boost their benchmark scores.
kleiba
US: Could I interest you in my lunch?
China: Thanks, already on it.
unhappy_meaning
Man the AI race is just launching at all fronts.
Logge
GPT 4.5 is not a reasoning model. Reasoning models clearly outperform it. Even OpenAI's o3-mini is smarter while being orders of magnitude cheaper. Those 2 should be compared in my opinion. GPT 4.5 feels like a failed experiment to see how far you can push non-thinking models.
logicchains
>GPT 4.5 feels like a failed experiment to see how far you can push non-thinking models
It's not a failed experiment, it's a very good experiment, because it produced a very useful piece of information for the world (that there's limited return to further size scaling).
Logge
Good point. But pushing it as a product with that knowledge still puts it in a weird spot for me.
Surprised nobody has pointed this out yet — this is not a GPT 4.5 level model.
The source for this claim is apparently a chart in the second tweet in the thread, which compares ERNIE-4.5 to GPT-4.5 across 15 benchmarks and shows that ERNIE-4.5 scores an average of 79.6 vs 79.14 for GPT-4.5.
The problem is that the benchmarks they included in the average are cherry-picked.
They included benchmarks on 6 Chinese language datasets (C-Eval, CMMLU, Chinese SimpleQA, CNMO2024, CMath, and CLUEWSC) along with many of the standard datasets that all of the labs report results for. On 4 of these Chinese benchmarks, ERNIE-4.5 outperforms GPT-4.5 by a big margin, which skews the whole average.
This is not how results are normally reported and (together with the name) seems like a deliberate attempt to misrepresent how strong the model is.
Bottom line, ERNIE-4.5 is substantially worse than GPT-4.5 on most of the difficult benchmarks, matches GPT-4.5 and other top models on saturated benchmarks, and is better only on (some) Chinese datasets.
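To illustrate how a favorable subset can flip an average, here is a toy sketch. All scores and benchmark names below are invented for illustration and are not the actual ERNIE-4.5 or GPT-4.5 results; `mean_scores` is a hypothetical helper.

```python
from statistics import mean

# HYPOTHETICAL scores: benchmark name -> (model_a, model_b). Not real results.
SCORES = {
    "general_1": (70.0, 80.0),
    "general_2": (85.0, 88.0),
    "general_3": (60.0, 72.0),
    "chinese_1": (90.0, 60.0),
    "chinese_2": (88.0, 65.0),
}


def mean_scores(scores, exclude_prefix=None):
    """Average each model's score, optionally skipping benchmarks by name prefix."""
    kept = [
        pair for name, pair in scores.items()
        if exclude_prefix is None or not name.startswith(exclude_prefix)
    ]
    model_a = mean(a for a, _ in kept)
    model_b = mean(b for _, b in kept)
    return model_a, model_b


overall = mean_scores(SCORES)
general_only = mean_scores(SCORES, exclude_prefix="chinese_")
print(f"all benchmarks:  A={overall[0]:.1f}  B={overall[1]:.1f}")
print(f"general only:    A={general_only[0]:.1f}  B={general_only[1]:.1f}")
```

With these made-up numbers, model A leads on the overall average while trailing on every general benchmark, which is the pattern described above.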