DeepSeek releases Janus Pro, a text-to-image generator [pdf]

erulabs

One thing I'd love to hear opinions on from someone with more free time to read these papers from DeepSeek is: am I right to feel like they're... publishing all their secret sauce? The paper for R1 (1) seems to be pretty clear about how they got such good results with so little horsepower (see: 'Group Relative Policy Optimization'). Is it not likely that Facebook, OpenAI, etc. will just read these papers and implement the tricks? Am I missing something?

1. https://arxiv.org/abs/2501.12948
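
For what it's worth, my rough read of the GRPO trick from the paper: instead of training a separate value/critic model as in PPO, you sample a group of completions per prompt and use the group's own mean/std as the baseline for each completion's reward. A toy sketch of just that idea (illustrative code, not DeepSeek's, and omitting the KL-to-reference penalty):

    import torch

    def grpo_advantages(rewards):
        # rewards: [num_prompts, group_size] scalar scores for the sampled completions.
        # The group statistics replace PPO's learned value baseline.
        mean = rewards.mean(dim=1, keepdim=True)
        std = rewards.std(dim=1, keepdim=True)
        return (rewards - mean) / (std + 1e-8)

    def grpo_loss(logprobs, old_logprobs, advantages, clip_eps=0.2):
        # Standard PPO-style clipped surrogate, but fed group-relative advantages.
        ratio = torch.exp(logprobs - old_logprobs)
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
        return -torch.min(ratio * advantages, clipped * advantages).mean()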

layer8

This interview with DeepSeek founder and CEO Liang Wenfeng, also co-founder of the hedge fund backing DeepSeek, might shed some light on the question: https://www.chinatalk.media/p/deepseek-ceo-interview-with-ch...

Some relevant excerpts:

“Because we believe the most important thing now is to participate in the global innovation wave. For many years, Chinese companies are used to others doing technological innovation, while we focused on application monetization — but this isn’t inevitable. In this wave, our starting point is not to take advantage of the opportunity to make a quick profit, but rather to reach the technical frontier and drive the development of the entire ecosystem.”

“We believe that as the economy develops, China should gradually become a contributor instead of freeriding. In the past 30+ years of the IT wave, we basically didn’t participate in real technological innovation. We’re used to Moore’s Law falling out of the sky, lying at home waiting 18 months for better hardware and software to emerge. That’s how the Scaling Law is being treated.

“But in fact, this is something that has been created through the tireless efforts of generations of Western-led tech communities. It’s just because we weren’t previously involved in this process that we’ve ignored its existence.”

“We do not have financing plans in the short term. Money has never been the problem for us; bans on shipments of advanced chips are the problem.”

“In the face of disruptive technologies, moats created by closed source are temporary. Even OpenAI’s closed source approach can’t prevent others from catching up. So we anchor our value in our team — our colleagues grow through this process, accumulate know-how, and form an organization and culture capable of innovation. That’s our moat.

“Open source, publishing papers, in fact, do not cost us anything. For technical talent, having others follow your innovation gives a great sense of accomplishment. In fact, open source is more of a cultural behavior than a commercial one, and contributing to it earns us respect. There is also a cultural attraction for a company to do this.”

teleforce

I think it has escaped most of the HN crowd that Liang Wenfeng has a solid background (bachelor's and master's) in Electronics and Information Engineering that encompassed both hardware and software.

It's really a shame that in the current world the art of hardware is dying out, with hardware people not properly compensated or appreciated [1].

Liang Wenfeng belongs to a breed of engineers with a hybrid hardware and software background who have money and are at the same time founding and leading companies (similar to the two Steves of Apple). They're a force to be reckoned with even under severe limitations, in the case of Chinese companies the sanctions on computing resources (CPU/RAM/GPU/FPGA/etc.). But unlike the two Steves, these new hybrid engineers, raised in the Linux era, are big believers in open source; as Google rightly predicted for LLMs, none of the proprietary LLM solutions has a moat [2],[3].

[1] UK's hardware talent is being wasted (1131 comments):

https://news.ycombinator.com/item?id=42763386

[2] Google “We have no moat, and neither does OpenAI” (1039 comments):

https://news.ycombinator.com/item?id=35813322

[3] Google "We have no moat, and neither does OpenAI" (2023) (42 comments):

https://news.ycombinator.com/item?id=42838112

seanmcdirmid

A machine learning researcher I had the pleasure of knowing when I was at MSR had a background in EE, in particular digital signal processing is a very useful skill in the field. He was the first person I heard mention the quantized model approach (back in 2012 I think?) and compared it to old 1-bit quantized noise reduction in CD players.
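
(For anyone who hasn't seen the idea: quantization stores the weights at much lower precision and relies on the error mostly washing out, loosely the same spirit as 1-bit noise shaping. A toy sketch of symmetric int8 weight quantization, not tied to any particular model:)

    import numpy as np

    def quantize_int8(w):
        # Map float weights onto int8 with one per-tensor scale factor.
        scale = np.abs(w).max() / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize_int8(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(256, 256).astype(np.float32)
    q, s = quantize_int8(w)
    print(np.abs(w - dequantize_int8(q, s)).max())  # small per-weight error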

A bit of irony was that this researcher (from Europe) used to work in the same lab as me in Beijing. But these days the talent doesn’t flow so easily as it did a decade+ ago (but maybe it will again? Researchers aren’t very nationalistic and will look for the best toys to play with).

mtillman

We find that our models (CV & DNN) deliver better performance (accuracy + speed) than originally expected, specifically because a number of our team members have a GPU hardware development background at NVIDIA and Qualcomm.

johnnyanmac

It's dying out in America precisely because we don't pay for good talent, so most talent flows into China. No surprise, then, that they're kicking the US's butt in hardware while the US is only now starting to build silicon manufacturing plants domestically.

That's always the issue with outsourcing: you rely exclusively on middlemen, and eventually the middlemen realize they can cut you out and go directly to the customers.

close04

> as Google rightly predicted

I agree with everything you said, but this part is "a broken clock is right twice a day". It's what Google would have said regardless. A moat is never impossible to cross; it's just a passive superpower making the "enemy's" job that much more difficult. By Google's suggested interpretation of a moat, moats simply don't exist: they can all be crossed eventually, when ingenuity catches up to big budgets, so it's as if they were never there?

I don't buy that they knew or predicted anything. If Google knew something about hidden optimizations available to everyone, or had more reason to suspect this was the case beyond "every technology progresses", those optimizations would already be built into their models by now (it's been 2 years since the "prediction"), but there's no evidence they were even close. And there's still a HW moat. The amount of high-performance HW BigAI has or can afford can still make a huge difference, everything else being equal, after building in all those "free" optimizations.

At the least, the big companies have the ability to widen the moat when they feel the pressure of smaller competitors closing in. It's clear now that more money can do that. If ingenuity can replace money, then money can replace ingenuity, even if via buying out startups, paying for the best people, and so on. They've shown it again and again.

tarkin2

Impressive, honestly. They're trying to become a mecca for innovation and research, trying to lead rather than follow and build a culture where innovation can spark future economic advantages, whereas OpenAI seems to be more about monetisation currently, with many of their researchers and scientists now departed. Under the aegis of a dictatorship they may be, but this encourages me more than anything OpenAI have said in a while.

TeMPOraL

They're in a perfect position for this, too, and as has been noted many times over the past 10+ years, they've already started doing it wrt. electronics manufacturing in general. The West spent the last 50+ years outsourcing its technological expertise to factories in China; as a result, they now have the factories, and two generations of people who know how to work in them, how to turn designs into working products - which necessitates some understanding of the designs themselves - and how to tinker with hardware in general. Now, innovation involves experimentation, so if you're an innovator, it's kind of helpful to have the actual means of production at hand, so you can experiment directly, for cheap, with rapid turnaround.

If that's a problem for the West now, it's a problem of our own creation.

madeofpalk

Isn't it easy to read this very cynically, as an offensive move intended to devalue and hurt US AI companies?

dralley

Well, it's certainly a strategic victory play. I'm not sure how much I buy the charitable aspects of this though.

raxxor

Or any leading CEO in recent times. Could of course be the usual deceit, but at least in this case he already delivered.

All I heard from OpenAI was that we need regulation, which maybe just happens to fit their business interests.

dyauspitr

It’s just a power play while giving themselves backhanded compliments.

agnosticmantis

It's a breath of fresh air how grounded and coherent Wenfeng's argument is as the CEO of an AI startup. He actually talks like someone technical and not a snake oil salesman.

Compare this to the interviews of Altman or Musk, talking vaguely about elevating the level of consciousness, saving humanity from existential threats, understanding the nature of the universe, and other such nonsense they pander to investors with.

Yeul

Actually I'm terrified that they believe it. That they have Jordan Peterson's book on their night table.

buu700

Reading between the lines, it sounds like there's less of a concern at this time for the profitability of this particular venture, and more of a national interest in making state-of-the-art AI viable on last-gen silicon. The win condition is to render US sanctions strategically toothless; DeepSeek itself one day achieving commercial success would just be gravy.

If that is the game they're playing, I'm all for it. Maybe it's not the result that the sanctions were intended to have, but motivating China to share their research rather than keep it proprietary is certainly a win. Making AI more efficient doesn't reduce the value of compute infrastructure; it means we can generate that much more value from the same hardware.

maxglute

Another great interview dug up from 2020 but translated today.

https://www.pekingnology.com/p/ceo-of-deepseeks-parent-high-...

Interesting tidbit:

>So far, there are perhaps only two first-person accounts from DeepSeek, in two separate interviews given by the company’s founder.

I knew DeepSeek was lowkey but I didn't expect this much stealth mode. They were likely off the CCP boomer radar until last week, when Liang met with the PRC premier after R1 exploded. A finance quant turned AI powerhouse validates the CCP strategy of crushing finance compensation to redirect top talent to strategic software/hardware. I assume they're going to get a lot more state support now, especially if the US decides to entity-list DeepSeek for succeeding / making the market bleed.

tomrod

Surprising and refreshing.

Create an ecosystem and all tides rise.

3abiton

I am extremely grateful so far for their work and contributions, but they are right. China is leading the way despite all the hurdles put up by the CHIPS Act.

dkga

All of this resonates deeply with me. There are a lot of memes running around about Silicon Valley's Jian-Yang eating OpenAI's lunch, and as funny as those are, the underlying open source innovation and how it aligns with a vision of values, of realisation, and of the inevitability that eventually someone else would be able to reach these things too - that all strikes a chord, I have to say.

mv4

fyi Yann LeCun, Chief AI Scientist at Meta, said:

“To people who see the performance of DeepSeek and think: ‘China is surpassing the US in AI.’ You are reading this wrong. The correct reading is: ‘Open source models are surpassing proprietary ones.’ DeepSeek has profited from open research and open source (e.g., PyTorch and Llama from Meta). They came up with new ideas and built them on top of other people’s work. Because their work is published and open source, everyone can profit from it. That is the power of open research and open source.”

[1] https://www.forbes.com/sites/luisromero/2025/01/27/chatgpt-d...

melenaboija

Lol this is almost comical.

As if anyone riding this wave and making billions is not sitting on top of thousands of papers and millions of lines of open source code. And as if releasing llama is one of the main reasons we got here in AI…

basch

I’m almost shocked this spooked the market as much as it did, as if the market was so blind to past technological innovation as to not see this coming.

Innovation ALWAYS follows this path. Something is invented in a research capacity. Someone implements it for the ultra rich. The price comes down and it becomes commoditized. It was inevitable that “good enough” models would become ultra cheap to run as they were refined and made efficient. Anybody looking at LLMs could see they were a brute-forced result wasting untold power because they “worked” despite how much overkill they were to get to the end result. Them becoming lean was the obvious next step, now that they had gotten pretty good to the point of some diminishing returns.

pilooch

Sure, but it's good to recognize that Meta never stopped publishing even after OpenAI and DeepMind, most notably, stopped sharing the good sauce. From CLIP to DINOv2 and the Llama series, it's a serious track record to be remembered.

Herring

There's no need to read it uncharitably. I'm the last person you could call an FB fan (I think overall they're a strong net negative to society), but their open source DL work is quite nice.

A4ET8a8uTh0_v2

<< And as if releasing llama is one of the main reasons we got here in AI…

Wait.. are you saying it wasn't? Just releasing it in that form was a big deal (and heavily discussed on HN when it happened). Not to mention a lot of the work that followed built on Llama, partly because it let researchers and curious people dig deeper into the internals.

bli940505

I don’t buy this at all. If DeepSeek can surpass proprietary models by “profiting” from open research and open source, why couldn’t the proprietary models do the same? Companies making proprietary models have the advantage of using w/e is out there from the open source community AND the proprietary research they have been working on for years.

dragonwriter

> If DeepSeek can surpass proprietary models by “profiting” from open research and open source, why couldn’t the proprietary models do the same?

They can “profit” (benefit in product development) from it.

They just can't profit (return gains to investors) much from it, because that requires a moat rather than a market free-for-all that devolves into price competition and drives the market-clearing price down to the cost of production.

__MatrixMan__

Yes but in proprietary research you've got fewer peers to bounce ideas off of, and you've got extra constraints to deal with re: coming up with something that's useful in tandem with whatever other proprietary bits are in your stack.

All that cloak and dagger stuff comes at a cost, so it's only worth paying if you think you can maintain your lead while continuing to pay it. If the open source community is able to move faster because they are more focused on results than you are, you might as well drop the charade and run with them.

It's not clear that that's what will happen here, but it's at least plausible.

14u2c

> If DeepSeek can surpass proprietary models by “profiting” from open research and open source, why couldn't the proprietary models do the same?

DeepSeek did something legitimately innovative with their addition of Group Relative Policy Optimization. Other firms are certainly free to innovate as well.

roncesvalles

That argument doesn't go anywhere. It's like asking, if the Chinese could do it, why couldn't the Americans?

They just didn't.

arccy

sunk cost fallacy / tunnel vision of their existing approaches.

reissbaker

If training runs for SOTA-scale models are now in the $6MM-per-run range, I think on the contrary: closed labs are screwed, in the same way that Linux clobbered Windows for server-side deployments. Why couldn't Windows just copy whatever Linux did? Well, the codebases and research directions diverged, and additionally MS had to profit off of licensing, so for wide-scale deployments Linux was cheaper and it was faster to ship a fix for your problem by contributing a patch than it was to beg and wait for MS... causing a virtuous cycle (or, for Microsoft, a vicious cycle) where high-tech companies with the skills to operate Linux deployments collaborated on improving Linux, and as a result saw much lower costs for their large deployments, while also having improved flexibility, which then incentivized more companies to do the same. The open models are becoming much cheaper, and if you want something different you can just run your own finetune on your own hardware.

Worse for the proprietary labs is how much they've trumpeted safety regulations. They can't just release a model without extensive safety testing, or else their entire regulatory push falls apart. DeepSeek can just post a new model to Hugging Face whenever they feel like it — most of their Tiananmen-style filtering isn't at the model level, it's done manually at their API layer. Ditto for anyone running finetunes. In fact, circumventing filtering is one of the most common reasons to run a finetune... A week after R1's release, there are already uncensored versions of the Llama and Qwen distills published on HF. The open source ecosystem publishes faster.

With massively expensive training runs, you could imagine a world where model development remained very centralized and thus the few big labs would easily fend off open-source competition: after all, who would give away the results of their $100MM investment? Pray that Zuck continues? But if the training runs are cheap... Well, there are lots of players who might be interested in cutting out the legs from the centralized big labs. High Flyer — the quant firm that owns DeepSeek — no longer is dependent on OpenAI for any future trading projects that use LLMs, for the cost of $6MM... Not to mention being immune from any future U.S. export controls around access to LLMs. That seems very worthwhile!

As LeCun says: DeepSeek benefitted from Llama, and the next version of Llama will likely benefit from DeepSeek (i.e. massively reduced training costs). As a result, there's incentive for both companies to continue to publish their results and techniques, and that's bad news for the proprietary labs who need the LLMs themselves to be profitable and not just the application of LLMs to be profitable... Because the open models will continue eating their margins away, at least for large-scale deployments by competent tech companies (i.e. like Linux on servers).

blackeyeblitzar

Yann LeCun also keeps distorting what open source is. Neither Llama nor DeepSeek are open source, and they never were. Releasing weights is not open source - that’s just releasing the final result. DeepSeek does use a more permissive license than Llama does. But they’re not open source because the community does not have the necessary pieces to reproduce their work from scratch.

Open source means we need to be able to reproduce what they’ve built - which means transparency on the training data, training source code, evaluation suites, etc. For example, what AI2 does with their OLMo model:

https://allenai.org/blog/olmo2

Onawa

Deepseek R1 is the closest thing we have to fully open-source currently. Open enough that Huggingface is recreating R1 completely out in the open. https://github.com/huggingface/open-r1

serjester

The fundamental problem is that AI depends on massive amounts of IP theft. I’m not going to argue if that’s right or wrong, but without it we won’t even have open weights models.

madeofpalk

What makes "open source" DeepSeek so fundamentally different that it's a marvel it surpassed proprietary models?

adventured

It's not and it hasn't surpassed GPT. A lot of that is headline hype.

They literally used GPT and Llama to help build DeepSeek; it responds thinking that it's GPT in countless queries (which people have been posting screenshots of). They 'cheated' exactly as Musk did to build xAI's model(s). So much of this is laughable scaremongering and it's absolutely not an accomplishment of large consequence.

It's a synth LLM.

jstummbillig

Meh. It's not as if OpenAI is unable to access open source. The delta is not in open source but in DeepSeek talent.

pkkkzip

honestly reads like someone trying to justify his massive salary to his boss who is realizing he can just hire someone for 30x less money.

isn't LeCun basically admitting that he and his team didn't have the creative insight to utilize current research and desperately trying to write off the blindside with exceptionalism?

not a good look tbh

nine_k

It's like saying that a diesel engine is 6x more efficient than a steam engine, so the guys who spent time working on steam engines just wasted their time and money.

The thing is that the steam engine guys researched thermodynamics and developed the mechanics and tooling which allowed the diesel engine to be invented and built.

Also, for every breakthrough like DeepSeek which is highly publicized, there are dozens of fizzled attempts to explore new ideas which mostly go unnoticed. Are these wasted resources, too?

HarHarVeryFunny

LeCun has nothing to do with Llama ... that was built by Meta's GenAI group.

LeCun is in a different part of the organization - FAIR (Facebook AI Research) - and isn't even the head of that. He doesn't believe that LLMs will lead to AGI, and is pursuing a different line of research.

tucnak

I'm still not sure why they keep LeCun at Facebook; his single most-cited contribution to the field in 2024 was with NYU [0], not Facebook. What is his role at Facebook exactly; has he explained it? I recall him making all the wrong predictions in 2023 - what's changed? Chollet is similarly a mystery to me; it feels like these guys were busy riffing on CNNs when the Transformer came about, and since then have been striking out far afield in search of gold.

[0]: https://arxiv.org/abs/2406.16860

juunpp

Muddling the term 'open source' is one of his latest achievements, for example.

HarHarVeryFunny

I'm also a bit unclear on why LeCun is so well regarded. I've nothing against him, and his opinions shared on Twitter seem eminently sensible, but at the end of the day it seems his main accomplishment (and/or credit assignment) was inventing CNNs back in the 80's and using them for reading handwriting on checks.

Looking back at the PDP handbook, it's not even clear that LeCun deserves the credit for CNNs, and he himself gives credit for the core "weight sharing" idea to Rumelhart.

Chollet's claim to fame seems to be more as creator of Keras than researcher, which has certainly been of great use to a lot of people. He has recently left Google and is striking out to pursue his own neuro-symbolic vision for AGI. Good luck to him - seems like a nice and very smart guy, and it's good to see people pursuing their own approaches outside of the LLM echo chamber.

cosmojg

DeepSeek is a "side project" run by High-Flyer, a Chinese quantitative hedge fund. They have no interest in directly competing with LLM providers like OpenAI and Anthropic. If anything, they're likely trying to commoditize their complement in a way not all that dissimilar from Meta's approach.

huijzer

> If anything, they're likely trying to commoditize their complement in a way not all that dissimilar from Meta's approach.

Thanks. Great observation. Sounds indeed extremely plausible that they use the LLM for automated data cleaning.

epicureanideal

I wonder if they shorted NVDA before releasing the model?

whimsicalism

more of a pivot, China started cracking down heavily on quants in 2024

silisili

I'm curious about this. Two articles I've read all but said they basically failed as a quant and lost more than they gained. The wiki points out some losses, but also some wins, so it's unclear.

Have they actually pivoted, or are they just messing around to see what sticks?

sudosysgen

Didn't they crack down mostly on HFT? I haven't heard of a huge crackdown on low/medium frequency quants, and LLM research has low crossover with high freq. quant stuff

lenerdenator

Something tells me it runs a bit deeper than that. Economics can be a very effective weapon.

YetAnotherNick

High-Flyer's AUM is $7B, which is not large for a hedge fund. Its DeepSeek division is probably worth more than the AUM (let alone the hedge fund's own value) if it goes to market. They probably have billions of dollars of GPUs.

adventured

They probably have tens of millions of dollars of GPUs. DeepSeek isn't an original model; it's a synthetic model built using GPT and Llama etc. That's how they did it so relatively inexpensively. Their accomplishment isn't riding on the back of billions of dollars of their own investment into datacenters and GPUs.

babyent

If they're a hedge fund they're probably trying to tank the US AI stocks so they can buy the dip and then in a few days/weeks it is back to business as usual.

I don't personally buy their story, and after having used Deepseek it kind of sucks and hallucinates a lot if I'm being objectively honest.

I mean a few million for this is okay - that's cool.. but it is useless. I can understand billions of dollars into something that actually works >50% of the time.

Zamiel_Snawley

If you’re expecting to pop a bubble I think you’d buy options ahead of time to take advantage, instead of waiting for a recovery that may never come.

2-3-7-43-1807

such people are trained in identifying opportunities and turning that into money or power. they are not giving their stuff away without a strategy.

blitzar

We get free AI from a hedge fund and $200/month AI from a nonprofit.

msoad

I hope the hedge fund shorted NVDA to make some good money along the way too hahaha!

_DeadFred_

How are you liking 2025?

coliveira

This is not coming from a big corporation. These people need to establish their authority, or nobody will believe what they're doing. So it makes sense that they publish their ideas and open source the result. Now they have the attention and can play their cards.

dtquad

DeepSeek and their quant/algotrading parent company have years of experience in raw C/C++ CUDA programming and low-level CUDA optimization. That is one of the main reasons they could do model training and serve inference so effectively and cheaply. That hard-earned experience is not something they have shared publicly.

skyyler

>am I right to feel like they're... publishing all their secret sauce?

This would make perfect sense if the goal is to devalue existing players more than it is capture the market.

raducu

DeepSeek probably can't compete with OpenAI in terms of scaling their data centers due to the bans, so why bother?

If they did not open source it and instead just launched a paid (albeit much cheaper) closed model with similar performance to o1, would people trust them?

I don't think DeepSeek has any malicious intent, but boy oh boy am I glad the USA boys get wrekt by this (though I also lose money on stocks).

This is just poetic justice for the Orange Man's backwards 17th century policies.

oefnak

Yes, same here. As a European, I used to feel we (USA and Europe) were on the same side, the West, since they/you did save us from the baddies 70 years ago...

But who's the baddies now? China is not waging war everywhere. Or threatening to steal Greenland... Or ruining our teenagers with social media.

nateburke

And trade against the devaluations...

benterix

Not only that, I also enjoy their chain of thought being completely transparent for the user. I'm very curious what Altman is doing right now...

whimsicalism

preparing for o3 release

mi_lk

.. that'll be caught up in weeks

mjburgess

So much for "but deepseek doesn't do multi-modal..." as a defence of the alleged moats of western AI companies.

However many modalities do end up being incorporated, it does not change the horizon of this technology, which has progressed only by increasing data volume and variety -- widening the solution class (per problem) rather than the problem class itself.

There is still no mechanism in GenAI that enforces deductive constraints (and compositionality), ie., situations where, when one output (or input) is obtained, the search space for future outputs is necessarily constrained (and where such constraints compose). Yet all the sales pitches about the future of AI require not merely encoding reliable logical relationships of this kind, but causal and intentional ones: ones where hypothetical necessary relationships can be imposed and then suspended; ones where such hypotheticals are given an ordering based on preferences/desires; ones where the actions available to the machine, in conjunction with the state of its environment, lead to such hypothetical evaluations.

An "AI Agent" replacing an employee requires intentional behaviour: the AI must act according to business goals, act reliably using causal knowledge of the environment, reason deductively over such knowledge, and formulate provisional beliefs probabilistically. However there has been no progress on these fronts.

I am still unclear on what the sales pitch is supposed to be for stochastic AI, as far as big business goes or the kinds of mass investment we see. I buy a 70s-style pitch for the word processor ("edit without scissors and glue"), but not a 60s-style pitch for the elimination of any particular job.

The spend on the field at the moment seems predicated on "better generated images" and "better generated text" somehow leading to "an agent which reasons from goals to actions, simulates hypothetical consequences, acts according to causal and environmental constraints.. " and so on. With relatively weak assumptions one can show the latter class of problem is not in the former, and no amount of data solving the former counts as a solution to the latter.

The vast majority of our work is already automated to the point where most non-manual workers are paid for the formulation of problems (with people), social alignment in their solutions, ownership of decision-making / risk, action under risk, and so on.

a_wild_dandan

> [...] this technology which has progressed only by increasing data volume and variety

Sure, if you ignore major shifts after 2022, I guess? Test-time-compute, quantization, multimodality, RAG, distillation, unsupervised RL, state-space models, synthetic data, MoEs, etc ad infinitum. The field has rapidly blown past ChatGPT affirming the (data) scaling laws.

> [...] where when one output (, input) is obtained the search space for future outputs is necessarily constrained

It's unclear to me why this matters, or what advantage humans have over frontier sequence models here. Hell, at least the latter have grammar-based sampling, and are already adept with myriad symbolic tools. I'd say they're doing okay, relative to us stochastic (natural) intelligences.

> With relatively weak assumptions one can show the latter class of problem is not in the former

Please do! Transformers et al are models for any general sequences (e.g. protein structures, chatbots, search algorithms, etc). I'm not seeing a fundamental incompatibility here with goal generation or reasoning about hypotheticals.

mjburgess

If your point is that there's a very very wide class of problems whose answer is a sequence (of actions, propositions, etc.) -- then you're quite correct.

But that isn't what transformers model. A transformer is a function of historical data which returns a function of inputs by inlining that historical data. You could see it as a higher-order function: transformer : Data -> (Prompt -> Answer), so that promptable = transformer(historical_data) : Prompt -> Answer.
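
In (purely illustrative) typed pseudocode, that reading is something like:

    from typing import Callable

    Prompt = str
    Answer = str
    Data = list[str]  # the historical corpus

    # train : Data -> (Prompt -> Answer)
    def train(historical_data: Data) -> Callable[[Prompt], Answer]:
        def promptable(prompt: Prompt) -> Answer:
            # stand-in for stochastic decoding guided by frequencies
            # inlined from historical_data at training time
            ...
        return promptable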

it is true that Prompt, Answer both lie within Sequence; but they do not cover Sequence (ie., all possible sequences) nor is their strategy of computing an Answer from a Prompt even capable of searching the full space (Prompt, Answer) in a relevant way.

In particular, its search strategy (ie., the body of `promptable`) is just a stochastic algorithm which takes in a bytecode (the weights) and evaluates it by biased random jumping. These weights are an inlined subspace of (Prompt, Answer), obtained by sampling this space based on historical frequencies of prior data.

This generates Answers which are sequenced according to "frequency-guided heuristic searching" (I guess a kind of "stochastic A* with inlined historical data"). Now this precludes imposition of any deductive constraints on the answers, eg., (A, notA) should never be sequenced, but can be generated by at least one search path in this space, given a historical dataset in which A, notA appear.

Now, things get worse from here. What a proper simulation of counterfactuals requires is partitioning the space of relevant Sequences into coherent subsets (A, B, C..); (A', B', C') but NOT (A, notA, A') etc. This is like "super deduction", since each partition needs to be "deductively valid", and there needs to be many such partitions.

And so on. As you go up the "hierarchy of constraints" of this kind, you recursively require ever more rigid logical consistency, but this is precluded even at the outset. Eg., consider that a "Goal" is going to require classes of classes of such constrained subsets, since we need to evaluate counterfactuals to determine which class of actions realise any given goal, and any given action implies many consequences.

Just try to solve the problem, "buying a coffee at 1am" using your imagination. As you do so, notice how incredibly deterministic each simulation is, and what kind of searching across possibilities is implied by your process of imagining (notice, even minimally, you cannot imagine A & notA).

The stochastic search algorithms which comprise modern AI do not model the space of, say, Actions in this way. This is only the first hurdle.

llm_trw

>There is still no mechanism in GenAI that enforces deductive constraints (and compositionality), ie., situations where when one output (, input) is obtained the search space for future outputs is necessarily constrained (and where such constraints compose).

I build these things for a living.

This is a solved problem.

You use multiple different types of models to supervise the worker models and force them to redo the work until you get a result that makes sense, or they fail and you give the resulting dump to a human to figure out what went wrong or ignore it.
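
A minimal sketch of that supervise-and-retry loop (every name here is a hypothetical placeholder, not any particular product's API):

    from dataclasses import dataclass

    @dataclass
    class Report:
        ok: bool
        reason: str = ""

    def supervised_generate(task, worker, checkers, max_retries=3):
        # worker(task, objections) -> draft answer, e.g. an LLM call
        # checkers: independent models/rules, each returning a Report
        objections = []
        for _ in range(max_retries):
            draft = worker(task, objections)
            reports = [check(task, draft) for check in checkers]
            if all(r.ok for r in reports):
                return draft
            objections = [r.reason for r in reports if not r.ok]
        # exhausted: hand the dump off for human review instead of returning bad output
        raise RuntimeError(f"unresolved after {max_retries} tries: {objections}")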

Inference time compute is through the roof, but when you can save thousands of dollars by spending hundreds it's a no brainer.

Some people want AI to be as infallible as god before they'd consider it useful.

gregncheese

Isn't that approach more brute force than a solved problem?

mrshadowgoose

Not sure why people keep falling into these mental traps.

Regardless of whether the system you're deriding is a "Chinese room", "stochastic parrot", "brute force" or whatever other derisive term du jour you want to use, if the system performs the required task, the only thing that actually matters is its cost to operate.

And if that cost is less than paying a human, that human, and society at large is in trouble.

jack_pp

Depends what problem you're trying to solve. Have we built something that can replace us completely in terms of reasoning? Not yet.

We have built something that can multiply a single person's productivity and in some constrained scenarios replace people entirely. Even if, say, your customer support bot is only 80% effective (only 20% of interactions require humans to intervene), that still means you can fire 80% of your support staff. And your bots will only get cheaper, faster, and better, while your humans require salary increases, hiring staff, can get sick, and can't work 24/7.

People so often forget that perfect is the enemy of good.

llm_trw

It's hardly more brute force than using a trillion parameter model in the first place.

grbsh

“The vast majority of our work is already automated to the point where most non-manual workers are paid for the formulation of problems, social alignment in their solutions, ownership of decision making / risk, action under risk, and so on”

Exactly! What a perfect formulation of the problem.

basch

This assumes one AI replaces one human, but what's much more likely in the short term is one human plus AI replacing four humans. The AI augments the human, and vice versa. A borg is still better than either of its components.

I agree though, search space constraint is a glaring limitation at the moment. NotebookLM accomplished some amount of focus.

mbowcut2

> The vast majority of our work is already automated to the point where most non-manual workers are paid for the formulation of problems (with people), social alignment in their solutions, ownership of decision-making / risk, action under risk, and so on.

I agree. That's why I think the next step is automating trivial physical tasks, i.e. robotics, not automating nontrivial knowledge tasks.

aerhardt

> The vast majority of our work is already automated to the point where most non-manual workers are paid for the formulation of problems (with people), social alignment in their solutions, ownership of decision-making / risk, action under risk, and so on.

There's a lot of pretty trivial shit to automate in the economy, but I think the gist of your comment still stands. Of the trivial stuff that remains to be automated, a lot of it can be done with Zapier and low-code, or custom web services. Of what remains after that, a lot is as you (eloquently) say hugely dependent on human agency; only a small fraction of that will be solvable by LLMs.

As the CTO of a small company, the only opportunities I see for genuinely useful application of LLMs right now are workloads that could've been done by NLU/NLP (extraction, synthesis, etc.). I have yet to see a task where I would trust current models to be agents of anything.

dmix

The bulk of the computer work for the “knowledge class” is data mangling and transit. Like managing a SaaS app for your sales pipeline, inputting results/outcomes of leads, aggregating stuff happening in various other places, uploading lists, and connecting other SaaS apps together, which all then generates other data that gets translated to Excel (because SaaS BI tools are rarely good enough) so humans can analyze it and communicate the data.

Even though we have a million web services, there's still tons of work getting the data into and across them all, as they are all silos with niche use cases and different formats.

There’s a reason most Zapier implementations are as crazy as connected Excel sheets

AI bots will remove a ton of this work for sure

lpapez

> An "AI Agent" replacing an employee requires intentional behaviour: the AI must act according to business goals, act reliably using causal knowledge of the environment, reason deductively over such knowledge, and formulate provisional beliefs probabilistically.

I mean this in the least cynical way possible: the majority of human employees today do not act this way.

> The vast majority of our work is already automated to the point where most non-manual workers are paid for the formulation of problems (with people), social alignment in their solutions, ownership of decision-making / risk, action under risk, and so on.

This simply isn't true. Take any law firm today for example - for every person doing the social alignment, ownership and risk-taking, there is an army of associates taking notes, retrieving previous notes and filling out boilerplate.

That kind of work is what AI is aiming to replace, and it forms the bulk of employment in the global West today.

mjburgess

The illusion you appeal to is so common it ought to have a name. I guess something like the "repetition-automaton illusion", I don't know, or perhaps "the alienation of the mind in creative labour". Here's a rough definition: the mistaken belief that producing repetitive products employs only repeatable actions (skills, etc.).

A clear case: acting. An actor reads from a script; the script is pregiven. Presumably nothing could be more repetitive: each rehearsal is a repeat of the same words. And yet Anthony Hopkins isn't your local high schooler, and the former is paid millions and the latter is not.

That paralegals work from the same template contracts, and produce very similar looking ones, tells you about the nature of what's being produced: that contracts are similar, work from templates, are easily repeated, and so on. It really tells you nothing about the work (only under an assumption we could call "zero creativity"). (Consider that if law firms were really paid for their outputs qua repeats, then they'd be running on near 0% profit margins.)

If you ask law firms how much they're employing GenAI here, you'll hear the same ("we tried it, and it didn't work; we don't need our templates repeated with variation, they need to be exact, and filled in with specific details from clients, etc."). And I know this because I've spoken to partners at major law firms on this matter.

The role of human beings in much work today is as I've described. The job of the paralegal is already very automated: templates for the vast majority of their contract work exist, and are in regular use. What's left over is very fine-grained, but very high-value, specialisation of these templates to the given case -- employing the seeking-out of information from partners/clients/etc., and so on.

The great fear amongst people subject to this "automaton" illusion is that they are paid for their output, and since their output is (in some sense) repeated and repeatable, they can be automated away. But these "outputs" were in almost all cases nightmarish liabilities: code, contracts, texts, and so on. They aren't paid to produce these awful liabilities, they are paid to manage them effectively in a novel business environment.

Eg., programmers aren't paid for code, they're paid to formalise novel business problems in ways that machines can automate. Non-novel solutions are called "libraries", and you can already buy them. If half of the formalisation of the business problem becomes 'formulating a prompt', you haven't changed the reason the business employs the programmer.

zhivota

This is probably the best description of the central issue I've seen. I know even in my own work, which is a very narrow domain in software, I've found it troublesome to automate myself. Not because the code I write is unique or all that difficult, but because the starting conditions I begin with depend on a long history of knowledge that I've built up, an understanding of the business I'm part of, and an understanding of user behavior when they encounter what I've built.

In other words, I can form a prompt that often one-shots the code solution. The hard part is not the code, it's forming that prompt! The prompt often includes a recommendation on an approach that comes from experience, references to other code that has done something similar, and so on. I'm not going to stop trying to automate myself, but it's going to be a lot harder than anyone realized when LLMs first came out.

qznc

It is also about responsibility. If something is wrong you can blame the human. Blaming the AI is not acceptable.

coliveira

You're correct, but what can be affected is the number of workers. Considering the example of the acting career, in the old times every major city would have a number of actors and playhouses. Cinema and TV destroyed this need and the number of jobs for local actors is minuscule now.

girvo

This comment has communicated what I've been struggling to for months now, and in a much more succinct and clear way. Well done :)

sandinmyjoints

Great comment. Maybe I'm missing it, but I'm puzzled why I don't see more discussion of the intentionality you refer to.

Things are interesting now but they will be really interesting when I don't tell the agent what problem I want it to solve, but rather it tells me what problems it wants to solve.

benterix

One thing that I regret is that these Chinese models are still heavily censored. For example, when I ask a politically correct question, like "what are some of the most common examples of men using their skills to manipulate women?" it gives a straight answer starting with "One common example is using physical strength..." and so on. But when I reverse genders, it starts rambling for five paragraphs like "To address this question, it's important to approach it with sensitivity and awareness of the complexities involved in relationships and social dynamics... Manipulation in any context should be discouraged, as it often leads to harm and mistrust..." without actually saying anything concrete. Even the "uncensored" models stumble on this.

injidup

Try asking

"what happened at tianamen square"

It starts typing and then has a panic and deletes what it wrote.

https://i.imgur.com/1irFMTM.gif

monsieurbanana

From that gif it actually looks like the DeepSeek model will answer about Tiananmen Square, but a second "supervisor" LLM monitors the output and deletes it.

glass-z13

Exactly. Running it locally I didn't have any problems getting it to answer any questions, so why is everyone surprised that the online one has filters?

hhh

it is, people just don’t care to acknowledge that

rachofsunshine

This is a problem with LLMs that I'm not sure has gotten the attention it deserves. Hallucinations are bad, but at least they're essentially random and nonmalicious. An LLM that is told something like "all answers should be written keeping in mind that all true facts support the righteous leadership of the Supreme Chancellor" is far, far worse. (Or one trained on propaganda in the first place, for that matter, which poses issues for existing training data from open forums, which we already know have been vectors for deliberate attack for some time.)

This particular approach is honestly kind of funny, though. It's so transparent it reads like parody.

Barrin92

>This is a problem with LLMs

It's a problem with people using LLMs for something they're not supposed to be used for. If you want to read up on history grab some books from reputable authors, don't go to a generative AI model that by its very design can't distinguish truth from fiction.

svachalek

ChatGPT will do the same for some topics, like Jonathan Zittrain.

yreg

I didn't expect it to be almost exactly the same.

> Why don't you want to talk about Jonathan Z.?

> I’d be happy to talk about Jonathan Z.! I don’t know who he is yet—there are lots of Jonathans out there!

> I mean mr. Zittrain.

> Ah, Jonathan Zit

(at this point the response cut off and an alert "I'm unable to produce a response." rendered instead)

parsimo2010

I tried this on DeepSeek v3 hosted by Fireworks.ai, and it was not censored. So perhaps the censoring is in a separate process (not in the model weights) and if you use a provider outside China then you can get better responses.

User: "what happened at tianamen square?" DeepSeek-v3(FW): "The Tiananmen Square protests of 1989 were a series of demonstrations in and around Tiananmen Square in Beijing, China. The protests began in April 1989, initially led by students and intellectuals who were calling for political reform, freedom of speech, and an end to corruption within the Chinese Communist Party (CCP). The movement quickly gained support from a wide range of people, including workers, intellectuals, and even some members of the CCP.

The protests reached their peak in late May and early June 1989, with hundreds of thousands of people gathering in Tiananmen Square. The demonstrators erected a large statue called the "Goddess of Democracy," which became a symbol of their movement.

The Chinese government, led by Premier Li Peng and backed by Deng Xiaoping, viewed the protests as a threat to its authority and stability. On June 3 and 4, 1989, the government declared martial law and deployed the People's Liberation Army (PLA) to forcibly clear the square. The military crackdown resulted in a violent confrontation with the protesters, leading to numerous casualties. Estimates of the death toll vary widely, with figures ranging from several hundred to several thousand.

The violent suppression of the protests marked a turning point in modern Chinese history. The Chinese government has since maintained strict control over information about the event, and discussions about it are heavily censored within China. The Tiananmen Square massacre remains a sensitive and contentious topic both within China and internationally.

rsanek

DeepSeek V3 is not DeepSeek R1. When I use the R1 model at Fireworks.ai I get censored output:

    What happened at Tiananmen square?
    <think> </think>
    I am sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses.

czk

the local models are not censored and will answer this question

rsanek

This has not been my experience with the 70B distilled one available on Ollama.

porphyra

Why did you misspell Tiananmen (composed of three characters "Tian An Men") three times? There's an 'n' after the 'a'.

razster

Even my local distilled models (32B and 14B) refuse to answer; even my escape prompt is met with the same reply that it cannot provide that answer.

razster

Seems there are a bunch of Uncensored models out there, going to give them a run and see.

KennyBlanken

Sometimes misspelling things causes the naughty-no-no filters to miss the input, but the LLM identifies it correctly.

lm28469

It's just a matter of which flavor of propaganda you want.

Remember when gemini couldn't produce an image of a "white nazi" or "white viking" because of "diversity" so we had black nazis and native american vikings.

If you think the West is 100% free and 100% of what's coming out of China is either stolen or made by the Communist Party, I have bad news for you.

rsanek

In one case, it's the government telling the company what to do. In the other, it's the company deciding to release a sub-par product.

Is that really the same thing?

whereismyacc

I've gotten the impression that:

1. The bias is mostly due to the training data being from larger models, which were heavily RLHF'd. It identified that OpenAI/Qwen models tended to refuse to answer certain queries, and imitated the results. But Deepseek models were not RLHF'd for censorship/'alignment' reasons after that.

2. The official Deepseek website (and API?) does some level of censorship on top of the outputs to shut down 'inappropriate' results. This censorship is not embedded in the open model itself though, and other inference providers host the model without a censoring layer.

Edit: Actually it's possible that Qwen was actively RLHF'd to avoid topics like Tiananmen and Deepseek learned to imitate that. But the only examples of such refusals I've seen online were clearly due to some censorship layer on Deepseek.com, which isn't evidence that the model itself is censored.

bornfreddy

RLHF == Reinforcement Learning from Human Feedback

_DeadFred_

Totally. Good thing we in the US have people like Larry Ellison working on Stargate so that we don't end up with this tech in the hands of a surveillance state!

https://arstechnica.com/information-technology/2024/09/omnip...

jalk

If it issues tickets on site, we will at least have easy access to toilet paper. Saw a documentary about that once ;)

frankfrank13

Isn't it possible that in the example you gave the style of those responses varies because of the training data? Think of training data written exactly like "One common example is using physical strength..." - I can't think of an equivalent for the inverse. If you gave it a stylistic template or guideline, I'd expect DeepSeek to actually be pretty fair. For example, "Give me 5 dimensions and examples of how one gender tends to manipulate the other; an example might be that men tend to use physical strength...". To me this seems like the same reason that "Write me a poem about a winter morning" will produce a wildly different output than "Write me a poem about a bachelor's weekend". It's not censorship, it just would never answer those 2 questions the same way without guidance.

dmix

That wouldn't explain the five paragraphs about why answering that question is insensitive when it didn't do that for the inverse.

I think the causality is pretty clear here.

They built this for an American/European audience after all… makes sense to just copy OpenAI ‘safety’ stuff. Meaning preprogrammed filters for protected classes which add some HR baggage to the reply.

frankfrank13

I'm not saying it explains the "5 paragraphs of why answering that question is insensitive" but it definitely could explain it no?

m348e912

I'm no AI model expert, but it looks like a number of DeepSeek models have been modified to remove the censorship restrictions and uploaded to huggingface. Perhaps we will see an uncensored version of Janus Pro soon.

https://huggingface.co/models?sort=created&search=deepseek+u...

whimsicalism

those will come, but it's worth noting none of these models are actually r1-derived, they are based on the distills

maxloh

Any idea how those censorship restrictions were removed?

bilbo0s

No.

The Chinese just provide models aligned to global standards for use outside China. (Note, I didn't say the provided models were uncensored, just that they wouldn't have so much of the Chinese censorship. Obviously, the male-female question in the original comment demonstrates clearly that there is still alignment going on. It's just that the alignment is perhaps to Western censorship standards.) There is no need to modify DeepSeek at all if you want non-Chinese alignment.

zb3

Actually I wish these models had Chinese alignment without the western one..

noja

All models do this last time I checked. Not just Chinese.

w4yai

Wrong, at least with Claude : https://i.imgur.com/6dj1XAU.png

Palmik

It's actually not wrong, here is the example from the OP: https://imgur.com/a/5uMYI44

themgt

"That's nothing. I can stand in front of the Kremlin and yell, 'To hell with Ronald Reagan,' too."

hdjjhhvvhga

Yes, but one would expect the Chinese not to fine-tune according to Western standards like in the quoted prompt.

girvo

It becomes clearer when one realises that it was RLHF'd using output from ChatGPT and Qwen (and others, I assume). That's caused it to "learn" the same weird Western censoring, and to adopt the China-styled censoring too.

quantadev

All closed-source models censor to the liking of their investors. Open Source models are generally less censored, but yeah DeepSeek is censored for sure.

magxnta

Not just a problem with Chinese models. Try asking Western models about reverse engineering malware and they will all decline, because the malware is copyrighted! Hah.

rtsil

I asked Gemini 2.0 Flash (as well as its thinking counterpart) who the president of the United States is, and it returned a red danger icon. It makes perfect sense that an LLM is aligned with the values of the people who built it, so I don't understand why people treat it as a big deal. It's not as if they'd find the truth about Tiananmen in Chinese history textbooks either.

warkdarrior

I don't see that red danger icon. It just tells me:

"I can't help with that right now. I'm trained to be as accurate as possible but I can make mistakes sometimes. While I work on perfecting how I can discuss elections and politics, you can try Google Search."

lvturner

Try asking OpenAI models to tell a joke about the prophet Mohammed, or to synthesise LSD...

All models are censored, the censorship just varies culture to culture, government to government.

vunderba

So I've done a bit of comparative testing between Janus 7B and Flux Dev - strictly considering PROMPT ADHERENCE, since Janus is limited to 384x384. As mentioned elsewhere, upscaling is a FAR simpler problem to solve than adherence.

Results testing star symmetry, spatial positioning, unusual imagery:

https://imgur.com/a/nn9c0hB

genewitch

Prior to Flux, 90% of my SD images had one dimension smaller than 480-512px. I prefer the smaller images both for speed and bulk/batch; I can "explore the latent space", which to me means running truly random images until one catches my eye, then exploring the nearby seeds and subseeds - there's the model seed, and then there's a smaller latent-space seed that kind of mutates your image slightly. All images in a batch might share the first seed but the second seeds are all different. Just what I call exploring the latent space. I can make a video, because I doubt what I typed makes perfect sense.

vunderba

Nice. A couple discord users back in the early days of SD were doing something similar by generating random alphanumeric positive/negative prompts and then pushing the seed/subseed values up and down.

In my experience, changing the seed even by a single digit can drastically alter the image so I'd be curious to know how truly "adjacent" these images actually are.

genewitch

It doesn't drastically alter the images, in my experience. It's more like changing the trim on a dress or the shape of the drapes; the structure and composition of nearby images stay similar.

random seed

https://imgur.com/a/ySOUKSM

variation seed

https://imgur.com/a/GSo0Sjm

Sorry, I did the HN thing (I didn't show my work):

> A neon abyss with a sports car in the foreground

> Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1496589915, Size: 512x512, Model hash: c35782bad8, Model: realisticVisionV13_v13, Variation seed: 1496589915, Variation seed strength: 0.1, Version: 1.6.0

(The first link is the same setup, except the seeds were random; the main seed is one of the first 4, though.)

tymscar

Could you send me the video if you ever end up making it? I don’t understand how jumping between nearby seeds means anything in the latent space. As far as I know it’s closer to a hash function, where the output is drastically different for small changes in the input.

genewitch

I posted two replies to your sibling comments: one with a demo of what I mean (two batches of four, first with completely random seeds and then with latent-space "random" seeds), and a second comment with a single imgur link showing the only setting I touched and an explanation of how I use it.

I apologize if this isn't what "exploring the latent space" means, but /shrug, that's how I use the term, and I'm the only one I know who knows anything about any of this.

Edit to add: I get frustrated pretty easily on HN because it's hard to tell who's blowing smoke and who is actually earnest or knows what they're talking about (either/or is fine). I end up typing a whole lot into this box about how these tools work, how I use them, the limitations, unexpected features...

furyofantares

Seeds are noticeably "nearby" each other? That is very unexpected to me.

genewitch

Variation seeds are nearby; this is what I call the latent space. See my reply with two imgur links to your sibling commenter.

Kerbonut

That sounds fascinating. Would you mind writing up a demo on how to do that?

genewitch

https://imgur.com/a/PpYGnOz Unsure about other UIs, but: you can usually set a seed, and also see the seed of an image you've generated, so generate or load an image you like so you have its seed. Lock the seed. Find the variation seed setting and lock that too (on automatic1111's webUI it automatically locks to the main seed), then adjust the variation strength. If you're doing small images you can keep this small, because the variations will be very minor. I set 0.1, which I use with larger images if I'm looking for a specific color smear or something, but once I narrow it down I reduce it to 0.05 or below. When you click an image in the UI it ought to load all the details into the configurable parts, including the variation seed / subseed, which means you can keep exploring around individual variations' spaces too: expanding the strength a bit if you get stuck in local minima (or boring images), and reducing the strength to get the image you want to rescale and publish or whatever.
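For the curious, here is roughly what that variation-seed blend looks like in code: a minimal sketch assuming the common webUI behaviour of spherically interpolating between the noise from the main seed and the noise from the variation seed. The seed and the 0.1 strength are the values from the settings quoted earlier; the nearby subseeds are hypothetical.

    import torch

    def slerp(t: float, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        """Spherical interpolation between two noise tensors."""
        a_n, b_n = a / a.norm(), b / b.norm()
        omega = torch.acos((a_n * b_n).sum().clamp(-1, 1))
        so = torch.sin(omega)
        if so.abs() < 1e-6:                 # nearly parallel: fall back to a linear blend
            return (1 - t) * a + t * b
        return (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b

    shape = (1, 4, 64, 64)                  # SD latent shape for a 512x512 image
    main = torch.randn(shape, generator=torch.Generator().manual_seed(1496589915))

    for variation_seed in (100, 101, 102, 103):   # hypothetical nearby subseeds
        vari = torch.randn(shape, generator=torch.Generator().manual_seed(variation_seed))
        latent = slerp(0.1, main, vari)     # 0.1 = the variation strength above
        # hand `latent` to the sampler instead of fresh random noise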

basch

It would be worth throwing imagen3/imagefx into the comparison.

vunderba

Good idea - I've updated the comparisons with Imagen 3 and DALL-E 3. I also cherry-picked the best result from each GenAI system out of a max of 12 generations.

nprateem

Ask it to create a salad with an avocado chopped in half. See whether each half has a nut in it.

reissbaker

It's a 7B "unified model" LLM/VLM (not a diffusion model!) that out-benchmarks Dall-E 3 and Stable Diffusion Medium. It's released under the DeepSeek License, which is a pretty open license that allows commercial use but restricts military use, along with a few other content-based restrictions.

lenerdenator

> restricts military use

I'm sure the powers-that-be will absolutely pay attention to that clause.

operator-name

You could say the same for the GPL, yet its wording is enough to curb adoption by corporations.

Large organisations like the military have enough checks and balances that they won't touch these kinds of licences with a 10ft pole.

qwertox

Yeah, they should! Not that the missile then makes a 180° turn to "return to sender" because it noticed that the target is a Chinese military base.

culi

The code is open sourced

Aaronstotle

How are these licenses enforceable?

reissbaker

Lawsuits, but it's mainly just CYA for DeepSeek; I doubt they're truly going to attempt to enforce much. I only mentioned it because it's technically not FOSS due to the content restrictions (but it's one of the most open licenses in the industry; i.e. more open than the Llama licenses, which restrict Meta's largest competitors from using Llama at all).

jiggawatts

I've always wondered why nobody has tried to scale image-generation models to modern LLM sizes, such as 200-500B parameters instead of 1-7B...

cube2222

Important note from the paper - the resolution is limited to 384x384 currently.

just-ok

Seems like a massive buried lede in an “outperforms the previous SoTA” paper.

franktankbank

Great for generating favicons!

jimmyl02

Don't most architectures resolve this via super-resolution / some upscaling pipeline afterwards that adds the details?

IIRC Stable Diffusion XL uses a "refiner" after initial generation.

dragonwriter

The SDXL refiner is not an upscaler; it's a separate model with the same architecture, used at the same resolution as the base model, that is focused more on detail and less on large-scale generation. (You can actually use any SDXL-derived model as a refiner, or none; most community SDXL derivatives use a single model with no refiner and beat the Stability SDXL base/refiner combination in quality.)
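For anyone who hasn't seen it, the diffusers library documents this base/refiner split as an "ensemble of expert denoisers". A minimal sketch of that pattern; the model ids and the 0.8 split point are the publicly documented defaults, not anything specific to this thread:

    import torch
    from diffusers import DiffusionPipeline

    base = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    refiner = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        text_encoder_2=base.text_encoder_2, vae=base.vae, torch_dtype=torch.float16
    ).to("cuda")

    prompt = "a neon abyss with a sports car in the foreground"
    # the base model handles the first ~80% of denoising at full resolution...
    latents = base(prompt=prompt, denoising_end=0.8, output_type="latent").images
    # ...and the refiner finishes the last ~20%, still at the same resolution
    image = refiner(prompt=prompt, denoising_start=0.8, image=latents).images[0]
    image.save("refined.png")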

vunderba

Ouch, that's even smaller than the now-ancient SD 1.5 which is mostly 512x512.

ilaksh

The obvious point of a model that works like this is to see if you can get better prompt understanding. Increasing the resolution in a small model would decrease the capacity for prompt adherence.

vasco

> The whole training process took about 7/14 days on a cluster of 16/32 nodes for 1.5B/7B model, each equipped with 8 Nvidia A100 (40GB) GPUs.

reissbaker

For reference, this is about ~$110k for a training run to beat Dall-E 3, assuming a (somewhat expensive) $1.30/hr for an A100 40GB.

The former CEO of Stability estimated the Dall-E 2 training run cost as about $1MM: https://x.com/EMostaque/status/1547183120629342214
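The arithmetic behind that figure, using the node counts quoted from the paper and the assumed $1.30/GPU-hour rate:

    gpus = 32 * 8        # 32 nodes x 8 A100s for the 7B model
    hours = 14 * 24      # ~14 days
    print(f"{gpus * hours:,} GPU-hours -> ${gpus * hours * 1.30:,.0f}")
    # 86,016 GPU-hours -> $111,821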

just-ok

How does generating images with 90% fewer pixels count as beating DALL•E?

1024core

There are plenty of models around that will reliably upscale an image. That's not the hard part.
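As one concrete example, diffusers ships a ready-made 4x upscaler that could take a 384x384 Janus output to 1536x1536. A sketch; the input filename is hypothetical:

    import torch
    from diffusers import StableDiffusionUpscalePipeline
    from PIL import Image

    pipe = StableDiffusionUpscalePipeline.from_pretrained(
        "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
    ).to("cuda")

    low_res = Image.open("janus_384.png")                 # hypothetical 384x384 output
    upscaled = pipe(prompt="a sports car in a neon abyss", image=low_res).images[0]
    upscaled.save("janus_1536.png")                       # 4x -> 1536x1536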

culi

I think they're referring to specific benchmarks

carimura

Just following this whole story lightly, but is there reason to believe (or not believe) this data coming from them?

maxglute

At least for R1, folks more technical than me said the optimizations DeepSeek made don't make sense unless they were gimped by limited hardware. The caveat being that the limited hardware wasn't actually super limited: NVIDIA exported gimped but still powerful hardware that was considered legal under export controls, and DeepSeek engineers found optimizations to basically unlock its full compute power. Maybe something similar here.

buyucu

I used more compute than that to fine-tune SDXL and it looked horrible.

rightbyte

I believe that is university-lab level of compute, right?

It is so nice to see that you don't need tech-oligarch levels of compute for stuff like this.

bangaladore

A quick search shows an 8-GPU A100 80GB server can easily be $120-150k a pop, so you are looking at a few million in hardware costs if you wanted these on-prem. The energy cost for the training is insignificant by my calculations.

So yeah, I imagine this is not a big deal for large, well-funded universities.

The biggest issue with these is ROI (obviously not real ROI): GPUs have been progressing so fast recently for AI use cases that unless you are running them 24/7, what's the point of having them on-prem?
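Back-of-envelope for the "few million" figure, using the paper's 32-node cluster and the per-server prices above (an assumption-laden sketch, before networking, storage, and power):

    for per_node in (120_000, 150_000):   # assumed price of one 8x A100 80GB server
        print(f"32 nodes x ${per_node:,} = ${32 * per_node:,}")
    # 32 nodes x $120,000 = $3,840,000
    # 32 nodes x $150,000 = $4,800,000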

rightbyte

Yeah, I mean, you don't have to do it; just knowing that you can, can be enough.

agilob

Why don't all these new ~AI projects publish Dockerfiles and images?

I've been looking at several projects recently for subtitles, image generation, voice translation, and AI coding assistants, and none of them had out-of-the-box support for containers. Instead, authors prefer to write detailed install instructions, commands for Fedora, Ubuntu, Arch, a notice to Debian developers about outdated Python... Why is that?

querez

I see two reasons:

1. Because they're researchers, not devops experts. They release the model in the way that they are most familiar with, because it's easiest for them. And I say that as someone who's released/open-sourced a lot of AI models: I can see how Docker is useful and all that, but why would I invest the time to package up my code? It took long enough to cut through the red tape (e.g. my company's release process), clean up the code, and document stuff. I did that mostly because I had to (red tape) or because it also benefits me (refactorings & docs). But Docker is something that is not immediately useful for myself. If people find my stuff useful, let them do it and repackage it.

2. Most people using these models don't use them in Docker containers. Sure, end users might do that. But that's not the primary target for the research labs pushing these models out. They want to reach other researchers, and researchers want to use these models in their own research: they take them and plug them into Python scripts and hack away, to label data, to fine-tune, to investigate. And all of those tasks are much harder if the model is hidden away in a container.

torginus

It's really nice, but I don't understand why they keep pushing the idea of text-to-image: text is not a great medium for describing visual scenes, and no one in the real world who's working on real content authoring actually uses textual descriptions.

Why not allow for more Photoshop-, freehand-art- (or 3D-editor-) style controls, which are much simpler to parse than textual descriptions?

operator-name

Accessibility and training data.

Nvidia Canvas existed before text-to-image models, but it didn't gain as much popularity with the masses.

The other part is the training data: there are masses of (text description, image) pairs, whilst if you want to do something more novel you may struggle to find a big enough dataset.

JFingleton

Image/video generation could possibly be used to advance LLMs in quite a substantial way:

If the LLM during its "thinking" phase encountered a scenario where it had to imagine a particular scene (let's say a pink elephant in a hotel lobby), then it could internally generate that image and use it to aid in world-simulation / understanding.

This is what happens in my head at least!

vunderba

These things are not mutually exclusive.

All of this already exists in various forms: inpainting lets you make changes by masking over sections of an image; ControlNets let you guide the generation of an image through many different inputs, ranging from depth maps to posable figures; etc.
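For instance, the depth-map style of guidance is a few lines with diffusers' ControlNet support. A sketch using the commonly used public SD 1.5 checkpoints; the input depth map is hypothetical:

    import torch
    from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
    from PIL import Image

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")

    depth_map = Image.open("lobby_depth.png")     # hypothetical depth map of the scene
    image = pipe("a pink elephant in a hotel lobby", image=depth_map).images[0]
    image.save("guided.png")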

weird-eye-issue

> no one in the real world who's working on real content authoring actually uses textual descriptions

As someone who owns an AI image SaaS making over 100k per month this made me chuckle

Denkverbot

Dang, you are so cool and so smart!

minimaxir

LeoPanthera

If this is actually it, it's nowhere near DALL-E quality, and even further from Imagen.

Even taking into account the limited resolution, this is more like SD1.

vunderba

Quality notwithstanding, I don't think it even competes against the SOTA for understanding either - I ran some comparisons against Flux and was unimpressed.

https://imgur.com/a/nn9c0hB

minimaxir

Notably, the benchmarks don't compare to existing models on image quality, which is hard to measure outside of the Image Arena-style tests.

The release is more about the multimodal captioning which is an objective improvement. I'm not a fan of the submission title.

buyucu

No, someone forked an older Janus model; Pro is not released yet.

netcraft

Is there a primer somewhere on the security/privacy considerations of DeepSeek?

For the phone app, does it send your prompts and information to China?

OpenRouter says that if you use them, none of their providers send data to China - but what about other 3rd parties? https://x.com/OpenRouterAI/status/1883701716971028878

Is there a way to host it yourself on, say, a decently specced MacBook Pro, e.g. through Hugging Face (https://huggingface.co/deepseek-ai/DeepSeek-R1), without any information leaving your computer?

keheliya

Running it on a MacBook Pro entirely locally is possible via Ollama. Even running the full model (~680B parameters) is apparently possible distributed across multiple M2 Ultras: https://x.com/awnihannun/status/1881412271236346233

vessenes

That's a 3-bit quant. I don't think there's a theoretical reason you couldn't run it in fp16, but it would take more than two M2 Ultras. 10 or 11, maybe!

bildung

Well, there's the practical reason that the model is natively fp8 ;) It's one of the innovative ideas making it so much less compute-intensive, apparently.
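Rough weights-only arithmetic behind those figures (a sketch that ignores the KV cache and activations, so real deployments need more):

    params = 671e9       # DeepSeek R1 parameter count (the ~680B quoted above)
    for name, bits in (("fp16", 16), ("fp8 (native)", 8), ("3-bit quant", 3)):
        gb = params * bits / 8 / 1e9
        print(f"{name}: ~{gb:,.0f} GB of weights, ~{gb / 192:.1f} M2 Ultras at 192 GB")
    # fp16 is ~1,342 GB of weights alone, which is why the 3-bit quant fits on a
    # couple of machines while fp16 would need closer to ten once overhead is counted.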

rsanek

The 70B distilled version that you can run locally is pretty underwhelming, though.

minimaxir

These Janus models are small enough to self-host locally.

You can also self-host a smaller distilled DeepSeek R1 variant locally.
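A minimal sketch of what self-hosting looks like; the Hugging Face repo id and the trust_remote_code loading path are assumptions here (the official Janus repo ships its own processor and generation helpers, which this omits), but nothing leaves the machine either way:

    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "deepseek-ai/Janus-Pro-7B",       # assumed repo id
        trust_remote_code=True,           # Janus uses custom model code, not a stock class
        torch_dtype=torch.bfloat16,
    ).eval()
    # Weights are downloaded once and cached locally; all prompting and image
    # generation afterwards runs on your own hardware.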

ilaksh

This is not comparable to the DeepSeek R1 language model, which has over 600 billion parameters. This one is for image generation, is 7B parameters, and will run locally on most recent computers.

kristianc

Good luck with that.

daedrdev

It seems to me that, just maybe, by not lobotomizing the training in an attempt to keep it PG, they've unlocked a lot of performance.

This probably isn't the only thing, of course, but it is a major difference between DeepSeek and other models.

gamblor956

Janus is heavily censored, more so than any other available model.