
Big LLMs weights are a piece of history

intellectronica

I love the title "Big LLMs" because it means that we are now making a distinction between big LLMs and minute LLMs and maybe medium LLMs. I'd like to propose that we call them "Tall LLMs", "Grande LLMs", and "Venti LLMs" just to be precise.

saltcured

I'd prefer to see olive sizes get a renaissance. I was always amused by Super Colossal when following my mom around a store as a little kid.

From a random web search, it seems the sizes above Large are: Extra Large, Jumbo, Extra Jumbo, Giant, Colossal, Super Colossal, Mammoth, Super Mammoth, Atlas.

VectorLock

How about wine bottle sizes since we're "bottling" a "distillation" of information...

https://en.wikipedia.org/wiki/Wine_bottle#Sizes

taneq

Needs more superlatives. “Biggest” < “Extra Biggest” < “Maximum Biggest”. :D

enlightens

maximum_biggest_final_2

fernmyth

"Non Plus Ultra"

Followed by another company introducing their "Plus Ultra" model.

inciampati

And I'd love to see data compression terminology get an overhaul. Do we need big LLMs or just succinct data structures? Or maybe "compact" would be good enough? (Yeah LLMs are cool but why not just, you know, losslessly compress the actual data in a way that lets us query its content?)

rowanG077

Well, the obvious answer is that LLMs are more than just pure search. They can synthesize novel information from their learned knowledge.

xanderlewis

And the US ‘small’ LLMs will actually be slightly larger than the ‘large’ LLMs in the UK.

aziaziazi

I wonder how the skinniest get dressed overseas: I wear a European S, which translates to XXS in the US, but there are many people skinnier than me, still within a "normal" BMI. Do they have to find XXXS? Do they wear oversized clothes? Choosing trousers is way easier because the system of cm/inches of length+perimeter correspond to real values.

tbrownaw

> Choosing trousers is way easier because the system of cm/inches of length+perimeter correspond to real values.

They're not merely real values, they're also rational.

Spivak

It's a crazy experience being just physically larger than most of the world, especially when the size on the label carries some implicit shame/judgement. Like, I'm skinny; I'm pretty much the lowest weight I can be and not look emaciated/worrying. But when shopping for a skirt in Asian sizes I was a 4XL, and usually an L-2XL in European sizes. Having to shift my mental space to a US M being the "right" size for me was hard for many years. But I guess this is how sizing was always kinda supposed to work.

deepsun

We ordered swag T-shirts for a conference from two providers, and the EU provider's Ls were actually larger than the US Ls!

jgalt212

It's funny you say that, but when travelling abroad I wondered how Europeans and Japanese stay sufficiently hydrated.

jdietrich

For healthy adults, thirst is a perfectly adequate guide to hydration needs. Historically normal patterns of drinking - e.g. water with meals and a few cups of tea or coffee in between - are perfectly sufficient unless you're doing hard physical labour or spending long periods of time outdoors in hot weather. The modern American preoccupation with constantly drinking water is a peculiar cultural phenomenon with no scientific basis.

floriannn

Is this a thing about how restaurants in some European countries charge for water?

miki123211

> The UK

You mean the EU, right? The UK isn't covered by the AI act.

/s

t_mann

Big LLM is too long as a name. We should agree on calling them BLLMs. Surely everyone is going to remember what the letters stand for.

nullhole

I still like Big Data Statistical Model

heyjamesknight

I want to apologize for this joke in advance. It had to be done.

We could take a page from Trump’s book and call them “Beautiful” LLMs. Then we’d have “Big Beautiful LLMs” or just “BBLs” for short.

Surely that wouldn’t cause any confusion when Googling.

cowsaymoo

Weirdly enough, the ITU already chose the superlative for the bigliest radio frequency band to be Tremendous:

- Extremely Low Frequency (ELF)

- Super Low Frequency (SLF)

- Ultra Low Frequency (ULF)

- Very Low Frequency (VLF)

- Low Frequency (LF)

- Medium Frequency (MF)

- High Frequency (HF)

- Very High Frequency (VHF)

- Ultra High Frequency (UHF)

- Super High Frequency (SHF)

- Extremely High Frequency (EHF)

- Tremendously High Frequency (THF)

Maybe one day some very smart people will make Tremendously Large Language Models. They will be very large and need a lot of computer. And then you'll have the Extremely Small Language Model. They are like nothing.

https://en.wikipedia.org/wiki/Radio_frequency?#Frequency_ban...

temp0826

Bureau of Large Land Management

Arcuru

I've been labeling LLMs as "teensy", "smol", "mid", "biggg", "yuuge". I've been struggling to figure out where to place the lines between them though.

zargon

itsy-bitsy: <= 3B

teensy: 4B to 29B

smol: 30B to 59B

mid: 60B to 99B

biggg: 100B to 299B

yuuge: 300B+

badlibrarian

I've sat in more than one board meeting watching them take 20 minutes to land on t-shirt sizes. The greatest enterprise sales minds of our generation...

ben_w

I've seen things you people wouldn't believe.

I’ve seen corporate slogans fired off from the shoulders of viral creatives. Synergy-beams glittering in the darkness of org charts. Thought leadership gone rogue… All these moments will be lost to NDAs and non-disparagement clauses, like engagement metrics in a sea of pivot decks.

Time to leverage.

badlibrarian

... destroyed by madness, starving hysterical! Buying weed in a store then meeting with someone off Craigslist to score eggs.

latexr

Name them like clothing sizes: XXLLM, XLLM, LLM, MLM, SLM, XSLM, XXSLM.

null

[deleted]

swyx

i did this!

XXLLM: ~1T (GPT4/4.5, Claude Opus, Gemini Pro)

XLLM: 300~500B (4o, o1, Sonnet)

LLM: 20~200B (4o, GPT3, Claude, Llama 3 70B, Gemma 27B)

~~zone of emergence~~

MLM: 7~14B (4o-mini, Claude Haiku, T5, LLaMA, MPT)

SLM: 1~3B (GPT2, Replit, Phi, Dall-E)

~~zone of generality~~

XSLM: <1B (Stable Diffusion, BERT)

4XSLM: <100M (TinyStories)

https://x.com/swyx/status/1679241722709311490

ai-christianson

MLM... uh oh

anonym29

I hate those ponzi schemes! Never buy a cutco knife or those crappy herbalife supplements.

Alternatively, just make sure you keep things consensual, and keep yourself safe, no judgement or labels from me :)

HarHarVeryFunny

But of course these are all flavors of "large", so then we have big large language models, medium large language models, etc, which does indeed make the tall/grande/venti names appropriate, or perhaps similar "all large" condom size names (large, huge, gargantuan).

guestbest

Why not LLLM for large LLM’s and SLLM for small LLM’s, assuming there is no middle ground

flir

M, LM, LLM, LLLM, L3M, L4M.

Gotta leave room for future expansion.

dan_linder

Hopefully the USB making team does NOT step into this...

LLM 3.0, LLM 3.1 Gen 1, LLM 3.2 Gen 1, LLM 3.1, LLM 3.1 Gen 2, LLM 3.2 Gen 2, LLM 3.2, LLM 3.2 Gen 2x2, LLM 4, etc...

kolinko

VLLM, Super VLLM, Almost Large Language Model

_heimdall

What makes it a Small Large Language Model? Why not just an SLM?

technol0gic

Smedium Language Model

guestbest

If we can’t have fun with names, why even be in IT?

gpderetta

S and L cancel out, so it's just an LM.

orbital-decay

SLM is a widespread term already.

guestbest

Slim pickings, then?

dr_dshiv

“We should regard the Internet Archive as one of the most valuable pieces of modern history; instead, many companies and entities make the chances of the Archive to survive, and accumulate what otherwise will be lost, harder and harder. I understand that the Archive headquarters are located in what used to be a church: well, there is no better way to think of it than as a sacred place.”

Amen. There is an active effort to create an Internet Archive based in Europe, just… in case.

blmurch

Yup! We're here and looking to do good work with Cultural Heritage and Research Organizations in Europe. I'm very happy to be working with the Internet Archive once again after a 20 year long break.

https://www.stichtinginternetarchive.nl/

stogot

What kind of volunteer help can the community do?

ttul

Well, it did establish a new HQ in Canada…

https://vancouversun.com/news/local-news/the-internet-archiv...

(Edited: apparently just a new HQ and not THE HQ)

thrance

With this belligerent maniac in the White House who recently doubled-down on his wish to annex Canada [1], I wouldn't feel safe relocating there if the goal is to flee the US.

[1] https://www.nbcnews.com/politics/donald-trump/trump-quest-co...

badlibrarian

Anyone who takes even an hour to audit anything about the Internet Archive will soon come to a very sad conclusion.

The physical assets are stored in the blast radius of an oil refinery. They don't have air conditioning. Take the tour and they tell you the site runs slower on hot days. Great mission, but atrociously managed.

Under attack for a number of reasons, mostly absurd. But a few are painfully valid.

dr_dshiv

Their yearly budget is less than the budget of just the SF library system.

badlibrarian

Then maybe they should've figured out how to keep hard drives in a climate controlled environment before they decided to launch a bank.

https://ncua.gov/newsroom/press-release/2016/internet-archiv...

floam

I realized recently, who needs torrents? I can get a good rip of any movie right there.

aziaziazi

I understand what you describe is prohibited in many jurisdictions, but I'm curious about the technical aspect: in my experience they host the HTML but often not the assets, especially big pictures, and I guess most movie files are bigger than pictures. Do you use a special trick to host/find them?

jart

Mozilla's llamafile project is designed to enable LLMs to be preserved for historical purposes. They ship the weights and all the necessary software in a deterministic dependency-free single-file executable. If you save your llamafiles, you should be able to run them in fifty years and have the outputs be exactly the same as what you'd get today. Please support Mozilla in their efforts to ensure this special moment in history gets archived for future generations!

https://github.com/Mozilla-Ocho/llamafile/

visarga

LLMs are much easier to port than software. They are just a big blob of numbers and a few math operations.
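As a toy illustration of that point, the core of a transformer layer really is just a handful of matrix multiplies and a softmax. A minimal sketch in NumPy (all shapes, weight names, and values here are invented for illustration, not taken from any real model):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, Wq, Wk, Wv):
    # Single-head attention: three matmuls, one softmax, one more matmul.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((4, d))  # 4 tokens, 8-dim embeddings (made up)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Everything else (layer norm, feed-forward blocks, the tokenizer) adds detail, but nothing that changes the basic portability story: it's dense arrays in, dense arrays out.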

andix

I think software is rather easy to archive. Emulators are the key. Nearly every platform from the past can be emulated on a modern ARM/x86 Linux/Windows system. ARM, x86, Linux, and Windows are ubiquitous; even if they fade away, there will be emulators around for a long time. With future compute power it should be no problem to use nested emulation, running old emulators on an emulated x86/Linux.

throwaway314155

> I think software is rather easy to archive.

* assuming someone else already spent tremendous effort to develop an emulator for your binary's target that is 100% accurate...

jsight

Indeed. In 50 years, loading the weights and doing math should be much easier than getting some 50 year old piece of cuda code to work.

Then again, CPUs will be fast enough that you'd probably just emulate amd64 and run it as CPU-only.

jart

llamafiles run natively on both amd64 and arm64. It's difficult to imagine both of them not being in play fifty years hence. There's definitely no hope for the cuda module in the future. We have enough difficulties getting it to work today. That's why cpu mode is the default.

refulgentis

LLMs are much harder, software is just a blob of two numbers.

;)

(less socratic: I have a fraction of a fraction of jart's experience, but I have enough experience via maintaining a cross-platform llama.cpp wrapper to know there's a ton of ways to interpret that bag o' floats, and you need a lot of ancillary information.)

GeoAtreides

Just like the map isn't the territory, summaries are not the content, nor are the library's filings the actual books.

If I want to read a post, a book, a forum, I want to read exactly that, not a simulacrum built by arcane mathematical algorithms.

visarga

The counter perspective is that this is not a book, it's an interactive simulation of that era. The model is trained on everything, this means it acts like a mirror of ourselves. I find it fascinating to explore the mind-space it captured.

defgeneric

While the post talks about big LLMs as a valuable "snapshot" of world knowledge, the same technology can be used for lossless compression: https://bellard.org/ts_zip/.
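The underlying idea behind LM-based compressors like ts_zip is standard information theory: a model's next-token probability bounds the code length at -log2 p(token) bits, which arithmetic coding achieves to within a small constant. A hypothetical sketch of just the bit accounting, with a fixed made-up probability table standing in for a real language model:

```python
import math

def toy_model(context):
    # Stand-in for a real LM: returns next-token probabilities.
    # A real compressor would condition on `context`; these values are invented.
    return {"the": 0.5, "cat": 0.25, "sat": 0.125, "mat": 0.125}

def compressed_bits(tokens):
    # Shannon code length: each token costs -log2 p bits under the model.
    bits = 0.0
    for i, tok in enumerate(tokens):
        p = toy_model(tokens[:i])[tok]
        bits += -math.log2(p)
    return bits

print(compressed_bits(["the", "cat", "sat"]))  # 1 + 2 + 3 = 6.0 bits
```

The better the model predicts the text, the fewer bits each token costs, which is why a strong LLM makes a strong (if slow) lossless compressor.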

laborcontract

I miss the good ol days when I'd have text-davinci make me a table of movies that included a link to the movie poster. It usually generated a url of an image in an s3 bucket. The link always worked.

andix

I think it’s fine that not everything on the internet is archived forever.

It has always been like that, in the past people wrote on paper, and most of it was never archived. At some point it was just lost.

I inherited many boxes of notes, books and documents from my grandparents. Most of it was just meaningless to me. I had to throw away a lot of it and only kept a few thousand pages of various documents. The other stuff is just lost forever. And that’s probably fine.

Archives are very important, but nowadays the most difficult part is to select what to archive. There is so much content added to the internet every second, only a fraction of it can be archived.

hedgehog

This doesn't make much sense to me. Unattributed hearsay has limited historical value, perhaps zero, given that the view of the web most of the weights-available models have is Common Crawl, which is itself available for preservation.

Terr_

I suspect the idea is that sometimes breadth wins out over accuracy. Even if it's unsuited as a primary source, this kind of lossy compression of many many documents might help a conscientious historian discover verifiable things through other routes.

fl4tul4

> Scientific papers and processes that are lost forever as publishers fail, their websites shut down.

I don't think the big scientific publishers (now, in our time) will ever fail, they are RICH!

thayne

Perhaps a shorter term risk is the publishers consider some papers less profitable, so they stop preserving them.

Legend2440

That means nothing. Big companies fail all the time. There is no guarantee any of them will be here in 50 years, let alone 500.

bookofjoe

So was the Roman Empire

pama

I would be curious to know if it would be possible to reconstruct approximate versions of popular common subsets of internet training data by using many different LLMs that may have happened to read the same info. Does anyone know of pointers to math papers about such things?

sourtrident

Imagine future historians piecing together our culture from hallucinated AI memories - inaccurate, sure, but maybe even more fascinating than reality itself.

dstroot

Isn’t big LLM training data actually the most analogous to the internet archive? Shouldn’t the title be “Big LLM training data is a piece of history”? Especially at this point in history since a large portion of internet data going forward will be LLM generated and not human generated? It’s kind of the last snapshot of human-created content.

antirez

The problem is, where are the 20T tokens that are being used for this task? There is no way to access them. I hope that at least OpenAI and a few others have solid historical storage of the tokens they collect.