Big LLMs weights are a piece of history
27 comments
March 16, 2025 · intellectronica
I love the title "Big LLMs" because it means that we are now making a distinction between big LLMs and minute LLMs and maybe medium LLMs. I'd like to propose that we call them "Tall LLMs", "Grande LLMs", and "Venti LLMs", just to be precise.
t_mann
Big LLM is too long as a name. We should agree on calling them BLLMs. Surely everyone is going to remember what the letters stand for.
guestbest
Why not LLLM for large LLMs and SLLM for small LLMs, assuming there is no middle ground?
HarHarVeryFunny
But of course these are all flavors of "large", so then we have big large language models, medium large language models, etc, which does indeed make the tall/grande/venti names appropriate, or perhaps similar "all large" condom size names (large, huge, gargantuan).
de-moray
What does a 20 LLM signify?
tonyhart7
Can we have a tiny LLM that can run on a smartphone now?
winter_blue
Apple Intelligence has an LLM that runs locally on the iPhone (15 Pro and up).
But the quality of Apple Intelligence shows us what happens when you use a tiny ultra-low-wattage LLM. There’s a whole subreddit dedicated to its notable fails: https://www.reddit.com/r/AppleIntelligenceFail/top/?t=all
One example of this is “Sorry I was very drunk and went home and crashed straight into bed” being summarized by Apple Intelligence as “Drunk and crashed”.
samstave
I want a tiny phone-based LLM to do thought tracking and comms awareness...
I actually applied to YC in ~2014 or so for this:
JotPlot - I wanted a historical timeline of comms between me and others - such that I had a sankey-ish diagram of when, with whom, and via which method I spoke with folks, and then each node was the message, call, text, meta links...
I think it's still viable - but my thought process is currently too chaotic to pull it off.
Basically looking at a timeline of your comms and thoughts and expanding into links of thought - now with LLMs you could have a Throw Tag of some sort whereby you have the bot do research expanding on certain things and putting up a site for that idea on LOCALHOST (i.e. your phone) so that you can pull up data relevant to the convo - and it's all in a timeline of thought / stream of consciousness.
Hopefully you can visualize it...
laborcontract
I miss the good ol' days when I'd have text-davinci make me a table of movies that included a link to each movie poster. It usually generated a URL of an image in an S3 bucket. The link always worked.
api
That's really what these are: something analogous to JPEG for language, and queryable in natural language.
Tangent: I was thinking the other day that these are not AI in the sense that they are not primarily intelligence. I still don't see much evidence of that. What they do give me is superhuman memory. The main thing I use them for is search, research, and a "rubber duck" that talks back, and it's like having an intern who has memorized the library and the entire Internet. They occasionally hallucinate or make mistakes -- compression artifacts -- but it's there.
So it's more AM -- artificial memory.
Edit: as a reply pointed out: this is Vannevar Bush's Memex, kind of.
hengheng
I've been looking at it as an "instant reddit comment". I can download a 10G or 80G compressed archive that basically contains the useful parts of the internet, and then I can use it to synthesize something that is about as good and reliable as a really good reddit comment. Which is nifty. But honestly it's an incredible idea to sell that to businesses.
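For the curious, a minimal sketch of that workflow with llama-cpp-python, assuming you've already downloaded a quantized GGUF file (the model filename here is illustrative, not a specific recommendation):

    # pip install llama-cpp-python
    # Query a downloaded "compressed archive of the internet" locally.
    from llama_cpp import Llama

    llm = Llama(model_path="./mistral-7b-instruct.Q4_K_M.gguf", n_ctx=4096)
    out = llm("Q: Why is the sky blue? A:", max_tokens=200, stop=["Q:"])
    print(out["choices"][0]["text"])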
Guthur
And so what would be the point of anyone actually posting on the internet if no one actually visits the sites, because large corps have essentially stolen and monetized the whole thing?
And I'm sure they have or will have the ability to influence the responses so you only see what they want you to see.
api
Reddit seems to puppet humans via engagement farming to do what LLMs do in some cases. Posts are prompts, replies are responses.
Of course they vary widely in quality.
flower-giraffe
Or 80 years to MVP memex
From Vannevar Bush's 1945 article "As We May Think": Bush envisioned the memex as a device in which individuals would compress and store all of their books, records, and communications, "mechanized so that it may be consulted with exceeding speed and flexibility".
GolfPopper
>like having an intern who has memorized the library and the entire Internet. They occasionally hallucinate or make mistakes
Correction: you occasionally notice when they hallucinate or make mistakes.
antirez
I believe LLMs are both data and processing, but even human reasoning relies in strong ways on existing knowledge. However, for the goal of the post, it is indeed the memorization that is the key value, along with the fact that, likely, in the future sampling such models can be used to transfer the same knowledge to bigger LLMs, even if the source data is lost.
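A hedged sketch of what "sampling a model to transfer its knowledge" could look like; the model name is only a stand-in, and a real pipeline would need prompt curation, filtering, and deduplication:

    # pip install transformers torch
    # Sample an old model to build a synthetic corpus that a newer,
    # bigger model could later be trained on. gpt2 is a placeholder.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    src = AutoModelForCausalLM.from_pretrained("gpt2")

    prompts = ["The history of Rome began", "Photosynthesis is the process"]
    corpus = []
    for p in prompts:
        ids = tok(p, return_tensors="pt").input_ids
        out = src.generate(ids, do_sample=True, temperature=0.8,
                           max_new_tokens=128, pad_token_id=tok.eos_token_id)
        corpus.append(tok.decode(out[0], skip_special_tokens=True))

    # `corpus` would then feed the training set of the bigger student model.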
api
I'm not saying there is no latent reasoning capability. It's there. It just seems to be that the memory and lookup component is much more useful and powerful.
To me intelligence describes something much more capable than what I see in these things, even the bleeding edge ones. At least so far.
danielbln
That's the problem with the term "intelligence". Everyone has their own definition, we don't even know what makes us humans intelligent and more often than not it's a moving goalpost as these models get better.
antirez
I offer a POV that is in the middle: reasoning is powerful for evaluating which solution among N in the context is better. Memorization allows sampling many competing ideas from the problem space; then the LLM picks the best, which is what makes chain of thought so effective. Of course zero-shot reasoning is also part of the story, but a somewhat weaker one, exactly like we are often unable to spit out the best solution before evaluating the space (unless we are very accustomed to the specific problem).
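A toy sketch of that sample-then-evaluate loop; generate() and score() are placeholders for an LLM sampling call and a verifier (a reward model, unit tests, self-evaluation, etc.), not real APIs:

    import random

    def generate(prompt, temperature=0.9):
        # Placeholder for a high-temperature LLM sampling call.
        return f"{prompt} ... candidate {random.randint(0, 999)}"

    def score(prompt, candidate):
        # Placeholder for a verifier pass that rates a candidate.
        return random.random()

    def best_of_n(prompt, n=8):
        # Memorization supplies N competing ideas; evaluation picks the best.
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=lambda c: score(prompt, c))

    print(best_of_n("Write a sorting function"))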
bob1029
If you want to see what this would actually be like:
https://lcamtuf.coredump.cx/lossifizer/
I think a fun experiment could be to see at what setting the average human can no longer decipher the text.
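A crude home-grown version of that experiment (not lcamtuf's actual algorithm): clobber each character with some probability and see at which setting readability collapses.

    import random

    def lossify(text, p):
        # Replace each character with '#' with probability p.
        return "".join("#" if random.random() < p else ch for ch in text)

    sample = "The quick brown fox jumps over the lazy dog"
    for p in (0.1, 0.3, 0.5, 0.7):
        print(f"p={p}: {lossify(sample, p)}")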
yannyu
There's a great recent article by Ted Chiang that elaborates on this idea: https://www.newyorker.com/tech/annals-of-technology/chatgpt-...
menzoic
Having memory is fine but choosing the relevant parts requires intelligence
Mistletoe
This is an excellent viewpoint.
nickpsecurity
People wanting this would be better off using memory architectures, like how the brain does it. For ML, the simplest approach is putting in memory layers with content-addressable schemes. I have a few links on prototypes in this comment:
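As a toy illustration of content-addressing (my own simplification, not the linked prototypes): store key/value vectors and read back a similarity-weighted mix of the values.

    import numpy as np

    class ContentAddressableMemory:
        def __init__(self, dim):
            self.keys = np.empty((0, dim))
            self.values = np.empty((0, dim))

        def write(self, key, value):
            self.keys = np.vstack([self.keys, key])
            self.values = np.vstack([self.values, value])

        def read(self, query, temperature=0.1):
            # Cosine similarity between the query and every stored key...
            sims = self.keys @ query / (
                np.linalg.norm(self.keys, axis=1) * np.linalg.norm(query) + 1e-9)
            # ...softmaxed into weights over the stored values.
            w = np.exp(sims / temperature)
            w /= w.sum()
            return w @ self.values

    mem = ContentAddressableMemory(dim=4)
    mem.write(np.array([1.0, 0, 0, 0]), np.array([0.0, 1, 0, 0]))
    mem.write(np.array([0.0, 1, 0, 0]), np.array([0.0, 0, 1, 0]))
    print(mem.read(np.array([0.9, 0.1, 0, 0])))  # ~ the first stored value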
HarHarVeryFunny
Animal brains do not separate long term memory and processing - they are one and the same thing - columnar neural assemblies in the cortex that have learnt to recognize repeated patterns, and in turn activate others.