
The wall confronting large language models

Scene_Cast2

The paper is hard to read. There is no concrete worked-through example, the prose is over the top, and the equations don't really help. I can't make head or tail of this paper.

kristopolous

I used NotebookLM to help go over it. I really don't find anything novel here. It looks like obvious things wrapped in an academic sheen and equations.

lumost

This appears to be a position paper written by authors outside of their core field. The presentation of "the wall" is only through analogy to derivatives on the discrete values computers operate on.

joe_the_user

The paper seems to involve a series of analogies and equations. However, I think that if the equations are accepted, the "wall" actually is derived.

The authors are computer scientists and people who work with large-scale dynamical systems. They aren't people who've actually produced an industry-scale LLM. However, I have to note that despite lots of practical progress in deep learning/transformers/etc., all the theory involved is just analogies and equations of a similar sort. It's all alchemy, and the people really good at producing these models seem to be using a bunch of effective rules of thumb rather than any full or established theory (despite books claiming to offer a mathematical foundation for the enterprise, etc.).

Which is to say, "outside of core competence" doesn't mean as much as it would for medicine or something.