
The Curious Similarity Between LLMs and Quantum Mechanics

herculity275

This is godawful slop written by someone who seems to have a cursory enthusiast's understanding of either of these topics.

dlivingston

Great discoveries are often made when someone has the cleverness to join together multiple, seemingly unconnected domains.

I'm a layman in both QM and LLMs, and found a few of the comparisons here contrived, but it's an interesting idea nonetheless.

neutrinobro

I was half expecting a discussion of the eigenvalues of a random matrix, or something at least halfway plausible, but no, this is useless mumbo jumbo, a la Deepak Chopra. People with zero understanding of quantum mechanics should stop making analogies to everything else.

djeifbll

[flagged]

rdlecler1

You clearly have a deep understanding of base reality and the emergence of complexity. So please share an actual argument.

thechao

This is slop.

mike_ivanov

This one makes more sense: https://link.springer.com/article/10.1007/s10701-021-00503-3

Emergent Quantumness in Neural Networks

Abstract: ...Madelung equations, that is, a hydrodynamic form of the Schrödinger equation, can be derived from a canonical ensemble of neural networks where the quantum phase was identified with the free energy of hidden variables.

Sharlin

Both LLMs and QM are based on linear algebra. That’s pretty much where the similarities end.

photochemsyn

Similar computational demand for useful results, too. The specific uses of linear algebra are a bit different: QM chemistry is about eigenvalue solvers for large matrices representing the system Hamiltonian, HΨ = EΨ, which doesn't come into play in LLMs, where the linear algebra seems mostly to be used for chain-rule differentiation in matrix form.
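A toy sketch of both uses, with nothing beyond NumPy (the 2x2 "Hamiltonian" H, the weight matrices W1/W2, and the squared-error loss are made up purely for illustration):

    import numpy as np

    # Eigenvalue problem H psi = E psi for a small symmetric "Hamiltonian"
    H = np.array([[0.0, 1.0],
                  [1.0, 2.0]])
    E, psi = np.linalg.eigh(H)           # E: eigenvalues, psi: eigenvectors (columns)

    # Chain-rule differentiation in matrix form for loss = 0.5 * ||W2 @ W1 @ x - y||^2
    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(2, 3))
    x, y = rng.normal(size=4), rng.normal(size=2)

    h = W1 @ x                           # forward pass, layer 1
    out = W2 @ h                         # forward pass, layer 2
    d_out = out - y                      # dLoss/d(out)
    grad_W2 = np.outer(d_out, h)         # dLoss/dW2
    grad_W1 = np.outer(W2.T @ d_out, x)  # dLoss/dW1, chained back through W2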

There are similarities in some areas, e.g. gradient descent compared to self-consistent field (SCF) iterations in computational QM (a toy sketch of both loops follows the two lists below):

In Hartree-Fock or Kohn-Sham DFT:

        Guess a wavefunction (or density),

        Construct a Fock (or Kohn-Sham) matrix,

        Solve the eigenvalue problem for that matrix,

        Update the density,

        Repeat until convergence to a physically meaningful value for comparison to experimental observations.

In neural network training:

        Guess initial parameters,

        Compute a forward pass to get predictions,

        Evaluate a loss that measures prediction error,

        Backprop to compute the gradient of the loss function wrt the parameters,

        Update parameter values via a small step in the opposite direction,

        Repeat until the model converges to a good-enough solution that pleases the human user.
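
A minimal sketch of that structural parallel, again just NumPy on toy problems (the 2x2 "Fock" construction and the least-squares "network" are stand-ins, not real Hartree-Fock or a real LLM):

    import numpy as np

    # SCF-like fixed-point loop: build matrix from density, diagonalize, rebuild density
    def toy_scf(h_core, repulsion, n_iter=100, tol=1e-8):
        density = 0.5 * np.eye(2)                      # guess a density
        for _ in range(n_iter):
            fock = h_core + repulsion * density        # construct the "Fock" matrix
            energies, orbitals = np.linalg.eigh(fock)  # solve the eigenvalue problem
            lowest = orbitals[:, [0]]                  # occupy the lowest orbital
            new_density = lowest @ lowest.T            # update the density
            if np.linalg.norm(new_density - density) < tol:
                break                                  # self-consistent
            density = new_density
        return energies[0], density

    # Gradient-descent loop: forward pass, loss gradient, small step downhill
    def toy_training(x, y, lr=0.1, n_iter=1000, tol=1e-8):
        w = np.zeros(x.shape[1])                       # guess initial parameters
        for _ in range(n_iter):
            pred = x @ w                               # forward pass
            grad = x.T @ (pred - y) / len(y)           # gradient of 0.5*MSE wrt w
            if np.linalg.norm(grad) < tol:
                break                                  # good enough
            w -= lr * grad                             # step opposite the gradient
        return w

    e0, _ = toy_scf(np.diag([-1.0, 0.5]), repulsion=0.3)
    w = toy_training(np.array([[1.0], [2.0], [3.0]]), np.array([2.0, 4.0, 6.0]))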
Neither has much to do with the original article, though.

gyrovagueGeist

ah, yes, the spooky similarities of hilbert spaces and probability theory

ajkjk

I think there's more to this than the skeptical commenters here think.

The closer you look at quantum mechanics the more it seems to be entirely (in some sense) information-theoretic, although this nature is largely concealed from our vantage point on the inside. Measurement and quantization are due to the fact that we're part of any quantum system we measure and we have to experience a single value for our measurement, which comes out as a probability distribution over possible values (although why it takes the exact distribution that it does is still mysterious, I believe?). Entropy and decoherence under time evolution result from the fact that if you let a bunch of states mix together for a while, they statistically tend toward more decohered configurations. Conservation laws seem to be slightly emergent rather than completely fundamental: internal to systems, some limited 'off-shell' interactions that don't conserve e.g. energy are allowed, but these are suppressed by the time we measure anything, so we only experience the laws holding exactly.

When all of this is peeled away, the resulting system looks like "a bunch of initial (discrete) data, plus rules for looking at the implications of that data as it is allowed to interact with other data, in such a way that certain high-amplitude states amplify and low-amplitude states get destructively interfered away, allowing complex patterns to crystallize and replicate/evolve at increasingly macroscopic scales". Which, if you squint, looks a lot like a cellular automaton such as Conway's Game of Life. But it can also (under some squinting) look like how thinking works, or how neural networks work, -ish: start with some low-level bits, look for patterns in those, look for patterns in those patterns, etc., and then observe that certain resulting states have their amplitudes driven to 1 while others are driven to 0. Which reminds me, at least, of convolutional NNs. I don't know much about how LLMs work, but I suspect they are perhaps a slightly less good analogy, although perhaps the analogy does hold and is just slightly more masked by the architecture.

I wouldn't, like, bet anything on the details, but I suspect that in the long run there will be some widely-held hypothesis (similar in status to the Church-Turing thesis) that the universe, the brain, neural networks, and cellular automata all have some essential symmetry in their structure. Something to do with the requirements for how a system has to work if it is going to have emergent complexity.

(Incidentally I think this is what Stephen Wolfram sees also and is the basis for his quirky "Wolfram Physics Project", although I suspect that in his overconfidence/immunity to critique he's pretending to see a lot more of it than he actually does.)

throwaway290

> Tokens take on definite semantic meaning defined by their context.

> Quantum particles resolve to particles upon measurement

Tokens don't have meaning, though; the meaning is in our heads. Particles, on the other hand, are measurable.

altruios

Everything is basically the same if you squint your eyes enough...