
How can AI researchers save energy? By going backward

tamat

As a software engineer, I found it hard to grasp the concepts explained here.

First, it says we lose electrons by deleting information. But AFAIK we are losing electrons everywhere; most gates operate by negating a current, which I understand is what they refer to as losing electrons. So, are all gates bad now?

Also, why would keeping a history of all memory changes prevent losing heat? You'd have to keep all that memory powered, so...

And finally, why would this be useful? Who needs to go back in time in their computations??

HPsquared

It's a thermodynamics thing. Reversible processes are the most efficient (it comes down to entropy), and deleting information makes a process irreversible. This is an entirely theoretical concern: there are theoretical limits on the energy cost of computation based on it, but actual computers are nowhere near those limits, at all.

Edit: and yes, most of the logical operations in a regular chip, like AND, OR, NAND, etc., are irreversible (in isolation, anyway)

rnhmjoj

> but actual computers are nowhere near these theoretical limits, at all.

The Landauer limit at ambient temperature gives something on the order of 10⁻²¹ J to irreversibly flip a bit, while, if I read this paper [1] correctly, current transistors are around 10⁻¹⁵ J per switching event. So, definitely not coming to AI "soon".

[1]: https://arxiv.org/pdf/2312.08595
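
For anyone who wants to check that gap, a quick back-of-envelope in Python (the 10⁻¹⁵ J figure is taken as a rough order of magnitude, not an exact spec):

  import math

  k_B = 1.380649e-23  # Boltzmann constant, J/K
  T = 300.0           # ambient temperature, K

  # Landauer limit: minimum energy to irreversibly erase one bit
  landauer = k_B * T * math.log(2)
  print(f"Landauer limit at {T} K: {landauer:.2e} J")   # ~2.87e-21 J

  # rough per-switch energy of current transistors (order of magnitude)
  transistor = 1e-15
  print(f"Gap to the theoretical limit: ~{transistor / landauer:.0e}x")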


tamat

Thanks for your reply.

thrance

Theoretically, a computer that never forgets anything can run without consuming any power (and thus without heating). Such a computer would be called reversible (or adiabatic), as it would require its gates to be reversible (i.e. any computation can be undone). You would still need to expend energy to set the initial state (input) and copy out the result (output).

Obviously, in real life, most power consumed by computers is lost by wire resistance, not through "forgetting" memory in logic gates. You would need superconducting wires and gates to build an actually reversible CPU.

Also, you would need to "uncompute" the result of a computation to bring back your reversible computer from its result back to its initial state, which may be problematic. Or you can expend energy to erase the state.

Quantum computers are reversible computers, if you want a real-life example. Quantum logic gates are reversible and can all be inverted.
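
To make "reversible gate" concrete, here is a classical sketch of a CNOT gate, one of the basic reversible/quantum gates (a toy illustration in plain Python, not a quantum simulator):

  def cnot(control: int, target: int) -> tuple[int, int]:
      """CNOT: flip the target bit iff the control bit is 1."""
      return control, target ^ control

  # CNOT is its own inverse: applying it twice restores any input,
  # so no information is ever destroyed.
  for c in (0, 1):
      for t in (0, 1):
          assert cnot(*cnot(c, t)) == (c, t)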

tamat

Thanks for your explanation

naasking

> Also, why keeping a history of all memory changes will prevent losing heat?

How much power does persistent storage (a hard drive, an SSD) require to preserve its stored data? Essentially zero, which is why it emits essentially zero heat.

> Who needs to go back in time in their computations??

At its most basic level, erasing/overwriting data requires energy. This generates a lot of heat. Heat dissipation is a major obstacle to scaling chips down even further. If you can design a computer that doesn't need to erase nearly as much data, you generate orders of magnitude less heat, and this potentially opens up more scaling potential and considerable power savings.

imurray

I'm sceptical about the energy motivation, but there are multiple reasons why invertible deep learning architectures can be interesting or useful. Cf. a series of workshops from 2019-2021: https://invertibleworkshop.github.io/

Since then diffusion models have been popular. Generating from these can be seen as a special case of a continuous time normalizing flow, and so (in theory) is a reversible computation. Although the distilled/fast generation that's run in production is probably not that!

Simulating differential equations is not usually actually reversible in practice due to round-off errors. But when done carefully, simulations performed in a computer can actually be exactly bit-for-bit reversible: https://arxiv.org/abs/1704.07715

imurray

Another machine learning paper ("ancient", 2015) where being able to exactly reverse a computation was useful: https://arxiv.org/abs/1502.03492

MangoToupe

What does it mean for computation to have a direction? Said direction does not seem to refer to causality, which seems to me the natural interpretation, i.e. producing inputs from outputs. It seems to me you'd necessarily need to run the program first with known inputs for that to work. So is this just about preserving state by default to make backtracking easier?

random3

Yes, but at a physical level, so it needs different hardware. Deleting information (e.g. an AND gate, which collapses two input bits into one output bit) generates heat, so you need different gates, like the Fredkin gate.
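
For a concrete picture of what the Fredkin gate does, a minimal sketch in Python (an illustration of the logic, not of any hardware):

  def fredkin(c: int, a: int, b: int) -> tuple[int, int, int]:
      """Fredkin (controlled-swap) gate: swap a and b iff c is 1.
      It merely permutes the 8 possible input states, so no bit is erased."""
      return (c, b, a) if c else (c, a, b)

  # Self-inverse: applying it twice restores any input.
  for s in [(c, a, b) for c in (0, 1) for a in (0, 1) for b in (0, 1)]:
      assert fredkin(*fredkin(*s)) == s

  # It can emulate AND without deleting information: the third output of
  # fredkin(x, y, 0) is x AND y, and the other outputs keep the "garbage"
  # needed to run the gate backward.
  for x in (0, 1):
      for y in (0, 1):
          assert fredkin(x, y, 0)[2] == (x & y)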

godelski

> What does it mean for computation to have a direction?
All computation actually has a directionality! This is a subject that I get really excited about ^__^

Think about it: we have a function f with input x and output y: f(x) -> y. We'd even use that notation! This is our direction.

Now, the reverse gets a bit tricky. If the reverse is straightforward, our function has an inverse. But it might not always have one. The function f(x) = mx + b is invertible (well, as long as m isn't 0), because we can write x = (f(x) - b)/m, which provides a unique solution: every f(x) corresponds to a unique x. But if instead we have the function f(x) = x², this is not true! x = ±sqrt(f(x)), and here every f(x) corresponds to both x and -x. The preimages are not unique.
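
As a quick restatement of that in code (a hypothetical snippet, just mirroring the math above):

  # f(x) = 2x + 1 is invertible: each output has exactly one preimage.
  f = lambda x: 2 * x + 1
  f_inv = lambda y: (y - 1) / 2
  assert f_inv(f(3.0)) == 3.0

  # f(x) = x^2 is not: the output 9 has two preimages, 3 and -3,
  # so the sign information is destroyed going forward.
  g = lambda x: x * x
  assert g(3.0) == g(-3.0) == 9.0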

We can start adopting the language of images and preimages if you want to head down that route. There are a lot of interesting phenomena here, and if you hadn't already guessed, yes, this is related to the P=NP problem!

An easy way to see this visually is to write down the computational circuit. Polylog has a really good video that makes the connection to P vs NP [0].

In the context of machine learning, a Normalizing Flow would be invertible, while a diffusion model is reversible. A pet peeve of mine is that people in ML (I'm an ML researcher) call it the "inverse problem", as in GAN-Inversion, but that is a misnomer and we shouldn't propagate it... This also has to do with the naivety of statements like these [1,2]. If you understand this, you'll understand how one could make accurate predictions in one direction but fail in the other, which really puts a damper on that causality stuff. Essentially, we run into the problem of generating counterfactuals.

> Said direction does not seem to refer to causality
Understanding this, I think you can actually see that there's a direct relationship to causality here! In physics we love to manipulate equations because... well... the point of physics is generating a causal mapping of the universe. But there are some problems... Entropy is the classic example (and there are many more in QM), and perhaps it led to Boltzmann's demise [3]. (This is also related to the phenomena of emergence and chaos.)

Here the issue is that we can take some gas molecules, run our computation forward, and get a new "state" (configuration of our molecules). But now... how do we run this in reverse? We will not get a unique solution; instead, we have a family of solutions.

Funny enough, you ran into this when you took calculus! That's why your professor always got mad if you dropped the "+C" when you integrated! So here you can see that differentiation isn't (necessarily) an invertible process: all f(x)+c map to the same f'(x)! It's a many-to-one relationship, just like with f(x) = x².

> So is this just about preserving state by default to make backtracking easier?
I think we need some more clarity here. If it helps, think about our gas example: if instead of sampling only at time 0 and time T we sample at {0, t0, t1, ..., T}, we greatly reduce the solution space, right? Because now our mapping from T -> 0 needs to pass through all of those intermediate states. That's still a lot of potential paths, but it's fewer... [4]

[0] https://www.youtube.com/watch?v=6OPsH8PK7xM

[1] https://www.reddit.com/r/singularity/comments/1dhlvzh/geoffr...

[2] https://www.youtube.com/watch?v=Yf1o0TQzry8&t=449s

[3] The opening of Goodstein's States of Matter (the standard graduate textbook on statistical mechanics). Be sure to also read the first line of the second paragraph: https://i.imgur.com/Dm0PeJU.png

[4] I know...

bravesoul2

I am guessing it minimises irreversible operations (information deletion), i.e.:

2 + 2 + 2

<=> reversible

2 + 2 + 2, 2 + 4

<=> reversible

2 + 2 + 2, 2 + 4, 6

=> irreversible

6
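
In code, that bookkeeping might look like keeping every partial sum instead of overwriting a single accumulator (a toy sketch of the idea, not of real reversible hardware):

  def sum_with_trace(xs):
      """Sum a list while keeping every partial sum, so each step can be undone."""
      trace = [0]
      for x in xs:
          trace.append(trace[-1] + x)
      return trace                    # [0, 2, 4, 6] for xs = [2, 2, 2]

  def undo_step(trace):
      """Reverse one addition by discarding the last partial sum."""
      trace.pop()
      return trace

  trace = sum_with_trace([2, 2, 2])
  assert trace[-1] == 6
  assert undo_step(trace)[-1] == 4    # stepped back from 6 to 2 + 2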

MangoToupe

Interesting. I would have thought that a reversible computation would produce a new algorithm with the domain and range swapped. Naming truly is the final boss of computation. But now I see it's one-step backtracking in a way that allows saving energy, supposedly. Very much still reversible, but definitely nothing remotely comparable to time travel. "uncomputation" was the much, much better name.

Edit: I see now. Well, this is much less exciting than I thought. Still, I'm excited for all the other people who are excited.

bravesoul2

Anything to get the CO₂ ppm down is exciting, although this probably has no effect (because of Jevons paradox) and we just need more trees, wind, and solar.

worldsayshi

I've been following Mike P. Frank for a while on Twitter, and he often has interesting things to say about reversible computing and AI:

https://x.com/MikePFrank

user____name

I hope this will help when the next GPU-datacenter-driven software fad comes around. Though it probably won't; Jevons paradox and all.

throwawaymaths

What is the plan here? Has anyone even demonstrated a reversible-compute matmul? There's a lot of information destruction that happens at the edges of even that transformation.

YetAnotherNick

There is, of course, a reversible matmul for an invertible matrix. There is no reversible ReLU, though.

But in any case, I don't understand the claim of the article. If you can reverse the computation (say, by only using invertible matrices), you can do it for less energy?
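
To make the first point concrete: when the matrix is invertible, the input can be recovered from the output by solving the linear system (a numpy sketch of the math, which says nothing about hardware energy costs):

  import numpy as np

  rng = np.random.default_rng(0)
  A = rng.standard_normal((4, 4))    # a random square matrix is
  x = rng.standard_normal(4)         # invertible with probability 1

  y = A @ x                          # forward matmul
  x_back = np.linalg.solve(A, y)     # "reverse" it: solve A @ x = y
  assert np.allclose(x, x_back)

  # ReLU, by contrast, maps every negative input to 0: many inputs,
  # one output, so there is nothing to solve for.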

throwawaymaths

Demonstrated is the key word here. You can make most circuits ~reversible by just shuffling bits off to a pool that you destroy eventually (though presumably you destroy a minimal amount), but is the juice worth the squeeze? I would worry that the scaling factor for the floorplan is not linear in the matrix size (which is already O(n×k)).

If you're also batching matmuls, isn't there an unavoidable information loss when you evict the old data and drop in the new batch?

bravesoul2

Isn't it all reversible if you keep the original data?

Let f: V -> V

g: V -> V x V is a reversible form of f, where g(v) = (v, f(v))

g'((v, w)) = v

g' can be "fooled" with a fake w, but that is of no concern here. We trust our own chip!

thrance

Your g is not reversible though; a reversible map's input space must have the same dimension as its output space. On the other hand, you're correct that any irreversible function can be extended to a reversible one, although the process isn't always straightforward. The general way is to do something like:

f: V -> V

g: V x A -> V x A

with g(x, a) = (f(x), b) for some value(s) of a. And b cannot simply be set to x, because then you can't recover a with g', and your function is not invertible.
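
A standard concrete instance of that extension, for bit-valued functions, is Bennett's trick of keeping x and XOR-ing f(x) into an ancilla (a minimal sketch under the assumption that f maps bits to bits):

  def make_reversible(f):
      """g(x, a) = (x, a XOR f(x)): keeping x makes g a bijection on pairs,
      and in fact its own inverse."""
      def g(x: int, a: int) -> tuple[int, int]:
          return x, a ^ f(x)
      return g

  f = lambda x: 1 - x            # any f: {0,1} -> {0,1} works
  g = make_reversible(f)

  for x in (0, 1):
      for a in (0, 1):
          assert g(*g(x, a)) == (x, a)   # applying g twice restores the input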

cwillu

The irreversible component of a computation is what actually generates heat, more or less.

bee_rider

IMO it would be more accurate to say: the delete operation is the one that we know thermodynamics and information theory must charge us for, on a physics level.

In current computers, we’re nowhere near those limits anyway. Reversible computing is interesting research for the future.

physix

I'm wondering... if I do a reversible computation where no information is lost, does cutting the power on the unit create heat?

TheDudeMan

Only at the theoretical limits of efficiency. In real life, not true.

throwawaymaths

I don't think that's categorically true. If your footprint from expanding out a circuit to make it reversible gets sufficiently large, other things like resistance can dominate.

TheDudeMan

No. It is a purely theoretical result. It has zero real-world applicability.

ajb

People have been working on commercialising this stuff for decades, actually. The problem is that you are competing with the staggering level of investment that goes into CMOS. To get early revenue you need a niche which is forgiving of not being as good as CMOS in other dimensions - AI is almost certainly not that.

JoshuaDavid

LeakyReLU is reversible in all but the least significant bits, right?
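
Roughly, yes: with a nonzero slope, LeakyReLU is invertible in exact arithmetic, and it's the multiply/divide pair on the negative branch where floating point can round away the low bits. A quick sketch:

  def leaky_relu(x: float, alpha: float = 0.01) -> float:
      return x if x > 0 else alpha * x

  def leaky_relu_inv(y: float, alpha: float = 0.01) -> float:
      return y if y > 0 else y / alpha

  # An exact inverse mathematically; in floats, alpha * x followed by
  # y / alpha can lose the least significant bits of x.
  x = -0.3
  assert abs(leaky_relu_inv(leaky_relu(x)) - x) < 1e-15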

thrance

You're mistaking invertible matrices for reversible computing, which are two unrelated concepts*. You can devise a reversible implementation of ReLU (and of anything else, for that matter) using ancilla variables. Like in addition:

add(x, a) = (x + a, x - a), and add†(y, b) = ((y + b) / 2, (y - b) / 2)
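
A quick numeric check of that pair (plain Python, using values where the halving is exact):

  def add(x, a):
      return x + a, x - a              # two outputs kept, nothing erased

  def add_dagger(y, b):
      return (y + b) / 2, (y - b) / 2  # recovers (x, a) exactly

  assert add_dagger(*add(5.0, 3.0)) == (5.0, 3.0)   # (8, 2) -> (5, 3)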

It's a well-known result in thermodynamics that a reversible process doesn't increase entropy (and so dissipates no heat). So in theory, a reversible computer consumes no power whatsoever, as it doesn't leak any. In practice, most power loss in real-life computers is due to wire resistance, not the irreversibility of computations. Also, even with a perfectly reversible CPU, you would need to expend some energy to (re)initialize the state of your computer (input) and to copy its results out (output). Alternatively, once a computation is done, you can always "uncompute" it to get back to the initial state without using any power, at the cost of time.

If you want an example of a real reversible computer, look into quantum computers, which are adiabatic (reversible) by necessity, in accordance with the laws of quantum physics.

* Actually, you can represent reversible gates with invertible matrices, and that has quite profound implications. A gate/operation is reversible if and only if its corresponding matrix is invertible. But let's not get into that here.

rollcat

Says the headline of a web page that takes 12 seconds to load on a modern machine.

People tend to ignore a problem if it's someone else's. The costs of [insert disruptive technology here] are largely externalised: onto our natural environment, onto individuals' livelihoods, onto violated copyrights, onto independent hosts' infrastructure, onto pedestrians, onto the about-to-be burnt-out/jobless/homeless, etc. What you gain in efficiency, you will use to bring more for yourself, not to bring less harm to someone else. ¯\_(ツ)_/¯

thrance

In Stephen Baxter's Time [1], in the far, far future, when all the stars have died out and the black holes have finally all evaporated, the descendants of mankind are left in a maximally entropic universe, with no free energy left at all. They are condemned to live in giant simulations powered by reversible computing (consuming no power), reliving the same events over and over again as computations are uncomputed and then recomputed.

[1] https://en.wikipedia.org/wiki/Time_(novel)

sadboots

It has been shown that using an LLM plenty of times requires less electricity than boiling a kettle.

charcircuit

This article is disappointing. Without any sort of benchmark, or evidence of PyTorch supporting it, how can this compete with Nvidia? There is no proof that this is finally competitive against traditional chips.

Edit: One of their white papers mentions " Application Framework: A PyTorch-compatible interface supports both AI applications and general-purpose computing, ensuring versatility without sacrificing performance."
