
Transformers Without Normalization

4 comments · March 15, 2025

kouteiheika

If true, this is a very nice incremental improvement. It doesn't appear to meaningfully improve the capabilities of the model, but it is cheaper to compute than RMSNorm (which essentially all current state-of-the-art LLMs use), which means faster and cheaper training.
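
For context, here is a minimal PyTorch sketch contrasting the two layers being compared: RMSNorm, and the paper's Dynamic Tanh (DyT), defined as DyT(x) = γ · tanh(αx) + β with a learnable scalar α. The α init of 0.5 follows the paper's stated default, though the exact value may be tuned per model; treat this as an illustrative sketch, not the reference implementation.

  import torch
  import torch.nn as nn

  class RMSNorm(nn.Module):
      """Standard RMSNorm: scales activations by their root-mean-square.
      Requires a reduction (mean of squares) across the feature dimension."""
      def __init__(self, dim, eps=1e-6):
          super().__init__()
          self.eps = eps
          self.weight = nn.Parameter(torch.ones(dim))

      def forward(self, x):
          rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
          return x * rms * self.weight

  class DyT(nn.Module):
      """Dynamic Tanh (DyT): element-wise tanh(alpha * x) with learnable
      scale and shift -- no cross-feature reduction needed."""
      def __init__(self, dim, init_alpha=0.5):  # 0.5 per the paper's default; illustrative
          super().__init__()
          self.alpha = nn.Parameter(torch.tensor(init_alpha))  # learnable scalar
          self.weight = nn.Parameter(torch.ones(dim))
          self.bias = nn.Parameter(torch.zeros(dim))

      def forward(self, x):
          return self.weight * torch.tanh(self.alpha * x) + self.bias

The key point behind the speed claim: DyT is purely element-wise, so it avoids the per-token reduction over the feature dimension that RMSNorm requires, which is where the compute saving would come from.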

gdiamos

What are the practical implications of this?

gricardo99

from the abstract

  By incorporating DyT, Transformers without normalization can match or exceed the performance of their normalized counterparts, mostly without hyperparameter tuning.

adamnemecek

It feels like the end goal of this is energy-based models, Yann LeCun's favorite ML approach.

We at Traceoid (http://traceoid.ai) have identified a promising approach for scaling EBMs. Join the Discord channel: https://discord.com/invite/mr9TAhpyBW