SepLLM: Accelerate LLMs by Compressing One Segment into One Separator

kevmo314

This paper seems like it misses the forest for the trees. The analysis is certainly interesting and the proposal sounds viable, sort of like sliding-window attention with a little more history.

But if it is true that the separators contribute the most to the attention scores, wouldn't that imply that the tokenization scheme can be improved? Introducing a compression scheme seems like patching around that, rather than fixing things so the model naturally generates a more random attention distribution.
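For readers skimming the thread: the proposal described above boils down to a sparse attention mask where each query attends only to a few initial "sink" tokens, the separator tokens (which summarize their segments), and a local sliding window. Here is a minimal sketch of such a mask, based on that high-level description; the function name and the `sep_ids`, `window`, and `n_initial` parameters are illustrative, not the paper's actual API.

    import torch

    def sepllm_style_mask(token_ids, sep_ids, window=4, n_initial=1):
        """Causal attention mask in the spirit of SepLLM: keep only
        initial tokens, separator tokens, and a recent sliding window.
        All parameter names here are assumptions for illustration."""
        n = len(token_ids)
        is_sep = [t in sep_ids for t in token_ids]
        mask = torch.zeros(n, n, dtype=torch.bool)
        for q in range(n):
            for k in range(q + 1):  # causal: keys up to the query position
                mask[q, k] = (
                    k < n_initial      # initial "sink" tokens
                    or is_sep[k]       # separators stand in for their segment
                    or q - k < window  # local sliding window
                )
        return mask  # True = attend, False = drop

    # Toy usage: token 99 plays the role of a separator (e.g. "." or ",").
    ids = [7, 3, 5, 99, 2, 8, 99, 4, 6, 1]
    print(sepllm_style_mask(ids, sep_ids={99}, window=3).int())

Everything outside the window that isn't a separator gets masked out, which is why the observation that separators soak up most of the attention mass makes the scheme plausible in the first place.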

xp84

Or, put another way:

"Why waste time say lot token when few token do trick?"

-Kevin Malone