SepLLM: Accelerate LLMs by Compressing One Segment into One Separator

kevmo314

This paper seems like it misses the forest for the trees. The analysis is certainly interesting and the proposal sounds viable, sort of like sliding-window attention with a little more history.

But if it is true that the separators contribute the most to the attention scores, wouldn't that imply that the tokenization scheme can be improved? Introducing a compression scheme seems like patching around that, rather than fixing things so the model naturally generates a more random attention distribution.
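For readers skimming the thread: the proposal described above boils down to a sparse attention mask where each query attends only to a few initial "sink" tokens, the separator tokens (which summarize their segments), and a local sliding window. Here is a minimal sketch of such a mask, based on that high-level description; the function name and the `sep_ids`, `window`, and `n_initial` parameters are illustrative, not the paper's actual API.

    import torch

    def sepllm_style_mask(token_ids, sep_ids, window=4, n_initial=1):
        """Causal attention mask in the spirit of SepLLM: keep only
        initial tokens, separator tokens, and a recent sliding window.
        All parameter names here are assumptions for illustration."""
        n = len(token_ids)
        is_sep = [t in sep_ids for t in token_ids]
        mask = torch.zeros(n, n, dtype=torch.bool)
        for q in range(n):
            for k in range(q + 1):  # causal: keys up to the query position
                mask[q, k] = (
                    k < n_initial      # initial "sink" tokens
                    or is_sep[k]       # separators stand in for their segment
                    or q - k < window  # local sliding window
                )
        return mask  # True = attend, False = drop

    # Toy usage: token 99 plays the role of a separator (e.g. "." or ",").
    ids = [7, 3, 5, 99, 2, 8, 99, 4, 6, 1]
    print(sepllm_style_mask(ids, sep_ids={99}, window=3).int())

Everything outside the window that isn't a separator gets masked out, which is why the observation that separators soak up most of the attention mass makes the scheme plausible in the first place.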

xp84

Or, put another way:

"Why waste time say lot token when few token do trick?"

-Kevin Malone