S1: Simple Test-Time Scaling

2 comments · February 3, 2025

mncharity

> This is similar to the "Superficial Alignment Hypothesis" presented in LIMA (Zhou et al., 2023), where the authors find that 1,000 examples can be sufficient to align a model to adhere to user preferences.

Link: LIMA: Less Is More for Alignment https://proceedings.neurips.cc/paper_files/paper/2023/file/a... , 1k cites: https://scholar.google.com/scholar?cites=1642843440474691780...

randomcatuser

really cool! i wonder what happens when you teach it to use tools inside the reasoning as well. could be even better!