
There May Not Be Aha Moment in R1-Zero-Like Training

Jean-Papoulos

>We found Superficial Self-Reflection (SSR) from base models’ responses, in which case self-reflections do not necessarily lead to correct final answers.

I must be missing something here. No one was arguing that the AI's answers are correct to begin with, just that self-reflection leads to more correct answers compared to not using the process?

littlestymaar

TL;DR:

Base models exhibit what the authors call "Superficial Self-Reflection," where it looks like the model is reasoning but this doesn't lead to an actual improvement in answer quality. Then, with RL, the models learn to use this reflection effectively to improve answer quality.

The whole read is interesting but I don't think the title is really an accurate description of it…

jamiequint

Some interesting discussion in the author's X thread here: https://x.com/zzlccc/status/1887557022771712308
