Launch HN: RunRL (YC X25) – Reinforcement learning as a service
Hey HN, we’re Andrew and Derik at RunRL (https://runrl.com/). We've built a platform to improve models and agents with reinforcement learning. If you can define a metric, we'll make your model or agent better, without you having to think about managing GPU clusters.
Here's a demo video: https://youtu.be/EtiBjs4jfCg
I (Andrew) was doing a PhD in reinforcement learning on language models, and everyone kept...not using RL because it was too hard to get running. At some point I realized that someone's got to sit down and actually write a good platform for running RL experiments.
Once the platform existed, people started using it for antiviral design, formal verification, browser agents, and a bunch of other cool applications, so we decided to make a startup out of it.
How it works:
- Choose an open-weight base model (weights are necessary for RL updates; Qwen3-4B-Instruct-2507 is a good starting point)
- Upload a set of initial prompts ("Generate an antiviral targeting Sars-CoV-2 protease", "Prove this theorem", "What's the average summer high in Windhoek?")
- Define a reward function, using Python, an LLM-as-a-judge, or both (see the sketch after this list)
- For complex settings, you can define an entire multi-turn environment
- Watch the reward go up!
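The post doesn't spell out the reward-function contract, so here is a minimal sketch, assuming the platform calls a Python function with the prompt and the model's completion and trains the model to maximize the returned float. The name reward, the signature, the llm_judge stub, and the 0.7/0.3 blend weights are all hypothetical, not RunRL's actual API:

    # Hypothetical reward function for a theorem-proving prompt set.
    # Assumed contract (not confirmed above): RunRL calls
    # reward(prompt, completion) once per rollout and trains the
    # model to maximize the returned float.
    import re

    def proof_checks(completion: str) -> float:
        """Cheap programmatic signal: does this look like a finished proof?"""
        score = 0.0
        if re.search(r"\bQED\b", completion):  # proof claims to conclude
            score += 0.5
        if len(completion.split()) >= 20:      # not a one-line non-answer
            score += 0.2
        return score

    def llm_judge(prompt: str, completion: str) -> float:
        """Stub for an LLM-as-a-judge scoring rigor on [0, 1].
        A real version would ask a grader model to rate the proof;
        stubbed out here so the sketch runs offline."""
        return 0.0

    def reward(prompt: str, completion: str) -> float:
        # Blend the programmatic checks with the judge; weights are arbitrary.
        return 0.7 * proof_checks(completion) + 0.3 * llm_judge(prompt, completion)

    if __name__ == "__main__":
        print(reward("Prove that the sum of two even integers is even.",
                     "Let a = 2m and b = 2n. Then a + b = 2(m + n), "
                     "which is even since m + n is an integer. QED"))

A multi-turn environment would generalize this idea: instead of scoring a single completion, it would score a whole episode of model and tool interactions.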
For most well-defined problems, a small open model + RunRL outperforms frontier models. (For instance, we've seen Qwen-3B do better than Claude 4.1 Opus on antiviral design.) This is because LLM intelligence is notoriously "spiky": a model is often decent but not great at common-sense knowledge, randomly good in a few domains, and error-prone on lots of other tasks. RunRL creates spikes precisely on the tasks where you need them.
Pricing: $80/node-hour. Most models up to 14B parameters fit on one node (0.6-1.2 TB of VRAM). We do full fine-tuning rather than parameter-efficient methods, since with RL people seem to care a lot about the last few percent of gains in, e.g., agent reliability.
Next up: continuous learning and tool use. Tool use is currently in private beta, which you can join here: https://forms.gle/D2mSmeQDVCDraPQg8
We'd love to hear any thoughts, questions, or positive or negative reinforcement!