Athena landed in a dark crater where the temperature was minus 280° F
arstechnica.com
Decrypting encrypted files from Akira ransomware using a bunch of GPUs
tinyhack.com
New York Times shut down Tor Onion service
open.nytimes.com
Samsung Q990D unresponsive after 1020 firmware update
us.community.samsung.com
Briar: Peer to Peer Encrypted Messaging
briarproject.org
I'm Peter Roberts, immigration attorney who does work for YC and startups. AMA
Ask HN: Any insider takes on Yann LeCun's push against current architectures?
Show HN: Web Audio Spring-Mass Synthesis
blog.cochlea.xyz
GCVR (YC W22) is hiring a lead technical artist
ycombinator.com
Exo: Exocompilation for productive programming of hardware accelerators
github.com
FBI, EPA, and Treasury told Citibank to freeze funds to claw back climate money
techcrunch.com
Block Diffusion: Interpolating between autoregressive and diffusion models
arxiv.org
I-cant-believe-its-not-webusb: Hacking around lack of WebUSB support in Firefox
github.com
Why do transit agencies keep falling for the hydrogen bus myth?
cleantechnica.com
Magpies and crows are using “anti-bird spikes” to make nests (2023)
audubon.org
In S3 simplicity is table stakes
allthingsdistributed.com
My Scammer Girlfriend: Baiting a Romance Fraudster
bentasker.co.uk
Owl: Optimized Workforce Learning for multi-agent collaboration
github.com
A 2FA app that tells you when you get `314159` (2024)
blog.jacobstechtavern.com
Show HN: OCR Benchmark Focusing on Automation
nanonets.com
Apple will soon support encrypted RCS messaging with Android users
theverge.com
It doesn't cost much to improve someone's life
ourworldindata.org
Hey HN, after years building some of the core AI and NLU systems in Google Search, we decided to leave and build outside. Our goal was to put the advanced ML and DS techniques we’ve been using in the hands of all software engineers, so that everyone can build AI and Search apps at the same level of performance and sophistication as the big labs.
This was a hard technical challenge, but we took a lot of inspiration from the MVC architecture in web development. The intuition there is that when the data model changes, its views get updated automatically. We built a similar architecture for AI. On one side is a scoring system, which encapsulates, as a set of metrics, what’s good about the AI application. On the other side is a set of optimizers that “compile” against this scorer: prompt optimization, data filtering, synthetic data generation, supervised learning, RL, and so on. The scoring system can be calibrated using developer, user, or rater feedback, and once it’s updated, all the optimizers get recompiled against it.
The result is a setup that makes it easy to incrementally improve the quality of your AI in a tight feedback loop: You update your scorers, they auto-update your optimizers, your app gets better, you see that improvement in interpretable scores, and then you repeat, progressing from simpler to more advanced optimizers and from off-the-shelf to calibrated scorers.
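To make that loop concrete, here is a minimal Python sketch of the contract we mean (all class and function names here are hypothetical illustrations, not our actual API): the scorer encapsulates the metrics, the optimizers are compiled against whatever scorer you hand them, and recalibrating the scorer just means recompiling the optimizers.

    # Minimal sketch of the scorer/optimizer contract (all names hypothetical).
    from typing import Callable, Dict, List

    class Scorer:
        """Encapsulates what 'good' means as a set of weighted metric functions."""
        def __init__(self, metrics: Dict[str, Callable[[str, str], float]],
                     weights: Dict[str, float]):
            self.metrics, self.weights = metrics, weights

        def score(self, prompt: str, response: str) -> float:
            # Weighted average of per-dimension scores in [0, 1].
            total = sum(self.weights[n] * fn(prompt, response)
                        for n, fn in self.metrics.items())
            return total / sum(self.weights.values())

        def calibrate(self, preference_data: List[tuple]) -> None:
            # Placeholder: re-fit the weights from labeled/preference data.
            pass

    def compile_best_of_n(scorer: Scorer, generate: Callable[[str], List[str]]):
        """The simplest possible 'optimizer': pick the candidate the scorer likes best."""
        def run(prompt: str) -> str:
            return max(generate(prompt), key=lambda r: scorer.score(prompt, r))
        return run

    # The loop: calibrate the scorer, recompile the optimizers against it, repeat.
    scorer = Scorer(metrics={"concise": lambda p, r: 1.0 if len(r) < 400 else 0.5},
                    weights={"concise": 1.0})
    best_of_n = compile_best_of_n(scorer, generate=lambda p: [p.upper(), p.lower()])
    print(best_of_n("Explain RCS encryption in one paragraph."))

A real optimizer does far more than best-of-n selection, but the dependency direction is the point: everything downstream keys off the scorer.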
We would love your feedback on this approach. https://build.withpi.ai has a set of playgrounds to help you quickly build a scorer and multiple optimizers. No sign in required. https://code.withpi.ai has the API reference and Notebook links. Finally, we have a Loom demo [1].
More technical details
Scorers: Our scoring system has three key differences from the common LLM-as-a-judge pattern.
First, rather than a single label or metric from an LLM judge, our scoring system is represented as a tunable tree of metrics, with 20+ dimensions which get combined into a final (non-linear) weighted score. The tree structure makes scores easily interpretable (just look at the breakdown by dimension), extensible (just add/remove a dimension), and adjustable (just re-tune the weights). Training the scoring system with labeled/preference data adjusts the weights. You can automate this process with user feedback signals, resulting in a tight feedback loop.
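As a rough illustration of the tree idea (the dimensions, weights, and the plain weighted average below are invented for the example; the production combination is non-linear and the weights are learned from data), each node rolls up its children and records its own score, so the breakdown stays inspectable at every level:

    # Hypothetical illustration of a tunable metric tree; dimensions are made up.
    from dataclasses import dataclass, field
    from typing import Callable, Dict, List, Optional

    @dataclass
    class Node:
        name: str
        weight: float = 1.0
        score_fn: Optional[Callable[[str, str], float]] = None   # leaves only
        children: List["Node"] = field(default_factory=list)

        def score(self, prompt: str, response: str, breakdown: Dict[str, float]) -> float:
            if self.score_fn is not None:              # leaf dimension
                s = self.score_fn(prompt, response)
            else:                                      # internal node: roll up children
                total = sum(c.weight for c in self.children)
                s = sum(c.weight * c.score(prompt, response, breakdown)
                        for c in self.children) / total
            breakdown[self.name] = s                   # per-node scores keep it interpretable
            return s

    tree = Node("overall", children=[
        Node("groundedness", weight=2.0,
             score_fn=lambda p, r: 0.0 if "probably" in r else 1.0),
        Node("style", children=[
            Node("concise", score_fn=lambda p, r: min(1.0, 300 / max(len(r), 1))),
            Node("actionable", score_fn=lambda p, r: 1.0 if "." in r else 0.5),
        ]),
    ])

    breakdown: Dict[str, float] = {}
    overall = tree.score("Please summarize the doc.", "A short, grounded summary.", breakdown)
    print(round(overall, 3), breakdown)

Adding or removing a dimension is just adding or removing a node; re-tuning is just changing the weights.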
Second, our scoring system handles natural language dimensions (great for free-form, qualitative questions requiring NLU) alongside quantitative dimensions (like computations over dates or doc length, which can be provided in Python) in the same tree. When calibrating with your labeled or preference data, the scorer learns how to balance these.
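On the quantitative side, a dimension can be as simple as a small Python function. The checks below are hypothetical examples of the kind of thing that sits in the same tree as the natural language dimensions and gets weighted against them during calibration:

    # Hypothetical quantitative dimensions expressed as plain Python functions.
    from datetime import date

    def within_length_budget(prompt: str, response: str, max_words: int = 150) -> float:
        # 1.0 when the response fits the budget, decaying linearly past it.
        words = len(response.split())
        return 1.0 if words <= max_words else max(0.0, 1 - (words - max_words) / max_words)

    def cites_a_recent_year(prompt: str, response: str) -> float:
        # Reward responses that mention a year from the last five years.
        this_year = date.today().year
        return 1.0 if any(str(y) in response for y in range(this_year - 4, this_year + 1)) else 0.0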
Third, for natural language scoring, we use specialized smaller encoder models rather than autoregressive models. Encoders are a natural fit for scoring: they are faster and cheaper to run, easier to fine-tune, and architecturally more suitable (bi-directional attention with a regression or classification head) than similarly sized decoder models. For example, we can score 20+ dimensions in under 100 ms, making it possible to use scoring everywhere from evaluation to agent orchestration to reward modeling.
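For a sense of what encoder-side scoring looks like, here is a sketch using the public answerdotai/ModernBERT-base checkpoint with a multi-output regression head. The head is untrained here, so the numbers are meaningless; it only shows the shape of the approach (one bi-directional forward pass scoring many dimensions at once), not our fine-tuned scorers, and it needs a recent transformers release with ModernBERT support:

    # Sketch: a bi-directional encoder with a regression head scoring many
    # dimensions in one forward pass. Untrained head, illustrative only.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    NUM_DIMENSIONS = 20
    tok = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "answerdotai/ModernBERT-base",
        num_labels=NUM_DIMENSIONS,
        problem_type="regression",
    )
    model.eval()

    prompt = "Summarize the Block Diffusion paper in two sentences."
    response = "It interpolates between autoregressive and diffusion language models."
    inputs = tok(prompt, response, return_tensors="pt", truncation=True)

    with torch.no_grad():
        logits = model(**inputs).logits            # shape: (1, NUM_DIMENSIONS)
    scores = torch.sigmoid(logits).squeeze(0)      # squash to [0, 1] per dimension
    print(scores.tolist())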
Optimizers: We took the most salient ML techniques and reformulated them as optimizers against our scoring system. For DSPy, the scoring system acts as the validator; for GRPO, it acts as the reward model. We’re keen to hear the community’s feedback on which techniques to add next.
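For the GRPO case, the integration point is just a custom reward function. The sketch below uses TRL's GRPOTrainer with a stand-in score_response function in place of a calibrated scorer; the tiny dataset and small base model are only there to keep the example self-contained:

    # Sketch: plugging a scorer in as the GRPO reward via TRL.
    # `score_response` is a stand-in for a real calibrated scorer.
    from datasets import Dataset
    from trl import GRPOConfig, GRPOTrainer

    def score_response(prompt: str, response: str) -> float:
        # Stand-in scorer: reward short, declarative answers.
        return 1.0 if len(response.split()) < 60 and "?" not in response else 0.2

    def reward_func(prompts, completions, **kwargs):
        # TRL calls this with the prompts and sampled completions for each group.
        return [score_response(p, c) for p, c in zip(prompts, completions)]

    train_dataset = Dataset.from_dict(
        {"prompt": ["Explain what a scoring system is in one sentence."] * 8}
    )

    trainer = GRPOTrainer(
        model="Qwen/Qwen2-0.5B-Instruct",      # any small causal LM works for a demo
        reward_funcs=reward_func,
        args=GRPOConfig(output_dir="grpo-scorer-demo"),
        train_dataset=train_dataset,
    )
    # trainer.train()   # commented out: actual training needs a GPU and some patience

The DSPy case is analogous: the scorer plays the role of the metric that DSPy's optimizers validate candidate programs against.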
Overall stack: Next.js and Vercel for the playgrounds; RunPod and GCP for training GPUs; TRL for training algorithms; ModernBERT and Llama as base models; GCP and Azure for GPT-4o and Anthropic calls.
We’d love your feedback and perspectives: Our team will be around to answer questions and discuss. If there’s a lot of interest, happy to host a live session!
- Achint, co-founder of Pi Labs
[1] http://loom.com/share/c09a1fda8cdf4003a5664fa9cfbf7804