Writing an LLM from scratch, part 10 – dropout

Scene_Cast2

I never did as much thinking or testing of dropout on transformers as the author, but it didn't seem to help with my "baby" (~10 million param) transformer models. IIRC the latest Llama models don't use dropout either.

mattnewton

Same, I was never able to figure out why dropout > 5% really hurt convergence speed for my toy LLMs. I chalked it up to the models not having enough parameters to fit fineweb, and just stopped using it.
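For readers following along: the standard "inverted" dropout these comments refer to zeroes each activation with probability p during training and scales the survivors by 1/(1-p), so the expected activation is unchanged and nothing needs to happen at inference time. A minimal pure-Python sketch (not from the article; the function name and signature are my own):

```python
import random

def inverted_dropout(x, p, training=True, seed=None):
    """Inverted dropout on a list of activations.

    During training, each element is zeroed with probability p and the
    survivors are scaled by 1/(1-p), keeping the expected value of each
    activation the same. At inference (training=False) the input passes
    through untouched.
    """
    if not training or p == 0.0:
        return list(x)
    rng = random.Random(seed)  # seeded for reproducibility in this demo
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]

# Example: with p=0.5, surviving activations are doubled, dropped ones are 0.
activations = [1.0, 2.0, 3.0, 4.0]
print(inverted_dropout(activations, p=0.5, seed=0))
print(inverted_dropout(activations, p=0.5, training=False))
```

The scaling is why frameworks like PyTorch do nothing special at eval time: `model.eval()` simply disables the masking.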