Cloudflare Global Network experiencing issues
cloudflarestatus.com
Gemini 3 for developers: New reasoning, agentic capabilities
blog.google
Gemini 3 Pro Preview Live in AI Studio
aistudio.google.com
Pebble, Rebble, and a Path Forward
ericmigi.com
A Day at Hetzner Online in the Falkenstein Data Center
igorslab.de
5 Things to Try with Gemini 3 Pro in Gemini CLI
developers.googleblog.com
Google Brings Gemini 3 AI Model to Search and AI Mode
blog.google
Solving a Million-Step LLM Task with Zero Errors
arxiv.org
Strix Halo's Memory Subsystem: Tackling iGPU Challenges
chipsandcheese.com
How Quake.exe got its TCP/IP stack
fabiensanglard.net
Nearly all UK drivers say headlights are too bright
bbc.com
Do Not Put Your Site Behind Cloudflare If You Don't Need To
huijzer.xyz
Show HN: Guts – convert Golang types to TypeScript
github.com
Show HN: Optimizing LiteLLM with Rust – When Expectations Meet Reality
github.com
Beauty in/of mathematics: tessellations and their formulas
tandfonline.com
A squeaky nail, or the wheel that sticks out
prashanth.world
Mathematics and Computation (2019) [pdf]
math.ias.edu
Google Antigravity, a New Era in AI-Assisted Software Development
antigravity.google
Ruby 4.0.0 Preview2 Released
ruby-lang.org
I've Wanted to Play That 'Killer Shark' Arcade Game Briefly Seen in 'Jaws'
remindmagazine.com
I've been working on Fast LiteLLM, a Rust acceleration layer for the popular LiteLLM library, and I picked up some lessons that might resonate with other developers trying to squeeze performance out of existing systems.
My assumption was that LiteLLM, being a Python library, would have plenty of low-hanging fruit for optimization. I set out to create a Rust layer using PyO3 to accelerate the performance-critical parts: token counting, routing, rate limiting, and connection pooling.
The Approach
- Built Rust implementations for token counting using tiktoken-rs (sketched after this list)
- Added lock-free data structures with DashMap for concurrent operations
- Implemented async-friendly rate limiting
- Created monkeypatch shims to replace Python functions transparently
- Added comprehensive feature flags for safe, gradual rollouts
- Developed performance monitoring to track improvements in real-time
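For a flavor of the token-counting piece, here is a minimal sketch of the PyO3 + tiktoken-rs pattern. This is illustrative, not Fast LiteLLM's actual API: it assumes pyo3 0.21+, Rust 1.80+ (for std::sync::LazyLock), and the tiktoken-rs crate, and the module/function names are invented. The Python-side monkeypatch shim would then swap a function like this in for the original.

```rust
use std::sync::LazyLock;

use pyo3::prelude::*;
use tiktoken_rs::{cl100k_base, CoreBPE};

// Build the BPE encoder once; constructing it per call would dominate runtime.
static BPE: LazyLock<CoreBPE> =
    LazyLock::new(|| cl100k_base().expect("load cl100k_base encoding"));

/// Count tokens for `text` under the cl100k_base encoding.
#[pyfunction]
fn count_tokens(py: Python<'_>, text: &str) -> usize {
    // Release the GIL while encoding so other Python threads keep running.
    py.allow_threads(|| BPE.encode_with_special_tokens(text).len())
}

/// Illustrative extension module name.
#[pymodule]
fn fast_tokens(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(count_tokens, m)?)?;
    Ok(())
}
```

The encoder cache is the important part: loading the BPE ranks is expensive, so the per-call cost is just the encode itself, which is exactly where the benchmark below shows Python already doing well.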
After building out all the Rust acceleration, I ran my comprehensive benchmark comparing baseline LiteLLM vs. the shimmed version:
Function             Baseline     Shimmed      Speedup   Improvement
token_counter        0.000035s    0.000036s    0.99x     -0.6%
count_tokens_batch   0.000001s    0.000001s    1.10x     +9.1%
router               0.001309s    0.001299s    1.01x     +0.7%
rate_limiter         0.000000s    0.000000s    1.85x     +45.9%
connection_pool      0.000000s    0.000000s    1.63x     +38.7%

(The rate_limiter and connection_pool calls complete in well under a microsecond, so their times round to 0.000000s at this display precision; the speedup ratios come from the unrounded measurements.)
Turns out LiteLLM is already quite well-optimized! The core token counting was essentially unchanged (0.6% slower, likely within measurement noise), and the most significant gains came from the more complex operations like rate limiting and connection pooling where Rust's concurrent primitives made a real difference.
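To illustrate why the concurrent pieces benefited, here is a minimal token-bucket limiter over DashMap. This is a sketch, not the project's actual limiter, and DashMap is sharded-locking rather than strictly lock-free; the practical win is that there is no single global mutex on the hot path.

```rust
use std::time::Instant;

use dashmap::DashMap;

/// A tiny token-bucket limiter keyed by deployment name. DashMap shards
/// its locks, so threads hitting different keys rarely contend.
struct RateLimiter {
    buckets: DashMap<String, Bucket>,
    capacity: f64,
    refill_per_sec: f64,
}

struct Bucket {
    tokens: f64,
    last_refill: Instant,
}

impl RateLimiter {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { buckets: DashMap::new(), capacity, refill_per_sec }
    }

    /// Returns true if `key` may proceed, consuming one token.
    fn try_acquire(&self, key: &str) -> bool {
        let mut bucket = self.buckets.entry(key.to_owned()).or_insert_with(|| Bucket {
            tokens: self.capacity,
            last_refill: Instant::now(),
        });
        // Refill proportionally to elapsed time, capped at capacity.
        let now = Instant::now();
        let elapsed = now.duration_since(bucket.last_refill).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        bucket.last_refill = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```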
Key Takeaways
1. Don't assume existing libraries are under-optimized - the maintainers likely know their domain well
2. Focus on algorithmic improvements over reimplementation - sometimes a better approach beats a faster language
3. Micro-benchmarks can be misleading - real-world performance impact varies significantly (see the harness sketch after this list)
4. The most gains often come from the complex parts, not the simple operations
5. Even "modest" improvements can matter at scale - 45% improvements in rate limiting are meaningful for high-throughput applications
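On point 3, here is a hypothetical harness showing why sub-microsecond benchmarks are slippery: at these call times, loop and timer overhead make up a meaningful share of what gets measured, which matters when the function under test runs in tens of microseconds or less.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Time `f` over many iterations and report the mean per-call cost.
/// black_box stops the compiler from optimizing trivial work away, but
/// the mean still folds in loop and timer overhead, so small deltas on
/// fast functions can be pure noise.
fn bench(label: &str, iters: u32, mut f: impl FnMut() -> usize) {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    let per_call = start.elapsed().as_secs_f64() / iters as f64;
    println!("{label}: {per_call:.9}s per call");
}

fn main() {
    let text = "hello world";
    bench("whitespace_count", 1_000_000, || text.split_whitespace().count());
}
```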
While the core token counting saw minimal improvement, the rate limiting and connection pooling gains still provide value for high-volume use cases. The infrastructure I built (feature flags, performance monitoring, safe fallbacks) creates a solid foundation for future optimizations.
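For the safe-fallback piece, a hypothetical sketch of the pattern: the shim keeps a handle to the original Python function before patching, and the Rust side defers to it whenever the fast path is disabled or fails. The env var name, function names, and stand-in accelerated path here are all invented for illustration.

```rust
use pyo3::prelude::*;

/// Try the accelerated path; otherwise call the original Python function
/// that the shim saved before patching. FAST_LITELLM_DISABLE is a
/// hypothetical kill switch for gradual rollouts.
#[pyfunction]
fn guarded_token_count(py: Python<'_>, original: PyObject, text: &str) -> PyResult<usize> {
    if std::env::var_os("FAST_LITELLM_DISABLE").is_none() {
        if let Some(n) = accelerated_count(text) {
            return Ok(n);
        }
    }
    // Safe fallback: defer to the saved Python implementation.
    original.call1(py, (text,))?.extract(py)
}

/// Stand-in for the real Rust implementation; returns None on failure so
/// the caller can fall back instead of raising.
fn accelerated_count(text: &str) -> Option<usize> {
    Some(text.split_whitespace().count()) // placeholder logic only
}
```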
The project continues as Fast LiteLLM on GitHub for anyone interested in the Rust-Python integration patterns, even if the performance gains were humbling.
Edit: To clarify - the negative performance for token_counter is likely in the noise range of measurement, suggesting that LiteLLM's token counting is already well-optimized. The 45%+ gains in rate limiting and connection pooling still provide value for high-throughput applications.