Embedding user-defined indexes in Apache Parquet
datafusion.apache.org
I think this post is a response to some of the new file format initiatives, which are motivated by the criticism that the Parquet file format is showing its age.
One of their arguments is that there is no standardized way to extend Parquet with new kinds of metadata (such as statistical summaries, HyperLogLog sketches, etc.).
This post was written by the DataFusion folks, who have shown a clever way to do this without breaking backward compatibility with existing readers.
They insert arbitrary data between the data pages and the footer, which other readers ignore but query engines like DataFusion can exploit. Here they embed a new index in the .parquet file and use it to improve query performance.
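To make the layout trick concrete, here is a rough sketch in Python. It is a toy container, not real Parquet: the `TOY1` magic, the JSON footer, and the names `write_file`, `legacy_read`, `read_index`, and `distinct_index` are all made up for illustration (real Parquet uses a Thrift-encoded footer). The only thing it borrows from Parquet is the idea that a reader starts from the footer at the end of the file and follows offsets, so bytes nobody points to are never read.

```python
import json
import struct

MAGIC = b"TOY1"  # stand-in magic bytes; this toy format is NOT real Parquet


def write_file(values, with_index=True):
    """Layout: MAGIC | data | [index] | footer | footer_len (4 bytes LE) | MAGIC."""
    data = json.dumps(values).encode()
    body = MAGIC + data
    footer = {"data_offset": len(MAGIC), "data_len": len(data), "kv": {}}
    if with_index:
        index = json.dumps(sorted(set(values))).encode()
        # Record where the index lives under a key that old readers never look up.
        footer["kv"]["distinct_index"] = [len(body), len(index)]
        body += index
    footer_bytes = json.dumps(footer).encode()
    return body + footer_bytes + struct.pack("<I", len(footer_bytes)) + MAGIC


def read_footer(buf):
    """Locate the footer from the end of the file and decode it."""
    assert buf[:4] == MAGIC and buf[-4:] == MAGIC
    (footer_len,) = struct.unpack("<I", buf[-8:-4])
    return json.loads(buf[-8 - footer_len:-8])


def legacy_read(buf):
    """A reader unaware of the index: it only follows the data offsets."""
    footer = read_footer(buf)
    start = footer["data_offset"]
    return json.loads(buf[start:start + footer["data_len"]])


def read_index(buf):
    """An index-aware reader: returns the distinct values, or None if absent."""
    entry = read_footer(buf)["kv"].get("distinct_index")
    if entry is None:
        return None
    offset, length = entry
    return json.loads(buf[offset:offset + length])


buf = write_file(["Singapore", "Japan", "Singapore"])
print(legacy_read(buf))  # the old reader still sees the original rows
print(read_index(buf))   # the new reader also finds the embedded index
```

The point of the design is that the extra bytes cost old readers nothing: they never seek to them, so the file stays valid for everyone.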
In this specific instance, they add an index containing all the distinct values of a column. They then extend the DataFusion query engine to use it, so that a query like `WHERE nation = 'Singapore'` can determine from the index whether the value exists in that .parquet file without scanning the data pages. (Scanning is already optimized, because a min-max filter avoids reading the entire dataset.)
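A small Python sketch of why a distinct-values index can help beyond min-max statistics. This is not DataFusion code; `FileStats`, `summarize`, `can_skip_minmax`, and `can_skip_distinct` are hypothetical names, and the example data is invented. It only illustrates the pruning logic for an equality predicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Per-file summary for one column (hypothetical structure)."""
    min_value: str
    max_value: str
    distinct: frozenset


def summarize(values):
    """Compute both kinds of summary from the column values."""
    return FileStats(min(values), max(values), frozenset(values))


def can_skip_minmax(stats, target):
    """Min-max pruning: skip only if the target falls outside [min, max]."""
    return target < stats.min_value or target > stats.max_value


def can_skip_distinct(stats, target):
    """Distinct-value pruning: skip whenever the target is not in the set."""
    return target not in stats.distinct


stats = summarize(["Argentina", "Japan", "Vietnam"])
# 'Singapore' sorts between 'Argentina' and 'Vietnam', so min-max cannot rule it out,
# but the distinct-value set shows the file holds no matching rows.
print(can_skip_minmax(stats, "Singapore"))
print(can_skip_distinct(stats, "Singapore"))
```

The tradeoff is size: min and max are two values per column, while a distinct-value set grows with cardinality, so this kind of index makes the most sense for low-cardinality columns such as a nation name.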
Also, more generally, this is a really good deep dive into columnar data storage.