
Airport for DuckDB

11 comments

May 19, 2025

mrbungie

I was almost going to build a lakehouse* with DuckDB because I low-key love it. It's the easiest and strongest analytical engine I've found yet: it scales from laptops to big metal, stays mostly out-of-core when doing sane stuff, and avoids distributed computing for SQL in the process (looking at you, Spark).

That is, until I found out it does not support Iceberg writes[1]. That's a big no-no, as I would need another engine for inserts, and I want a simple stack :(. What a bummer.

[1] https://github.com/duckdb/duckdb_iceberg/issues/37

*That is what they are called now, aren't they? I just can't follow the terms anymore haha.

mritchie712

It's coming. They already have Hive-style Parquet writes. Iceberg is more complicated than that, but it's certainly doable.
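For readers unfamiliar with the term, "Hive-style" here means one directory per partition value, named `column=value`. The sketch below is only an illustration: the `orders` table, the `year`/`month` columns, and both helper functions are hypothetical names, and the `COPY` statement is built as a string rather than executed, so no DuckDB installation is assumed.

```python
from pathlib import PurePosixPath


def hive_partition_dir(base, partition_values):
    """Return the Hive-style directory for one partition.

    Each partition column becomes a `column=value` path segment.
    """
    path = PurePosixPath(base)
    for column, value in partition_values.items():
        path = path / f"{column}={value}"
    return str(path)


def duckdb_partitioned_copy(table, base, partition_columns):
    """Build (but do not run) a DuckDB COPY statement using PARTITION_BY.

    `table` and `base` are hypothetical names chosen for illustration.
    """
    cols = ", ".join(partition_columns)
    return f"COPY {table} TO '{base}' (FORMAT PARQUET, PARTITION_BY ({cols}))"


# Directory that rows with year=2025, month=5 would land in.
print(hive_partition_dir("orders", {"year": 2025, "month": 5}))
# → orders/year=2025/month=5

# Statement one could pass to DuckDB to produce that layout.
print(duckdb_partitioned_copy("orders", "orders", ["year", "month"]))
# → COPY orders TO 'orders' (FORMAT PARQUET, PARTITION_BY (year, month))
```

Iceberg adds table metadata, snapshots, and manifests on top of data files like these, which is the extra complexity the comment alludes to.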

mrbungie

Yeah, it would just be great if it already did, and I hope it supports Iceberg soon, as it would let me swap expensive (and bad) engines like AWS Athena for something more manageable.

Don't get me wrong, I'm just being a tongue-in-cheek egotistical bastard data engineer from hell. DuckDB is a fine piece of software as it is, and those maintainers deserve heaven.

jeadie

This is one of the ideas behind using DuckDB in github.com/spiceai/spiceai

mrbungie

Looks very cool! I will take a look, tysm!

r3tr0

I love DuckDB. We use it a ton for indexing and organizing system- and kernel-level metrics exported by eBPF.

Check out our sandbox:

https://yeet.cx/play

rubenvanwyk

Does this mean the data source and destination both have to set up flight servers? I imagine then this won’t be useful for integration of third-party services.

blef

It's a cool thought exercise: everything we do in the data world can be done in SQL, from SQL. In a sense, this is like MCP, but for the DuckDB world.

the_optimist

What’s the situation where this is useful? Seems like ‘replace your remote duckDB instance—used to replace a DB server—with duckDB instance + a flight server (or a bunch of them!)’. Who has a problem for which this is the solution?

simlevesque

A Flight server paired with DuckDB is a good way to get concurrent writes.
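The idea behind that last point can be sketched as a single-writer pattern: a DuckDB database file can be opened for writing by only one process at a time, so a server process that owns the database can accept writes from many clients and apply them itself. The sketch below is a rough analogy under stated assumptions, not the Airport extension's or Arrow Flight's API: threads stand in for remote clients, a queue stands in for the network, and the standard-library `sqlite3` module stands in for DuckDB so the example runs without extra packages. The function and table names are hypothetical.

```python
import queue
import sqlite3
import threading


def run_single_writer(rows_per_client, n_clients):
    """Funnel writes from many client threads through one writer thread.

    Returns the number of rows stored once every client has finished.
    """
    inbox = queue.Queue()  # stand-in for requests arriving over the network
    result = {}

    def writer():
        # Only this thread touches the database, playing the server's role.
        con = sqlite3.connect(":memory:")  # stand-in for a DuckDB database
        con.execute("CREATE TABLE events (client INTEGER, seq INTEGER)")
        while True:
            item = inbox.get()
            if item is None:  # sentinel: no more writes are coming
                break
            con.execute("INSERT INTO events VALUES (?, ?)", item)
        con.commit()
        result["rows"] = con.execute("SELECT count(*) FROM events").fetchone()[0]
        con.close()

    def client(client_id):
        # Clients never open the database; they only submit rows.
        for seq in range(rows_per_client):
            inbox.put((client_id, seq))

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    clients = [threading.Thread(target=client, args=(i,)) for i in range(n_clients)]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    inbox.put(None)  # all clients are done, so tell the writer to stop
    writer_thread.join()
    return result["rows"]


print(run_single_writer(rows_per_client=50, n_clients=4))
# → 200
```

The design choice is that clients submit rows and never hold the database open themselves, which is what would let several of them write "concurrently" against a single-writer engine.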