
Iceberg, the right idea – the wrong spec – Part 2 of 2: The spec

ozgrakkurt

Great analysis of what Iceberg does, but I don’t agree with so much of the criticism.

It is very basic compared to a database, and even when you go into the details of databases, many of their design choices don’t make sense if the goal is the absolute best solution.

You could criticize Parquet in a similar way if you go through the spec, but because it is open and so popular, people are going to use it no matter what.

If you need more performance, efficiency, simplicity, etc., just don’t use Parquet internally and convert between your own format and Parquet.

Or you can build on top of Parquet with external indices, keeping metadata in memory and using a separate WAL for consistency.
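
To make that layering concrete, here is a rough sketch of the idea, not how any real engine does it. Everything in it is an assumption for illustration: the `TinyTable` class, the `wal.log` file name, and the use of JSON-lines files as a stand-in for Parquet files so that it runs with only the standard library. Data files are immutable, a file only counts as committed once its name is appended to the WAL, and the in-memory index is rebuilt by replaying the WAL on startup.

```python
import json
import os
import tempfile
import uuid


class TinyTable:
    """Toy table: immutable data files + a WAL + an in-memory key index.

    Assumption: JSON-lines files stand in for Parquet files so the sketch
    needs nothing beyond the standard library.
    """

    def __init__(self, root):
        self.root = root
        self.wal_path = os.path.join(root, "wal.log")
        self.index = {}  # key -> (data file name, line number)
        os.makedirs(root, exist_ok=True)
        self._replay_wal()

    def _index_file(self, name):
        # Scan one data file and record where each key lives.
        with open(os.path.join(self.root, name)) as f:
            for line_no, line in enumerate(f):
                self.index[json.loads(line)["key"]] = (name, line_no)

    def _replay_wal(self):
        # Rebuild the in-memory metadata from the files committed in the WAL.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path) as wal:
            for entry in wal:
                self._index_file(json.loads(entry)["file"])

    def append(self, rows):
        # 1. Write the data file; it stays invisible until the WAL names it.
        name = f"data-{uuid.uuid4().hex}.jsonl"
        with open(os.path.join(self.root, name), "w") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. Commit by appending one record to the WAL and syncing it.
        with open(self.wal_path, "a") as wal:
            wal.write(json.dumps({"file": name}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 3. Only now update the in-memory index.
        self._index_file(name)

    def get(self, key):
        # Point lookup through the index instead of scanning every file.
        if key not in self.index:
            return None
        name, line_no = self.index[key]
        with open(os.path.join(self.root, name)) as f:
            for i, line in enumerate(f):
                if i == line_no:
                    return json.loads(line)
        return None


root = tempfile.mkdtemp()
table = TinyTable(root)
table.append([{"key": "a", "value": 1}, {"key": "b", "value": 2}])
table.append([{"key": "a", "value": 3}])  # a later file wins for the same key
reopened = TinyTable(root)  # simulates a restart: index comes from the WAL
```

A data file written just before a crash but never logged is simply ignored on replay, which is the consistency the WAL buys here. A real version would also need compaction, deletes, and concurrent-writer handling, none of which this sketch attempts.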

Similarly, it should be possible to build on top of the Iceberg spec to create something like an efficient DB server.

Something usable for so many use cases is unlikely to also be the most technically pure and sensible option.

dkdcio

I think this criticism is missing the order of magnitude aspect -- I agree, people do not choose the most technically pure option. But an option that launches on day 1, can be used from SQL or Python with a few lines of code, works across any cloud provider, and basically "just works" is an order of magnitude or more simpler than using Iceberg, at least in my experience in Python. It's always been odd how every non-JVM client for Iceberg has supported reads, but never writes...

People don't choose tech on technical purity, but they often choose on simplicity & ease of use.