HTAP is Dead
92 comments · May 28, 2025
thom
You cannot say HTAP is dead when the alternative is so much complexity and so many moving parts. Most enterprises are burning huge amounts of resources literally just shuffling data around for zero business value.
The dream is a single data mesh presenting an SQL userland where I can write and join data from across the business with high throughput and low latency. With that, I can kill off basically every microservice that exists, and work on stuff that matters at pace, instead of half of all projects being infrastructure churn. We are close but we are not there yet and I will be furious if people stop trying to reach this endgame.
physix
> You cannot say HTAP is dead when the alternative is so much complexity and so many moving parts. Most enterprises are burning huge amounts of resources literally just shuffling data around for zero business value.
We built an HTAP platform as a layer over Cassandra for precisely that reason, right around the time Gartner coined the term.
In finance and fintech there are ample use cases where the needs for transactional consistency and for horizontal scalability to process and report on large volumes come together, and where banks really struggle to meet requirements.
I dug out an old description of our platform, updated it a bit, and put it on Medium, in case anyone is interested: https://medium.com/@paul_42036/a-technical-description-of-th...
sgarland
> The dream is a single data mesh presenting an SQL userland where I can write and join data from across the business with high throughput and low latency.
That exists, and has for years: an extremely large DB loaded to the gills with RAM and local NVMe drives. Add some read replicas if you need them, similarly configured. Dedicate one for OLAP.
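A minimal sketch of the Postgres side of that setup (role, slot, and setting values are illustrative): create a replication role and a physical slot per replica, then tune the standby you dedicate to OLAP so long analytics queries aren't cancelled by replay conflicts.

    -- On the primary: a role standbys connect as, plus one slot per replica
    -- so WAL is retained until each standby has consumed it.
    CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'change-me';
    SELECT pg_create_physical_replication_slot('olap_replica');

    -- On the standby dedicated to OLAP: let long queries run instead of
    -- being cancelled when they conflict with WAL replay.
    ALTER SYSTEM SET max_standby_streaming_delay = '-1';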
jandrewrogers
This doesn’t work quite as well as people assume. The first limit is simply size: you can only cram a few petabytes of NVMe into a server (before any redundancy), and many operational analytic workloads are quite a bit larger these days. One of the major advantages of disaggregated storage in theory is that it allows you to completely remove size limits. Many operational analytic workloads don’t need a lot of compute, just sparse on-demand access to vast amounts of data. With good selectivity (another open issue), you could get excellent performance in this configuration.
Ignoring the storage size limits, the real issue as you scale up is that the I/O schedulers, caching, and low-level storage engine mechanics in a large SQL database are not designed to operate efficiently on storage volumes this large. They will work technically, but scale quite a bit more poorly than people expect. The internals of SQL databases are (sensibly) optimized for working sets no larger than 10x RAM size, regardless of the storage size. This turns out to be the more practical limit for analytics in a scale-up system even if you have a JBOD of fast NVMe at your disposal.
layer8
> many operational analytic workloads are quite a bit larger these days.
What are the use cases where such workloads come up, aside from Google-level operations? Just trying to understand what we are talking about.
sgarland
It works to a certain point, yes, but I daresay that the overwhelming majority of OLTP needs are in the <= TB range, not PB. OLAP is its own beast, though I'll also say that most modern tech companies' schemas are hot garbage, full of denormalized tables for no good reason, JSON everywhere, etc., and thus the entire thing could be much, much smaller if the RDBMS were used as it was intended: relationally.
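As a toy illustration of that last point (hypothetical schema), compare the JSON-blob version with the relational one:

    -- Denormalized: everything stuffed into one JSONB column.
    CREATE TABLE orders_blob (
        id      bigint PRIMARY KEY,
        payload jsonb   -- customer, line items, shipping address, ...
    );

    -- Relational: model the entities and let the database enforce the links.
    CREATE TABLE customers (
        id   bigint PRIMARY KEY,
        name text NOT NULL
    );

    CREATE TABLE orders (
        id          bigint PRIMARY KEY,
        customer_id bigint NOT NULL REFERENCES customers (id),
        ordered_at  timestamptz NOT NULL
    );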
dehrmann
A sibling mentioned GraphQL. That works, but it was really built for clients interacting with Meta's Ent framework. The web layer is largely a monolith, and user objects are modeled as "ents," linked to each other, and stored in heavily cached MySQL. GraphQL exposes access to them.
pradn
On the data warehousing side, I think the story looks like this:
1) Cloud data warehouses like Redshift, Snowflake, and BigQuery proved to be quite good at handling very large datasets (petabytes) with very fast querying.
2) Customers of these proprietary solutions didn't want to be locked in. So many are drifting toward Iceberg tables on top of Parquet (columnar) data files.
Another "hidden" motive here is that Cloud object stores give you regional (multi-zonal) redundancy without having to pay extra inter-zonal fees. An OLTP database would likely have to pay this cost, as it likely won't be based purely on object stores - it'll need a fast durable medium (disk), if at least for the WAL or the hot pages. So here we see the topology of Cloud object stores being another reason forcing the split between OLTP and OLAP.
But what does this new world of open OLTP/OLAP technologies look like? Pretty complicated.
1) You'd probably run Postgres as your OLTP DB, as it's the default these days and scales quite well.
2) You'd set up an Iceberg/Parquet system for OLAP, probably on Cloud object stores.
3) Now you need to stream the changes from Postgres to Iceberg/Parquet. The canonical OSS way to do this is to set up a Kafka cluster with Kafka Connect: you use the Debezium CDC connector for Postgres to pull deltas, then write to Iceberg/Parquet using the Iceberg sink connector (the Postgres side of this is sketched below). This incurs extra compute, memory, network, and disk.
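For what it's worth, the Postgres side of that Debezium pipeline boils down to enabling logical decoding; a minimal sketch (object names are made up, and Debezium can also create the slot/publication itself):

    -- Enable logical decoding (requires a restart).
    ALTER SYSTEM SET wal_level = 'logical';

    -- A publication covering the tables to replicate, and a slot that
    -- Debezium's pgoutput-based connector will stream deltas from.
    CREATE PUBLICATION cdc_pub FOR ALL TABLES;
    SELECT pg_create_logical_replication_slot('cdc_slot', 'pgoutput');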
There's so many moving parts here. The ideal is likely a direct Postgres->Iceberg write flow built into Postgres. The pg_mooncake extension this company is offering also adds DuckDB-based querying, but that's likely not necessary if you plan to use Iceberg-compatible query engines anyway.
Ideally, you have one plugin that purely streams Postgres writes to Iceberg with some defined lag. That would cut out the third item above.
jgraettinger1
> There's so many moving parts here.
Yep. At the scope of a single table, append-only history is nice but you're often after a clone of your source table within Iceberg, materialized from insert/update/delete events with bounded latency.
There are also nuances like Postgres REPLICA IDENTITY and TOAST columns. Enabling REPLICA IDENTITY FULL amplifies your source DB's WAL volume, but not having it means your CDC updates will clobber your unchanged TOAST values.
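Concretely, that trade-off is a one-line DDL change per table (table name is hypothetical):

    -- Log the full old row for UPDATE/DELETE so CDC consumers also see
    -- unchanged TOASTed columns; the cost is extra WAL volume.
    ALTER TABLE orders REPLICA IDENTITY FULL;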
If you're moving multiple tables, ideally your multi-table source transactions map into corresponding Iceberg transactions.
Zooming out, there's the orchestration concern of propagating changes to table schema over time, or handling tables that come and go at the source DB, or adding new data sources, or handling sources without trivially mapped schema (legacy lakes / NoSQL / SaaS).
As an on-topic plug, my company tackles this problem. Postgres => Iceberg is a common use case.
[0] https://docs.estuary.dev/reference/Connectors/materializatio...
gjvc
can you explain this please "not having it means your CDC updates will clobber your unchanged TOAST values" ?
sgarland
They’re referring to this: https://debezium.io/blog/2019/10/08/handling-unchanged-postg...
lmz
This may be helpful for you https://clickhouse.com/docs/integrations/clickpipes/postgres...
moonikakiss
Totally agreed on 3. You're also missing the challenges of dealing with updates/deletes and of managing the many small files.
CDC from OLTP to Iceberg is extremely non-trivial.
pradn
The small writes problem that Iceberg has is totally silly. They spend so much effort requiring a tree of metadata files, but you still need an ACID DB to manage the pointer to the latest tree. At that point, why not just move all that metadata to the DB itself? It’s not sooo massive in scale.
The current Iceberg architecture requires table reads to do many small reads of the files in the metadata tree.
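Roughly, "move the metadata to the DB" means something like the following (a hypothetical, heavily simplified catalog schema - this is the DuckLake idea, not Iceberg's actual spec):

    -- The snapshot list and the "current snapshot" pointer are ordinary rows,
    -- so a commit is one ACID transaction instead of writing a new tree of
    -- metadata files and then swapping a pointer.
    CREATE TABLE snapshots (
        table_id    bigint      NOT NULL,
        snapshot_id bigint      NOT NULL,
        created_at  timestamptz NOT NULL,
        PRIMARY KEY (table_id, snapshot_id)
    );

    CREATE TABLE snapshot_files (
        table_id    bigint NOT NULL,
        snapshot_id bigint NOT NULL,
        file_path   text   NOT NULL   -- Parquet data file in object storage
    );

    CREATE TABLE current_snapshot (
        table_id    bigint PRIMARY KEY,
        snapshot_id bigint NOT NULL
    );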
The brand new DuckLake post makes all this clear.
https://duckdb.org/2025/05/27/ducklake.html
Still, Iceberg will probably do just fine because every data warehousing vendor is adding support for it. Worse is better.
brightball
This is essentially what Crunchy Data does with their Crunchy Data Warehouse product. It’s really cool.
pradn
Their product looks promising. It looks like the Postgres schema and writes have to be "Iceberg-aware": special work to get around the fact that a small write results in a new, small Parquet file. That's not the end of the world - but perhaps ideally, you wouldn't be aware of Iceberg much at all when using Postgres. That might be a dream, though.
Fully using Postgres without awareness of Iceberg would require full decoupling, and a translation layer in between (Debezium, etc.). That comes with its own problems.
So perhaps some intimacy between the Postgres and Iceberg schemas is a good thing - especially to support transparent schema evolution.
DuckLake and Crunchy Bridge both support SQL queries on the backing Iceberg tables. That's a good option. But a big part of the value of Iceberg comes from being able to read it using Spark, Flink, etc.
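For example, once the table lives in an Iceberg catalog, an engine like Spark can query it with plain SQL (catalog and table names are made up):

    -- Spark SQL against an Iceberg catalog configured as "lake"
    SELECT customer_id, sum(amount) AS total
    FROM lake.analytics.orders
    GROUP BY customer_id;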
BewareTheYiga
I'd argue the bigger value is keeping the data in one storage place and bringing the compute to it. It works especially well for Big Corp use cases where entire divisions of the corp go their own way. Throw in M&A activity and it is a good hedge against the unknown (i.e. you might be a Databricks and Azure shop and you just bought a Snowflake & AWS company). Keep the data in an open table format, and let everyone query it with their preferred engine to their heart's desire.
orefalo
I have compiled the following table to compare OLTP and OLAP
TOMDM
Terrible scrolling aside:
> pg_mooncake is a PostgreSQL extension adding columnstore tables with DuckDB execution for 1000x faster analytics. Columnstore tables are stored as Iceberg or Delta Lake tables in your Object Store. Maintained by Mooncake Labs, it is available on Neon Postgres.
Seems to summarise the reason this article exists.
Not that I really disagree with the premise or conclusion of the article itself.
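For concreteness, based on the quoted description the usage would look roughly like this (a sketch - check the pg_mooncake docs for the exact syntax):

    CREATE EXTENSION pg_mooncake;

    -- A columnstore table backed by Iceberg/Delta files in object storage,
    -- with analytics queries executed via DuckDB.
    CREATE TABLE events_analytics (
        user_id bigint,
        event   text,
        ts      timestamptz
    ) USING columnstore;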
jarbaugh
I'm skeptical of this. The cost of maintaining the "disaggregated data stack" can be immense at scale. A database that can handle replication from a row-based transactional store to, for example, a columnar one that can support aggregations could really reduce the load on engineering teams.
My work involves a "disaggregated data stack" and a ton of work goes into orchestrating all the streaming, handling drift, etc. between the transactional stores (HBase) and the various indexes like ES. For low-latency OLAP queries, the data lakes can't always meet the need either. I haven't gotten the chance to see an HTAP database in action at scale, but it sounds very promising.
0xbadcafebee
Don't worry. All architectures get recycled eventually. Everything is new again.
One of the biggest problems with having more data is it's just hard to manage. That's why cloud data warehouses are here to stay. They enable the "utility computing" of cloud compute providers, but for data. I don't think architecture is a serious consideration for most people using it, other than the idea that "we can just throw everything at it".
NewSQL didn't thrive because it wasn't sexy enough. A thing doesn't succeed because it's a "superior technology"; it survives if it's overwhelmingly more appealing than existing solutions. None of the NewSQL solutions are sufficiently sexier than old boring stable databases. This is the problem with every new database. I mean, sure, they're fun for a romp in the sheets; but are they gonna support your kids? Interest drops off once everyone realizes it's not overwhelmingly better than the old stuff. Humans are trend-seekers, but they also seek familiarity and safety.
hn_throwaway_99
> Most workloads don’t need distributed OLTP. Hardware got faster and cheaper. A single beefy machine can handle the majority of transactional workloads. Cursor and OpenAI are powered by a single-box Postgres instance. You’ll be just fine.
I thought this was such an important point. Sooooo many dev hours were spent figuring out how to do distributed writes, and for a lot of companies that work was never needed.
roncesvalles
I thought it was the weakest point. The need for a distributed DB is rarely about performance; it's about availability and durability.
hn_throwaway_99
I think you misunderstood his point (and mine). There are usually much better ways to support availability and durability than to have multiple simultaneous write servers. On the contrary, having multiple write servers is usually worse for availability and durability because of the complexity.
For example, look at how Google Cloud SQL's aptly named "High Availability" configuration supports high availability: 1 primary and 1 standby. The standby is synced to the primary, and the roles are switched if a failover occurs.
davidgomes
But you can get more availability and more durability with much easier alternatives:
- Availability: spin up more read replicas.
- Durability: spin up more read replicas and also write to S3 asynchronously.
With Postgres on Neon, you can have both of these very easily. Same with Aurora.
(Disclaimer: I work at Neon)
roncesvalles
No loss of committed transactions is acceptable to any serious business.
> I work at Neon
In my opinion, distributed DB solutions without synchronous write replication are DOA. Apparently a good number of people don't share this opinion because there's a whole cottage industry around such solutions, but I would never touch them with a 10 foot stick.
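For reference, synchronous write replication in stock Postgres looks roughly like this (a sketch; standby names are made up) - a commit doesn't return until at least one listed standby has applied it:

    ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (replica_a, replica_b)';
    ALTER SYSTEM SET synchronous_commit = 'remote_apply';
    SELECT pg_reload_conf();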
_benedict
This doesn’t seem to provide higher write availability, and if the read replicas are consistent with the write replica, this design must surely degrade write availability as it improves read availability, since the write replica must update all the read replicas.
This also doesn’t appear to describe a higher durability design at all by normal definitions (in the context of databases at least) if it’s async…?
growlNark
Something tells me neither Cursor nor OpenAI needs write workloads, so they would probably do just as fine using a flat file. I'm honestly curious what use either would have for queries that you couldn't get with a filesystem.
Certainly neither product has much obvious need for OLTP workloads. Hell, neither has any need for transactions at all. You're just paying them for raw CPU.
growlNark
Update: in my mind, this reflects analytics of queries. Just further reason to run your own models I guess....
skissane
> Back in the ’70s, one relational database did everything. Transactions (OLTP) during the day and reports after hours (OLAP). Databases like Oracle V2 and IBM DB2 ran OLTP and OLAP on the same system; largely because data sets still fit on a few disks and compute was costly.
The timeline is a bit off - Oracle V2 was released in the second half of 1979, so although it technically came out at the very end of the 1970s, it isn’t really representative of 1970s databases. Oracle V1 was never released commercially; it was used as an internal name while under development starting circa 1977, inside SDL (which renamed itself RSI in 1979, and then Oracle in 1983). Plus, Larry Ellison wanted the first release to be version 2 because some people are hesitant to buy version 1 software. Oracle was named after a database project Ellison worked on for the CIA while employed at Ampex, although I’m not sure anyone can really know exactly how much the abandoned CIA database system had in common with Oracle V1/V2 - it definitely took some ideas from the CIA project, but I’m not sure if it took any of the actual code.
The original DB2 for MVS (later OS/390 and now z/OS) was released in 1983. The first IBM RDBMS to ship as a generally available commercial product was SQL/DS in 1981 (for VM/CMS), which this century was renamed DB2 for VM/VSE. I believe DB2/400 (now renamed DB2 for IBM i) came out with the AS/400 and OS/400 in 1988, although possibly there was already some SQL support in S/38 in the preceding years. The DB2 most people nowadays would encounter, the Linux/AIX/Windows edition (DB2 LUW), is a descendant of OS/2 EE Database Manager, which I think came out in 1987. Anyway, my point: the various editions of DB2 all saw their initial releases in the 1980s, not the 1970s.
While relational technology was invented as a research concept in the 1970s (including the SQL query language, and several now largely forgotten competitors), in that decade its use was largely limited to research, along with a handful of commercial pilots. General commercial adoption of RDBMS technology didn’t happen until the 1980s.
The most common database technologies in the 1970s were flat file databases (such as ISAM and VSAM databases on IBM mainframes), hierarchical databases (such as IBM IMS), the CODASYL network model (e.g. IDS, IDMS), MUMPS (a key-value store with hierarchical keys), early versions of PICK, and inverted list databases (ADABAS, Model 204, Datacom) - I think many (or even all) of these were more popular in the 1970s than any RDBMS. The first release of dBase came out in 1978 (albeit then called Vulcan; it wasn’t named dBase until 1980) - but like Oracle, it falls into the category of “technically released in the late 1970s but didn’t become popular until the 1980s”.
null
One thing no one seems to notice is the rise of “Operational Warehouses” such as RisingWave or Materialize. A big ‘problem’ in OLAP, as the article mentions, is that people expect aggregations or analytic views on live data. These solutions solve it. In principle, this shows that just having incrementally maintained materialised views really goes a long way towards achieving the HTAP dream on a single DB.
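The programming model is just a materialized view that the engine keeps up to date as writes arrive, e.g. in Materialize/RisingWave-style SQL (table and column names are made up):

    -- Incrementally maintained: the engine updates the aggregate on every
    -- upstream change instead of recomputing it on a schedule.
    CREATE MATERIALIZED VIEW revenue_by_region AS
    SELECT region, sum(amount) AS revenue
    FROM orders
    GROUP BY region;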