In-Memory C++ Leap in Blockchain Analysis

47 comments

·June 18, 2025

Hey HN

We’re the core engineering team at Caudena (which is used globally by investigative and intelligence agencies, including: Europol, Interpol, BKA, DHS, IRS-CI, FBI, NPA and others), and we just released the technical details behind Prism - our real-time, in-memory C++ database for blockchain analysis.

To tackle the massive scale and complexity of blockchain data, we had to get creative with low-level engineering:

- We utilize barebone servers with 2TB RAM and 48 Cores.

- Implemented lock-free concurrent data structures

- Developed a custom memory management system

- Leveraged CPU-level vectorization

- Built a custom in-memory columnar/graph database from scratch

We’d love to AMA about:

- the engineering choices we made

- crazy optimizations that paid off

- pitfalls we hit

Ask us anything about scaling, memory trade-offs, building real-time analytics on immutable data, or the crypto-forensics space.

Looking forward to a great convo!

canyp

You really had to call it Prism (PRISM), didn't you?

It's great to see C++ resulting in orders of magnitude cost reduction anyway. Do you have more details on the various C++ tricks done for optimization?

caudena

Yeah, we figured people would compare it to PRISM :)

There are many possible optimizations, but they’re all highly specific to the particular problems you’re trying to solve.

generalenvelope

Curious why you chose C++? Were there aspects of other languages/ecosystems like Rust that were lacking? Would choosing Rust be advantageous for blockchains that natively support it (like Solana)?

To be clear: I don't mean to imply you should have done it any other way. I'm interested mainly in gaps in existing ecosystems and whether popular suggestions to "deprecate C++ for memory safe languages" (like one made by Azure CTO years ago) are realistic.

npalli

Rust is the future of systems programming and will always be for the foreseeable future. The memory issue will mostly be addressed as needed; see the post from John Carmack yesterday [1]. The C++ ecosystem advantage (a broad sense of how problems, whether in DS, storage, OS, networking, etc., have been solved) will be very hard for newer programming languages to overcome. I think it is ironic how modern C++ folks just keep chugging along releasing products while Rust folks are generally haranguing everyone about "memory safety" and leaving half-finished projects (turns out writing Rust code is more fun than reading someone else's, who would have guessed).

[1] https://x.com/ID_AA_Carmack/status/1935353905149341968

wgjordan

> The memory issue will mostly be addressed as needed

I have no allegiance to either lang ecosystem, but I think it's an overly optimistic take to consider memory safety a solved problem from a tweet about fil-c, especially considering "the performance cost is not negligible" (about 2x according to a quick search?)

npalli

A 2x performance drop for memory-safety-critical sections vs. a Rust rewrite taking years or decades: not even a contest. Now, if that drop were 10x, maybe, but at 2x it is a no-brainer to continue with C++. I'm not certain Fil-C totally works in all cases, but it is an example of how the ecosystem will evolve to solve this issue and not migrate to Rust.

caudena

Because we are in 'unsafe' territory. And Rust doesn't even have a defined memory model. Rust is a little bit immature. We have some other services written in Rust though.

kanbankaren

What is wrong with C++?

With POSIX semaphores, mutexes, and shared pointers, it is very rare to hit upon a memory issue in modern C++.

Source: Writing code in C/C++ for 30 years.

CharlesW

> With POSIX semaphores, mutexes, and shared pointers, it is very rare to hit upon a memory issue in modern C++.

There is a mountain of evidence (two examples follow) that this is not true. Roughly two-thirds of serious security bugs in large C++ products are still memory-safety violations.

(1) https://msrc.microsoft.com/blog/2019/07/we-need-a-safer-syst... (2) https://www.chromium.org/Home/chromium-security/memory-safet...

kanbankaren

Show me a memory issue that was caused by proper usage of POSIX concurrency primitives.

nesarkvechnep

The worst code is usually written by someone who’s been doing it for 30 years and can’t find a problem with their technology of choice.

Especially with shared pointers you can encounter pretty terrible memory issues.

kanbankaren

Dude, provide examples of "terrible" memory issues. Otherwise, you are just repeating the folklore which is outdated.

wat10000

What a terrifying statement.

Edit: to be less glib, this is like saying “our shred-o-matic is perfectly safe due to its robust and thoroughly tested off switch.” An off switch is essential but not nearly enough. It only provides acceptable safety if the operator is perfect, and people are not. You need guards and safety interlocks that ensure, for example, that the machine can’t be turned on while Bob is inside lubricating the bearings.

Mutexes and smart pointers are important constructs but they don’t provide safety. Safety isn’t the presence of safe constructs, but the absence of unsafe ones. Smart pointers don’t save you when you manage to escape a reference beyond the lifetime of the object because C++ encourages passing parameters by reference all over the place. Mutexes and semaphores don’t save you from failing to realize that some shared state can be mutated on two threads simultaneously. And none of this saves you from indexing off the end of a vector.
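Two of the pitfalls named above can be sketched in a few lines. This is a minimal illustration, not code from the thread: `checked_get` and `observer_outlives_owner` are hypothetical names, and the undefined-behavior variants are deliberately only described in comments rather than executed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Bounds-checked read: vector::at() throws std::out_of_range, whereas
// operator[] with the same out-of-range index is undefined behavior.
// Returns false instead of reading past the end.
bool checked_get(const std::vector<int>& v, std::size_t i, int& out) {
    try {
        out = v.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// A shared_ptr keeps its object alive only while some owner exists. A raw
// reference taken from it carries no such guarantee; a weak_ptr at least
// lets the holder detect that the object is gone.
bool observer_outlives_owner() {
    std::weak_ptr<int> observer;
    {
        std::shared_ptr<int> owner = std::make_shared<int>(42);
        observer = owner;
        assert(!observer.expired());  // the owner is still in scope here
    }                                 // the last owner is destroyed here
    return observer.expired();        // the object no longer exists
}
```

Had the inner scope handed out an `int&` instead of a `weak_ptr`, nothing in the type system would have flagged its use after the closing brace.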

You can probably pick a subset of C++ that lets you write reasonably safe code. But the presence of semaphores, mutexes, and shared pointers isn’t what does it.

Source: also writing C and C++ for 30 years.

lisper

> Safety isn’t the presence of safe constructs, but the absence of unsafe ones.

Exactly. Here is a data point: https://spinroot.com/spin/Doc/rax.pdf

TL;DR: This was software that ran on a spacecraft. Specifically designed to be safe, formally analyzed, and tested out the wazoo, but nonetheless failed in flight because someone did an end-run around the safe constructs to get something to work, which ended up producing a race condition.

secondcoming

What's this Rust thing?

joshstrange

Do you see crypto as anything more than scams/crime/speculation?

Most people involved in crypto pretend it's the future and their business models depend on pumping up crypto. That might be the same for you all but I figure of anyone in the space, a group dedicated to tracking down where coins are moving for government agencies (I assume for scams/crime reasons) might not have the wool so pulled over their eyes.

caudena

First of all, at Caudena, we are not involved in crypto projects or investments ourselves. Our expertise lies in analyzing blockchains and providing deep technical insights into how various blockchains operate. We focus on tracking and understanding the flow of digital assets, often in support of government agencies investigating scams, fraud, and other illicit activities.

That said, we absolutely believe that blockchain and cryptocurrency will shape the future of the financial system. When you look beyond the noise of scam tokens, speculative NFTs, and high-profile scandals, there is significant and meaningful financial innovation happening. This extends beyond DeFi to include the tokenization of RWA, where major institutions like BlackRock and JPM Chase are actively exploring and implementing blockchain-based solutions. Numerous projects are driving real progress, and there’s a slow but steady movement toward a more decentralized and transparent financial ecosystem.

jnkl

Can you be a bit more specific about the practical aspects of blockchain technology regarding RWA?

germandiago

Who says that crypto is exclusively scams? There is that of course, but not only that. I do not find Bitcoin to be a scam.

IshKebab

Apart from Bitcoin is there anything successful that isn't a scam? I never heard of any.

newswasboring

There are attempts at non-scam projects, but none of them get any traction and they usually end up closing. What, in your opinion, is a success story in this space?

seviu

Caudena's post above yours mentioned quite a few successful use cases, all built on top of Ethereum or copycats (Ethereum is by itself a successful use case).

Without thinking too hard, Aave is shaping up to be a giant in its own right as a lending protocol.

Circle recently had a very successful IPO.

Farcaster and Lens are attempting to compete as social network platforms (surprisingly, they lack much of the toxicity found on the best-known ones).

And lastly don’t forget Polymarket, which is pretty well known beyond the crypto space.

The list goes on and on if you care to dig a bit deeper

CharlesW

> "Built a custom in-memory columnar/graph database from scratch"

This seems like an odd place to spend your resources. What do Prism's benchmarks look like vs Memgraph, KX kdb+, Apache Ignite, TigerGraph, etc.?

plq

When implementing the lock-free stuff, was portability (across processors) a goal? If yes, did you have to deal with anything specific? Do you notice any difference in behavior of correct implementations when run on different processors? How do you test for correctness of lock-free stuff?

EDIT: Oh and did you implement from scratch? Why not use e.g. the RCU implementation from folly?

caudena

We never targeted weakly-ordered architectures like ARM, only x86. We never used a wide variety of different processors. We are not developing the Linux kernel and are not into control dependencies; we just rely on the fences and the memory model. There may be some CPU-dependent performance differences, such as discrepancies due to NUMA or false sharing being noticeable on one processor but not on another. RCU and hazard pointers are nothing new. For the disjoint sets we don't need them. For the forest patches and the tries we do. We use TBB and OpenMP whenever possible and try to keep things simple.
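For readers unfamiliar with the two ideas mentioned here, relying on the C++ memory model and avoiding false sharing, a minimal sketch might look like the following. It is an illustration only, not Prism code: `PaddedCounter`, `publish_and_consume`, `count_in_parallel`, and the 64-byte cache-line size are all assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>

// Assumed cache-line size; 64 bytes is typical for x86 but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// Each counter occupies its own cache line, so two threads incrementing
// different counters do not keep invalidating each other's line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};

// Publish a plain value with a release store and read it after an acquire
// load. The release/acquire pair orders the payload write before the read.
int publish_and_consume(int payload_value) {
    int payload = 0;
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread producer([&] {
        payload = payload_value;                       // plain write
        ready.store(true, std::memory_order_release);  // publish
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();                 // wait for the flag
        }
        seen = payload;                                // ordered after the store
    });
    producer.join();
    consumer.join();
    return seen;
}

// Two threads bump separate padded counters; returns the combined total.
long count_in_parallel(long per_thread) {
    PaddedCounter counters[2];
    auto work = [per_thread](PaddedCounter& c) {
        for (long i = 0; i < per_thread; ++i) {
            c.value.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread a(work, std::ref(counters[0]));
    std::thread b(work, std::ref(counters[1]));
    a.join();
    b.join();
    return counters[0].value.load() + counters[1].value.load();
}
```

Removing `alignas` does not change the result of `count_in_parallel`, only (potentially) its speed, which is why false sharing tends to show up as a CPU-dependent performance difference rather than a correctness bug.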

Snoozus

We built something very similar back in 2016, on the JVM with unsafe memory and garbage-free data structures to avoid GC pauses. The dynamic clustering is not too hard. Are you able to dynamically undo a cluster when new information shows up?

Are you running separate instances per customer to separate the information they have access to?

caudena

Assuming by undoing you mean splitting the cluster:

A linked list can be split in two in O(1). When it comes to updating the roots for all the removed nodes, there is no easy way out, but luckily:

- This process can be parallelized.

- It could be done just once for multiple clustering changes.

- This is a multi-level disjoint set, so not all the levels or sub-clusters are usually affected. Upper-level clustering, which is based on a lower confidence level, can be rebuilt more easily.
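The split described above can be sketched roughly as follows. This is a single-level, single-threaded illustration under assumed names (`Clusters`, `split_after`, `relabel`); the layout with index-based `next` and `root` arrays is hypothetical and not taken from Prism.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kNone = -1;  // marks the end of a member chain

// Hypothetical layout: every address is an index. `next` chains the members
// of a cluster and `root` caches the cluster label of each member.
struct Clusters {
    std::vector<int> next;
    std::vector<int> root;

    explicit Clusters(std::size_t n) : next(n, kNone), root(n) {
        for (std::size_t i = 0; i < n; ++i) root[i] = static_cast<int>(i);
    }

    // Attach the single-member cluster `node` after the chain tail `tail`.
    void append(int tail, int node) {
        assert(next[tail] == kNone && next[node] == kNone);
        next[tail] = node;
        root[node] = root[tail];
    }

    // O(1): cut the chain after `last_kept` and return the detached head.
    int split_after(int last_kept) {
        int detached = next[last_kept];
        next[last_kept] = kNone;
        return detached;
    }

    // O(k): relabel the k detached members with their new root. This is the
    // part with "no easy way out"; it returns how many nodes were touched.
    int relabel(int detached_head) {
        int count = 0;
        for (int n = detached_head; n != kNone; n = next[n]) {
            root[n] = detached_head;
            ++count;
        }
        return count;
    }
};
```

The sketch keeps the cut and the relabeling as separate calls, so several cuts could be made first and the relabeling done once afterwards, or handed to worker threads.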

If by undoing you mean reverting the changes, we don’t use a persistent data structure. When we need historical clustering, we use a patched forest with concurrent hash maps to track the changes, and then apply or throw them away.
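One way to picture a patched forest is a base parent array plus an overlay map of tentative changes. The sketch below is an assumption-laden illustration: `PatchedForest` is a hypothetical name, and a plain `std::unordered_map` stands in for the concurrent hash maps mentioned above, so it is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// `parent_` is the shared base forest; `patch_` overlays tentative parent
// changes that can later be applied to the base or thrown away.
class PatchedForest {
public:
    explicit PatchedForest(std::size_t n) : parent_(n) {
        for (std::size_t i = 0; i < n; ++i) parent_[i] = static_cast<int>(i);
    }

    // Follow parent links, preferring patched entries over the base forest.
    // No path compression, so lookups never modify the base.
    int find(int x) const {
        for (;;) {
            int p = parent_of(x);
            if (p == x) return x;
            x = p;
        }
    }

    // Record a merge in the patch only; the base forest is untouched.
    void unite(int a, int b) {
        int ra = find(a);
        int rb = find(b);
        if (ra != rb) patch_[rb] = ra;
    }

    // Fold the patch into the base forest.
    void apply() {
        for (const auto& kv : patch_) parent_[kv.first] = kv.second;
        patch_.clear();
    }

    // Throw the tentative changes away.
    void discard() { patch_.clear(); }

private:
    int parent_of(int x) const {
        auto it = patch_.find(x);
        return it != patch_.end() ? it->second : parent_[x];
    }

    std::vector<int> parent_;
    std::unordered_map<int, int> patch_;
};
```

Because `find` consults the patch before the base, two different patches over the same base would yield two different clusterings without copying the forest.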

We use a single instance for all clients, but when one CFD server processes new block data, it becomes fully blocked for read access. To solve this, we built a smart load balancer that redirects user requests to a secondary CFD server. This ensures there are always at least two servers running, and more if we need additional throughput.
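The routing rule can be sketched as a round-robin that skips any server currently ingesting a block. This is only a guess at the simplest possible shape: `Server`, `ReadBalancer`, and the `ingesting` flag are hypothetical, and the real balancer is presumably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical server state: `ingesting` is true while the server processes
// new block data and is therefore blocked for reads.
struct Server {
    bool ingesting = false;
};

// Round-robin over the servers, skipping any that is currently ingesting.
class ReadBalancer {
public:
    explicit ReadBalancer(std::size_t n) : servers_(n) {}

    void set_ingesting(std::size_t i, bool v) { servers_.at(i).ingesting = v; }

    // Returns the index of the chosen server, or -1 if all are blocked.
    int pick() {
        for (std::size_t tried = 0; tried < servers_.size(); ++tried) {
            std::size_t i = cursor_;
            cursor_ = (cursor_ + 1) % servers_.size();
            if (!servers_[i].ingesting) return static_cast<int>(i);
        }
        return -1;
    }

private:
    std::vector<Server> servers_;
    std::size_t cursor_ = 0;
};
```

With two servers and ingestion staggered so that at most one is blocked at a time, `pick` always has a readable target, which matches the "at least two servers" requirement.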

BiraIgnacio

I didn't even know there were companies doing work in the "blockchain services" space. Kinda cool, tech begets tech, begets tech.

Love the C++ work, btw

Snoozus

If the FBI tells you wallet A and wallet B belong to the same actor, how do you use that information so that they can see it in their view, without leaking it to Europol?

caudena

Are you from CA or CT? :)

FBI and Europol will work with the same forest (unless they are using on-premise setup), but with different "patches".

layer8

> barebone servers

You mean bare-metal servers?

caudena

Ohh, you're absolutely right!

dboreham

barebone server is a thing fwiw: A product that comprises a motherboard installed in a case with PSU. Customer adds CPU, memory and storage devices to make a complete usable server. We typically buy servers in this way because figuring out what motherboard fits in which case is a pita, conversely buying complete servers is more expensive and potentially runs into inventory issues at the vendor. So possibly they are running bare metal servers that were also barebone.

folk111

is it true that XMR / monero is untraceable?

caudena

No.

wslh

Thank you for the AMA. A few initial questions:

- Would it be possible to open source your DB in the future? I think there are challenges in blockchain analysis (e.g. internal transactions) that go beyond the specific DB.

- Having used Chainalysis and others, your product seems superior based on your presentation. Which blockchains do you support?

- Is there a "HN Code" to test Prism?

caudena

Thanks for the questions! We don't currently have plans to open-source it. For anything else, feel free to reach out at pa@caudena.com - happy to discuss further there. We'd like to keep this thread focused on the technical side rather than product discussions :)
