Colossus for Rapid Storage
128 comments
·April 10, 2025
dang
(This was posted last night with https://cloud.google.com/blog/products/compute/whats-new-wit... above. We've changed the URL to the product-specific article.)
akshayshah
Very cool! This makes Google the only major cloud that has low-latency single-zone object storage, standard regional object storage, and transparently-replicated dual-region object storage - all with the same API.
For infra systems, this is great: code against the GCS API, and let the user choose the cost/latency/durability tradeoffs that make sense for their use case.
korkybuchek
> This makes Google the only major cloud that has low-latency single-zone object storage, standard regional object storage,
Absurd claim. S3 Express launched last year.
akshayshah
Sure, but AFAIK S3’s multi-region capabilities are quite far behind GCS’s.
S3 offers some multi-region replication facilities, but as far as I’ve seen they all come at the cost of inconsistent reads - which greatly complicates application code. GCS dual-region buckets offer strongly consistent metadata reads across multiple regions, transparently fetch data from the source region where necessary, and offer clear SLAs for replication. I don’t think the S3 offerings are comparable. But maybe I’m wrong - I’d love more competition here!
https://cloud.google.com/blog/products/storage-data-transfer...
korkybuchek
> Sure, but AFAIK S3’s multi-region capabilities are quite far behind GCS’s.
Entirely different claim.
grantwu
Isn't S3 Express not the same API? You have to use a "directory bucket" which isn't an object store anymore, as it has actual directories.
To be honest I'm not actually sure how different the API is. I've never used it. I just frequently trip over the existence of parallel APIs for directory buckets (when I'm doing something niche, mostly; I think GetObject/PutObject are the same.)
dastbe
?
s3: https://aws.amazon.com/pm/serv-s3
s3 express: https://aws.amazon.com/s3/storage-classes/express-one-zone/
cross-region replication: https://docs.aws.amazon.com/AmazonS3/latest/userguide/replic...
akshayshah
The cross-region replication I’ve seen for S3 (including the link you’ve provided) is fundamentally different from a dual-region GCS bucket. AWS is providing a way to automatically copy objects between distinct buckets, while GCS is providing a single bucket that spans multiple regions.
It’s much, much easier to code against a dual-region GCS bucket because the bucket namespace and object metadata are strongly consistent across regions.
alobrah
FYI this was unveiled at the 2025 Google Next conference, and they're apparently unveiling a gRPC client for Rapid Storage, which appears to be a very thin wrapper over Colossus itself, as this is just zonal storage.
jauntywundrkind
Struggling to find a definition, but seemingly zonal just means there's a massive instance per cluster.
Did find some interesting recent (March 28th, 2025) reads though!
Colossus under the hood: How we deliver SSD performance at HDD prices https://cloud.google.com/blog/products/storage-data-transfer...
I kind of thought you meant ZNS / https://zonedstorage.io/ at first, or its more recent, better, awesomer counterpart Host Directed Placement (HDP). I wish someone would please please advertise support for HDP, sounds like such a free win, tackling so many write amplification issues for so little extra complexity: just say which stream you want to write to, and writes to that stream will go onto the same superblock. Duh, simple, great.
Dylan16807
Delivering "HDD prices" is a bold claim there.
They charge $20/TB/month for basic cloud storage. You can build storage servers for $20/TB flat. If you add 10% for local parity, 15% free space, 5% in spare drives, and $2000/rack/month overhead, then triple everything for redundancy purposes, then over a 3 year period the price of using your own hard drives is $115/TB and google's price is $720. Over 5 years it's $145 versus $1200. And that's before they charge you massive bandwidth fees.
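A rough sketch of the arithmetic above, in Python. The cloud side ($20/TB/month) follows directly from the comment. The self-built side depends on how much raw capacity fits in one rack, which the comment does not state, so `raw_tb_per_rack` is a hypothetical value, and treating the parity, free-space, and spare percentages as additive is also an assumption. The self-built totals it prints will therefore not match the $115 and $145 figures exactly.

```python
def cloud_cost_per_tb(months: int, price_per_tb_month: float = 20.0) -> float:
    """Total cloud storage cost for one usable TB over `months`."""
    return price_per_tb_month * months


def self_built_cost_per_tb(
    months: int,
    hardware_per_raw_tb: float = 20.0,   # from the comment: $20/TB flat
    parity: float = 0.10,                # local parity overhead
    free_space: float = 0.15,            # capacity kept free
    spares: float = 0.05,                # spare drives
    copies: int = 3,                     # "triple everything"
    rack_cost_per_month: float = 2000.0, # from the comment
    raw_tb_per_rack: float = 5000.0,     # ASSUMPTION: not given in the comment
) -> float:
    """Total self-built cost for one usable TB over `months`."""
    # Raw TB that must be provisioned per usable TB (additive overheads assumed).
    raw_tb = (1.0 + parity + free_space + spares) * copies
    hardware = hardware_per_raw_tb * raw_tb
    rack_overhead = rack_cost_per_month * months * raw_tb / raw_tb_per_rack
    return hardware + rack_overhead


for years in (3, 5):
    months = 12 * years
    print(years, cloud_cost_per_tb(months), round(self_built_cost_per_tb(months), 2))
```

Changing `raw_tb_per_rack` shows how sensitive the self-built number is to rack density, while the cloud total is simply linear in time.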
Lex-2008
I like your comparison with self-built storage, but comparing $20/TB/month with other CLOUD offerings, we see:
* hetzner storage box starts from $4/month for 1TB, and then goes down to $2.4/TB/month if you rent a 10TB box.
* mega starts from €10/month for 2TB, and goes down to €2/TB/month if you get a 16TB plan
* backblaze costs (starts from?) $6/TB/month
I was looking for cheap cloud storage recently, so I have a list of these numbers :)
Moreover, these are not even the cheapest ones. The cheapest one I found had prices starting from $6.5 for 5TB, going down to $0.64/TB/month for plans starting with 25TB (called uloz, but I haven't tested them yet).
Also, looking at lowendbox you can find a VPS in Canada with 2TB storage for $5/month and run whatever you want there.
How all that compares to $20/TB/month?!
Please feel free to correct me if I'm comparing apples to oranges, though. But I can't believe all of these offers are scams or so-called "promotional" offers which cost companies more than you pay for them.
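To put the numbers in this comment side by side, here is a small Python sketch that normalizes the USD offers to $/TB/month and compares each against the $20/TB/month figure. The prices are the ones quoted in the comment. The dictionary labels are illustrative, and the Mega plans are left out because they are quoted in euros and no exchange rate is given.

```python
BASELINE_USD_PER_TB_MONTH = 20.0  # the cloud price under discussion

# (monthly price in USD, capacity in TB), as quoted in the comment above
offers = {
    "Hetzner Storage Box, 1 TB": (4.0, 1.0),
    "Hetzner Storage Box, 10 TB": (24.0, 10.0),  # $2.4/TB/month
    "Backblaze": (6.0, 1.0),
    "uloz, 5 TB": (6.5, 5.0),
    "uloz, 25 TB+": (16.0, 25.0),                # $0.64/TB/month
    "LowEndBox VPS, 2 TB": (5.0, 2.0),
}

per_tb = {name: price / tb for name, (price, tb) in offers.items()}
ratios = {name: BASELINE_USD_PER_TB_MONTH / p for name, p in per_tb.items()}

for name in offers:
    print(f"{name}: ${per_tb[name]:.2f}/TB/month, baseline is {ratios[name]:.1f}x higher")
```

This only compares list prices per TB. It says nothing about durability, bandwidth, or API differences between the services.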
saagarjha
You forgot paying yourself to set that up.
Twirrim
Leverage erasure coding for durability and avoid both the tripling and local parity. You'll get better durability than 3x while taking up significantly less than 2x the space. Backblaze open sourced their library and talk about it here: https://www.backblaze.com/blog/Reed-Solomon. They use a 17:20 ratio that'll get them 3 drive failure resistance for just 1.17x stretch (i.e. a 100 MB file gets that resilient while taking up 117 MB of space).
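The space overhead claimed above can be checked with a few lines of Python. This sketch reads the "17:20 ratio" as 17 data shards plus 3 parity shards, for 20 total, which is an interpretation of the comment. The function and variable names are illustrative and are not taken from Backblaze's library.

```python
def storage_stretch(data_shards: int, parity_shards: int) -> float:
    """Bytes stored per byte of user data for a (data + parity) erasure code."""
    if data_shards <= 0 or parity_shards < 0:
        raise ValueError("need at least one data shard and non-negative parity")
    return (data_shards + parity_shards) / data_shards


# 17 data + 3 parity: any 3 of the 20 shards can be lost.
ec_stretch = storage_stretch(17, 3)

# 3x replication viewed the same way: 1 "data" copy plus 2 extra copies,
# which survives the loss of 2 copies.
replication_stretch = storage_stretch(1, 2)

file_mb = 100
print(f"erasure coding: {ec_stretch:.3f}x, {file_mb * ec_stretch:.1f} MB stored")
print(f"3x replication: {replication_stretch:.1f}x, {file_mb * replication_stretch:.0f} MB stored")
```

The exact ratio is 20/17, or about 1.176, so the 1.17x and 117 MB figures in the comment are slightly rounded down.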
moandcompany
"Zonal" relates to the concept of "availability zones" which are the next-smallest unit below a (physical) "region."
Most instances of a cloud ___ created in a region are allocated and exist at the zonal level (i.e. a specific zone of a region).
A physical "region" usually consists of three or more availability zones, and each zone is physically separated from other zones, limiting the potential for foreseeable disaster events from affecting multiple zones simultaneously. Zones are close enough networking-wise to have high throughput and low latency interconnection, but not as fast as same-rack, same-cluster communications.
Systems requiring high availability (or replication) generally attain this by placing instances (or replicas) in multiple availability zones.
Systems requiring high availability generally start with multi-zone replication, and systems with even higher availability requirements may use multi-region replication, which comes at greater cost.
bushbaba
It’s GCP’s answer to AWS S3 Express One Zone: https://aws.amazon.com/s3/storage-classes/express-one-zone/
derefr
In Google Cloud parlance, "regional" usually means "transparently master-master replicated across the availability zones within a region", while "zonal" means "not replicated, it just is where it is."
noahl
Slight nit: "zonal" doesn't necessarily mean "not replicated", it means that the replicas could all be within the same zone. That means they can share more points of failure. (I don't know if there's an official definition of zonal.)
NB: I am on the rapid storage team.
re-thc
> Struggling to find a definition, but seemingly zonal just means there's a massive instance per cluster.
There are a number of zones in a region. Region usually means city. Zone can mean data center. Rarely, it just means some sort of isolation (separate power / network).
jeffbee
What on this page gives you that impression? Do I have to watch the 2-hour video to learn this?
korkybuchek
Of course not. Gemini can summarize it for you.
CobrastanJorji
I mean, sure, it can easily provide quick text summaries of this sort of thing, but I only consume ML summaries in the forms of podcast discussions between two simulated pundits, as God intended.
carbocation
This could actually speed up some of my scientific computing (in some cases, data localization/delocalization is an important part of overall instance run-time). I will be interested to try it.
steveBK123
Had to go back to the classic microservices video as I was pretty sure they used Colossus but it was actually Galactus & Omega Star.
thethimble
This is what OP is referring to in case you haven’t been enlightened https://youtu.be/y8OnoxKotPQ?si=JAK5iPMcG1yoAhiT
bushbaba
Glad to see the zonal object store take off. Such massive bandwidth will redefine data analytics, where 99% of all queries are able to run on a single node faster than what distributed compute can offer.
nashashmi
This link makes so much more sense than the previous link did.
SSDs with high random I/O speeds are a significant contributor to the advantage. I think the 20m writes per second are likely distributed over a network of drives to make that kind of speed possible.
__turbobrew__
I want chubby as a service so I can throw etcd and zookeeper in the trash.
acstorage
Similar to S3 express one zone
nodesocket
Is S3 Express One Zone performance greatly improved compared to standard S3, like GCP Rapid Storage? My understanding is S3 Express One Zone is just more cost effective.
> 20x faster random-read data loading than a Cloud Storage regional bucket.
nodesocket
Update: Just read this article[1] which clarifies S3 Express One Zone. Yes, performance is greatly improved, but actually storage costs are 8x more than a standard S3 bucket. The naming S3 Express One Zone is terrible and a bit misleading on pricing changes.
[1] https://www.warpstream.com/blog/s3-express-is-all-you-need
cowsandmilk
I understand your belief that One Zone implies less expensive, but I’m staunchly in favor of them having it in the name so people know that their data is in a single AZ. The storage class succinctly summarizes faster with lower availability.
dangoodmanUT
Yes, it’s horribly more expensive… I think you are thinking of one zone infrequent access
coredog64
AWS just reduced prices on One Zone Express today.
jashmatthews
AWS claims 10x lower latency but I haven't personally checked.
alobrah
For some reason, text highlight didn't work, so here's the text-highlighted link: https://cloud.google.com/blog/products/compute/whats-new-wit...
dang
That link doesn't work for me, so here's the relevant bit:
Rapid Storage: A new Cloud Storage zonal bucket that enables you to colocate your primary storage with your TPUs or GPUs for optimal utilization. It provides up to 20x faster random-read data loading than a Cloud Storage regional bucket.
(Normally we wouldn't allow a post like this which cherry-picks one bit of a larger article, but judging by the community response it's clear that you've put your finger on something important, so thanks! We're always game to suspend the rules when doing so is interesting.)
noahl
There's now another blog post about Rapid storage specifically: https://cloud.google.com/blog/products/storage-data-transfer... . (That wasn't up yet when the original post was made.)
dang
Ah excellent—that's what we were waiting for. I've changed the URL to that from https://cloud.google.com/blog/products/compute/whats-new-wit... above. Thanks!
alobrah
Apologies! First time making a post on Hacker News, and I thought this was really exciting news. FWIW, I talked to the presenter after this was revealed during the NEXT conference today, and he seemed to imply that zonal storage is quite close to what Google has with Colossus.
dang
Oh no, don't apologize - this was a case where you did exactly the right thing and I'm glad you posted!
(I was just adding some explanation for more seasoned users who might wonder why we were treating this a bit differently.)
Also, welcome to posting on HN and we hope you'll continue!
xk3
The gods strip off interesting bits of URLs when you submit them
dang
if you saw that code you wouldn't deify it
SweetLlamaMyth
It took me 4-5 attempts to not read:
> If you saw that code, you wouldn't _defy_ it
lazide
Moloch was also a god!
EE84M3i
Is this related at all to the private, invite-only Anywhere Caches? (or maybe they're GA now?)
leg
Anywhere Cache and Rapid Storage share some infrastructure inside of GCS and both are good solutions for improving GCS performance, but Anywhere Cache is an SSD cache in front of the normal buckets while Rapid Storage is a new type of bucket.
(I work on Google storage)
minzi
Can you expand a bit on when it would make sense to use one versus the other?
leg
Anywhere Cache shines in front of a multi-regional bucket. Once the data is cached, there's no egress charges and there's much better latency. This is great for someone who looks for spot compute capacity to run computations anywhere in the multi-region. It will also improve performance in front of regional buckets but as a cache, you'll see the difference between hits and misses.
Rapid Storage will have all of your data local and fast, including writes. It also adds the ability to have fast durable appends, which is something you can't get from the standard buckets.
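The "difference between hits and misses" point can be made concrete with a standard expected-latency calculation. This is a minimal Python sketch. The latency figures are hypothetical placeholders chosen only for illustration and are not measurements of Anywhere Cache, Rapid Storage, or any GCS bucket.

```python
def expected_latency_ms(hit_rate: float, hit_ms: float, miss_ms: float) -> float:
    """Average read latency for a cache with the given hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


# HYPOTHETICAL numbers, for illustration only.
CACHE_HIT_MS = 1.0    # read served from the SSD cache
CACHE_MISS_MS = 50.0  # read that falls through to the backing bucket

for hit_rate in (0.5, 0.9, 0.99, 1.0):
    avg = expected_latency_ms(hit_rate, CACHE_HIT_MS, CACHE_MISS_MS)
    print(f"hit rate {hit_rate:.0%}: average {avg:.2f} ms")
```

With a cache, the average depends on the hit rate and individual reads still vary between the hit and miss cases. A bucket where all data is local corresponds to the hit rate of 1.0 row for every read.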
null
Was at the Google Next 2025 conference, and they've unveiled a zonal bucket version of GCS and what seems to be a gRPC interface over Google Colossus for Rapid Storage.