
You might not need Redis

128 comments · March 8, 2025

g9yuayon

When I was at Uber back in 2015, my org was trying to replace zip-code-based geo partitioning with a hexagon-based scheme. Instead of partitioning a city into, on average, tens of zip codes, we might partition the city into potentially hundreds of thousands of hexagons and dynamically create areas. The first launch was in Phoenix, and the team responsible for the launch stayed up all night for days because they could barely scale our demand-pricing systems. And then the global launch of the feature was delayed first by days, then by weeks, and then by months.

It turned out Uber engineers just loved Redis. Have a need to distribute your work? Throw it at Redis. I remember debating with some infra engineers about why we couldn't just throw more redis/memcached nodes at our telemetry system to scale it, but I digress. So, the price service we built was based on Redis. The service fanned out millions of requests per second to redis clusters to get information about individual hexagons of a given city, and then computed dynamic areas. We would need dozens of servers just to compute for a single city. I forget the exact number, but let's say it was 40 servers per average-sized city. Now multiply that by the 200+ cities we had. It was just prohibitively expensive, let alone that there could be other scalability bottlenecks in managing such scale.

The solution was actually pretty simple. I took a look at the algorithms we used, and it was really just that we needed to compute multiple overlapping shapes. So, I wrote an algorithm that used work-stealing to compute the shapes in parallel per city on a single machine, and used Elasticsearch to retrieve hexagons by a number of attributes -- it was actually a perfect use case for a search engine, because the retrieval requires boolean queries over multiple attributes. The rationale was pretty simple too: we needed to compute repeatedly on the same set of data, so we should retrieve the data only once for multiple computations. The algorithm was merely dozens of lines, and was implemented and deployed to production over the weekend by this amazing engineer Isaac, who happens to be the author of the H3 library. As a result, we were able to compute dynamic areas for 40 cities, give or take, on a single machine, and the launch was unblocked.
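
(Not Uber's actual code, but to give a sense of the pattern -- a minimal sketch with hypothetical types: fetch the city's hexagons once, then compute every overlapping shape in parallel on one box. parallelStream() runs on Java's common ForkJoinPool, a work-stealing scheduler, so idle workers steal pending shape computations from busy ones.)

    import java.util.List;

    public final class CityAreas {
        record Hexagon(long id, double demand, double supply) {}
        record Shape(long spec, List<Hexagon> members) {}

        static Shape computeShape(List<Hexagon> cityHexagons, long spec) {
            // ... pure CPU work over the shared, already-fetched data ...
            return new Shape(spec, cityHexagons);
        }

        static List<Shape> computeAll(List<Hexagon> cityHexagons, List<Long> specs) {
            // one retrieval, many computations, all on one machine
            return specs.parallelStream()
                        .map(spec -> computeShape(cityHexagons, spec))
                        .toList();
        }
    }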

ckrapu

I love H3. Isaac and Uber did a real service to the geospatial community with that one.

jiggawatts

To me H3 looked over-engineered and unnecessarily complex. Hexagons don't tile nicely at multiple resolutions, for one! Just overcoming that is decidedly non-trivial.

Implementing Google's S2 is simpler, while retaining the same overall benefits as H3, such as a hierarchical data structure.

g9yuayon

H3's algorithms involve some intricate maths, but the library itself is conceptually simple. Check this page out for some really fun and neat ideas: https://www.redblobgames.com/grids/hexagons/.

Uber internally did extensive research on what kind of grid system to use. In fact, we started with S2 and geohash, but H3 is superior. Long story short, hexagons are like discretized circles, and therefore offer more symmetry than S2 cells[1]. Consequently, hexagons give more uniform shapes when we compose hierarchical structures. Besides, H3 cells have more consistent sizes at different latitudes, which is very important for Uber when computing supply and demand of cars.

[1] One of the complications is that H3 has to have pentagons to tile the entire world, just like a soccer ball. We can easily see why by Euler's characteristic formula.
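
Concretely: with three faces meeting at each vertex of the grid, counting faces, edges, and vertices for P pentagons and H hexagons gives

    V - E + F = 2                   (Euler characteristic of the sphere)
    F = P + H
    2E = 5P + 6H                    (each edge borders two faces)
    3V = 5P + 6H                    (three faces meet at each vertex)

    => V - E + F = (5P + 6H)/3 - (5P + 6H)/2 + (P + H) = P/6 = 2
    => P = 12

so any such tiling needs exactly 12 pentagons, no matter how many hexagons there are; H3 places them at the vertices of the underlying icosahedron.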

pstuart

The aforementioned hex mapping tool: https://h3geo.org/

For anyone doing geo queries it's a powerful tool.

muggermuch

Cool anecdote, thanks for sharing!

tombert

I have gotten in arguments with people who over-deploy Redis. Redis is cool, I don't dislike it or anything, but a lot of the time when people use it, it actually slows things down.

Using it, you're introducing network latency and serialization overhead. Sometimes that's worth it, especially if your database is falling over, but a lot of the time people use it and it just makes everything more complex and worse.

If you need to share cached data across processes or nodes, sometimes you have to use it, but a lot of the stuff I work with is partitioned anyway. If your data is already partitioned, you know what works well a lot of the time? A boring, regular hashmap.

Pretty much every language has some thread-safe hashmap in there, and a lot of them have pretty decent libraries to handle invalidation and expiration if you need those. In Java, for example, you have ConcurrentHashMap for simple stuff, and Guava Caches or Caffeine Caches for more advanced stuff.
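
For example (untested sketch, assuming Caffeine 3.x; `User` and `loadUserFromDb` are placeholders for whatever you're caching):

    import com.github.benmanes.caffeine.cache.Cache;
    import com.github.benmanes.caffeine.cache.Caffeine;
    import java.time.Duration;

    Cache<String, User> cache = Caffeine.newBuilder()
        .maximumSize(100_000)                     // size-based eviction
        .expireAfterWrite(Duration.ofMinutes(5))  // time-based expiry
        .build();

    // Read-through: computed on a miss, served from memory afterwards.
    User u = cache.get(userId, id -> loadUserFromDb(id));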

Even the slowest [1] local caching implementation will almost certainly be faster than anything that hits the network; in my own testing [2] Caffeine caches have sub-microsecond `put` times, and you don't pay any serialization or deserialization cost. I don't think you're likely to get much better than maybe sub-millisecond times with Redis, even in the same data center, not to mention if you're caching locally that's one less service that you have to babysit.

Again, I don't hate Redis, there are absolutely cases where it's a good fit, I just think it's overused.

[1] Realistic ones, I mean; obviously any of us could artificially construct something as slow as we want.

[2] https://blog.tombert.com/posts/2025-03-06-microbenchmark-err... This is my own blog, feel free to not click it. Not trying to plug myself, just citing my data.

ohgr

My trick is saying no to Redis, full stop. Every project where it was used only as a cache eventually developed retention and backup requirements, and in every project where it was used as a key-value store, someone built a relational database on top of it.

There’s nothing worse than when someone does the latter. I had to write a tool to remove deletes from the AOF log because someone fucked up ordering of operations big time trying to pretend they had proper transactions.

ysavir

I love Redis, but my rule is that we should be able to flush the redis data at any time without any problems. Any code that makes that unfeasible is rejected.

vrosas

I've never done it IRL, but I've always wanted to delete my company's redis instances and see what happens, chaos monkey style. If your service breaks because it expected the cache to be there, or your database immediately goes down because of too many requests, you're going to have a bad time _eventually_.

zombiwoof

Yes, this design rule is very useful.

Delomomonl

I don't get it

I'm using redis only for temp state data like a session (when I can't use a jwt).

Or when I have to scale and need a warmed up cache

Is that bad now?

I'm also wondering right now why there is no local cache with p2p self discovery and sync. Should be easier than deploying an extra piece of software.

lucb1e

If sessions die when your system reboots, that means you can't reboot the system (update the service) without breaking whatever any users were currently doing on your site or in your software. That does sound bad to me and like a bad fit for Redis the memory cache. (I know it can do persistence optionally but that's what the person above you was complaining about: this is not what it's good at)

Why not use a regular database for this (can be as simple as an sqlite file, depending on your needs), or the default thingy that comes with your framework or programming language? This is built into everything I've ever used, no need to reinvent session storage or overengineer the situation with jwt or some other distributed cryptographic system and key management

jiggawatts

> I'm also wondering right now why there is no local cache with p2p self discovery and sync. Should be easier than deploying an extra piece of software.

The whole design space for this type of API is weirdly under-explored, but there are some well-supported mainstream solutions out there.

Fundamentally, Redis ought to be a NuGet library, a Rust crate, or something like it. It's just a distributed hash table, putting it onto its own servers is a bit bizarre if the only need is caching.

Microsoft's Service Fabric platform and the Orleans library both implement distributed hash tables as fundamental building blocks. Both can trivially be used "just" as a cache to replace Redis, and both support a relatively rich set of features if you need more advanced capabilities.

Of course, there's Scala's Akka and the Akka.NET port also.

fabian2k

I prefer caching in memory, but a major limitation once you have more than one process is invalidation. It's really only easy for stuff you can cache and just expire on time, not for stuff you need to invalidate. At that point you need to communicate between your processes (or all of them need to listen to the DB for events).

tombert

Yeah, if you need to do things across processes then something like Redis or memcached might be necessary.

The thing that bothers me is people adding it in places that don't make sense; I mentioned in a sibling thread that I've seen people use it as a glorified global variable in stuff like Kafka streaming. Kafka's stuff is already partitioned, so you likely don't gain anything from Redis compared to just keeping a local map, and at that point you can just use a Guava Cache and let it handle invalidation in-process.

koolba

Not just across concurrent processes, but also serial ones. Externalizing a cache into something like Redis lets you bounce your process with no reload time. You can get around it for some things like web sessions with a signed cookie, but that opens up expiration and invalidation issues.

But that doesn't work for caching non-trivial calculations or intermediate state. There's a sweet spot for transitory persistence.

blazing234

I think the crazy thing is people think Redis is the only thing that caches in memory.

You could throw a bunch of your production data into SSAS Tabular and there you go, you have an in-memory cache. I've actually deployed that as a solution, and the speed is crazy.

Elucalidavah

> need to listen to the DB for events

You could store the key->version mapping separately, and read that version on every lookup. If the cached version is lower, it's a cache miss.

Of course, evicting something from the cache (due to memory constraints) is a bit harder (or less efficient) in such a setup.
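
Something like this, say (untested sketch; the cheap key->version lookup and the expensive loader are passed in -- the version source could be a tiny table, or a counter bumped on every write):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;
    import java.util.function.ToLongFunction;

    final class VersionedCache<K, V> {
        private record Entry<T>(long version, T value) {}
        private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();

        V get(K key, ToLongFunction<K> versionOf, Function<K, V> loader) {
            long current = versionOf.applyAsLong(key); // cheap key->version lookup
            Entry<V> hit = cache.get(key);
            if (hit != null && hit.version() >= current) {
                return hit.value();                    // still fresh
            }
            V value = loader.apply(key);               // stale or missing: reload
            cache.put(key, new Entry<>(current, value));
            return value;
        }
    }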

Seattle3503

I wonder if there are language neutral alternatives to Infinispan.

evil-olive

an antipattern I've observed when giving system design interviews is that a lot of people, when faced with a performance problem, will throw out "we should add a caching layer" as their first instinct, without considering whether it's really appropriate or not.

for example, if the problem we're talking about is related to slow _writes_, not slow reads, the typical usage of a cache isn't going to help you at all. implementing write-through caching is certainly possible, but has additional pitfalls related to things like transactional integrity between your cache and your authoritative data store.

lucb1e

> throw out "we should add a caching layer" as their first instinct, without considering whether it's really appropriate or not

Could be worse: you could have met me! I used to laugh at caching and thought that if your website is so slow that you need a caching layer (Wordpress comes to mind), you're just doing it wrong: perhaps you're missing indexes on your database, or you simply can't code properly and made it more complex than necessary (I was young, once). Most of my projects are PHP scripts invoked by Apache, so they have no state and compute everything fresh. This is fine (think <30ms typical page generation time) for 95% of the things I make, but in more recent years I had two projects where I really struggled with that non-pragmatic mentality and spent long hours experimenting: different writing strategies (so data wouldn't change as often and MariaDB's built-in optimizations would work better), indexes on low-cardinality columns, indexes on combined columns in specific orders, documenting with each query which index it requires and maps to, optimizing the queries themselves of course, in one experiment writing my own on-disk index file to search through some gigabytes of geospatial data much faster than the database seemed able to, and upgrading the physical hardware from HDD to SSD...

Long story short, I now run Redis, and the website is no longer primarily bound by computation power but, instead, roughly equally by bandwidth.

I'm still very wary of introducing Redis to projects lest I doom them: it'll inevitably outgrow RAM if I indiscriminately stick things in there, which means turning things off. (So far, nearly no links or tools on my website have ever turned 404, because they're all on a "keep it simple" WAMP/LAMP stack that can do its thing for many years; perhaps search-and-replacing something like mysql_query() with mysqli->query() every five years, but that's about the extent of the maintenance.)

So anyway, I think we're in agreement about "apply where appropriate", but I figured I'd share the counter-example of how one can also be counterproductive in the other direction. There is something to be said for the pragmatic people who consider/try a cache: it often does help, even if there's often a different underlying problem that my perfectionism wouldn't have let slide.

GaryNumanVevo

It's a super common "new to SRE" behavior to overindex on caching as a silver bullet, especially because literally every DB has mechanisms to scale reads fairly easily. In my experience, redis is often needed when you have a DB team that doesn't want to put in the effort to scale reads

sgarland

Or when the devs don’t want to rewrite their schemata in a way that would massively reduce I/O requirements.

Then when you lose a cache node, the DB gets slammed and falls over, because when the DB team implemented service-based rate-limiting, the teams cried that they were violating their SLOs so the rate limits were bumped waaaay up.

re-thc

> an antipattern I've observed when giving system design interviews is that

It's an interview though. Most people just watch youtube videos and "copy and paste" the answer.

In a way it's the format of the interview that's the problem. Similar to leetcode-style interviews, a lot of the time we're not checking for what we actually need.

Too

Disagree on this one. In an interview there is no "the answer", it's a dialogue. I've interviewed a lot of people, often using performance related questions, and trust me, there are lots of candidates whose only answer to those is "add a cache", even after multiple follow-up questions or hints like "is there anything else that can be done?", "try thinking outside the box", "what can be done with the database itself", etc. Only a novice interviewer will be fooled by the first answer. If you cannot demonstrate more solutions after that, it shows that you clearly have no experience or problem-solving ability, which is the whole point of the interview to find out, not whether you have studied through a set of common questions.

btw, "scale up" is the second most common answer from those who can't provide better solutions. :)

dcow

At this point, what format of interview isn’t a problem?

ozim

I’ve seen the same, like when I just mentioned caching a team mate would hear „implement redis”.

Then I would have to explain „no, we have caching stuff ‚in process’, just use that, our app will use more RAM but that’s what we need„.

vrosas

I'm a fan of memcache specifically because ALL it can do is be a cache. No one can come in later and add a distributed queue to it. In-memory caching is also underrated, I agree. Using a hashmap and a minuscule TTL (like 5 seconds) can have huge performance benefits depending on your traffic, and it takes like 5 minutes to code up.
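
For the record, the whole thing can be about this small (untested sketch; stale entries are recomputed in place, so a few seconds of staleness buys a big cut in backend reads):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    final class TtlCache<K, V> {
        private record Entry<T>(T value, long expiresAtMillis) {}
        private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
        private final long ttlMillis;

        TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

        V get(K key, Function<K, V> loader) {
            long now = System.currentTimeMillis();
            Entry<V> e = map.compute(key, (k, old) ->
                (old != null && old.expiresAtMillis() > now)
                    ? old                                             // still fresh
                    : new Entry<>(loader.apply(k), now + ttlMillis)); // recompute
            return e.value();
        }
    }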

Salgat

We use an event database (think Kafka) as our source of truth and we've largely shifted away from redis and elasticsearch in favor of local in-memory singletons. These get pretty big too, up to 6GB in some cases for a single mapping. Since it's all event based data, we can serialize the entire thing to json asynchronously along with the stream event numbers specific to that state and save the file to s3. On startup we can restore the state for all instances and catchup on the remaining few events. The best part is that the devs love being able to just use LINQ on all their "database" queries. We do however have to sometimes write these mappings to be lean to fit in memory for tens of millions of entries, such as only one property we use for a query, then we do a GET on the full object in elasticsearch.
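
(Roughly, for anyone unfamiliar with the pattern: a snapshot is the state plus the event number it reflects; restore loads the snapshot and replays only the tail. A sketch with hypothetical names -- the real mappings are obviously more involved:)

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    interface BlobStore { void put(String key, byte[] bytes); byte[] get(String key); }
    interface EventStore { List<Event> readFrom(long eventNumber); }
    record Event(long number, String entityId, String payload) {}
    record Snapshot(long lastEventNumber, Map<String, String> state) {}

    final class Projection {
        private final BlobStore blobs;
        private final EventStore events;

        Projection(BlobStore blobs, EventStore events) {
            this.blobs = blobs;
            this.events = events;
        }

        // Periodically: serialize the state together with the event number
        // it reflects (asynchronously in practice).
        void snapshot(Snapshot s) {
            blobs.put("snapshots/latest.json", toJson(s));
        }

        // On startup: restore the snapshot, then replay only the tail.
        Map<String, String> restore() {
            Snapshot s = fromJson(blobs.get("snapshots/latest.json"));
            Map<String, String> state = new HashMap<>(s.state());
            for (Event e : events.readFrom(s.lastEventNumber() + 1)) {
                state.put(e.entityId(), e.payload()); // apply the event
            }
            return state;
        }

        private byte[] toJson(Snapshot s) { /* e.g. Jackson */ return new byte[0]; }
        private Snapshot fromJson(byte[] b) { /* e.g. Jackson */ return null; }
    }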

hajimuz

In most cases It’s not about the speed, it’s about data sharing for containers or distributed systems. Filesystem or in-memory doesn’t work. I agree that in most cases a normal database is enough though.

tombert

Yeah, I mentioned that: if you need to share stuff between processes or different nodes, then maybe Redis might be a fit.

But I've seen people use Redis as a glorified "global variable" for stuff like Kafka streaming. The data is already partitioned, it's not going to be used across multiple nodes, and now you've introduced another service to look at and made everything slower because of the network. A global hashmap (or cache library, like previously mentioned) would do the job faster, with less overhead, and the code would be simpler.

slt2021

Redis is needed to share data with other microservices, which are possibly written in different languages.

Polyglot teams, e.g. when you have a big data pipeline running in Java but need to share data with services written in Node/Python.

If you don't have multiple isolated microservices, then Redis is not needed.

antirez

I believe that the issue is that the culture around Redis usage didn't evolve as much as its popularity. Using it memcached-style covers many legitimate use cases, but it's a very reductive way to use it. For instance, sorted-set ranking totally changes the dynamics of what you can and can't do compared with traditional databases. Similarly, large bitmaps that retain one bit of very fast real-time information per item, enabling analytics that are otherwise very hard to do, are another example. Basically, Redis helps a lot more as the company culture around it grows, more patterns are learned, and so forth. But in this regard, a failure on the Redis (and my) side is that there isn't a patterns-collection book: interviewing folks that handled important use cases (think of Twitter) to understand the wins, the exact usage details, and the data structure usage. Even just learning the writable-cache pattern totally changes the dynamics of your Redis experience.
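
A hedged illustration of two of those patterns, using the Jedis client (a sketch, not from any particular production system):

    import redis.clients.jedis.Jedis;

    public class Patterns {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // Sorted set as a ranking: O(log N) updates, cheap top-N reads.
                jedis.zincrby("leaderboard", 1.0, "user:42");   // bump a score
                var top10 = jedis.zrevrangeWithScores("leaderboard", 0, 9);

                // Bitmap for real-time analytics: one bit per user per day.
                jedis.setbit("active:2025-03-08", 42L, true);   // user 42 was active
                long dailyActives = jedis.bitcount("active:2025-03-08");

                System.out.println(top10 + " / " + dailyActives);
            }
        }
    }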

kshitij_libra

Do you plan to write the book? I'd like to read it.

mleonhard

Yes, please write it. I will buy and read it.

dinobones

It’s not about Redis vs not Redis, it’s about working with data that does not serialize well or lend itself well to extremely high update velocity.

Things like: counters, news feeds, chat messages, etc

The cost of delivery for doing these things well with an LSM-based DB or RDBMS might actually be higher than with Redis. Meaning: you would need more CPU/memory to deliver this functionality, at scale, than you would with Redis, because of all the overhead of the underlying DB engine.

But for 99% of places that aren’t FAANG, that is fine actually. Anything under like 10k QPS and you can do it in MySQL in the dumbest way possible and no one would ever notice.

daneel_w

"But for 99% of places that aren’t FAANG, that is fine actually. Anything under like 10k QPS and you can do it in MySQL in the dumbest way possible and no one would ever notice."

It's not fine. I feel like you're really stretching it thin here in an almost hand-waving way. There are so many cases at far smaller scale where latency is still a primary bottleneck and a crucial metric for valuable and competitive throughput, where the definitively higher latency of pretty much any comparable set of operations performed in a DBMS (like MySQL) will result in a large performance loss compared to a proper key-value store.

An example I personally ran into a few years ago was a basic antispam mechanism (a dead simple rate-limiter) in a telecoms component seeing far below 10k items per second ("QPS"), fashioned exactly as suggested by using already-available MySQL for the counters' persistence: a fast and easy case of SELECT/UPDATE without any complexity or logic in the DQL/DML. Moving persistence to a proper key-value store cut latency to a fraction and more than doubled throughput, allowing for actually processing many thousands of SMSes per second for only an additional $15/month for the instance running Redis. Small operation, nowhere near "scale", huge impact to performance and ability to process customer requests, increased competitiveness. Every large customer noticed.
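
(Not the actual telecom code, but the shape of such a rate limiter is tiny -- a fixed-window sketch with the Jedis client, one counter per sender per second, expired by Redis itself:)

    import redis.clients.jedis.Jedis;

    boolean allow(Jedis jedis, String sender, long limitPerSecond) {
        String key = "rl:" + sender + ":" + (System.currentTimeMillis() / 1000);
        long n = jedis.incr(key);      // atomic; the first INCR creates the key at 1
        if (n == 1) {
            jedis.expire(key, 2);      // the window cleans itself up
        }
        return n <= limitPerSecond;
    }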

sgarland

A well-designed schema in a properly-sized-and-tuned [MySQL, Postgres] instance can and will execute point lookups in a few hundred microseconds.

That said, I agree that if you need a KV store, use a KV store. Though of course, Postgres can get you close out of the box with something like `CREATE UNLOGGED TABLE kv (key text PRIMARY KEY, data hstore);`.

lmm

> processing many thousands of SMSes per second for only an additional $15/month for the instance running Redis. Small operation, nowhere near "scale", huge impact to performance and ability to process customer requests

The vast majority of companies never need to deal with even one thousand of anything per second. Your situation was absolutely an unusually large scale.

cess11

I'm sure something other than the MySQL engine itself was the bottleneck in that case, like bad configuration or slow disk or something.

Did you profile the issue?

daneel_w

Unreplicated MEMORY tables, prepared and cached statements, efficient DDL and sane indices, no contention or locking, no access from multiple sessions, some performance tuning of InnoDB, ample resources, DB not stressed, no difference in pure network latency.

MySQL's query optimizer/planner/parser perform a lot more "gyrations" than Redis or MemcacheDB do before finally reaching the point of touching the datastore to be read/written, even in the case of prepared statements. Their respective complexities are not really comparable.

packetlost

I've only ever seen Redis used in two scenarios: storing ephemeral cache data to horizontally scale Django applications and for ephemeral job processing where the metadata about the job was worthless.

I reevaluated it for a job processing context a couple of years ago and opted for websockets instead because what I really needed was something that outlived an HTTP timeout.

I've never actually seen it used in a case where it wasn't an architecture smell. The codebase itself is pretty clean and the ideas it has are good, but the idea of externalizing data structures like that just doesn't seem that useful if you're building something correctly.

ljm

Redis + Sidekiq was a default for a long time in the Rails world as well, but it's an unnecessary complication (and expense) for most use cases. Just use your existing DB until you need to seriously scale up, and then look at a message queue.

I’ve used Redis for leaderboards and random matchmaking though, stuff which is doable in postgres but is seriously write-heavy and a bit of a faff. Gives you exactly the sort of goodies you need on top of a K/V store without being difficult to set up.

As for caching - it’s nice to use as an engineer for sure, but pretty pricey. It wouldn’t be my default choice any more.

bdcravens

Rails is attempting to solve this with Solid Queue, which was inspired heavily by GoodJob, both of which use Postgresql (and more in the case of Solid Queue). Both seem to be fairly capable of "serious scale", at least being equivalent to Sidekiq.

alabastervlog

Using Redis exclusively remotely never made much sense to me. I get it as a secondary use case (gather stats from a server that’s running Redis, from another machine or something) but if it’s not acting as (effectively) structured, shared memory on a local machine with helpful coordination features, I don’t really get it. It excels at that, but all this Redis as a Service stuff where it’s never on the same machine as any of the processes accessing it don’t make sense to me.

Like you have to push those kinds of use cases if you’re trying to build a business around it, because a process that runs on your server with your other stuff isn’t a SaaS and everyone wants to sell SaaS, but it’s far enough outside its ideal niche that I don’t understand why it got popular to use that way.

pdimitar

To your last point: yep, especially bearing in mind that Redis is ephemeral. I've had much more success with SQLite + a bunch of stricter validators (as SQLite itself is sadly pretty loose), and more performance too.

gytisgreitai

Exactly. Lots of people read posts by companies doing millions of QPS and then decide that they need Redis, Kafka, Elastic, NoSQL, etc. right from the start. And that complicates things. We are currently at 500k RPS scale, we have probably around a handful of use cases for Redis, and it works great.

hinkley

I worked for a company that had enough customers that AWS had to rearrange their backlog for cert management to get us to come on board, and our ingress didn’t see 10,000 req/s. We put a KV store in front of practically all of our backend services though. We could have used Redis, but memcached was so stable and simple that we just manually sharded by service. We flew too close to the sun trying to make the miss rate in one of the stores a little lower and got bit by OOMKiller.

By the time it was clear we would have been better off with Redis’ sharding solution the team was comfortable with the devil they knew.

lr4444lr

100% this. Also: is it data whose scale and speed are more important than its durability?

I actually agree with the author that Redis was not the right solution for the situations he was presented with, but he's far from proving it is not the solution for a whole host of other problems.

karmakaze

Even then you can do a lot of things to spread write contention with an RDBMS.

e.g. MySQL 8.0.1+ adds the SKIP LOCKED modifier to SELECT ... FOR UPDATE.

Then you can increment the first available unlocked row, or insert a new row if they're all locked. On read, aggregate the values.
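
A sketch of that sharded-counter pattern over JDBC (table and column names are made up; SKIP LOCKED works on MySQL 8.0.1+ and Postgres 9.5+):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Writers grab any unlocked shard instead of all contending on one row.
    void increment(Connection conn, String name) throws Exception {
        conn.setAutoCommit(false);
        try (PreparedStatement pick = conn.prepareStatement(
                "SELECT id FROM counter_shards WHERE name = ? " +
                "ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED")) {
            pick.setString(1, name);
            try (ResultSet rs = pick.executeQuery()) {
                if (rs.next()) {          // got an unlocked shard: bump it
                    try (PreparedStatement upd = conn.prepareStatement(
                            "UPDATE counter_shards SET n = n + 1 WHERE id = ?")) {
                        upd.setLong(1, rs.getLong(1));
                        upd.executeUpdate();
                    }
                } else {                  // every shard is locked: add one
                    try (PreparedStatement ins = conn.prepareStatement(
                            "INSERT INTO counter_shards (name, n) VALUES (?, 1)")) {
                        ins.setString(1, name);
                        ins.executeUpdate();
                    }
                }
            }
        }
        conn.commit();
        // Reads aggregate: SELECT SUM(n) FROM counter_shards WHERE name = ?
    }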

0xbadcafebee

Software development today is largely just people repeating what other people do without thinking. Which is how human culture works; we just copy what everyone else is doing, because it's easier, and that becomes "normal", whatever it is.

In the software world in the mid 00's, the trend started to work around the latency, cost and complexity of expensive servers and difficult databases by relying on the speed of modern networks and RAM. This started with Memcached and moved on to other solutions like Redis.

(this later evolved into NoSQL, when developers imagined that simply doing away with the complexity of databases would somehow magically remove their applications' need to do complex things... which of course it didn't, it's the same application, needing to do a complex thing, so it needs a complex solution. computers aren't magic. we have thankfully passed the hype cycle of NoSQL, and moved on to... the hype cycle for SQLite)

But the tradeoff was always working around one limitation by adding another limitation. Specifically it was avoiding the cost of big databases and the expertise to manage them, and accepting the cost of dealing with more complex cache control.

Fast forward to 2025 and databases are faster (but not a ton faster) and cheaper (but not a ton cheaper) and still have many of the same limitations (because dramatically reinventing the database would have been hard and boring, and no software developer wants to do hard and boring things, when they can do hard and fun things, or ignore the hard things with cheap hacks and pretend there is no consequence to that).

So people today just throw a cache in between the database, because 1) databases are still kind of stupid and hard (very very useful, but still stupid and hard) and 2) the problems of cache complexity can be ignored for a while, and putting off something hard/annoying/boring until later is a human's favorite thing.

No, you don't need Redis. Nobody needs Redis. It's a hack to avoid dealing with stateless applications using slow queries on an un-optimized database with no fast read replicas and connection limits. But that's normal now.

cmbothwell

This hits at the true nature of the problem which has _nothing_ to do with Redis at all (which is a fine piece of technology written by a thoughtful and conscientious creator) and has everything to do with the fact that our industry at large encourages very little thinking about the problems we are trying to solve.

Hence, fads dominate. I hate to sound so cynical but that has been my experience in every instance of commercial software development.

edoceo

> hype cycle for SQLite

Drop Redis, replace with in-memory SQLite.

But for real, the :memory: feature is actually pretty awesome!
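
(e.g., a minimal sketch assuming the xerial sqlite-jdbc driver is on the classpath; the database lives and dies with the connection:)

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class MemDb {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite::memory:");
                 Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)");
                st.execute("INSERT INTO kv VALUES ('greeting', 'hello')");
                try (ResultSet rs = st.executeQuery("SELECT v FROM kv WHERE k = 'greeting'")) {
                    rs.next();
                    System.out.println(rs.getString(1)); // prints "hello"
                }
            }
        }
    }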

briandear

Recently just left a MongoDB project. A total nightmare.

bassp

I agree with the author 100% (the TanTan anecdote is great, super clever work!), but.... sometimes you do need Redis, because Redis is the only production-ready "data structure server" I'm aware of

If you want to access a bloom filter, cuckoo filter, list, set, bitmap, etc... from multiple instances of the same service, Redis (slash valkey, memorydb, etc...) is really your only option

sgarland

Postgres has Bloom filters: https://www.postgresql.org/docs/current/bloom.html

It also has arrays, sets, and bitstrings, though for the latter you can just as easily (and with less space consumed) map it in your app, and store an integer.

jasonthorsness

Yes, while the default idea of Redis might be to consider it a key/value cache, the view of the project itself is definitely about being a "data structure server" - it's right at the top of the https://github.com/redis/redis/blob/unstable/README.md and antirez has focused on that (I can't find one quote I am looking for specifically but it's evident for example in discussion on streams https://antirez.com/news/114). Although I've definitely seen it be used just as a key/value store in the deployments I'm familiar with ¯\_(ツ)_/¯

e_hup

All of those can be serialized and stored in an RDBMS. You don't need Redis for that.

bassp

They can (and that's probably the right choice for a lot of use cases, especially for small data structures and infrequently updated ones), but serializing and storing them in a database requires you to implement synchronization logic in your application code and pay the performance cost of said logic; for instance, if you want to `append` to a shared list, you need to deserialize the list, append to the end of it in your application code, and write it back to the DB. You'd need to use some form of locking to prevent appends from overwriting each other, incurring a pretty hefty perf penalty for hot lists. Also, reading an entire list/tree/set/whatever back just to add/delete one element is very wasteful (bandwidth/[de]serialization cost-wise).

evil-olive

> for instance, if you want to `append` to a shared list, you need to deserialize the list, append to the end of it in your application code, and write it back to the DB.

this seems like a classic case of impedance mismatch, trying to implement a Redis-ism using an RDBMS.

for a shared list in a relational database, you could implement it like you've said, using an array type or a jsonb column or whatever, and simulate how it works in Redis.

but to implement a "shared list" in a way that meshes well with the relational model...you could just have a table, and insert a row into the table. there's no need for a read-modify-write cycle like you've described.

or, if you really need it to be a column in an existing table for whatever reason, it's still possible to push the modification to the database without the heavy overhead. for example [0]:

> The concatenation operator allows a single element to be pushed onto the beginning or end of a one-dimensional array. It also accepts two N-dimensional arrays, or an N-dimensional and an N+1-dimensional array.

0: https://www.postgresql.org/docs/current/arrays.html#ARRAYS-M...
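
Both styles, as a sketch over JDBC (connection, table, and column names hypothetical):

    import java.sql.Connection;
    import java.sql.PreparedStatement;

    // Relational style: an append is just an insert; no read-modify-write.
    void appendRow(Connection conn, long listId, String item) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO list_items (list_id, item) VALUES (?, ?)")) {
            ps.setLong(1, listId);
            ps.setString(2, item);
            ps.executeUpdate();
        }
    }

    // Array-column style (Postgres): push in place with ||, still a single
    // statement, no client-side deserialize/append/write-back cycle.
    void appendToArray(Connection conn, long listId, String item) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE lists SET items = items || ?::text WHERE id = ?")) {
            ps.setString(1, item);
            ps.setLong(2, listId);
            ps.executeUpdate();
        }
    }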

kflgkans

You might not need a cache. In my previous company (~7 years) all the teams around me were introducing caches left and right and getting into a lot of complexity and bugs. I persevered and always pushed back on adding caches to apps in my team, instead focusing on improving the architecture and seeking other performance improvements. I can proudly say my teams stayed cache-free for those 7 years.

superq

The issues that I have with Redis are not at all its API (which is elegant and brilliant) or even its serialized, single-core, single-threaded design, but its operational hazards.

As a cache or ephemeral store for things like throttling/rate limiting, lookup tables, or perhaps even sessions, it's great; but it's impossible to rely on the persistence options (RDB, AOF) for production data stores.

You usually only see this tendency with junior devs, though. It might be a case of "when all you have is a hammer, everything looks like a nail": someone discovers Redis (or MongoDB, during its hype cycle ten years ago) and it seems in perfect alignment with their language's datatypes. But perhaps this is mostly because junior devs don't have as many production-ready databases (from SQL like PostgreSQL, CockroachDB, and Yugabyte to New/NoSQL like ScyllaDB, YDB, and Aerospike) to fall back on.

Redis shines as a cache for small data values (probably switch to memcache for larger values; it's a simpler key-value store but generally 3 to 10 times faster for that narrower use case, although keep an eye on memory fragmentation and slab allocation).

Just think carefully before storing long-term data in it. Maybe don't store your billing database in it :)

noisy_boy

I have seen horrifying use of Redis. I once inherited the maintenance of an application whose original developer had implemented his own home-grown design to manage relationships between different types of key-value pairs, pretending they were tables, including cross-referencing logic; it took me a week just to add test cases with sufficient logging to reveal the "schema" and mutation logic. All this with a non-technical manager wondering why it took so long to make a change that directly depended on understanding this. To top it all off, the code was barely better than spaghetti, with fewer than ten lines of comments across maybe 5k LOC. The irony was that this was not a latency-sensitive application: it did data quality checks, and could have been implemented much more cleanly and flexibly using, e.g., PostgreSQL.

intelVISA

Non-technical 'manager'

dimgl

I'm really surprised that the pendulum has swung so far in the other direction that people are recommending not to use Redis.

Sure, don't introduce a data store into your stack unless you need it. But if you had to introduce one, Redis still seems like one of the best to introduce? It has fantastic data structures (like sorted sets, hash maps), great performance, robust key expiry, low communication overhead, low runtime overhead... I mean, the list goes on.

esafak

I'd like to draw attention to its probabilistic data structures, in particular: HyperLogLog, Bloom filter, Cuckoo filter, t-digest, Top-K, Count-min sketch

https://redis.io/docs/latest/develop/data-types/probabilisti...
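
(For instance, HyperLogLog counts distinct items in a fixed ~12KB per key, at a standard error of about 0.81%; a sketch with the Jedis client:)

    import redis.clients.jedis.Jedis;

    public class Uniques {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // duplicates are ignored; memory use stays constant
                jedis.pfadd("uniques:2025-03-08", "user:1", "user:2", "user:1");
                System.out.println(jedis.pfcount("uniques:2025-03-08")); // ~2
            }
        }
    }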

ks2048

Yeah, my first thought was, “I don’t need redis, but I want redis”.

progbits

Redis as an ephemeral cache is OK, but nothing more.

Redis as transactional, distributed, and/or durable storage is pretty poor. Their "active-active" docs on conflict resolution, for example, don't fill me with confidence, given there is no formalism, just vague examples. And this comes from people who not only don't know how to do distributed locks, but refuse to learn when the issues are pointed out to them: https://martin.kleppmann.com/2016/02/08/how-to-do-distribute...

Every time I find code that claims to do something transactional in Redis which is critical for correctness, not just latency optimization, I get worried.

igortg

I followed this rationale in a small project and opted for PostgreSQL pub/sub instead of Redis. But I went through so much trouble correctly handling PostgreSQL disconnections that I wonder if Redis wouldn't have been the better choice.

alberth

> A single beefy Redis (many cores, lots of RAM type of machine) should be able to handle the load

I thought Redis was single-threaded, running on a single core.

Having multiple cores provides no benefit (and arguably could hurt, since large multicore systems typically have a lower clock).

sriku

Redis itself, yes. However, there are Redis-compatible tools that go beyond a single core: KeyDB, DragonFly, and ARDB overcome some of those limitations.