Wrong ways to use the databases, when the pendulum swung too far
70 comments · June 12, 2025
freetime2
Anecdotally, the worst codebase I ever worked on made heavy use of stored procedures. Over the years people couldn't be bothered or were afraid to update them - which really was the root of the problem. This led to all kinds of crazy patterns in the application code, like calling things in a loop where a bulk operation was needed, or stringing together several stored procedure calls to get the desired outcome when really a refactor was needed to combine or decompose them as needs evolved.
If you're hiring devs for their experience with C# (which the app was written in) then you probably want to have the bulk of your business logic written in C#. Even the DBAs at my current company tend to advocate for using stored procedures only as a last resort, for this reason.
That team eventually decided they needed to rewrite the whole app using a NoSQL database. I didn't stick around long enough to see how that decision turned out.
bytefish
I am not going to dismiss your experience here. Stored Procedures can turn into wild monsters. Pair it with Triggers and you are in for chasing a noodle.
But it's also a reality that relational databases often become the main integration point in companies. In those environments it's hard (next to impossible) and dangerous to use something like an ORM.
Often enough I don't "own the tables" and I don't "own the columns" for lack of a better word. The DBA only gives me SELECT and EXECUTE permissions on the database. How am I going to work here without Stored Procedures?
And while this sounds weird, these limited permissions are there to protect me from accidentally writing to the wrong columns - columns that could impact systems I didn't even know about. Is it possible to write to the same columns with a Stored Procedure? Of course! But it's not as dangerous as giving an application fat UPDATE permissions.
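In SQL Server terms, the grant set I'm describing is roughly this (role and schema names are made up):

```sql
-- Roughly the permission model described above; role and schema names invented.
CREATE ROLE app_reader;
GRANT SELECT  ON SCHEMA::app TO app_reader;  -- reads, via views
GRANT EXECUTE ON SCHEMA::app TO app_reader;  -- writes, via procedures
-- Deliberately absent: INSERT/UPDATE/DELETE on the base tables.
```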
By using SQL Views I can build a more consumable data model on top of these tables for reading. And you mentioned C#: I can use EF Core to query these Views with LINQ. Stored Procedures are then used to write data to multiple tables in a "legacy database" within a transaction.
This could also be done with EF Core, but I need to explicitly log all changes to the data. How should this be done without a Stored Procedure or Triggers? CDC doesn't help a lot here.
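As a rough sketch of the pattern (all schema, table and column names are invented - this is the shape, not a real schema):

```sql
-- Read side: a consumable model over legacy tables I don't own.
CREATE VIEW app.CustomerOrders AS
SELECT c.CustomerId, c.Name, o.OrderId, o.PlacedAt, o.TotalAmount
FROM   legacy.KUND    AS c
JOIN   legacy.BESTELL AS o ON o.KundId = c.CustomerId;
GO

-- Write side: one procedure, one transaction, explicit change logging.
CREATE PROCEDURE app.PlaceOrder
    @CustomerId INT, @TotalAmount DECIMAL(18,2), @ChangedBy SYSNAME
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;
        INSERT INTO legacy.BESTELL (KundId, PlacedAt, TotalAmount)
        VALUES (@CustomerId, SYSUTCDATETIME(), @TotalAmount);

        -- The explicit change log that CDC alone wouldn't give me.
        INSERT INTO audit.ChangeLog (TableName, KeyValue, ChangedBy, ChangedAt)
        VALUES (N'legacy.BESTELL', SCOPE_IDENTITY(), @ChangedBy, SYSUTCDATETIME());
    COMMIT;
END
```

On the C# side, EF Core can map a keyless entity to the view for LINQ queries and call the procedure via ExecuteSqlRaw.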
roenxi
How many DBAs have you dealt with that only give you SELECT and EXECUTE? That seems somewhat crazy. It doesn't cut down the surface area for bugs at all (the same amount of SQL is still required and it'd be the same code as the app was going to execute anyway as a normal query). What scenarios are they worried about here?
mickeyp
Amending data, obviously. Building a dashboard using payroll data and having the power to change it are two wildly different things, and no sane large corporation would allow a dashboard's user account to change that data.
You're coming at this from the idea that one user has full access to everything, when that is likely the biggest security lapse you'll find in modern apps that use databases: a web app that has full DDL and even DML access when it shouldn't.
simonw
I'm convinced that the solution to stored procedures that people are afraid to update is automated tests. Write unit tests for them just like you would any other complex logic - the tests can evolve over time into a detailed spec for how they should work, and give you the ability to refactor or improve them with confidence later on.
For some reason stored procedures and automated testing seem to often not go together, maybe because they're often accompanied by an environment where DBAs and software developers are separate teams with separate sets of skills?
samwillis
I think a lot of the problems come from the fact that testing stored procedures ventures into e2e-testing land. You have to stand up infra in order to test them. There's never really been a simple way to unit test stored procedures as part of your application code's testing framework.
(Aside: I think this is something PGlite helps with, if you're in Postgres land.)
dottedmag
PostgreSQL server is a single process that starts in under 100ms on a developer's laptop.
In the company I work for we use real PostgreSQL in unit tests — it's cheap to start one at the beginning of a suite, load the schema and go, and then shut it down and discard its file store.
I keep thinking of moving that file store to tmpfs when run on Linux, but it's nowhere near the top of the performance improvements for the test suite.
So: no more mocks or substitute databases with their tiny inconsistencies.
sdeframond
Any good resource on testing stored procedures?
MaxBarraclough
There are unit-testing frameworks for the purpose. For SQL Server there are several [0]; I've worked briefly with one called tSQLt and was quite impressed. Frameworks also exist for other RDBMSs.
[0] https://en.wikipedia.org/wiki/List_of_unit_testing_framework...
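For a taste of what a tSQLt test looks like, something like this (procedure and table names are invented; tSQLt's FakeTable swaps the real table for an empty stand-in):

```sql
-- Assumes tSQLt is installed in the database under test.
EXEC tSQLt.NewTestClass 'DiscountTests';
GO
CREATE PROCEDURE DiscountTests.[test gold customers get 10 percent off]
AS
BEGIN
    -- Replace the real table with an empty fake, seed only what this test needs.
    EXEC tSQLt.FakeTable 'dbo.Customers';
    INSERT INTO dbo.Customers (CustomerId, Tier) VALUES (1, 'GOLD');

    DECLARE @Discount DECIMAL(4,2);
    EXEC dbo.CalculateDiscount @CustomerId = 1, @Discount = @Discount OUTPUT;

    EXEC tSQLt.AssertEquals @Expected = 0.10, @Actual = @Discount;
END
GO
EXEC tSQLt.Run 'DiscountTests';
```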
zabzonk
> you probably want to have the bulk of your business logic written in C#
Perhaps, if the C# applications are the only ones accessing the database. But suppose multiple applications written in different languages (and for different purposes) need database access? You can do this via stored procs or (better, in my experience) by adding an intermediate server process which the applications use via a publicly documented API, and which then talks to the database - which may provide stored proc access to the intermediate server.
Equally anecdotally:
An example is a server a couple of colleagues and I worked on at a big investment bank. The underlying databases were mostly proprietary, from suppliers or written by other teams in the bank. And versioned (badly, eek!). We wrote the SQL queries/procs to access the data - which the VB clients (now C#, I guess) could not have done without an enormous waste of effort - and then provided COM interfaces to our server that ran our queries. Mercifully, we didn't have to provide updates, but the same idea would have applied. This gives you more or less complete isolation between app and database.
Anyway, SPs can certainly fit into such an architecture, but might not be first (or second) choice.
awesome_dude
> Perhaps, if the C# applications are the only ones accessing the database. But suppose multiple applications written in different languages (and for different purposes) need the database access?
Without wishing this to sound like a personal attack, YUCK
A database that's accessed by multiple applications, regardless of the number of languages, is a really bad smell.
If I have a user db, and hundreds of other applications want to make use of the data in it, they come through MY application. I own the db, I decide what's in it, how the data is shaped, and when it changes.
Multiple applications accessing a single database is asking for a bureaucratic nightmare - "App A wants to change the first name field to UTF-8" - and EVERY other user of that database needs to be informed (best of luck figuring out what the other apps are, and who owns them). If you are tempted to say "those other apps should support UTF-8 as well", then that's the sound of someone who's never had legacy code in the environment.
henrikschroder
> A database that's accessed by multiple applications, regardless of the number of languages, is a really bad smell.
Except that really was the original model back in the 90's. All "good" databases had an extensive roles and permissions system, and a point of pride was the number of ODBC connectors for different languages and environments.
You were supposed to have The Database, looked after and controlled by the hallowed DBAs, who had their own hardware budget, their own organization, and who controlled access to The Database, giving trusted applications access to it, but only after they had vetted the schema and queries that those dirty developers wanted to run against it. Trusted users could get SELECT access, but only to the tables they needed to run their custom reports and queries.
It was a whole-ass thing that's fallen completely by the wayside as database software went from underpinning Larry's latest mega-yacht to free-as-in-beer, and we learned how to clone and shard the data instead.
zabzonk
Precisely why I said the applications should access the data via a server process - they never know the underlying database schema.
hyperman1
Oh man. I've seen an org where the stored procedures were written by a different team, and friction was high.
Then one java dev found a glaring sql injection hole. The whole app quickly reorganized around using it as an API for everything. Management was pleased speed went up and bugs went down, DBAs were pleased for less work, and Java devs were pleased they could understand what was going on in the app. Everybody happy I guess?
jbverschoor
I’d rewrite the complete team instead.
Quarrelsome
Religion and engineering do not make good bedfellows. I got into a pointless argument with someone on LinkedIn who was trashing ORMs, strawmanning them by stating they would all pull the entire dataset into memory just to perform a count (some ORMs are much more sophisticated than this). It was some sort of weird religious argument, because the chap thought every single piece of data processing should be written in a stored procedure (much like your first example).
However the flip side of the coin in your case is somehow even worse, because people failed to identify the good in the RDBMS and only identified it all with evil, despite then storing their blobs in an RDBMS.
To me, the use of distributed transactions is the smell. If your solution to the problem of a multi-database model is a sticking plaster that can generate even worse problems (e.g. the deadlocks) then it just shows a poor approach to problem solving or allocating enough resource to development.
scuff3d
I work for a company that's still obsessed with micro services. "Religious" is a pretty good way to describe it. Drives me absolutely nuts, especially when no one seems to understand what micro services are. I've seen at least half a dozen teams that have just created a distributed monolith and called it a micro service because they're using Docker and Kubernetes.
gfody
ORMs are basically the enemy of the RDBMS: by encouraging antipatterns and undermining language features, they have easily done more harm than good. Of course, reasonable people can be trusted to use them reasonably, but if you're arguing about it on the internet...
withinboredom
Haha. I’ve actually had an ORM attempt to load the whole database just to do a count. That was a fun bug to dig into (nobody noticed until we had a big enough customer) and an easy fix. Literally, all I had to do was mark some fields as lazy loaded.
t43562
I think the problem is that it's not very obvious to developers that one or the other thing is going wrong until - as you say - you get a big enough customer.
jsight
Yeah, I've had almost the same experience with people and ORMs. Often they'll make a good point about some weakness of some specific ORM, but it is almost always some use case that represents 1% (or less) of their application. Worse, there's nothing stopping them from just using sql for those couple of times that they need to bypass the ORM. After all, there's no reason to be religious about it and only use one tool.
TZubiri
I'm not hugely experienced, and SQL has always been enough for me, perhaps due to my simple requirements. But so far I hold the opinion that an ORM is always a mistake.
I find a lot of programmers don't know how to write an array of floats to disk, if you forced me to choose between an ORM or No DB at all, I would choose no DB all day.
ORM feels like an abstraction on top of an abstraction. I don't trust that those who choose ORMs have studied and exhausted the possibilities of the two underlying layers; I feel that, more often than not, they resort to higher layers of technology without understanding the lower levels.
But I may be wrong; maybe those who use ORMs know all about file systems and POSIX and SQL itself.
It also reminds me of the people who ask ChatGPT to write a prompt for them. It's as if you had a dishwasher and were too lazy to load the dishes into it, so you buy a product that loads the dishes into the dishwasher.
sdeframond
> ORM feels like an abstraction on top of an abstraction. I don't trust that those who choose ORMs have studied and exhausted the possibilities of the two underlying layers; I feel that, more often than not, they resort to higher layers of technology without understanding the lower levels.
I agree with that. However, I feel that teams that choose not to use an ORM end up having one reimplemented somehow by the seniors, and just used as you describe by the juniors.
I'd rather have the seniors master an existing ORM and spend their time elsewhere.
tanelpoder
The reluctance to use stored procedures where they'd be valuable is also a skill + will issue. I do get the non-database-developer view: if 98% of your app is going to be written in some other language anyway, why complicate your app layers by adding a bit of PL/SQL code + testing infra here and there?
But for processing (and validating) various data loads and incoming data streams, by looking up matching values in existing reference tables, stored procedures can increase performance/efficiency and reduce complexity (of having to maintain connection pools and caching layers just for validating incoming record batches).
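Roughly the shape I mean, in procedural SQL (all names invented):

```sql
-- Validate a staged batch against reference data and load it, all set-based.
CREATE PROCEDURE etl.LoadReadings @BatchId INT
AS
BEGIN
    -- Divert rows that fail the reference-table lookup.
    INSERT INTO etl.RejectedReadings (BatchId, ReadingId, Reason)
    SELECT s.BatchId, s.ReadingId, 'unknown sensor'
    FROM   etl.StagedReadings s
    LEFT JOIN ref.Sensors r ON r.SensorId = s.SensorId
    WHERE  s.BatchId = @BatchId AND r.SensorId IS NULL;

    -- Load the rows that validated, in the same round trip.
    INSERT INTO fact.Readings (SensorId, ReadAt, Value)
    SELECT s.SensorId, s.ReadAt, s.Value
    FROM   etl.StagedReadings s
    JOIN   ref.Sensors r ON r.SensorId = s.SensorId
    WHERE  s.BatchId = @BatchId;
END
```

The reference lookups happen inside the joins, so no application-side cache or connection pool is involved.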
As far as the unwillingness to even evaluate stored procedures goes, I sometimes joke that maybe the DB vendors should start calling stored procedures "database lambdas" instead...
bob1029
> But for processing (and validating) various data loads and incoming data streams, by looking up matching values in existing reference tables, stored procedures can increase performance/efficiency and reduce complexity
Performing operations on data directly in the SQL provider is the peak of human enlightenment. It takes a lot of leadership or wisdom to push a modern team away from using crap like EF to process everything at 100x the cost.
In exactly zero cases will cycling records through application code outperform executing a single T-SQL or PL/SQL script against the provider.
The procedural SQL languages are Turing complete, and SQL itself has been since the introduction of recursive common table expressions. There's no reasonable argument against at least trying this if all the information is already in the SQL store.
Moving information is way more expensive than processing information that is already in place (cache). Your SQL server process can iterate on items in L1 millions of times before a packet makes it across the data center one time.
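For instance, a recursive CTE can walk an entire (hypothetical) parts hierarchy in one statement, with no per-row round trips:

```sql
-- Table and column names invented; everything runs server-side, near the cache.
WITH ancestry AS (
    SELECT PartId, ParentId, 0 AS Depth
    FROM   dbo.Parts
    WHERE  ParentId IS NULL          -- start from the roots
    UNION ALL
    SELECT p.PartId, p.ParentId, a.Depth + 1
    FROM   dbo.Parts p
    JOIN   ancestry a ON p.ParentId = a.PartId
)
SELECT PartId, Depth FROM ancestry;
```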
slooonz
Operational concerns trump raw performance most of the time. Stored procedures live in a different CI/CD environment, with a different testing framework (if there's even one), on a different deployment lifecycle, using a different language than my main code. It is also essentially an un-pinnable dependency. Too much pain for the gain.
Now, give me ephemeral, per-connection procedures (call them unstored procedures for fun) that I can write in the language I want but that run on provider side, sure I’ll happily use them.
bob1029
> Stored procedures live in a different CI/CD environment
They don't have to. The procedural SQL commands can be source controlled along with the rest of the codebase. Transmitting the actual command text to the SQL server that does all the work is not the inefficient part.
t43562
So far in my career, such as it is, I have been on lots of rewrite projects and not one of them was a good idea.
There were one or two outright failures (new code abandoned) but the more subtle ones were just efforts that took so long to deliver value that the whole market opportunity was lost.
In every single case it was possible to take the existing system and gradually morph it towards something better - but the desire for revolution prevented evolution and forced everyone to live with shit in the hope of a future that didn't arrive.
bob1029
I've got the opposite experience. Most rewrites I was on had substantial uplift in the product value.
The trick is to engage with the customer early, directly and often. If the customer isn't willing to pay more money for something, you should probably not do it.
jpgvm
It really depends on the origin of the system to be replaced. If it was never designed to be replaceable you are usually in for a bad time. Similarly if the drive to replace it comes from new folk that simply don't understand the original system you are probably completely doomed.
I often write code with the full intention of replacing it, generally by writing it in such a way that its replacement is forced. That latter bit is pretty important, because systems generate inertia; you need some hard forcing function baked in to prevent ossification.
E.g. my current project is to replace a pretty awful pile of Bash I wrote last year with a much more sophisticated and automated system in Golang. The new system is already in production and the old tools are being phased out.
Writing the original in Bash was how I basically ensured it would get rewritten and not continually upgraded/morphed into something grotesque. There was a ton of pushback, lots of people saying "why not Go! why not Python! This is an awful decision, 3k LOC of Bash is an unmaintainable nightmare!". Literally all of those people missed the point.
Building simple systems that you 100% will replace (not "intend to", which really isn't the same thing) is a really good pattern for developing a real understanding of the solution space before building something gold-plated/spaceship-like.
It generally means that once you implement the follow-up, it very rarely becomes "legacy" software without horrible mismanagement, i.e. bringing in new management that wants to rewrite things for the sake of rewriting them, usually in some FoTM nonsense.
branko_d
I never quite understood these kinds of arguments, even though they seem to be repeated a lot:
> 1. Many critical API calls would spend most of their CPU time inside the finite database servers. New APIs, or updating existing APIs, meant you were hogging DB/CPU time from everyone else.
It's not that easy for a stored procedure to spend CPU on anything other than data processing. If you are going to do that data processing, then it's going to cost this amount of CPU regardless of whether it is initiated internally (through a stored procedure) or externally (by the client submitting queries). Furthermore, a frequently-used stored procedure's plan is likely already cached, so it doesn't have to be re-parsed and re-planned on each call. On the other hand, when the client submits the text of the query, that text has to at least be hashed, so it can be compared to the hashes in the query plan cache (even if the plan is already in the cache). So if anything, client-submitted queries will consume at least as much CPU as stored procedures, if not more.
> 2. API latency was at the mercy of the Query Planner. Any slight change in data statistics or business logic would cause a different Query Plan to be picked, which drastically changed the execution time, and usually caused timeouts. A hash join suddenly became a full table scan across billions of rows without any warning.
This is also true for client-submitted queries. Basically the same techniques are useful for both stored procedures and client-submitted queries: query hints and plan guides, and above all, sound indexing. That being said, this is a big problem in practice and, I think, a fundamental problem in the design of SQL as a "declarative" language where you are not supposed to concern yourself with the physical access path, even though it could make a big difference in performance. Fortunately SQL Server has mitigations for this problem (hints and plan guides), but not all DBMSes do (PostgreSQL still doesn't support hints natively, if I'm not mistaken).
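In SQL Server terms, pinning a plan can be as small as this (query and names invented):

```sql
-- Force the join strategy so statistics drift can't silently swap it
-- for something catastrophic.
DECLARE @since DATETIME2 = DATEADD(DAY, -7, SYSUTCDATETIME());

SELECT o.OrderId, c.Name
FROM   dbo.Orders o
JOIN   dbo.Customers c ON c.CustomerId = o.CustomerId
WHERE  o.PlacedAt >= @since
OPTION (HASH JOIN);  -- a plan guide can apply the same hint without editing the query
```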
luuio
1. Beyond just querying, the stored proc spent a lot of time processing data. As in, looping through cursors, making business logic decisions, calculating things, etc.
2. Having the business logic (not just the data loading) inside the stored procs meant that a change in business logic that would normally only require an application-code update now invalidated the stored procedure's cached execution plan.
ako
If you’re going to process a lot of data, doing that in the database is usually faster, as you avoid moving all the data into a separate server process and then moving it back again. For many queries the round trip from the application server to the database server takes longer than the query itself.
branko_d
> looping through cursors, making business logic decisions, calculating things, etc.
Interesting. Can you share more details about the "non-data" processing that was done? Were they doing heavy mathematical calculations and such?
> change in business logic that would normally only need to update application code, now invalidates the stored procedure's cached execution plan
As for plan cache invalidation - the most extreme case I saw was on the order of 5-10s. Basically, it depends on the size of that one stored procedure - not on all other stored procedures that may call it or be called by it. What was the actual time that they got?
marc_abonce
> The company is still standing and seems to be doing well financially, so I guess things turned out well enough, or maybe some of the technical decisions started trending more reasonable.
Perhaps I've been lucky or I haven't been observant enough, but I've never seen a company suffer financially because of inefficient code. Don't get me wrong, I still value good code for its own sake, but in my experience there is no correlation between that and profits.
zdragnar
I've seen customers be driven away by poorly performing interfaces. I've seen downtime caused by exponentially growing queries. I've seen poorly written queries return such large datasets that they cause servers to run out of memory processing the request.
Unless you're doing stock trades or Black Friday sales, though, it can be pretty hard to pin down a specific inefficiency to a concrete measure of lost income. Instead, people move on from products for general "we don't like it" vibes.
The most concrete measure, as someone else pointed out, is heavily inflated PaaS spend caused by poorly written code that requires beefier-than-necessary servers. In theory, moving faster means you're spending less money on developer salaries (the Ruby/Rails mantra of old), but there's a distinct tipping point where you have to pony up and invest in performance improvements.
viccis
My previous job designed their data lake and operations on it with horrific incompetence, and their solution was just to use AWS Lambdas scaling into the thousands and tens of thousands to do stuff over it.
They made so much money but would then squander it on hopelessly inefficient designs that required an AWS spend that basically prevented them from ever financially scaling.
spockz
IME, the suffering caused by badly performing code is mostly secondary. It increases compute costs, mostly by requiring beefier VMs than strictly necessary - which is still benign, and possibly more cost-efficient than spending more engineering effort. Sometimes the lack of performance means more scaling and orchestration is required, which brings additional complexity and therefore more compute and staffing cost. This rarely gets noticed and fixed, due to organisational momentum.
The worst is when the performance is so bad it starts to prevent onboarding new features or customers.
sumanthvepa
The real cost is an opportunity cost. It doesn't show up in the financials. Your ability to react quickly to new business opportunities is hurt. Most CEOs and boards don't notice it, until it's too late.
t0mas88
Only if the bad code affects customer experience significantly. That only happens if you really let things grow out of control. At some point you'll get bugs that take forever to solve, and angry customers as a result.
wongarsu
I bet Atlassian could make even more money with Jira if it wasn't this slow. They are not struggling as it is, but it's bad enough that it is costing them customers.
But generally I would agree
t43562
They are an example where someone decides "we will use Jira", but the deciders aren't necessarily the ones using it every day, so the **ness doesn't matter to them.
hnaccount_rng
Do you have any source for "they are losing customers"? I always thought that they consciously decided to have that shit of an interface because no one relevant to them (i.e. purchasing departments) cares.
timewizard
As a general rule it is hard to measure lost opportunity costs. That doesn't mean they don't exist or shouldn't be considered. I mean.. why do humans even acknowledge efficiency at all.. let alone almost always as a virtue?
bazoom42
Stored procedures are actually great when used for bulk set-oriented data operations. Much simpler and more efficient than passing large datasets back and forth between servers.
The problems arise when processing data row-by-row using cursors or a similar procedural approach, or when too much business logic ends up in procedural form.
I guess the moral of the story is: use the right tool for the job instead of deciding on the “one true pattern”.
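The contrast in miniature (names invented):

```sql
-- Row-by-row, the shape that causes trouble:
DECLARE @Id INT;
DECLARE cur CURSOR FOR SELECT OrderId FROM dbo.Orders WHERE Status = 'NEW';
OPEN cur;
FETCH NEXT FROM cur INTO @Id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Orders SET Status = 'PRICED' WHERE OrderId = @Id;
    FETCH NEXT FROM cur INTO @Id;
END
CLOSE cur;
DEALLOCATE cur;

-- Set-oriented, where stored procedures shine: one statement, one plan.
UPDATE dbo.Orders SET Status = 'PRICED' WHERE Status = 'NEW';
```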
jajko
The problem I've seen: very few people from Java / C# / other languages actually know PL/SQL or ever bothered to learn it. It's not that hard, but I guess it's not a cool enough tech to have on a resume? Yet it can massively help with the speed of certain use cases, in ways no external code could ever dream of achieving.
jrochkind1
> The checkpoint system worked like this: every time we needed to perform a write, we would generate a UUID, store this UUID into a “checkpoint” for the current flow....
>Logically, that was… fine. In practice, writes into the same database which previously required 5 IO round trips, now required almost double the number of trips for the extra checkpointing operations...
So this sounds like trying to invent a form of optimistic locking, but it's definitely possible to do optimistic locking where no extra reads or writes are necessary unless there is a conflict. You have to write that value (whether timestamp or UUID) on every write, and then you do every write with a conditional "do this write as long as the lock value has not been changed."
But I guess the store they were using didn't allow even a conditional write like that? Although I'm not sure how they managed to make this "checkpoint" system work without that feature either... This seems a bit confusing; it seems like a more typical optimistic locking system should have been possible using whatever primitives the checkpointing system used. Maybe not?
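In relational terms (the post's store was a KV store; table and column names here are invented), the conditional write looks like:

```sql
-- @ReadVersion is the version value captured during the initial read.
CREATE PROCEDURE dbo.UpdateBalance
    @AccountId INT, @NewBalance DECIMAL(18,2), @ReadVersion BIGINT
AS
BEGIN
    UPDATE dbo.Accounts
    SET    Balance = @NewBalance,
           Version = @ReadVersion + 1
    WHERE  AccountId = @AccountId
      AND  Version   = @ReadVersion;  -- no-op if someone else wrote first

    IF @@ROWCOUNT = 0
        THROW 50001, 'Write conflict: re-read and retry.', 1;
END
```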
luuio
The KV store had etag support for conditional writes. Etags are only useful to make sure the data didn't change underneath between your read and your write.
Storing the checkpoints along with the mutation was for idempotency. If the checkpoint was in the document, that meant the mutation had succeeded, and a retry should be a no-op.
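A relational rendering of that idea (the original was a KV store with documents; names invented):

```sql
-- Checkpoint and mutation commit atomically, so a retry that finds the
-- checkpoint knows the mutation already landed.
CREATE PROCEDURE dbo.ShipOrder
    @OrderId INT, @CheckpointId UNIQUEIDENTIFIER
AS
BEGIN
    BEGIN TRANSACTION;
    IF EXISTS (SELECT 1 FROM dbo.Checkpoints WHERE CheckpointId = @CheckpointId)
    BEGIN
        COMMIT;  -- an earlier attempt succeeded; this retry is a no-op
        RETURN;
    END
    UPDATE dbo.Orders SET Status = 'SHIPPED' WHERE OrderId = @OrderId;
    INSERT INTO dbo.Checkpoints (CheckpointId) VALUES (@CheckpointId);
    COMMIT;
END
```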
cl0ckt0wer
Does the author really call joins "joints"? I thought it was maybe a spelling thing but then they make a Snoop joke...
nemothekid
The author was implying that in between writing the business logic, they were smoking a lot of weed.
jrochkind1
Yeah, I wondered what was going on there. Maybe just a joke or error, but definitely makes one wonder how much experience the author has with sql to compare/contrast the approaches to a more standard approach.
cjfd
Never try to smoke an inner joint.
kragen
Not Dan Luu, is it? The site seems to leave that intentionally ambiguous.
Tevo
Seems to be this [1] guy, which you can find out by skulking around an older snapshot of his website on the Wayback Machine, where he displayed his real name in the footer on every page and went by a slightly different username up to as late as April this year. Maybe a rather poor attempt at dissociating his real identity from the website? (On second thought, he still links to the webpage from his GitHub profile, so that wouldn't make much sense, I think. Hmm.)
luuio
Not Dan Luu. And to the other comment, not intentionally hiding my identity. Just that for most of my writing, who I am is irrelevant.
zupa-hu
Is it just me? No conclusion, no takeaway, nothing.