Skip to content(if available)orjump to list(if available)

Database Protocols Are Underwhelming

tanelpoder

The title says Database Protocols, but the author writes only about MySQL and Postgres. There are other databases out there, for example Oracle that I’m the most familiar with and it has had a break/reset mechanism built in to its protocol for at least 30 years [1]. Its a 2 step operation (2 roundtrips), one sending an error back to the client instead of the expected result and the second step will reset “current call” (not current session or connection) state on the server side and then request the client do the same.

Databases/protocols that differentiate between connection, session and call scopes, can reliably perform resets without requiring to tear down the entire connection.

Retrying is safe with no client side transaction idempotency tokens needed, as the error handling will roll back any changes done only by the failed call, not the entire transaction or session.

[1]: https://tanelpoder.com/2008/04/10/sqlnet-breakreset-to-clien...

Edit: added the comment about retry and idempotency.

bob1029

Works in SQL Server as well.

In SSMS you can press Alt + Break to send an OOB attention message to the active TDS that will cancel the running command.

https://learn.microsoft.com/en-us/openspecs/windows_protocol...

I believe DB2 supports a similar technique.

atombender

Postgres does have "reset" support.

There are several ways commands depending on what you want to reset. If you're only interested in mutable state changeable with SET, you can use RESET ALL instead. This resets configuration settings such as the time zone, statement timeout, and so on.

Postgres has different fine-grained ways to reset other aspects of a session, such as closing cursors (CLOSE ALL), unlistening from notifications (UNLISTEN), deallocating prepared plans (DEALLOCATE), and so on. Finally, there's DISCARD ALL, which resets everything.

I agree about error handling. Most clients just disconnect on a connection error.

One gotcha about Postgres queries is that running queries do not get cancelled when the client disconnects. This has always been a weird one that's quite dangerous, as it can lead to dog-piling — if a client starts a slow query, then times out or cancels its connection, the query lives on; then if another identical request starts, same thing happens, and soon you have 100 queries running that are doing the same thing and getting slower and slower. You have to be very careful with retrying.

anarazel

> One gotcha about Postgres queries is that running queries do not get cancelled when the client disconnects

You can configure that these days, at least if the server is running on some common platforms:

https://www.postgresql.org/docs/current/runtime-config-conne...

atombender

Thank you, that's the one. I kept looking for the option and couldn't find it, and ChatGPT confidently hallucinated an option that doesn't exist.

klysm

I can see the argument for letting a query finish if it is doing any kind of mutation. For a read-only query though, surely it would be a good addition in postgres to cancel the query? Cancelling things can be quite difficult though in practice.

rini17

If you're doing mutations, you're supposed to explicitly use transactions anyway, with well defined logical units of work. "DELETE FROM table LIMIT 10" does not look like anything well defined.

In case of connection issues you can check whether it was all actually committed or rolled back. Often it was propagated to the user, they got error message, refreshed and checked and retried and all was okay in the end. That's probably why the DB engines did not bother micromanaging individual SQL statemens.

atombender

My understanding is that the difficulty is knowing the client disconnected. Postgres doesn't know until it tries to write the result to the stream. That's the core team's explanation in this [1] ancient thread, at least.

I don't know why Postgres can't write keepalive messages while executing the query; speculating here, but it's possible that the architecture is synchronous and doesn't support doing anything with the connection while the query is executing. It's an old threading model where one process is started per connection.

I could have sworn I had read about a new configuration option introduced in Postgres 14, but I can't find anything about it.

[1] https://www.postgresql.org/message-id/e09785e00907280622v9b1...

marcosdumay

The database must stay correct if you cut power to the machine instead of disconnecting the client, so I don't see how mutation would make it impossible to drop partial work.

kiitos

> In both cases the idea is the same, we’re configuring the connection, making it behave differently.

This is the impedance mismatch. The _connection_ exists at a layer beneath the database/application layer. You're not configuring the _connection_, you're configuring the _session_! And the relationship between connections and sessions is one-to-many, not one-to-one.

majkinetor

I am interested in "Idempotency Keys" that are mentioned in the article. Anybody using those? How are you saving them? What about responses? Are they cached or recalculated..

shoo

Idempotency keys can also be known as request ids [1] or correlation ids. They can help in situations where you have a distributed system with an API provider and an API client and some operation needs to be performed "once-and-only-once". Since the network can drop requests or responses, the client must retry requests that appear to have failed, but perhaps might have succeeded. Since the client might retry an operation that actually succeeded and send multiple messages for the same operation, either the operation must be inherently idempotent, or if not, the API provider must be able to de-dupe requests.

Adding an idempotency key is a way to allow the API provider to detect and de-dupe requests. It requires work and collaboration on the part of the API client and the API provider. The API client must generate a unique idempotency key for each logically different operation it attempts to execute, and it must retry requests for operations that appear to have failed using that fixed idempotency key. If the operation is not inherently idempotent, the API provider needs to implement some way of de-duping repeated requests - perhaps storing requests in a durable cache for N hours/days keyed by the idempotency key.

One industry where these "once-and-only-once" execution guarantees are useful is banking. Customers expect their payments to be executed exactly once, so their accounts are debited once and not zero times or 2+ times.

There's an interesting infoq podcast from 2018 with Jason Maude of Starling Bank [2] talking about how they design their bank systems for "once-and-only-once" execution. From memory, Jason decomposes the responsibilities as: API clients are responsible for ensuring operations execute at least once. API providers are responsible for ensuring operations execute at most once.

[1] see e.g. google's API design guidelines for request identification: https://google.aip.dev/155

[2] https://www.infoq.com/podcasts/cloud-based-banking-startup-j...

majkinetor

Just the info I needed, you are awesome.

I am creating a bank currently, and I was asking this with double payments in mind.

frumplestlatz

Most these issues seem specific to, and created by, the impedance mismatch between ORMs (like ActiveRecord) and databases.

To work correctly, application code must be aware of transactions, and be prepared to roll back state and safely retry a transaction on failure.

ORMs like ActiveRecord struggle to do this because they attempt to present a transactional database as a low-friction, in-memory, mutable, non-transactional data model that does not support rollback.

If you lose a connection to the database prior to successfully committing your transaction, this is just another form of transaction failure.

You need to discard any pending application state changes, open a new connection and transaction, requery data required to determine the actual current state, and then if necessary, retry your transaction.

ActiveRecord can’t do that, hence the ill-advised desire to be able to automatically retry statements on failure — which abandons database-mediated transaction isolation, and incorrectly shifts responsibility for database state preservation/management to the client.

If SQL was trivially idempotent, we wouldn’t need transactions or different levels of transaction isolation in the first place.

anarazel

FWIW, you can parametrize statements on postgres without preparing them.

throwanem

"I would like to share a prepared statement across database client sessions" is nearly the exact use case for a view, or some other possibly partially materialized decomposition. You are of course still going to use prepared statements in your clients to avoid naïve SQL injection, but that's always going to be true; by the time the engine has parsed a query sufficiently to validate it can be safely executed with given parameters, it has effectively been "prepared," so you're not actually getting a meaningful perf benefit there. Where you do see that benefit is, as almost always, in query complexity: if your statements cost so much to prepare and your sessions are so unreliable that this becomes a genuine problem, optimizing query preparation is very much the wrong place to start solving.

If you find yourself so frequently hitting socket timeouts attempting to execute database queries that you need to modify your ORM to work around this problem automatically, your problems likewise almost certainly begin at the schema layer. I'm not going to blame an ORM maintainer for having to deal with that concern in a way that avoids irritating library consumers, and I suppose I can see where it does no good arguing that one's own clients (said consumers) are using the database wrong in order to need such functionality.

I'd have thought to see perf considered to some extent beyond blaming the engine, but I suppose there is a kind of honesty in simply omitting any discussion of causes for these apparently so common "network errors," in cases where they are so common and thus must be so transparently retried that the entire existence of this behavior could be totally overlooked by the application developer nominally responsible for the database interaction.

(I'm curious why a library maintainer needs to assert responsibility over session variables at all times, but I suppose it makes sense from a perspective that one's users are not especially trustworthy, as I suppose most ORM maintainers and especially ActiveRecord's must conclude. I guess it's fair to say that connections needing to manipulate the session that granularly should exist outside the ORM entirely, but then we're back to the whole tiresome architectural question that's vexed the field for now nearly two decades since Neward's "Vietnam" paper: https://web.archive.org/web/20060703024451/http://blogs.tedn....)

atombender

> I'm curious why a library maintainer needs to assert responsibility over session variables at all times

When pooling connections, you need to reset the connection when putting it back in the pool. For example, if you grab a connection from the pool and do:

    SET SESSION statement_timeout to '1s';
    SET timezone TO 'Europe/London';
…and then release the connection, you don't want the next connection to inherit these settings.

It has nothing to do with the ORM (although the ORM itself may want to set some baseline settings based on config), and everything to do with sharing connections between different parts of a single application.

throwanem

I mean I get that, what I don't get is why screwing around that way is not explicitly defined as "don't do this, here's why it isn't supported and here's what you should try to do instead; if you ignore all that and still break something, you get to keep both its pieces" in the documentation, rather than trying to work around it with some kind of global reset.

Why should the engine burden itself with an undo stack (even if only one frame deep) for every client, on behalf of a minority so incoherently implemented that it can't keep straight what it's doing across the span of two (application and ORM library) codebases? A metaphor already exists in the interface for getting a fresh connection: you do this by creating a fresh connection. If you want some other way to accomplish the same task, it's fair to require justification for the effort of providing that second option, and no such justification is presented in the article under discussion.

I don't like ORMs, I think it's long obvious. But to be excruciatingly clear, I grant them the same courtesy as any other library in that they absolutely may declare things explicitly out of scope. Seeing that not done, in a case where the consumer is pretty explicitly screwing around with internals of a database connection belonging to the library, is what's surprising, even in the context of ORMs as a category, where maximalism seems so constantly the order of the day.

hobs

This very much depends on the SQL engine you are talking about - many early sql engines literally compiled stored procedures and didn't allow the dynamism you imply - some still offer such features.

Some SQL engines are more sensitive (due to caching plans in the first place or not) to this problem as well - SQL Server famously utilizes parameter sniffing for performance, which has positive implications of skipping work, and the negative of skipping work you might need to do.

throwanem

A stored procedure and a prepared statement aren't the same, though. I'm not sure how persuasive an argument from how one is optimized is meant to be for the other.

My experience has long been that in almost every case where a "database performance" problem occurs while an ORM is in use, presumptively refactoring to eliminate the ORM almost immediately reveals the buried n+1 or other trivially pathological pattern which, if the ORM's implementor knows to try to avoid it, the ORM's interface typically is not sufficiently expressive to disambiguate in any case. (Fair do's: In the uncommon case where that interface is so expressive, one does usually find the implementation accounts for the case.)

Hence my earlier reference to the "Vietnam" paper, which dissects in extensive (if unfortunately mostly fruitless) detail the sunk cost fallacy at the heart of this problem.

throwanem

(Belatedly revisiting after the edit window to note, when I say 'refactoring to eliminate...' above, I should be clear I mean experimentally eliminating ORM calls for the poorly performing query and not across an entire codebase, or any other such ocean-boiling nonsense. In a sane world this would all be reasonably implicit, but we live here, so.)

sgarland

Some of this doesn’t make sense. If you have common settings that you always want enabled, like sql_mode, then set it on the server’s config, and stop wasting time doing so on every session.

If you do need some session-specific variables, use a connection pooler (you should anyway) that handles that. ProxySQL, for one, tracks the per-session variables that are set, so when a client requests a connection, it hands it one already set to its requested specification (if it’s available).

> The reason this is important is that whenever the write or the read fails, in the overwhelming majority of cases, you don’t know whether the query was executed on the server or not.

Yes, you definitely can know this in almost all cases. Did you receive a result? It succeeded. Did you receive a warning? It succeeded, but maybe read the warning. Did you receive an error? It did not succeed. Parse the error message to determine what went wrong, and if a retry makes sense. Did you receive nothing? Now, and only now, do you not know the state.

It blows my mind the number of devs who haven’t read the API specification for their RDBMS. There is a ton of useful information you can glean that’s often ignored.

> These statements also have a cost on the server side. Each statement requires some amount of memory in the database server. So you have to be careful not to create an unbounded amount of them, which for an ORM isn’t easy to enforce.

Which is why you can limit them. MySQL has a default of 16382; not sure about Postgres.

Also, to the complaint about them being session-scoped, this is yet another reason to use a connection pooler. They can often share them across connections. I know PgBouncer can.

Read. Application. Manuals.