Skip to content(if available)orjump to list(if available)

You're protecting your data wrong – Introducing the Protected Query Pattern

nomercy400

I am kind of missing how the 'pure' query problem is solved.

Say I have 10000 rows, and my authorization gives access to 3 of those rows.

With security in the database, you return 3 rows. From what I can read, the protect pattern returns 10000 rows from the database, but discard 9997 of them afterwards. Doesn't this increase load and memory usage? Shouldn't there be a balance?

whilenot-dev

Besides those performance considerations, the article starts off with you need a data access layer, but has no idea about controllers and middleware? Seeing stuff like...

  if (!(user.isAdmin || document.authorId === user.id)) {
    forbidden();
  }
makes me almost a bit angry.

Isn't every contemporary authorization system shifting towards ReBAC (based on Google's Zanzibar paper)? The ReBAC paradigm favors the segregation of authorization logic from business logic. It'd even be possible to reimplement ABAC/RBAC-styles if you prefer to do so, but your application layer shouldn't need to care.

  // this one is usually done in middleware layer
  const user = await resolveUser(req);
  if (!user) {
    return res.status(401);
  }

  // this one is usually done in the controller layer
  const canReadDocuments = await auth.canReadDocuments(user);
  if (!canReadDocuments) {
    return res.status(403);
  }

  // all from here is usually done in the service layer
  const canReadAllDocuments = await auth.canReadAllDocuments(user);
  if (!canReadAllDocuments) {
    return findManyDocuments(user);
  }

  return findManyDocuments();
How is protecting individual queries (after retrieval!) more scalable?

jussinevavuori

This is not what I'd intended to communicate with this article. The Kilpi.filter pattern is not the primary point of this article, it is only a minor utility provided by Kilpi for special cases. I do not suggest to fetch all rows and return only authorized rows. The inner query function should still be performant and return only the requested data.

My point was to show how you can co-locate your queries and authorization logic, just as you would with any sensible data access layer. However, this approach keeps the inner function pure and e.g. easily cacheable with the upcoming Next.js "use cache" directive, and also allows easy bypassing of the authorization logic when required by your application.

I hope this clarifies my intent.

Winsaucerer

I've experimented (i.e., not used in anger) with doing something similar to this, except by placing the authorisation logic in the database via things like row level security and stored procedures. So if you run the query using the 'unsafe' method, then it skips security checks, but if you run it via the normal path, you're forced to provide details about the requester.

The difference here though is that the normal path only provides information about the requester, and not the actual business logic for whether they're authorised. That goes in the database in the form of RLS or other permission checks in stored procedures.

It uses a pattern similar to PostgREST, where there is a local value set with the user ID (e.g., 'set local "request.web.sub" = '{}''). Though there is some trickiness with connection sharing, to ensure that these settings get reset properly.

And to use an example from another commenter here, this means that if there's only 3 rows that should be returned, the database returns just 3 rows, rather than 10,000 that need to be discarded down to 3.

I haven't come up with a good answer regarding protecting which fields can be seen yet.

jussinevavuori

CLARIFICATION TO ARTICLE

Hello, it's me, the author!

I'd like to take the time to clarify a few things about the article. I'm not claiming this to be revolutionary to all people, I'm just claiming it to be useful for some people.

The intent of the protected query pattern is the following:

- Co-locate query and authorization logic (easier to maintain) - Allow (but not require) inner functions to be pure (e.g. trivial to cache with upcoming Next.js "use cache" API). - Allow easy bypassing of authorization for some contexts. - Simplify logic elsewhere (e.g. getDocument.protect("id") is a pretty one liner).

In most cases, I use the protector pattern to either allow or disallow the entire query, based on the results. Filtering or redacting the data is done much more rarely. †

What I did NOT intend:

- I did not intend to select all rows and return only the useful. You are right, that would be horrible for performance. However, I understand how easy it is to come to that conclusion, seeing how I introduced the Kilpi.filter function.

Thank you all for your feedback, this allows me to improve on this pattern and Kilpi in general!

† Your worries about performance have given me ideas of a new "preprotector" function that would stop the query before it ever even starts. I'll have to think about adding it in the future!

Garlef

I don't know.

- The counter-examples seem strange to begin with since they mix presentation concerns (redirect) with data access concerns (getDocument) and authentication (user.isAdmin).

- I'd prefer an explation of the pattern that is not tied to a library. Kilpi already seems to abstract away something.

So it's a bit hard to understand what problem this pattern really solves.

Kilpi looks interesting, though.

(PS: the Kilpi.filter definitely needs a better explanation! What does it do? And how does it relate to the policies? I can make a guess what happens if what your query is fetching is a list of things. But what it it's not?)

kikimora

Every codebase I’ve seen encapsulates access control logic in a service layer of some sorts. This is a solved problem and does not need new patterns. I don’t mind doing it the way article suggest. But it is hardly anything new.

bob1029

> To start off, you need a data access layer. This is non-negotiable for well-structured applications to properly secure your data. Scattered SQL queries across your codebase will most likely lead to unmaintainable authorization, even data leaks.

Super hard disagree with this.

Much of the pain and suffering I deal with in legacy codebases can be attributed to someone attempting to force the entire product to use one logical view of the data. This typically manifests as a single folder full of ORM types. "Models", "Entities", etc.

Virtually all problems can benefit from taking more targeted and nuanced views of the underlying data. This is only possible when the original call site is directly responsible for composing the SQL query.

You talk about authorization but I think this is confounded by the strategy of one middle tier handling all data access. Now, you've got to move the auth code away from the business logic that justifies it into a box that's way too smart to be labeled "Data Access Layer" by this point.

All this really accomplishes is smearing your BL across a bunch of DLLs.

robertlagrant

> Much of the pain and suffering I deal with in legacy codebases can be attributed to someone attempting to force the entire product to use one logical view of the data.

This seems like a non sequitur. Data access layers don't require one logical view of the data.

yawnxyz

Yet at most mature-ish orgs this ends up happening... it's such a pain to deal with

5Qn8mNbc2FNCiVV

I don't see the issue this attempts to solve. The one thing mentioned is caching, but you shouldn't mix caches of different users (or user groups) anyways if f.e. you encode a user's access privilege in the query. This however doesn't even do it and always loads the data, so instead of attempting to cache the result of the query after access is checked, just cache the result of the query.

I checked the library and thought this would be something more revolutionary that I missed, but sadly it's "just ok". At least for me it doesn't come close to what f.e. https://casl.js.org/v6/en/ provides or you know, me just having the access control encoded into a where-clause

clintonb

This is a nice setup for Attribute-Based Access Control (ABAC). I arrived at a similar design a couple weeks ago, using Casbin as our authorization backend. The challenge is that I want to store user's roles and domains and use those domains to limit access. However, I do not want to replicate my existing DB to create a record in Casbin for every resource that might be accessed.

This means, yes, when validating access, you must load the data you want to access and compare its attributes against the user's authorization. One advantage of doing this is that the system can either return a 404 if the object belongs to a completely different domain, or, if the object belongs to a sub-domain, return a 403 and sufficient data to power a request for temporary privilege escalation.

I'm open to ideas for improving this.

jayd16

Seems awful to me. Why would you ever have unprotected queries and leave security holes sitting around for you to forget to "Simon says, protect()"?

Wouldn't a better system be authorization fully baked through and then service accounts and at most a dummy auth provider for unit tests and maybe local dev?

Instead of selecting the world and throwing things away, why don't you just optional join a protected table (or view even) when needed?

Am I crazy?

jussinevavuori

Thank you for the comment. This is not what I'd intended to communicate with this article. The Kilpi.filter pattern is not the primary point of this article, it is only a minor utility provided by Kilpi for special cases.

My point was to show how you can co-locate your queries and authorization logic, just as you would with any sensible data access layer. This method however keeps the inner function pure and easily cacheable, e.g. with the upcoming "use cache" Next.js directive, and allows easy explicit bypassing of the protection with e.g. `.unsafe()`, which is required in some scenarios in your application.

I hope this clarifies my intent.

clintonb

You're not crazy, but you assume everyone starts with security in mind. That is definitely not the case.

ajahahxbbd

1. Is this filtering after returning results from the database? That seems horribly inefficient in most cases. Most databases should support filtering the data before you return it. Or APIs or whatever.

2. How are these functions now “pure”? I dont think things are magically catchable because you removed some auth logic. It’s just a little simpler to cache, though if the auth logic data doesnt change often it’s not much different.

jussinevavuori

Thank you for the questions.

1. While yes, this would allow you to `SELECT * FROM table` and then filter on your server. However, I believe that most developers are aware that that is, as you said, horribly inefficient. For performance, you would definitely make your queries only return the authorized data.

(Although, now that you've mentioned it, I'm starting to think of a "pre-protect" API to stop the query before it ever even runs in the first place, which would solve the performance issue.)

I'm sure this won't suit everyone, however I've found it a very helpful pattern in my experience. In most cases in my apps, that the protector is only a safe-guard for exceptions, and the app would function securely even without protectors. However, they add an extra layer of security and confidence.

2. The function returned by Kilpi.query is not pure, however the inner function is pure. For example, considering the upcoming Next.js "use cache" directive, you can add it to the inner function wrapped by Kilpi.query and it just works.

carterschonwald

I spent a few years iterating on how to design a dbms / query language/ database platform that has user definable authorization datatypes as first class data. Still wanna revisit it but the value is really only in settings where you already have complex stuff going on.

Early stage engineering tends to want something much more initially simple.

ghc

Looking at this, I wonder if there should be a a warning attached to building bespoke auth systems akin to "don't roll your own crypto."

This "Protected Query Pattern" looks like bespoke ABAC, dealing with only a tiny fraction of the concerns mature solutions like XACML deal with.