I Went to SQL Injection Court
158 comments
·February 25, 2025chaps
foota
Out of curiosity, could you ask for something like "one row of data from every table in the CANVAS database"?
maCDzP
Have you tried looking for information from the developer about CANVAS? With any luck the developer has support documentation online that describes CANVAS and maybe you'll be able to narrow down your FOIA request.
manquer
I think the point of the lawsuit is less about CANVAS schema itself and more about the ability of the government to hide this kind of information from FOIA requests.
hathawsh
Kudos to you for enduring through this fight! We can only achieve transparency when people choose not to be complacent. Thank you.
What do you think are the next steps?
chaps
My first step is to actually finish my post :)
But after that, getting a reasonable law passed to fix this now-broken nonsense.
notjulianjaynes
Damn, this is impressive. I've been fighting with a state agency since December for 17,000 emails. I don't think I've ever tried to request emails and received zero push-back, but a $33 million estimate just, chef's kiss
foota
> Normally, a flustered public records officer would just reject a giant request for being for “unduly burdensome”… but this sort of estimate is practically unheard of. So much so that other FOIA nerds have told me that this is the second biggest request they've ever seen. The passive aggression is thick. Needless to say, it's not something I'm willing to pay for!
Welcome to Seattle :-)
mmaunder
Thanks for fighting the good fight for us all!
hn_user82179
This older post was such a fantastic read, thanks for sharing your story!
layoric
It's dated from ~2 weeks ago... is there other date information I am missing?
hn_user82179
ah no, I just said "older" since OP said it was older and I wanted to distinguish from the SQL post that this post is about
doctorpangloss
What are the administrators of CANVAS hiding?
chaps
Hard to say. One of my personal drivers for this lawsuit is a tip I received that said that Chicago has a list of vendors whose tickets are dropped in the back-end. When I requested that info, the city said they had no such list. I trust my source, so having schema information could help figure out the extent and if they were lying.
noboostforyou
Considering how much they fought to not release the schema, there's probably a column named "exempt_from_penalty" or something equally obvious.
MBCook
Well that certainly sounds suspicious. But it could also provide more damning evidence of targeting groups, people skimming the till, bribes to make tickets go away, all sorts of fun shenanigans.
And boy they’re fighting suspiciously hard.
Good luck.
SkidanovAlex
While I believe that the city should share the schema, and that the city is effectively arguing for security through obscurity, I disagree with the main premise of the article: that knowing the SQL schema doesn't help the attacker.
If I understand the argument of the author here:
> Attackers like me use SQL injection attacks to recover SQL schemas. The schema is the product of an attack, not one of its predicates
The author appears to imply that once the vulnerability is found, the schema can be recovered anyway. That is not always the case. It is perfectly viable to find a SQL injection that allows fetching some data from the table being queried, but not from any other table, including `information_schema` or similar. If all the signal you get from the vulnerability is "query failed" or "query succeeded, here's the data", knowing the schema makes it much easier to exploit.
> the problem is that every computer system connected to the Internet is being attacked every minute of every day
If you specifically log failed DB queries, then you have already patched all the injections that such 24/7 attacks would find. The log is then not deafening until someone stumbles on an actual injection (one that, for example, only exists for logged-in users, and thus is not found by bots), in which case you have time to see it and patch it before the attacker finds a way to actually utilize it.
Knowing the schema both expedites their ability to take advantage of the vulnerability and increases their chances of probing the injection without triggering a query failure to begin with.
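A minimal sketch of the scenario being argued over, using Python's built-in `sqlite3` and an in-memory database. The `tickets` table, its columns, and the `lookup_*` helpers are hypothetical, invented for illustration; nothing here reflects CANVAS. It shows a query built by string concatenation, a log of failed queries, and how a wrong column guess lands in that log while a correct one does not. It also shows that a parameterized query treats the same input as plain data.

```python
import sqlite3

# Hypothetical table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, plate TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO tickets (plate, amount) VALUES (?, ?)",
    [("ABC123", 50), ("XYZ789", 75)],
)

failed_queries = []  # stands in for the "failed DB query" log


def lookup_unsafe(plate):
    """Deliberately vulnerable: user input is concatenated into the SQL text."""
    sql = "SELECT id, amount FROM tickets WHERE plate = '" + plate + "'"
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        failed_queries.append((sql, str(exc)))
        return None


def lookup_safe(plate):
    """Parameterized: the input is bound as a value and never parsed as SQL."""
    return conn.execute(
        "SELECT id, amount FROM tickets WHERE plate = ?", (plate,)
    ).fetchall()


# A guess at a column that does not exist fails and is logged.
wrong_guess = lookup_unsafe("x' OR fine > 0 --")
# The same probe with a real column name succeeds and leaves no log entry.
right_guess = lookup_unsafe("x' OR amount > 0 --")
# The parameterized version just looks for a plate with that odd literal value.
safe_result = lookup_safe("x' OR amount > 0 --")
```

Note that the flaw lives in how `lookup_unsafe` builds its query, not in the table definition, which is the distinction drawn in the replies below.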
Volundr
I'm not an attacker, just a boring old software dev. If there's an SQL Injection I'd say all bets are off re: schema.
That said I've definitely worked on applications where knowing the schema could help you exfiltrate data in the absence of a full injection. The most obvious being a query that's constructed based on URL parameters, where the parameters aren't whitelisted.
So I actually do agree that the schema could potentially be of marginal benefit to the attacker.
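A rough sketch of the whitelisting that comment refers to, assuming a hypothetical `tickets` table and a hypothetical `search` helper fed from URL parameters. Column names cannot be bound as SQL parameters, so the field name is checked against an allowlist before it is placed in the query text, while the value is still bound normally.

```python
import sqlite3

# Hypothetical table with one column the application never means to expose.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (id INTEGER PRIMARY KEY, plate TEXT, internal_note TEXT)"
)
conn.execute(
    "INSERT INTO tickets (plate, internal_note) VALUES ('ABC123', 'do not disclose')"
)

# Only these fields may be chosen through a URL parameter such as ?field=plate
ALLOWED_FILTER_FIELDS = {"id", "plate"}


def search(field, value):
    """Filter tickets by a caller-chosen field, rejecting unlisted fields."""
    if field not in ALLOWED_FILTER_FIELDS:
        raise ValueError(f"filtering on {field!r} is not allowed")
    # The field name is safe to interpolate only because it passed the allowlist.
    sql = f"SELECT id, plate FROM tickets WHERE {field} = ?"
    return conn.execute(sql, (value,)).fetchall()


allowed_result = search("plate", "ABC123")

try:
    search("internal_note", "do not disclose")
    blocked = False
except ValueError:
    blocked = True
```

Without the allowlist check, someone who knew the name `internal_note` could filter on it and learn its contents one guess at a time, with no injection involved.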
pockmarked19
Reminds me that the recently discovered “leak emails using YouTube” exploit kicked off from reading what is, essentially, a schema.
tptacek
If you specifically log failed database queries, where "failure" means "indicative of SQL injection", then nothing you can do with the schema is going to reduce the signal in that feed --- even a single SQL syntax error would be worth following up on. No, I don't think your logic holds.
kmoser
I don't understand your logic. Knowledge of the schema can give an attacker an edge because they now know the exact column names to probe. Whether these probes get logged is irrelevant; even if it makes the system more vulnerable for an instant, it's still more vulnerable.
Even if logging failed queries is your metric, then knowledge of column names would make it more likely for an attacker to craft correct queries, which would not get logged, thus making your logs less useful than if the attacker had to guess at column names and, in so doing, incur failed queries.
tptacek
To probe for what? How does knowledge of a column name make it easier for me to discern whether a SQL injection vulnerability exists? I've spent a lot of time in my career probing for SQL injection, and I can't remember an instance where my stimulus/response setup involved the table names.
SQL injection is a property of a SQL query, not of the schema itself. To have a meaningful chance of blind-one-shotting a query, getting a TRUE/FALSE answer about susceptibility without ever generating a SQL syntax error, I would need to see the queries themselves.
tptacek
Kurt posted this to troll me. Just know my audience here was, mostly, non-technical people involved in politics in my local Chicagoland municipality.
Permit me a PSA about local politics: engaging in national politics is bleak and dispiriting, like being a gnat bouncing off the glass plate window of a skyscraper. Local politics is, by contrast, extremely responsive. I've gotten things done --- including a law passed --- in my spare time and at practically no expense (drastically unlike national politics).
An amazing thing about local politics, at least in a lot of places, is that they revolve around message boards. The boards won't be in places you want to be (in particular: a lot of them are Facebook Groups) and you just have to suck it up. But if you enjoy participating in a community like HN, you can participate in politics, too, and message-board your way towards making things happen.
skissane
> Local politics is, by contrast, extremely responsive. I've gotten things done --- including a law passed
You live in a country where local governments have the power to make laws… in a lot of other countries they don’t - or, to be more precise, their lawmaking power is extremely limited.
Actually, even in the US, that’s often true too - only local governments with “home rule” can enact laws on any topic (provided it doesn’t contradict state or federal law), those without it can only enact laws on specific topics authorised by the state legislature. Some states grant home rule to all counties and municipalities, others none, others to some but not others (e.g. in Texas a municipality can give itself home rule powers, with approval of its voters, but only once it reaches a population of 5000).
bobthepanda
Even state legislators are, by their nature, pretty much locally driven given the relatively small size of their constituencies and thus the margin of victory.
Voters significantly underestimate their power even up to the House level; AOC’s first campaign was very scrappy and resulted in a bartender unseating the chair of the House Democratic Caucus and likely successor to Nancy Pelosi, and that was the first campaign in which anyone bothered to primary him.
copypasterepeat
Would you care to elaborate which law you helped to pass?
Also, can you link to some good resources for someone who wants to get off the sidelines and get more involved in Chicago politics, whether the resources are on FB or elsewhere? I've previously tried Googling for some but with very limited success.
Thanks.
tptacek
We're the first municipality in Illinois to draft and adopt an instance of ACLU's CCOPS model legislation, which requires board approval at a recorded public board meeting before any agency (most especially our police force) can adopt any form of surveillance technology, given a broad (ACLU-supplied) definition of "surveillance". Previous to that, our police force could acquire arbitrary surveillance products so long as they kept under a discretionary budget threshold; they used that latitude to acquire a pilot deployment of Flock ALPR cameras, and CCOPS was a response to that.
My real goal is zoning.
In Chicago itself, I have less clarity, but am optimistic that somewhere on Facebook is a message board where the staff at your alderman's office reads posts, and the most politically engaged people in your neighborhood argue with each other. That's your starting point (and maybe your ending point). Just go, listen, and chime in with high-effort comments. If you're used to clearing the bar for HN comments, you're way past the threshold of coding like a super-thoughtful person in local politics.
pchristensen
> My real goal is zoning.
Godspeed to you, sir! What is your goal wrt zoning?
hinkley
“Never doubt that a small group of thoughtful, committed citizens can change the world: indeed, it's the only thing that ever has.” - Margaret Mead
zahlman
>The boards won't be in places you want to be (in particular: a lot of them are Facebook Groups) and you just have to suck it up. But if you enjoy participating in a community like HN, you can participate in politics, too, and message-board your way towards making things happen.
How do you figure out where to go?
tptacek
The way you'd expect: I bumbled through a bunch of different Facebook Groups, starting with the one simply labeled for my neighborhood, and followed cross-posts. Eventually I found the two really important ones in my area (one is an organizing group for local progressives --- I live in a very blue muni, and the other is the main high-signal political group for the area, in which all the village electeds participate).
chaps
Aaaaaaa! I need to finish my post! :(
Y_Y
Is it not absurd that the supreme and appeal courts disagreed on a syntactical matter? Never mind that this isn't uncommon, or that (IMHO) it would be ridiculous to interpret it as "any file layouts at all, and other stuff too, but only bad other stuff". It's crazy to me that we're happy for laws to sit on the books being utterly ambiguous.
I know this suits the courts who benefit from the leeway, and that (despite valiant efforts) we're not going to get "formal formal" language into statutes. I know that the law is an ass. I know that the laws are written by fallible and naive humans.
Even after all that, if the basic sentence structure of what's in the law isn't clear to the courts, hasn't the whole system fallen at the first hurdle?
copypasterepeat
I am not a lawyer, but my understanding is that's just how the justice system works. Reasonable people can disagree about what exactly a complicated statement says, since language is full of ambiguities. People have been discussing what the U.S. Constitution says exactly from the day it was written and there are still a lot of disagreements.
The standard response to this is that laws should be written in ways that are non-ambiguous but that's easier said than done. Not to mention that sometimes the lawmakers can't fully agree themselves so they leave some statements intentionally ambiguous so that they can be interpreted by the courts.
skissane
I’ve often thought we’d get more sensible results in court cases on computer-related issues if we had specialised courts where the judges were required to have a relevant degree (computer science, software engineering, computer engineering, information systems, etc). But I doubt it is going to happen any time soon.
ptsneves
Civil code law uses that way of thinking, where there are specialised courts for different areas: administrative, civil, labor, family, commercial and so on. I actually am not so sure it is great as these courts increase the depths of the bureaucracy to the point of being self serving. They also serve to segment expertise.
tptacek
To me it feels like the kind of dispute that is exactly why we have multiple levels of appeals court. The "file format" thing is super dumb, and they got it wrong, but the "that if disclosed" statutory interpretation is a thing that seems important to get a final, consistent determination on.
Y_Y
Of course I can't disagree that it's good that it's now settled. Still I can't help but imagine a world where the meaning, at least in terms of which words apply to which others (rather than qualifiers like "reasonable"), should be settled before the law is debated, voted on, and passed.
Even (some) programmers have learnt the dangers of parsing at run time (e.g. "eval is evil"). How can we decide it's the law we want if we don't know what it means yet?
Terr_
> Each spreadsheet has a header row, labeling the columns, like “price” and “quantity” and “name”. A database schema is simply the names of all the tabs, and each of those header rows.
This is also how I explain it to my relatives, I'm kind of surprised this analogy (one so direct that it's almost literal) didn't fly with the judges.
If database column names cannot be revealed, then shouldn't that mean the state is also able to redact the headers of all their spreadsheets?
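To make the spreadsheet analogy concrete, here is a small sketch using Python's built-in `sqlite3`. The two tables are hypothetical and `describe_schema` is a helper invented for this example. It reduces a database to exactly what the quoted passage describes: the names of the "tabs" and each tab's "header row".

```python
import sqlite3

# Hypothetical database standing in for any system with a schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, plate TEXT, amount INTEGER)")
conn.execute("CREATE TABLE vendors (id INTEGER PRIMARY KEY, name TEXT)")


def describe_schema(connection):
    """Return {table name: [column names]}, i.e. tab names plus header rows."""
    tables = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = connection.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [column[1] for column in columns]
    return schema


schema = describe_schema(conn)
```

The result contains no rows of data at all, only labels, which is the point of the analogy.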
duxup
Very interesting read.
It does seem absurd to think of divulging schema as protected, as described it allows for a magical sort of outcome where: "well it's in a database you can't know anything about, and if you can't tell me how to find it you're sol".
Working at a small company with lots of clients I wouldn't want to hand out DB schema outright, but I also go out of my way to search / get the client the data they want ... not reject them.
rectang
A private company wouldn't want to divulge their DB schemas because it's advantageous for competitors to see how you're doing things. That doesn't apply to government databases.
chaps
Not quite, and the details get hairier the closer you look. The database in question here is an IBM system. The database itself is used for government functions, making it FOIA'able, despite it being managed by a third party company. IBM even tried to argue that the schema was trade secret, but the statute isn't straightforward. Here's my (successful) response when they tried:
You mentioned on Thursday over the phone that IBM is not too keen on having its database schema released, and, between IBM and Chicago, is seeking an exemption under 5 ILCS 140/7(1)(g) - an exemption that is only valid if the release of records would cause competitive harm. This email preemptively seeks to address that exemption within the context of this request in the hopes of a speedier release of records. It is FOI's belief that there is little room for the case for the valid use of 5 ILCS 140/7(1)(g) when considering the insignificance of the records in conjunction with the release of past documents:
1. Chicago released CANVAS's technical specification [1] seven years ago. To the extent that the specification's continued publication does not cause competitive harm, it is very unlikely that the release of CANVAS's database schema would cause any harm.
2. The claim that the release of a database schema would cause competitive harm is not unlike suggesting that the release of filing cabinets' labels can cause competitive harm.
Furthermore, in your response, please be mindful that the burden of proving competitive harm rests on the public body [2].
[1] https://www.cityofchicago.org/content/dam/city/depts/dps/Con...
[2] http://foia.ilattorneygeneral.net/pdf/opinions/2018/18-004.p...
bob1029
The schema on the last project I worked on was probably our most important IP. Specifically, the ways in which we solved certain circular dependency issues.
I wouldn't take the ability to design a schema for granted. I don't think many people are any good at it. Do not underestimate the value of your work products.
hinkley
Part of the reason I’m so… enthusiastic… about tech debt is that I’ve worked a few times where we had a competitor whose lunch we were stealing or who was stealing ours and the ability or inability to copy features cheaply was substantially the difference between us.
That quad graph of value versus difficulty that everyone loves? It’s not quadrants it’s a gradient and the difficulty dimension depends quite a bit on context. What’s a 4 difficulty for me might be a 6 for someone else. Accidental versus intrinsic complexity plus similarity to or distinctions from things we have already done.
bornfreddy
Maybe. But now I'm really curious how bad that schema must be for them to hide it so viciously.
jrochkind1
I think it's just an excuse to avoid making it feasible for the public to get the data.
duxup
Your imagination can't cover how bad you might think it is (and yet it isn't that bad).
Or at least I don't want to explain to "20 years later Monday Morning Quarterback".
hot_gril
Maybe their schema has triggers and stuff
michaelmrose
It used to be that relevant data was in a document, but much is now stored in specialized web apps whose data in turn is stored in a DB.
EMIRELADERO
Am I the only one slightly perplexed/worried by the point-blank source code exemption?
It's easy to imagine a scenario where the city decides to develop a specific software in-house and hide the "biases" in the source code, or any other thing one might not find desirable.
Hell, they don't even need to make everything from scratch! Could just patch and use a permissively licensed 3rd-party component.
In my opinion, the proposed amendment does not go far enough.
manquer
It shouldn't be surprising?
It is the same problem people experience when trying to open source closed projects: there is all sorts of locked-in proprietary code which the developer and the customer only have the license to use, but not to share the source of.
Even projects that are staunchly open from day one, and built without the direct commercial payoff that government contractors have, suffer from this. The Linux kernel's challenges in supporting ZFS or binary blob drivers in kernel/user space are well known[1]
Paradoxically, on one hand information wants to be free, and economics dictate that open source software will crowd out closed competitors over time; on the other, it is expensive to open source a project, sometimes prohibitively so. That deters many managers and companies from open sourcing their older tools, even if they would like to: involving legal and trying to find the rights holder for each component is enough to deter most managers.
If a government put requirements in contracts that the vendor should only use open source components in their entire dependency tree, it could drive the costs very high, because a lot of those dependencies may not have open source equivalents, or the equivalents lack features of the closed ones and would need budgets to flesh them out. In the short term, no legislature will accept that kind of additional expense, even though in the long term the public will benefit.
---
[1] Yes, the kernel's problems are largely a function of the nature of the GPL; more permissive licenses like Apache 2 / MIT would not face them. The BSD variants, after all, had no challenges in supporting ZFS.
However a principled stance on public applications being open source by government would be closer to GPL than MIT in terms of licensing. Otherwise a vendor can just import the actual important parts as binary blobs "vendored" code and have some meaningless scaffolding in the open source component to comply.
dotdi
That's why it's important to push for "public money - open source" initiatives like some countries in the EU are trying to implement.
Off the top of my head, I think the last (now failed) German coalition had this in their programme but didn't deliver. Maybe the new government will.
jaxgeller
I FOIA'ed >1M pages of docs for my project cleartap.com, a DB of water quality of the USA.
Most states would charge a small amount to gather the documents.
Michigan wanted $50K for the FOIA request. I think because of the Flint lead crisis. They wanted me to go away.
davethedevguy
I noticed that you do have data for Flint. Did you have to pay it, or is there some appeals process if you're quoted an unreasonable amount?
Great project by the way!
ajkjk
This was fine, legally, but I'd be pretty irritated if someone I knew wasted everyone's time on this. The schema clearly is (marginally) useful for hacking, but who cares; it clearly is a file layout also, but who cares; those matter legally but not morally. Morally, this is just dumb: it's not something they really needed, and they're just irritating people and wasting resources for the fun of it. Shameful.
probably_wrong
Random thought: someone should drive to Chicago, get a parking ticket, and then make a FOIA request for all of their information contained in that database.
It won't be the whole database schema, but it would be a start.
chaps
Short answer -- already been done.
This (spoiler) visualization's going into my eventual post about the lawsuit: https://observablehq.com/d/026992341cc47ff0
pavon
Great read. Frustrating that the court ruled that a schema was a file layout, since I don't think it is, but at the same time, if it didn't fall under that exception, there is a strong argument that it would be considered "documentation pertaining to all logical ... design of computerized systems". A schema is, literally, the logical design of the database, and the database is a part of the computerized system. Once it was ruled that those examples are "per se" exempt, it was a long shot to argue that a schema wasn't covered by any of the examples.
gregw2
I completely agree that (unlike/despite the Supreme Court ruling), database schema design (and other system designs) should fall under the Illinois statute as "documentation pertaining to all logical and physical design of computerized systems".
I'm not sure why that wasn't argued by the state and they argued it was a file format.
I disagree with you slightly however and would say that the schema table/column names should be considered "physical design" while the business naming/meaning of tables would be a "logical design" (or conceptual design), see Wikipedia: https://en.wikipedia.org/wiki/Logical_schema
SQL injection is really about physical schema designs, not logical ones (I do get that every bit of information including business naming of tables/columns helps in an attack, but it does change the degree of threat and thus the balancing tests of the risk which are relevant per the definitions and case law described in the original article.)
So in terms of what the law /SHOULD/ be, the law should not include logical design as a security exception, only physical design. It /SHOULD/ be possible for citizens to do FOIA requests and get a logical understanding of all the database fields without giving them the SQL names that can be handy for SQL injection. In that way citizens could ask for the data by a logical/business-named handle rather than a physical one.
And the state should create logical models or provide data dictionaries with business (not technical terms) on request as part of their FOIAable obligations to their citizens for the data they are maintaining.
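One way to picture the logical/physical split proposed here is a data dictionary that is published with business names only, with the mapping to physical names kept internal. The sketch below is purely illustrative: every field name, table name, and helper in it is made up, and it is not a description of how any agency actually handles requests.

```python
# Hypothetical data dictionary: business-facing name -> physical (table, column).
# The physical names on the right are invented for this example.
DATA_DICTIONARY = {
    "Ticket fine amount": ("tkt_mstr", "amt_due"),
    "Ticket issue date": ("tkt_mstr", "iss_dt"),
    "Vehicle plate number": ("veh", "plt_no"),
}


def public_field_names():
    """What a requester would see: logical names only, no SQL identifiers."""
    return sorted(DATA_DICTIONARY)


def resolve(requested_fields):
    """What the records officer would do internally: map logical names to columns."""
    unknown = [name for name in requested_fields if name not in DATA_DICTIONARY]
    if unknown:
        raise KeyError(f"not in the data dictionary: {unknown}")
    return [DATA_DICTIONARY[name] for name in requested_fields]


published = public_field_names()
internal = resolve(["Ticket fine amount", "Vehicle plate number"])
```

Under this arrangement a request can name "Ticket fine amount" without the requester ever learning the physical identifier behind it.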
hot_gril
Schema is definitely software, an operating protocol, source code, and file layout. Maybe also documentation.
pavon
I think a schema will definitely be part of the source listing, either in the main programming language source code or in some other file used to define or initialize the database. But I don't think it is software, any more than a protocol is software. Software does something.
One tricky aspect of this is that even if the schema itself as a higher level concept doesn't fit into any of those definitions, all existing instances of the schema are likely considered either source listings or documentation. So the instances are barred from release per se, and you can't ask the government to create new documents.
tptacek
A schema isn't software in the sense imagined by the ILGA. If it was, every Excel spreadsheet would be too, and Excel spreadsheets are the basic currency of FOIA.
An "operating protocol" is a step-by-step list of things to accomplish some action. It's a finite state machine for humans. Obviously, a schema isn't that; a schema is declarative, and an operating protocol is imperative.
The court definitively established that SQL schemas aren't source code in the sense imagined by the ILGA. SQL queries can be. Schemas are not.
See downthread for why a schema isn't a file format. In fact, a schema is almost the opposite of a file format.
A court will look at the term "documentation" in the ordinary sense of the word; as in, "a prose description and set of instructions".
"Associated with automated data processing operations" isn't an element in the statute; it's a description of all of the elements.
hot_gril
If the Excel spreadsheet has formulas in it, it's software. If you're just talking about the data in the sheet, i.e. what you'd get exporting it as a CSV, then it's not.
Col types, unique/FK/PK constraints, default values, and computed cols define the steps for handling row inserts/updates/deletes. For example, insert row -> check that customer ID unique -> reject if not. Even adding a uniqueness constraint to an already-unique col will change how the code interacts with it, specifically how it deals with concurrency/locking. If they said it has to be an imperative programming language, then it's not that.
If they said the schema isn't source code then ok, but I still think it is.
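The "insert row -> check that customer ID unique -> reject if not" behavior can be seen in a few lines with Python's built-in `sqlite3`. The `customers` table is hypothetical. The sketch only demonstrates that a declarative constraint and a default value change what happens on insert; whether that makes a schema "software" in the statute's sense is the legal question being debated, and the code does not settle it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema declares a uniqueness constraint and a default value; no procedure is written.
conn.execute(
    "CREATE TABLE customers ("
    " customer_id INTEGER NOT NULL UNIQUE,"
    " name TEXT DEFAULT 'unknown')"
)

# First insert succeeds, and the default fills in the missing name.
conn.execute("INSERT INTO customers (customer_id) VALUES (1)")

# Second insert reuses the ID, so the engine rejects it on the schema's behalf.
try:
    conn.execute("INSERT INTO customers (customer_id, name) VALUES (1, 'duplicate')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT customer_id, name FROM customers").fetchall()
```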
n_plus_1_acc
An Excel formula should be considered a kind of software, because you can do code golf in it.
paulddraper
How is a database schema not a file layout?
kasey_junk
The article describes why. 2 different db engines (or even instances) can use different file layouts for the same schema.
In many ways SQL is all about divorcing the schema from the files.
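A small illustration of that separation, using Python's built-in `sqlite3`. This is a sketch with one engine and two page-size settings rather than two different engines, and the `tickets` table and `build` helper are hypothetical. Both database files are created from the identical schema text, yet the files on disk are laid out differently.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema, identical for both databases.
SCHEMA = "CREATE TABLE tickets (id INTEGER PRIMARY KEY, plate TEXT, amount INTEGER)"


def build(path, page_size):
    """Create a database from SCHEMA using the given on-disk page size."""
    conn = sqlite3.connect(path)
    # page_size is a storage setting; it must be set before the file is first written.
    conn.execute(f"PRAGMA page_size = {page_size}")
    conn.execute(SCHEMA)
    conn.commit()
    stored_schema = conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'tickets'"
    ).fetchone()[0]
    conn.close()
    return stored_schema, os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    schema_a, size_a = build(os.path.join(tmp, "a.db"), 512)
    schema_b, size_b = build(os.path.join(tmp, "b.db"), 8192)

same_schema = schema_a == schema_b
different_files = size_a != size_b
```

The schema strings match while the file sizes do not, so the schema alone does not tell you how the bytes are arranged on disk.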
ludston
But on the other hand, in all database systems the schema is used to determine how the files are laid out. Although I suppose the same thing could be argued for any data that is stored in a file, excepting that a schema is metadata that determines the organisation of data so it's a bit of a special case.
hot_gril
There's a solid chance that the schema gives away what DBMS is being used. But even if it didn't, I'd still call it a file layout in this context.
tptacek
Another way to think about it is that if a SQL schema is a file, so is an Excel spreadsheet template.
hyperpape
It literally does not describe a file, and does not literally describe the data layout of anything on disk (though with enough knowledge, you may be able to infer facts about probable layouts).
paulddraper
> does not literally describe the data layout of anything on disk
Huh? Depends on the DBMS, but each InnoDB table is a file.
And the schema determines the file structure.
dools
The schema describes the database layout. The file layout (if you were going to call it that) in a modern RDBMS would describe how the RDBMS implemented a particular database layout as described by the schema.
michaelmrose
Because it doesn't describe how data is laid out on disk.
hot_gril
Neither does a file layout. FS will decide that... even then, not physically.
Hi everyone, I'm the plaintiff in this lawsuit. I'm still working on my companion post for tptacek's post! I'll have it ready Soon TM, but feel free to ask me any questions in the meantime here.
While you're waiting, check out this older post: https://mchap.io/that-time-the-city-of-seattle-accidentally-...