Unexpected security footguns in Go's parsers
50 comments
· June 18, 2025
octo888
liampulles
Yeah, it's a bad pattern. And we can expect to see it in a sizable number of production codebases.
diggan
ORMs love to put people and projects in a situation where it's much easier for the data types in your program to match exactly with the tables in your database, but reality often requires them to be decoupled instead. That the "create user" request contains "isAdmin" smells like someone mapped that exact request to a table, and just because the table has "isAdmin", the data type now also has it.
ajross
Exactly. This isn't a flaw in the runtime or the metaphor, the bug here is that the app exposes private data in a public API. That's a mistake that doesn't care about implementation choice. You can make the same blunder with a RESTful interface or a CGI script.
At most, you can argue that simple serialization libraries (Go's is indeed one of the best) make it more tempting to "just send the data" in such a design, so if you squint really (really) hard, you can call this a "footgun" I guess.
But the rest of the headline is 100% nonsense. This is not about "Go" or "parsers". At all.
tln
What? It's about parsers in Go's standard library.
ajross
A "parser" is the piece of software that translates unstructured input (usually text) into structured output that better reflects the runtime of a programming language. A "security bug" in a parser is normally construed to be the existence of an input that causes the software to do something incorrect/undocumented/unexpected.
Nothing in the article discusses a parser or anything like a parser bug.
The article doesn't like that the semantics of the user-facing API wrapped around the parser is, I guess, "easy to make mistakes with". That's an article about API design, at most. But that's boring and specious and doesn't grab clicks, so they want you to think that Go's parsers are insecure instead.
delusional
> No need to have just one model that maps 1:1 to your DB row. This applies to all languages
One reason is to avoid copying data constantly. I don't just mean this from an efficiency perspective, but also (and maybe more so) from a simplicity one. If you have a library for shoving data into a struct mechanistically, but you then take the data from that struct and shove it into an additional struct, what's the point of the library? You're writing the code to move the data anyway.
In my dayjob I see this tendency constantly to have a lot of different very narrow structs that somehow integrate into some library, and then a TON of supporting code to copy between those structs. Only to then do very little actually useful work with any of the data at the end. I generally think you'd be happier with fatter structs that integrate less with weird "struct-filling" libraries.
eadmund
> In my dayjob I see this tendency constantly to have a lot of different very narrow structs that somehow integrate into some library, and then a TON of supporting code to copy between those structs. Only to then do very little actually useful work with any of the data at the end.
Don’t think of it as doing a little useful work at the end; think of it as doing all the useful work in the centre. Your core logic should be as close to a pure implementation without external libraries as possible (ideally zero, but that is often not easily achievable), calling out to external libraries and services to get its work done as appropriate. That does mean a fair amount of data copying, but IMHO it’s worth it. Testing copies is easy and localised, whereas understanding the implications of a JSON (or Protobuf, or Django, or whatever) object carried deep into one’s core logic and passed into other services and libraries is very, very difficult.
There’s a common theme with the ORM trap here. The cost of a little bit of magic is often higher than the cost of a little bit of copying.
kgeist
1. As shown in the article, exposing the internal model to APIs directly is a security footgun. I've seen sites get hacked because the developer serialized internal objects without any oversight just like in the article and accidentally exposed secrets.
2. Exposing internal models to APIs directly also makes it hard to refactor code, because refactoring would change the APIs, which would require updating the clients (especially problematic when the clients are owned by other teams). I've seen this firsthand too, in a large legacy project: people were afraid to refactor the code because whenever they tried, it broke the clients downstream. So instead of refactoring, they just added various complex hacks to avoid touching the old core code (and of course, their models also mapped directly to the UI).
In the end, codebases like that, with no layer separation, become really hard to maintain and full of security problems.
All because they thought it was "simpler" to skip writing ~10 lines of extra boilerplate per model to map models to DTOs.
Lack of layer separation becomes a problem in the long term. When you're just starting out, it may seem like overengineering, but it isn't.
delusional
> Lack of layer separation becomes a problem in the long term. When you're just starting out, it may seem like overengineering, but it isn't
I actually agree, but you're setting up a false dichotomy. I do believe in strong layering at the interfaces, for exactly the reasons you line up. What I don't believe in is what I might call "struct annotation based parsing" at those interfaces.
Typically, you don't want to pass DTOs around your code. Usually, you take in that struct, and then immediately have some code to poke it into the appropriate places in your actual data structures. It's very often much easier to simply take a well-structured but more direct and generic interpretation of the input data, and write the code to poke it into the correct places directly.
It is not that you should define your inputs separately from your internal data storage. It's that the specification of your input structure shouldn't exist as a struct, it should exist as the consequence of your parsing code.
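Sketching that in Go (the `Account` type and the field names here are invented for illustration):

    // Decode generically, then let explicit code define what the
    // input may contain; the input "spec" is this code, not a
    // tagged struct.
    func decodeAccount(body []byte, account *Account) error {
        var raw map[string]any
        if err := json.Unmarshal(body, &raw); err != nil {
            return err
        }
        if v, ok := raw["username"].(string); ok {
            account.Username = v
        }
        if v, ok := raw["firstName"].(string); ok {
            account.FirstName = v
        }
        // anything else in the payload ("isAdmin", unknown keys) is ignored
        return nil
    }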
> When you're just starting out, it may seem like overengineering, but it isn't
It's a real shame that the internet has driven us to assume everybody is a novice.
masklinn
> In my dayjob I see this tendency constantly to have a lot of different very narrow structs that somehow integrate into some library, and then a TON of supporting code to copy between those structs.
Maybe that's the problem to solve, rather than exposing the entire internal world to the outside? Because different views of the same entities are pretty critical; otherwise it's way too easy to start e.g. returning PII to public endpoints because some internal process needed it.
delusional
> exposing the entire internal world to the outside
That's not at all what I said.
You don't need a struct to avoid exposing internal data. If you're building a JSON object, you can just not write the code to format some fields out. You don't need a new data layout for that.
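For example (a sketch; `u` is some internal user value):

    // Emit only what this endpoint should expose; no line writes
    // u.PasswordHash or u.IsAdmin, so they cannot leak.
    out, err := json.Marshal(map[string]any{
        "id":    u.ID,
        "login": u.Login,
    })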
homebrewer
I don't know about Go, but in the Java and .NET world this is trivially solvable with libraries like MapStruct. If you have a model with 20 fields and need to create a tiny slice of it (with let's say three fields), you need a few lines of boilerplate: create a record with those fields (1-3 LOC):

    record Profile(int id, String login, boolean isAdmin) {}

create a mapper for it:

    @Mapper
    interface UserMapper {
        // arrays are just one example, plain models and plenty
        // of other data structures are supported
        Profile[] usersToProfiles(User[] users);

        // other mappers...
    }

and then use it:

    class UserController {
        // ...
        @GET("/profiles")
        Profile[] getUserProfiles() {
            var users = userRepo.getUsers();
            return userMapper.usersToProfiles(users);
        }
    }

As long as fields' names match, everything will be handled for you. Adding another "view" of your users requires creating that "view" (as a record or as a plain class) and adding just one line to the mapper interface, even if that class contains all of User's fields but one. So there's no need to write and maintain 19+ lines of copying data around. It also handles nested/recursive entities, nulls, etc. It's also using codegen, not reflection, so performance is exactly the same as if you had written it by hand, and the code is easy to read.
Go developers usually "don't need these complications", so this is just another self-inflicted problem. Or maybe it's solved, look around.
jerf
Actually it's even easier in Go with struct embedding:
    type SmallerThing struct {
        Id      int
        Login   string
        IsAdmin bool
    }

    type UserController struct {
        SmallerThing
        OtherField  Whatever
        OtherField2 SomethingElse
    }

In principle this could break down if you need super, super complicated non-overlapping mappings; in practice I have yet to need that.

kgeist
> If you have a model with 20 fields and need to create a tiny slice of it (with let's say three fields), you need a few lines of boilerplate: create a record with those fields (1-3 LOC):

> create a mapper for it:

> ...

> Go developers usually "don't need these complications", so this is just another self-inflicted problem.

In Go:

    type DTO struct {
        A, B, C string
    }

Somewhere in your API layer:

    // copy the fields to the DTO
    return DTO{A: o.A, B: o.B, C: o.C}

I fail to see where the "self-inflicted problem" is, and why it requires a whole library (which seems to require around the same number of lines of code at the end of the day, if you count the imports and the additional mapper interface).

TeMPOraL
> also (and maybe more so) from a simplicity one. If you have a library for shoving data into a struct mechanistically, but you then take the data from that struct and shove it into an additional struct, what's the point of the library? You're writing the code to move the data anyway.
Super annoying if you need to do it by hand, and it wastes compute and memory if you actually need to make copies of copies, but this is the mapping part of "object relational mapping", the M in ORM. Skipping it is a bad idea.
Your business/domain model should not be tied directly to your persistence model. It's a common mistake that's responsible for like half of the bad rep ORMs get. Data structures may look superficially similar, but they represent different concepts with different semantics and expectations. If you skip on that, you'll end up with tons of stupid mistakes like 'masklinn mentions, and more subtle bugs when the concepts being squashed together start pulling in opposite directions over time.
fainpul
I didn't see this mentioned in the article: wouldn't the obvious way be to make the private field private (lowercase)?
(I'm not a Go developer, just tried the language casually).
    type User struct {
        Username string `json:"username_json_key,omitempty"`
        Password string `json:"password"`
        isAdmin  bool
    }
https://go.dev/play/p/1m-6hO93Xce

arp242
That works, but then you also can't access isAdmin from another package.
never_inline
You can annotate with `json:"-"`, which is the equivalent of @JsonIgnore.
zimpenfish
Covered in the article, along with a potential snafu: adding anything else (`-,omitempty`) turns it into a key "-", not the "ignore this field" indicator.
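A self-contained illustration of the difference:

    package main

    import (
        "encoding/json"
        "fmt"
    )

    type T struct {
        A string `json:"-"`           // always omitted
        B string `json:"-,omitempty"` // key is literally "-"
    }

    func main() {
        out, _ := json.Marshal(T{A: "secret", B: "oops"})
        fmt.Println(string(out)) // {"-":"oops"}
    }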
em-bee
i am not aware of any parser that does that differently, but i would also argue that this is not the job of parsers. after parsing (or before exporting) there should be a data validation step based on whitelists.
so the user can send in unknown fields all they want, the code will only accept the username and firstname strings, and ignore the other ones.
same with fetching data and sending it to the user. i fetch only the fields i want and create the correct datastructures before invoking the marshaling step.
there are no footguns. if you expect your parser to protect you, you are using it wrong. they were not designed for that.
input -> parse -> extract the fields we want, which are valid -> create a data-structure with those fields.
data -> get fields i want -> create datastructures with only wanted fields -> write to output format
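roughly, in go (field names are just examples):

    // input -> parse -> extract only whitelisted fields
    var raw map[string]any
    if err := json.Unmarshal(input, &raw); err != nil {
        return err
    }
    username, _ := raw["username"].(string)
    firstname, _ := raw["firstname"].(string)
    // everything else the user sent is ignored

    // data -> wanted fields only -> output format
    out, err := json.Marshal(map[string]string{
        "username":  username,
        "firstname": firstname,
    })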
securesaml
This is correct. In the blog post they say:

> Other examples exist, but most follow the same pattern: the component that does security checks and the component that performs the actions differ in their view of the input data.

This would be solved (as you described) by ensuring that the downstream layer uses only contents that were verified in the security-check layer.

If they are using a microservice, then: security-check API -> return verified data (i.e. re-serialize the verified JSON or XML into byte form, NOT the original input) -> processing layer, i.e. the userCreate API, uses the verified data.
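In Go terms, the flow is roughly this (the type and function names are invented):

    // Security-check layer: parse and verify once.
    var req CreateUserRequest
    if err := json.Unmarshal(rawInput, &req); err != nil {
        return nil, err
    }
    if err := verifyRequest(req); err != nil {
        return nil, err
    }
    // Forward a re-serialization of the verified struct; the original
    // bytes (duplicate keys, smuggled fields, trailing data) never
    // reach the processing layer.
    return json.Marshal(req)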
This is the method we used in fixing the ruby-saml example.
See: https://bsky.app/profile/filippo.abyssdomain.expert/post/3le...
aintnolove
I know these problems are easily avoidable... but I'm finally starting to see the appeal of protocol buffers.
Just to have the assurance that, regardless of programming language, you're guaranteed a consistent ser/de experience.
chubot
It would be nice if it were true :-)
But it’s not, for reasons that have more to do with the languages themselves, than parsing
e.g. C++ numbers are different than Java numbers are different than Python numbers are different than JavaScript numbers
ditto for strings
liampulles
I think you can get similar benefits here from writing an RPC style JSON API into an OpenAPI spec and generating structs and route handlers from that. That's what I do for most of my Go projects anyway.
glenjamin
It’s worth noting that DisallowUnknownFields makes it much harder to handle forward/backward-compatible API changes - which is a very common and usually desirable pattern.
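For reference, it's an opt-in on the decoder (`CreateUserRequest` is a placeholder type):

    dec := json.NewDecoder(bytes.NewReader(body))
    dec.DisallowUnknownFields()
    var req CreateUserRequest
    // Any field in the payload that CreateUserRequest doesn't declare
    // now fails with an error like: json: unknown field "isAdmin".
    // Strict, but it also rejects clients that send newer fields.
    err := dec.Decode(&req)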
asimops
In the case of Attack scenario 2, I do not get why, in a secure design, you would ever forward the client-originated data to the auth service. This is more of a broken best practice than a footgun to me.

The logic should be "Parse, don't validate"[0], and after that you work on the parsed data.
[0]: https://hn.algolia.com/?q=https%3A%2F%2Flexi-lambda.github.i...
securesaml
See: https://bsky.app/profile/filippo.abyssdomain.expert/post/3le... that was about a signature wrapping attack in crypto, but it also applies here.
anitil
This was all very interesting, but that polyglot JSON/YAML/XML payload was a big surprise to me! I had no idea that Go's default XML parser would accept preceding and trailing garbage. I'd always thought of JSON as one of the simpler formats to parse, but I suppose the real world would beg to differ.
It's interesting that decisions made about seemingly-innocuous conditions like 'what if there are duplicate keys' have a long tail of consequences
jerf
There's a lot of good information in here, but if you think this is the time to go all language-supremacist about how much better your language is and how this proves Go sucks, you might want to double-check the CVE database real quick first. A lot of these issues are more attributable to plain ol' user error and the fundamental messiness of JSON and XML than to Go qua Go, and I've seen many similar issues everywhere.
For instance, there simply isn't a "correct" way for a parser to handle duplicate keys. Because the problem is different layers seeing them differently, you can have the problem anywhere duplicate keys are treated differently, and it's not like Go is the only thing to implement "last wins". It doesn't matter what you do. Last wins? Varies from the many "first wins" implementations. First wins? Varies from the many "last wins" implementations. Nondeterministically choose? Now you conflict with everyone, even yourself, sometimes. Crash or throw an exception or otherwise fail? Now you've got a potential DoS. There's no way for a parser to win here, in any language. The code using the parser has some decisions to make.
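To make "last wins" concrete, a small self-contained example (the `isAdmin` field is just for illustration):

    package main

    import (
        "encoding/json"
        "fmt"
    )

    func main() {
        var v struct {
            IsAdmin bool `json:"isAdmin"`
        }
        // encoding/json keeps the last occurrence; a front-end that
        // honours the first occurrence would see a different value.
        _ = json.Unmarshal([]byte(`{"isAdmin": false, "isAdmin": true}`), &v)
        fmt.Println(v.IsAdmin) // true
    }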
Another example, the streaming JSON decoder "accepts" trailing garbage data because by the act of using the streaming JSON decoder you have indicated a willingness to potentially decode more JSON data. You can use this to handle newline-separated JSON, or other interesting JSON protocols where you're not committing to the string being just one JSON value and absolutely nothing else. It's not "an issue they're not planning on fixing", it's a feature with an absolutely unavoidable side effect in the context of streams. The JSON parser stops reading the stream at the end of the complete JSON object, by design, and anything else would be wrong because it would be consuming a value to "check" whether the next thing is JSON or not, when you may not even be "claiming" that the "next thing" is JSON, and whatever input it consumed to verify a claim that nobody is even making would itself be a bug.
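You can see both the behaviour and the escape hatch in a few lines; if your protocol really is "exactly one JSON value", that check is yours to write:

    package main

    import (
        "encoding/json"
        "fmt"
        "io"
        "strings"
    )

    func main() {
        dec := json.NewDecoder(strings.NewReader(`{"a": 1} trailing junk`))
        var v map[string]any
        fmt.Println(dec.Decode(&v)) // <nil>: the first value parsed fine
        // Enforce "nothing else follows" explicitly:
        if _, err := dec.Token(); err != io.EOF {
            fmt.Println("trailing data after JSON value")
        }
    }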
Accepting user input into sensitive variables is a common mistake I've seen multiple times in a number of languages. The root problem there is more the tension between convenience and security than the languages themselves; any language can make it so convenient to load data that developers accidentally load more data than they realize.
Etc. The best lesson to take away from this is that there is more than meets the eye with JSON and XML, and they're harder to use safely than their ease of use suggests.
Although, in the interests of fairness, I also consider case insensitivity in the JSON field names to be a mistake; maybe it should be an option, JSON can get messy in the real world, but it's a bad default. I have other quibbles, but most of them are things where there isn't a correct answer and you can't unambiguously say that some choice is wrong. JSON is really quite fundamentally messier than people realize, and XML, while generally more tightly specified at the grammar level than JSON, is generally quite messy in the protocols people build on top of it.
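The case-insensitive matching is easy to demonstrate:

    package main

    import (
        "encoding/json"
        "fmt"
    )

    func main() {
        var v struct {
            IsAdmin bool `json:"isadmin"`
        }
        // encoding/json falls back to case-insensitive field matching,
        // so a differently-cased key still lands in the field -- and can
        // slip past a case-sensitive check at another layer.
        _ = json.Unmarshal([]byte(`{"ISADMIN": true}`), &v)
        fmt.Println(v.IsAdmin) // true
    }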
e_y_
As someone who isn't a Go programmer, on the face of it, using strings (struct tags) for field metadata seems pretty backwards compared to Rust macros (which parse the metadata at compile time) or Java annotations (which are processed at runtime but at least don't require parsing a string to break apart options).
The accidental omitempty and - are a good example of the weirdness even if they might not cause problems in practice.
liampulles
As someone who is a regular Go programmer: yes, struct tags do suck. It gets even worse if you want to try and combine multiple "annotations" into one struct tag string.

The reason it's like that is that Go is philosophically very much against the idea of annotations and macros, and very strongly for the idea of a clear, upfront control flow, and this is one of the reasons I love the language. But it does come at the cost of a few highly useful use cases for annotations (like mapping JSON and XML, etc.) becoming obtuse to use.
The idea of more compile-time macros in Go is interesting to me, but at the same time the ease of debugging and understanding the Go control flow in my programs is one of the reasons I love it so much, and I would not want to invite the possibility of "magic" web frameworks that would inevitably result from more metaprogramming ability in Go. So I guess I'm prepared to live with this consequence. :/
kjksf
For some it's stupidity. For others it's brilliance.
It's one of many examples of 80/20 design in Go: 80% of functionality with 20% of complexity and cost.
Struct tags address an important scenario in an easy to use way.
But they don't try to address other scenarios, like annotations do. They are not function tags. They're not variable tags. They are not general purpose annotations. They are annotations for struct fields and struct fields only.
Are they as powerful as annotations or macros? Of course not, not even close.
Are they as complex to implement, understand, use? Also not.
80/20 design. 80% of functionality at 20% of cost.
Philpax
Go's simplifications often introduce complexities elsewhere, however, as this article demonstrates with the difficulty of getting a stringly-typed DSL correct.
There's no free lunch here, and the compromises Go makes to achieve its outcomes have shown themselves to be error-prone in ways that were entirely predictable at design time.
xnorswap
As a .net programmer, the "stringly typed" nature of the metadata horrifies me, but the choices of Go have long confused me.
So in .NET, like Java as you mention, we have attributes, e.g.

    [JsonPropertyName("username")]
    [JsonIgnore]

etc. This is simple and obvious. The JsonPropertyName attribute is an override; you can set naming policies for the whole class: camelCase by default, with kebab-case, snake_case, etc. as alternative defaults.
C#/.NET of course has the benefit of having public properties, which are serialised by default, and private properties, which aren't, so you're unlikely to be exposing things you don't want to expose.
This contrasts to Go's approach, much like python, of using casing convention to determine private vs public fields. ( Please correct me if I'm wrong on this? )
The first example still confuses me though, because either you want IsAdmin to come from the user, in which case you still want to deserialise it, or you don't, in which case it shouldn't even be in your DTO at all.
Deserialisation there is a bit of a red-herring, as there should be a validation step which includes, "Does this user have the rights to create an admin?".
The idea of having a user class, which gets directly updated using properties straight from deserialized user input, feels weird to me, but I'd probably be dismissed as an "enterprise programmer" who wants to put layers between everything.
masklinn
> This contrasts to Go's approach, much like python, of using casing convention to determine private vs public fields. ( Please correct me if I'm wrong on this? )
Go actually ties visibility to casing, instead of using separate annotations. And it will not serialise private fields, only public.
Python has no concept of visibility at all, conventionally you should not access attributes prefixed with `_` but it won't stop you.
hmry
> This contrasts to Go's approach, much like python, of using casing convention to determine private vs public fields. ( Please correct me if I'm wrong on this? )
I think calling it a convention is misleading.
In Python, you can access an _field just by writing obj._field. It's not enforced, only a note to the user that they shouldn't do that.
But in Go, obj.field is a compiler error. Fields that start with a lowercase letter really are private, and this is enforced.
So I think it's better to think of it as true private fields, just with a... unique syntax.
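A tiny example:

    package users

    type User struct {
        Name    string // exported: visible to other packages, serialised by encoding/json
        isAdmin bool   // unexported: u.isAdmin outside this package is a
                       // compile error, and encoding/json skips it
    }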
grey-area
Not just weird, it’s dangerous to do this - too easy to miss validation as fields are added over time. Better to explicitly validate all fields IMO.
grey-area
Yes, they are a horrible idea for many reasons, not just security. It’s like a hidden, ill-defined, poorly understood DSL in strings.
You can just not use them though - you can unmarshal to a map instead, select the keys you want, perform validation, etc., and then set the values.

Same when publishing - I prefer to have an explicit view which defines the keys exposed, rather than publishing everything by default based on these poorly understood string keys attached to types.
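E.g. a sketch of such a view (`User` stands in for whatever internal type you have):

    // The only keys ever published are the ones written here,
    // no matter what fields User grows later.
    type userView struct {
        ID    int    `json:"id"`
        Login string `json:"login"`
    }

    func renderUser(u User) ([]byte, error) {
        return json.Marshal(userView{ID: u.ID, Login: u.Login})
    }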
reactordev
struct tags greatly reduce the boilerplate code required to map fields to fields. It’s really quite novel once you understand it.
masklinn
> struct tags greatly reduce the boilerplate code required to map fields to fields.
Are you somehow under the impression that Go is unique in having a terse way to map fields to fields?
> It’s really quite novel once you understand it.
It's the opposite of novel; putting ad-hoc annotations in unstructured contexts is what people used to do before Java 5.
reactordev
No, not at all, but considering C has no such thing, I’ll take it.
jlouis
It's not very novel. There are far better ways of solving this than allowing a random string to be embedded as aux information on a struct field. Examples: F# type providers, or OCaml's PPX system for extending the language in a well-defined way. Macro rewriting systems also allow for better safety in this area.
This allows you to derive a safe parser from the structural data, and you can make said parser be really strict. See e.g., Wuffs or Langsec for examples of approaches here.
reactordev
I’m not disagreeing that there are better ways to solve this, given how other languages have implemented theirs. But considering the constraints the Go team had at the time they designed this, it allowed them to implement marshaling fairly easily and leaves it open for extension by the community.
neuroelectron
Been seeing these same problems in services for decades now. It's almost like they made these protocol languages exploitable on purpose.
v5v3
Indeed...
What is "IsAdmin" doing in the "create user" request DTO in the first place? The examples seem to indicate inappropriate re-use of data models.
Would it not be better to have a separate "create user" request model, a separate response model, etc.? No need to have just one model that maps 1:1 to your DB row. This applies to all languages.
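A minimal sketch of that separation in Go (all names invented):

    // What the client may send: no IsAdmin here at all.
    type CreateUserRequest struct {
        Username string `json:"username"`
        Password string `json:"password"`
    }

    // What we store: internal concerns live only here.
    type UserRecord struct {
        ID           int
        Username     string
        PasswordHash string
        IsAdmin      bool
    }

    // What we return: again, only what the client should see.
    type UserResponse struct {
        ID       int    `json:"id"`
        Username string `json:"username"`
    }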