
Decoding JSON sum types in Go without panicking

shabbyrobe

IME this is a longstanding pain point with Go. There's a proposal for an encoding/json/v2 package [1] being kicked around at the moment [2], which grew out of an earlier discussion [3].

This at least seems to improve the situation for marshalling to/from an interface directly, if only slightly, by letting you pass custom Unmarshalers for a specific type (via json.WithUnmarshalers and json.UnmarshalFunc) to the Unmarshal functions. But it still appears to have the inefficient double-decode problem. Or I just haven't found a decent way around it yet.

Looks like they're intentionally punting on a first class solution until (if) the language gets some sort of sum type, but I still think the second-class solution could do a bit more to make this extremely common use-case more convenient. Pretty much every serious production Go app I've worked on in the last 10 years or so has had some horrible coping strategy for the "map a field-discriminated object to/from implementations of an interface" gap, often involving some sort of double-unmarshal.

Quote from the proposal [1]:

> First-class support for union types: It is common for the type of a particular JSON value to be dynamically changed based on context. This is difficult to support in Go as the equivalent of a dynamic value is a Go interface. When unmarshaling, there is no way in Go reflection to enumerate the set of possible Go types that can be stored in a Go interface in order to choose the right type to automatically unmarshal a dynamic JSON value. Support for such use cases is deferred until better Go language support exists.

  [1]: https://pkg.go.dev/github.com/go-json-experiment/json
  [2]: https://github.com/golang/go/issues/71497
  [3]: https://github.com/golang/go/discussions/63397

klabb3

> IME this is a longstanding pain point with Go.

Understatement of the year. But it’s really not limited to encoding; the lack of sum types in general is excruciating once you’ve tasted them (in Rust, in my case). They click instantly as an abstraction and they solve countless real-world logic bugs. Not to mention their ergonomics in seemingly unrelated areas, like eliminating null and handling errors with result types. Just sprinkle some pattern matching on top and you’re in paradise.

0x696C6961

Last time I checked, the json/v2 package fixed the double decode problem by passing the decoder into the unmarshaling callback.

shabbyrobe

That's not what I found in my own experiments: I still had to unmarshal once inside the callback to get the `type` field out, then again once I knew what the type was. Do you have an example handy?

dmix

The weird JSON handling was the main reason I stopped using Go for side projects long ago.

pstuart

> until (if) the language gets some sort of sum type

Is there any discussion with the Go team about this actually happening?

ajb

I don't know about recently, but people were asking about them from the first announcement in 2009, and got the answer that they were "under consideration".

To be fair, it's a significant advantage of Go that they have been strict about keeping its feature set small.

Edited to add:

There is this 2023 proposal from Ian Lance Taylor (on the Go team), but it makes all sum types include nil, which seems suboptimal: https://github.com/golang/go/issues/57644
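Until something like that lands, the usual emulation is a "sealed" interface with an unexported marker method. This sketch (hypothetical `Shape`, `Circle` and `Rect` types) shows why it's second class: the compiler neither checks the type switch for exhaustiveness nor rules out a nil interface value, so both have to be handled by hand.

```go
package main

import "math"

// Shape is a "sealed" interface: only types in this package can implement
// the unexported isShape method. All names here are hypothetical.
type Shape interface{ isShape() }

type Circle struct{ R float64 }
type Rect struct{ W, H float64 }

func (Circle) isShape() {}
func (Rect) isShape()   {}

// Area must handle nil and unlisted variants itself: the compiler does not
// check that the switch covers every implementation of Shape.
func Area(s Shape) (float64, bool) {
	switch v := s.(type) {
	case Circle:
		return math.Pi * v.R * v.R, true
	case Rect:
		return v.W * v.H, true
	case nil:
		return 0, false // a nil Shape is always representable
	default:
		return 0, false // reached by any variant this switch forgets
	}
}
```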

reactordev

Cool, but all you really needed to do was fix the contract between NewActionDeleteObject’s struct creation and the switch statement’s result = print. What’s really crazy is that you can create anonymous structs to unmarshal JSON without having to fully flesh out data models, thanks to its “we’ll only map the fields we see in the struct” approach. Mapstructure takes this even further with squash. In the end, the type checker missed the error because it was a user error by contract, and the lack of validation that the actions were constructed properly resulted in a rabbit-hole debugging session that ended with an excellent blog post about the gotchas of unmarshaling.
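A minimal sketch of the anonymous-struct trick, assuming a hypothetical document with a nested `meta.author` field: only the declared fields are decoded, and encoding/json skips everything else.

```go
package main

import "encoding/json"

// ExtractAuthor pulls one nested field out of a larger document without
// defining named types for the whole thing. The document shape is hypothetical.
func ExtractAuthor(b []byte) (string, error) {
	var doc struct {
		Meta struct {
			Author string `json:"author"`
		} `json:"meta"`
	}
	// encoding/json fills only the declared fields and skips all other keys.
	if err := json.Unmarshal(b, &doc); err != nil {
		return "", err
	}
	return doc.Meta.Author, nil
}
```

The same leniency cuts the other way: a mistyped tag or a missing key just yields a zero value rather than an error. For stricter decoding, `json.Decoder` has a `DisallowUnknownFields` method, which rejects keys that don't map to a field (it does not catch missing keys).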

hinkley

> human error

Humans suck at formalisms.

When XML was the 'sane choice' I grew to hate it, until I met a person who introduced me to XMLSpy. It has a mode where, instead of trying to define the schema yourself, you offer it example documents and it offers you a schema that seems to satisfy them, at which point you can either switch to manual or provide more examples. He was right: it did make me hate XML a lot less.

At least until the next job when they wouldn't pay for it.

mccanne

Nice article!

Decoding sum types into Go interface values is obviously tricky stuff, but it gets even harder when you have recursive data structures as in an abstract syntax tree (AST). The article doesn't address this. Since there wasn't anything out there to do this, we built a little package called "unpack" as part of the SuperDB project.

The package is here...

https://github.com/brimdata/super/blob/main/pkg/unpack/refle...

and an example use in SuperDB is here...

https://github.com/brimdata/super/blob/main/compiler/ast/unp...

Sorry it's not very well documented, but once we got it working, we found the approach quite powerful and easy.
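To show why recursion complicates things, here is a small hand-rolled sketch using stdlib encoding/json. It is not how the `unpack` package works; the `Expr`, `Literal` and `Binary` node types and the `"kind"` discriminator are hypothetical. Child nodes are held as `json.RawMessage` until their own `kind` has been read, and are then decoded recursively.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Expr is a hypothetical AST node interface.
type Expr interface{ isExpr() }

type Literal struct{ Value float64 }

type Binary struct {
	Op       string
	LHS, RHS Expr
}

func (Literal) isExpr() {}
func (Binary) isExpr()  {}

// decodeExpr reads the "kind" discriminator, then recurses into child nodes,
// which stay as raw bytes until their own kind is known.
func decodeExpr(b []byte) (Expr, error) {
	var node struct {
		Kind  string          `json:"kind"`
		Value float64         `json:"value"`
		Op    string          `json:"op"`
		LHS   json.RawMessage `json:"lhs"`
		RHS   json.RawMessage `json:"rhs"`
	}
	if err := json.Unmarshal(b, &node); err != nil {
		return nil, err
	}
	switch node.Kind {
	case "literal":
		return Literal{Value: node.Value}, nil
	case "binary":
		// A missing child leaves its RawMessage empty, and decoding it fails.
		lhs, err := decodeExpr(node.LHS)
		if err != nil {
			return nil, err
		}
		rhs, err := decodeExpr(node.RHS)
		if err != nil {
			return nil, err
		}
		return Binary{Op: node.Op, LHS: lhs, RHS: rhs}, nil
	default:
		return nil, fmt.Errorf("unknown node kind %q", node.Kind)
	}
}
```

One cost of this approach: each level re-parses the raw bytes of its whole subtree, so deeply nested trees get scanned many times.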

mccanne

And somewhat ironically here, SuperDB not only implements sum-type decoding of JSON in package unpack, but it also implements native sum types in a superset of JSON that we call Super JSON (with a query language that understands how to rip and stitch sum types for columnar analytics... work in progress)

https://superdb.org/docs/formats/data-model/#25-union

nasretdinov

BTW, double unmarshalling (and double marshalling) can be quite slow, so to speed up determining the object type you can extract the type field with something like gjson (https://github.com/tidwall/gjson). It can easily be 10x faster for this kind of scenario.
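gjson is a third-party library. As a stdlib-only alternative (a different technique from the one suggested above), the token API on `json.Decoder` can pull out just the top-level `type` field and stop as soon as it finds it. A minimal sketch with a hypothetical `peekType` helper; no performance claim is made here, so measure against your own payloads.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
)

// peekType returns the top-level "type" string of a JSON object using the
// streaming tokenizer, skipping other values and returning as soon as
// "type" is found. peekType is a hypothetical helper, not a library API.
func peekType(b []byte) (string, error) {
	dec := json.NewDecoder(bytes.NewReader(b))
	tok, err := dec.Token()
	if err != nil {
		return "", err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return "", errors.New("expected JSON object")
	}
	for dec.More() {
		keyTok, err := dec.Token()
		if err != nil {
			return "", err
		}
		if key, _ := keyTok.(string); key == "type" {
			var t string
			if err := dec.Decode(&t); err != nil {
				return "", err
			}
			return t, nil
		}
		// Skip the value of any other key, including nested objects.
		var skip json.RawMessage
		if err := dec.Decode(&skip); err != nil {
			return "", err
		}
	}
	return "", errors.New(`no "type" field`)
}
```

Because it returns early, it does not validate the rest of the object; the subsequent full decode into the concrete type still does that.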

the_gipsy

Shameless plug: I wrote a "JSON Tagged Union" package for go: https://github.com/benjajaja/jtug

Lets you decode something like

    [
      { "type": "A", "count": 10 },
      { "type": "B", "data": "hello" }
    ]

into some Go like:

    type Tag string
    const (
     A Tag = "A"
     B Tag = "B"
    )
    type StructA struct {
     Type  Tag `json:"type"`
     Count int `json:"count"`
    }
    type StructB struct {
     Type Tag    `json:"type"`
     Data string `json:"data"`
    }
by writing a (subjectively) minimal amount of boilerplate:

    type Union = jtug.Union[Tag]
    type List = jtug.UnionList[Tag, Mapper]
    type Mapper struct{}
    func (Mapper) Unmarshal(b []byte, t Tag) (jtug.Union[Tag], error) {
     switch t {
     case A:
      var value StructA
      return value, json.Unmarshal(b, &value)
     case B:
      var value StructB
      return value, json.Unmarshal(b, &value)
     default:
      return nil, fmt.Errorf("unknown tag: \"%s\"", t)
     }
    }
This shows that now it's possible to use `json.Unmarshal` directly:

    var list List
    err := json.Unmarshal([]byte(`[
        {"type":"A","count":10},
        {"type":"B","data":"hello"}
    ]`), &list)
    if err != nil {
        log.Fatal(err)
    }
    for i := range list {
        switch t := list[i].(type) {
        case StructA:
            fmt.Println(t.Count)
        case StructB:
            fmt.Println(t.Data)
        // etc.
        }
    }

Of course, it relies on reflection and is generally not very efficient. If you control the API and it's going to be consumed by Go, then I would just not do tagged unions.

akpa1

Interesting to see V mentioned here. Is it still the chaotic mess of a language that'll never be like it was a few years ago?

byrnedo

Pulled my hair out doing this all over the place when integrating with a Node API, and ended up writing https://github.com/byrnedo/pjson. Feels like this should be covered as a more first-class thing in Go.

timewizard

It is. That's what json.RawMessage is for. It's a shame the author didn't explore this more. It's a perfectly cromulent solution to this problem.

foofoo4u

Surprising to see V lang brought up. What’s its current reputation?

hesus_ruiz

Reading the article, I reached the same conclusion as every time I approach sum types: they are ONLY useful for addressing malformed JSON structs or hacking around BAD data structure/logic design, at least for most business applications (for system-level programs my reasoning is different).

The example JSON in the article, even if it may be common, is broken, and I would not accept such a design, because an action on an object must require the action AND the object.

For many years, I have advised companies developing business applications to avoid programming constructs (like sum types) that are very far from what a businessman would understand (think of a paper business form for the first example in the article). And the results are good: the business logic in the program ends up as similar as possible to the business logic as business people understand it.

tshaddox

This is pretty much the exact opposite of how I see the world with regards to data modeling. I suppose I'm a sum type radicalist. There are few non-trivial things in the world that I would model without heavy use of sum types.

bnpxft

Yes exactly. The real world is full of examples of a fixed set of exclusive options.

A programming language without sum types and exhaustive pattern matching in its type system is unable to model this real-world concept in its type system.

alpaca128

I just checked, my last tax return form contains at least four inputs equivalent to a sum type.

Sum types are not purely programming constructs, they are such an everyday concept that you didn't even notice it. Not only do business people understand the concept, without such a basic understanding they wouldn't have been able to get a job in the first place.

You don't need to know category theory to understand sum types.

josephg

Huh?

I really don't understand this perspective. Sum types are crazy useful for data modelling. A couple examples:

- It's quite common to need to model user input events. Eg, MouseClickEvent { mouseid, button }, KeyboardEvent { key, modifierFlags }, and so on. The most natural way to express this is using a sum type, with an event loop in your application matching out the different event types.

- Actually, most events follow this pattern. Eg, window management events, event sourcing (eg in kafka), actor mailbox events, etc.

- I've been working in the field of collaborative editing for the last ~decade. There, we talk a lot about editing operations. For text editing, they're either Insert { position, content } or Delete { position, length }. Some systems have a Replace type too. Sum types are a much more natural way to express operations.

- Results are sum types. Eg, if I submit a request to an RPC server, I could get back Ok(result) or Err(error). That's a sum type. Likewise, this is what I want when I call into the POSIX API. I don't want to check the return value. I want a function that gives me either a result or an error. (And not a result-or-null and error-or-null like you get in Go.)
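The editing-operations example maps directly onto the thread's topic. Here is a hedged sketch of the encoding direction in Go, using a sealed interface for hypothetical `Insert` and `Delete` operations and adding the `"type"` discriminator at marshal time by embedding each variant in an anonymous struct. The `MarshalOp` helper is an illustrative name, not an existing API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Op is a hypothetical sealed interface for text-editing operations.
type Op interface{ isOp() }

type Insert struct {
	Pos     int    `json:"pos"`
	Content string `json:"content"`
}

type Delete struct {
	Pos    int `json:"pos"`
	Length int `json:"length"`
}

func (Insert) isOp() {}
func (Delete) isOp() {}

// MarshalOp adds the discriminator on the way out: the embedded variant's
// fields are promoted, so they appear next to "type" in the output.
func MarshalOp(op Op) ([]byte, error) {
	switch v := op.(type) {
	case Insert:
		return json.Marshal(struct {
			Type string `json:"type"`
			Insert
		}{"insert", v})
	case Delete:
		return json.Marshal(struct {
			Type string `json:"type"`
			Delete
		}{"delete", v})
	default:
		// Go has no exhaustiveness check, so nil and unknown ops land here.
		return nil, fmt.Errorf("unsupported op %T", op)
	}
}
```

Decoding goes the other way with the same probe-then-decode pattern as any other tagged union; the point is that every variant needs a hand-written case in both directions.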

How would you elegantly express this stuff without sum types? I guess for input events you could use a class hierarchy - but that makes it infinitely harder to implement data oriented design. Things like serialization / deserialization become a nightmare.

Frankly, all the alternatives to sum types seem worse. Now that I've spent enough time in languages which have sum types (TypeScript, Rust, Swift, etc.), I can't go back. I just feel like I'm doing the compiler's job by hand, badly.

airstrike

100% agreed. Unrelated but your blog is awesome.

incrudible

I have the exact opposite opinion: sum types naturally and logically model actual problems regarding data transformation, while objects with methods obfuscate them. That is seldom a worthy tradeoff, even if it gives hypothetical business people a sense of understanding something (do they really?).

However, if you find yourself needing to use one of these regrettably ubiquitous languages that do not support them properly, it is gonna be painful either way.

sevensor

Huh? Forms do this all the time: fill out all of section A for identifying information and section B-1 for a new application or B-2 for a renewal. Append schedule F if you will be frobnicating in the Commonwealth of Massachusetts. I’d model that with sum types any day of the week.

timewizard

> malformed JSON structs of hacking BAD data structure/logic design

The real world is not particularly well structured.

> a business form in paper

There are several types of paper forms. They're typically differentiated by a type identifier (a.k.a. Title) in their headers. Scaling one layer out, paper forms themselves are actually a "sum type." They share a common form factor, which makes them useful, but they require an initial blind examination to understand the context of the rest of the document.