MessagePack: It's like JSON, but fast and small.
43 comments
· January 11, 2025
buserror
masklinn
> Object and Array cannot be streamed when writing. They require a 'count' at the beginning
Most languages know exactly how many elements a collection has (to say nothing of the number of members in a struct).
touisteur
I think the pattern in question might be (for example) the way some people (like me) sometimes write JSON as a trace of execution, sometimes directly to stdout (so, no going back in the stream). You're not serializing a structure but directly writing it as you go. So you don't know in advance how many objects you'll have in an array.
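To make the count-up-front constraint concrete, here is a minimal sketch that hand-encodes MessagePack array headers from the format bytes in the spec (fixarray, array 16, array 32). The function names `array_header`, `small_uint` and `encode_trace` are made up for this illustration and do not come from any library.

```python
import struct


def array_header(count: int) -> bytes:
    """Encode a MessagePack array header announcing `count` elements."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if count <= 15:
        return bytes([0x90 | count])               # fixarray: 1 byte
    if count <= 0xFFFF:
        return b"\xdc" + struct.pack(">H", count)  # array 16: 3 bytes
    if count <= 0xFFFFFFFF:
        return b"\xdd" + struct.pack(">I", count)  # array 32: 5 bytes
    raise ValueError("too many elements for a MessagePack array")


def small_uint(n: int) -> bytes:
    """Encode an integer in 0..127 as a positive fixint (one byte)."""
    if not 0 <= n <= 0x7F:
        raise ValueError("this sketch only handles 0..127")
    return bytes([n])


def encode_trace(events: list[int]) -> bytes:
    """Encode a list of small ints as one MessagePack array.

    The header comes first and depends on len(events), so every event
    has to be collected before the first byte can be written.
    """
    body = b"".join(small_uint(e) for e in events)
    return array_header(len(events)) + body


if __name__ == "__main__":
    for n in (3, 16, 70000):
        print(n, array_header(n).hex())
    print(encode_trace([1, 2, 3]).hex())
```

The header grows from 1 to 3 to 5 bytes as the count grows, which is why a writer that is appending to stdout cannot simply reserve a slot and patch the count in later.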
ubutler
Although MessagePack is definitely not a drop-in replacement for JSON, it is certainly extremely useful.
Unlike JSON, you can’t just open a MessagePack file in Notepad or vim and have it make sense. It’s often not human readable. So using MessagePack to store config files probably isn’t a good idea if you or your users will ever need to read them for debugging purposes.
But as a format for something like IPC or high-performance, low-latency communication in general, MessagePack brings serious improvements over JSON.
I recently had to build an inference server that needed to be able to communicate with an API server with minimal latency.
I started with gRPC and protobuf since it’s what everyone recommends, yet after a lot of benchmarking, I found a way faster method to be serving MessagePack over HTTP with a Litestar Python server (it’s much faster than FastAPI), using msgspec for super fast MessagePack encoding and ormsgpack for super fast decoding.
Not sure how this beat protobuf and gRPC but it did. Perhaps the Python implementation is just slow. It was still faster than JSON over HTTP, however.
sd9
It drops the most useful aspect of JSON, which is that you can open it in a text editor.
It's like JSON in that it's a serialisation format.
janalsncm
In my experience protobuf was smaller than MessagePack. I even tried compressing both with zstd and protobuf was still smaller. On the other hand protobuf is a lot less flexible.
comonoid
MessagePack is self-describing (it contains tags like "next bytes are an integer"), but Protobuf uses an external schema.
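As a rough illustration of what "self-describing" means here, the sketch below decodes a handful of MessagePack types by looking only at the leading tag byte. It covers a small subset of the spec (positive fixint, fixstr, nil, booleans, uint 8, uint 16, fixarray), and `decode_one` is a hypothetical helper name, not a library function.

```python
import struct


def decode_one(buf: bytes, pos: int = 0):
    """Decode one value starting at `pos`; return (value, next_pos).

    Only a few MessagePack types are handled; anything else raises.
    """
    tag = buf[pos]
    if tag <= 0x7F:                      # positive fixint: value is the tag
        return tag, pos + 1
    if 0xA0 <= tag <= 0xBF:              # fixstr: length in the low 5 bits
        n = tag & 0x1F
        start = pos + 1
        return buf[start:start + n].decode("utf-8"), start + n
    if tag == 0xC0:                      # nil
        return None, pos + 1
    if tag == 0xC2:                      # false
        return False, pos + 1
    if tag == 0xC3:                      # true
        return True, pos + 1
    if tag == 0xCC:                      # uint 8
        return buf[pos + 1], pos + 2
    if tag == 0xCD:                      # uint 16, big-endian
        return struct.unpack_from(">H", buf, pos + 1)[0], pos + 3
    if 0x90 <= tag <= 0x9F:              # fixarray: count in the low 4 bits
        count = tag & 0x0F
        pos += 1
        items = []
        for _ in range(count):           # each element must be walked in turn
            value, pos = decode_one(buf, pos)
            items.append(value)
        return items, pos
    raise ValueError(f"tag 0x{tag:02x} not handled in this sketch")


if __name__ == "__main__":
    data = bytes.fromhex("93 01 a2 6869 c3")  # [1, "hi", true]
    print(decode_one(data))
```

No schema is needed to read the bytes back, at the cost of one tag per value. Note that the array branch has to visit every element to find where the array ends, since the header stores an element count rather than a byte length.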
toomim
MessagePack saves a little bit of space and CPU ... but not a lot:
https://media.licdn.com/dms/image/v2/D5612AQF-nFt1cYZhKg/art...
Source: https://www.linkedin.com/pulse/json-vs-messagepack-battle-da...
koito17
An approximate 20% reduction in bandwidth looks significant to me. I think the problem here is that the chart uses a linear scale instead of a logarithmic scale.
Looking at the data, I'm inclined to agree that not much CPU is saved, but the point of MessagePack is to save bandwidth, and it seems to be doing a good job at that.
motorest
> An approximate 20% reduction in bandwidth looks significant to me.
Significant with regard to what? Not doing anything? Flipping the toggle to compress the response?
tasuki
> An approximate 20% reduction in bandwidth looks significant to me.
To me it doesn't. There's compression for much bigger gains. Or just, you know, just send less data?
I've worked at a place where our backend regularly sent humongous JSONs to all the connected clients. We were all pretty sure this could be reduced by 95%. But who would try to do that? There wasn't a business case. If someone tried and succeeded, no one would notice. If someone tried and broke something, it'd look bad. So, status quo...
bythreads
In a system that requires the absolute speediest throughput, compression is actually usually the worst thing in a parse chain, so parsing without first decompressing is valuable.
I've tried MessagePack a few times, but to be honest the hassle of the debugging was never really worth it.
fshafique
Does it solve the problem of a repeating set of keys in an object array, e.g. when representing a table?
I don't think using a dictionary of key values is the way to go here. I think there should be a dedicated "table" type, where the column keys are only defined once, and not repeated for every single row.
ubutler
MessagePack can encode rows as well and then you just need to manage linking the keys during deserialization. In fact, it can encode arbitrary binary without needing base64 like JSON.
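For the binary point, a small sketch comparing MessagePack's `bin` family (format bytes 0xc4, 0xc5, 0xc6 from the spec) with a base64 string inside JSON. `encode_bin` is a hypothetical helper written for this comparison, and the 1024-byte payload is arbitrary test data.

```python
import base64
import json
import struct


def encode_bin(data: bytes) -> bytes:
    """Encode raw bytes as a MessagePack bin 8 / bin 16 / bin 32 value."""
    n = len(data)
    if n <= 0xFF:
        return b"\xc4" + bytes([n]) + data            # bin 8
    if n <= 0xFFFF:
        return b"\xc5" + struct.pack(">H", n) + data  # bin 16
    if n <= 0xFFFFFFFF:
        return b"\xc6" + struct.pack(">I", n) + data  # bin 32
    raise ValueError("payload too large for a MessagePack bin value")


def encode_json_base64(data: bytes) -> bytes:
    """Encode raw bytes the usual JSON way: a base64 string."""
    return json.dumps(base64.b64encode(data).decode("ascii")).encode("ascii")


if __name__ == "__main__":
    payload = bytes(range(256)) * 4  # 1024 arbitrary bytes
    print("msgpack bin:", len(encode_bin(payload)), "bytes")
    print("json base64:", len(encode_json_base64(payload)), "bytes")
```

The bin encoding adds a fixed header of a few bytes, while base64 inflates the payload by roughly a third before the surrounding quotes are counted.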
feverzsj
You can just use an array of arrays like most scientific applications do.
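A minimal sketch of that layout: column names stored once, rows stored as plain arrays, and the keys re-attached on the reading side. The sample records and the helper names `to_table` and `from_table` are invented for illustration. Sizes are measured with the stdlib `json` module only to keep the example dependency-free; the same repeated-key overhead applies to MessagePack maps, which also store each key string per row.

```python
import json

# Hypothetical sample data: three records sharing the same keys.
ROWS = [
    {"id": 1, "name": "ada", "score": 9.5},
    {"id": 2, "name": "bob", "score": 7.0},
    {"id": 3, "name": "cy", "score": 8.25},
]


def to_table(records: list[dict]) -> dict:
    """Store column names once and each record as a positional array."""
    columns = list(records[0])
    return {
        "columns": columns,
        "rows": [[record[c] for c in columns] for record in records],
    }


def from_table(table: dict) -> list[dict]:
    """Rebuild the list of records by zipping column names back on."""
    return [dict(zip(table["columns"], row)) for row in table["rows"]]


if __name__ == "__main__":
    table = to_table(ROWS)
    print("array of objects:", len(json.dumps(ROWS)), "chars")
    print("columns + rows:  ", len(json.dumps(table)), "chars")
    print("round trip ok:", from_table(table) == ROWS)
```

The saving grows with the number of rows, since the per-row cost of the key strings disappears. The sketch assumes every record has the same keys; ragged records would need a different convention.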
karteum
I discovered JSON Binpack recently, which works either schemaless (like msgpack) or - supposedly more efficiently - with a schema. I haven't tried the codebase yet but it looks interesting.
slurpyb
Shouts to msgspec - I haven't had a project without it in a while.
ubutler
+1 It’s almost as indispensable as tqdm for a data scientist at least.
eeasss
Serialization vulnerabilities, anyone?
mdhb
CBOR: It’s like JSON but fast and small but also an official IETF standard.
camgunz
Disclaimer: I wrote and maintain a MessagePack implementation.
CBOR is MessagePack. The story is that Carsten Bormann wanted to create an IETF standardized MP version, the creators asked him not to (after he acted in pretty bad faith), he forked off a version, added some very ill-advised tweaks, named it after himself, and submitted it anyway.
I wrote this up years ago (https://news.ycombinator.com/item?id=14072598), and since then the only thing they've addressed is undefined behavior when a decoder encounters an unknown simple value.
nmadden
CBOR is basically a fork of MsgPack. I prefer the original - it’s simpler and there are more high-quality implementations available.
jpc0
CBOR is actively used in the WebAuthn spec (passkeys) so browsers ship with an implementation... And if you intend to support it even via a library you will be shipping an implementation as well.
https://www.w3.org/TR/webauthn-2/#sctn-conforming-all-classe...
camgunz
Disclaimer: I wrote and maintain a MessagePack implementation.
Reading through this, it looks like they toss out indefinite length values, "canonicalization", and tags, making it essentially MP (MP does have extension types, I should say).
https://fidoalliance.org/specs/fido-v2.0-ps-20190130/fido-cl...
throw5959
Which Web API can encode and decode CBOR? I'm not aware of any, and unless I'm mistaken you will need to ship your own implementation in any case.
feverzsj
It's too complex, and the implementation is poor[0].
[0]: https://github.com/getml/reflect-cpp/tree/main/benchmarks
otabdeveloper4
CBOR is a standard, not an implementation.
As a standard it's almost exactly the same as MsgPack, the difference is mostly just that CBOR filled out underspecified parts of MsgPack. (Things like how extensions for custom types work, etc.)
tjoff
Implementation is poor because of performance?
Performance is just one aspect, and using poor to describe it is very misleading. Say not performant if that is what you meant.
agieocean
Made an open image format with this for constrained networks and it works great
mattbillenstein
I've built a few systems using msgpack-rpc - serves really well as a transport format in my experience!
I played quite a bit with MessagePack, used it for various things, and I don't like it. My primary gripes are:
+ Object and Array need to be entirely and deeply parsed. You cannot skip them.
+ Object and Array cannot be streamed when writing. They require a 'count' at the beginning, and since the 'count' size can vary in number of bytes, you can't even "walk back" and update it. It would have been MUCH, MUCH better to have a "begin" and "end" tag --- err pretty much like JSON has, really.
You can alleviate the problems by using extensions, storing a byte count to skip, etc., but really, if you start there, you might as well use another format altogether.
Also, from my tests, it is not particularly more compact, unless again you spend some time and add a hash table for keys and embed that -- but then again, at that point where it becomes valuable, might as well gzip the JSON!
So in the end it is a lot better in my experience to use some sort of 'extended' JSON format, with the idiocies removed (trailing commas, forcing double-quote for keys etc).