Progressive JSON
235 comments
June 1, 2025
goranmoomin
Seems like some people here are taking this post literally, as in the author (Dan Abramov) is proposing a format called Progressive JSON — it is not.
This is more of a post explaining the idea behind React Server Components, which represent component trees as JavaScript objects and then stream them over the wire in a format similar to the one in the blog post (with similar features, though AFAIK it's bundler/framework-specific).
This allows React to have holes (that represent loading states) in the tree, to display fallback states on first load, and then display the loaded component tree afterwards when the server can actually provide the data (which means you can display the fallback spinner and the skeleton much faster, with more fine-grained loading).
(This comment is probably wrong in various ways if you get pedantic, but I think I got the main idea right.)
danabramov
Yup! To be fair, I also don't mind if people take the described ideas and do something else with them. I wanted to describe RSC's take on data serialization without it seeming too React-specific because the ideas are actually more general. I'd love if more ideas I saw in RSC made it to other technologies.
hn_throwaway_99
GraphQL has similar notions, e.g. @defer and @stream.
tough
hi dan! really interesting post.
do you think a new data serialization format built around easier generation/parseability, one that also happens to be streamable because it's line-based like jsonld, could be useful for some?
danabramov
I don’t know! I think it depends on whether you’re running into any of these problems and have levers to fix them. RSC was specifically designed for that so I was trying to explain its design choices. If you’re building a serializer then I think it’s worth thinking about the format’s characteristics.
krzat
Am I the only person that dislikes progressive loading? Especially if it involves content jumping around.
And the most annoying antipattern is showing empty state UI during loading phase.
danabramov
Right — that’s why the emphasis is on intentionally designed loading states in this section: https://overreacted.io/progressive-json/#streaming-data-vs-s...
Quoting the article:
> You don’t actually want the page to jump arbitrarily as the data streams in. For example, maybe you never want to show the page without the post’s content. This is why React doesn’t display “holes” for pending Promises. Instead, it displays the closest declarative loading state, indicated by <Suspense>.
> In the above example, there are no <Suspense> boundaries in the tree. This means that, although React will receive the data as a stream, it will not actually display a “jumping” page to the user. It will wait for the entire page to be ready. However, you can opt into a progressively revealed loading state by wrapping a part of the UI tree into <Suspense>. This doesn’t change how the data is sent (it’s still as “streaming” as possible), but it changes when React reveals it to the user.
[…]
> In other words, the stages in which the UI gets revealed are decoupled from how the data arrives. The data is streamed as it becomes available, but we only want to reveal things to the user according to intentionally designed loading states.
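For illustration, the opt-in described above looks roughly like this (a minimal sketch; PostContent, Comments, and CommentsSkeleton are hypothetical components, while <Suspense> and its fallback prop are React's documented API):

import { Suspense } from 'react';

function PostPage() {
  return (
    <article>
      {/* The post itself is always part of the initial reveal. */}
      <PostContent />
      {/* Comments may stream in later; React shows the fallback
          until the data behind <Comments /> resolves. */}
      <Suspense fallback={<CommentsSkeleton />}>
        <Comments />
      </Suspense>
    </article>
  );
}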
dominicrose
Smalltalk UIs used to work with only one CPU thread. Any action from the user would freeze the whole UI while it was working, but the positive aspect is that it was very predictable and bug-free. That's helpful, since Smalltalk is OOP.
Since React is functional programming, it works well with parallelization, so there is room for experiments.
> Especially if it involves content jumping around.
I remember this from the early days of Android: you'd search for something, and in the time it took you to click, the list of results changed and you clicked on something else. It happens with ads on some websites too, maybe intentionally?
> And the most annoying antipattern is showing empty state UI during loading phase.
Some low-quality software even shows "There are no results for your search" when the search hasn't even started or completed.
igouy
> Smalltalk UIs used to work with only one CPU thread. Any action from the user would freeze the whole UI while it was working …
If that happened, maybe a programmer messed up the green threads!
"The Smalltalk-80 system provides support for multiple independent processes with three classes named Process, ProcessorScheduler, and Semaphore. "
p. 251, "Smalltalk-80: The Language and its Implementation"
https://rmod-files.lille.inria.fr/FreeBooks/BlueBook/Blueboo...
sdeframond
You might be interested in the "remote data" pattern (for lack of a better name)
https://www.haskellpreneur.com/articles/slaying-a-ui-antipat...
Szpadel
The alternative is to stare at a blank page without any indication that something is happening.
withinboredom
It’s better than moving the link or button as I’m clicking it.
leptons
I'm sure that isn't the only alternative.
ahofmann
Or, you could use caches and other optimizations to serve content fast.
hinkley
Ember did something like this but it made writing Ajax endpoints a giant pain in the ass.
It’s been so long since I used Ember that I’ve forgotten the terms, but essentially they rearranged the tree structure so that some of the children were at the end of the file. I believe it was meant to handle DAGs more efficiently, but I may have hallucinated that recollection.
But if you’re using a SAX-style streaming parser, you can start making progress on painting, and perhaps on follow-up questions, while the initial data is still loading.
Of course in a single threaded VM, you can snatch Defeat from the jaws of Victory if you bollocks up the order of operations through direct mistakes or code evolution over time.
vinnymac
I already use streaming partial JSON responses (progressive JSON) with AI tool calls in production.
It’s become a thing, even beyond RSCs, and has many practical uses if you stare at the client and server long enough.
motorest
Can you offer some detail into why you find this approach useful?
From an outsider's perspective: if you're sending JSON documents so big that parsing them takes long enough for reordering the content to have any measurable impact on performance, it sounds an awful lot like you're batching too much data, when you should be progressively fetching child resources in separate requests, or even implementing some sort of pagination.
Wazako
Slow LLM generation. A progressive display of progressive JSON is mandatory.
tough
how do you do that exactly?
richin13
Not the original commenter but I’ve done this too with Pydantic AI (actually the library does it for you). See “Streaming Structured Output” here https://ai.pydantic.dev/output/#streaming-structured-output
danenania
One way is to eagerly call JSON.parse as fragments come in. If you also split on JSON semantic boundaries like quotes/closing braces/closing brackets, you can detect valid objects and start processing them while the stream continues.
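A minimal sketch of that approach (a heuristic only; a real implementation must also account for braces and quotes inside string values):

async function* parseProgressively(chunks: AsyncIterable<string>) {
  let buffer = '';
  for await (const chunk of chunks) {
    buffer += chunk;
    // Only attempt a parse when the buffer ends at a plausible
    // semantic boundary: a closing quote, brace, or bracket.
    if (!/["\}\]]\s*$/.test(buffer)) continue;
    // Try the buffer as-is, then try "repairing" a top-level
    // array by appending the missing closing bracket.
    for (const candidate of [buffer, buffer + ']']) {
      try {
        yield JSON.parse(candidate); // a valid snapshot of the data so far
        break;
      } catch {
        // Not valid JSON yet; keep buffering.
      }
    }
  }
}

Each yielded value is a complete snapshot, so the consumer can simply re-render from the latest one while the stream continues.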
jatins
I have seen Dan's "2 computers" talk and read some of his recent posts trying to explore RSC and their benefits.
Dan is one of the best explainers in the React ecosystem, but IMO if one has to work this hard to sell/explain a tech, there are two possibilities: 1/ there is no real need for the tech, 2/ it's a flawed abstraction.
#2 seems somewhat true because most frontend devs I know still don't "get" RSC.
Vercel has been aggressively pushing this on users, and most of the adoption of RSC is due to Next.js emerging as the default React framework. Even among Next.js users, most devs don't really seem to understand the boundaries of server components and are cargo-culting.
That, coupled with the fact that React wouldn't even merge the PR that mentions Vite as a way to create React apps, makes me wonder if the whole push for RSC is really meant for users/devs or is just a way for vendors to push their hosting platforms. If you can just ship an SPA from S3 fronted with a CDN, that's clearly not great for the Vercels and Netlifys of the world.
In hindsight, Vercel hiring a lot of OG React team members was a way to control the future of React, and not just a talent play.
danabramov
You’re wrong about the historical aspects and motivations but I don’t have the energy to argue about it now and will save it for another post. (Vercel isn’t setting React’s direction; rather, they’re the ones who funded person-decades of work under the direction set by the React team.)
I’ll just correct the allegation about Vite — it’s being worked on, but the ball is largely in the Vite team’s court because it can’t work well without bundling in DEV (and the Vite team knows it and will be fixing that). The latest work in progress is here: https://github.com/facebook/react/pull/33152.
Re: people not “getting” it — you’re kind of making a circular argument. To refute it I would have to shut up. But I like writing and I want to write about the topics I find interesting! I think even if you dislike RSC, there’s enough interesting stuff there to be picked into other technologies. That’s really all I want at this point. I don’t care to convince you about anything but I want people to also think about these problems and to steal the parts of the solution that they like. Seems like the crowd here doesn’t mind that.
andrewingram
I also appreciate that you’re doing these explainers so that people don’t have to go the long way round to understand what problems exist that call for certain shapes of solutions — especially when those solutions can feel contrived or complicated.
As someone who’s been building web UI for nearly 30 years (scary…), I’ve generally been fortunate enough that when some framework I use introduces a new feature or pattern, I know what they’re trying to do. But the only reason I know what they’re trying to do is because I’ve spent some amount of time running into the problems they’re solving. The first time I saw GraphQL back in 2015, I “got” it; 10 years later most people using GraphQL don’t really get it because they’ve had it forced upon them or chose it because it was the new shiny thing. Same was true of Suspense, server functions, etc.
liamness
You can of course still just export a static site and host it on a basic CDN, as you say. And you can self host Next.js in the default "dynamic" mode, you just need to be able to run an Express server, which hardly locks you into any particular vendor.
Where it gets a little more controversial is if you want to run Next.js in full fat mode, with serverless functions for render paths that can operate on a stale-while-revalidate basis. Currently it is very hard for anyone other than Vercel to properly implement that (see the opennextjs project for examples), due to undocumented "magic". But thankfully Next.js / Vercel have proposed to implement (and dogfood) adapters that allow this functionality to be implemented on different platforms with a consistent API:
https://github.com/vercel/next.js/discussions/77740
I don't think the push for RSC is at all motivated by the shady reasons you're suggesting. I think it is more about the realisation that there were many good things about the way we used to build websites before SPA frameworks began to dominate. Mostly rendering things on the server, with a little progressive enhancement on the client, is a pattern with a lot of benefits. But even with SSR, you still end up pushing a lot of logic to the client that doesn't necessarily belong there.
lioeters
> thankfully Next.js / Vercel have proposed to implement (and dogfood) adapters that allow this functionality to be implemented on different platforms with a consistent API:
Seeing efforts like this (started by the main dev of Next.js working at Vercel) convinces me that the Vercel team is honestly trying to be a good steward with their influence on the React ecosystem, and in general being a beneficial community player. Of course as a VC-funded company its purpose is self-serving, but I think they're playing it pretty respectably.
That said, there's no way I'm going to run Next.js as part of a server in production. It's way too fat and complicated. I'll stick with using it as a static site generator, until I replace it with something simpler like Vite and friends.
throwingrocks
> IMO if one has to work this hard to sell/explain a tech there's 2 possibilities 1/ there is no real need of tech 2/ it's a flawed abstraction
There’s of course a third option: the solution justifies the complexity. Some problems are hard to solve, and the solutions require new intuition.
It’s easy to say that, but it’s also easy to say it should be easier to understand.
I’m waiting to see how this plays out.
metalrain
While RSC as technology is interesting, I don't think it makes much sense in practice.
I don't want to have a fleet of Node/Bun backend servers that have to render complex components. I'd rather have static pages and/or a React SPA with a Go API server.
You get a similar result with much smaller resources.
pas
It's convenient for integrating with backends. You can use async/await on the server, no need for hooks (callbacks) for data loading.
It allows for dynamism (the user only sees the menus that they have permissions for), and you can show the parts that have already loaded while other parts are still loading.
(And while I prefer the elegance and clean separation of concerns that come with a good REST API, it's definitely more work to maintain both the frontend and the backend for it. Especially in cases where the backend-for-frontend integrates with more backends.)
So it's the new PHP (with ob_flush), good for dashboards and big complex high-traffic webshop-like sites, where you want to spare no effort to be able to present the best options to the dear customer as soon as possible. (And also it should be crawlable, and it should work on even the lowest powered devices.)
presentation
That's fine for you, but not all React users are you. It makes much sense in practice for me.
ec109685
How do you avoid having your users stare at spinners while their browser makes api calls (some of them depending on each other) in order to render the page?
robertoandred
RSCs work just fine with static deployments and SPAs. (All Next sites are SPAs.)
chamomeal
Tangent: Next.js is pretty amazing, but it’s still surprising to me that it’s become the default way to write React. I just don’t enjoy writing Next.js apps, even though TypeScript is my absolute favorite language and I generally love React as well.
presentation
for what it's worth I am a NextJS developer and everyone on my team had a pretty easy time getting used to client/server components.
Do I wish that it were something like some kind of Haskell-style monad (probably doable in TypeScript!) or a taint or something, rather than a magic string comment at the top of the file? Sure, but it still doesn't seem to be a big deal, at least on my team.
Garlef
I think there's a world where you would use the code structuring of RSCs to compile a static page that's broken down into small chunks of html, css, js.
Basically: If you replace the "$1" placeholders from the article with URIs you wouldn't need a server.
(In most cases you don't need fully dynamic SSR)
The big downside is that you'd need a good pipeline to keep builds/updates fast when content changes: partial streaming of the compiled static site to S3.
(Let's say you have a newspaper with thousands of prerendered articles: You'd want to only recompile a single article in case one of your authors edits the content in the CMS. But this means the pipeline would need to smartly handle some form of content diff)
danabramov
RSC is perfectly capable of being run at build time, which is the default. So that’s not too far from what you’re describing.
kenanfyi
I find your analysis very good and agree on why companies like Vercel are pushing hard on RSC.
usrbinbash
Or here is a different approach:
We acknowledge that streaming data is not a problem that JSON was intended, or designed, to solve, and ... not do that.
If an application has a use case that necessitates sending truly gigantic JSON objects across the wire, to the point where such a scheme seems like a good idea, the much better question to ask is "why is my application sending ginormous JSON objects again?"
And the answer is usually this:
Fat clients using bloated libraries and ignoring REST, trying to shoehorn JSON into a "one size fits all" solution; sending first data, then data + metadata, then data + metadata + metadata describing the interface, because we finally came full circle and re-invented a really, really bad version of REST that requires several MB of minified JS for the browser to use.
Again, the solution is not to change JSON, the solution is to not do the thing that causes the problem. Most pages don't need a giant SPA framework.
hyfgfh
The thing I have seen in performance work is people trying to shave ms off loading a page while they fetch several MBs and do complex operations in the FE, when in reality writing a BFF, improving the architecture, and leaner APIs would be a more productive solution.
We tried to do that with GraphQL, HTTP/2, ... and arguably failed. Until we can properly evolve web standards we won't be able to fix the main issue. Novel frameworks won't do it either.
danabramov
RSC, which is described at the end of this post, is essentially a BFF (with the API logic componentized). Here’s my long post on this topic: https://overreacted.io/jsx-over-the-wire/ (see BFF midway in the first section).
onion2k
Doesn't that depend on what you mean by "shave ms loading a page"?
If you're optimizing for time to first render, or time to visually complete, then you need to render the page using as little logic as possible - sending an empty skeleton that then gets hydrated with user data over APIs is fastest for a user's perception of loading speed.
If you want to speed up time to first input or time to interactive you need to actually build a working page using user data, and that's often fastest on the backend because you reduce network calls which are the slowest bit. I'd argue most users actually prefer that, but it depends on the app. Something like a CRUD SAAS app is probably best rendered server side, but something like Figma is best off sending a much more static page and then fetching the user's design data from the frontend.
The idea that there's one solution that will work for everything is wrong, mainly because what you optimise for is a subjective choice.
And that's before you even get to Dev experience, team topology, Conway's law, etc that all have huge impacts on tech choices.
MrJohz
> sending an empty skeleton that then gets hydrated with user data over APIs is fastest for a user's perception of loading speed
This is often repeated, but my own experience is the opposite: when I see a bunch of skeleton loaders on a page, I generally expect to be in for a bad experience, because the site is probably going to be slow and janky and cause problems. And the more of the site is being skeleton-loaded, the more my spirits worsen.
My guess is that FCP has become a victim of Goodhart's Law — more sites are trying to optimise FCP (which means that _something_ needs to be on the screen ASAP, even if it's useless) without optimising for the actual user experience. That means delaying rendering and adding more round-trips so that content can be loaded later rather than up-front. The result is sites with worse experiences (more loading, more complexity), even though the metric says the experience should be improving.
PhilipRoman
It also breaks a bunch of optimizations that browsers have implemented over the years. Compare how back/forward history buttons work on reddit vs server side rendered pages.
Bjartr
> the experience should be improving
I think it's more that the bounce rate is improving. People may recall a worse experience later, but more of them will stick around for that experience if they see something happen sooner.
motorest
> If you're optimizing for time to first render, or time to visually complete, then you need to render the page using as little logic as possible - sending an empty skeleton that then gets hydrated with user data over APIs is fastest for a user's perception of loading speed.
I think that OP's point is that these optimization strategies are completely missing the elephant in the room. Meaning, sending multi-MB payloads creates the problem, and shaving a few ms here and there with more complexity while not looking at the performance impact of having to handle multi-MB payloads doesn't seem to be an effective way to tackle the problem.
FridgeSeal
> speed up time to first input or time to interactive you need to actually build a working page using user data, and that's often fastest on the backend because you reduce network calls which are the slowest bit.
It’s only fastest to get the loading skeleton onto the page.
My personal experience with basically any site that has to go through this 2-stage loading exercise is that:
- content may or may not load properly.
- I will probably be waiting well over 30 seconds for the actually-useful-content.
- when it does all load, it _will_ be laggy and glitchy. Navigation won’t work properly. The site may self-initiate a reload, button clicks are…50/50 success rate for “did it register, or is it just heinously slow”.
I’d honestly give up a lot of fanciness just to have “sites that work _reasonably_” back.
zelphirkalt
30s is probably an exaggeration even for most bad websites, unless you are on a really poor connection. But I agree with the rest of it. Often it isn't even a 2-stage thing but an n-stage thing that happens there.
presentation
One huge point of RSC is that you can use your super heavyweight library in the backend, and then not send a single byte of it to the frontend, you just send its output. It's a huge win in the name of shaving way more than ms from your page.
One example a programmer might understand: rather than needing to send the grammar and code of a syntax highlighter to the frontend to render formatted code samples, you can keep that on the backend and just send the resulting HTML/CSS to the frontend, by making sure that you use your syntax highlighter in a server component instead of a client component. All in the same language and idioms that you would be using on the frontend, with almost zero boilerplate.
And if for some reason you decide you want to ship that to the frontend, maybe because you want a user to be able to syntax highlight code they type into the browser, just make that component be a client component instead of a server component, et voila, you've achieved it with almost no code changes.
Imagine what work that would take if your syntax highlighter was written in Go instead of JS.
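A rough sketch of that pattern, assuming a Next.js-style RSC setup and using shiki as the stand-in heavyweight highlighter (the component itself is illustrative):

// CodeSample.tsx -- a server component: shiki and its grammars
// stay on the server; only the resulting HTML reaches the client.
import { codeToHtml } from 'shiki';

export default async function CodeSample({ code }: { code: string }) {
  const html = await codeToHtml(code, {
    lang: 'typescript',
    theme: 'github-dark',
  });
  return <div dangerouslySetInnerHTML={{ __html: html }} />;
}

Adding a 'use client' directive (and running the highlighter in the browser instead) is roughly all the flip described above would take.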
xiphias2
At least this post explains why, when I load a Facebook page, the only thing that really matters (the content) is what loads last.
globalise83
When I load a Facebook page the content that matters doesn't even load.
kristianp
What's a BFF in this context? Writing an AI best friend isn't all that rare these days...
continuational
BFF (pun intended?) in this context means "backend for frontend".
The idea is that every frontend has a dedicated backend with exactly the api that that frontend needs.
zelphirkalt
It is a terrible idea organizationally. It puts backend devs at the whim of frontend devs' often hype-train- and CV-driven development. What often happens is that complexity is moved from the frontend to the backend. But that complexity is not necessarily inherent; it is often self-inflicted, accidental complexity arising from choices in the frontend. The backend API should facilitate getting the required data to render pages and performing the required operations to interact with that data. Everything else is optimization that one may or may not need.
elcomet
Too many acronyms, what's FE, BFF?
aeinbu
I was asking the same questions.
- FE is short for the Front End (UI)
- BFF is short for Backend For Frontend
holoduke
Front end, and a backend for a frontend, in which you generally design APIs specific to a page by aggregating multiple other APIs, caching, transforming, etc.
jerf
There are at least two other alternatives I'd reach for before this.
Probably the simplest one is to refactor the JSON to not be one large object. A lot of "one large objects" have the form {"something": "some small data", "something_else": "some other small data", "results": [vast quantities of identically-structured objects]}. In this case you can refactor this to use JSON Lines. You send the "small data" header bits as a single object; ideally this incorporates a count of how many other objects are coming, if you can know that. Then you send each of the vast quantity of identically-structured objects on one line each. Each of them may have to be parsed in one shot, but many times each individual one is below the size of a single packet, at which point streamed parsing is of dubious helpfulness anyhow.
This can also be applied recursively if the objects are then themselves large, though that starts to break the simplicity of the scheme down.
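Concretely, the refactored stream might look like this (field names are illustrative):

{"something": "some small data", "something_else": "some other small data", "result_count": 3}
{"id": 1, "name": "first result"}
{"id": 2, "name": "second result"}
{"id": 3, "name": "third result"}

The receiver parses the header line in one shot, then handles each subsequent line as it arrives.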
The other thing you can consider is guaranteeing order of attributes going out. JSON attributes are unordered, and it's important to understand that when no guarantees are made you don't have them, but nothing stops you from specifying an API in which you, the server, guarantee that the keys will be in some order useful for progressive parsing. (I would always shy away from specifying incoming parameter order from clients, though.) In the case of the above, you can guarantee that the big array of results comes at the end, so a progressive parser can be used and you will guarantee that all the "header"-type values come out before the "body".
Of course, in the case of a truly large pile of structured data, this won't work. I'm not pitching this as The Solution To All Problems. It's just a couple of tools you can use to solve what is probably the most common case of very large JSON documents. And both of these are a lot simpler than any promise-based approach.
xelxebar
Very cool point, and it applies to any tree data in general.
I like to represent tree data with parent, type, and data vectors along with a string table, so everything else is just small integers.
Sending the string table and type info as up-front headers, we can follow with a stream of parent and data vector chunks, batched N nodes at a time. The depth-first or breadth-first streaming then becomes a choice of ordering on the vectors.
I'm gonna have to play around with this! Might be a general way to get snappier load time UX on network bound applications.
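For illustration, the representation might look like this (the names and shapes here are mine, not an established format):

// Everything except the string table is small integers.
interface TreeChunk {
  strings?: string[]; // string table, sent once as an up-front header
  types?: number[];   // type info per node, also header material
  parent: number[];   // parent[i] = index of node i's parent (-1 for root)
  data: number[];     // data[i] = index into the string table
}

// Because every node names its parent, chunks of N nodes can be
// streamed in any order; depth-first vs. breadth-first is just a
// choice of how the vectors are sorted before sending.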
thethimble
You can even alternate between sending table and node chunks! This effectively allows you to reveal the tree in any order, including revealing children before parents, as well as representing arbitrary graphs! Could lead to some interesting applications.
xelxebar
Good point! The parent-vector representation is what allows arbitrary node order, but chunking the table data alongside chunks of node IDs is a brilliant idea. Cheers!
dmkolobov
If you send the tree in preorder traversal order with known depth, you can send the tree without node IDs or parent IDs! You can just send the level for each node and recover the tree structure with a stack.
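A small sketch of that recovery step, assuming the input is (level, value) pairs in preorder:

interface TreeNode { value: string; children: TreeNode[] }

function fromPreorder(entries: Array<[number, string]>): TreeNode {
  const stack: TreeNode[] = [];
  let root!: TreeNode;
  for (const [level, value] of entries) {
    const node: TreeNode = { value, children: [] };
    stack.length = level; // pop back up to this node's depth
    if (level === 0) root = node;
    else stack[level - 1].children.push(node);
    stack[level] = node;  // this node may be the next parent
  }
  return root;
}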
xelxebar
Well, the whole point is to use a breadth-first order here. I don't think there's a depth-vector analogue for breadth-first traversals. Is there?
But, indeed, depth vectors are nice and compact. I find them harder to work with most of the time, though, especially since insertions and deletions become O(n), compared to O(1) with parent vectors.
That said, I do often normalize my parent vectors into dfpo (depth-first preorder) order at API boundaries, since a well-defined order makes certain operations, like finding leaf siblings, much nicer.
dmkolobov
Yeah, it has its limits for sure. I like it for the streaming aspect.
I think you can still have the functionality described in the article: you would send “hole” markers tagged with their level. Then, you could make additional requests when you encounter these markers during the recovery phase, possibly with buffering of holes. It becomes a sort of hybrid DFS/BFS approach where you send as much tree structure at a time as you want.
ummonk
I’m not familiar with depth vectors, but wouldn’t the breadth first traversal analogue of each entry specifying its depth (in a depth first format) be each entry specifying the number of immediate children it has?
Velorivox
99.9999%* of apps don't need anything nearly as 'fancy' as this; if resolving breadth-first is critical, they can just make multiple calls (which can have very little overhead depending on how you do it).
* I made it up - and by extension, the status quo is 'correct'.
danabramov
To be clear, I wouldn't suggest anyone implement this manually in their app. I'm just describing at a high level how the RSC wire protocol works, but narratively I wrapped it in a "from first principles" invention because it's more fun to read. I'm not necessarily trying to sell you on using RSC either, but I think it's handy to understand how some tools are designed, and sometimes people take ideas from different tools and remix them.
Velorivox
I get that. Originally my comment was a response to another, but I decided to delete and repost it at the top level — however, I failed to realize that without that context the tone reads rather snarky and/or dismissive of the article as a whole, which I didn't intend.
danabramov
Np, fair enough!
conartist6
I'm already thinking about whether there are any ideas here I might take for CSTML -- designed as a streaming format for arbitrary data, but particularly for parse trees.
neRok
Multiple calls?! That sounds like n*n+1. Gross :P
I think the issue with the example JSON is that it's sent in OOP+ORM style (i.e. nested objects), whereas you could just send it as rows of objects, something like this:
{
  "header": "Welcome to my blog",
  "post_content": "This is my article",
  "post_comments": [21, 29, 88],  # the numbers are the comment IDs
  "footer": "Hope you like it",
  "comments": { "21": "first", "29": "second", "88": "third" }
}
But then you may as well just go with protobufs or something, so your endpoints and stuff are all typed and defined, something like this:
syntax = "proto3";
service DirectiveAffectsService {
rpc Get(GetPageWithPostParams) returns (PageWithPost);
}
message GetPageWithPostParams {
string post_id = 1;
}
message PageWithPost {
string page_header = 1;
string page_footer = 2;
string post_content = 3;
repeated string post_comments = 4;
repeated CommentInPost comments_for_post = 5;
}
message CommentInPost {
string comment_id = 1;
string comment_text = 2;
}
And with this style, you don't necessarily need to embed the comments in one call like this; you could cleanly do it in two, as the parent comment suggests (one to get page+post, a second to get the comments), which might be aided with `int32 post_comment_count = 4;` instead (so you can pre-render n blocks).
xtajv
There's nothing wrong with "accidentally-overengineering" in the sense of having off-the-shelf options that are actually nice.
There is something wrong with adding a "fancy" feature to an off-the-shelf option, if said "fancy" feature is realistically "a complicated engineering question, for which we can offer a leaky abstraction that will ultimately trip up anybody who doesn't have the actual mechanics in mind when using it".
motorest
> There's nothing wrong with "accidentally-overengineering" in the sense of having off-the-shelf options that are actually nice.
Your comment focuses on desired outcomes (i.e., "nice" things) but fails to acknowledge the reality of tradeoffs. Overengineering a solution always creates problems: systems become harder to reason about, harder to maintain, harder to troubleshoot. For example, in JSON, arrays are ordered lists. If you onboard an overengineered tool that arbitrarily reorders elements in a JSON array, things can break in non-trivial ways. And they often do.
echelon
We technically didn't need more than 640K either.
Having progressive or partial reads would dramatically speed up applications, especially as we move into an era of WASM on the frontend.
A proper binary encoded format like protobuf with support for partial reads and well defined streaming behavior for sub message payloads would be incredible.
It puts more work on the engineer, but the improvement to UX could be massive.
pigbearpig
Sure, if you’re in the 0.00001% that need that. It’s going to be overengineering for most cases. There are so many simpler and easier-to-support things that can be done before trying this sort of thing.
Following the example: why is all the data in one giant request? Is the DB query efficient? Is the DB sized correctly? How about some caching? All boring, but I’d rather support and train someone on boring stuff.
atombender
You could stream incrementally like this without explicitly demarcating the "holes". You can simply send the unfinished JSON (with empty arrays as the holes), then compute the next iteration and send a delta, then compute the next and send a delta, and so on.
A good delta format is Mendoza [1] (full disclosure: I work at Sanity where we developed this), which has Go and JS/TypeScript [2] implementations. It expresses diffs and patches as very compact operations.
Another way is to use binary diffing. For example, zstd has some nifty built-in support for diffing where you can use the previous version as a dictionary and then produce a diff that can be applied to that version, although we found Mendoza to often be as small as zstd. This approach also requires treating the JSON as bytes and keeping the previous binary snapshot in memory for the next delta, whereas a Mendoza patch can be applied to a JavaScript value, so you only need the deserialized data.
This scheme would force you to diff the new version to find what's changed, rather than plug in exactly what changed, but I believe React already needs to do that? Also, I suppose the Mendoza applier could be extended to return a list of keys affected by a patch application.
andrewingram
For the use case of streaming data for UI, I don’t think empty arrays and nulls are sufficient information. At any moment during the stream, you need the ability to tell what data is pending.
If pending arrays are just returned as empty arrays, how do I know if it’s empty because it’s actually empty, or empty because it’s pending?
GraphQL’s streaming payloads try to get the best of both worlds: at any point in time you have a valid payload according to the GraphQL schema, so it’s possible to render some valid UI, but it also communicates which paths contain pending data, and then subsequent payloads act as patches (though not as sophisticated as Mendoza’s).
atombender
As I commented in https://news.ycombinator.com/item?id=44150238, all you need is a way to express what is pending, which can be done using JSON key paths.
Of course, you could do it in-band, too:
{"comments": {"state": "pending", "values": []}}
…at the cost of needing your data model to be explicit about it. But this has the benefit of being diffable, of course, so once the data is available, the diff is just the new state and the new values.
andrewingram
Yes, hence the last paragraph in my comment :)
__mattya
They want to know where the holes are so that they can show a loading state.
atombender
You don't need templating ($1 etc.) for that as long as you can describe the holes somehow, which can be done out-of-band.
If we imagine a streaming protocol of key/value pairs that are either snapshots or deltas:
event: snapshot
data: {"topPost":[], "user": {"comments": []}}
pending: topPosts,user.comments
event: delta
data: [17,{"comments":[{"body":"hello world"}]},"user"]
pending: topPosts
turtlebits
Progressive JPEG makes sense because it's a media file and by nature large. Text/HTML, on the other hand, not so much. Seems like a self-inflicted problem: JS bundles are giant, and now we're creating more complexity by streaming them.
danabramov
Things can be slow not because they're large but because they take latency to produce or to receive. The latency can be on the server side (some things genuinely take long to query, and might not be possible or easy to cache). Some latency may just be due to the user having poor network conditions. In both cases, there's a benefit to progressively revealing content as it becomes available (with intentional loading stages) instead of always waiting for the entire thing.
whilenot-dev
Agree with everything you're saying here, but to be fair, I think the analogy with Progressive JPEG doesn't sit quite right with your concept. What you're describing sounds more like "semantic-aware streaming": as if a Progressive JPEG were semantically aware of its blob and loaded the objects that are in focus first, before going after data for things that are out of focus.
I think that's a very contemporary problem and worth pursuing, but I also somehow don't see that happening in real time (with the priority to reduce latency) without the necessary metadata.
danabramov
It’s not an exact analogy but streaming outside-in (with gradually more and more concrete visual loading states) rather than top-down feels similar to a progressive image to me.
camgunz
I don't mean to be dismissive, but haven't we solved this by using different endpoints? There are so many virtues: you avoid head-of-line blocking; you can implement better filtering (e.g. "sort comments by most popular"); you can do live updates; you can iterate on the performance of individual objects (caching, etc.).
---
I broadly see this as the fallout of using a document system as an application platform. Everything wants to treat a page like a doc, but applications don't usually work that way, so lots of code and infra gets built to massage the one into the other.
danabramov
Sort of! I have two (admittedly long) articles on this topic, comparing how the code tends to evolve with separate endpoints and what the downsides are:
- https://overreacted.io/one-roundtrip-per-navigation/
- https://overreacted.io/jsx-over-the-wire/
The tldr is that endpoints are not very fluid — they kind of become a "public" API contract between two sides. As they proliferate and your code gets more modular, it's easy to hurt performance because it's easy to introduce server/client waterfalls at each endpoint. Coalescing the decisions on the server as a single pass solves that problem and also makes the boundaries much more fluid.
camgunz
Oh, I see. Maybe a way to restate this is "how do I communicate the cost of data to the client": the cost of returning top-level user data is, let's say, 1; the cost of returning the last 10 comments is 2; and the cost of returning older comments is 2000. Because otherwise, pushing that set of decisions back to the server doesn't exactly solve it; it just means you now actually can make that decision server-side, even though you're still waiting a long time on comment 11 no matter what.
Re: your "JSX Over The Wire" post, I think we've gone totally around the bend. A piece of code that takes zero or more responses from a data backend and returns some kind of HTML is a web service. Like, that's CGI, that's PHP, that's Rails, Node, Django, whatever. If the argument here is "the browser should have some kind of state tracking/reactivity built in, and until that day we have a shim like jQuery, the old-school thin React, or the new-school htmx", then OK, but this is so, so much engineering to elide `onclick` et al.
---
I kind of worry that we've spent way, way too much time in these weeds. There are millions and millions of lines of React out there, and certainly the majority of it is "stitch the responses of these N API calls together into a view/document, maybe poll them for updates from time to time", to the degree that AI just does it now. If it's so predictable that a couple of video cards can do it in a few seconds, why have we spent gazillions of engineering-years polishing this?
rictic
If you've got some client side code and want to parse and render JSON progressively, try out jsonriver: https://github.com/rictic/jsonriver
Very simple API, takes a stream of string chunks and returns a stream of increasingly complete values. Helpful for parsing large JSON, and JSON being emitted by LLMs.
Extensively tested and performance optimized. Guaranteed that the final value emitted is identical to passing the entire string through JSON.parse.
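Based on that description, usage presumably looks something like this (the exact signature is an assumption, and bytesToStrings and render are placeholder helpers; check the project's README):

import { parse } from 'jsonriver';

const response = await fetch('/api/big.json');
// bytesToStrings: an assumed helper that decodes the byte stream
// into string chunks (e.g. via TextDecoderStream).
for await (const partial of parse(bytesToStrings(response.body!))) {
  render(partial); // each value is a more complete snapshot
}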
jumploops
What’s the benefit of `jsonriver` over one of the myriad of “best effort” parsers[0][1][2] in a try/catch loop while streaming?
[0]https://github.com/beenotung/best-effort-json-parser
[1]https://github.com/unjs/destr
[2]https://www.npmjs.com/package/json-parse-even-better-errors
rictic
Good question. jsonriver is well optimized, exhaustively tested (tens of thousands of test cases), and provides a number of potentially useful invariants[0] about its parsing.
jsonriver's performance comes primarily from simplicity, and from doing as little work per character as we can. Repeatedly reparsing from scratch, on the other hand, gets expensive quickly: your parse time becomes quadratic in the length of the string being parsed.
[0] https://github.com/rictic/jsonriver?tab=readme-ov-file#invar...
jumploops
I've been looking for a better solution for a while now (if you couldn't tell) and will definitely try out jsonriver for our use case. Thanks!
jarym
This appears conceptually similar to something like line-delimited JSON with JSON Patch[1].
Personally I prefer that sort of approach - parsing a line of JSON at a time and incrementally updating state feels easier to reason about and work with (at least in my mind).
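For illustration, such a stream could look like this: an initial state line followed by JSON Patch (RFC 6902) documents, one per line (the shapes are illustrative):

{"post": {"content": "This is my article", "comments": []}}
[{"op": "add", "path": "/post/comments/-", "value": {"text": "first"}}]
[{"op": "add", "path": "/post/comments/-", "value": {"text": "second"}}]

Each subsequent line is applied as a patch to the accumulated state, so the client always holds a valid document.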