
We replaced H.264 streaming with JPEG screenshots (and it worked better)

mikepavone

> When the network is bad, you get... fewer JPEGs. That’s it. The ones that arrive are perfect.

This would make sense... if they were using UDP, but they are using TCP. All the JPEGs they send will get there eventually (unless the connection drops), so JPEG does not fix your buffering and congestion control problems. What presumably happened is that the way they implemented their JPEG screenshots includes some mechanism that minimizes the number of frames in flight. That is not an inherent property of JPEG, though.

> And the size! A 70% quality JPEG of a 1080p desktop is like 100-150KB. A single H.264 keyframe is 200-500KB. We’re sending LESS data per frame AND getting better reliability.

h.264 has better coding efficiency than JPEG. For a given target size, you should be able to get better quality from an h.264 IDR frame than from a JPEG. There is no fixed size for an IDR frame.

Ultimately, the problem here is a lack of bandwidth estimation (apart from the binary "good network"/"cafe mode" switch they eventually implemented). To be fair, this is difficult to do, and being stuck with TCP makes it a bit more difficult. Still, you can do an initial bandwidth probe and then watch for increasing transmission latency as a sign that the network is congested; back off your bitrate (and if needed reduce frame rate to maintain sufficient quality) until transmission latency starts to decrease again.
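A minimal sketch of such a controller in Python (the thresholds and bounds are my assumptions, not what WebRTC actually does):

```python
def adjust_bitrate(bitrate: int, send_latency_ms: float,
                   baseline_ms: float) -> int:
    """One step of a crude latency-based rate controller.

    If frames take noticeably longer to transmit than the probed
    baseline, the bottleneck queue is filling: back off hard.
    Otherwise creep back up to reclaim available bandwidth.
    """
    MIN_BPS, MAX_BPS = 500_000, 20_000_000  # assumed bounds
    if send_latency_ms > 1.5 * baseline_ms:
        bitrate = int(bitrate * 0.8)   # congestion: multiplicative decrease
    else:
        bitrate = int(bitrate * 1.05)  # headroom: gentle probe upward
    return max(MIN_BPS, min(MAX_BPS, bitrate))
```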

WebRTC will do this for you if you can use it, which actually suggests a different solution to this problem: use WebSockets to get past dumb corporate firewall rules, and use WebRTC for everything else.

eichin

Probably either (1) they don't request another JPEG until they have the previous one on-screen (so everything is completely serialized and there are no frames "in-flight" ever), or (2) they're doing a fresh GET for each frame and getting a new connection anyway (unless that kind of thing is pipelined these days, in which case it still falls back to (1) above).

01HNNWZ0MV43FF

You can still get this backpressure properly even if you're doing it push-style. The TCP socket will eventually fill up its buffer and start blocking your writes. When that happens, you stop encoding new frames until the socket is able to send again.

The trick is to not buffer frames on the sender.
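A minimal sketch of that push loop (capture_latest_frame and encode_jpeg are hypothetical placeholders): the blocking sendall() is the entire backpressure mechanism.

```python
import socket
import struct

def push_frames(sock: socket.socket, capture_latest_frame, encode_jpeg):
    """Push JPEG frames with natural TCP backpressure.

    sendall() blocks once the kernel send buffer fills, so the loop
    automatically stops encoding until the network drains -- and
    because each iteration grabs the *latest* frame, nothing stale
    ever queues up on the sender.
    """
    while True:
        frame = capture_latest_frame()            # hypothetical helper
        payload = encode_jpeg(frame)              # hypothetical helper
        header = struct.pack("!I", len(payload))  # length-prefix framing
        sock.sendall(header + payload)            # blocks under congestion
```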

adamjs

They might want to check out what VNC has been doing since 1998: keep the client-pull model, break the framebuffer up into tiles, and, when the client requests an update, diff against the last frame sent and ship only the changed tiles for the client to composite back in. (This is what VNC falls back to when it doesn't have damage-tracking from the OS compositor.)

This would really cut down on the bandwidth of static coding terminals, where 90% of the screen is just a cursor flashing or small bits of text moving.

If they really wanted to be ambitious they could also detect scrolling and do a client-side optimization that translates existing areas instead of resending them (look up the CopyRect command in VNC).
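A rough sketch of that client-pull tile diff, assuming numpy-style framebuffers (the tile size is an arbitrary choice):

```python
import numpy as np

TILE = 64  # tile edge in pixels (arbitrary)

def changed_tiles(prev: np.ndarray, curr: np.ndarray):
    """Yield (x, y, tile) for each TILE x TILE block that differs
    from the last frame sent; the client blits each tile at (x, y).
    Frames are H x W x 3 uint8 arrays of identical shape.
    """
    h, w = curr.shape[:2]
    for y in range(0, h, TILE):
        for x in range(0, w, TILE):
            block = curr[y:y + TILE, x:x + TILE]
            if not np.array_equal(block, prev[y:y + TILE, x:x + TILE]):
                yield x, y, block
```

With a mostly static terminal, the vast majority of tiles compare equal and never cross the wire.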

djmips

The blog post did smell of inexperience. Glad to hear there are other approaches - is something like that open source?

cogman10

Yup. Go look into TigerVNC if you want to see the source. You can also just search for "tigervnc h.264" and you'll see extensive discussions between the devs about h.264 and integrating it into TigerVNC. This is something people spent a LOT of brainpower on.

Dylan16807

> When the network is bad, you get... fewer JPEGs. That’s it. The ones that arrive are perfect.

You can still have weird broken stallouts, though.

I dunno, this article has some good problem-solving, but the biggest and mostly untouched issue is that they set the minimum h.264 bandwidth too high. H.264 can do a lot better than JPEG with a lot less bandwidth. But if you lock it at 40Mbps, of course it's flaky. Try 1Mbps and iterate from there.

And going keyframe-only is the opposite of how you optimize video bandwidth.

HelloUsername

> Try 1Mbps and iterate from there.

From the article:

> “Just lower the bitrate,” you say. Great idea. Now it’s 10Mbps of blocky garbage that’s still 30 seconds behind.

Dylan16807

Rejecting it out of hand isn't actually trying it.

10Mbps is still way too high of a minimum. It's more than YouTube uses for full motion 4k.

And it would not be blocky garbage, it would still look a lot better than JPEG.

vscode-rest

1Mbps for video is the rule of thumb I use. Of course that will depend on customer expectations. 500K can work, but it won’t be pretty.

brigade

Proper rate control for such realtime streaming would also lower framerate and/or resolution to maintain the best quality and latency they can over dynamic network conditions and however little bandwidth they have. The fundamental issue is that they don't have this control loop at all, and are badly simulating it by polling JPEGs.
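To illustrate (the rungs below are made-up numbers, not anything from the article): such a control loop would step down a quality ladder, trading fps and resolution before letting per-frame quality collapse, so text stays readable.

```python
# (width, height, fps, target bitrate) -- illustrative values only
LADDER = [
    (1920, 1080, 60, 8_000_000),
    (1920, 1080, 30, 4_000_000),
    (1280,  720, 30, 2_000_000),
    (1280,  720, 15, 1_000_000),
]

def next_rung(current: int, latency_rising: bool) -> int:
    """Step one rung down when latency trends up, one rung back up
    when the network has been stable; clamp to the ladder ends."""
    step = 1 if latency_rising else -1
    return max(0, min(len(LADDER) - 1, current + step))
```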

j45

It might be possible to buffer and queue JPEGs for playback as well, to help with those weird broken stallouts.

Video players used to call this buffering, and the stalls you saw when it failed were called buffering issues.

Players today can keep an eye on network quality while playing too, which is neat.

kccqzy

There are so many things that I would have done differently.

> We added a keyframes_only flag. We modified the video decoder to check FrameType::Idr. We set GOP to 60 (one keyframe per second at 60fps). We tested.

Why muck around with P-frames and keyframes? Just make your video 1fps.

> Now it’s 10Mbps of blocky garbage that’s still 30 seconds behind.

10 Mbps is way too much. I occasionally watch YouTube videos where someone writes code. I set my quality to 1080p to be comparable with the article, and YouTube serves me the video at way less than 1Mbps. I did some quick napkin math for a random coding video and it came out to 0.6Mbps. It’s not blocky garbage at all.

mdavid626

Setting it to 1 FPS might not be enough. The GOP or P-frame settings need to be adjusted to make every frame a keyframe.
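For example, with ffmpeg and libx264 (one encoder among many; the blog's actual encoder setup isn't shown), a GOP length of 1 makes every frame a keyframe:

```python
import subprocess

# -g 1 sets the GOP to one frame, i.e. every frame is an IDR keyframe.
subprocess.run([
    "ffmpeg", "-i", "capture.mp4",
    "-c:v", "libx264", "-g", "1",
    "all_keyframes.mp4",
], check=True)
```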


taberiand

This blog post smells of LLM, both in the language style and the muddled explanations / bad technical justifications. I wouldn't be surprised if their code is also vibe coded slop.

andai

Many moons ago I was using this software which would take a screenshot every five seconds and give you a little time lapse at the end of the day, so you could see how you were spending your computer time.

My hard disk ended up filling up with tens of gigabytes of screenshots.

I lowered the quality. I lowered the resolution, but this only delayed the inevitable.

One day I was looking through the folder and noticed that almost all the image data in almost all of these screenshots was identical.

What if I created some sort of algorithm which would allow me to preserve only the changes?

I spent embarrassingly long thinking about this before realizing that I had begun to reinvent video compression!

So I just wrote an ffmpeg one-liner and got like a 98% disk usage reduction :)
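Something along these lines would do it (the filenames and settings are my assumptions, wrapped in Python here):

```python
import subprocess

# Stitch numbered screenshots into a 1 fps H.264 timelapse; CRF
# trades file size for quality (higher CRF = smaller file).
subprocess.run([
    "ffmpeg", "-framerate", "1", "-i", "shot_%05d.jpg",
    "-c:v", "libx264", "-crf", "28", "-pix_fmt", "yuv420p",
    "timelapse.mp4",
], check=True)
```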

Tarean

Having pair-programmed over some truly awful and locked-down connections before, dropped frames are infinitely better than blurred frames that make text unreadable whenever the mouse moves. But 40Mbps seems like an awful lot for 1080p 60fps.

Temporal SVC (reducing framerate when bandwidth-constrained) is pretty widely supported by now, right? Though maybe not for H.264, so it probably would have scaled nicely, but only over WebRTC?

keerthiko

> The fix was embarrassingly simple: once you fall back to screenshots, stay there until the user explicitly clicks to retry.

There is another recovery option:

- increase the JPEG framerate every couple of seconds until bandwidth consumption approaches the H.264 stream bandwidth estimate

- keep track of latency changes. If the client reports a stable latency range, it is acceptable (<1s latency, <200ms variance?), and bandwidth use has reached 95% of the H.264 estimate, re-activate the stream (see the sketch below)
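A sketch of that re-activation check (the thresholds come from the bullets above; the 20-sample window is my assumption):

```python
def should_reactivate_stream(latency_s: list[float],
                             jpeg_bw: float,
                             h264_bw_estimate: float) -> bool:
    """Re-enable H.264 once JPEG polling has ramped to ~95% of the
    estimated stream bandwidth and recent latency reports look
    stable (<1s mean, <200ms spread)."""
    recent = latency_s[-20:]  # assumed sampling window
    if len(recent) < 20 or jpeg_bw < 0.95 * h264_bw_estimate:
        return False
    mean = sum(recent) / len(recent)
    spread = max(recent) - min(recent)
    return mean < 1.0 and spread < 0.2
```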

Given that text/code is what is being viewed, lower resolutions and adaptive streaming (HLS) are not really viable solutions, since text becomes unreadable at lower resolutions.

If remote screen sharing is a core feature of the service, I think this is a reasonable next step for the product.

That said, IMO at a higher level, if you know what you're streaming is human-readable text, it's better to pipe application data to the client rather than encoding screen-space video. That does, however, require building bespoke decoders and client viewers if real-time collaboration clients don't already exist for the tools (but SSH and RTC code editors do exist).

STELLANOVA

We did something similar 12+ years ago, `streaming` an app running on AWS into the browser. Basically you could run 3D Studio Max on a Chromebook. The app actually runs on an AWS instance, which just sends JPEGs to the browser to `stream` it. We did a lot of QoS logic and other stuff, but it actually worked pretty nicely. Adobe used it for some time to let users run Photoshop in the browser. Good old days...

laurencerowe

If you are OK with a second or so of latency, then MPEG-DASH (the ISO-standardized counterpart to HTTP Live Streaming) is likely the best bet. You simply serve the video chunks over HTTP, so it should be just as compatible as the JPEG solution used here while providing 60fps video rather than crappy JPEGs.

The standard supports adaptive-bitrate playback, so you can provide both low-quality and high-quality renditions and players can switch depending on the available bandwidth.

robrain

"Think “screen share, but the thing being shared is a robot writing code.”"

Thinks: why not send text instead of graphics, then? I'm sure it's more complicated than that...

jodrellblank

Thinks: this video[1] is the processed feed from the Huygens space probe landing on Saturn's moon Titan circa 2005, relayed through the Cassini probe orbiting Saturn, 880 million miles from the Sun, at a total mission cost of 3.25 billion dollars. This is the sensor data: altitude, speed, spin, ultraviolet, and hundreds of photos. (Read the description for what the audio is encoding; it's neat!)

Look at the end of the video: the photometry data count stops at "7996 kbytes received" (!)

> "Turns out, 40Mbps video streams don’t appreciate 200ms+ network latency. Who knew. “Just lower the bitrate,” you say. Great idea. Now it’s 10Mbps of blocky garbage"

Who could do anything useful with 10Mbps? :/

[1] https://en.wikipedia.org/wiki/File:Huygens_descent.ogv

bambax

Yeah, I'm thinking the same thing. Capture the text somehow and send that, and reconstruct it on the other end; and the best part is you only need to send each new character, not the whole screen, so it should be very small and lightning fast?

Snild

Sounds kind of like https://asciinema.org/ (which I've never used, but it seems cool).

andai

I recognize this voice :) This is Claude.

karhuton

I made this because I got tired of screen-sharing issues in corporate environments: https://bluescreen.live (code on GitHub).

Screenshot once per second. Works everywhere.

I’m still waiting for mobile screen-share API support, so I could quickly use it to show stuff from my phone to other phones via the QR link.