I designed my own fast game streaming video codec

I designed my own fast game streaming video codec – PyroWave

78 comments

·July 28, 2025

Visit

Thaxll

There is the creator of VLC that is working on something similar, very cutting edge.

https://streaminglearningcenter.com/codecs/an-interview-with...

Ultra low latency for streaming.

https://www.youtube.com/watch?v=0RvosCplkCc

torginus

Having worked in the space, I'd have to say hardware encoders and H.264 is pretty dang good - NVENC works with very little latency (if you tell it to, and disable the features that increase it, such as multiple frame prediction, B-frames).

The two things that increase latency are more advanced processing algorithms, giving the encoder more stuff to do, and schemes that require waiting multiple frames. If you go disable those, the encoder can pretty much start working on your frame the nanosecond the GPU stops rendering to it, and have it encoded in <10ms.

Wowfunhappy

> have it encoded in <10ms.

For context, OP achieved 0.13 ms with his codec.

pjc50

"0.13 ms on a RX 9070 XT on RADV."

"interesting data point is that transferring a 4K RGBA8 image over the PCI-e bus is far slower than compressing it on the GPU like this, followed by copying over the compressed payload."

"200mbit/s at 60 fps"

It's certainly a very different set of tradeoffs, using a lot more bandwidth.

dishsoap

10ms is quite long in this context.

RobRivera

>10 ms

Do not shame this dojo.

latchkey

Sadly appears to be unavailable.

_kb

This is a really nice walkthrough of matching trade offs to acceptable distortions for a known signal type. Even if you’re selecting rather than designing a codec, it’s a great process to follow.

For those interesting in the ultra low latency space (where you’re willing to trade a bit of bandwidth to gain quality and minimise latency), VSF have a pretty good wrap up of other common options and what they each optimise for: https://static.vsf.tv/download/technical_recommendations/VSF...

sippeangelo

I know next to nothing about video encoding, but I feel like there should be so much low hanging fruit when it comes to videogame streaming if the encoder just cooperated with the game engine even slightly. Things like motion prediction would be free since most rendering engines already have a dedicated buffer just for that for its own rendering, for example. But there's probably some nasty patent hampering innovation there, so might as well forget it!

torginus

'Motion vectors' in H.264 are a weird bit twiddling/image compression hack and have nothing to do with actual motion vectors.

- In a 3d game, a motion vector is the difference between the position of an object in 3d space from the previous to the current frame

- In H.264, the 'motion vector' is basically saying - copy this rectangular chunk of pixels from some point from some arbitrary previous frame and then encode the difference between the reference pixels and the copy with JPEG-like techniques (DCT et al)

This block copying is why H.264 video devolves into a mess of squares once the bandwidth craps out.

pornel

Motion vectors in video codecs are an equivalent of a 2D projection of 3D motion vectors.

In typical video encoding motion compensation of course isn't derived from real 3D motion vectors, it's merely a heuristic based on optical flow and a bag of tricks, but in principle the actual game's motion vectors could be used to guide video's motion compensation. This is especially true when we're talking about a custom codec, and not reusing the H.264 bitstream format.

Referencing previous frames doesn't add latency, and limiting motion to just displacement of the previous frame would be computationally relatively simple. You'd need some keyframes or gradual refresh to avoid "datamoshing" look persisting on packet loss.

However, the challenge is in encoding the motion precisely enough to make it useful. If it's not aligned with sub-pixel precision it may make textures blurrier and make movement look wobbly almost like PS1 games. It's hard to fix that by encoding the diff, because the diff ends up having high frequencies that don't survive compression. Motion compensation also should be encoded with sharp boundaries between objects, as otherwise it causes shimmering around edges.

CyberDildonics

Motion vectors in video codecs are an equivalent of a 2D projection of 3D motion vectors.

3D motion vectors always get projected to 2D anyway. They also aren't used for moving blocks of pixels around, they are floating point values that get used along with a depth map to re-rasterize an image with motion blur.

robterrell

Isn't the use of the H.264 motion vector to preserve bit when there is a camera pan? A pan is a case where every pixel in the frame will change, but maybe doesn't have to.

superjan

Yes, or when a character moves across the screen. They are quite fine grained. However, when the decoder reads the motion vectors from the bitstream, it is typically not supposed to attach meaning to them: they could point to a patch that is not the same patch in the previous scene, but looks similar enough to serve as a starting point.

ChadNauseam

I think you're right. Suppose the connection to the game streaming service adds two frames of latency, and the player is playing an FPS. One thing game engines could do is provide the game UI and the "3D world view" as separate framebuffers. Then, when moving the mouse on the client, the software could translate the 3D world view instantly for the next two frames that came from the server but are from before the user having moved their mouse.

VR games already do something like this, so that when a game runs at below the maximum FPS of the VR headset, it can still respond to your head movements. It's not perfect because there's no parallax and it can't show anything for the region that was previously outside of your field of view, but it still makes a huge difference. (Of course, it's more important for VR because without doing this, any lag spike in a game would instantly induce motion sickness in the player. And if they wanted to, parallax could be faked using a depth map)

WantonQuantum

A simple thing to start with would be akin to Sensor Assisted Video Encoding where phone accelerometers and digital compasses are used to give hints to video encoding: https://ieeexplore.ieee.org/document/5711656

Also, for 2d games a simple sideways scrolling game could give very accurate motion vectors for the background and large foreground linearly moving objects.

I'm surprised at the number of people disagreeing with your idea here. I think HN has a lot of "if I can't see how it can be done then it can't be done" people.

Edit: Also any 2d graphical overlays like HUDs, maps, scores, subtitles, menus, etc could be sent as 2d compressed data, which could enable better compression for that data - for example much sharper pixel perfect encoding for simple shapes.

mikepurvis

I’ve wondered about this as well, like most clients should be capable of still doing a bit of compositing. Like if you sent billboard renders of background objects at lower fidelity/frequency than foreground characters, updated hud objects with priority and using codecs that prioritize clarity, etc.

It was always shocking to me that Stadia was literally making their own games in house and somehow the end result was still just a streamed video and the latency gains were supposed to come from edge deployed gpus and a wifi-connected controller.

Then again, maybe they tried some of this stuff and the gains weren't worth it relative to battle-tested video codecs.

toast0

For 2d sprite games, OMG yes, you could provide some very accurate motion vectors to the encoder. For 3d rendered games, I'm not so sure. The rendering engine has (or could have) motion vectors for the 3d objects, but you'd have to translate them to the 2d world the encoder works in; I don't know if it's reasonable to do that ... or if it would help the encoder enough to justify.

sudosysgen

Schemes like DLSS already do provide 2D motion vectors, it's not necessarily a crazy ask.

markisus

The ultimate compression is to send just the user inputs and reconstitute the game state on the other end.

w-ll

The issue is the "reconstitute the game state on the other end" when it comes to at least how I travel.

I haven't in a while but I used to use https://parsec.app/ on a cheap intel Air to do my STO dailies on vacation. It sends inputs, but gets a compressed stream. Im curious of any OS of something similar.

null

[deleted]

IshKebab

I don't think games do normally have a motion vector buffer. I guess they could render one relatively easily, but that's a bit of a chicken and egg problem.

garaetjjte

They do, one reason is postprocessing effects like motion blur, another is antialiasing like TAA or DLSS upscaling.

IshKebab

Yeah I did almost mention motion blur but do many games use that? I don't play many AAA games TBF so maybe I'm just out of date...

Take something like Rocket League for example. Definitely doesn't have velocity buffers.

shmerl

Many games have it, but I always turn it off. I guess some like its cinematic effect, but I prefer less motion blur, not more.

ACCount36

Exposing motion vectors is a prerequisite for a lot of AI framegen tech. If you could tap that?

keketi

Have an LLM transcribe what is happening in the game into a few sentences per frame, transfer the text over network and have another LLM reconstruct the frame from the text. It won't be fast, it's going to be lossy, but compression ratio is insane and it's got all the right buzzwords.

jameshart

Frame 1:

You are standing in an open field west of a white house, with a boarded front door. There is a small mailbox here.

Eduard

(user input: mouse delta: (-20, -8))

Frame 2:

A few blades of grass sway gently in the breeze. The camera begins to drift slightly, as if under player control — a faint ambient sound begins: wind and birds.

Y_Y

kill jester

taneq

Ah, this explains why there are clowns under the bed and creepy children staring at me from the forest.

cyclotron3k

Send the descriptions via the blockchain so there's an immutable record

poglet

Maybe even one day we reach point where the game can run locally on the end users' machine.

foota

You've got my attention

raphman

Very cool - That's nearly exactly what I need for a research project.

FWIW, there's also the non-free JPEG-XS standard [1] which also claims very low latency [2] and might be a safer choice for commercial projects, given that there is a patent pool around it.

[1] https://www.jpegxs.com/

[2] https://ds.jpeg.org/whitepapers/jpeg-xs-whitepaper.pdf

jamesfmilne

JPEG-XS is great for low latency, but it uses more bandwidth. We're using it for low-latency image streaming for film/TV post production:

https://www.filmlight.ltd.uk/store/press_releases/filmlight-...

We currently use the IntoPIX CUDA encoder/decoder implementation, and SRT for the low-level transport.

You can definitely achieve end-to-end latencies <16ms over decent networks.

We have customers deploying their machines in data centres and using them in their post-production facilities in the centre of town, usually over a 10GbE link. But I've had others using 1GbE links between countries, running at higher compression ratios.

indolering

A patent pool doesn't make you safer: it's just a patent troll charging you to cross the bridge. They are not offering insurance against more patent trolls blackmailing you after you cross the bridge.

raphman

While I am personally opposed to software patents, I'd argue that the JPEG XS patent holders [1] are not 'patent trolls' in any meaningful sense of the word.

While I have no personal experience on that topic, I'd assume that a codec with a patent pool is a safer bet for a commercial project. Key aspects being protected by patents makes it less likely that some random patent troll or competitor extorts you with some nonsense patent. Also, using e.g., JPEG XS instead of e.g., pyrowave also ensures that you won't be extorted by the JPEG XS patent holders.

One may call this a protection racket - but under the current system, it may make economical sense to pay for a license instead of risking expensive law suits.

[1] https://www.jpegxspool.com/

Cadwhisker

This CODEC uses the same base algorithm as HTJ2K (High-Throughput JPEG 2000).

If the author is reading this, it would be very interesting to read about the differences between this method and HTJ2K.

CharlesW

> Given how niche and esoteric this codec is, it’s hard to find any actual competing codecs to compare against.

It'd be interesting to see benchmarks against H.264/AVC (see example "zero‑latency" ffmpeg settings below) and JPEG XS.

  -c:v libx264 -preset ultrafast -tune zerolatency \
  -x264-params "keyint=1:min-keyint=1:scenecut=0:rc-lookahead=0" \
  -bf 0 -b:v 8M -maxrate 8M -bufsize 1M

mleonhard

https://github.com/Themaister/pyrowave

monster_truck

Looks like NDI without any of the conveniences.

You're doing something wrong if nvenc is any slower, the llhp preset should be all you need.

sitkack

I'd want to know how it compares to https://github.com/phoboslab/qoi

One thing to note when designing a new video codec is to carpet bomb around the idea with research projects to stake claim to any possible feature enhancements.

Anything can have an improvement patent filed against, no matter the license.

Fidelix

Unbelievable... Good job mate.

Can't wait until one day this gets into Moonlight or something like it.

cpeth

Exactly what I was thinking. Wish I had the time and expertise to give adding support for this codec myself a go. Streaming Clair Obscure over my LAN via Sunshine / Moonlight is exactly my use-case and the latency could definitely be better.

kookamamie

Not bad. The closest competition would be NDI from NewTek, now Vizrt. It targets similar bitrate and latency ranges.