
Bilinear down/upsampling, aligning pixel grids, and that infamous GPU half pixel (2021)

waynecochran

Great article. That half-pixel offset is crucial for understanding proper rasterization of detailed 3D meshes. Avoiding seams, holes, and overlaps on shared triangle edges relies on knowing exactly where the pixel centers are. This carries over to image processing and sampling. There are a few of us who lose sleep over these tiny details -- I am happy when I see kindred spirits with the same attention to detail.
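To make the convention concrete: a toy numpy sketch (mine, not from the article) of the half-pixel grid, where pixel i of an n-pixel image has its center at (i + 0.5) / n:

```python
import numpy as np

def pixel_centers(n):
    # Half-pixel convention: pixel i covers [i/n, (i+1)/n) in
    # normalized [0, 1] space, so its center sits at (i + 0.5) / n,
    # not at i / (n - 1).
    return (np.arange(n) + 0.5) / n

# Downsampling 4 -> 2: each coarse pixel center lands exactly on the
# average of the two fine centers it covers, so the grids stay aligned.
fine = pixel_centers(4)    # [0.125, 0.375, 0.625, 0.875]
coarse = pixel_centers(2)  # [0.25, 0.75]
assert np.allclose(coarse, fine.reshape(2, 2).mean(axis=1))
```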


qwertox

On a similar note, yesterday I stumbled across a 4-day-old video related to dithering which is really worth watching: "Surface-Stable Fractal Dithering" [0] by Rune Skovbo Johansen, "indie game developer and procedural generation enthusiast".

[0] https://www.youtube.com/watch?v=HPqGaIMVuLs

magicalhippo

It was also covered here 4 days ago[1], again I might add.

Very cool effect, and one of my absolute favorite games.

[1]: https://news.ycombinator.com/item?id=42808889

lcnielsen

I spent a substantial part of my Ph.D. dealing with this stuff; existing frameworks and algorithms were not suitable for our exact use case and implementation. I had more bugs/artefacts than I'd like to admit due to some off-by-less-than-0.5 in some particular direction in some particular frame...

johncostella

I'm biased, but Magic Kernel Sharp does a good job for many applications. :)

https://johncostella.com/magic/

calibas

Bilinear is nice for performance, but if you're concerned about image quality you generally want to use bicubic resampling instead.
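To show where the quality difference comes from, here is a sketch (mine) of the Keys cubic-convolution kernel with a = -0.5, which is what "bicubic" usually means in practice: it weighs a 4-sample neighborhood per axis instead of bilinear's 2, hence both the smoother results and the extra cost.

```python
def cubic_keys(x, a=-0.5):
    # Keys' cubic-convolution kernel; a = -0.5 is the common
    # "bicubic" choice. Support is 4 samples vs bilinear's 2.
    x = abs(x)
    if x < 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

# The four weights around any sample position sum to 1, so flat
# regions of the image stay flat after resampling.
for t in (0.0, 0.25, 0.5):
    w = [cubic_keys(t - i) for i in (-1, 0, 1, 2)]
    assert abs(sum(w) - 1.0) < 1e-12
```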

jeffreygoesto

Or Lanczos resampling.

daneel_w

For upscaling, sure. For downscaling it has a tendency to both amplify and distort (rather than preserve) detail and sharpness, "imagining" detail that never existed, leaving you with a worse result. Bicubic and sometimes even bilinear do a better job, truer to the original, when you're scaling down.
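The distortion daneel_w describes comes from the kernel's negative lobes; a quick sketch (mine) of the standard Lanczos-a kernel makes them visible:

```python
import math

def lanczos(x, a=3):
    # Lanczos kernel: sinc(x) windowed by sinc(x/a), truncated
    # to zero outside |x| < a.
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

# The negative lobes overshoot at edges ("ringing"), which reads as
# extra sharpness when upscaling but as invented detail when the
# downscaled signal is already near the Nyquist limit.
assert lanczos(1.5) < 0 < lanczos(0.5)
```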

magicalhippo

Lanczos if you want sharpness above all else. There's also the well-known Mitchell-Netravali filters[1], based on piecewise cubic splines and with two tunable parameters.

[1]: https://en.wikipedia.org/wiki/Mitchell%E2%80%93Netravali_fil...
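A sketch (mine, following the formula in the linked article) of that two-parameter family; B and C trade blur against ringing:

```python
def mitchell_netravali(x, B=1/3, C=1/3):
    # Two-parameter piecewise-cubic family from Mitchell & Netravali.
    # B = C = 1/3 is their recommended trade-off; B = 0, C = 0.5 is
    # Catmull-Rom; B = 1, C = 0 is the (blurry) cubic B-spline.
    x = abs(x)
    if x < 1:
        return ((12 - 9*B - 6*C) * x**3
                + (-18 + 12*B + 6*C) * x**2
                + (6 - 2*B)) / 6
    if x < 2:
        return ((-B - 6*C) * x**3
                + (6*B + 30*C) * x**2
                + (-12*B - 48*C) * x
                + (8*B + 24*C)) / 6
    return 0.0

# Like the other cubics, the kernel partitions unity for any B, C,
# so flat areas survive resampling unchanged.
w = sum(mitchell_netravali(0.3 - i) for i in (-1, 0, 1, 2))
assert abs(w - 1.0) < 1e-12
```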

jorvi

I don't know the fine details of it, but afaik FSR <3.0 is Lanczos with a few tricks to both speed it up and make the visuals more stable.

Dwedit

Whenever I want to upscale, I use an Edge-Directed algorithm, such as SuperXBR. Diagonal lines stay diagonal and don't turn into blurry boxes.

Demiurge

What would it look like with nearest or Gaussian filters? It would be interesting to see the same sample graphs with different kernels.


Const-me

Good article, but IMO none of the discussed downsampling methods are actually ideal.

The ideal is trilinear sampling instead of bilinear. It's easy to do with GPU APIs: in D3D11, use the ID3D11DeviceContext.GenerateMips API call to generate a full set of mip levels of the input texture, then generate each output pixel with a trilinear sampler. When doing non-uniform downsampling, use an anisotropic sampler instead of a trilinear one.

Not sure high level image processing libraries like the mentioned numpy, PIL or TensorFlow are doing anything like that, though.

bartwr

Original author here, and it's been a while since I have read such word-salad nonsense, sorry. Why do people with no relevant expertise comment on articles? GenerateMips constructs a mip chain by applying a box/bilinear filter (equivalent for factors of two) log2(N) times.

Trilinear interpolates across three dimensions, such as 3D textures or mip chains. It is not a method for downsampling, but a method for filtering that interpolates two bilinear results, such as two bilinear filters of mip levels that were generated with "some" downsampling filter (which can be anything from box to Lanczos).

Anisotropic is a hybrid: trilinear across the shorter interpolation axis under perspective projection of a 3D asset, plus multiple taps along the longer axis. (More expensive.)
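That distinction fits in a few lines of numpy (a 1D toy of mine, with a box-filtered mip and half-pixel coordinate mapping): the downsampling happens when the mip is built, and "trilinear" is only the lerp between the two per-level linear taps.

```python
import numpy as np

def linear_sample(img, x):
    # Linear tap (1D analogue of bilinear) at continuous coordinate x,
    # with pixel centers at integer positions; clamps at the borders.
    i = int(np.floor(x))
    i = max(0, min(i, len(img) - 2))
    f = x - i
    return (1 - f) * img[i] + f * img[i + 1]

# Mip 0 plus a 2x box-filtered mip 1, as GenerateMips would produce.
mip0 = np.array([0.0, 4.0, 4.0, 0.0])
mip1 = mip0.reshape(2, 2).mean(axis=1)  # [2.0, 2.0]

def trilinear(x, level_frac):
    # "Trilinear" here = lerp between the two per-level linear taps.
    a = linear_sample(mip0, x)
    b = linear_sample(mip1, x / 2 - 0.25)  # half-pixel mapping to mip1
    return (1 - level_frac) * a + level_frac * b
```

At x = 1.5 the mip-0 tap gives 4.0 and the mip-1 tap gives 2.0; varying level_frac just blends between results that some downsampling filter already produced.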

Const-me

> Trilinear interpolates across three dimensions, such as 3D textures or mip chains

I meant trilinear interpolation across the mip chain.

> generated with "some" downsampling filter (which can be anything from box to Lanczos)

In practice, whichever method is implemented in user-mode half of GPU drivers is pretty good.

> It is not a method for downsampling

No, but it can be applied for downsampling as well.

> under perspective projection of a 3D asset

Texture samplers don’t know or care about projections. They only take 2D texture coordinates and screen-space derivatives of those coordinates. This is precisely what makes it possible to use texture samplers to downsample images.

The only caveat: if you do that by dispatching a compute shader as opposed to rendering a full-screen triangle, you’ll have to supply the screen-space derivatives manually in the arguments of the Texture2D.SampleGrad method. When doing non-uniform downsampling without perspective projection, these ddx/ddy values are the same for all output pixels and are trivial to compute on the CPU before dispatching the shader.

> More expensive

On modern computers, the performance overhead of anisotropic sampling compared to trilinear is just barely measurable.

incrudible

The problem is that your suggestion is strictly worse and unnecessarily complicated for the case discussed in the article. If you want to downsample 2x, 4x, etc, then that's just one level in the MIP hierarchy, no need to compute the rest. The point of the article however is to explain how one level in that MIP chain can be computed in the first place.

If you want to downsample 3x or fractional, then interpolating between two MIP levels is gonna be worse quality than directly sampling the original image.

Perspective (the use case for anisotropic filtering) isn't discussed in the article, but even then, the best quality will come from something like an EWA filter, not from anisotropic filtering which is designed for speed, not quality.
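The integer-factor point is easy to check in numpy (a 1D toy of mine, assuming a plain box filter): two 2x mip steps compose into exactly one 4x downsample, so an integer-factor result is just a single level of the pyramid with no blending involved.

```python
import numpy as np

sig = np.arange(8, dtype=np.float64)

# One direct 4x box average...
direct = sig.reshape(2, 4).mean(axis=1)

# ...equals two successive 2x box (mip) steps. Fractional factors
# like 3x have no such level, which is where mip blending falls
# short of directly filtering the original image.
mip1 = sig.reshape(4, 2).mean(axis=1)
mip2 = mip1.reshape(2, 2).mean(axis=1)
assert np.allclose(direct, mip2)
```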

mschuetz

Trilinear requires an image pyramid. Without downsampling to create that image pyramid, you can't even do trilinear sampling, so your argument strikes me as odd and circular. Like telling developers of APIs such as ID3D11DeviceContext.GenerateMips to simply use ID3D11DeviceContext.GenerateMips instead of developing ID3D11DeviceContext.GenerateMips. Also, I never took this article to be about 3D rendering and utilizing mip maps for trilinear interpolation. More about 2D image scaling.

Have you never downscaled and upscaled images in a non-3D-rendering context?

Const-me

> Have you never downscaled and upscaled images in a non-3D-rendering context?

Indeed, and I found that leveraging hardware texture samplers is the best approach even for command-line tools which don’t render anything.

A plain C++ implementation on the CPU is just too slow for large images.

Apart from a few easy cases like the 2x2 downsampling discussed in the article, SIMD-optimized CPU implementations are very complicated for non-integer or non-uniform scaling factors, and they often require dynamic dispatch to run on older computers without AVX2. And despite the SIMD, they are still a couple of orders of magnitude slower than GPU hardware while only delivering a barely observable quality win.