
Faster Argmin on Floats


3 comments

September 18, 2025

why_only_15

This trick is very useful on Nvidia GPUs for computing mins and maxes in some cases, e.g. atomic mins (u32 has better atomic support than f32) or warp-wide mins with `redux.sync` (which supports u32 but not f32).
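For context, one common way to make floats comparable as unsigned integers is to flip every bit of a negative value and set the sign bit of a non-negative one. The sketch below assumes that particular transform (the thread does not spell it out), is written in Rust to match the iterator methods mentioned later, and uses a hypothetical function name. On a GPU the same mapping would be applied before the u32 atomic or `redux.sync` min, with the winning key mapped back afterward.

```rust
/// Hypothetical helper: maps an f32 to a u32 whose unsigned order matches
/// the float's numeric order, so integer min/max can stand in for float min/max.
fn f32_to_ordered_u32(x: f32) -> u32 {
    let bits = x.to_bits();
    if bits & 0x8000_0000 != 0 {
        // Negative: flip all bits so larger magnitudes get smaller keys.
        !bits
    } else {
        // Non-negative: set the sign bit so these sort above every negative.
        bits | 0x8000_0000
    }
}
```

With this mapping, -0.0 sorts just below +0.0, negative NaNs sort below -inf, and positive NaNs sort above +inf; that is the same ordering as Rust's `f32::total_cmp`.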

TheDudeMan

How fast is it if you write a for loop that keeps track of the index and value of the smallest element (possibly treating the values as ints)?
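A minimal sketch of the loop described above, assuming the same bit-flipping key transform as a common choice (the helper `ordered_key` and the function name are hypothetical). It compares integer keys rather than floats and keeps the first index on ties.

```rust
/// Hypothetical helper: order-preserving f32 -> u32 key
/// (negatives: flip all bits; non-negatives: set the sign bit).
fn ordered_key(x: f32) -> u32 {
    let bits = x.to_bits();
    if bits & 0x8000_0000 != 0 { !bits } else { bits | 0x8000_0000 }
}

/// Explicit for-loop argmin that tracks the best key and its index as integers.
/// Returns None for an empty slice and the earliest index on ties.
fn argmin_loop(xs: &[f32]) -> Option<usize> {
    let (&first, rest) = xs.split_first()?;
    let mut best_key = ordered_key(first);
    let mut best_idx = 0;
    for (i, &x) in rest.iter().enumerate() {
        let k = ordered_key(x);
        // Strict `<` keeps the earliest index when keys tie.
        if k < best_key {
            best_key = k;
            best_idx = i + 1; // +1 because `rest` starts at index 1
        }
    }
    Some(best_idx)
}
```

For example, `argmin_loop(&[3.0, -1.5, 2.0, -1.5])` returns `Some(1)`.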

nine_k

I'd hazard a guess that it would be the same, because the compiler would produce a loop out of .iter(), expose the loop index via .enumerate(), and keep track of that index in .min_by(). I suppose the closure would be inlined, possibly along with the comparisons.
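For comparison, here is a sketch of the iterator form, written two ways: once over integer keys (assuming the same hypothetical `ordered_key` transform) and once with the standard library's `f32::total_cmp`, which orders floats by IEEE 754 totalOrder. Both `min_by` and `min_by_key` return the first element when several are equally minimal, so the tie behavior matches a strict-`<` loop. Whether they compile to identical code is not something this sketch establishes.

```rust
/// Hypothetical helper: order-preserving f32 -> u32 key
/// (negatives: flip all bits; non-negatives: set the sign bit).
fn ordered_key(x: f32) -> u32 {
    let bits = x.to_bits();
    if bits & 0x8000_0000 != 0 { !bits } else { bits | 0x8000_0000 }
}

/// Iterator argmin comparing integer keys.
fn argmin_keys(xs: &[f32]) -> Option<usize> {
    xs.iter()
        .enumerate()
        .min_by_key(|&(_, &x)| ordered_key(x))
        .map(|(i, _)| i)
}

/// Iterator argmin comparing floats directly with `f32::total_cmp`.
fn argmin_total_cmp(xs: &[f32]) -> Option<usize> {
    xs.iter()
        .enumerate()
        .min_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}
```

Because the key transform and `total_cmp` induce the same ordering, the two functions should return the same index for any input, including NaNs and signed zeros.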

I wonder whether that could be made faster with AVX instructions: they can find the minimum among several u32 values at once, but not directly its index.
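One way around the "value but not index" problem is to split the work into two passes: a plain unsigned min reduction (the kind of loop compilers can often auto-vectorize into packed-min instructions), then a search for the first position holding that minimum. This is a swapped-in technique rather than hand-written AVX intrinsics, and it assumes the same hypothetical `ordered_key` transform; whether it beats a single pass depends on the data and on what the compiler actually emits.

```rust
/// Hypothetical helper: order-preserving f32 -> u32 key
/// (negatives: flip all bits; non-negatives: set the sign bit).
fn ordered_key(x: f32) -> u32 {
    let bits = x.to_bits();
    if bits & 0x8000_0000 != 0 { !bits } else { bits | 0x8000_0000 }
}

/// Two-pass argmin: a min-only reduction over integer keys,
/// then a linear search for the first index with that key.
fn argmin_two_pass(xs: &[f32]) -> Option<usize> {
    // Pass 1: no index bookkeeping, just an unsigned min reduction.
    let min_key = xs.iter().map(|&x| ordered_key(x)).min()?;
    // Pass 2: stop at the first element whose key equals the minimum.
    xs.iter().position(|&x| ordered_key(x) == min_key)
}
```

The trade-off is reading the data twice in exchange for a first pass with no index tracking; the second pass exits early once the minimum is found.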