27000 Dragons and 10'000 Lights: GPU-Driven Clustered Forward Renderer
16 comments
May 20, 2025
unclad5968
zokier
[delayed]
logdahl
Well, the core issue is still drawing. I took another look at some profiles and it seems it's not the renderer limiting this to 27k! I still had some stupid scene-graph traversal... Clustering and culling take 53 µs and 33 µs respectively, but the draw is 7 ms. So a frame is about 7 ms on the GPU side, and some 100-200 µs on the CPU side.
Should really dive deeper and update the measurements for final results...
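For a rough sense of the proportions in those numbers, here is a small C++14 sketch of the arithmetic. The constant names are made up for illustration, and it assumes the three GPU passes run back-to-back without overlapping, which the profile may or may not confirm:

```cpp
#include <cstdint>

// GPU timings as quoted in the comment, in microseconds.
// All names here are hypothetical, not taken from the renderer.
constexpr std::int64_t kClusteringUs = 53;
constexpr std::int64_t kCullingUs    = 33;
constexpr std::int64_t kDrawUs       = 7'000;  // "7ms"

// Assumption: the passes execute sequentially with no overlap.
constexpr std::int64_t kComputeUs  = kClusteringUs + kCullingUs;  // 86 us
constexpr std::int64_t kGpuFrameUs = kComputeUs + kDrawUs;        // 7086 us

// Fraction of the GPU frame spent in clustering + culling (about 1.2%).
constexpr double kComputeShare =
    static_cast<double>(kComputeUs) / static_cast<double>(kGpuFrameUs);

static_assert(kComputeShare < 0.02, "clustering + culling are a small slice of the frame");
```

Under that assumption, clustering and culling together are roughly 1% of the GPU frame, so the draw itself is where any further savings would have to come from.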
gmueckl
This seems fairly well optimized. There's probably room to squeeze out some more perf, but not dramatic improvements. Maybe preventing overdraw of shaded pixels by doing a depth prepass would help.
Without digging into the detailed breakdown, I would assume that the sheer amount of teeny tiny triangles is the main bottleneck in this benchmark scene. When triangles become smaller than about 4x4 pixels, GPU utilization for rasterization starts to diminish. And with the scaled down dragons, there's a lot of them in the frame.
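One crude way to check how many triangles fall under that size is to look at each triangle's screen-space bounding box. The following is only a sketch: `Vec2`, `min3`, `max3`, and `is_tiny_triangle` are hypothetical names, the vertices are assumed to be already projected to pixel coordinates, and the 4-pixel default is just the rough figure mentioned above, not a hardware constant:

```cpp
// Hypothetical screen-space position in pixels (after projection and viewport transform).
struct Vec2 { float x, y; };

constexpr float min3(float a, float b, float c) {
    return a < b ? (a < c ? a : c) : (b < c ? b : c);
}

constexpr float max3(float a, float b, float c) {
    return a > b ? (a > c ? a : c) : (b > c ? b : c);
}

// Returns true when the triangle's pixel bounding box is smaller than
// `threshold` pixels in both width and height.
constexpr bool is_tiny_triangle(Vec2 a, Vec2 b, Vec2 c, float threshold = 4.0f) {
    return (max3(a.x, b.x, c.x) - min3(a.x, b.x, c.x)) < threshold &&
           (max3(a.y, b.y, c.y) - min3(a.y, b.y, c.y)) < threshold;
}
```

Counting how many visible triangles pass this test per frame would give a first hint of whether small-triangle rasterization is the limiting factor. The bounding box overestimates the covered area, so this is a coarse filter rather than a precise measure.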
rezmason
Ten thousand lights! Your utility bill must be enormous
zeristor
Apostrophe as a number separator?
Where’s that from?
dahart
Switzerland and Italy for two. https://en.wikipedia.org/wiki/Decimal_separator#
Also note C++14 introduced the apostrophe in numeric literals! https://en.cppreference.com/w/cpp/language/integer_literal
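To make the notation concrete, here is a minimal C++14 sketch. The constant names are invented for illustration; the separator is ignored by the compiler and only groups digits for the reader:

```cpp
#include <cstdint>

// C++14 digit separators: the apostrophe does not change the value.
// Constant names are hypothetical, chosen to match the post title.
constexpr std::uint32_t kLightCount  = 10'000;   // same value as 10000
constexpr std::uint32_t kDragonCount = 27'000;   // same value as 27000
constexpr std::uint32_t kMask        = 0xFF'FF;  // separators also work in hex literals

static_assert(kLightCount == 10000, "separator does not change the value");
static_assert(kDragonCount == 27000, "separator does not change the value");
static_assert(kMask == 65535, "separator does not change the value");
```

The grouping is free-form, so `1'0000` would compile to the same value as `10'000`; sticking to groups of three is purely a readability convention.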
qingcharles
I've started using the underscore in my code since that is becoming the (non-localized) standard and trendy:
https://en.wikipedia.org/wiki/Integer_literal#Digit_separato...
logdahl
Interesting that Sweden explicitly does NOT use it... Not sure where I picked it up! :-)
lacoolj
Learn somethin new every day.
And I would never have known this existed without hackernews
curtisszmania
[dead]
This is awesome! At the end you mention that the 27k dragons and 10k lights just barely fit in 16ms. Do you see any paths to improve performance? I've seen some demos with tens/hundreds of thousands of moving lights, but it's hard to tell if they're legit or highly constrained. I'm not a graphics programmer by trade.
I need a renderer for a personal project and after some research decided I'll implement a forward clustered renderer as well.