
Simulating a Planet on the GPU: Part 1 (2022)

jkhdigital

The tectonics.js blog has some really incredible write-ups on how to do proper simulation of plate tectonics: https://davidson16807.github.io/tectonics.js/blog/news.html

montebicyclelo

As a hobbyist, shaders are up there as one of the most fun types of programming: a low-level, relatively simple language, often tied to a satisfying visual result. Once it clicks, it's a cool paradigm to be working in, e.g. "I am coding from the perspective of a single pixel".

lukan

I find them fun once they work, but when something doesn't work, I don't enjoy debugging them nearly as much.

jangxx

Nothing like outputting specific colors to see which branch the current pixel is running through. It's like printf debugging, but colorful and with only three floats of output.
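For illustration, a minimal sketch of the same trick as a CUDA image kernel (in a fragment shader you'd assign the color to the shader output instead); `field` is a hypothetical stand-in for whatever value is being debugged:

    // Each thread plays the role of one pixel; the branch it takes is
    // made visible as a solid color in the output image.
    __device__ float field(int x, int y) { return sinf(x * 0.1f) + cosf(y * 0.1f); }

    __global__ void debugBranches(uchar4* out, int w, int h) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= w || y >= h) return;

        float v = field(x, y);
        uchar4 c;
        if (v < 0.0f)      c = make_uchar4(255, 0, 0, 255); // red: took branch A
        else if (v < 1.0f) c = make_uchar4(0, 255, 0, 255); // green: branch B
        else               c = make_uchar4(0, 0, 255, 255); // blue: branch C
        out[y * w + x] = c; // the colorful printf
    }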

indigoabstract

This looks very ambitious; it really starts from the basics, simulating tectonic plates.

Sadly, there never was a Part 2, was there?

I guess life just got in the way, as usual.

janpmz

I wish I had an intuitive understanding of how much I can do with a GPU. E.g. how many points can I move around? A simulation like this would be great for that.

GistNoesis

TL;DR: 1B particles ~ 3 s per iteration

For example, for particle simulations on a single node with a 4090 GPU, with everything running on the GPU and no memory transfers to the CPU:

- The main bottleneck is memory capacity: 24 GB available; storing each particle's 3 position coordinates + 3 velocity coordinates at 4 bytes per number (float32) = max ~1B particles.
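Sanity check on that cap, in bytes per particle:

    N_{\max} = \frac{24 \times 10^9\,\text{B}}{(3+3)\ \text{floats} \times 4\,\text{B/float}} = 10^9\ \text{particles}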

- Then GPU memory bandwidth: if everything stays on the GPU you get between 1000 GB/s for global memory access and 10000 GB/s when shared-memory caches are hit. The number of memory accesses is roughly proportional to the number of effective collisions between your particles, which is itself proportional to the number of particles, so around 12-30 accesses per particle (see the kissing number for sphere packing in 3D, 12, multiplied by your overlap factor). All in all, for 1B particles you can collide them all and move them in 1 to 10 s.
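A rough check, assuming ~20 neighbor reads of 24 B per particle and the 1000 GB/s global-memory figure (cache behavior and extra passes stretch this into the quoted 1-10 s range):

    10^9 \times 20 \times 24\,\text{B} \approx 480\,\text{GB}, \qquad \frac{480\,\text{GB}}{1000\,\text{GB/s}} \approx 0.5\,\text{s per iteration}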

If you have to transfer things to the CPU, you become limited by the PCI Express 4.0 bandwidth of 16 GB/s. So you can move 1B particles to and from the GPU at most ~0.7 times per second.
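That ~0.7/s figure is consistent with only the three float32 position coordinates crossing the bus (an assumption on my part):

    \frac{2 \times 10^9 \times 3 \times 4\,\text{B}}{16\,\text{GB/s}} = \frac{24\,\text{GB}}{16\,\text{GB/s}} = 1.5\,\text{s} \;\approx\; 0.7\ \text{round trips/s}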

Then, if you want to store the particles on disk instead of RAM because your system is bigger, you can either use an M.2 SSD (but you will burn through them quickly), which has a theoretical bandwidth of 20 GB/s, so not a bottleneck, or use network storage over 100 Gb/s (= 12.5 GB/s) Ethernet, via two interfaces, to your parameter server, which can be as big as you can afford.

So to summarize so far: 1B particles takes 1 to 10 s per iteration per GPU. If you want smarter integration schemes like RK4, divide by 6. If you need 64-bit precision, divide by 2. If you only need 16-bit precision, multiply by 2.
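To make the assumed layout concrete, a minimal CUDA sketch of the state these estimates rest on: 3 + 3 float32 per particle, updated entirely on the GPU (collision handling omitted; all names are illustrative):

    // Forward Euler position update; velocities would be modified by a
    // separate collision/force kernel before this step.
    __global__ void integrate(float3* pos, float3* vel, float dt, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float3 p = pos[i];
        float3 v = vel[i];
        p.x += v.x * dt;
        p.y += v.y * dt;
        p.z += v.z * dt;
        pos[i] = p; // 24 B read + 12 B write per particle, neighbor reads happen elsewhere
    }

    // Launch: integrate<<<(n + 255) / 256, 256>>>(d_pos, d_vel, dt, n);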

The number of particles you need: volume of the box / h^3, with h the diameter of a particle, i.e. the finest detail you want to be able to resolve.

If you use an adaptive scheme, most of your particles sit close to the surfaces of objects, so O(surface area of objects / h^2), with h the average resolution of the surface mesh. But an adaptive scheme is about 10 times slower.
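In formulas, with V the box volume and A the total surface area of the objects:

    N_{\text{uniform}} \approx \frac{V}{h^3}, \qquad N_{\text{adaptive}} \approx \frac{A}{h^2}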

The precision of the approximation can be bounded via the Taylor formula. SPH is typically order 2, but has issues at boundaries, so to represent a sharp boundary h must be small.

If you want higher order and sharp boundaries, you can use the Finite Element Method instead. But then you need to tessellate the space with something like Delaunay/Voronoi cells, and update them as they move.

lukan

Well, to get that intuition, I guess you have to start experimenting. WebGPU makes it quite easy to get started with the concepts. But in general it obviously depends on what kind of GPU you have.

lukan

Ah yes, I dreamed about doing something like this, just with even more detail, ages ago, but concluded I wouldn't get even close to what I wanted without a big team at my disposal, a supercomputer, and/or a couple of universities collaborating across disciplines. So far I've been busy with other things, and reading about his experience unsurprisingly kind of confirms where the challenge lies: mainly performance. But GPUs are on the rise and I am optimistic about the future. If the AI bubble bursts, I suppose lots of cheap GPU power will be available for experiments like these and more elaborate ones. And if not, compute power per dollar will likely rise anyway.