Sorting Particles on GPU for Fluid Simulations
- Level : Master recherche, engineer internship
- Contacts :
- Bruno Raffin, email@example.com
- Marie Durand, firstname.lastname@example.org
- Duration : 4 to 6 months
One approach to simulate fluids is to rely on particules (Smoothed Particle Hydrodynamics - SPH). The simulation is an iterative process where particule positions are updated at each step according to their inertia and the forces that can apply colliding neighbor particules. Neighbor search is a computationally intensive step for such simulations. Acceleration data structures are classically used to reduce its cost. A common approach consists in partitioning the simulated space with a regular grid, identifying the particles each cell contains. Cell size is defined according to the interaction range to limit neighbor search to the particles contained in the close cells.
Given the large number of particles a simulation may require, performance can be significantly impacted by the memory layout. Having particles belonging to the same cell close by in memory enables to preserve a low cache miss ratio. But maintaining this locality as particle move may require costly memory copies or dynamics allocations. The challenge is thus to find a data structure that offers a good trade-off between particle locality and maintenance cost.
We developped an algorithm to efficiently keep particles sorted according to their cell index. This algorithm is based on the Packed Memory Array sorting algorithm (PMA) proposed by Bender et al. The PMA maintains gaps in the particle array that enable to keep particle sorted with O(log2(n)) amortized data moves. Experiments show that the PMA can significantly outperform a compact quick sorted array for up to 50% element moves.
We developed a sequential implementation of that algorithm, that has been embeded in the Fluid simulator. To scale to interesting simulations a parallelization is required. Today GPUs have to potential to significanlty outperform CPUs, but taking advantage of their power requires an adapted parallel algorithm implemented using a programming environement like CUDA. The goal of this intership is to study, develop and test a GPU implementation of our PMA based algorithm.
The work will consists first in studying existing work related to particule based simulation, the PMA algorithm and GPU programming, before to conceive a parallel algorithm, implement and test it on GPU. For his work the candidat will have access to the existing sequential code and the latest GPU architectures. The candidate will join the Moais team from INRIA, Montbonnot, with a extensive expertise on these parallelisation and data layout issues.
This work has the potential to lead to an international publication (it of course depends on the results obtained) and can be the starting point of a PhD thesis.
CUDA Programming. http://developer.nvidia.com/category/zone/cuda-zone
SPH. Wikipedia. http://en.wikipedia.org/wiki/Smoothed-particle_hydrodynamics
Hoetzlein R. C.: Fluids v2.0, open source, fluid simulator, 2008. URL: http://www.rchoetzlein.com/eng/graphics/fluids.htm.
Bender M. A., Farach-Colton M., Mosteiro M. A.: Insertion sort is o(n log n). Theor. Comp. Sys. 39, 3 (June 2006), 391?397. URL: http://dx.doi.org/10.1007/s00224-005-1237-z, doi: 10.1007/s00224- 005- 1237- z.
Bender M. A., Demaine E. D., Farach- Colton M.: Cache-oblivious b-trees. SIAM J. Comput. 35, 2 (2005), 341?358.
Bender M. A., Hu H.: An adaptive packed- memory array. ACM Trans. Database Syst. 32 (November 2007). URL: http://doi.acm.org/10.1145/ 1292609.1292616, doi:http://doi.acm.org/10.1145/ 1292609.1292616.
[IABT11] Ihmsen M., Akinci N., Becker M., Teschner M.: A parallel sph implementation on multi- core cpus. Computer Graphics Forum 30, 1 (2011), 99? 112. URL: http://dx.doi.org/10.1111/j.1467- 8659. 2010.01832.x, doi:10.1111/j.1467-8659.2010.01832.