C++ fast collision detection for uniform(and non-uniform)-distributed AABB particles using adaptive grid with implicit vectorization.
- 10 million dynamic particles AABB collision check per second against static grid of 8000 particles
- 1000x speedup against naive brute-force algorithm for 40k particles (static vs static), with uniform-distribution in range [0 - 1]).
-
- teapot-in-stadium problem is partially solved by "adaptive" grid:
-
- 290x speedup when half of AABBs are 10x further than each other [0-1] and [10-11]
-
- 230x speedup when half of AABBs are 10x far and a single AABB 100x far: [0-1] x N/2, [10-11] x N/2, [100-101] x1
- Produced collision list does not contain duplicate pairs of collisions
- Particle data is not touched, work done only on pointers internally
- Currently it is adaptive, but needs optimizations on memory handling.
-
- On every cell-overflow, it stretches the cell to AABB of all particles and converts to a grid of 4x4x4 cells each with 4 capacity
- Implementation of IParticle is an AABB (axis-aligned bounding box) model
-
- In user defined particle (box as example here), methods (getMinX/Y/Z and getMaxX/Y/Z) must return AABB corners of the underlying user-particle
- More than 2000x speedup against naive brute-force, in single-thread for 10000 particles
- 60 FPS for 20000 particles and ~25000 collision pairs on 2.1GHz FX8150 single-thread
-
- 100+ FPS for new CPUs
- Better performance stability compared to non-sparse version
- Better SIMD support on all-pairs computation method using tiled-computing
- Non-zero based object-id values supported (getId() method in IParticle interface)
// prepare memory pool FastColDetLib::MemoryPool memPool; // map grid to a volume of cube between corners of (0,0,0) and (10005,10005,10005) FastColDetLib::AdaptiveGridV2 grid(memPool,0,0,0,10005,10005,10005); // implement IParticle<float> struct AABBofPointCloud: public FastColDetLib::IParticle<float> { ... const CoordType getMaxX()const {return xmax;} const CoordType getMaxY()const {return ymax;} const CoordType getMaxZ()const {return zmax;} const CoordType getMinX()const {return xmin;} const CoordType getMinY()const {return ymin;} const CoordType getMinZ()const {return zmin;} const int getId()const {return id;} ... }; // initialize AABB vector std::vector<AABBofPointCloud> AABBs; while(simulation) { // clear tree data grid.clear(); // add particles that implement IParticle<float> into grid grid.addParticles(N,AABBs.data()); // build tree grid.buildTree(); // compute all-pairs collision array // 60FPS on FX8150 2.1GHz single-thread for 20000 particles with less than 29000 collisions // 100FPS on Xeon Gold 5215 2.9GHz single-thread for 20000 particles with less than 29000 collisions std::vector<std::pair<int,int>> vec = grid2_0.findCollisionsAll(); // the vec contains id-values of particles that have their AABBs collide so that you can do further fine-grained collision checks between them }
- 60 FPS for 40000 particles' AABB all-pair computations
- Bottlenecked by RAM bandwidth and mutex-array locking throughput
- Only zero-based object-id values supported
- Work load is balanced on particles, not volumes, this makes better distribution of particles on threads but causes duplicated work due to merging of results from all leaf nodes
// 7 threads with load-balancing by a tree, mapped to (0,0,0)-(10005,10005,10005) region AdaptiveGridTree<7> test(0,0,0,10005,10005,10005); for(int i=0;i<100;i++) { size_t nano; { FastColDetLib::Bench bench(&nano); // clear contents of memory pool of adaptive grid tree test.clear(); // AABBs is a vector of objects that implements IParticle<float> interface // (only zero-based object-id values supported from getId() method of IParticle<float>) test.addParticles(N,AABBs.data()); // non-duplicate pairs of collisions const auto coll = test.computeAllPairs(); std::cout<<"c="<<coll.size()<<std::endl; } std::cout<<"t="<<nano<<std::endl; }
For details, please visit wiki page.
Working demo for non-sparse (old version) adaptive grid (requires linking pthread for header and gomp/fopenmp for this demo):
#include"FastCollisionDetectionLib.h" // user-object, with IParticle<coordinate_type> interface for querying AABB within the API class Box: public FastColDetLib::IParticle<float> { public: Box(float xPrm=0, float yPrm=0, float zPrm=0, float x2Prm=0, float y2Prm=0, float z2Prm=0, size_t idPrm=0) { x=xPrm; y=yPrm; z=zPrm; x2=x2Prm; y2=y2Prm; z2=z2Prm; id=idPrm; } const float getMaxX() const { return x>=x2?x:x2;} const float getMaxY() const { return y>=y2?y:y2;} const float getMaxZ() const { return z>=z2?z:z2;} const float getMinX() const { return x>=x2?x2:x;} const float getMinY() const { return y>=y2?y2:y;} const float getMinZ() const { return z>=z2?z2:z;} const int getId() const {return id;} ~Box(){} private: float x,y,z,x2,y2,z2; size_t id; }; #include<iostream> #include<omp.h> #include"Generator.h" int main() { oofrng::Generator<64> gen; { // d^3 number of particles const int d = 20; const int n=d*d*d; std::cout<<"n="<<n<<std::endl; std::vector<Box> box(n); for(int i=0;i<n;i++) { auto x1 = gen.generate1Float()*d; auto y1 = gen.generate1Float()*d; auto z1 = gen.generate1Float()*d; float dx = 0.05f; float dx2 = 0.05f; box[i]=Box(x1,y1,z1,x1+dx2+dx*gen.generate1Float(),y1+dx2+dx*gen.generate1Float(),z1+dx2+dx*gen.generate1Float(),i); } // thread-pool releases itself once it is out of scope // single instance of thread pool can be used for multiple adaptive grids FastColDetLib::ThreadPool<float> thr; FastColDetLib::AdaptiveGrid<float> grid(thr,-1,-1,-1,d+1,d+1,d+1); FastColDetLib::BruteForce<float> bruteForce; std::vector<FastColDetLib::CollisionPair<float>> coll3D,coll3Dbrute; std::cout<<"add"<<std::endl; bruteForce.add(&box[0],n); std::cout<<"compute"<<std::endl; bool bfor = true; for(int k=0;k<40;k++) { size_t nano1,nano2; { { FastColDetLib::Bench bench(&nano1); { FastColDetLib::Bench bench(&nano1); grid.clear(); } std::cout<<"grid-static-object-clear: "<<nano1/1000000000.0<<" s "<<std::endl; { FastColDetLib::Bench bench(&nano1); grid.add(box.data(),n); } std::cout<<"grid-static-object-add ("<<n<<" particles AABB): "<<nano1/1000000000.0<<" s "<<std::endl; { FastColDetLib::Bench bench(&nano1); coll3D = grid.getCollisions(); } std::cout<<"grid-static-object-compute: "<<nano1/1000000000.0<<" s "<<coll3D.size()<<std::endl; } std::cout<<"grid-static-total: "<<nano1/1000000000.0<<" s "<<coll3D.size()<<std::endl; std::vector<float> rand(1000000*3); gen.generate(rand.data(),1000000*3); { std::mutex mut; std::vector<FastColDetLib::IParticle<float>*> res; { FastColDetLib::Bench bench(&nano1); #pragma omp parallel for for(int j=0;j<100;j++) { std::vector<FastColDetLib::IParticle<float>*> resTmp; for(int i=0;i<1000;i++) { auto x = rand[i*3]*d; auto y = rand[i*3+1]*d; auto z = rand[i*3+2]*d; auto item = Box(x,y,z,x+0.25,y+0.25,z+0.25); auto collisions = grid.getDynamicCollisionListFor(&item); std::copy(collisions.begin(),collisions.end(),std::back_inserter(resTmp)); } std::lock_guard<std::mutex> lg(mut); std::copy(resTmp.begin(),resTmp.end(),std::back_inserter(res)); } } std::cout<<"grid-compute-dynamic (100k particles AABB): "<<nano1/1000000000.0<<" s "<<res.size()<<std::endl; } } if(bfor) { FastColDetLib::Bench bench(&nano2); coll3Dbrute = bruteForce.getCollisions(); } if(bfor) std::cout<<"Brute-force ("<<n<<" particles AABB): "<<nano2/1000000000.0<<" s "<<coll3Dbrute.size()<<std::endl; std::cout<<"------------------------------------------------------------------------------------"<<std::endl; } if(bfor) { if(coll3D.size() != coll3Dbrute.size()) { std::cout<<"ERROR: not equal sizes of collision lists"<<std::endl; std::cout<<coll3D.size()<<" vs "<< coll3Dbrute.size()<<std::endl; const int sz = std::min(coll3D.size(),coll3Dbrute.size()); const int sz2 = sz<10?sz:10; for(int i=0;i<sz2;i++) { if((coll3D[i].getParticle1()->getId()!=coll3Dbrute[i].getParticle1()->getId()) && (coll3D[i].getParticle2()->getId()!=coll3Dbrute[i].getParticle2()->getId()) ) { std::cout<<"ERRRROOOR!"<<std::endl; std::cout<<coll3D[i].getParticle1()->getId()<<"<-->"<<coll3D[i].getParticle2()->getId()<<" "<<coll3Dbrute[i].getParticle1()->getId()<<"<-->"<<coll3Dbrute[i].getParticle2()->getId()<<std::endl; } } } else { const size_t sz = coll3D.size(); for(size_t i=0;i<sz;i++) { if((coll3D[i].getParticle1()->getId()!=coll3Dbrute[i].getParticle1()->getId()) && (coll3D[i].getParticle2()->getId()!=coll3Dbrute[i].getParticle2()->getId()) ) { std::cout<<"ERRRROOOR!"<<std::endl; std::cout<<coll3D[i].getParticle1()->getId()<<"<-->"<<coll3D[i].getParticle2()->getId()<<" "<<coll3Dbrute[i].getParticle1()->getId()<<"<-->"<<coll3Dbrute[i].getParticle2()->getId()<<std::endl; } } } } std::cout<<"--"<<std::endl; } return 0; }
output for FX8150 at 2.1GHz:
n=8000 add compute grid-static-object-clear: 6.4614e-05 s grid-static-object-add (8000 particles AABB): 0.00656692 s grid-static-object-compute: 0.00195991 s 16 grid-static-total: 0.00866529 s 16 grid-compute-dynamic (100k particles AABB): 0.0137853 s 3600 Brute-force (8000 particles AABB): 0.541069 s 16 ------------------------------------------------------------------------------------ grid-static-object-clear: 0.00143487 s grid-static-object-add (8000 particles AABB): 0.00523593 s grid-static-object-compute: 0.00182263 s 16 grid-static-total: 0.00853913 s 16 grid-compute-dynamic (100k particles AABB): 0.00835323 s 4300 Brute-force (8000 particles AABB): 0.541662 s 16 ------------------------------------------------------------------------------------ grid-static-object-clear: 0.00161331 s grid-static-object-add (8000 particles AABB): 0.00521717 s grid-static-object-compute: 0.00180544 s 16 grid-static-total: 0.0086723 s 16 grid-compute-dynamic (100k particles AABB): 0.00864333 s 3200 Brute-force (8000 particles AABB): 0.54128 s 16 ------------------------------------------------------------------------------------ grid-static-object-clear: 0.00151348 s grid-static-object-add (8000 particles AABB): 0.00524639 s grid-static-object-compute: 0.00180144 s 16 grid-static-total: 0.00860804 s 16 grid-compute-dynamic (100k particles AABB): 0.010513 s 3200 Brute-force (8000 particles AABB): 0.541246 s 16 ------------------------------------------------------------------------------------ grid-static-object-clear: 0.00160501 s grid-static-object-add (8000 particles AABB): 0.00521408 s grid-static-object-compute: 0.00176137 s 16 grid-static-total: 0.00862669 s 16 grid-compute-dynamic (100k particles AABB): 0.0120599 s 3000 Brute-force (8000 particles AABB): 0.54391 s 16 ------------------------------------------------------------------------------------ grid-static-object-clear: 0.00174257 s grid-static-object-add (8000 particles AABB): 0.00548767 s grid-static-object-compute: 0.00182023 s 16 grid-static-total: 0.00909577 s 16 grid-compute-dynamic (100k particles AABB): 0.0083235 s 3500 Brute-force (8000 particles AABB): 0.54521 s 16 ------------------------------------------------------------------------------------
This is ~8500x performance for checking a particle collision against a grid of static objects and ~50x performance for static-static collision check (or dynamic-dynamic).
More particles:
n=64000 add compute grid-static-object-clear: 1.3058e-05 s grid-static-object-add (64000 particles AABB): 0.0347275 s grid-static-object-compute: 0.0106954 s 114 grid-static-total: 0.0455527 s 114 grid-compute-dynamic (100k particles AABB): 0.0154652 s 3700 Brute-force (64000 particles AABB): 35.473 s 114
777x performance for static vs static collision checking, more than 2000x performance for static vs dynamic.