Dev:Source/Render/Cycles/BVH

Bounding Volume Hierarchy

The raytracing acceleration structure used is a bounding volume hierarchy. The code is based on an implementation from NVidia under the Apache license. This code includes spatial splits to make it more competitive with kd-trees.

On top of that, we added support for instancing, building a QBVH to use SIMD instructions, and support for dynamically updating the BVH through refitting and a two-level BVH.

BVH Building

The BVH is built based on the surface area heuristic (SAH) and spatial splits. Build performance without spatial splits is optimized with binning and multithreading, however spatial splits have not yet been optimized.

Two-level BVH

For instancing and dynamic updates, we build a two level BVH. The BVH's are built independently for each mesh, and then a top level object BVH instances these meshes. This reduces tree quality but for instances leads to lower memory usage and for dynamic updates faster rebuilds as object as are transformed, added or removed. For traversal, the nodes from two levels are still packed into a single array.

With offline rendering, the triangles of non-instanced objects are transformed and placed in the top level of the tree.

Refitting

If no new objects or triangles are added, rather than rebuilding the BVH entirely, we can refit it with the new coordinates. This means we keep the same tree structure, and only update the bounding boxes. As the coordinates deviate further from the original, the tree quality goes down.

Warp Divergence on the GPU

While based on the source code from "Understanding the Efficiency of Ray Traversal on GPUs", we do not actually use the main optimization from that paper yet, which is to keep tracing rays to keep all warps occupied. Instead the path tracer is still a simple "megakernel". As a result, splitting this up may lead to a performance improvement, at the cost of less readable code. This will also be useful for an MBVH implementation on the CPU.

QBVH

On the CPU, SIMD instructions do 4 float operations at once, so to take advantage of this we can use a BVH with 4 child nodes and intersect their bounding boxes in one go. The code for this is still disabled. A further improvement that could be made is also intersecting 4 triangles at once.

Papers

Implemented:

Would like to see implemented: