User:DingTo/GSoC 2014/Weekly Reports/Week1
Week 1
Pre-work
I started my GSoC early, so I have already worked on some of my goals.
- Calculate face normals on the fly: instead of storing the face normal, we now calculate it during rendering (see commit 6d62837e5bb2). The performance loss is only ~1-2%, while saving a fair amount of memory. I still hope to speed this up, but first I need to find the right place inside the BVH traversal to check whether we can calculate the normal there and then store it somewhere (in the Intersection struct?).
- AVX2 kernel: I added an AVX2 kernel for Intel Haswell CPUs (it can also be used on AMD CPUs once they support AVX2). The AVX2 kernel makes rendering about 3-5% faster in several scenes; I tested this with Clang on Mac OS using files from our test suite. The kernel relies on the AVX2, FMA3, BMI and BMI2 instruction sets, and it already uses some dedicated FMA3 intrinsics. Further improvements are probably possible, but I think it is already a solid basis. See commits ac908f6c1f6d, 3844b8f85c7d and caaf0e484da8.
- I also looked into Multi Lamp Sampling for volumes and submitted a first patch. It still needs additional work for equi-angular sampling, though: https://developer.blender.org/D526
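To illustrate the face normal item: a minimal sketch of computing a triangle's geometric normal on demand from its three vertex positions, assuming a plain Vec3 struct. Vec3 and triangle_face_normal are hypothetical names, not the actual Cycles code from commit 6d62837e5bb2. The trade-off is one cross product and one normalization per evaluation in exchange for not keeping a normal per triangle in memory.

```cpp
#include <cassert>
#include <cmath>

// Minimal 3D vector; a stand-in for the renderer's own float3 type.
struct Vec3 {
    float x, y, z;
};

static Vec3 sub(const Vec3 &a, const Vec3 &b)
{
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

static Vec3 cross(const Vec3 &a, const Vec3 &b)
{
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

// Geometric (face) normal of triangle (v0, v1, v2), computed from the vertex
// positions instead of being read from a stored per-triangle array.
// The winding order v0 -> v1 -> v2 decides which side the normal points to.
// Degenerate (zero-area) triangles return a zero vector.
Vec3 triangle_face_normal(const Vec3 &v0, const Vec3 &v1, const Vec3 &v2)
{
    const Vec3 n = cross(sub(v1, v0), sub(v2, v0));
    const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len == 0.0f)
        return {0.0f, 0.0f, 0.0f};
    return {n.x / len, n.y / len, n.z / len};
}
```

For the triangle (0,0,0), (1,0,0), (0,1,0) this gives (0,0,1); swapping v1 and v2 flips it to (0,0,-1).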
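Because the AVX2 kernel needs AVX2, FMA3, BMI and BMI2 all at once, the renderer has to check the CPU at runtime before choosing it. A minimal sketch of such a check, assuming GCC or Clang on x86 (it uses the __builtin_cpu_init and __builtin_cpu_supports compiler builtins). CpuFeatures, detect_cpu_features and select_kernel are hypothetical names, and this is not the actual Cycles dispatch code.

```cpp
#include <cassert>
#include <string>

// CPU extensions the AVX2 kernel relies on: AVX2, FMA3, BMI and BMI2.
struct CpuFeatures {
    bool avx2 = false;
    bool fma3 = false;
    bool bmi = false;
    bool bmi2 = false;
};

// Query the running CPU via the GCC/Clang builtins on x86. On any other
// compiler or architecture all flags stay false, so AVX2 is never selected.
CpuFeatures detect_cpu_features()
{
    CpuFeatures f;
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.avx2 = __builtin_cpu_supports("avx2");
    f.fma3 = __builtin_cpu_supports("fma");
    f.bmi = __builtin_cpu_supports("bmi");
    f.bmi2 = __builtin_cpu_supports("bmi2");
#endif
    return f;
}

// Hypothetical dispatcher: pick the AVX2 kernel only when *all* required
// extensions are present, otherwise fall back to a generic kernel.
std::string select_kernel(const CpuFeatures &f)
{
    if (f.avx2 && f.fma3 && f.bmi && f.bmi2)
        return "avx2";
    return "generic";
}
```

A real dispatcher would likely have intermediate fallbacks (for example an AVX or SSE kernel) rather than going straight to a generic one.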
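For context on the equi-angular part of the volume item: equi-angular sampling (Kulla and Fajardo) picks a distance t along the ray so that samples are uniform in the angle subtended at a point light, which concentrates them near the light. The sketch below does this for one point light on a ray segment [tmin, tmax]. Vec3, DistanceSample and sample_equiangular are hypothetical names, and how D526 combines this with multi-lamp sampling (for example one distance per lamp) is not covered here.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

static float dot(const Vec3 &a, const Vec3 &b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

struct DistanceSample {
    float t;   // sampled distance along the ray
    float pdf; // probability density of having chosen t
};

// Equi-angular distance sampling toward a single point light.
// Ray: o + t*d with |d| = 1, restricted to [tmin, tmax]; xi is uniform in [0, 1).
// With delta = parameter of the point closest to the light and D = light-to-ray
// distance, theta(t) = atan((t - delta) / D) is sampled uniformly, giving
// pdf(t) = D / ((theta_b - theta_a) * (D^2 + (t - delta)^2)).
DistanceSample sample_equiangular(const Vec3 &o, const Vec3 &d, float tmin, float tmax,
                                  const Vec3 &light, float xi)
{
    assert(tmin < tmax);
    const Vec3 to_light = {light.x - o.x, light.y - o.y, light.z - o.z};
    const float delta = dot(to_light, d);
    const float dist2 = dot(to_light, to_light) - delta * delta;
    const float D = std::sqrt(std::fmax(dist2, 1e-12f)); // kept > 0 for lights on the ray

    const float theta_a = std::atan2(tmin - delta, D);
    const float theta_b = std::atan2(tmax - delta, D);
    // Offset from the closest point, for an angle interpolated between theta_a and theta_b.
    const float h = D * std::tan(theta_a + xi * (theta_b - theta_a));

    DistanceSample s;
    s.t = delta + h;
    s.pdf = D / ((theta_b - theta_a) * (D * D + h * h));
    return s;
}
```

Compared with sampling the distance uniformly, this puts most samples where the light's contribution changes fastest, which is why it matters for volumes lit by point and spot lamps.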
What I did this week
This week I spent most of my time on research and tests, but I also looked into the fast inverse square root instructions.
- I read some documentation on SIMD intrinsics and C++ code optimization (thanks to Marcos Sánchez-Dehes for pointing me to it!): http://www.agner.org/optimize/
- I looked into high-performance timers for benchmarking, but I don't have a working implementation yet. It looks like each OS might need its own implementation, e.g. QueryPerformanceCounter on Windows. Maybe there is a better solution here; feedback on this would be appreciated! I should probably also look into profilers. I am mainly interested in benchmarking specific parts of the code or individual functions, to see whether a change improves performance or not.
- I started to look into fast inverse square root instructions. Here is a simple patch (http://pasteall.org/51827/diff). Performance-wise, I still need to do more tests, but the render result is slightly different with the patch. Maybe the result needs to be refined with one or more Newton-Raphson steps? Also, it looks like we only use 1/sqrt() in the Microfacet and Ward closure code, which are not really bottlenecks AFAIK.
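On the timer question: since C++11, std::chrono::steady_clock offers a portable monotonic clock, so a per-OS wrapper around QueryPerformanceCounter may not be needed. Recent MSVC versions implement it on top of QueryPerformanceCounter; older ones (before Visual Studio 2015) reportedly had a much coarser resolution. A minimal benchmarking helper, assuming a C++11 compiler; benchmark and BenchResult are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Result of timing a piece of code several times.
struct BenchResult {
    double min_ms;  // fastest run, usually the most stable number to compare
    double mean_ms; // average over all runs
};

// Times fn() 'iterations' times with std::chrono::steady_clock, which is
// monotonic and therefore unaffected by wall-clock adjustments.
template <typename F>
BenchResult benchmark(F &&fn, int iterations)
{
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;
    double min_ms = std::numeric_limits<double>::max();
    double total_ms = 0.0;
    for (int i = 0; i < iterations; ++i) {
        const auto start = clock::now();
        fn();
        const auto end = clock::now();
        const double ms = std::chrono::duration<double, std::milli>(end - start).count();
        min_ms = std::min(min_ms, ms);
        total_ms += ms;
    }
    return {min_ms, total_ms / iterations};
}
```

Comparing min_ms for the old and new version of a function over a few dozen runs is usually less noisy than a single measurement. This does not replace a profiler, which tells you where the time goes rather than how much a given change saves.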
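Regarding the Newton-Raphson question: the SSE rsqrtss instruction only guarantees a relative error of about 1.5 × 2^-12, which could explain the slightly different render results. A minimal sketch of the usual fix, one Newton-Raphson step on the hardware estimate, assuming an x86 compiler with SSE intrinsics. fast_rsqrt is a hypothetical name, and this is not the pasted patch.

```cpp
#include <cassert>
#include <cmath>
#if defined(__SSE__) || defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE_RSQRT 1
#endif

// Approximate 1/sqrt(x) for x > 0.
// Newton-Raphson on f(y) = 1/y^2 - x gives the update
//   y' = y * (1.5 - 0.5 * x * y * y),
// which roughly squares the relative error of the initial estimate.
inline float fast_rsqrt(float x)
{
    assert(x > 0.0f);
#ifdef HAVE_SSE_RSQRT
    // ~12-bit hardware estimate.
    float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    // Fallback when SSE is unavailable: exact value, so the step below is a no-op in practice.
    float y = 1.0f / std::sqrt(x);
#endif
    y = y * (1.5f - 0.5f * x * y * y); // one Newton-Raphson refinement step
    return y;
}
```

One step brings the error close to float precision, so a second step would mostly add cost. Whether the refined version is still faster than 1.0f / sqrtf(x) on a given CPU is something to measure, not assume.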
Next week
Continue looking into the face normal calculation code, and start on uchar attribute support for things like vertex colors (to reduce memory usage).
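A minimal sketch of what uchar attribute storage could look like for vertex colors, assuming colors are kept in [0, 1] and quantized linearly to 8 bits per channel (UChar4, Float4, pack_color and unpack_color are hypothetical names, not Cycles types; color space handling is left out). Each color then takes 4 bytes instead of 16.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Compact storage: 4 x 8-bit channels (4 bytes) instead of 4 x float (16 bytes).
struct UChar4 {
    std::uint8_t r, g, b, a;
};

struct Float4 {
    float r, g, b, a;
};

// Quantize one channel: clamp to [0, 1], then round to the nearest of 256 levels.
static std::uint8_t float_to_byte(float v)
{
    const float c = std::min(std::max(v, 0.0f), 1.0f);
    return static_cast<std::uint8_t>(std::lround(c * 255.0f));
}

// Pack when the attribute is exported to the renderer...
UChar4 pack_color(const Float4 &c)
{
    return {float_to_byte(c.r), float_to_byte(c.g), float_to_byte(c.b), float_to_byte(c.a)};
}

// ...and unpack when the attribute is read during shading.
Float4 unpack_color(const UChar4 &c)
{
    const float s = 1.0f / 255.0f;
    return {c.r * s, c.g * s, c.b * s, c.a * s};
}

static_assert(sizeof(UChar4) == 4, "packed color should be 4 bytes");
```

The cost is precision: after a round trip each channel is off by at most 0.5/255, which is usually acceptable for vertex colors.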
Questions
See above; mainly, some input on profiling would be cool. :)
Thanks!