Latest revision as of 06:21, 29 June 2018 (Fri)

Weekly reports for 2017

January 1st - 7th

Sent a proposal to remove the tile size setting. In its place, code will be added to choose the workload size automatically, giving maximum performance without burdening users with the details. This is also the best option for replacing tile splitting.
Began implementing support for a device to acquire and render multiple tiles at once.
Started prototyping calculations for the ideal device workload size. Unfortunately I found more driver issues, so I will need to report them and find alternative ways to calculate a good workload size.
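The proposal above (choosing the workload size automatically and letting one device take several tiles per launch) can be illustrated with a small C++ sketch. It is not the Cycles implementation: `DeviceInfo`, `choose_work_size` and `tiles_to_acquire` are hypothetical names, and sizing the work as compute units × resident threads per unit × a number of "waves" is an assumed heuristic chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical device description; the field names are illustrative only.
struct DeviceInfo {
  int compute_units;         // number of SMs (CUDA) or compute units (OpenCL)
  int max_threads_per_unit;  // threads that can be resident on one unit at once
};

// Assumed heuristic: choose how many pixels a single launch should cover so
// that every compute unit has `waves` full sets of resident threads to run.
int64_t choose_work_size(const DeviceInfo &dev, int waves)
{
  assert(dev.compute_units > 0 && dev.max_threads_per_unit > 0 && waves > 0);
  return int64_t(dev.compute_units) * dev.max_threads_per_unit * waves;
}

// Instead of relying on a user-set tile size, acquire as many whole tiles as
// fit into the chosen work size (always at least one, never more than remain).
int tiles_to_acquire(int64_t work_size, int tile_w, int tile_h, int tiles_left)
{
  assert(tile_w > 0 && tile_h > 0 && tiles_left >= 0);
  int64_t tile_pixels = int64_t(tile_w) * tile_h;
  int64_t n = std::max<int64_t>(1, work_size / tile_pixels);
  return int(std::min<int64_t>(n, tiles_left));
}
```

For example, a device with 44 compute units and 2048 resident threads per unit gives a work size of 180224 pixels for two waves, which is 44 tiles of 64×64 pixels.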

January 8th - 14th

Implemented multi-tile rendering for the CUDA and OpenCL mega kernels.
Experimented with different tile configurations and got victor rendering 37% faster on AMD hardware (mega kernel).
Gathered all the information on the driver issues and reported them.
Started working on multi-tile rendering for the split kernel.

January 15th - 21st

Worked on multi-tile rendering for the split kernel. However, updating the many parts of the code involved and debugging buffer offsets was taking too long, so multi-tile rendering is set aside for now.
Made a patch to get adaptive subdivision working with panoramic projections.
Redid work stealing in the split kernel, which may give a speedup. Buffer offsets still need a bit of debugging.
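Redistributing work at runtime keeps fast threads busy instead of leaving them idle while slower ones finish a fixed share. The sketch below shows the simplest form of this on the CPU: a shared atomic counter from which each worker claims its next item. Full work stealing, where idle workers take items from other workers' queues, is more involved. `WorkPool`, `get_next_work` and `process_all` are hypothetical names, and the split kernel's actual scheme may differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Shared pool of work items. Workers claim the next unclaimed index with an
// atomic increment, so faster workers simply end up processing more items.
class WorkPool {
 public:
  explicit WorkPool(uint32_t total) : total_(total), next_(0) {}

  // Claims one work item; returns false once every item has been handed out.
  bool get_next_work(uint32_t &index)
  {
    uint32_t i = next_.fetch_add(1, std::memory_order_relaxed);
    if (i >= total_) {
      return false;
    }
    index = i;
    return true;
  }

 private:
  const uint32_t total_;
  std::atomic<uint32_t> next_;
};

// Processes `total` items on `num_workers` threads and returns how many times
// each item was processed (every entry should be exactly 1).
std::vector<int> process_all(uint32_t total, int num_workers)
{
  assert(num_workers > 0);
  WorkPool pool(total);
  std::vector<int> hits(total, 0);
  std::vector<std::thread> workers;
  for (int w = 0; w < num_workers; w++) {
    workers.emplace_back([&pool, &hits]() {
      uint32_t index;
      while (pool.get_next_work(index)) {
        hits[index] += 1;  // stand-in for rendering one pixel sample
      }
    });
  }
  for (std::thread &t : workers) {
    t.join();
  }
  return hits;
}
```

Because each index is handed out exactly once, no two workers ever write the same element of `hits`, so only the counter itself needs to be atomic.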

January 22nd - 28th

Finished the new work stealing and confirmed a speedup of 15-40% on an AMD W9100.
Fixed issues with the split kernel for both OpenCL and CUDA.
Started implementing micropolygon grids, which will give memory savings for adaptive subdivision and solve several bugs in the tracker.
Spent time updating drivers and testing with the split kernel branch. The new drivers hang on Ubuntu; sent an email to the driver team detailing the issue.

January 29th - February 4th

Wrote some scripts for testing with CodeXL as the UI isn't working for me.
Investigated performance, register usage and occupancy of split kernel on AMD OpenCL. Would like to test results under a few more conditions before making a report.
Closed a few bugs in the tracker related to memory and known driver issues. The workarounds are in the split kernel branch, which would be nice to merge soon.
Started to prepare split kernel branch for review and merging with master as discussed with Sergey.

February 5th - February 11th

Investigated register usage in more depth. Found that many registers are used even outside the giant switch, which might be improved a bit. Inside the switch is still a major problem; sent an email asking for advice.
Refactoring of split kernel branch to prepare for master, fixing a few bugs along the way.

February 12th - February 18th

Tried to get SSS and volume patches working consistently on all hardware. Seems there are some issues in the AMD compiler that are causing artifacts in SSS. Volumes don't work on Nvidia, not sure why yet.
Cleaned up code to make it more ready for master
Fixed a few minor bugs
Made a patch to speed up the noise texture
Found some adjustments to make the split kernel a bit faster

February 19th - February 25th

Fixed a few more bugs in the branch
Got the branch nearly ready for master, still a bug in viewport rendering on Nvidia OpenCL to sort out
Did some testing and reading over some patches that need reviewing (sss, volumes, selective node compilation).
Dug deeper into register usage and started trying to improve the situation.

February 26th - March 4th

Fixed some rendering artifacts in split kernel (hopefully issue with Nvidia OpenCL is fixed now)
Fixed a few more minor bugs
Moved buffer size calculation from host to kernel, which should open up opportunity for better performance with Sergey's isect patch.

March 5th - March 11th

Merged split kernel branch into master
Did final testing of SSS and volumes
Made adaptations to get Intel OpenCL working. The compiler seems to be very picky, and it also got stuck on a very simple line, which prevented me from getting further.
Tried out a few ideas to make the split kernel faster (binary search, if vs. switch, major code removal, etc.). Everything turned out to be slower somehow. Have a few more ideas for next week.
Fixed T50888: Numeric overflow in split kernel state buffer size calculation
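One common way a buffer size overflow like T50888 can arise (assumed here, not taken from the actual fix) is multiplying two 32-bit `int`s, e.g. `int size = num_threads * bytes_per_thread;`, which overflows once the product exceeds 2^31 - 1. A hedged sketch of an overflow-checked alternative follows; `state_buffer_size` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Computes num_threads * bytes_per_thread in 64-bit arithmetic and reports
// failure instead of silently wrapping if the result does not fit in size_t.
bool state_buffer_size(uint64_t num_threads, uint64_t bytes_per_thread, size_t &out)
{
  assert(bytes_per_thread > 0);  // a zero-sized state element is a bug
  const uint64_t limit = std::numeric_limits<size_t>::max();
  if (num_threads > limit / bytes_per_thread) {
    return false;  // the product would exceed what size_t can represent
  }
  out = size_t(num_threads * bytes_per_thread);
  return true;
}
```

With 2^20 threads and 4096 bytes of state each, the correct size is 2^32 bytes, which a 32-bit `int` product cannot represent but this version computes exactly on a 64-bit build.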

March 12th - March 18th

Various performance and memory tweaks for CPU and OpenCL CPU devices with split kernel.
Fixed some issues with barriers in the split kernel and removed some related todos.
Added some checks to avoid infinite loops in split kernel
Worked on getting branched path tracing implemented for the split kernel. Code is a bit messy right now and there are some limitations (no volumes/sss/sample all lights yet) but basic shaders are working just fine.
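A guard against infinite loops can be as simple as capping the number of iterations a loop may run and reporting when the cap is hit. This sketch assumes a host-side loop that repeatedly runs one round of split kernel stages until no path is active; `run_until_done`, `LoopResult` and the iteration cap are hypothetical and not taken from the branch.

```cpp
#include <cassert>
#include <functional>

enum class LoopResult { Finished, IterationLimitHit };

// Calls `step` (one round of kernel stages, returning whether any path is
// still active) until nothing is active, but never more than max_iterations
// times, so a path stuck in a non-terminal state cannot hang the loop.
LoopResult run_until_done(const std::function<bool()> &step, int max_iterations)
{
  assert(max_iterations > 0);
  for (int i = 0; i < max_iterations; i++) {
    if (!step()) {
      return LoopResult::Finished;
    }
  }
  return LoopResult::IterationLimitHit;
}
```

Returning a distinct result rather than silently stopping lets the caller report the problem instead of producing an incomplete render without notice.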

March 19th - March 25th

Branched path tracing implementation for the split kernel progresses nicely, should be ready for review next week.
Encountered some artifacts while working on branched path tracing that looked similar to the ones seen with GCN 1 cards. Will try to reproduce again and see if there's a fix to be found.
Researched and brainstormed improvements to the split kernel architecture; there are quite a few things we haven't tried yet to improve performance.
Started working on task reordering to see if we can get a speed up with that.

March 26th - April 1st

Continued working on branched path tracing. Unfortunately, I ran into some issues with volumes that took a while to solve, but they are working fine now.
Started to clean up the branched path tracing code and deduplicate things a bit.
Fixed a few minor things in Cycles; I still need to separate them from my local branch and commit them.
Investigated extra noise in branched path tracing with increase in number of lights (also in master), no solution yet.

April 2nd - April 8th

Cleaned up and fixed a bunch of issues with branched path tracing in split kernel, submitted for review.
First thing next week will review denoising and split kernel patches.

April 9th - April 15th

Implemented global size calculation for CUDA split kernel
Started to implement tangents for meshes with adaptive subdivision
Did testing of branched path tracing to see if there are any issues
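A plausible way to compute a global size for a split kernel is to fit as many threads as the memory for per-thread path state allows, capped by a maximum and rounded down to whole thread blocks. The sketch below illustrates that assumption only; the function name, parameters and rounding rule are hypothetical, not what the CUDA device code actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sizing rule: launch as many threads as there is memory for
// per-thread path state, capped at `max_threads` and rounded down to a whole
// number of thread blocks.
int64_t split_global_size(uint64_t available_bytes, uint64_t state_bytes_per_thread,
                          int block_size, int64_t max_threads)
{
  assert(state_bytes_per_thread > 0 && block_size > 0 && max_threads >= 0);
  uint64_t by_memory = available_bytes / state_bytes_per_thread;
  int64_t threads = int64_t(std::min<uint64_t>(by_memory, uint64_t(max_threads)));
  threads -= threads % block_size;  // keep only complete blocks
  return threads;
}
```

For example, 1 GiB of state memory at 1000 bytes per thread would allow 1073741 threads, so a cap of 2^20 threads applies and the result is 1048576.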

April 16th - April 22nd

Deduplicated integration loops and submitted for review. There is still more work to be done, but I need feedback before continuing.
Investigated an issue with SSS in split BPT; it turned out to be a compiler bug, and nirved provided a workaround.

April 23rd - April 29th

Found and fixed a few issues with path termination in split BPT
Found and fixed a problem with how SSS and volumes contributed to renders in BPT
Branched path tracing appears to be ready for master, still waiting for review
Tested and enabled CMJ and single program by default for split kernel
Fixed artifacts when a render with the split kernel is canceled
Fixed crashes in master from removal of image limits

April 30th - May 6th

Finally got branched path tracing into master
Patch to clear kernel cache
Fixed an issue on Nvidia OpenCL caused by a driver workaround
Did more work on tangents for subd (not working yet)
Began working on speed up for BPT

May 7th - 13th

Reworked plan for micropolygon grids to make work doable in smaller independent parts
Continued looking for solution to artifacts seen with BPT speed up
Created a patch to pass all buffers to kernels
Experimented with patch to remove split_data_entries, was noticeably slower

May 14th - 20th

Did a quick look into using split kernel with baking (requires non trivial changes)
Fixed native only build option
Fixed crash when saving images
Tracked down and fixed random noise pattern from unset differentials
Reload kernels when requested features change (T49496)

May 21st - 27th

Patch to deduplicate split kernel function definitions
Finally got speedup for BPT working without problems (70% faster)
Tested shader sorting and made a patch to disable it

May 28th - June 3rd

Finished the all-buffers patch, which is only 5% slower for large sample counts
Prepared test build for AMD
Looked into the possibility of automatically generating the split kernel. This would make maintenance easier, and there is some potential for a speedup, though it is hard to estimate without actually implementing it.

June 4th - 10th

Worked with MohamedSakr in IRC to try to get OpenCL working on OSX
Merged BPT performance patch
Patch to make rendering faster by changing tile update logic

June 11th - 17th

Tried to find solution to regression from all-buffers patch
More research into automatic generation of split kernel and related topics

June 18th - 24th

Looked at issue with baking again
Another attempt at fixing all-buffers regression

June 25th - July 1st

Successful prototype (though limited in scope) of an idea that came out of looking into alternative split kernel architectures
More work on performance regression
Minor code cleanup
More work on getting baking working properly with OpenCL
Disabled baking in mega kernel to improve build times (which have gotten quite long)

July 2nd - 8th

Committed patch to provide better error message to users when the GPU runs out of memory
Added debug option to simulate lower memory conditions for OpenCL
Fixed minor issue with comparison in principled BSDF
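Simulating a smaller GPU makes it possible to exercise out-of-memory handling, including user-facing error messages, without owning a low-memory card. The sketch below assumes a simple tracker that enforces an artificial capacity when a debug limit is set; `DeviceMemoryTracker` and its methods are hypothetical, and the real debug option and message wording in Cycles may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Tracks device allocations against a capacity. A nonzero debug limit makes
// the device pretend it has less memory than it really does, so the
// out-of-memory path can be exercised on a large GPU.
class DeviceMemoryTracker {
 public:
  DeviceMemoryTracker(size_t real_capacity, size_t debug_limit /* 0 = disabled */)
      : capacity_(debug_limit ? std::min(debug_limit, real_capacity) : real_capacity)
  {
  }

  // Returns an empty string on success, otherwise a message meant for users.
  std::string alloc(const std::string &name, size_t bytes)
  {
    const size_t free_bytes = capacity_ - used_;
    if (bytes > free_bytes) {
      return "Out of GPU memory while allocating " + name + ": requested " +
             std::to_string(bytes) + " bytes, " + std::to_string(free_bytes) + " available";
    }
    used_ += bytes;
    return std::string();
  }

  void release(size_t bytes)
  {
    assert(bytes <= used_);
    used_ -= bytes;
  }

  size_t used() const { return used_; }

 private:
  size_t capacity_;
  size_t used_ = 0;
};
```

Naming the buffer that failed and the amounts involved gives users something actionable, such as reducing texture sizes, rather than a generic driver error.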

July 9th - 15th

Final attempt at fixing performance regression (by passing all buffers to all functions, 4-65% slower)
Fixed issue where SSS was included in kernels from principled BSDF even when not in use
Implemented virtual buffers to allow more textures than can currently fit into GPU memory

...

October 8th - 14th

Caught up on changes made while away
Started poking at code again

October 15th - 21st

Augmented CLEW with some very basic logging support. Couldn't get other OpenCL debug tools working, so this will be helpful.
Experimented with evaluating sampling filters directly rather than distributing samples via PDF. Viewport render quality is nicer this way, but I'm not sure it's worth doing a full implementation.
Added dicing camera and dicing falloff, which improve render quality and memory usage of adaptive subdivision. D2891
Made another attempt at getting kernel reloading to work, this time starting from scratch to make it (almost) completely single-threaded. Something is still causing the drivers to hang the entire system (this really should have been fixed in the drivers by now). With logging available and everything now limited to a single thread, it should be easier to narrow down what's tripping things up. Will try doing this next week.
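For context on the filter experiment above: with filter importance sampling, sample positions are drawn with a density proportional to the pixel filter and every sample gets equal weight; with direct evaluation, samples are spread uniformly over the filter footprint and each is weighted by the filter value, then normalized by the total weight. Below is a minimal 1D sketch of the direct variant, assuming a truncated Gaussian filter whose width (sigma = radius / 3) is an arbitrary choice; none of the names come from Cycles.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <random>

// Truncated Gaussian pixel filter. sigma = radius / 3 is an arbitrary choice
// made for this sketch.
double filter_weight_gaussian(double x, double radius)
{
  if (std::fabs(x) > radius) {
    return 0.0;
  }
  const double sigma = radius / 3.0;
  return std::exp(-0.5 * (x * x) / (sigma * sigma));
}

// Direct filter evaluation in 1D: sample offsets are uniform over the filter
// footprint, each sample is weighted by the filter, and the result is
// normalized by the total weight.
double pixel_value_direct(const std::function<double(double)> &radiance, double radius,
                          int num_samples, unsigned seed)
{
  assert(num_samples > 0 && radius > 0.0);
  std::mt19937 rng(seed);
  std::uniform_real_distribution<double> offset(-radius, radius);
  double weighted_sum = 0.0;
  double weight_sum = 0.0;
  for (int i = 0; i < num_samples; i++) {
    const double x = offset(rng);
    const double w = filter_weight_gaussian(x, radius);
    weighted_sum += w * radiance(x);
    weight_sum += w;
  }
  return weight_sum > 0.0 ? weighted_sum / weight_sum : 0.0;
}
```

Normalizing by the sum of weights means a constant image is reproduced exactly regardless of where the samples land, at the cost of the estimator being slightly biased for small sample counts.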

October 22nd - 28th

Did more work trying to debug kernel reloading. Nothing jumps out as the source of the problem.

October 29th - November 4th

Added support for face varying interpolation from OSD to Cycles. This enables the use of the smooth uv option with adaptive subdivision.
Tried kernel reloading with new drivers. Behavior of hangs is slightly different. It appears reloading is working just fine, but something else causes the hang. Will not be spending more time on this until getting some kind of update from AMD.
Tweaked split kernel memory allocation to avoid excessive memory use on some systems

November 5th - 11th

Updated and committed an old patch to remove max closures as a build option, reducing the number of kernel recompilations.
Ran through a list of simple ideas to speed up kernel building and eliminated a bunch of them. The kernel is just too complex for any trivial solution here.
Started splitting kernel functions in an effort to remove all indirect calls to svm_eval_nodes (the function that takes the most time to compile). Splitting the direct_emission function and kernel gave a build speedup of 5 seconds (the expected amount), with only a 1% render slowdown. It should be possible to apply this elsewhere, but there may be some limits (functions that call svm_eval_nodes in a loop, for instance).
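The reasoning behind splitting kernels around svm_eval_nodes is that, assuming the GPU compiler inlines every call site, each extra call to a large function adds another copy to compile; routing all evaluations through one stage leaves fewer copies. A hedged C++ sketch of the pattern: `eval_nodes` is a trivial stand-in (not the real `svm_eval_nodes`), and the three-stage split with a request queue is an assumed structure for illustration, not the branch's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for an expensive-to-compile shader evaluator (hypothetical).
float eval_nodes(float input) { return input * 0.5f + 1.0f; }

// Before: each kernel that needs shading calls the evaluator itself, so the
// compiler has to inline and optimize another copy of it.
float direct_emission_inline(float light_input) { return eval_nodes(light_input) * 2.0f; }

// After: the kernel is split around one shared evaluation stage.
struct ShaderRequest {
  int path_index;
  float input;
};

// Stage 1: record what needs shading instead of shading it immediately.
void direct_emission_setup(const std::vector<float> &light_inputs,
                           std::vector<ShaderRequest> &queue)
{
  for (std::size_t i = 0; i < light_inputs.size(); i++) {
    queue.push_back({int(i), light_inputs[i]});
  }
}

// Stage 2: the single place where eval_nodes is called.
void shader_eval(const std::vector<ShaderRequest> &queue, std::vector<float> &results)
{
  for (const ShaderRequest &r : queue) {
    assert(r.path_index >= 0 && std::size_t(r.path_index) < results.size());
    results[r.path_index] = eval_nodes(r.input);
  }
}

// Stage 3: finish the original kernel's work using the shaded results.
void direct_emission_finish(const std::vector<float> &results, std::vector<float> &out)
{
  out.resize(results.size());
  for (std::size_t i = 0; i < results.size(); i++) {
    out[i] = results[i] * 2.0f;
  }
}
```

The split version produces the same values as the inline one; what changes is how many call sites of the evaluator the compiler has to deal with.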

November 12th - 18th

Split more instances of svm_eval_nodes out of kernels, getting a few more seconds shorter build times.
Fixed an issue with split branched path tracing where shader data memory got clobbered, causing incorrect renders or hangs.

November 19th - 25th

Fixed wrong shading of brick texture on AMD GPUs.
Refactored faster building branch as requested in review.
Fixed a few small bugs and a sporadic crash in branch.
Split a few more functions; kernels for BMW and classroom now build in half the time of master, and rendering is 1% faster.

November 26th - December 2nd

Bug fixing for various artifacts with split branched path tracing.
There are still a few artifacts caused by 8ef6f7e80f. The obvious cause is contention on ray_state data from using two queues in the same kernel; however, using one queue doesn't completely resolve the issue. (This may also be causing hangs on some systems, but I was unable to reproduce them.)
Attempted to use clarmor to help debug the hangs, as suggested by Ben. Unfortunately this produced nothing of use; I suspect the cause of the hangs is outside the scope of what clarmor detects.
Implemented simple bounds checking and reporting to help catch the source of the hangs. It made BMW 10-100% slower, so it would need to be disabled by default. While working on this I discovered that dead code affects kernel runtime performance, suggesting the compiler may be mishandling dead code somehow. If this is really the case, the impact could be quite severe.
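Bounds checking of this kind can be sketched as a checked read that, instead of touching memory outside the buffer, returns a default value and records the first offending access for the host to report. `checked_read`, `BoundsReport` and `print_report` are hypothetical names; this is a CPU illustration of the idea, not the code used in the branch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Details of the first out-of-bounds access seen (hypothetical debug aid).
struct BoundsReport {
  bool triggered = false;
  int64_t index = -1;
  int64_t size = 0;
  const char *buffer_name = nullptr;
};

// Reads data[index] if it is inside [0, size); otherwise records the access
// (keeping only the first one) and returns a value-initialized T.
template<typename T>
T checked_read(const T *data, int64_t size, int64_t index, const char *name,
               BoundsReport &report)
{
  assert(size >= 0);
  if (index < 0 || index >= size) {
    if (!report.triggered) {
      report.triggered = true;
      report.index = index;
      report.size = size;
      report.buffer_name = name;
    }
    return T();
  }
  return data[index];
}

// Host side: print the recorded access, if any.
void print_report(const BoundsReport &report)
{
  if (report.triggered) {
    std::fprintf(stderr, "out-of-bounds access to %s: index %lld, size %lld\n",
                 report.buffer_name, (long long)report.index, (long long)report.size);
  }
}
```

Every access pays for an extra comparison and branch, which is consistent with checks like this being too costly to leave enabled by default.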

Revision as of 17:41, 4 December 2017 (Mon), by Maiself; latest revision as of 06:21, 29 June 2018 (Fri), by Yamyam (minor: 1 revision imported)
(No difference)

Difference between revisions of "User:Maiself/Foundation/2017"
