Difference between revisions of "User:Maiself/Foundation/2017"


Latest revision as of 06:21, 29 June 2018 (Fri)

Weekly reports for 2017

January 1st - 7th

  • Sent a proposal for the removal of the tile size setting. In its place, code will be added to automatically choose the work load size, giving maximum performance without burdening users with the details. This is also the best option to replace tile splitting.
  • Began implementing support for a device to acquire and render multiple tiles at once.
  • Started to prototype ideal device work load calculations. Unfortunately found more driver issues, so will need to report them and find alternative ways to calculate a good work load size.

January 8th - 14th

  • Implemented multi tile rendering for CUDA and OpenCL mega kernels.
  • Experimented with different tile configurations, got Victor rendering 37% faster on AMD hardware (mega kernel).
  • Gathered all driver issue info and reported.
  • Started to work on multi tile for split kernel.

January 15th - 21st

  • Worked on multi tile rendering for split kernel. However, updating the many parts of the code involved and debugging buffer offsets was taking too long, so multi tile is being put aside for now.
  • Made a patch to get adaptive subdivision working with panoramic projections.
  • Redid work stealing in split kernel giving a potential speed up. Buffer offsets still need a bit of debugging.
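
The idea behind work stealing can be pictured roughly as follows. This is only a minimal sketch in Python, not the Cycles implementation: `WorkPool`, `steal`, `run_workers`, and the chunk size are hypothetical names chosen for illustration, and a lock stands in for the atomic counter a GPU kernel would use. Idle workers simply pull the next chunk from a shared counter until nothing is left.

```python
import threading


class WorkPool:
    """Shared pool of work items handed out in chunks (hypothetical sketch)."""

    def __init__(self, total_items, chunk_size):
        self.total_items = total_items
        self.chunk_size = chunk_size
        self._next = 0
        # The lock stands in for an atomic fetch-and-add on the device.
        self._lock = threading.Lock()

    def steal(self):
        """Return a half-open range (start, end) of items, or None when empty."""
        with self._lock:
            if self._next >= self.total_items:
                return None
            start = self._next
            end = min(start + self.chunk_size, self.total_items)
            self._next = end
            return start, end


def run_workers(total_items, chunk_size, num_workers):
    """Process every item exactly once; return per-item visit counts."""
    pool = WorkPool(total_items, chunk_size)
    visits = [0] * total_items

    def worker():
        while True:
            chunk = pool.steal()
            if chunk is None:
                break
            for i in range(*chunk):
                visits[i] += 1  # each index belongs to exactly one chunk

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return visits
```

Because a fast worker just comes back for more chunks, no worker sits idle while another still has a long queue, which is where a speed up would be expected to come from.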

January 22nd - 28th

  • Finished new work stealing and confirmed speed up of 15-40% with AMD W9100.
  • Fixed issues with split kernel for both OpenCL and CUDA.
  • Started implementing micropolygon grids, which will give memory savings for adaptive subdivision and a solution to several bugs in the tracker.
  • Spent time to update drivers and test with split kernel branch. New drivers are hanging on Ubuntu. Sent an email to driver team detailing the issue.

January 29th - February 4th

  • Wrote some scripts for testing with CodeXL as the UI isn't working for me.
  • Investigated performance, register usage and occupancy of split kernel on AMD OpenCL. Would like to test results under a few more conditions before making a report.
  • Closed a few bugs in tracker related to memory / known driver issues. Workarounds are in the split kernel branch which would be nice to merge soon.
  • Started to prepare split kernel branch for review and merging with master as discussed with Sergey.

February 5th - February 11th

  • More in-depth investigation of register usage. Found lots of registers used even outside of the giant switch, might be able to improve that a bit. Inside the switch is still a major problem, sent an email asking for advice.
  • Refactoring of split kernel branch to prepare for master, fixing a few bugs along the way.

February 12th - February 18th

  • Tried to get SSS and volume patches working consistently on all hardware. Seems there are some issues in the AMD compiler that are causing artifacts in SSS. Volumes don't work on Nvidia, not sure why yet.
  • Cleaning up code to make more ready for master
  • Fixed a few minor bugs
  • Made patch to speed up noise texture
  • Found some adjustments to make split kernel a bit faster

February 19th - February 25th

  • Fixed a few more bugs in the branch
  • Got the branch nearly ready for master, still a bug in viewport rendering on Nvidia OpenCL to sort out
  • Did some testing and reading over some patches that need reviewing (sss, volumes, selective node compilation).
  • Dug deeper into register usage and started to try and improve the situation.

February 26th - March 4th

  • Fixed some rendering artifacts in split kernel (hopefully issue with Nvidia OpenCL is fixed now)
  • Fixed a few more minor bugs
  • Moved buffer size calculation from host to kernel, which should open up opportunity for better performance with Sergey's isect patch.

March 5th - March 11th

  • Merged split kernel branch into master
  • Did final testing of SSS and volumes
  • Adapted the code to get Intel OpenCL working. Seems the compiler is very picky. It also got stuck on a very simple line which prevented me from getting further.
  • Tried out a few ideas to make split kernel faster (binary search, if vs switch, major code removal, etc.). Everything turned out to be slower somehow. Have a few more ideas for next week.
  • Fixed T50888: Numeric overflow in split kernel state buffer size calculation
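
The kind of overflow fixed in T50888 can be illustrated with a small sketch. The element count and element size below are made-up numbers, not values from Cycles, and `mul_u32` is a hypothetical helper that imitates 32-bit unsigned wraparound, since Python integers do not overflow on their own.

```python
U32_MASK = 0xFFFFFFFF


def mul_u32(a, b):
    """Multiply as a 32-bit unsigned integer would, wrapping on overflow."""
    return (a * b) & U32_MASK


def state_buffer_size_32(num_elements, bytes_per_element):
    """Buffer size computed in 32-bit arithmetic (can silently wrap)."""
    return mul_u32(num_elements, bytes_per_element)


def state_buffer_size_64(num_elements, bytes_per_element):
    """Buffer size computed with enough width to hold the true product."""
    size = num_elements * bytes_per_element
    assert size < 2**64, "would not fit in 64 bits either"
    return size


# Made-up example: 4M elements of 1200 bytes each.
num_elements = 4 * 1024 * 1024
bytes_per_element = 1200
wrapped = state_buffer_size_32(num_elements, bytes_per_element)
exact = state_buffer_size_64(num_elements, bytes_per_element)
print(wrapped, exact)
```

The wrapped value is far smaller than the real requirement, so a buffer allocated from it would be too small for the data written into it.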

March 12th - March 18th

  • Various performance and memory tweaks for CPU and OpenCL CPU devices with split kernel.
  • Fixed some issues with barriers in the split kernel and removed some related todos.
  • Added some checks to avoid infinite loops in split kernel
  • Worked on getting branched path tracing implemented for the split kernel. Code is a bit messy right now and there are some limitations (no volumes/sss/sample all lights yet) but basic shaders are working just fine.

March 19th - March 25th

  • Branched path tracing implementation for the split kernel progresses nicely, should be ready for review next week.
  • Encountered some artifacts while working on branched path tracing that looked similar to the ones seen with GCN 1 cards. Will try to reproduce again and see if there's a fix to be found.
  • Researched and brainstormed improvements to split kernel architecture. There are quite a few things we haven't tried yet to improve performance.
  • Started working on task reordering to see if we can get a speed up with that.

March 26th - April 1st

  • Continued working on branched path tracing, unfortunately ran into some issues with volumes that took a while to solve. They are working fine now tho.
  • Started to clean up the branched path tracing code and deduplicate things a bit.
  • Fixed a few minor things in Cycles, still need to separate from local branch and commit tho.
  • Investigated extra noise in branched path tracing with increase in number of lights (also in master), no solution yet.

April 2nd - April 8th

  • Cleaned up and fixed a bunch of issues with branched path tracing in split kernel, submitted for review.
  • First thing next week will review denoising and split kernel patches.

April 9th - April 15th

  • Implemented global size calculation for CUDA split kernel
  • Started to implement tangents for meshes with adaptive subdivision
  • Did testing of branched path tracing to see if there are any issues

April 16th - April 22nd

  • Deduplicated integration loops and submitted for review, still more work to be done but need feedback before continuing
  • Investigated issue with SSS in split BPT, which turned out to be a compiler bug; nirved provided a workaround

April 23rd - April 29th

  • Found and fixed a few issues with path termination in split BPT
  • Found and fixed a problem with how SSS and volumes contributed to renders in BPT
  • Branched path tracing appears to be ready for master, still waiting for review
  • Tested and enabled CMJ and single program by default for split kernel
  • Fixed artifacts when render with split kernel is canceled
  • Fixed crashes in master from removal of image limits

April 30th - May 6th

  • Finally got branched path tracing into master
  • Patch to clear kernel cache
  • Fixed issue on Nvidia OpenCL from driver workaround
  • Did more work for tangents for subd (not working yet)
  • Began working on speed up for BPT

May 7th - 13th

  • Reworked plan for micropolygon grids to make work doable in smaller independent parts
  • Continued looking for solution to artifacts seen with BPT speed up
  • Created a patch to pass all buffers to kernels
  • Experimented with patch to remove split_data_entries, was noticeably slower

May 14th - 20th

  • Did a quick look into using split kernel with baking (requires non trivial changes)
  • Fixed native only build option
  • Fixed crash when saving images
  • Tracked down and fixed random noise pattern from unset differentials
  • Reload kernels when requested features change (T49496)

May 21st - 27th

  • Patch to deduplicate split kernel function definitions
  • Finally got speedup for BPT working without problems (70% faster)
  • Testing of and patch to disable shader sorting

May 28th - June 3rd

  • Finished all buffers patch, only 5% slower for large samples
  • Prepared test build for AMD
  • Looked into the possibility of automatically generating the split kernel. Would make maintenance easier, with some potential for speedup (hard to estimate without actually implementing)

June 4th - 10th

  • Worked with MohamedSakr in IRC to try to get OpenCL working on OSX
  • Merged BPT performance patch
  • Patch to make rendering faster by changing tile update logic

June 11th - 17th

  • Tried to find solution to regression from all-buffers patch
  • More research into automatic generation of split kernel and related topics

June 18th - 24th

  • Looked at issue with baking again
  • Another attempt at fixing all-buffers regression

June 25th - July 1st

  • Successful prototype (tho limited in scope) of idea that came out of looking into alternative split kernel architectures
  • More work on performance regression
  • Minor code cleanup
  • More work on getting baking working properly with OpenCL
  • Disabled baking in mega kernel to improve build times (which have gotten quite long)

July 2nd - 8th

  • Committed patch to provide better error message to users when the GPU runs out of memory
  • Added debug option to simulate lower memory conditions for OpenCL
  • Fixed minor issue with comparison in principled BSDF

July 9th - 15th

  • Final attempt at fixing performance regression (by passing all buffers to all functions, 4-65% slower)
  • Fixed issue where SSS was included in kernels from principled BSDF even when not in use
  • Implemented virtual buffers to allow more textures than can currently fit into GPU memory


...


October 8th - 14th

  • Caught up on changes made while away
  • Started poking at code again

October 15th - 21st

  • Augmented CLEW with some very basic logging support. Couldn't get other OpenCL debug tools working, so this will be helpful.
  • Experimented with evaluating sampling filters directly rather than distributing samples via PDF. Viewport render quality is nicer this way, but not sure it's worth it to do a full implementation.
  • Added dicing camera and dicing falloff, which improve render quality and memory usage of adaptive subdivision. D2891
  • Made another attempt at getting kernel reloading to work, this time starting from scratch to make it (almost) completely single threaded. Something is still causing the drivers to hang the entire system (this really needs to be fixed in the drivers already). With access to logging and things being limited to a single thread now, it should be easier to narrow down what's tripping things up. Will try doing this next week.
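
One way to read the sampling filter experiment mentioned in this week's list is as a choice between two estimators of the same filtered pixel value. The sketch below is a 1D illustration under that assumption, not code from Cycles: the tent filter, the test signal, and all function names are hypothetical. `filtered_via_pdf` draws sample offsets distributed like the filter and averages them unweighted; `filtered_direct` draws uniform offsets and weights each sample by the evaluated filter.

```python
import random


def tent(x):
    """Tent (triangle) filter on [-1, 1]; integrates to 1."""
    return max(0.0, 1.0 - abs(x))


def filtered_via_pdf(signal, n, rng):
    """Draw offsets distributed like the filter; every sample has weight 1."""
    total = 0.0
    for _ in range(n):
        # The sum of two uniforms minus 1 follows the tent distribution.
        x = rng.random() + rng.random() - 1.0
        total += signal(x)
    return total / n


def filtered_direct(signal, n, rng):
    """Draw uniform offsets and weight each sample by the filter value."""
    weighted_sum = 0.0
    weight_sum = 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        w = tent(x)
        weighted_sum += w * signal(x)
        weight_sum += w
    return weighted_sum / weight_sum


def signal(x):
    """Made-up test signal; its tent-filtered value at 0 is exactly 1/6."""
    return x * x
```

Both estimators converge to the same value for a large sample count; they differ in how noise is distributed at low sample counts, which is what would show up in a viewport render.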

October 22nd - 28th

  • Did more work trying to debug kernel reloading. Nothing jumps out as the source of the problem.

October 29th - November 4th

  • Added support for face varying interpolation from OSD to Cycles. This enables the use of the smooth uv option with adaptive subdivision.
  • Tried kernel reloading with new drivers. Behavior of hangs is slightly different. It appears reloading is working just fine, but something else causes the hang. Will not be spending more time on this until getting some kind of update from AMD.
  • Tweaked split kernel memory usage to avoid excessive memory usage on some systems

November 5th - 11th

  • Updated and committed old patch to remove max closures as a build option, reducing the number of kernel recompilations.
  • Ran thru a list and eliminated a bunch of simple ideas to speed up kernel building. The kernel is just too complex for any trivial solution here.
  • Started splitting kernel functions in an effort to remove all indirect calls to svm_eval_nodes (the function that takes the most time to compile). Splitting the direct_emission function and kernel gave a build speed up of 5 secs (the expected amount), with only 1% render slowdown. It should be possible to apply this elsewhere, but there may be some limits (functions that call svm_eval_nodes in a loop for instance).

November 12th - 18th

  • Split more instances of svm_eval_nodes out of kernels, getting a few more seconds shorter build times.
  • Fixed an issue with split branched path tracing where shader data memory got clobbered, causing incorrect renders or hangs.

November 19th - 25th

  • Fixed wrong shading of brick texture on AMD GPUs.
  • Refactored faster building branch as requested in review.
  • Fixed a few small bugs and a sporadic crash in branch.
  • Split a few more functions. Kernels for BMW and classroom now build in half the time they take in master, and rendering is 1% faster.

November 26th - December 2nd

  • Bug fixing for various artifacts with split branched path tracing.
  • Still a few artifacts caused by 8ef6f7e80f. The obvious cause is contention on ray_state data from using two queues in the same kernel; however, using one queue doesn't completely resolve the issue. (This may also be causing hangs on some systems, but I was unable to reproduce them.)
  • Attempted to use clarmor to help debug hangs, as suggested by Ben. Unfortunately this produced nothing of use; I suspect the cause of the hangs is outside the scope of what is detected by clarmor.
  • Implemented simple bounds checking and reporting to help catch the source of hangs. Made BMW 10-100% slower, so it would need to be left disabled by default. While working on this I've discovered that dead code affects kernel runtime performance, suggesting the compiler may be mishandling dead code somehow. If this is really the case, the impact could be quite severe.
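
Bounds checking with reporting, in the general sense used here, can be sketched as a buffer wrapper that records bad accesses instead of letting them corrupt memory. This is a hypothetical Python illustration, not the code from the branch: `CheckedBuffer` and its methods are invented names, and the buffer name passed in the usage line is only an example.

```python
class CheckedBuffer:
    """List wrapper that records out-of-bounds accesses instead of crashing."""

    def __init__(self, name, size, fill=0):
        self.name = name
        self.data = [fill] * size
        self.violations = []

    def _in_bounds(self, index, op):
        if 0 <= index < len(self.data):
            return True
        # Remember what went wrong so it can be reported after the run.
        self.violations.append((self.name, op, index, len(self.data)))
        return False

    def read(self, index, default=0):
        return self.data[index] if self._in_bounds(index, "read") else default

    def write(self, index, value):
        if self._in_bounds(index, "write"):
            self.data[index] = value

    def report(self):
        return [
            f"{name}: out-of-bounds {op} at index {index} (size {size})"
            for name, op, index, size in self.violations
        ]
```

For example, after `buf = CheckedBuffer("ray_state", 4)` and `buf.write(4, 1)`, calling `buf.report()` lists the offending write. Every access pays for the extra comparison, which is consistent with such checks being left disabled by default.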