User:Lukasstockner97/GSoC 2016/Proposal

GSoC Proposal: Implementing a Denoising Filter in Cycles

The goal of this project is to implement an optional denoising step in Cycles, executed between the actual path tracing and the Compositor, which removes remaining noise from the image at the cost of some accuracy.

In the last few years, a lot of great research on denoising the output of path tracers has been published. These algorithms rely on additional information provided by the renderer, however, which makes integrating them into the Compositor very memory-intensive and a challenge in UI design.

Because of that, this proposal is about having a denoiser right in Cycles - where all the additional information (like feature passes and variance info) is available and can be used to produce results far better than general image denoising, often making it possible to cut render times by 75% or more.

The workflow will be as simple as possible: an additional panel in the render properties will allow switching denoising on and off, along with further options that let advanced users fine-tune the filter's behavior.

Introduction and Contact Info

I'm Lukas Stockner, an 18-year-old physics student from Germany. Over the last two to three years, I have become involved with Blender development, especially around the Cycles rendering engine.

I can be found/contacted as lukasstockner97 in the following places:

  • In #blendercoders and #cycles (on irc.freenode.net)
  • On developer.blender.org
  • On BlenderArtists

Also, you can reach me by mail at lukas.stockner at freenet.de.

Benefits

Since Cycles is a path tracer - a rendering algorithm based on Monte-Carlo integration - the results are unbiased, but noisy. Increasing the number of samples decreases the noise level, but because the standard error of Monte-Carlo integration only falls with the square root of the sample count, halving the noise requires four times as many samples. Getting perfectly smooth and noise-free results out of Cycles can therefore take an extremely long time. However, the true result can often already be recognized easily in the noisy output, for example in out-of-focus areas or areas of constant color. In those cases, applying post-processing filtering to the image can save a lot of time while still producing a result that is very close to the correct image.
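
To make the sampling argument concrete, here is a minimal, self-contained sketch (plain Python/NumPy, unrelated to the actual Cycles code) of how Monte-Carlo noise scales with the sample count:

    # Standalone demonstration of Monte-Carlo error scaling: the standard
    # error of the mean falls with 1/sqrt(N), so 4x samples = 1/2 the noise.
    import numpy as np

    rng = np.random.default_rng(seed=42)

    def pixel_noise(samples, trials=5000):
        # Each "pixel" averages `samples` random radiance values; the spread
        # of these averages across many trials is the visible noise level.
        estimates = rng.random((trials, samples)).mean(axis=1)
        return estimates.std()

    for n in (16, 64, 256):
        print(f"{n:4d} samples -> noise {pixel_noise(n):.4f}")
    # Each 4x increase in samples roughly halves the printed noise level.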

In the last few years, there have been significant advances in research, producing algorithms that are specialized for denoising renderer output and therefore give results far better than general denoising algorithms developed for photographs or video clips. However, no open-source implementation of these rendering-oriented denoisers is available (apart from reference implementations published by researchers, which are unusable for production or even non-developer use), which is why Blender users still have to rely on complex compositor setups or external, general-purpose denoising tools.

Having a modern, specialized denoising algorithm directly in Cycles would make it far easier for users to use denoising while also producing superior results.

Deliverables

  • A denoising algorithm in Cycles that can be activated with a single checkbox and produces comparable results at a significantly lower sample count (and therefore shorter render time) for a broad range of scenes.
  • Additional options for advanced users to tweak the denoising for even better results.
  • Documentation on how to use this feature (which should be quite short, considering that the denoiser is exposed through an extremely simple user interface).
  • If there is enough time, an additional technical report (aimed at developers) detailing the internal structure and algorithms used.


Project Details

Proof-of-concept implementation

To test whether denoising is viable and reasonable for Cycles, I've developed a small and simple LWR (locally weighted regression) implementation that already allows testing the filter.

The code quality isn't great yet, and both the filter's performance and its results can still be improved a lot, but it shows what the filtering is capable of and what a UI/workflow for it could look like.

Its main limitations currently are fine geometric detail (hair/fur/grass and bump-/normal-mapped detail) and shadows, but both can still be improved a lot: for the fine detail, the perceptual metric and some improvements to the T-SVD step should help considerably; for the shadows, a dedicated shadow feature pass will help.

The code can be found as commit d93cc73ffdf1 on the experimental-build branch.

Compositor Node vs. Directly in Cycles

In theory, since denoising is a post-processing operation, it should be done in the compositor. However, in practice, there are many reasons for implementing it directly in Cycles:

  • Lots of passes: all modern denoising algorithms are based on "features" - additional passes like Diffuse Color, Shadow or Normals.
    • A compositor node therefore would need to have lots of inputs.
    • In addition to the passes themselves, variance information is usually needed, either as a direct variance pass or as Even/Odd passes, again doubling the number of inputs.
    • Lots of passes also need a lot of memory. If this information is needed in the compositor, it will (in the worst case) be stored three times: in Cycles on both the device and the host, as well as in the RenderResult. In Cycles, the filtering could instead be done directly on the device, so only the filtered result needs to be copied back.
  • As mentioned above, for best results, the denoising algorithm should be designed specifically for the rendering engine. Therefore, one of the advantages of an implementation in the compositor - being usable for all rendering engines - is not really useful here.
  • Another theoretical advantage of the compositor solution is that it would allow using the denoiser anywhere in the node graph. In practice, though, there's no reason to apply other compositing operations before denoising.
  • The device abstraction of Cycles allows for an easy and stable GPU implementation (the compositor has OpenCL support as well, but its execution model might not work well for denoising). This also makes it possible to denoise without copying the feature passes to the host at all.
  • Many denoising algorithms include an option for adaptive sampling, where the filtering algorithm produces a map that tells the renderer where more samples are needed (since different regions might have different detail/noise levels). Since compositing only runs after rendering, this feedback mechanism would be impossible there. In Cycles, however, it would be easy to run a "prepass", filter it to produce a preview and determine the adaptive map, and then use that map for the rest of the samples - a rough sketch of this feedback loop follows below.
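
As an illustration of this feedback mechanism, here is a hedged sketch (plain Python/NumPy; all names are hypothetical, not actual Cycles API) of how a filtered prepass could be turned into an adaptive sample map:

    # Hypothetical sketch of the prepass feedback loop described above.
    import numpy as np

    def adaptive_sample_map(noisy, filtered, sample_budget):
        """noisy/filtered: (H, W, 3) prepass image before/after denoising;
        sample_budget: total number of remaining samples to distribute.
        Returns an (H, W) map of additional samples per pixel."""
        # Use the difference between the noisy input and the filtered result
        # as a crude per-pixel error estimate (real estimators are more refined).
        error = np.abs(noisy - filtered).sum(axis=-1)
        weights = error / max(error.sum(), 1e-8)
        # Give every pixel at least one sample, the rest proportional to error.
        return np.maximum(1, np.rint(weights * sample_budget)).astype(int)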

Workflow from the User perspective

For the artist, the denoiser should of course be very simple to use - too many or overly complex options would limit its usefulness for many people.

Therefore, the proposed workflow is pretty simple: the render properties tab would get a new panel named "Denoising" that can be activated with a checkbox in its header (like motion blur, for example). Since LWR chooses many parameters internally, the default settings will already produce decent results for most types of scenes (in my tests with the demo implementation, a half window size of 9 and a bandwidth factor of zero - as described in the paper - worked well for most scenes).

Advanced users will be able to fine-tune the filtering settings by opening the panel and setting:

  • Half window size (which should be increased for low-sample previews)
  • A noise-vs-blurring tradeoff slider
  • Which types of lighting (diffuse/glossy, direct/indirect etc.) are denoised separately
  • Whether perceptually guided denoising should be used (if that turns out to work well)

When rendering, tiles will be denoised as soon as they are finished (for GPU rendering) or as soon as their neighbors are finished (for CPU rendering). The combined pass of the render result will contain the denoised output, which is then processed further in the compositor as usual. Additionally, the user will be able to activate a new render pass containing the unfiltered image (if denoising isn't activated, it would be identical to the combined pass).

Possible large-scale changes after GSoC

The significant disadvantage of this workflow (compared to a compositor node) is that the image needs to be rendered again if the user wants to try different parameters. As explained above, the goal is to reduce the need to do so, but getting rid of all options entirely is unlikely.

The two main reasons for a Cycles-integrated implementation are memory requirements and handling of the additional passes. However, pass handling could be improved significantly in the future: the pass system in general is in need of a redesign. Such a redesign would eliminate the second main reason and make a compositor-based implementation feasible.

This redesign is far too large to be included in this proposal, though. Therefore, the proposal is to implement denoising in Cycles for now and consider moving it to the compositor when a new render pass system is available - a lot of code could be reused, so the GSoC effort would not be wasted in any case.

Which denoiser should be used?

As mentioned above, there have been a lot of papers on denoising in the last few years. They have some things in common (like relying on feature passes), but there are also significant differences. A good overview can be found in the 2015 Eurographics state-of-the-art report and the SIGGRAPH course on denoising.

The two main interesting algorithms (from my perspective) are:

  • Robust Denoising: the basic idea behind this paper is to filter the image with multiple "candidate filters" - in their case, non-local means filters with different parameters - and then choose the filter with the lowest estimated MSE according to a SURE estimator applied to the image. It produces good results, and the filtering performance is decent (according to the paper, 8 s to filter a 1024x1024 image on a GTX 580).
  • Locally Weighted Regression: the idea behind this paper is to model the resulting image as a function of the additional features and to fit such a function to the data from the neighborhood of each pixel. The results are great for most scenes, and the filtering is still fast enough (in my tests, about 15-30 s for a 1080p frame); a rough sketch of the core idea follows below.
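
Since LWR is the algorithm proposed here, a rough sketch of its core step may help (plain Python/NumPy; the bandwidth selection, feature normalization and truncated-SVD steps from the paper are deliberately omitted):

    # Minimal locally-weighted-regression step for a single pixel: the colors
    # in the filter window are modelled as a linear function of the feature
    # passes, and the fitted model is evaluated at the window's center pixel.
    import numpy as np

    def lwr_denoise_pixel(colors, features, weights):
        """colors: (n, 3) colors of the n window pixels;
        features: (n, f) feature vectors (normals, depth, albedo, ...);
        weights: (n,) per-pixel weights; row 0 is the center pixel."""
        X = np.hstack([np.ones((features.shape[0], 1)), features])
        sw = np.sqrt(weights)[:, None]
        # Weighted least squares: minimize ||sqrt(W) * (X @ beta - colors)||^2
        beta, *_ = np.linalg.lstsq(X * sw, colors * sw, rcond=None)
        return X[0] @ beta  # denoised color of the center pixel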

Since I have already created a proof-of-concept LWR implementation for Cycles and the results so far are remarkable, I've decided to go with LWR in this proposal. However, the code layout will allow for multiple filtering algorithms, so RD could easily be added as well.

Improvements, tweaks and tricks

On top of the algorithm presented in the paper, several modifications might improve the results:

  • Using better feature passes: the LWR paper uses screen position (not explicitly stored, of course), depth, normals and texture color. Other papers suggest features such as an ambient occlusion pass or world position (instead of the depth pass). Some custom features, such as the harmonic mean of the distance of the first bounce, have also shown promise in my initial tests.
  • Decomposing the image: as reported in a paper by Disney, denoising individual components of the image separately can produce better results. Since Cycles already supports path-space decomposition (in the form of the Diffuse/Glossy/... direct/indirect passes), it makes sense to use this information for more efficient denoising (for example, it is usually reasonable to denoise the indirect diffuse component more aggressively than the direct one).
  • Perceptual metrics: the derivation of LWR is designed to minimize rMSE (mean squared error divided by the pixel brightness; see the sketch after this list) by choosing a tradeoff between variance (noise) and bias. However, rMSE doesn't model human perception very well - in particular, it misses the frequency-dependent contrast sensitivity, which says that in areas with high-frequency detail the visibility threshold is higher. Noise on a clean white wall is therefore far more obvious than the same amount of noise in the leaves of a tree or in grass/fur. At the same time, the areas where noise is hard to see are exactly the areas where filtering tends to overblur the result. So it makes sense to vary the amount of filtering depending on the frequency content of the area - useful metrics for this have been published both in papers and as source code.
  • Avoiding tile-edge artifacts: since each tile in Cycles is rendered separately, immediately copied into the RenderResult, and then freed, it would be ideal to apply the filtering just before copying the result. However, since the filtering is based on a window around each pixel, artifacts would appear along tile edges, because the neighboring pixels lie outside the tile. The obvious alternative is a global buffer that is filtered once all tiles are finished, but that wastes a lot of memory: for a 4K render with a few extra passes activated, my current proof-of-concept code, which uses this global-buffer approach, requires about 1.5GB of additional memory, already making GPU rendering impossible for large images. Since the filtering window is not that big and GPU tiles tend to be rather large, there is a third way: by rendering slightly beyond the borders of the tile, the information required for per-tile filtering is available, and the overhead stays small (with the default window of 5 pixels to each side, a 256x256 tile grows to 266x266, about 8% more pixels). For the CPU, yet another approach is viable: since all current tile rendering orders essentially form a closed region of rendered tiles, it is reasonable to put rendered tiles into temporary storage, filter them as soon as their neighbors are done rendering, and free them as soon as their neighbors are filtered - thereby only keeping the outer border of the rendered area in memory.
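
For reference, here is the relative MSE mentioned in the perceptual-metrics item above, in one common form from the denoising literature (a hedged sketch; the exact epsilon and normalization vary between papers):

    # Relative MSE: squared error divided by the squared reference brightness;
    # eps keeps dark pixels from dominating the metric through the division.
    import numpy as np

    def relative_mse(estimate, reference, eps=1e-2):
        err = (estimate - reference) ** 2
        return np.mean(err / (reference ** 2 + eps))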

Project schedule

  • 23.5. - 29.5.: Adding support for the needed features (filtering task, feature buffers, tile overscan, etc.) to the host side of Cycles
  • 30.5. - 05.6.: Adding exporting of the feature buffers to the path tracing kernels
  • 06.6. - 15.6.: Implementing the basic LWR filtering as device-independent kernels (most likely two kernels)
  • 21.6. - 26.6.: Evaluating and possibly implementing additional feature passes such as Ambient Occlusion, a shadowcatcher-like pass or roughness
  • 27.6. - 03.7.: Implementing decomposed filtering and evaluating its usefulness
  • 04.7. - 10.7.: Adding a new and additional noise-vs-blurring tradeoff parameter
  • 11.7. - 17.7.: Adding a perceptual error metric and using it to drive the noise-vs-blurring tradeoff
  • 18.7. - 27.7.: Implementing an AVX-optimized CPU version of the filter (multiple pixels from the filter window can be processed in parallel with AVX, and since a few KB of temporary memory are no problem on CPUs, many calculations can be deduplicated)
  • 28.7. - 11.8.: Testing with various scenes and getting initial artist feedback, then tweaking and improving the denoiser based on the results
  • 12.8. - 23.8.: Code review, cleaning up the code, and writing both end-user and technical documentation

The individual items should fit easily into their time allocations; the last two items provide some additional buffer in case of unexpected complications (large problems are extremely unlikely, since there is already a working proof-of-concept implementation).

There is a gap of five days in June (16.6. - 20.6.) since my university exams will most likely be towards the end of June - the exact dates aren't decided yet.

Bio

I'm an 18-year-old physics student from Germany (more precisely, at TUM in Munich). I enjoy computational science and high-performance computing in general, but my main area of interest is computer graphics, especially physically based rendering.

After some early experiments with ray tracing, I got involved with light transport research and did a few projects at science competitions such as ISEF and EUCYS on extending modern light transport algorithms like VCM and UPBP. I quickly realized, however, that there is quite a large gap between research and production software in CG, and decided to also get involved with software that is actually used by people.

The obvious choice for that was Blender, since it's open-source, has a huge userbase and offers plenty of opportunities to implement useful features - so I started doing some Cycles coding. My first project didn't work out (yet), but I learned a lot from it and have since contributed numerous features to Cycles, the most notable one so far being [http://developer.blender.org/D1133 Light Portals].

Doing this GSoC project would be a great opportunity for me to combine those two interests - state-of-the-art research and production software.