Dev:2.8/Viewport/PBR Pipeline/Implementation
This implementation document was put together by Hypersomniac prior to the Viewport sprint. It can be used as reference for the upcoming PBR support in Blender.
Pre Render
1. Update Probes
Before we begin rendering, we need to make sure we have all the information needed to light our models.
For this we use what are called probes (Unity's naming). They capture the environment at a specific point in space.
We need information in all directions so we render a cubemap from this specific point.
A world cubemap that only contains the world texture needs to be available as the default probe source. This is the distant probe, infinitely far away.
For local cases, I think that each object should have the possibility to become a probe source. This way no other object type is required and no complicated setup has to be made to get a good lighting approximation on one specific object.
Local probe assignment is discussed later.
Optimization: We can render the 6 faces of the cubemap in a single draw call using a geometry shader.
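To make the six-face capture concrete, here is a minimal sketch of the forward and up vectors usually used for each face, in OpenGL cubemap face order. The names `CUBE_FACES` and `face_for_direction` are hypothetical and only for illustration. In a geometry shader, each triangle would typically be emitted once per face with that face's view matrix and routed with `gl_Layer`; this sketch only lists the orientations and shows which face a given direction falls on.

```python
# Forward and up vectors per face, in OpenGL cubemap face order (+X, -X, +Y, -Y, +Z, -Z).
CUBE_FACES = [
    ("+X", (1, 0, 0), (0, -1, 0)),
    ("-X", (-1, 0, 0), (0, -1, 0)),
    ("+Y", (0, 1, 0), (0, 0, 1)),
    ("-Y", (0, -1, 0), (0, 0, -1)),
    ("+Z", (0, 0, 1), (0, -1, 0)),
    ("-Z", (0, 0, -1), (0, -1, 0)),
]


def face_for_direction(d):
    """Return the name of the cubemap face that direction d falls on."""
    # The face is selected by the component with the largest magnitude.
    axis = max(range(3), key=lambda i: abs(d[i]))
    sign = "+" if d[axis] >= 0 else "-"
    return sign + "XYZ"[axis]
```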
Depending on what is needed we have to precompute the diffuse contribution of this environment.
We have two choices of storage for this.
- Low-res cubemap: cubemap size * 6 storage, needs an additional shader sampler, no light bleeding, will be pixelated.
- Spherical harmonics: 9 or 16 coefficients to compute, low storage cost, light bleeding, smooth result, easily interpolated.
I think going with spherical harmonics is a good choice and a lot of people seem to use it already. The artifacts introduced by using 9 coefficients (3 bands) could be “solved” by using 16 coefficients (4 bands). This could be a quality parameter. We can use a compute shader for this task if available.
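As an illustration of the 9-coefficient option, the sketch below projects a single-channel radiance function onto the first three spherical harmonics bands and evaluates it back. It is only a sketch under assumptions: the function names are hypothetical, the environment is sampled with a Fibonacci sphere instead of cubemap texels, and no cosine convolution for diffuse lighting is applied.

```python
import math


def sh9_basis(d):
    """Real spherical harmonics basis (bands 0 to 2) for a unit direction d."""
    x, y, z = d
    return [
        0.282095,
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ]


def fibonacci_sphere(n):
    """Yield n roughly uniform unit directions (stand-in for cubemap texels)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = i * golden_angle
        yield (r * math.cos(phi), r * math.sin(phi), z)


def project_sh9(radiance, n=4096):
    """Project radiance(direction) -> float onto 9 SH coefficients."""
    coeffs = [0.0] * 9
    for d in fibonacci_sphere(n):
        value = radiance(d)
        for i, b in enumerate(sh9_basis(d)):
            coeffs[i] += value * b
    weight = 4.0 * math.pi / n  # each sample covers an equal share of the sphere
    return [c * weight for c in coeffs]


def eval_sh9(coeffs, d):
    """Reconstruct the projected function in direction d."""
    return sum(c * b for c, b in zip(coeffs, sh9_basis(d)))
```

Because the result is just 9 numbers per channel, interpolating between two probes amounts to interpolating their coefficients.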
If we need to prefilter the cubemap for a specific BRDF, we also have to do it at this stage.
For this I would be inclined to use prefiltered importance sampling, as it drastically decreases the processing time. A scene could have a lot of probes and refreshing all of them should not take more than a few seconds.
Optimization: this process should be asynchronous and not freeze the UI. However, it must complete before each frame of an “OpenGL Render”.
Another important point is refreshing the captures.
My proposal on this point is:
- The world probe is updated every time the world node tree changes.
- Local probes are updated only on request. An additional auto refresh checkbox (per probe/global?) would refresh the probe on any scene change.
Additionally, we can provide an operator to update all probes.
Another concern is light bounce. If we render more than direct lighting inside the probes, we can fall into a dependency loop: as the probes in the refresh list are updated, the first ones become obsolete because of the newer ones. I suggest allowing more than one refresh pass to alleviate this problem, which gives more stable animations and lets light bounce. Disabling probes during probe renders is another possibility that eliminates this problem.
So pseudo code for this stage is :
Gather probes tagged to be refreshed.
For each light bounce
    For each probe to refresh
        Render the scene to a cubemap render target
        Prefilter cubemap / compute SH
        Store result and tag the probe as refreshed
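The same stage can be sketched in Python. Everything here is hypothetical: the `Probe` fields and the `render_cubemap` and `prefilter` callbacks stand in for the real rendering and filtering steps, which this document does not define.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    name: str
    needs_refresh: bool = True
    cubemap: object = None
    filtered: object = None


def refresh_probes(probes, render_cubemap, prefilter, bounces=1):
    """Refresh every tagged probe, repeating the pass once per light bounce."""
    # Gather first, so that tagging a probe as refreshed does not drop it
    # from the following bounce passes.
    pending = [p for p in probes if p.needs_refresh]
    for _ in range(bounces):
        for probe in pending:
            probe.cubemap = render_cubemap(probe)        # scene -> cubemap target
            probe.filtered = prefilter(probe.cubemap)    # prefilter / compute SH
            probe.needs_refresh = False                  # store and tag as refreshed
    return pending
```

With `bounces=2`, the second pass sees the results of the first one, which is what lets light bounce between probes.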
2. Pre Render Buffers
If we want to support screen space effects such as screen space reflections (SSR) or occlusion during shading (rather than as a post-render pass, so that they can be applied effectively), we need access to screen information when rendering the geometry.
A depth prepass used for culling could be re-used for ambient occlusion. But for reflection we need a color buffer.
So we have two choices :
- Use the previous frame with re-projection: faster, may have temporal artifacts, bouncing light for free.
- Render the scene first without SSR: easier to implement but very slow, no light bounce.
I will definitely go for the first choice, but as of now (September 2016) only the second is implemented.
In render mode we might offer the option to render the first frame (or all frames) twice to cancel temporal artifacts.
At this stage we can also render the scene with inverted normals to know the thickness at every pixel. This is good to have when doing screen space raytracing.
If we use hierarchical Z-buffer (HiZ) raytracing, we have to downsample the buffers to accelerate the tracing process. This would be done at this stage.
For cone traced reflections we also need to prefilter a color and a visibility buffer.
I propose to use the HiZBuffer for raytracing SSR and AO.
So the pseudo code for these steps is:
Render Depth Buffer (like a shadowmap)
Render Depth Buffer with normal inverted
Reproject Color Buffer (optional)
Make HiZ Buffers
    Downsample Depth
    Downsample Backface Depth
    Downsample Color Buffer
    Create Visibility Buffer
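To illustrate the depth downsampling step, here is a minimal sketch that builds a HiZ pyramid from a depth buffer stored as a list of rows. The function names are hypothetical, the buffer size is assumed to be a power of two, and `min` is used as the default combine operator; passing `max` gives the other bound of a min/max hierarchy.

```python
def downsample_depth(depth, combine=min):
    """Halve the resolution, keeping one combined value per 2x2 block."""
    h, w = len(depth), len(depth[0])
    return [
        [
            combine(depth[y][x], depth[y][x + 1],
                    depth[y + 1][x], depth[y + 1][x + 1])
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


def build_hiz(depth, combine=min):
    """Return the list of mip levels, level 0 being the full-resolution buffer."""
    levels = [depth]
    while len(levels[-1]) > 1 and len(levels[-1][0]) > 1:
        levels.append(downsample_depth(levels[-1], combine))
    return levels
```

A ray marcher can then test a coarse level first and only descend to finer levels where the ray may actually intersect the depth bounds.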
3. Object Setup
For each object we must find the right data to feed the PBR pipeline so that it looks right.
For reflection probes there are a few different solutions:
- We can set the probe to use for each object: either its own or that of another object. Transitions from one probe to another happen without blending. Assigning one probe to other objects is done manually. This is the easy solution.
- We can choose the probe based on the influence radius / box of the probes. This is what is done in Unity3D.
- We can blend between all cubemaps in the pixel shader. This is what is done in Unreal Engine 4. But I see this approach as impractical for us for a lot of reasons (can't importance sample multiple probes for performance reasons, cubemap array support…).
Default choice is to use the distant / world probe. If a local probe is available then use it instead.
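The influence-radius option combined with the world probe default could look like the sketch below. The `InfluenceProbe` type, its fields, and the rule of picking the nearest probe when several overlap are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class InfluenceProbe:
    name: str
    position: tuple
    radius: float


def select_probe(object_position, local_probes, world_probe):
    """Pick the nearest local probe whose radius contains the object,
    falling back to the distant / world probe."""
    best, best_dist = world_probe, math.inf
    for probe in local_probes:
        dist = math.dist(object_position, probe.position)
        if dist <= probe.radius and dist < best_dist:
            best, best_dist = probe, dist
    return best
```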
We could also allow the user to provide a custom HDRI to use instead of scene-captured data. This may need additional thought.
The lack of blending between cubemaps can be balanced by the fact that objects can have local cubemaps refreshed every frame.
Optimization: After this stage it would be good to sort objects/surfaces based on the shaders and the resources (textures, cubemaps) they use, to minimize state changes. This should only be done when it makes sense (object added, material assigned...).
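A sketch of this sorting, assuming each draw is described by a hypothetical dictionary holding a shader identifier and a tuple of texture identifiers. The helper that counts shader switches is only there to show the effect of the sort.

```python
def sort_for_state_changes(draws):
    """Order draws so that identical shaders, then identical textures, are adjacent."""
    return sorted(draws, key=lambda d: (d["shader"], d["textures"]))


def count_shader_switches(draws):
    """Count how many times the bound shader would change while drawing in order."""
    switches, current = 0, None
    for d in draws:
        if d["shader"] != current:
            switches += 1
            current = d["shader"]
    return switches
```

Since the sort key only depends on material data, the sorted list can be cached and rebuilt only when an object is added or a material is assigned.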
Further thoughts: Something needs to be done for diffuse lighting. For the moment only spherical harmonics are used to store light received at a point in space, and they are applied to the whole mesh. Realtime global illumination techniques already exist but are not trivial to implement.
4. Render Surface / Shader compilation
At this stage we bind textures and update uniform variables. Care must be taken to do this efficiently. The material could have an override option for its probe too.
One concern is that PBR relies on lots of textures to achieve good performance (look-up tables, depth buffer, …) and the number of texture slots is limited. Shadow maps also take up texture slots.
Bsdf Rendering
5. Direct Lighting
Direct Lighting is the influence of scene lamps on the bsdfs.
Optimization: Sampling the shadow map should be done once for all opaque bsdfs.
As of now, the lights in the scene are compiled into each shader, with their attributes dynamically updated. This implies that all surfaces pay the lighting cost no matter how far away the light is, and that adding/removing a light (or changing a static attribute) recompiles every shader in the scene. But it also means that we can inject what is inside the light node tree directly into the shading pipeline.
We could optimize this and do a dynamic light loop, maybe with light culling, but we would lose the node tree injection part. So performance or functionality? We could also render the node tree to a texture and sample it in the shader when lighting.
Not every light type / bsdf combination has an approximation technique. Therefore we should use the most appropriate fallback or use a fitted solution.
Shadows are still sharp even with variance shadow maps. There is no ideal solution for now.
Note that when using area lights, the glossy shader approximation using Linearly Transformed Cosines needs 2 LUTs.
6. Indirect Lighting
Indirect lighting is the influence of the probe on the bsdfs.
There are two choices here:
Filtered Importance Sampling
First, we can use few approximations and do everything with Filtered Importance sampling:
http://http.developer.nvidia.com/GPUGems3/gpugems3_ch20.html
This has the benefit of not being confined to one bsdf, and bsdfs can be mixed in one shader. The results are also very similar to Cycles. But performance is poor due to the number of samples we have to use. Filtering with mipmaps helps to lower this count, but it still remains a bit hard to render a whole scene.
Also, to get an even distribution of the samples, random samples have to be generated per pixel (and possibly per frame).
Optimization: Possibly generate these random samples only once for all bsdfs, grouping the bsdfs in a loop. This could be achieved at the shader generation step.
For each bsdfs
    Setup bsdfs
For each samples
    Generate Hammersley sample
    For each bsdfs
        Sample Probe
For each Bsdfs
    outputs Color
But this means rejecting all shaders that have bsdf nodes plugged into non-bsdf inputs. We could reject only bsdfs that feed back into other bsdfs, but that's a bit more complex.
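For reference, the Hammersley samples used in the loop above can be generated as in the sketch below, together with one common way of turning a sample into a GGX half vector in tangent space. The function names are hypothetical, and remapping roughness to `roughness * roughness` is an assumed convention, not something this document specifies.

```python
import math


def radical_inverse_base2(i):
    """Van der Corput sequence: mirror the bits of i around the binary point."""
    result, f = 0.0, 0.5
    while i:
        if i & 1:
            result += f
        i >>= 1
        f *= 0.5
    return result


def hammersley(i, n):
    """i-th point of an n-point Hammersley set in [0, 1)^2."""
    return (i / n, radical_inverse_base2(i))


def importance_sample_ggx(u, roughness):
    """Map a 2D sample to a half vector distributed like the GGX lobe
    (tangent space, normal along +Z)."""
    a = roughness * roughness  # assumed roughness remapping
    phi = 2.0 * math.pi * u[0]
    cos_theta = math.sqrt((1.0 - u[1]) / (1.0 + (a * a - 1.0) * u[1]))
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))
    return (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)
```

Since the sample set does not depend on the bsdf, it can be generated once and shared by all bsdfs in the shader; only the mapping to a direction is bsdf specific.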
Split Sum Approximation
The other solution is to prefilter (convolve) the probe for a specific bsdf. Different roughness levels are baked into the mipmaps of the cubemap. For diffuse lighting, spherical harmonics are enough to contain the low frequencies of the convolved lighting. But this means the cubemap cannot be used for other bsdfs.
This needs a precomputation phase, but it can be lowered to a few milliseconds. https://placeholderart.wordpress.com/2015/07/28/implementation-notes-runtime-environment-map-filtering-for-image-based-lighting
We also need a LUT to precompute the BSDF contribution. This is the split sum approximation from Unreal Engine 4. This would be a lot faster than importance sampling because all the complicated stuff gets simplified to 2 texture lookups. With this method we lose the stretched reflections that depend on the view angle.
Unlike Unreal, Cycles does not account for the Fresnel effect inside the Glossy shader. This means we can omit the Fresnel channel from the LUT. The Fresnel effect can be retrieved by using a trick inside the shader node tree itself, so I don't see that as a problem.
With either method we can use parallax correction to keep the reflection grounded in the world instead of sliding with the camera.
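Parallax correction is commonly done by intersecting the reflection ray with a proxy volume around the probe and looking up the cubemap along the vector from the probe position to the hit point. The sketch below assumes an axis-aligned box proxy, a shaded point inside that box and a non-zero reflection vector; the function name and parameters are hypothetical.

```python
import math


def parallax_corrected_direction(surface_pos, reflection, box_min, box_max, probe_pos):
    """Return the (unnormalized) cubemap lookup direction for a reflection ray
    starting at surface_pos inside the box [box_min, box_max]."""
    t_hit = math.inf
    for p, r, lo, hi in zip(surface_pos, reflection, box_min, box_max):
        # Distance along the ray to the box plane it is heading towards.
        if r > 0.0:
            t_hit = min(t_hit, (hi - p) / r)
        elif r < 0.0:
            t_hit = min(t_hit, (lo - p) / r)
    hit = [p + r * t_hit for p, r in zip(surface_pos, reflection)]
    return tuple(h - c for h, c in zip(hit, probe_pos))
```

When the shaded point coincides with the probe position, the corrected direction is parallel to the original reflection vector, so the correction only matters away from the capture point.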
I also plan on supporting planar reflections. But we have to make them look good even for rough materials.
When using planar reflections or parallax corrected probes, we must have a fallback probe for rays that fail.
So the textures that need to be bound for these methods are:
- probe
- fallback probe
- Hammersley Samples (required for AO OR importance sampling)
- Per Pixel Rotation Texture (required for AO OR importance sampling)
- Bsdf LUT (required for split sum approximation OR SSR)
- For AO & SSR : Screen Depth buffer (Can have min - max Hierarchical levels)
- For SSR : Screen Color buffer
We might fall back to hacks for bsdfs that have no simplified algorithm.
Ambient Occlusion
Ambient Occlusion should be applied to the indirect lighting only so it must be rendered before merging the Direct and Indirect lighting.
It will be rendered when shading the geometry and before evaluating the bsdfs (or in parallel).
We could have different ambient occlusion algorithms: one high quality and one faster for preview. We can use the HiZ buffer to our advantage to accelerate ray tracing.
Also, ambient occlusion needs to be applied differently to diffuse and glossy shaders.
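As a sketch of where the occlusion term enters, the function below combines the lighting components per color channel. Using two separate occlusion factors for diffuse and glossy is an assumption about what "applied differently" could mean; how each factor is computed is left open, and the function name is hypothetical.

```python
def combine_lighting(direct, indirect_diffuse, indirect_glossy, ao_diffuse, ao_glossy):
    """Occlusion only attenuates the indirect terms; direct lighting is untouched."""
    return tuple(
        d + i_diff * ao_diffuse + i_gloss * ao_glossy
        for d, i_diff, i_gloss in zip(direct, indirect_diffuse, indirect_glossy)
    )
```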
Screen Space Reflections
I aim to implement screen space cone traced reflections as described in GPU Pro 5.
HiZ tracing is already done (it needs a few corrections) but the cone tracing is not.
Post Render
7. Transparency
This is a subject of high interest.
Transparent surfaces should be put into a second pass to render them in front of opaque geometry.
But we have problems when multiple transparent objects overlap.
We can't avoid every problem, but we could at least give the user a means to set object priority for this sort order. In addition, we should sort them by distance to the camera by default.
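One possible sort key, as a sketch: a user priority first, then distance to the camera from far to near. The dictionary layout and the convention that lower priority values are drawn first are assumptions.

```python
import math


def sort_transparent(objects, camera_pos):
    """Order transparent objects for drawing: by user priority, then back to front."""
    return sorted(
        objects,
        key=lambda o: (o["priority"], -math.dist(o["position"], camera_pos)),
    )
```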
For completeness, we could also introduce a Modulate/Multiply blend mode that darkens the scene. This would mimic the behaviour of the transparent shader in Cycles.
https://www.opengl.org/discussion_boards/showthread.php/144760-Blending-Mode?p=1038945&viewfull=1#post1038945
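To show the difference between the two modes, here is a per-channel sketch of multiply blending next to regular alpha blending. The function names are hypothetical. In OpenGL terms, multiply blending is usually set up with `glBlendFunc(GL_DST_COLOR, GL_ZERO)`, which is mentioned here as a common setup rather than as what the implementation does.

```python
def blend_multiply(dst, src):
    """Multiply blend: the surface tints (darkens) what is already in the buffer."""
    return tuple(d * s for d, s in zip(dst, src))


def blend_alpha(dst, src, alpha):
    """Regular alpha blend, for comparison."""
    return tuple(s * alpha + d * (1.0 - alpha) for d, s in zip(dst, src))
```

With multiply blending the result does not depend on the order in which overlapping surfaces are drawn, since multiplication is commutative.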
8. Post drawing
We have to save the pre-tonemapped HDR color buffer for the SSR of the next frame.
Then use a tonemapping operator (we can use OIIO for that) to map HDR to LDR.
Then pass the frame to the compositor.
All eye candy effects will be handled there, but that is outside the scope of PBR itself. Bloom and camera motion blur should be easy to do.
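As an illustration of the HDR to LDR step, the sketch below uses the simple Reinhard operator followed by a gamma encode. This operator is only an assumed example, since the text does not name one, and the sketch does not use OIIO. The function names and the default gamma of 2.2 are hypothetical.

```python
def tonemap_reinhard(hdr):
    """Compress unbounded HDR values into [0, 1) per channel."""
    return tuple(c / (1.0 + c) for c in hdr)


def to_8bit(ldr, gamma=2.2):
    """Clamp, gamma encode and quantize an LDR color to 8 bits per channel."""
    return tuple(
        round(255 * min(1.0, max(0.0, c)) ** (1.0 / gamma))
        for c in ldr
    )
```

The HDR buffer has to be copied before this step, because the SSR of the next frame needs the values before compression.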
Technical Details
Minor Features / Improvement Ideas:
- Time parameter or frame number variable that gets updated through shader uniforms. This way we would have a pretty simple way to animate shaders.
- The Displace output could be used as a vertex position offset along the normal inside the vertex shader.
- Debug pass: As in Cycles, we should have a means to isolate lighting components, either for compositing or debugging, like Diffuse/Glossy/Transmission/Emission. Maybe MRT could be our friend here.
- Cubemap storage: This is a point of concern. Do we save the cubemaps? Where? If they are fast to recompute then I think it's not worth it. We should, however, be capable of exporting the generated cubemaps. But custom/exported cubemaps should not contain the prefiltered mipmaps.
- Faster compilation time / async: Do not let slow actions such as shader compilation or cubemap prefiltering freeze Blender's UI.
- Separate shader files: We should organize the GLSL shading code into more, smaller files that would be stitched together when starting Blender. That way everything works as intended. Translucency & subsurface still need to be investigated.
- Unlock getting HDR output for compositing.
Planning
I will separate the whole work into chunks that can be reviewed easily, sorted by priority from a user/dev perspective.
- Add a button / interface that enables the new high quality shading mode. My initial idea was to use a checkbox in material shading mode when the Cycles render engine is active.
- Unclamp Hdr textures inside gpu_draw.c
- Add a bsdf interface inside gpu_material.c.
- Add a file, gpu_probe.c, that handles all probe specific functions
- Add a function that renders the world probe (view3d_draw.c)
- Split Glsl shading files
- Add spherical harmonics computation
- Add the filtered importance sampling inside the brdfs.
- Add scene lamps loop. Add basic lighting functions.
- Add support for LUTs
- Add support for LTC (ggx area light)
- Add prefiltered Cubemap
- Add Object probes
- Add prepass depth and backface
- Add Ambient Occlusion (will bypass the post process one when in high quality mode)
- Add temporal reprojection of last frame inside a reflection buffer and preblur it
- Add HiZ Cone traced reflections
- Add planar reflections
Architecture
- gpu_pbr.c : Contains all functions related to screen buffers, and the GPUPBR that holds the LUTs and screen buffer textures that the view3D needs.
- gpu_probe.c : Contains all functions related to probe rendering, filtering, and spherical harmonics computation. Defines the GPUProbe type and creates the probes.
- gpu_material.c : Add all light function logic with light node tree injection and the bsdf interface. Also contains the binding of PBR related textures used in the shader.
- view3d_draw.c : Add all probe and buffer rendering / refreshing logic. gpu_pbr_update is called before drawing a frame and takes care that all probes are updated.
- gpu_texture.c : Add LUTs creation and Cubemap creation.