Dev:2.8/Source/Viewport/DrawManager

Current Status

Drawing 12,500 * 3 edges is not hard for the GPU. The main slowdown comes from iterating over all the state changes that immediate mode requires. There is no caching, so all of this must be repeated on every redraw. Immediate mode is not really a bottleneck for the UI, but for complex scenes it can stall the GPU significantly.

As a reference point, the current drawing code looks like this:

Observation

The scene does not change much between redraws, so the whole scene rendering state should be cached for fast redraw. This cache is needed to render the scene objects in sorted order and through an optimal drawing path. This implies a small memory footprint for each viewport, but a few MBytes per viewport for a complex scene is really not that much.

So we would have two steps: one where we populate the cache, then a fast rendering loop.

The new drawing code should look more like this:

Most of the cached data is static. It only changes when materials are assigned or when objects are added or removed. So even animation playback should benefit from such caching.
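
The populate-then-redraw split described above can be sketched as follows. This is only an illustration under assumptions: SceneCache, CachedCall, cache_populate, cache_draw and viewport_redraw are hypothetical names, not the draw manager's API, and "drawing" is reduced to counting the calls that would be submitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLS 64

/* Hypothetical cached draw call: which shader and which geometry to use. */
typedef struct CachedCall {
    int shader_id;
    int geometry_id;
} CachedCall;

/* Hypothetical per-viewport cache. */
typedef struct SceneCache {
    CachedCall calls[MAX_CALLS];
    size_t call_count;
    bool valid;         /* false until populated, or after invalidation */
    int populate_count; /* how many times the slow path ran */
} SceneCache;

/* Slow path: walk the scene once and record what has to be drawn. */
void cache_populate(SceneCache *cache, const int *object_shaders, size_t object_count)
{
    assert(object_count <= MAX_CALLS);
    cache->call_count = 0;
    for (size_t i = 0; i < object_count; i++) {
        cache->calls[cache->call_count].shader_id = object_shaders[i];
        cache->calls[cache->call_count].geometry_id = (int)i;
        cache->call_count++;
    }
    cache->valid = true;
    cache->populate_count++;
}

/* Fast path: replay the recorded calls. Returns the number of calls issued. */
size_t cache_draw(const SceneCache *cache)
{
    size_t issued = 0;
    for (size_t i = 0; i < cache->call_count; i++) {
        /* A real implementation would bind resources and submit geometry here. */
        issued++;
    }
    return issued;
}

/* One redraw: repopulate only when the cache was invalidated. */
size_t viewport_redraw(SceneCache *cache, const int *object_shaders, size_t object_count)
{
    if (!cache->valid) {
        cache_populate(cache, object_shaders, object_count);
    }
    return cache_draw(cache);
}
```

In this sketch, adding or removing an object would simply set valid to false, so that the next redraw takes the slow path once and later redraws go back to the fast loop.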

Things that are immutable like empties should be instanced and use a static VBO for each shape (lamps, cameras …). Small parts like object centers and relationship lines could be batched together inside specific VBOs.

Passes Cache Data Structure

This caching structure should allow us to optimise drawing by sorting rendering calls by resources, reducing API calls. Since each viewport can use a different engine with a different layer, we need to store one cache per viewport. The Engine is responsible for creating, populating and rendering the Passes.

Things that are not object related (tool overlays, 3D cursor, grid, ...) don’t need to be in these passes and can rely on the new immediate mode.
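
To illustrate why sorting calls by resource reduces API calls, here is a minimal sketch. DrawCall, draw_calls_sort and draw_calls_count_binds are hypothetical names, and the shader is used as the only "resource"; a real engine would likely sort on more than one key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical draw call: a geometry pointer plus its model matrix. */
typedef struct DrawCall {
    int shader_id;        /* resource the call depends on */
    const void *geometry; /* stand-in for a batch / VBO pointer */
    float model[4][4];
} DrawCall;

/* qsort comparator: group calls that share a shader. */
int draw_call_cmp(const void *a, const void *b)
{
    const DrawCall *ca = a;
    const DrawCall *cb = b;
    return (ca->shader_id > cb->shader_id) - (ca->shader_id < cb->shader_id);
}

void draw_calls_sort(DrawCall *calls, size_t count)
{
    qsort(calls, count, sizeof(DrawCall), draw_call_cmp);
}

/* Walk the calls in order and count how many shader binds are needed. */
size_t draw_calls_count_binds(const DrawCall *calls, size_t count)
{
    assert(calls != NULL || count == 0);
    size_t binds = 0;
    for (size_t i = 0; i < count; i++) {
        if (i == 0 || calls[i].shader_id != calls[i - 1].shader_id) {
            binds++; /* a real renderer would bind the shader program here */
        }
        /* ... then submit calls[i].geometry with calls[i].model ... */
    }
    return binds;
}
```

With four calls alternating between two shaders, the unsorted list needs four binds while the sorted list needs two.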

This enables Engines to control rendering and framebuffer switching.

Passes are populated before being sent to rendering. No decision has to be made when rendering a pass.

The calls inside one ShaderBin can be multiple meshes to draw with the same settings. Geometry pointers are stored as calls, each with its model matrix. We can further optimize multiple calls to the same geometry with instancing. The Engine is responsible for optimizing the order and number of shading groups, as well as the call order.

The engine then calls the render function of the DRW module for each Pass. All low-level optimisation is done by the DRW module under the hood.

Updating Cache

As long as the cache exists, redrawing the scene should be trivial and fast. Rebuilding the cache from scratch can be slow for huge scenes. I don’t expect cache generation to take more than 1 second, even on very large scenes (more than 50 000 objects).

For this reason the cache should live as long as possible. Some uniforms need to be pre-computed per frame and need special storage. To avoid using extra storage in the cache data, these values will have to be saved outside of the cache and referenced by the uniforms.

As we separate batching and rendering, we could use multi-threading for the batching process: one main rendering thread, and others populating the passes. But most of the time all passes are populated at the same time, so we can only do this if we iterate multiple times over the object list.

Adding, removing, hiding or showing objects, and assigning materials can invalidate the cache. A further optimisation could implement insertion / deletion of objects in the cache.

We still need to address how to do frustum culling.

Implementation Details

Here you have a diagram that shows the code flow of the draw manager and the draw engines. On the right you have an example of each function of a draw engine.

WARNING: This diagram is outdated. g_data is no longer a static variable; it is stored inside the StorageList of the engine instead.

And here you have the corresponding cache created by the example engine C.

We will now detail each step as numbered in the diagram.

1. Enable Engines (or gather)

First of all, the draw manager selects the appropriate engines needed for a particular viewport (based on context and viewport parameters). They are put in a list sorted by layering order (as in engine layering, which has nothing to do with the layer system). This sets the rendering order in stone. Most of the time it follows this order: Renderer Engine (Cycles, Eevee, ...) -> Overlay Engine -> Object Mode Engine -> Active Mode Engine
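
A minimal sketch of this gathering step, assuming each enabled engine carries a layering slot. The EngineLayer enum, EnabledEngine struct and engines_sort_by_layer function are hypothetical and only mirror the order given above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layering slots, following the order given in the text. */
typedef enum EngineLayer {
    LAYER_RENDER = 0,
    LAYER_OVERLAY = 1,
    LAYER_OBJECT_MODE = 2,
    LAYER_ACTIVE_MODE = 3
} EngineLayer;

typedef struct EnabledEngine {
    const char *name;
    EngineLayer layer;
} EnabledEngine;

/* qsort comparator: lower layering slots come first. */
int engine_layer_cmp(const void *a, const void *b)
{
    const EnabledEngine *ea = a;
    const EnabledEngine *eb = b;
    return (ea->layer > eb->layer) - (ea->layer < eb->layer);
}

/* Sort the enabled engines once; every later stage walks them in this order. */
void engines_sort_by_layer(EnabledEngine *engines, size_t count)
{
    assert(engines != NULL || count == 0);
    qsort(engines, count, sizeof(EnabledEngine), engine_layer_cmp);
}
```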

2. Init Engines

The second step is to run the init function of each engine. This function takes care of 3 things:
- Ask the draw manager for custom framebuffers.
- Make sure the engine's shaders are compiled.
- Run any engine-specific pre-rendering code.

If you need to do something that is not rendering, this is the right place to do it. Every engine-level variable needs to be in an anonymous struct called e_data.

Cache Creation

The drawing cache creation is split into three callback functions. Each engine can have its own. If a callback is present, it is called in the same order the engines were enabled.
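
The dispatch of these optional callbacks can be sketched as below. DrawEngineSketch, CacheStats, the example_* callbacks and cache_generate are hypothetical names. The loop nesting (init for all engines, then every engine for each object, then finish for all engines) is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical engine description: each cache callback is optional (may be NULL). */
typedef struct DrawEngineSketch {
    const char *name;
    void (*cache_init)(void *user);
    void (*cache_populate)(void *user, int object_id);
    void (*cache_finish)(void *user);
} DrawEngineSketch;

/* Counters used by the example callbacks below. */
typedef struct CacheStats {
    int inits;
    int populates;
    int finishes;
} CacheStats;

void example_cache_init(void *user) { ((CacheStats *)user)->inits++; }
void example_cache_populate(void *user, int object_id)
{
    (void)object_id;
    ((CacheStats *)user)->populates++;
}
void example_cache_finish(void *user) { ((CacheStats *)user)->finishes++; }

/* Run the three cache stages for every enabled engine, in enable order. */
void cache_generate(const DrawEngineSketch *engines, size_t engine_count,
                    const int *objects, size_t object_count, void *user)
{
    assert(engines != NULL || engine_count == 0);
    for (size_t e = 0; e < engine_count; e++) {
        if (engines[e].cache_init != NULL) {
            engines[e].cache_init(user);
        }
    }
    for (size_t o = 0; o < object_count; o++) {
        for (size_t e = 0; e < engine_count; e++) {
            if (engines[e].cache_populate != NULL) {
                engines[e].cache_populate(user, objects[o]);
            }
        }
    }
    for (size_t e = 0; e < engine_count; e++) {
        if (engines[e].cache_finish != NULL) {
            engines[e].cache_finish(user);
        }
    }
}
```

An engine that only fills in cache_populate is simply skipped during the init and finish stages.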

3. Engine Cache Init

The main task here is to set up the cache components: Passes & Shading Groups. When this function is called, you can assume that all passes from the earlier cache have been freed.

Pass

Passes are independent blocks of rendering commands. They are filled with shading groups. Passes also make sure the OpenGL state is set to what you expect it to be, regardless of previous draw commands.

Once filled the passes are stored per GPUViewport in the specific ENGINE_Data struct. This way they may be reused when

ShadingGroup

A ShadingGroup is a collection of Batches (containing geometry) associated with a shader and its uniforms. It ensures that all objects using the same shader are drawn without having to bind/unbind the shader program. Shading groups will be rendered in the same order they were created / added to their own pass. A shading group is unique and lives inside a pass.

Special ShadingGroups allow instancing and batching of small primitives.

Each time a ShadingGroup is created, you must specify its uniforms with it. Keep in mind that uniform values do not need to be updated if you use the same shader as the previous ShadingGroup. Due to the nature of the cache, you cannot store a uniform value in the ShadingGroup itself. Instead, you must store a reference (pointer) to its value. That way, if the value updates, the cache can still be used.

The important point: when creating a ShadingGroup, you get a reference to it. You have to save it until the end of cache generation. Otherwise it is lost and you cannot add any more batches to it. For this, use a struct g_data containing all references to these ShadingGroups. You will use them when running per-object code. Of course, if you don't need to add any more Batches after ENGINE_cache_init(), then you don't have to save it.

After this function, the order of the shading groups (inside their respective passes) is definitive, but the order of the passes is not (see Engine draw scene).
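
A minimal sketch of storing a uniform by reference rather than by value. UniformRef, ShadingGroupSketch, shgroup_uniform_float and shgroup_uniform_read are hypothetical names for this illustration and do not reproduce the DRW module's signatures.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UNIFORMS 8

/* Hypothetical uniform slot: stores where the value lives, not the value. */
typedef struct UniformRef {
    const char *name;
    const float *value;
} UniformRef;

typedef struct ShadingGroupSketch {
    UniformRef uniforms[MAX_UNIFORMS];
    size_t uniform_count;
} ShadingGroupSketch;

/* Register a uniform at cache creation time; only the pointer is kept. */
void shgroup_uniform_float(ShadingGroupSketch *grp, const char *name, const float *value)
{
    assert(grp->uniform_count < MAX_UNIFORMS);
    grp->uniforms[grp->uniform_count].name = name;
    grp->uniforms[grp->uniform_count].value = value;
    grp->uniform_count++;
}

/* At draw time the pointer is dereferenced, so the latest value is used. */
float shgroup_uniform_read(const ShadingGroupSketch *grp, size_t index)
{
    assert(index < grp->uniform_count);
    return *grp->uniforms[index].value;
}
```

The pointed-to value must outlive the cache, which is why such values are kept in longer-lived storage rather than on the stack of the function that builds the cache.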
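
The role of g_data can be sketched like this: references obtained at cache init are kept and reused when per-object code runs. All types and functions here (GroupSketch, PassSketch, GDataSketch, pass_group_create, group_add_batch, sketch_cache_init, sketch_cache_populate) are hypothetical, and batches are reduced to integer ids.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GROUPS 8
#define MAX_BATCHES 32

typedef struct GroupSketch {
    int shader_id;
    int batches[MAX_BATCHES]; /* stand-ins for geometry batches */
    size_t batch_count;
} GroupSketch;

typedef struct PassSketch {
    GroupSketch groups[MAX_GROUPS]; /* drawn in creation order */
    size_t group_count;
} PassSketch;

/* References kept between cache init and cache populate. */
typedef struct GDataSketch {
    GroupSketch *solid_group;
    GroupSketch *wire_group;
} GDataSketch;

/* Create a group inside a pass and hand back a reference to it. */
GroupSketch *pass_group_create(PassSketch *pass, int shader_id)
{
    assert(pass->group_count < MAX_GROUPS);
    GroupSketch *grp = &pass->groups[pass->group_count++];
    grp->shader_id = shader_id;
    grp->batch_count = 0;
    return grp;
}

void group_add_batch(GroupSketch *grp, int batch_id)
{
    assert(grp->batch_count < MAX_BATCHES);
    grp->batches[grp->batch_count++] = batch_id;
}

/* Cache init: create the groups and save their references in g_data. */
void sketch_cache_init(PassSketch *pass, GDataSketch *g_data)
{
    pass->group_count = 0;
    g_data->solid_group = pass_group_create(pass, 1);
    g_data->wire_group = pass_group_create(pass, 2);
}

/* Cache populate: per-object code only needs the saved references. */
void sketch_cache_populate(GDataSketch *g_data, int object_batch, int is_wire)
{
    group_add_batch(is_wire ? g_data->wire_group : g_data->solid_group, object_batch);
}
```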

4. Engine Cache Populate

This function is called once per object inside the active layer. In this step you fill the previously created ShadingGroups saved in g_data. Nothing stops you from creating new ShadingGroups here, but beware of the resulting overhead. If it's a solid object (i.e. it has geometry data), you can request a specific batch from the MeshRender API. If you need a special value for one object, the best way is to store it inside the object data and update it from outside the cache generation. If you need to draw something that is view dependent, you may prefer to do it in the shader, or rely on something that is reevaluated every frame in ENGINE_engine_init().

5. Engine Cache Finish

The common use case for this callback is to update the UBO with the data gathered during the cache generation. You can also run some optimisation code here.

6. Draw Background

This function is only called for one engine (usually the render engine). It's in charge of clearing the default framebuffer. You can pretty much do whatever you can do in ENGINE_draw_scene here too (see Draw Scene for more details).

If no enabled engine has a background function, then the default one is used instead.

The real reason this function exists is to allow callbacks before rendering scene objects.
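
The fallback rule can be sketched as follows. BackgroundEngine, BackgroundSource and background_draw are hypothetical names, and picking the first engine that provides the callback is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Tells the caller which background routine ran (for illustration only). */
typedef enum BackgroundSource {
    BG_DEFAULT = 0,
    BG_ENGINE = 1
} BackgroundSource;

typedef struct BackgroundEngine {
    const char *name;
    BackgroundSource (*draw_background)(void); /* optional, may be NULL */
} BackgroundEngine;

/* A real background function would clear the default framebuffer here. */
BackgroundSource default_background(void) { return BG_DEFAULT; }
BackgroundSource render_engine_background(void) { return BG_ENGINE; }

/* Call the background function of a single engine, or fall back to the default. */
BackgroundSource background_draw(const BackgroundEngine *engines, size_t count)
{
    assert(engines != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (engines[i].draw_background != NULL) {
            return engines[i].draw_background();
        }
    }
    return default_background();
}
```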

7. Draw Scene

The goal of this function is to draw something to the default framebuffer. The default framebuffer is a simple offscreen buffer composed of an RGB8 color texture and a depth texture, both the size of the viewport.

To draw, just call DRW_draw_pass() with the appropriate pass from the current viewport's cache. To get this cache, use DRW_viewport_engine_data_get("EngineName"). It contains all data associated with the viewport that is currently being rendered.

The data struct contains pointer lists that are grouped by type: Framebuffers, Textures, Passes and Storage. This is because the Viewport itself needs to free some of these depending on certain actions (e.g. resizing the viewport frees only the Framebuffers and Textures). The storage list is a general-purpose per-viewport storage place. Only store pointers to directly allocated memory (with MEM_callocN or MEM_mallocN) in this list, because it will be freed by MEM_freeN().
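
Grouping pointers by type makes selective freeing straightforward, as this sketch shows. ViewportDataSketch, PointerList and the functions below are hypothetical, and the standard calloc/free pair stands in for MEM_callocN/MEM_freeN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define LIST_MAX 4

/* Hypothetical fixed-size list of owned pointers (NULL means empty slot). */
typedef struct PointerList {
    void *items[LIST_MAX];
} PointerList;

/* Hypothetical per-viewport engine data, grouped by type as described above. */
typedef struct ViewportDataSketch {
    PointerList framebuffers;
    PointerList textures;
    PointerList passes;
    PointerList storage;
} ViewportDataSketch;

/* Free every slot with a plain free(), hence the "directly allocated" rule. */
void pointer_list_free(PointerList *list)
{
    for (size_t i = 0; i < LIST_MAX; i++) {
        free(list->items[i]); /* free(NULL) is a no-op */
        list->items[i] = NULL;
    }
}

size_t pointer_list_count(const PointerList *list)
{
    assert(list != NULL);
    size_t count = 0;
    for (size_t i = 0; i < LIST_MAX; i++) {
        if (list->items[i] != NULL) {
            count++;
        }
    }
    return count;
}

/* Resizing only invalidates size-dependent resources. */
void viewport_data_on_resize(ViewportDataSketch *data)
{
    pointer_list_free(&data->framebuffers);
    pointer_list_free(&data->textures);
}

void viewport_data_free_all(ViewportDataSketch *data)
{
    viewport_data_on_resize(data);
    pointer_list_free(&data->passes);
    pointer_list_free(&data->storage);
}
```

Because the lists are freed with a single plain call per slot, a storage entry that owns further allocations would leak them in this sketch.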

Much like cache functions, draw_scene functions are called in the same order as other functions like cache_init. So always assume that previous engines have already written to the default framebuffer. This means that you have to take care to enable the depth test when necessary, so as not to overwrite too much information.

Sometimes it's handy to render to a separate framebuffer for blending or other effects. To do so, bind the custom framebuffer (created in engine_init()) with DRW_framebuffer_bind() and carry on drawing.

Important: When this function finishes, the draw module expects the default framebuffer to be bound and valid (with its textures attached).

Reminder: You cannot access (in your shader) a texture currently used by the active framebuffer. If you need read access only, detach the texture with DRW_framebuffer_texture_detach and reattach it with DRW_framebuffer_texture_attach once you have finished drawing with it. If you need read AND write access, then you must create another temporary framebuffer to write to.