利用者:Geraldine/GSoC 2018/Reports

提供: wiki
移動先: 案内検索

Index

Week 1 (May 14 - May 18)

Week 2 (May 19 - May 25)

Week 3 (May 26 - June 1)

Week 4 (May 26 - June 8)

Week 1

Code familiarization

For the most part, GSoC is my first foray into the Cycles codebase. Because of this, this week, as well as the previous weeks leading up to the start of GSoC, was spent mostly exploring various parts of the cycles codebase that may be relevant to my tasks, such as

  • the exact algorithm used in render/mesh_volume.cpp
  • the conversion from Blender object to Cycles Mesh in blender/blender_mesh.cpp
  • image saving and loading and memory allocation steps in render/image.cpp, device, blender/blender_session.cpp, ~/blender/makesrna/intern/rna_smoke.c
  • numerous bits and pieces of the bvh and kernel files

Data structures

Currently, volumes are stored in kernel as raw arrays representing each voxel's value. Upon some research into optimal data structures for storing sparse 3D datasets, I came across some interesting papers that appeared to be relevant:

I had even implemented some parts of the code in Python before realizing that these structures were shown to have negligible advantages over simply using stl::map for fewer dimensionsal data, such as 3D images.

Texture saving and loading

For the most part, it seems like hashtables are the most efficient way to store sparse data sets. In order to optimize volume texture storage, I traced the top-most relevant code to ImageManager::file_load_image in render/image.cpp. It appears that smoke data specifically is retrieved through the custom functions in ~/blender/makesrna/intern/rna_smoke.c rather than using OIIO. Thus, I can implement here a simple conversion of the volume grid into a map of <coordinates, voxel value> pairs. All parent functions will also need to be modified to be compatible with a map object instead of a vector/array.

To-do next week

  • Modify the rna_SmokeModifier_*_grid_get functions to optionally recieve a thresholding value and return an array of coordinates + values of all active voxels.
  • Modify callers of rna_SmokeModifier_*_grid_get to properly interpret and manipulate the new array type.
  • Create a git branch for code changes.
  • Make comparison tests on memory performance.
  • If have time, move on to importing OpenVDB files.

Questions and random musings

  • Aside from logging the size of the data structures, is there another method to measure memory usage?
  • What is the difference in function between bvh and kernel/bvh?
  • What are closures?
  • What is a split kernel and how did it make Cycles more efficient than before split kernel was implemented?
  • Does index space vs object space mean the coordinate system of the world vs the coordinate system of the object? How is the object's coordinate system determined?

Week 2

Unfortunately was not able to do much this week, thanks to some personal stuff happening.

Notes on Last Week's Report

On advice from Brecht:

  • Conversion from dense to sparse grid should be done in render/image.cpp after image manipulation such as scaling. I was mistaken in thinking that there will be a performance issue otherwise.
  • Check out device/device_memory.h for memory handling and kernel/kernels/<device>/kernel_<device>_image.h files to see how volumes are read.
  • Don't use std::map for sparse grid (see below).
  • Overestimated timeline, will probably not get to Voxel primitive until next week at the earliest.

Sparse Grid Data Structure

Sparse grids should, instead of using map, have an auxiliary array to keep track of active and inactive tile indexes. Tentatively the structure should be something similar to this:

struct VolumeTile {
    float4 voxels[CUBE_SIZE * CUBE_SIZE * CUBE_SIZE];
}
VolumeTile *grid;   /* array of active tiles */
int *indexes; 
/* maps the index of a tile in the original volume to its index in the sparse volume
 * e.g. indexes[A] = B means a tile at orig_vol[A] is now at grid[B]
 * tiles not in grid (inactive tiles) have index -1. */

Since VolumeTile is just an array of floats, it may be better to forgo the struct and just make grid a float array and make indexes keep track of the start of each "tile". This will make the code more difficult to read though, and I am not sure about the performance difference.

Other progress

I have written functions for generating and looking up in the sparse grid, but most of the code has problems like inconsistent typing because I am not yet certain of how the volume will be called during rendering, so I cannot properly write the functions yet. I have spent most of this week tracing the rendering process.

Tracing function calls~

Since the volume textures will be stored as device_memory, I needed to trace what device_memory was converted to after device_update completed (seems to be ShaderData), and from there figure out how voxels are looked up.

From what I can tell, the lookup change should be inserted in the functions in kernel/kernel_volume.h that call ccl_device_inline bool volume_shader_sample(), but I will need to investigate further to be sure.

On an unrelated note, it may be worth templating the separate float/float3 functions in geom/geom_primitive.h and geom/geom_volume.h since most of the code is identical.

To-do next week

  • Finish tracking the whole volume rendering process.
  • Finalize data structure of sparse volume and insert the creation and lookup functions where appropriate.
  • Testing code, compare performance.
  • Create a Git branch, upload changes.

Week 3

While progress is slower than I'd like, I have added a working implementation of the tiling algorithm and some examples here (there are actually 2 very obvious typos in render/image.cpp that I accidentally committed, will fix them in the next commit).

As in the task comments, memory usage savings depend on the threshold, though even with the lowest possible threshold value, the sample image's memory usage was already reduced by ~33%. Later on it may be possible to make this value user changeable.

To-do next week

  • Modify sparse tiles to support generic types (currently only supports float4, should be fine as long they have a > operator overloaded).
  • Add support for all Volume attributes (currently only supports color).
  • Change detection method for calling sparse grid creation (currently, a new ImageDataType was created, but will probably change this to a bool member of Image).
  • Make this compatible with the Mesh Volume speedup. Because of they are currently mutually exclusive, using this implemntation will actually slow down rendering.
  • Implement lookup for tricubic interpolation (currently just throws an assert if a compressed texture tries to call it).
  • Add support for CUDA and OpenCL (this may actually not be tested soon since I won't have access to a PC with a compatible GPU for another couple of weeks).

All of these should be completed by the end of week 4.

Potential to-dos

  • Create a wrapper class for device_memory's of SparseTiles.
  • Support 2D tiling.
  • Make threshold value a user-inputted value.
  • Remove the SparseTile struct altogether and just treat tiles abstractly.

Week 4

So in this week, I believe I am pretty much done with the tiling function for volumes! The implementation should be completely finished for CPU and mostly finished for GPU by now, as all that is left to do is GPU testing and maybe add any other memory/speed optimizations we can think of. You can check it out here. Here is an example volume rendered with and without tiling:

Volumes with tiling Original image

Memory usage of 'color' (__tex_image_float4_000) reduced from 3.06M to 2.50M (18.3% reduction)
Memory usage of 'density' (__tex_image_float_003) reduced from 783.75K to 644.00K (17.8% reduction)
Memory usage of 'flame' (__tex_image_float_011) reduced from 783.75K to 292.00K (62.7% reduction)
Memory usage of 'temperature' (__tex_image_float_019) reduced from 783.75K to 290.00K (63.0% reduction)
Memory usage of 'velocity' (__tex_image_float4_008) increased from 3.06M to 3.06M, not using sparse grid. (0% reduction)
Render time increased from 8:46 to 10:07 (0.86x speed)

Color and velocity in particular are big contributors to memory usage, but especially velocity had no inactive tiles. Presumably with different settings, the memory usage can go as low as a third of the original texture.

Notice that there is nearly no noticeable visual difference between either image.

Last Week's To-do List

So this week I finished all the tasks in last week's to-do list:

  • Modify sparse tiles to support generic types (yay templates).
  • Add support for all Volume attributes.
  • Make compatible with Mesh Volume speedup. Now regardless of the combination of sparse or dense textures per mesh, the Mesh Volume alogrithm will run succesfully.
  • Implement lookup for tricubic interpolation. However, the function used is probably not that efficient.
  • Make threshold value a user-inputted value. Thanks to Brecht for the suggestion to simply use the same isovalue as used in Mesh Volume.
  • Remove the SparseTile struct altogether and just treat tiles abstractly. This was necessary for compatibility with CUDA's tex3D() and actually resulted in much cleaner code all throughout.
  • Add support for OpenCL and CUDA. In theory, this feature should work with these two (because the code compiles XD), but I have not tested it yet nor made comparison tests.

A Few Notes on CUDA Implmentation

In order to support CUDA's interpolation, I added a second version of create_sparse_grid() that pads every tile's 6 faces with their voxel neighbors. However, this would result in a massive increase in memory usage, which depending on how dense the volume is, may not be worth the increased lookup time. A quick space optimization (thanks to Brecht for the advice!) is to remove padding between adjacent tiles in the sparse grid if they are already neighbors in the real image.

Another quirk of the interpolation is that tex3D() expects coordinates in the (x, y, z) coordinate format as opposed to a flat index for array access. While we could convert the offsets into cartesian coordinates, a more efficient method would be to simply store the coordinates of the first voxel in every tile. While this triples the size of grid_info, the lookup savings from not having to compute coordinates from the index should be worth it.

Memory and Speed Optimizations

Aside from the above, I also tried to make optimizations here and there to speed up rendering or reduce space usage further. Unfortunately, there will always be a speed penalty since extra calculations are required to get the sparse space coordinates. Some optimizations made:

1. Created quicker copies of compute_index*() in the kernel_*_image.h files

While having several lookup functions increases maintenance, implementing directly in the kernel files saves a significant percentage of time. I will keep one copy of compute_index*() in util/util_sparse_grid.h for use outside the kernel and as a reference point for the kernel implementations.

2. Stored some resolution info of the image in tiles in device_memory and TextureInfo

The resolution of the image in tiles as well as the dimensions of the last tile in the grid are needed for lookup calculation. I implemented the above to avoid having to recalculate them with every sampling. Thus, more variables have been added to device_memory and TextureInfo. It may be better to implement a wrapper class / struct for sparse device_memory / TextureInfo, but I don't think there is a way for tex_alloc() and TextureInterpolator to detect if the passed device_memory / TextureInfo is the wrapper aside from using dynamic cast.

3. Reduced sparse grid memory by truncating the width/height/depth of the last tiles in each direction if the length is not exactly divisible by TILE_SIZE

Consider a 9 x 9 x 9 volume, which would require 4 tiles to cover. Using only 8 x 8 x 8 tiles, the sparse grid would be 8*8*8*4 = 2048 voxels large, as compared to the original grid's 9*9*9 = 729 voxels. By implementing 2 checks in the voxel lookup, we can reduce the wasted space significantly by truncating the tiles at the end of the grid. However, these extra checks do result in an increase in lookup time.

4. New dimensions paramter

While I originally made the kernel calculate for the truncated end tiles and discarded padding, this just resulted in significant slowdown in rendering. A better solution was to store the tile dimension information the same way tile indexes are already saved (as device_memory), and since all of the information is boolean, they can all be stored in the bits of a single int per tile and retrieved later through bit shifting.

To-do next week

I think that beginning next week I can start working on creating the OpenVDB import function. An easy start would be adding a UI option for it, butI will have to read up some more on the background of the problem before I can say for certain what will need to be done for this feature.

For tiling, there may still be more optimal ways of implementing the tile lookup. I will continue to modify the algorithm if we can come up with more ideas. I am open to any suggestions on how to improve lookup in particular, especially for OpenCL and CUDA as I am still not very familiar with them.