User:AlexK/Gsoc2013/report
Weekly Reports
Week 1
This week I mainly worked on prerequisite code and a proof of concept.
- OpenCL
- Created CLDM (OpenCL device manager). It currently supports only basic functions, but it will include device information and a UI for selecting multiple devices. We should probably migrate the compositor to it later.
- Extended the clew library (like GLEW, but for OpenCL) to support CL-GL transfer. (Note: we are using clew for CL 1.0, and this library seems to be abandoned.)
- Made a proof of concept for an OpenCL filter that transfers its result to a GL texture inside the GPU (without copying)
- Wrote up the scheduler design: Scheduler. Needs approval!
- Started working on the frame manager. We can now create (test) textures a few frames ahead, to be used later.
- We would be able to support multiple "players" and outputs (like 2 for stereoscopic)
- Did a little more grading tools research
Week 2
- Devices init
- cpu
- gpu
- easy program building and kernel creation (and management)
- some cl devices share same cl platform (for program builds)
- Created buckets system
- tested on single thread only
- Created basic image storage
- Needs to be extended to support multiple formats + cpu<->gpu
Week 3
This week was not very productive. I didn't feel well for a couple of days, and we also had holidays. However, I will make up for the lost time next week.
- Dispatcher system (devices, buckets, image storage)
- Almost completed
- However, testing showed leaks and accesses to freed memory as a result of a design flaw
- Redid it with a better reference-counting design (to free buckets, data, and frames when all references are gone). The code is almost ready except for a few functions.
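The reference-counting idea can be illustrated with a minimal sketch. It uses `std::shared_ptr` as a stand-in for whatever counting scheme the project actually uses; `FrameData`, `Bucket`, and `release_sequence` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for frame data shared between buckets.
struct FrameData {
    explicit FrameData(int *live) : live_counter(live) { ++*live_counter; }
    ~FrameData() { --*live_counter; }
    FrameData(const FrameData &) = delete;
    FrameData &operator=(const FrameData &) = delete;

    int *live_counter;  // counts live FrameData objects (demonstration only)
};

// Hypothetical bucket: it keeps a counted reference to each of its inputs.
struct Bucket {
    std::vector<std::shared_ptr<FrameData>> inputs;
};

// Shares one frame between two buckets, releases the owners one by one and
// records how many frames are still alive after each step.
std::vector<int> release_sequence()
{
    int live = 0;
    std::vector<int> observed;

    std::shared_ptr<FrameData> frame = std::make_shared<FrameData>(&live);
    std::shared_ptr<Bucket> a = std::make_shared<Bucket>();
    std::shared_ptr<Bucket> b = std::make_shared<Bucket>();
    a->inputs.push_back(frame);
    b->inputs.push_back(frame);
    assert(frame.use_count() == 3);  // creator + two buckets

    frame.reset();             // the creator drops its reference
    observed.push_back(live);  // frame is still referenced by both buckets
    a.reset();                 // first bucket is destroyed
    observed.push_back(live);  // frame is still referenced by bucket b
    b.reset();                 // last reference goes away, frame is freed
    observed.push_back(live);
    return observed;
}
```

With this kind of ownership, nothing frees a frame explicitly: it goes away when the last bucket that uses it is destroyed, which rules out the "access of freed memory" class of bugs by construction.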
Week 4
- Dispatch system is almost finished
- For now, it is limited to one device because only a few (handmade) test cases exist
- Reviewed strip code
- Sadly, it is tightly tied in with the rendering code
- Worked on strip code
- Created a dependency graph (more like a tree for simpler cases)
- Should allow a more flexible system (effects = modifiers, blend = blend effects)
Week 5
- Finished the strip dependency graph
- Migrated some code to C++
- Code became cleaner and easier to follow
- Still need to port devices and fix up memory cleanup
- Created parts of bucket generator
- Creates list of commands from effects list in strip dependency graph
- But it still has a lot of problems
- I'm trying to create a very flexible system, but it is becoming very hard to debug
- Some minor changes to the dispatch system
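As a rough illustration of how a command list can be derived from a strip dependency tree, the sketch below walks the tree in post-order so that every input is produced before the effect that consumes it. `StripNode`, `make_node`, `emit_commands`, and the strip names are hypothetical and not taken from the project code; the sketch also assumes a plain tree, so an output that is used more than once is not handled.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node: a source strip has no inputs, an effect
// (modifier or blend) has one or more.
struct StripNode {
    std::string name;
    std::vector<std::unique_ptr<StripNode>> inputs;
};

std::unique_ptr<StripNode> make_node(const std::string &name)
{
    std::unique_ptr<StripNode> node(new StripNode());
    node->name = name;
    return node;
}

// Appends one command per node in post-order: inputs first, then the node.
void emit_commands(const StripNode &node, std::vector<std::string> &commands)
{
    for (const std::unique_ptr<StripNode> &input : node.inputs) {
        assert(input);  // every input slot must be filled
        emit_commands(*input, commands);
    }
    commands.push_back(node.name);
}

// Builds the tree for alpha_over(brightness(movie), image) and flattens it.
std::vector<std::string> example_commands()
{
    std::unique_ptr<StripNode> brightness = make_node("brightness");
    brightness->inputs.push_back(make_node("movie"));

    std::unique_ptr<StripNode> root = make_node("alpha_over");
    root->inputs.push_back(std::move(brightness));
    root->inputs.push_back(make_node("image"));

    std::vector<std::string> commands;
    emit_commands(*root, commands);
    return commands;
}
```

A real generator would additionally group these commands into buckets per device; the ordering guarantee shown here is only the first step.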
Week 6
I feel that the project is starting to come together. I got a video with an effect working. The project should speed up now that the engine is working and changes will be visible to the user. Here is the video: http://www.youtube.com/watch?v=iDGz8p4oEVQ
- Finished bucket generator
- Resolved issues with special cases
- Resolved the issue of memory not being freed
- Started writing gpu kernels for effects
Week 7
- Over the weekend I cleaned up the code and ported it to C++
- With well-defined classes and a better file structure
- As Peter pointed out, the current code was very C-like
- seqDeviceCL
- Fixed a race condition
- Implemented some common functions, better structure
- Implemented almost all color filters/blends on the GPU
- RNA and DNA changes
- The curve modifier is still a work in progress (as it is more complicated).
- Glow was left for later as it requires changing the size
Week 8
- Implemented resampling algorithms on GPU
- Nearest neighbor
- Bilinear
- Bicubic
- Added very simplistic manipulation from the view
- Doesn't take into account relative size/position yet
- Implemented Canvas
- The transformation propagates to the first input
- Still, it has weird problems if the output is used more than once
- Looked into CPU vectorization. The CPU code is probably 4 times slower than it is supposed to be. I have not found a universal solution across compilers so far.
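For reference, the two simpler resampling filters from the list above can be sketched on the CPU as follows. This is a minimal single-channel illustration, not the project's GPU kernels: the `Image` type, the clamp-to-edge behaviour, and the convention that integer coordinates address pixel centers are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel float image stored row by row.
struct Image {
    int width;
    int height;
    std::vector<float> pixels;

    // Reads a pixel, clamping coordinates to the image edge.
    float at(int x, int y) const
    {
        assert(width > 0 && height > 0);
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Nearest neighbor: pick the pixel whose center is closest to (x, y).
float sample_nearest(const Image &img, float x, float y)
{
    return img.at(static_cast<int>(std::floor(x + 0.5f)),
                  static_cast<int>(std::floor(y + 0.5f)));
}

// Bilinear: blend the four surrounding pixels by their distance to (x, y).
float sample_bilinear(const Image &img, float x, float y)
{
    const int x0 = static_cast<int>(std::floor(x));
    const int y0 = static_cast<int>(std::floor(y));
    const float fx = x - x0;  // horizontal weight of the right-hand pixels
    const float fy = y - y0;  // vertical weight of the lower pixels

    const float top = img.at(x0, y0) * (1.0f - fx) + img.at(x0 + 1, y0) * fx;
    const float bottom = img.at(x0, y0 + 1) * (1.0f - fx) + img.at(x0 + 1, y0 + 1) * fx;
    return top * (1.0f - fy) + bottom * fy;
}
```

Bicubic follows the same pattern with a 4x4 neighborhood and cubic weights, which is why it is noticeably more expensive per output pixel.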
Week 9
- Finished canvas and transformation
- We can move all transformation to initial input
- Scale and rotate input to match final output resolution
- Initial integration between view and render engine
- Still needs more cleanup, as the wm system wasn't designed with multithreading in mind.
- Changed the direction in which buckets are created
- Allows for more natural creation: applying the transformation before a size-dependent effect (glow), and splitting a bucket into two if no device can execute all commands
- Implemented histogram and waveform scopes
- I still need to fix up the different modes
- They also need to be optimized for OpenCL using local memory
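The histogram scope and the local-memory optimization mentioned above can be pictured with a CPU sketch: each chunk of pixels builds its own small histogram, and the partial results are then merged. This is the same split-and-merge shape that local-memory GPU histograms commonly use (one partial histogram per work-group), but the code below is only an assumed illustration in plain C++; `partial_histogram` and `histogram_in_chunks` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::array<unsigned int, 256> Histogram;

// Counts the 8-bit values in [begin, end) into a private histogram.
Histogram partial_histogram(const std::vector<std::uint8_t> &values,
                            std::size_t begin, std::size_t end)
{
    Histogram bins;
    bins.fill(0);
    for (std::size_t i = begin; i < end; ++i) {
        ++bins[values[i]];
    }
    return bins;
}

// Processes the data chunk by chunk and merges the partial histograms.
Histogram histogram_in_chunks(const std::vector<std::uint8_t> &values,
                              std::size_t chunk)
{
    assert(chunk > 0);
    Histogram total;
    total.fill(0);
    for (std::size_t begin = 0; begin < values.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, values.size());
        const Histogram part = partial_histogram(values, begin, end);
        for (std::size_t b = 0; b < total.size(); ++b) {
            total[b] += part[b];
        }
    }
    return total;
}
```

The benefit on a GPU would come from keeping the per-chunk bins in fast local memory, so that only the final merge touches global memory.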
Week 10
- Finished scopes (histogram, waveform, vectorscope)
- Created memory pathways
- Between char and float
- Passing the original char data to the GPU and doing the conversion there is much faster.
- Added cl_mem to RAM conversion. However, it introduced a lot of crashes. I need more time to figure out what is wrong.
- Code reorganization and clean up.
- Just started implementing cache.
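The char/float pathway boils down to a pair of conversions like the ones sketched below. The function names and the exact rounding rule are assumptions for illustration; the project's kernels may differ. An 8-bit channel takes a quarter of the space of a 32-bit float, so uploading the char buffer and converting on the device moves less data, which is presumably where the speedup comes from.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Maps an 8-bit channel value to the [0, 1] float range.
float char_to_float(std::uint8_t c)
{
    return c / 255.0f;
}

// Maps a float back to 8 bits, clamping out-of-range values
// and rounding to the nearest integer.
std::uint8_t float_to_char(float f)
{
    if (!(f > 0.0f)) {
        return 0;  // negative values, zero and NaN
    }
    if (f >= 1.0f) {
        return 255;
    }
    return static_cast<std::uint8_t>(std::floor(f * 255.0f + 0.5f));
}

// Converts a whole buffer; on a GPU this loop body would be one work-item.
void chars_to_floats(const std::uint8_t *src, float *dst, std::size_t count)
{
    assert(count == 0 || (src != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = char_to_float(src[i]);
    }
}
```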
Week 11
- Did a lot of small fixes and code organization stuff.
- New functions made the command code cleaner and more compact
- Implementing new float and char classes for CPU
- Support for vector operations (not finished yet)
- Supports clamping, proper addition of chars, etc.
- In the future we can apply SSE2 optimization for float vectors (up to 4 times faster)
- With char, it is tricky. Ideally, we could get up to an 8 times speedup with tricks, but we simply cannot load just 4 chars or shorts into a register.
- Added support for slice execution on the CPU, as I described earlier. This gives a real speedup even for single-frame rendering.
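Two of the ideas from this week, "proper addition of chars" and slice execution, can be sketched as follows. Both functions are hypothetical illustrations, not the project's classes: `add_clamped` saturates at 255 instead of wrapping around, and `split_rows` divides an image into contiguous row ranges that could each be handed to a separate CPU thread.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Adds two 8-bit channel values, clamping at 255 instead of wrapping.
std::uint8_t add_clamped(std::uint8_t a, std::uint8_t b)
{
    const unsigned int sum = static_cast<unsigned int>(a) + b;
    return static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
}

// Splits `rows` image rows into at most `slices` contiguous [begin, end)
// ranges of nearly equal size; empty ranges are dropped.
std::vector<std::pair<int, int>> split_rows(int rows, int slices)
{
    assert(rows >= 0 && slices > 0);
    std::vector<std::pair<int, int>> ranges;
    int begin = 0;
    for (int i = 0; i < slices; ++i) {
        // The first (rows % slices) slices get one extra row.
        const int end = begin + rows / slices + (i < rows % slices ? 1 : 0);
        if (end > begin) {
            ranges.push_back(std::make_pair(begin, end));
        }
        begin = end;
    }
    return ranges;
}
```

Because the slices do not overlap, per-pixel effects can run on them without any locking, which is what makes the approach pay off even for a single frame.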
Week 12
This week wasn't productive. I only had Friday to work on the project full time (due to university). I will try to catch up over the weekend and will post the updated report and videos then.
- Updated effects to the new standard and fixed bugs related to quality and format.
- Blender's transformation handles the edges poorly. I'm still working on it.
- Worked on a Windows-related bug. The C++ code has leaks, and it is hard to test whether Blender terminates correctly after the changes. Developed a macro for the 'new' operator, but it is not compatible with MEM_CXX_CLASS_ALLOC_FUNCS.
Week 13
Sadly, I had only Friday because of university. I will continue working over the weekend.
- Changed how effects add commands with inputs into a bucket. This should allow more precise memory management.
- Had one size-related bug which really slowed me down.
- Updated effects/commands for more standardized execution
Week 14
Sadly, I had only a few days to work on this project this week.
- Fixed issues with modifiers and one-input effects
- Cleaned up the effects code
- Added some code optimization (memory copying)
- Working on SSE2 integration
Conclusion
In the remaining few days I will update the documentation and upload the rest of the code and demos.
SSE2 optimization is coming along nicely. I was thinking of porting the slightly modified seqCommand to the trunk when the code is ready. With that code we can unify the float/char implementations, multithreading, and have code in individual files. That way the merge of this project later on will be less painful.
Sadly, I overestimated my abilities for this project. I lost time reimplementing the same code because of interface optimizations. Although the VSE got a lot faster, some parts of the current VSE engine aren't implemented yet. These features will be added gradually over the next few months. Hopefully, by the spring this project will be trunk ready.
P.S.
Next week university starts, and the first week will be crazy, so I probably won't be able to work on the project full time. I will focus on getting the VSE fully functional by the end of GSoC. There are a lot of extra things for me to do after GSoC, like UI changes and cache/proxy support.