User:NeXyon/GSoC2012/Proposals

Lip Sync

[Note: This proposal is basically the same as last year's, apart from some minor changes and deletions covering the general groundwork that already got done last year. New sections are marked with [*]. I'm applying with this feature again because, while it's not as general a feature as 3D audio, it will definitely help professional users create animations quickly; as such it's a more mature feature than 3D audio, which seems a bit toy-like compared to this one.]

Synopsis

This project will further improve audio in Blender by refining the current system and adding interesting new features. The main goal is to improve the lip sync workflow for users by supporting it with modern automation algorithms and a revised editing workflow.

Benefits for Blender

Blender 2.6 got a new audio engine and caught up on what it was missing before. Now it's time to get up to speed and implement some state-of-the-art lip sync features.

Proposal

[*] All of the points of the first part that used to be here got done last year, so here's a new first step: cleaning up the existing audaspace source a little bit ('spring cleaning'). This step should be rather short, as last year's results were pretty solid.

[*] Furthermore: originally I planned to implement the phoneme detection algorithms on my own. But during the last year I learned a bit more about speech recognition and related topics, and I'd now also like to give existing libraries a try, like CMUsphinx:

http://cmusphinx.sourceforge.net/

So this step would include evaluating how existing open source libraries could be used to accomplish the target.

I already talked to a CMUsphinx developer and he told me that getting the phoneme and timing info out of the recognition is no problem. I also already got pocketsphinx running together with audaspace in a Qt test application: http://pastebin.com/1saq7KR8
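
As a concrete illustration of that test, a decoder loop along the following lines pulls the recognized segments with their frame timings out of pocketsphinx. This is a minimal sketch against the 0.8-era C API only; the model paths and the input file name are assumptions, and with the default language model the segments are words, while decoding against a phonetic model delivers the phoneme timings mentioned above.

    // Minimal sketch (pocketsphinx 0.8-era C API): feed raw 16 kHz mono
    // 16-bit PCM into the decoder and print each recognized segment with
    // its start/end time. Frames are 10 ms each by default, hence /100.0.
    #include <pocketsphinx.h>
    #include <cstdio>

    int main()
    {
        // Model paths are assumptions; adjust to the local installation.
        cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
            "-hmm",  "/usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k",
            "-lm",   "/usr/share/pocketsphinx/model/lm/en_US/hub4.5000.DMP",
            "-dict", "/usr/share/pocketsphinx/model/lm/en_US/cmu07a.dic",
            NULL);
        ps_decoder_t *ps = ps_init(config);
        if (!ps)
            return 1;

        FILE *f = std::fopen("speech.raw", "rb"); // hypothetical input file
        if (!f)
            return 1;
        int16 buf[512];
        size_t n;

        ps_start_utt(ps, NULL);
        while ((n = std::fread(buf, sizeof(int16), 512, f)) > 0)
            ps_process_raw(ps, buf, n, FALSE, FALSE);
        ps_end_utt(ps);
        std::fclose(f);

        // Walk the best hypothesis segment by segment with frame timings.
        int32 score;
        for (ps_seg_t *seg = ps_seg_iter(ps, &score); seg; seg = ps_seg_next(seg)) {
            int sf, ef;
            ps_seg_frames(seg, &sf, &ef);
            std::printf("%s: %.2f s - %.2f s\n", ps_seg_word(seg),
                        sf / 100.0, ef / 100.0);
        }

        ps_free(ps);
        return 0;
    }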

Using a library for the phoneme detection should enable me to spend more time on the interface.

[Old stuff again:]

The second part is the lip sync target itself, which includes the implementation of automatic lip sync functionality. Here are some papers about phoneme detection in audio files that I'd like to implement:

User-interface-wise, I'm thinking of something similar to the interface of most common lip sync tools, which would bring back the audio window with a waveform display and a phoneme editor. After talking to several artists and other developers, it seems best to link the phonemes to the visemes via (driver) bones with a range between 0 (viseme completely off) and 1 (viseme completely on) and a specific bone name per phoneme.
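
To make that convention concrete, the phoneme-to-viseme link could start out as nothing more than a table from phoneme name to driver bone name plus a 0..1 influence. This is a hypothetical sketch; the "phoneme.*" bone naming scheme is an assumption, not a final design:

    // Hypothetical sketch: one specifically named driver bone per phoneme,
    // with a single channel running from 0 (viseme off) to 1 (fully on).
    #include <map>
    #include <string>

    struct VisemeDriver {
        std::string bone; // driver bone name, e.g. "phoneme.AH" (assumed scheme)
        float influence;  // 0..1, written by the phoneme editor
    };

    std::map<std::string, VisemeDriver> makeDefaultMapping()
    {
        std::map<std::string, VisemeDriver> m;
        m["AH"] = VisemeDriver{"phoneme.AH", 0.0f};
        m["EE"] = VisemeDriver{"phoneme.EE", 0.0f};
        m["OH"] = VisemeDriver{"phoneme.OH", 0.0f};
        m["M"]  = VisemeDriver{"phoneme.M",  0.0f}; // closed-mouth viseme
        return m;
    }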

So the workflow might end up like this: first the animator creates the viseme poses for the given (fixed) set of phonemes, together with the driver bones that control them, one specifically named bone per phoneme. Then he opens the audio window, which is split into a waveform display and a phoneme editor; there he can run the phoneme detection algorithm and edit its result, or start editing directly (add, change, move, remove, set strength, etc.). The rig is then automatically animated via these phonemes.
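
To sketch how the edited phoneme track could then animate the rig: each segment ramps its viseme in and out around its time span, and the strongest overlapping segment wins per viseme. All names and the 60 ms ramp length below are assumptions for illustration, not the actual design.

    // Hypothetical sketch of how an edited phoneme track could drive the
    // viseme bones; the struct names and the ramp length are assumptions.
    #include <algorithm>
    #include <string>
    #include <vector>

    struct PhonemeSegment {
        std::string phoneme; // e.g. "AH"
        float start, end;    // seconds, from detection or manual editing
        float strength;      // 0..1, set in the phoneme editor
    };

    // Influence of one segment's viseme at time t: zero outside the segment,
    // ramping linearly over `ramp` seconds at both edges, `strength` inside.
    float segmentInfluence(const PhonemeSegment &s, float t, float ramp = 0.06f)
    {
        if (t <= s.start - ramp || t >= s.end + ramp)
            return 0.0f;
        float in  = std::min(1.0f, (t - (s.start - ramp)) / ramp);
        float out = std::min(1.0f, ((s.end + ramp) - t) / ramp);
        return s.strength * std::min(in, out);
    }

    // Influence of the whole track on one viseme: the strongest segment wins.
    // The result would be keyed onto the driver bone channel of that viseme.
    float visemeInfluence(const std::vector<PhonemeSegment> &track,
                          const std::string &phoneme, float t)
    {
        float v = 0.0f;
        for (const PhonemeSegment &s : track)
            if (s.phoneme == phoneme)
                v = std::max(v, segmentInfluence(s, t));
        return v;
    }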

I'd love to work together with an artist to create a really nice example of how this can be used and how it works. Maybe I can enthuse one of the Sintel artists (Lee, Pablo, ...) and use Sintel for this.

[*] New UI thoughts: I've talked a bit to Tom (aka LetterRip) about this. Just about every lip sync tool out there has a similar UI: you always have at least a timeline with the audio and the phonemes next to it. For Blender I think it would also be nice to show a spectrum analysis as well as curves with the strength of the visemes. A vertical interface (most are horizontal) is also interesting and can be implemented too, so the user can choose what he prefers.
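
For the spectrum display, a Hann-windowed magnitude spectrum per display column would be enough as a starting point. The following is a naive O(n^2) DFT sketch just to illustrate the data such a view would draw; a real implementation would of course use an FFT:

    // Naive sketch: magnitude spectrum of one block of mono samples, i.e.
    // one column of the spectrum view. Illustration only; a real
    // implementation would use an FFT instead of this O(n^2) DFT.
    #include <cmath>
    #include <vector>

    std::vector<float> spectrumColumn(const std::vector<float> &block)
    {
        const float pi = 3.14159265358979f;
        const size_t n = block.size();
        std::vector<float> magnitudes(n / 2);
        for (size_t k = 0; k < n / 2; ++k) {
            float re = 0.0f, im = 0.0f;
            for (size_t i = 0; i < n; ++i) {
                // Hann window to reduce spectral leakage of the short block.
                float w = 0.5f * (1.0f - std::cos(2.0f * pi * i / (n - 1)));
                re += block[i] * w * std::cos(2.0f * pi * k * i / n);
                im -= block[i] * w * std::sin(2.0f * pi * k * i / n);
            }
            magnitudes[k] = std::sqrt(re * re + im * im) / n;
        }
        return magnitudes;
    }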

Another thing Tom showed me is the book "Stop Staring", with very useful info on facial animation. Part of it covers coarticulation (phonemes/visemes influencing each other), which leads to phoneme-to-viseme grouping and collapsing. For example, take three phonemes in sequence where the middle one is normally spoken with a closed mouth and the others with the mouth open: the mouth then stays open for the normally closed one too. How Blender and the interface could help with this still has to be researched, and that will surely be easier once the basic UI and workflow already work.
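
Just to illustrate the collapsing idea with the book's example (purely hypothetical, since the actual approach still has to be researched): a pass over the track could remap a short closed-mouth segment between two open-mouth segments to the open viseme.

    // Hypothetical coarticulation pass, for illustration only: remap a short
    // closed-mouth segment between two open-mouth neighbours to the open
    // viseme, as in the three-phoneme example above.
    #include <set>
    #include <string>
    #include <vector>

    struct Segment {
        std::string viseme;
        float start, end; // seconds
    };

    void collapseClosedBetweenOpen(std::vector<Segment> &track,
                                   const std::set<std::string> &closedVisemes,
                                   float maxLength = 0.08f) // assumed threshold
    {
        for (size_t i = 1; i + 1 < track.size(); ++i) {
            Segment &mid = track[i];
            bool shortClosed = closedVisemes.count(mid.viseme) > 0 &&
                               (mid.end - mid.start) <= maxLength;
            bool openNeighbours = closedVisemes.count(track[i - 1].viseme) == 0 &&
                                  closedVisemes.count(track[i + 1].viseme) == 0;
            if (shortClosed && openNeighbours)
                mid.viseme = track[i - 1].viseme; // keep the mouth open
        }
    }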

Also interesting is the possibility of extracting emotional information from the sound file. This also has to be researched in more detail, and I consider this and the previous point bonus tasks for the end of the project if there's still time.

Deliverables

  • [*] Spring cleaning (mandatory)
  • [*] Evaluation of speech libraries (mandatory)
  • [*] Using a library or implementing my own phoneme detection algorithm (mandatory)
  • Lip Sync Animation via Sound window (mandatory)
  • [*] UI and workflow improvements (mandatory)
  • Testing files (rig and sound files) (mandatory)
  • User documentation (mandatory)
  • [*] Coarticulation improvements (optional)
  • [*] Emotion detection (optional)

Development Methodology

Personally I prefer well-documented and tested code; let's say quality over quantity. As I've already gained a lot of project management experience in software development, I know that it's very easy to underestimate the work to be done and how much time it takes. That's why the list of deliverables might look a little short and why I categorize the targets into mandatory and optional.

Another thing that's very important to me is the user experience. I'll try to get as many people as possible to test my work and give feedback that I can build on.

Schedule

[*] I will start working right away on April 23, right after getting accepted, during the Community Bonding Period, as I already know the community. Since my university courses run until the end of June, I won't be able to work full-time during this period, but after the exams, once the summer holidays have started, I'll work full-time on the project, just like in the last two years.

At the beginning of July I'd like to start with the real lip sync work.

[*] I have no clue how long the implementation of the phoneme detection algorithm will really take, but I hope no longer than half of the remaining period starting in July, because I guess the UI part will be quite some work. If a library can be used, it should be easier, and a first evaluation/use of CMUsphinx already looks promising. Anyway, you should know that I prefer to finish things I start, so even when GSoC is over I'll finish the project.

Why ...?

  • ... I want to do this project: I've already worked on Blender's audio; this would be the next huge step in this area and would add yet another important feature to Blender.
  • ... I am the best for this project: I simply know Blender's audio code best.
  • ... I'm interested in Blender: Coming from the game development area, I found Blender to be an awesome 3D tool that I never want to stop using.

About me

My name is Jörg Müller, I'm 23 years old, I'm from Austria and I study Telematics at the Graz University of Technology. You can contact me via email at nexyon [at] gmail [dot] com or talk to me directly on irc.freenode.net, nickname neXyon.

In 2001, shortly after getting internet at home, I started learning web development: HTML, CSS, JavaScript, PHP and such. Soon after that, in 2002, I got into hobbyist game programming and thereby into 2D and later 3D graphics programming. During my higher technical college education I received quite professional and practical training in software development, including project management, also doing projects for external companies. During this time I also entered the Linux and open source world (http://golb.sourceforge.net/). Still into hobbyist game development, I stumbled upon the Blender game engine and was annoyed that audio didn't work on my system. That's how I got into Blender development; I've been an active Blender developer since summer 2009, working mainly on the audio system.

In my spare time, apart from that, I do sports (swimming, running, other gymnastics), play music (piano, drums) and especially like going to the cinema.

  • 09/2003 - 06/2008 Higher technical college, department of electronic data processing and business organisation
  • 07/2008 - 12/2008 Military service
  • 01/2009 - 02/2010 Human medicine at the Medical University of Graz, already taking courses in Telematics at the Graz University of Technology
  • 03/2010 - today Telematics at the Graz University of Technology