Another small post, this time on a different approach to morph-target lip-sync.

Di-O-Matic: Efficient methods for creating lip-sync blend shapes

Laurent Abucassis – Founder: Di-O-Matic

A considerably lower-key affair than Halon’s, this talk revolved around a demonstration of how to make phonetic mouth shapes for lip-sync via blend-shapes (or morphing). Blend-shapes offer more control over mesh deformation than simple bone positions, but creating and maintaining the multitude of models required for a blend-shape list can be quite a pain.

While it did turn into something of a product pitch towards the end, the educational portion of the talk began by pointing out the mistake most animators make when creating lip-sync for the first time: they try to form shapes for every letter. However, as Laurent said, “A letter is not a sound”.

He went on to advise that you should lip-sync what you hear, basing his observations on the following research:

  • Phoneme shapes are what most lip-sync animation is taken from, and are generally considered to be the smallest building block from which lip-sync should be generated. Phonemes represent the sounds that we pronounce. This is in keeping with current beliefs regarding lip-sync, namely that we should concentrate on the simplest sounds that each word makes. However, his research has brought him to the conclusion that the real building blocks are in fact Visemes, those being the basic mouth positions required to voice these sounds, and therefore an even smaller building block from which to work.
  • Quite how his procedural generation finds these based on an audio file (which offers phonemes only) is intriguing, but Voice-O-Matic did not appear to give superior results to a Phoneme-based extraction method, so the Viseme observation may be for animator reference only.
  • He also referred to what he calls the “Thunderstorm Factor”, where the speed of light vs sound requires that you place your lip-sync poses 1-3 frames ahead of where you’d expect them to be based on the audio. This is a questionable proposition in my experience, but one that is quite harmless to try when working with animation curves.
  • Richard Williams has a handy technique for deciding when and when not to animate the jaw opening: put your hand under your chin while speaking the line to feel the moments where jaw movement is absolutely necessary to sell the action.
  • Finally, he touched on the idea of “Sticky Lips” – the method of adding a delay while opening the mouth to give a realistic feeling of soft lips uncompressing. This would presumably work nicely when the mouth closes also.
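
As a rough illustration of the “Thunderstorm Factor” above, here is a minimal Python sketch that moves lip-sync keys a few frames ahead of the audio. The function name `lead_keys`, the `(frame, shape)` key layout and the clamping to a first frame are my own assumptions for the example, not anything shown in the talk or taken from a particular animation package.

```python
def lead_keys(keys, lead_frames=2, first_frame=0):
    """Return (frame, shape) keys moved earlier by lead_frames.

    Hypothetical helper: frames are clamped so that no key lands
    before first_frame.
    """
    if lead_frames < 0:
        raise ValueError("lead_frames must not be negative")
    return [(max(first_frame, frame - lead_frames), shape)
            for frame, shape in keys]


# Keys placed exactly where the sounds are heard in the audio.
keys = [(1, "O"), (10, "B/M/D"), (14, "A")]

# Try a lead of 1 to 3 frames and judge the result by eye.
print(lead_keys(keys, lead_frames=2))
```

Because the offset is a single parameter, it is cheap to compare a lead of 0, 1, 2 and 3 frames on the same shot before committing to one.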

When giving his demo, he listed the following shapes required for the most basic lip-sync:

  • Vowels: A | E | O | U/W
  • Consonants: B/M/D | S | Ch
  • Tongue: L | Th/D | N
  • Teeth (optional): F/V
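
To make the “a letter is not a sound” point concrete, the list above can be treated as a lookup from sounds to shared mouth shapes. The following is only a sketch in Python: the sound labels are simply the names from the list rather than a real phoneme set, the helper names are hypothetical, and because “D” appears in two groups in the list, the sketch arbitrarily keeps the first group it sees.

```python
# Shape groups as listed in the demo; each entry is one mouth shape.
GROUPS = {
    "vowels": ["A", "E", "O", "U/W"],
    "consonants": ["B/M/D", "S", "Ch"],
    "tongue": ["L", "Th/D", "N"],
    "teeth": ["F/V"],
}


def build_lookup(groups):
    """Map each sound label (upper-cased) to the shape that voices it."""
    lookup = {}
    for shapes in groups.values():
        for shape in shapes:
            for sound in shape.split("/"):
                # setdefault keeps the first shape seen for a repeated sound.
                lookup.setdefault(sound.upper(), shape)
    return lookup


def to_shapes(sounds, lookup, rest="neutral"):
    """Convert a sequence of sounds to shapes, merging consecutive repeats."""
    shapes = []
    for sound in sounds:
        shape = lookup.get(sound.upper(), rest)
        if not shapes or shapes[-1] != shape:
            shapes.append(shape)
    return shapes


lookup = build_lookup(GROUPS)
print(to_shapes(["B", "M", "A", "Th", "F", "V"], lookup))
```

Several sounds collapse onto one shape, which is why far fewer models are needed than there are letters or phonemes.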

The meat of the presentation, though, was the demonstration of quickly creating multiple morph-target models. Beginning with just 4 base models (mouth neutral, mouth open wide, mouth “O” shape, and finally a wide toothy smile), he used combinations of these 4 to create all of the shapes above within just a few short minutes. Additionally, he used the “mouth open wide” morph slider to open the mouth while hand-tweaking the inside, which made it easier to reach the tongue, an area that can often be quite tricky to access.
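
The idea of building new shapes from combinations of a few base models can be sketched as a standard linear blend, where each target contributes its weighted offset from the neutral mesh. This is a generic Python illustration, not the tool’s actual implementation; the tiny two-vertex “meshes”, the target names and the weights are all made up for the example.

```python
def blend(neutral, targets, weights):
    """Blend morph targets: neutral + sum(weight * (target - neutral)).

    neutral: list of (x, y, z) vertices.
    targets: dict of name -> list of (x, y, z) vertices, same order as neutral.
    weights: dict of name -> slider value (0.0 = off, 1.0 = full).
    """
    result = []
    for i, base in enumerate(neutral):
        vertex = list(base)
        for name, weight in weights.items():
            target = targets[name][i]
            for axis in range(3):
                vertex[axis] += weight * (target[axis] - base[axis])
        result.append(tuple(vertex))
    return result


# Two-vertex stand-ins for the base models (hypothetical data).
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
targets = {
    "open": [(0.0, -1.0, 0.0), (1.0, -1.0, 0.0)],   # jaw drops
    "o": [(0.25, 0.0, 0.0), (0.75, 0.0, 0.0)],      # corners pull in
}

# A derived shape: half-open jaw combined with the full "O" shape.
print(blend(neutral, targets, {"open": 0.5, "o": 1.0}))
```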

What was especially interesting about this workflow was that changing one of the 4 base models carried through to each of the created faces, due to them retaining the modifiers of the originals. This makes management of the large number of models for each face plausible for any multi-character project, assuming, that is, that morph-targets are a viable method for facial animation on any project displaying more than just a few characters at a time.
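
One way to picture why edits to a base model carry through is to store each derived face as a recipe of base weights rather than as baked vertices, and to evaluate it on demand. The Python sketch below is only an analogy for that behaviour under my own assumptions (the `ShapeLibrary` class and its methods are hypothetical, and coordinates are flattened to a plain list of floats); it does not describe how the demonstrated software stores its modifiers.

```python
class ShapeLibrary:
    """Derived shapes are kept as weight recipes over editable base shapes."""

    def __init__(self, neutral):
        self.neutral = list(neutral)   # flattened neutral coordinates
        self.bases = {}                # base name -> flattened coordinates
        self.recipes = {}              # derived name -> {base name: weight}

    def set_base(self, name, coords):
        if len(coords) != len(self.neutral):
            raise ValueError("base must match the neutral mesh size")
        self.bases[name] = list(coords)

    def define(self, name, weights):
        self.recipes[name] = dict(weights)

    def evaluate(self, name):
        """Rebuild a derived shape from the current state of the bases."""
        out = list(self.neutral)
        for base, weight in self.recipes[name].items():
            for i, coord in enumerate(self.bases[base]):
                out[i] += weight * (coord - self.neutral[i])
        return out


lib = ShapeLibrary([0.0, 0.0])
lib.set_base("open", [0.0, -1.0])
lib.define("A", {"open": 0.5})
before = lib.evaluate("A")

# Re-sculpt the base; the derived "A" shape follows without being rebuilt by hand.
lib.set_base("open", [0.0, -2.0])
after = lib.evaluate("A")
print(before, after)
```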