It’s certainly been some time since the event (it’s slow going when you’re in the middle of a full production), but I’ve finally collated my remaining notes from this year’s Game Developers’ Conference that relate to animation and characters in games. So to start off, we have the head of R&D on last year’s landmark film featuring virtual actors, followed by a trio of Japanese developers giving insight into their approaches to animation and character development.
Hailing from the studio that created Monster House, Halvadar concentrated his talk on facial motion, a hot topic for the games industry of late. As is often the case with movie-industry approaches, the techniques couldn’t be recreated directly in a game-development situation, but they nonetheless provided an interesting insight into some of the lengths that must be gone to in search of the (some say false) holy grail of truly photo-real virtual characters.
The first portion of the talk involved simply tallying the vast amounts of data, equipment and effort used in the production:
260 Vicon MX40 cameras were used synchronously to record motion.
Body, facial and hand motion were captured simultaneously.
81 actors were tracked over the course of the movie.
1 pony. (Only one?!)
46 days of shooting.
250 props made and captured.
The second portion detailed the methods required to bring the faces to “life”. It must be said that, despite sitting firmly in the Uncanny Valley, as is always the case with attempts to simulate realistic facial motion, Beowulf has done the best job yet of providing real glimpses of the way up the other side. The tallies continue:
4 layers of face rigs.
3D facial models did not match the actors’ faces in a 1:1 ratio (Ray Winstone’s in particular), causing lots of marker-swapping.
Adhered to FACS (the Facial Action Coding System) to recreate all the muscles of a human face, totalling around 60 facial expressions (including head motions), plus 16 different phoneme shapes.
Face poses were created from combinations of weights of a smaller set of basic poses.
Motion-capture values were run through a script to find the closest match with the facial expressions and were replaced with blendshapes.
An EOG (electro-oculogram) recorded horizontal and vertical eye movement, saccades and blinks, via an eye-pack worn on the back with electrodes placed by the eyes to detect eye-muscle movements.
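The weighted-combination step above — finding blendshape weights that best match captured marker data against a small set of basic poses — can be sketched as a least-squares fit. This is only a hypothetical illustration in NumPy, not the script Imageworks actually used; the function name, array shapes and the toy data are all assumptions.

```python
import numpy as np

def fit_blendshape_weights(neutral, basis_deltas, captured):
    """Solve for blend weights so that neutral + sum(w_i * delta_i)
    best matches the captured marker positions (least squares).

    neutral:      (m, 3) marker positions of the neutral face
    basis_deltas: (k, m, 3) offsets of each basic pose from neutral
    captured:     (m, 3) mocap marker positions for one frame
    Returns k weights, clamped to the [0, 1] range.
    """
    k = basis_deltas.shape[0]
    A = basis_deltas.reshape(k, -1).T          # (3m, k) design matrix
    b = (captured - neutral).reshape(-1)       # (3m,) observed offsets
    w, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares solve
    return np.clip(w, 0.0, 1.0)

# Toy example: two independent basic poses, recover a known blend.
neutral = np.zeros((4, 3))
basis = np.zeros((2, 4, 3))
basis[0, :2] = 1.0    # pose 0 moves the first two markers
basis[1, 2:] = -0.5   # pose 1 moves the last two markers
frame = neutral + 0.6 * basis[0] + 0.2 * basis[1]
weights = fit_blendshape_weights(neutral, basis, frame)
# weights ≈ [0.6, 0.2]
```

A real pipeline would of course fit per-region, handle many more poses than markers, and regularise the solve, but the core idea of expressing a captured frame as weights over a pose basis is the same.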
It should be noted that, while in my opinion Angelina Jolie’s face was the most successful and consistent in quality across all her shots, Halvadar explained that hers was the scan that was deliberately adjusted the most, to form an exaggerated impression of how we picture her — backing up my belief that realism is simply not realistic enough when it comes to artistic endeavours such as this. This was also apparent in the scene of her naked, gold-dripping body emerging from the water, which he felt the need to show several times over, and which was also manipulated drastically due to her pregnancy at the time of shooting.
In closing, it was most interesting of all that Halvadar’s step-by-step breakdown of each scene revealed that every shot only achieved its final visual quality after a final pass by an animator working from video reference of the original performance, which begs the question of why go to all that technical trouble when the process could presumably be done from scratch with similar results.
If absolute realism in games is still your thing, then you may wish to investigate the work of George Borshukov at EA and his Universal Capture (UCap) method. Proven in The Matrix trilogy and the Tiger Woods tech demos, this really is something to watch, especially now that it has been optimised for real-time use.