This was my most-anticipated talk of GDC 2016. Simon Clavet was our physics mastermind on Assassin’s Creed III and began initial explorations into this technology immediately following the completion of that project, so I was keen to see how his work had progressed. Even years ago the technology was promising, immediately affording an unprecedented fidelity matched with a (usually conflicting) high degree of response and control.
I expect For Honor to be a great showcase for Motion Matching given the freshness of its approach to combat; it was fun even in prototype form years ago. The game looks beautiful, and Simon’s videos leave me confident it will be a stand-out example of animation in Ubisoft’s tradition, not least because animation director Khai Nguyen is one of the strongest at the Montreal studio.
Currently there are 4 or 5 projects within Ubisoft’s worldwide studios using motion matching, giving Simon colleagues to compare notes and bounce ideas off, including Ubisoft Toronto’s Michael Buttner and Montpellier’s Xavier Lemaire.
Simon introduced For Honor with a video; the team want it to be “The Call of Duty of Melee Combat”, citing the Street Fighter series as another key influence for their desire to create multiplayer combat with precise controls. The game is now skill-based to the point that Simon has no chance playing against testers who have mastered it over the course of development.
To showcase the technology Simon followed with a video of where they are now, highlighting the unique movement of a female knight sharing absolutely zero animations with the male samurai. Both characters had a uniqueness rarely seen in animation-intensive games requiring similar move-sets, immediately highlighting one benefit of the technology. Finishing with an example of the build-up to a fight, Simon described the phase of ‘dancing’ around opponents with stance switching before combat as ‘where the magic happens.’
In The Beginning
Giving context, the presentation took us through a brief history of animation systems:
Play Anim – An animation is played. The original no-frills approach.
State Machines – Animations change as the character changes state.
Blend/Decision Trees – Blends come into play driven by state or input parameters.
Bone Masks – Upper-body only etc. Anims played only partially on characters.
Blend Trees per state – More complex version of stage 3.
Parametric Blends – Slope/strafe angle/speed etc. Blends must look similar.
Lists of Anims – highlighting the management required of these systems.
To illustrate the levels of complexity we are currently encountering, Simon, ever the animated programmer, acted out an example of a unique animation that could only be described as Start-Strafe90-TurnOnSpot45-Stop, highlighting the difficulty of categorising and maintaining each and every unique type of motion as required by traditional systems.
A decade ago I attended a lecture in San Francisco by (then) academic Lucas Kovar, whose papers have influenced my and others’ ideas on the various forms of animation blending we’ve been using in the ensuing years. One of the ideas presented at that time was the unstructured list of animations, (massive sets of unedited mocap data automatically marked up to best find transitions and blends), which was simply infeasible at the time for video games given the memory requirements. As we transitioned into the current generation of consoles Simon recognised it as one potential avenue for next-gen animation systems.
His first task was to figure out how to choose the next anim, with his instincts drawing him towards machine learning. Due to the requirement for many connecting motions between the data to afford the best control, a dense graph of interconnected transitions was sought after.
“Our representation organizes samples of motion data into a high-dimensional generalization of a vector field which we call a motion field. Our run-time motion synthesis mechanism freely flows through the motion field in response to user commands.”
Roughly translated into English: the main problem with motion fields, in Simon’s words, is that ‘the equations involved are scary‘. He side-stepped this complexity by allowing his system to simply ‘jump to any frame whenever we want.’
With that covered, the next problem was how to choose the correct start-frame. He posited the following criteria:
Precise end-position matching?
A mix of all of the above?
Ultimately, the solution was described as ‘a ridiculously brute-force approach to animation selection‘, evaluating the best match for:
The character’s current situation.
The motion that takes us where we want.
A rarity for programmers, Simon not only attended but directed the exploratory mocap sessions for the proposed system, showing us an excerpt of ‘5 minutes of guy stumbling randomly around a mocap volume‘. (And if I recall correctly, he suited up for the very first experiments).
The initial results of parsing and automatically deciding which sections of mocap to play on the in-game character were shown in a video with animation-blending disabled, highlighting the frequency of mocap segment selection. As the character ran around the prototype area the system was constantly switching the desired animation segment to keep up with the player’s controller input. Two coloured paths ran out in front of the character, with a blue trajectory (dictated by the currently chosen animation) trying to match a red (desired input) trajectory.
As mentioned earlier, multiple characters don’t share animations. This is possible due to the large anim budget as manual work to edit the animations is now less of a consideration. Simon said the mocap used for a move-set is now limited only by the stuntman’s energy on the day of the shoot, (to which end he suggested capturing fatigued movements towards the end of the shoot).
As used elsewhere by the Toronto team’s own explorations, a mocap ‘Dance-Card’ lists all the actions required for a successful mocap shoot. Stuntmen must provide the following actions to cover basic movement:
1. Walks and Runs.
2. Small repositions.
3. Starts and Stops.
4. Circles (Turns).
5. Plant and turn (foot down to change direction) 45, 90, 135 and 180 degrees.
6. Strafe in a square (forward, left, back, right – contains 90 degree plants).
7. Strafe plants (foot down though not turning) for 180 direction shifts.
From there, Simon applies a cost-function to evaluate which section of mocap should be chosen for the desired action, essentially applying a value to each movement and looking for the cheapest solution to get from A (current pose) to B (desired pose), much like AI decision-making.
The ‘tricks’ he learned for comparing the current and candidate poses are:
Match only a few bones.
Match the local velocity.
Match feet positions and velocities.
Match weapon positions etc..
He uses around 10 factors in total but stressed you don’t need to find the exact same phase for the feet placements as current systems do with regular walk/run blending etc. For example, a turn-on-spot has lots of foot-shuffling so ignore the feet for those movements. All decision-making for optimum blending is done offline, pre-computing all this metadata for speed at run-time.
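The pose-comparison tricks above can be sketched as a simple weighted cost function. This is a minimal illustration, not Simon’s actual code; the bone names and weights are assumptions, with the feet weighted heavily per his advice:

```python
import math

def pose_cost(current, candidate, weights=None):
    """Score how well a candidate mocap frame matches the current pose.

    Each pose is a dict mapping a bone name to (position, velocity),
    where position and velocity are (x, y, z) tuples in character-local
    space. Only a few significant bones are matched; bone names and
    weights here are illustrative, not from the talk.
    """
    weights = weights or {"hips": 1.0, "left_foot": 3.0, "right_foot": 3.0}
    cost = 0.0
    for bone, w in weights.items():
        pos_a, vel_a = current[bone]
        pos_b, vel_b = candidate[bone]
        cost += w * math.dist(pos_a, pos_b)        # position mismatch
        cost += w * math.dist(vel_a, vel_b) * 0.5  # velocity mismatch
    return cost
```

A lower cost means a cheaper jump; an identical pose scores zero, so the currently playing frame is always a valid (free) candidate.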
With the pose-matching portion of decision-making done, the next task was how to maintain a desired trajectory. To calculate this, one must check where an anim brings you if you play it – the future path the candidate clip implies.
Each of the factors above that is not matched adds to the compute cost, making that candidate less viable. This allows the system to home in on the best sections of mocap to jump to for optimum results. But optimum doesn’t just mean the smoothest transitions where video games are concerned, because fidelity and response are competing factors. Extended delays before jumping to the desired action will naturally result in control lag.
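The ‘ridiculously brute-force’ selection can be sketched as scoring every candidate frame with a pose term plus a trajectory term (the blue-path-versus-red-path comparison from the prototype video). This is an illustrative sketch only; the real system precomputes its metadata offline and searches far more cleverly:

```python
import math

def trajectory_cost(candidate_future, desired_future):
    """Compare where a candidate clip takes the root (the blue path)
    against where the player wants to go (the red path). Both are
    lists of (x, z) root positions sampled at matching future times."""
    return sum(math.dist(a, b) for a, b in zip(candidate_future, desired_future))

def best_frame(frames, desired_future, pose_cost_of):
    """Brute-force: score every candidate frame in the mocap database
    and jump to the cheapest one. `frames` is a hypothetical list of
    dicts with a precomputed 'future' root trajectory; `pose_cost_of`
    scores how well a frame's pose matches the current pose."""
    return min(
        range(len(frames)),
        key=lambda i: pose_cost_of(frames[i])
                      + trajectory_cost(frames[i]["future"], desired_future),
    )
```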
Simon solves this by implementing a simple slider between realism and comfort; adjusting it changes the decision as to where to select mocap transitions, prioritising fidelity versus immediacy:
Realism VS Comfort
The slider is easy to tweak if motions are captured correctly via the dance card, and he assured us that a desirable balance can always be achieved, leaving both animators and gameplay designers happy.
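In the simplest reading, such a slider is a single weight blending the two competing cost terms. A sketch under that assumption (the actual tuning parameters were not disclosed):

```python
def total_cost(pose_cost, trajectory_cost, realism=0.5):
    """Blend the competing cost terms with one tweakable slider.

    realism -> 1.0 favours pose continuity (smooth, high-fidelity
    transitions); realism -> 0.0 favours hitting the desired
    trajectory immediately (responsive, but rougher blends).
    This single-weight form is an assumption for illustration.
    """
    return realism * pose_cost + (1.0 - realism) * trajectory_cost
```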
To aid the selection process, animators mark-up the long mocap takes with specific events like stance-changes. In the case that no perfect match for a transition can be found, he just blends between the anims. With this method, Simon joked that even the day before ship the team could capture more transitions if required and feed them into the automated system.
Regarding optimisation, one of the first questions any sane developer will raise is the memory requirement of such a mocap-intensive system. Simon declined to go into detail because that subject is big enough for a future talk of its own, but did state that aggressive Level-of-Detail (LOD) reduction avoids updating distant characters every frame – resulting in the system running efficiently with over 100 characters on screen simultaneously.
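Distance-based throttling of animation updates can be sketched as skipping matching for far-away characters. The thresholds and rates below are invented for illustration; the talk gave no specifics:

```python
def should_update(distance, frame_index):
    """Run full-rate motion matching for nearby characters and
    stagger updates for distant ones so 100+ on-screen characters
    stay cheap. All thresholds here are illustrative assumptions."""
    if distance < 10.0:               # close: match every frame
        return True
    if distance < 30.0:               # mid: every 4th frame
        return frame_index % 4 == 0
    return frame_index % 16 == 0      # far: every 16th frame
```

In a real engine the skipped frames would still advance the current clip; only the (expensive) database search is deferred.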
On the subject of trajectory choice, he explained that every engine moves a character in the world either by taking the displacement from the animation itself or from a simulation that drives the animation. In For Honor the latter approach is used, with code deciding the desired trajectory – animation is ‘a cosmetic detail on top’. As such, the trajectory animations are chosen to match a simulated point in the world, essentially a spring-damper on the desired velocity. While a delay is present, (around 1 metre for stopping and a ramp-up from a standing start), the consistency of control is what makes the delay acceptable to the player – something I too have come to learn over the years.
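The simulated point can be sketched as a first-order damper easing velocity toward the stick input; position lags the input, which is where the roughly 1-metre stopping distance comes from. The damping constant below is an assumption, not a For Honor value:

```python
def step(position, velocity, desired_velocity, dt, response=4.0):
    """Advance the simulated point one frame, easing its velocity
    toward the player's desired velocity (a simple damper standing
    in for the talk's spring-damper; `response` is illustrative)."""
    velocity += (desired_velocity - velocity) * min(1.0, response * dt)
    position += velocity * dt
    return position, velocity
```

Running this from a 5 m/s sprint with the stick released, the point coasts roughly a metre before settling – a predictable, consistent lag rather than an instant stop.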
“The goal is not to be as responsive as possible, the goal is to be as predictable as possible.”
Simon clamps the entity, (character’s centre-of-mass), to 15cm around the simulated point. One offshoot of this approach is that it allows the character to predict obstacles such as walls and ledges, causing them to react to obstacles and select mocap that might avoid them.
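Clamping the entity to the simulated point might be sketched as pulling the centre-of-mass back along its offset whenever the chosen animation drifts more than 15 cm away (illustrative, working in the 2D ground plane):

```python
import math

def clamp_entity(entity_pos, sim_pos, max_dist=0.15):
    """Keep the character's centre-of-mass within `max_dist` metres
    (15 cm per the talk) of the simulated point. Positions are (x, z)
    ground-plane tuples; this simplified form is an assumption."""
    dx = entity_pos[0] - sim_pos[0]
    dz = entity_pos[1] - sim_pos[1]
    dist = math.hypot(dx, dz)
    if dist <= max_dist:
        return entity_pos
    scale = max_dist / dist  # pull back along the offset direction
    return (sim_pos[0] + dx * scale, sim_pos[1] + dz * scale)
```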
The animator’s job consists of an initial tweak of the mocap, importing it into the engine and marking it up with events. The game-logic, (rather than the animation), still works as a state machine, with clips representing the logic of the game such as a branching combo in combat. The video below shows the process of adjusting the timing of attacks in the state machine.
In order for this to be possible the animator must first place events on the animations. Importantly, they don’t chop anims or create cycles, (the biggest time-sink of traditional game animation pipelines), instead marking only points where they ‘want this event to happen exactly at this moment’. Not at the beginning or the end of a move, but at the most significant parts such as the sword connecting with the target.
The system doesn’t have just a big flat (unstructured) list of animations. They are instead grouped in similar sets with tags such as Tired or Heavy Attack. Other variables that an animator might need to specify in order to make them easy to parse are Stance, Pose, Range or Type of attack as well as outcomes such as Block, Miss, Parry or Hit wall.
When a designer sets the speed of the character it will blend from the walk to run mocap, and when adjusting the range of an attack it can switch which anim is played in a combo. While this thought sounds scary, animators are comforted by completely owning complex two-character moves such as throws and finishing kills, which have fewer requirements on timing. In this case the animations drive the displacement of characters to keep them in sync.
Due to chosen mocap trajectories rarely being an exact match to the desired position, a rotation correction is applied over time to get the exact orientation. Minor procedural upper-body adjustment is also applied when changing combat targets while strafing. To adjust for speed Simon suggests that designers shouldn’t timescale anims, (the bane of game animators everywhere), more than 10% faster or 20% slower. Instead, slide the characters and use foot IK to relieve foot-sliding.
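The timescale limits from the talk, plus a gradual rotation correction, could be sketched as follows (the 10%/20% bounds are Simon’s; the correction rate is an assumption):

```python
def clamp_timescale(scale):
    """Keep anim playback within Simon's suggested bounds:
    no more than 10% faster or 20% slower than authored."""
    return max(0.8, min(1.1, scale))

def correct_rotation(current_yaw, target_yaw, dt, rate=2.0):
    """Blend away small orientation error over time rather than
    snapping to the desired facing. `rate` (fraction of the error
    removed per second, capped at all of it) is illustrative."""
    error = target_yaw - current_yaw
    return current_yaw + error * min(1.0, rate * dt)
```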
Switching animations rapidly always causes foot-sliding regardless, so Simon fixes this by locking the toe at run-time with a socket on the ground matching the toe’s position in the animation. IK was also implemented for slopes, pulling the hips down to connect to the lowest foot on slopes and stairs. Importantly, dropping the hips must never break the pose or hyper-extend the knees, preserving the animator-created silhouettes.
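The toe-lock idea can be sketched as pinning the toe to a ground ‘socket’ captured the frame the foot plants; a real implementation would then solve leg IK to honour the pinned toe, which is omitted here (a greatly simplified illustration):

```python
def locked_toe_position(anim_toe_pos, lock_state, planted):
    """Return the toe position to render, killing foot-sliding.

    `lock_state` is a dict holding the world-space socket captured
    on the first planted frame; while planted we return that socket
    instead of the (possibly sliding) animated position. Names and
    structure here are illustrative assumptions.
    """
    if planted:
        if lock_state["pos"] is None:    # first planted frame: socket it
            lock_state["pos"] = anim_toe_pos
        return lock_state["pos"]
    lock_state["pos"] = None             # foot lifted: release the lock
    return anim_toe_pos
```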
To avoid dealing with sword IK to match swords on slopes, he used the same solution as AC3 by simply pitching the characters’ spines – keeping their whole upper bodies in sync to solve the height differences. The final results below are impressive:
Simon’s future thoughts on corrections and pose-matching involve better automated mocap selection via ‘interaction partners’ such as bones or surfaces, giving the examples of selecting mocap in a sports game based on the distance between the position of a football player, (or even his leg bone), to the football or a hockey player to the puck.
Motion-matching is not a technology but instead a ‘simple idea that helps us reason about movement description and control.’ Animation data declares events in the mocap, gameplay declares what it wants, and in the middle the matching system finds the best animations to match the desired goals, with three main advantages:
High quality: will preserve details from the mocap stage.
Controllable responsiveness: responsive and tweakable for gameplay.
Minimal manual work: an unintentional side-effect, (not the goal), but teams can spend their money elsewhere. Essentially reducing the time it takes to get from the mocap stage into the game.
For Honor looks set to be the first commercial production to use this new and exciting animation technology so the final proof will be in the near future. In closing, Simon left us with this thought:
“Let’s just mark-up the mocap with the inputs that should trigger moves… and generate the game automatically.”