Cinematic Dialogue In The Witcher 3

The first of my notes from GDC 2016. This was a triumphant presentation: while The Witcher series began on a licensed version of BioWare’s Aurora Engine, the in-house engine used for the third installment has, in my opinion, surpassed the latest Dragon Age in terms of visual fidelity. Piotr Tomsinski introduced himself as animation technical director, but was quick to give credit to the whole dialogue team of animators, programmers, designers and QA.

He described The Witcher 3 as a non-linear, story-driven RPG where dialogue is the main tool for presenting story. Cinematic dialogue builds a connection with the player; a gesture in close-up can say more than a line of dialogue.

Cinematic Dialogue

To this end the team created over 35 hours of dialogue content. With only 2.5 hours of cutscenes, that left 1463 systemic dialogues in total. On average, one cutscene required a full mocap session, so cutscenes were reserved for unique and special situations. The sheer volume of dialogue required for the game necessitated a fast pipeline for creating dialogues, as swelling the modest animation team wasn’t an option.

The Witcher 2’s dialogue was comparatively simple, but the team wanted The Witcher 3’s dialogues to compare visually with its cutscenes. To illustrate this, Tomsinski showed a dialogue within a dance cutscene; because the dialogue system used dancing animations as idles and dynamic cameras that moved with the action, only added annotations made it possible to tell what was cutscene and what was the dialogue system.

From here, the talk was broken into the 4 stages of dialogue creation to frame the critical development of the Dialogue Editor:


1. Writing
– Script written and split into story choices

2. Quest Design
– When to play?
– Who are the actors?
– Where to play?
– Decisions have impact on the game world

3. Dialogue Design
– Designers are like movie directors
– Timeline tool
– Characters
– Shots
– Items
– Lighting
– Morphs
– Weather

4. Post Production
– Polish
– Corrections

With the first two stages covered briefly, most of the presentation focussed on the third stage of the pipeline. It began with an example of adding a look-at from within the Dialogue Editor (essentially a track-based visual editor, much like a video-editing package), where Tomsinski could also change the duration and the blend in/out, but that was only the beginning…

Dialogue Design

A realtime preview proved powerful and essential, making it easy to move characters around the scene. The entire scene played back alongside the editor, but could also be played at its world location with just one click.

The process for creating a dialogue began with setting the lines of dialogue. There didn’t appear to be any specific tool for handling branching narrative though one would expect something custom was required – perhaps the topic of another talk.

Once the dialogue was set and the voice-over recorded, the “Generator” auto-created an initial pass from the VO, and some less-important dialogues were left barely touched beyond that pass. This stage was nearly identical to the approach we took for the original Mass Effect, where handling the sheer volume of narrative by hand is impossible, and it creates a good worst case whereby the scenes still get done even if time runs out. But whereas we referenced the dialogue text, The Witcher 3 system appears to generate from the voice-over audio file.

The Generator

Auto-generated cameras were created by obeying basic rules such as the 180-degree rule and adding establishing shots. At this stage, all markers to trigger camera cuts, plus an initial pass of idles and gestures, were drawn from parsing the voice-over audio. To show the power of the generator, a video demonstrated how quickly it could simply be re-run for a different result if desired. Once an initial pass was laid out, pre-made camera shots could be swapped out by selecting from a menu.
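As a rough illustration (not CD Projekt Red’s actual implementation, and all shot names are invented), a camera-pass generator along these lines might open on an establishing shot, pick a shot per spoken line, and keep every camera on one side of the axis between the actors to respect the 180-degree rule; re-running with a new seed yields a different pass:

```python
import random

# Illustrative shot names only; the real editor offered a menu of pre-made shots.
SHOTS = ["close_up", "medium", "over_shoulder"]

def generate_cameras(lines, seed):
    """Sketch of a generator pass: one establishing shot, then one shot
    per line of dialogue, all kept on the same side of the 180-degree line."""
    rng = random.Random(seed)
    side = rng.choice(["left", "right"])  # pick one side of the axis and stay there
    cuts = [("establishing", None, side)]
    for speaker, _text in lines:
        cuts.append((rng.choice(SHOTS), speaker, side))
    return cuts

lines = [("Geralt", "How do you know?"), ("Yennefer", "I just do.")]
pass_a = generate_cameras(lines, seed=1)
pass_b = generate_cameras(lines, seed=2)  # re-run for a different result
```

Swapping a pre-made shot from a menu would then amount to replacing one entry in the generated list.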

Animation-wise, idles were chosen from a pool of 35 variations per skeleton type (male/female/dwarf), plus sitting variants and so on. Idles were grouped by emotional state or social status, which could be fed into the generator.


Pose-based look-ats were created by animators to handle special-case idles where a standard look-at wouldn’t work, such as sitting or leaning on something with the hands. Tomsinski showed a mesh around the character representing the direction covered by each pose, which updated as the character turned. The aim always blended between three poses, matching the vertices making up the polygons of the mesh. Interestingly, we use a near-identical debug view for pose-based look-ats at Naughty Dog.

Look-ats could be created to cover the entire body, just the head, or the eyes only. The system was not too time-consuming for the animators to set up, and the results were impressive: the character Yennefer, in a seated reclining pose leaning on her arm, followed the target without her hand sliding. Leaving the authoring of look-ats to the animators freed up programmers for other tasks.
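The three-pose blend described above maps naturally to barycentric weights: find the triangle of the pose mesh containing the look direction, then weight the three corner poses by the target’s barycentric coordinates. A minimal sketch, where the 2D yaw/pitch parameterisation and function names are my assumption, not from the talk:

```python
def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def dot(p, q):
    return p[0] * q[0] + p[1] * q[1]

def barycentric_weights(p, a, b, c):
    """Blend weights of look direction p (e.g. yaw/pitch) against the
    triangle of pose directions (a, b, c) on the look-at mesh."""
    v0, v1, v2 = sub(b, a), sub(c, a), sub(p, a)
    d00, d01, d11 = dot(v0, v0), dot(v0, v1), dot(v1, v1)
    d20, d21 = dot(v2, v0), dot(v2, v1)
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return 1.0 - v - w, v, w  # weights for the three corner poses

# A target halfway between pose A and pose B blends them 50/50:
u, v, w = barycentric_weights((0.5, 0.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
```

As the character turns, the pose directions (and hence the triangle) move with the mesh, which is presumably why the debug view updates in realtime.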

Animation Sharing

In total there were around 2400 dialogue anims, and re-use and sharing was supported across different characters (human anims were shared with characters as varied as werewolves) by:
– The ability to convert gestures to additive so they work across varied idles.
– Using them as overrides to kill the animation playing underneath.
– Using bone-masks to play only on relevant body-parts.
– Adjusting the blend-weight of the gesture for additional variety.

Additive gestures worked well on different idles when bone-masking was used to fix issues. (Additives can sometimes produce undesirable results when the idle underneath strays too far from the base pose used to generate the additive.)
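The additive sharing described above boils down to subtracting a base pose from the gesture and layering the delta onto whatever idle is playing, scaled by a bone mask and a blend weight. A simplified sketch using per-bone angles in place of real quaternion math (all names are illustrative):

```python
def apply_additive(idle_pose, gesture_pose, base_pose, mask, weight=1.0):
    """additive = gesture - base; the delta is layered onto any idle,
    restricted to masked bones and scaled by a blend weight."""
    out = {}
    for bone, idle_rot in idle_pose.items():
        delta = gesture_pose[bone] - base_pose[bone]  # additive delta
        out[bone] = idle_rot + delta * weight * mask.get(bone, 0.0)
    return out

idle    = {"spine": 10.0, "arm_r": 5.0}
gesture = {"spine": 12.0, "arm_r": 45.0}
base    = {"spine": 10.0, "arm_r": 0.0}
mask    = {"arm_r": 1.0}  # play the gesture on the right arm only
pose = apply_additive(idle, gesture, base, mask)
# arm_r picks up the full gesture delta; the masked-out spine keeps the idle
```

Lowering `weight` gives the per-gesture variety mentioned in the list above, and the mask is what lets one gesture sit on top of idles for very different body shapes.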

Realtime Editing

Inspired by Valve’s Source Filmmaker, custom poses could be created from inside the editor by modifying animations in realtime. This was used mostly for turning the head and fixing fingers etc. An example was shown of modifying a pointing gesture’s direction to match characters that were not standing directly in front of one another.

Pose keys could be added on layers to make corrections to gestures such as arms cutting through larger/rounder body-types etc. Even more impressive, run-time facial controls could be used to fix or punch-up generated facial animations for important close-ups.

Real-time placement and attachment of props, as well as syncing animated props, was the final feature illustrated; combined with the others, it made for quite a complete editing package.

The complete process was:

1. Starting with an empty timeline with voice-over only.
2. Generate first pass of acting and cameras.
3. Modify cameras, idles and gesture animations.
4. Final polish, facial and pose corrections.

As such, a simple scene could take only a few minutes. To demonstrate how powerful the runtime animation editing was, Tomsinski showed an example of modifying an existing gesture so heavily as to make it virtually unrecognisable.

To create a new gesture of passing a crossbow between characters, his process was to find the closest existing anim, spawn a crossbow attached to the hand, and adjust the fingers to hold it. Clever camerawork hid the more complicated aspects of moving an object from one character to another: the handover happened offscreen but still appeared seamless in the edit. This example also illustrated how the system could scale up to more complicated dialogues requiring complex, unique actions.

Post Production

Looking first at lighting, he explained how custom lights create mood – something difficult to predict across a 24-hour day/night cycle. To solve this, lights adjusted with the time of day. The lighting values in the weather editor were set by a lighting artist, illustrated by nine panels of various lighting conditions that all worked equally well. This was because the Global Light Value from the weather editor affected the custom light settings in a graceful manner.
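The talk didn’t detail the exact formula, but the Global Light Value gracefully affecting custom light settings could be as simple as modulating the artist-authored intensity by the current global light, clamped so the scene keeps its authored mood at night. A hypothetical sketch (the function and the clamp are my assumption):

```python
def dialogue_light_intensity(authored_intensity, global_light_value,
                             min_factor=0.4):
    """Scale the artist-set light intensity by the weather editor's
    global light value, clamped so the scene never goes fully dark."""
    factor = max(min_factor, min(1.0, global_light_value))
    return authored_intensity * factor

# Midday: the custom light plays at its full authored value.
midday = dialogue_light_intensity(2.0, global_light_value=1.0)
# Night: the light dims with the world, but only down to the clamp.
night = dialogue_light_intensity(2.0, global_light_value=0.1)
```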

Visual effects were also set in the timeline, though weren’t discussed at depth.

Problems And Conclusion

Localisation created headaches for the dialogue editor because every line of voice-over has a different duration in each of the supported languages. To overcome this, the timeline elegantly scaled to match different languages: events triggered at a percentage of the timeline’s duration, scaling with it. In the given video example, a scene lasting 12 seconds in English played seamlessly at 14 seconds in French.
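The percentage-based scaling is straightforward to sketch: every event keeps its relative position when the timeline stretches to fit a longer localized VO take. A minimal illustration (event names are invented):

```python
def scale_events(events, source_duration, target_duration):
    """Events authored against the source-language timeline fire at the
    same percentage of the localized timeline's duration."""
    scale = target_duration / source_duration
    return [(name, t * scale) for name, t in events]

# Events authored on a 12-second English timeline...
english = [("camera_cut", 3.0), ("gesture", 6.0), ("look_at", 9.0)]
# ...keep their 25% / 50% / 75% positions on the 14-second French take.
french = scale_events(english, source_duration=12.0, target_duration=14.0)
```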

As with other open-world games, the problem of ‘deterministic dialogue in a non-deterministic world’ caused NPCs to wander randomly through scenes until a ‘Deny Area’ was placed around each dialogue to keep them out – an approach identical to the one we used on Mass Effect and Assassin’s Creed.

To cover the issue of entering dialogues from a horse (all dialogues assumed the player was dismounted), a systemic force-dismount cutscene would always preface the dialogue if it was approached while mounted.

Tomsinski finished with a dialogue video annotated to highlight each of the elements created by a dialogue designer, such as look-ats and camera interpolation. He summarised the Dialogue Editor as a success: in the end they didn’t require an army of animators, and they are carrying the system forward for their next project, Cyberpunk 2077.

UPDATE: Tomsinski has uploaded all videos included in the talk to this playlist.