Prototyping Dialogue with Google Text-to-Speech

For a narrative-driven game like Arctic Awakening, it’s really important for our team at GoldFire Studios to quickly get a feel for the flow and pacing of a scene. We do this some time before we actually head into the recording studio with our voice actors, so we need some kind of placeholder to fill in for the actual recorded dialogue.

Developers have a number of options for placeholder dialogue, including timed subtitles and scratch audio recorded by programmers (the aural equivalent of “programmer art”). We tried both of these early in development before settling on a workflow using Google Cloud Text-to-Speech, which has proved a significant time saver and given great results. Better still, our use case has fit within the product’s free tier.
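Because the API is billed by the number of characters synthesized, a quick way to judge whether a script is likely to fit a monthly allowance is to total the caption lengths. The sketch below is illustrative only: the `estimateUsage` helper and the allowance passed to it are assumptions made for this example, and the real limits should be taken from Google's current pricing page.

```javascript
// Hypothetical helper: totals the characters in a set of dialogue lines
// and compares them with a monthly allowance supplied by the caller.
function estimateUsage(lines, monthlyAllowance) {
  const characters = lines.reduce(
    // Array.from counts Unicode code points rather than UTF-16 units.
    (sum, line) => sum + Array.from(line.caption).length,
    0
  );

  return {
    characters,
    remaining: monthlyAllowance - characters,
    fits: characters <= monthlyAllowance,
  };
}

// Example with made-up captions and a made-up allowance.
const usage = estimateUsage(
  [{caption: 'Placeholder line one.'}, {caption: 'Placeholder line two.'}],
  1000
);
console.log(usage);
```

Re-generating clips after script edits counts again, so an estimate like this is a lower bound rather than an exact figure.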

The Text-to-Speech product is a Cloud API which delivers passable-sounding voice clips from the text strings you provide. You can pass other options along with the content itself, specifying a language, one of a number of presets for the character of the voice, and a gender. The language code allows for various accents as well, for example American, British, Indian or Australian English.
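To make those options concrete, here is a minimal sketch of how a request for the API's synthesize call might be assembled. The `buildSynthesisRequest` helper and the `ACCENTS` table are assumptions made for this example and are not part of our codebase; the field names (`input.text`, `voice.languageCode`, `voice.name`, `voice.ssmlGender`, `audioConfig.audioEncoding`) follow the request shape the Text-to-Speech API expects.

```javascript
// Hypothetical accent table: the language code selects the accent.
const ACCENTS = {
  american: 'en-US',
  british: 'en-GB',
  indian: 'en-IN',
  australian: 'en-AU',
};

// Build a request object for a text-to-speech synthesize call.
// `voiceName` is optional; when it is omitted, the service chooses a
// voice that matches the language code and gender.
function buildSynthesisRequest(text, {accent = 'american', voiceName, gender = 'FEMALE'} = {}) {
  const languageCode = ACCENTS[accent];
  if (!languageCode) {
    throw new Error(`Unknown accent: ${accent}`);
  }

  const voice = {languageCode, ssmlGender: gender};
  if (voiceName) {
    voice.name = voiceName;
  }

  return {
    input: {text},
    voice,
    audioConfig: {audioEncoding: 'LINEAR16'},
  };
}

// Example: a British male voice reading a made-up placeholder line.
const request = buildSynthesisRequest('Testing, one two three.', {accent: 'british', gender: 'MALE'});
console.log(JSON.stringify(request, null, 2));
```

Keeping this as a plain function with no network calls makes it easy to check the request before any characters are spent on synthesis.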

We already had a database with the data we needed to get started (the line itself and the character who said it), so plugging that into Google’s API was relatively quick and painless. Here is a snippet of code from our dialogue management platform, StoryDB (which I’ll talk more about in a later post), which is just a simple Node.js web server:

// Dependencies this excerpt relies on.
const fs = require('fs');
const util = require('util');
const textToSpeech = require('@google-cloud/text-to-speech');
const textToSpeechClient = new textToSpeech.TextToSpeechClient();

// Configuration for which voice goes with which character.
// See Google's list of supported voices for the available options.
const voices = {
  Alfie: {languageCode: 'en-US', name: 'en-US-Standard-I', ssmlGender: 'MALE'},
  Kai: {languageCode: 'en-US', name: 'en-US-Wavenet-B', ssmlGender: 'MALE'},
  Donovan: {languageCode: 'en-US', name: 'en-US-Wavenet-J', ssmlGender: 'MALE'},
  ATC: {languageCode: 'en-US', name: 'en-US-Standard-G', ssmlGender: 'FEMALE'},
  default: {languageCode: 'en-US', name: 'en-US-Wavenet-F', ssmlGender: 'FEMALE'},
};

// ...
  .then(() => fs.promises.mkdir(`static/clips/${projectId}`, {recursive: true}))
  .then(() => getLines(ids))
  .then(async(ls) => {
    // Synthesize a single line of dialogue and save it as a .wav clip.
    const generateLine = async(l) => {
      const input = {text: l.caption};
      const voice = voices[l.character] || voices.default;
      const audioConfig = {audioEncoding: 'LINEAR16', speakingRate: 1.25};

      // Perform the text-to-speech request and write the audio content to file.
      const [response] = await textToSpeechClient.synthesizeSpeech({input, voice, audioConfig});
      const writeFile = util.promisify(fs.writeFile);
      await writeFile(`static/clips/${projectId}/${}.wav`, response.audioContent, 'binary');
    };

    // Generate every clip; one failed line does not stop the others.
    await Promise.allSettled(ls.map(generateLine));
  });


And here’s a sample of what gets generated:

From there, we jump into our game engine (Unity in this case, but this could work with any engine) and run a script which automates importing the line metadata and the audio clips from an API endpoint we set up. At this point, the clips and subtitles are ready to be used in a scene! Once the actual lines are recorded by voice actors, we simply swap out the files and re-import, with the lines already in use in-game.
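As a rough illustration of the kind of payload such an endpoint might return, the sketch below builds a JSON-friendly manifest that pairs each line's metadata with the URL of its generated clip. Everything here is hypothetical: the `buildManifest` name, the `id` field on each line, and the URL layout are assumptions made for this example. Only the `caption` and `character` fields come from the snippet earlier in this post.

```javascript
// Hypothetical manifest builder for an import endpoint.
// Each entry carries the subtitle text, the speaker, and a clip URL.
function buildManifest(projectId, lines, baseUrl = '') {
  return {
    projectId,
    lines: lines.map((line) => ({
      id: line.id,
      character: line.character,
      caption: line.caption,
      // Assumed URL layout; encode the parts so odd ids stay valid in a URL.
      clipUrl: `${baseUrl}/clips/${encodeURIComponent(projectId)}/${encodeURIComponent(line.id)}.wav`,
    })),
  };
}

// Example with a made-up project and line.
const manifest = buildManifest('demo', [
  {id: 'line-001', character: 'Kai', caption: 'Placeholder caption.'},
]);
console.log(JSON.stringify(manifest, null, 2));
```

An engine-side import script could fetch a manifest like this, download each `clipUrl`, and create the matching subtitle entries. If final recordings are saved under the same file names, swapping them in requires no changes to the scene data.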

We’re really happy with the results, and we would definitely encourage other developers to give it a try if using voiced dialogue. These recordings generally won’t be suitable for release, but when working on an indie budget, being able to quickly and easily prototype your dialogue systems can be a huge win with no up-front cost involved.