Sound synthesis technology, particularly speech synthesis, has become much more sophisticated in recent years. Text-to-speech has been around for decades, but its output now sounds far more natural. Recent algorithms can take just a few hours of audio and synthesize highly realistic audio samples. As the technology advances, more applications open up, including possibilities in creative media. As reported by VentureBeat, video game companies have recently begun investigating the use of AI voice generation to produce dialogue for video games.
One company, Leviathan Games, has started implementing voice AI within games it is currently developing. Wyeth Ridgway, the owner of Leviathan Games, explained that voice AI could change game design in dramatic ways. He described the use of voice AI in game design as an emerging trend, and compared it to how 3D animation software has shifted over the course of the past decade, with companies like Pixar creating proprietary software intended to facilitate animation and modeling.
Traditional speech generation works by appending pre-recorded sound files on the fly, stitching sentences together from previously recorded words and phrases. This approach requires recording hundreds of hours' worth of dialogue and manually labeling the sound clips. It also sounds somewhat unnatural, because inflection and emphasis tend to shift from word to word. By comparison, state-of-the-art voice AI works differently and sounds significantly more natural.
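To make the stitching approach concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular speech engine is built: the clip bank, the `tone` and `synthesize` helpers, the sample rate, and the short silence between words are all hypothetical, and sine tones stand in for real recordings.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def tone(freq_hz, duration_ms):
    """Stand-in for a pre-recorded clip: a plain sine tone."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical clip bank: each labeled word maps to one stored recording.
CLIP_BANK = {
    "open": tone(220.0, 250),
    "the": tone(330.0, 100),
    "gate": tone(440.0, 300),
}


def synthesize(sentence, clip_bank=CLIP_BANK, gap_ms=50):
    """Stitch a sentence together by appending one stored clip per word."""
    gap = [0.0] * (SAMPLE_RATE * gap_ms // 1000)  # silence between words
    output = []
    for index, word in enumerate(sentence.lower().split()):
        if word not in clip_bank:
            # The method can only say words that were recorded and labeled.
            raise KeyError(f"no recording for word: {word!r}")
        if index > 0:
            output.extend(gap)
        output.extend(clip_bank[word])
    return output


audio = synthesize("Open the gate")
```

The sketch shows both limitations mentioned above: a word with no recording cannot be spoken at all, and each clip keeps whatever pitch and emphasis it was recorded with, so nothing smooths the transitions between words.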
Voice AI is based on deep neural networks. DeepMind’s WaveNet was one of the first models that could generate convincing, natural-sounding audio samples. Because the sound samples are generated from scratch, there’s no need to pre-record hundreds of hours of dialogue, as long as sufficient training data is available. Optimized GANs and LSTM models can generate audio after being trained on only a few hours of labeled audio. The results can be extraordinarily convincing, such as when Google’s Duplex experiment called a hair salon to set an appointment.
As these technologies become more powerful, standardized, and easily accessible through cloud computing, it’s likely that more game developers will turn to voice AI to reduce production time and costs. Some companies are already creating models that game developers could use. Replica Studios, for example, specializes in AI voice technology and has made audio samples generated by its technology available online.
It’s unlikely that game developers will forgo voice actors in favor of AI. In fact, voice AI could open up more opportunities for voice actors. Currently, many game development companies skip voiced dialogue because of the time and cost involved in creating it. Voice actors often need to be brought back for more recording sessions if there are changes to the script or if game directors want a different kind of performance. Voice AI could be used to prototype dialogue, giving developers a feel for which script changes and revisions are needed before calling in a professional voice actor to record the script. This could lead to more companies having the resources to invest in voiced dialogue.
AI voice models could even be trained on the voice of a specific voice actor and then used to generate trivial dialogue clips, as long as the actor is paid for the use of their voice. As reported by VentureBeat, voice actors like Simon J. Smith are optimistic about the increasing use of voice AI models and their potential to open up new voice acting opportunities.
Beyond the use of voice AI to prototype scripts or create voiced lines for minor characters, game developers could also use voice AI to give players more customization options for role-playing video games. Currently, even games that allow players to choose a voice for their avatars typically have just a handful of options. With the use of voice AI, the options could be functionally limitless.