From Organic to Synthetic: the Dawn of AI Voiceover

“More Human Than Human is our motto”: So said Dr Eldon Tyrell in the seminal science fiction film Blade Runner. Made 40 years ago this year, the movie’s vision of synthetic beings created to serve society is gradually becoming a reality, with everything from digital assistants to self-driving cars lending us humans a helping hand. One related development set to take the language and localisation industry by storm is AI voiceover technology – software suites that produce artificial voices to narrate content. With many developers claiming that their synthetic voices are indistinguishable from human ones, we want to take a closer look at how much promise this technology holds.

How does AI voiceover work?

The technology itself may be complex, but the premise behind it is simple: an AI-powered suite uses text as the fuel to power a deep-learning-based voice engine with the aim of generating audio(visual) content. The voices of these synthetic speakers tend to be based on real voice actors to ensure human-like authenticity and maximum audience engagement, and they sometimes come packaged with a digital avatar for added legitimacy. With this in mind, the obvious million-dollar question is: Are these synthetic voices truly as lifelike as the developers say?

Our verdict: Not quite yet. But the technology still holds plenty of advantages, not least for companies looking for a distinctive voiceover solution for their brand that is more cost-effective and quicker to produce than hiring a flesh-and-blood voice actor and spending extra on equipment and studio time. The key lies in how this technology is used: While it may not yet be suitable for fully fledged television ads, artificially rendered voiceovers are perfect for bringing lower-rung content to life. And with everything from YouTube instructional vids to mobile apps in beta development to internal employee onboarding material suitable for an AI’s dulcet tones, now is a good time to try out a few of the services offered by the current crop of voice tech firms.

Digital movers and binary shakers

Let’s take a brief look at a few pioneers of this technology. First up is Synthesia, a software company founded by a team of bright young things in 2017, which provides what it calls “synthetic media” (and whose technology was responsible for the AI voiceover video above). Then there’s Flawless AI, whose TrueSync dubbing software was singled out for praise by Time as one of the best inventions of 2021. Elsewhere, Murf’s website boasts lifelike AI voices that can be used to produce “studio-quality” voiceovers at short notice, while Blakify offers its own services in 65 languages with 400+ voices. Lovo puts yet another spin on the service, offering DIY voice cloning, where users’ own voices are cloned for the purpose of narrating audiobooks, YouTube content, Instagram stories and more. Even Amazon is getting in on the act with Polly, a suite allowing individuals and companies to turn up to five million characters of text into “lifelike speech” free of charge.

An inclusive innovation with real potential to change lives

Any time a hip new piece of tech appears on the market, the literature likes to use terms such as “game changer”, “revolutionary” and “life-enhancing”. The reality often falls short of expectations. For AI-driven voiceover services, however, the potential truly is there. It goes beyond saving time, cutting costs and striving to stay on the bleeding edge of things – this technology can help people with all manner of disabilities improve their quality of life. For example, it can put a clear, understandable, engaging voice to a huge range of content for people who are blind or have low vision. The iPhone’s VoiceOver tool is a strong example of this in practice: the AI-powered screen reader reads out app descriptions, battery level and incoming calls, and can even provide a description of certain images. Meanwhile, reams of text can be summarised and made easily understandable for people with learning difficulties (think a kind of Simple English Wikipedia, but powered by AI). Those with limited dexterity, including many senior citizens, can also benefit, as AI-powered voices can read out content without the need to scroll or type. Game changer, revolutionary, life-enhancing: all apply to AI voiceover.

The robots are coming

What’s our verdict on AI voiceover technology? Well, while it might not yet be “more human than human”, it’s certainly an exciting development for the language industry. A cost-effective voice that is unique to a brand, can be generated at short notice and is easily adapted as situations and content evolve: doesn’t that sound like a dream? Of course, real-life human voices will still have their place for a long time to come, particularly in prestige areas such as commercials and promotional content, where authenticity simply cannot be faked. But one thing is certain: the smiling, softly spoken, soul-free robots are coming.