Over the past few weeks I have been seeing ads from several “text-to-speech” platforms. These platforms claim to use AI to generate natural-sounding voices, so I decided to find out a little more about them and see how they work. I should clarify that I didn’t buy any of them, because I didn’t like the idea of spending money on software just for a review (I only spend money on software when I actually plan to use it for my work).
I looked into two platforms, Speechelo and Talkia, although there are others out there. Basically, you enter your text, and the software automatically converts it to speech. I have to say that, right off the bat, they can sound very robotic, which is far from the “natural speech” these ads promise.
The apps include tools to tweak the speech, such as adding pauses or emphasis on words and changing the speaker’s speed, among other things. These tools can make the results sound better, with more natural inflections and exclamations, although the voices still sound somewhat monotone. Both Speechelo and Talkia let you preview some of their voices on their websites, so you can judge for yourself.
Since I am involved in game development, I follow discussions on various subjects, and voice acting and text-to-speech apps have come up a few times, with people recommending these apps for voice-overs in games. I wonder whether those people have actually used the apps, or whether they recommend them because they saw an ad or searched the web for “text to speech” and grabbed the first link they found. I certainly don’t think these apps are a replacement for voice actors in video games.
If you ask me, whether to use these depends greatly on what you are after. If you want voices for corporate presentations, marketing videos, and similar content, I think these apps can work. But if you are looking for more natural speech for other applications, like narration or voice-overs for entertainment products (audiobooks, video games, animations, etc.), you are better off hiring voice actors.
For more information: