YouTube's New "Expressive Language" AI: Better than Human Voiceovers?
YouTube has just introduced a new AI feature they're calling their most impressive update yet. They've named it "Expressive Speech" for automatic captioning in eight languages.
Transcript
YouTube has just unveiled a new AI feature they’re calling their most impressive update yet. They’re naming it “Expressive Speech” for automatic subtitling in eight languages. The promises are grand: not only will your words be translated, but also the emotions, tone, and energy you convey. As creators, however, we should look beyond the marketing hype and ask the fundamental business question: Can a free tool designed to scale for potentially 2 billion users really deliver the high quality needed for viewer engagement? Let’s delve into the economics.
It’s practically impossible for YouTube to offer high-quality, customized subtitles for everyone for free. It simply doesn’t scale. So, while the technology may be fascinating, let’s explore why I remain skeptical and why here at KW Media we continue to rely on our premium tracks and curated voiceovers. Let’s conduct a blind test. We used YouTube’s new Expressive Speech in our latest video and compared it to our internal YouTube premium track production. Listen to this. For YouTube. Even though it’s in German, hopefully the difference was audible. One is functional, the other emotional.
And in the engagement game, emotion is what drives people to keep watching. However, let’s look at the hard data. YouTube claims that automatic subtitles retain 75% of the original viewer watch time, as mentioned in their last Creator Insider video. But when we look at complex content with standard automation, the reality is brutal. Look at this first chart. The original German tracks had an average retention rate of 30%. Automation dropped it to 13%. That’s only 43% of the original’s retention. Viewers clicked, heard the robotic voice, and left the page.
This aligns with the feedback we’ve received from clients who encounter automatic subtitles online. And I quote: “When I come across automatic subtitles, I hit the thumbs-down button and click ‘Not Interested in Channel.’ Because of this, I’ve completely lost some Creators, not just in the Shorts feed but they’re also suggested to me much less often in general.” Now compare that 43% to our premium track approach with one of our automotive clients. For manual subtitling, we used a documentary style with voiceovers on a delayed translation layer.
This way, the viewer immediately understands it’s subtitled, but the original emotion comes through. The average retention rate for DAP here was 16.1% compared to the original 26.4%. That means we retained over 60% of relative performance, roughly 18 percentage points better than YouTube’s tool and actually closer to their promise. Also, note that YouTube’s 75% retention statistic is likely a mixed average. It’s probably heavily skewed by Shorts and visually heavy content where audio takes a back seat.
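The relative-retention figures quoted above can be reproduced with a few lines of Python. This is only a minimal sketch of the arithmetic: the function name `relative_retention` and the variable names are illustrative assumptions, and the inputs are the percentages stated in the video, not values pulled from YouTube Analytics.

```python
def relative_retention(track_pct: float, original_pct: float) -> float:
    """Return how much of the original track's average retention
    a translated track keeps, expressed in percent."""
    if original_pct <= 0:
        raise ValueError("original_pct must be positive")
    return track_pct / original_pct * 100.0


# Figures quoted in the video (average retention, in percent).
automated = relative_retention(13.0, 30.0)   # standard automation vs. German original
premium = relative_retention(16.1, 26.4)     # premium track vs. original (automotive client)

# Difference between the two approaches, in percentage points.
gap_points = premium - automated

print(f"automated: {automated:.1f}% of original retention")
print(f"premium:   {premium:.1f}% of original retention")
print(f"gap:       {gap_points:.1f} percentage points")
```

Both results are ratios of averages, so they compare tracks within one video set and say nothing about absolute watch time.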
For narrative-driven long-form content like ours or our clients’, the data suggests the decline is much steeper. YouTube tries to put this into perspective by saying, “Even if the retention is lower, any additional traffic volume is good, right?” Creator Insider suggests looking at the total watch time increase. Okay, we did that and dove into the analytics to check the data. The growth in automatic subtitle watch time was over 500%. That’s accurate, but only compared to a time when there were no automatic subtitles.
But for automatic subtitles, the entire traffic gain is often statistically irrelevant, at around 1% of the home market. So take it with a grain of salt. And looking ahead, YouTube’s roadmap gets even bolder. They’re testing lip-sync, where they’ll match your mouth movements to the translated audio, and working on translating burned-in text within the video itself. We’re moving towards full localization where the original video is just a blueprint.
And “blueprint” is a good word here since they’re also working on dynamically inserted brand segments. Unfortunately, we don’t get individual options like “I’m okay with the text in the video being translated but not with lip-sync.” So, that’s it for today, and I want to know: Would you let YouTube animate your face for lip-sync? Where do you draw the line with automatic subtitles? Do you even use automatic subtitles at all? Please share your analytics in our Community tab.
Make sure you set your advanced filters to the last 365 days, select an audio track, and include the average watch percentage. Let’s discuss this in the comments. I’m Martin, bringing you weekly creator news. See you next week with more YouTube updates!
