Automated Articulatory Speech Synthesis for Animation

In animation, aligning mouth movements with dialogue has traditionally been a painstaking process of working through tiny, frame-by-frame adjustments, and it is doubly time-consuming when you are animating in a language or dialect that is not your native tongue. But what if we could automate that process? That’s exactly what Zack Qattan’s Edge Impulse project sets out to do, on-device. The result has implications beyond animation, offering benefits for broader accessibility as well.

Quick aside: This is an interesting use case and a public project that Qattan (a co-founder of Brilliant Sole) originally shared on our forum. Like Qattan, I too used to work in animation in a language that I don’t speak, using the same software. This one brought back memories.

It was enjoyable to catch up with Qattan and learn about this project. The content in this blog post draws on our conversation and his views on where this work can be used. If you have an interesting project like this, please share it with us.

What Are Phonemes?

Before diving into the project at hand, let’s start with the basics: Phonemes are the smallest units of sound in a language. Think of them as the building blocks of speech, individual sounds like “p,” “b,” “t,” or “l.” When we speak, we combine phonemes into syllables and words, and each language has its own set. For animators, matching these phonemes to corresponding mouth shapes (called visemes) is crucial for realistic lip-sync. This is a problem Qattan and I are both acutely aware of.
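To make that concrete, here is a minimal, illustrative sketch (in TypeScript) of the kind of phoneme-to-viseme lookup an animation rig might use. The labels and viseme names are placeholders for this example, not taken from Qattan’s project:

```typescript
// Illustrative only: a tiny phoneme-to-viseme lookup. Real rigs define their
// own viseme sets, and phoneme inventories vary by language.
type Viseme = "MBP" | "FV" | "TH" | "L" | "WideOpen" | "Rounded";

const phonemeToViseme: Record<string, Viseme> = {
  p: "MBP", b: "MBP", m: "MBP", // lips pressed together
  f: "FV",  v: "FV",            // lower lip against upper teeth
  th: "TH",                     // tongue between the teeth
  l: "L",                       // tongue tip behind the upper teeth
  aa: "WideOpen",               // open jaw, as in "father"
  uw: "Rounded",                // rounded lips, as in "boot"
};

// Given a phoneme label, pick the mouth shape to pose on the character.
function visemeFor(phoneme: string): Viseme | undefined {
  return phonemeToViseme[phoneme.toLowerCase()];
}
```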

An interesting solution to this is Qattan’s Phoneme Classifier, which he uses to drive the Pink Trombone articulatory speech synthesizer. While we advise careful and ethically responsible use of any speech synthesis tool (see our Responsible AI License), this approach opens up real possibilities for avatar control in animation.
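As a rough illustration of what “driving” an articulatory synthesizer could look like, the sketch below maps classified phoneme labels to approximate tongue positions. The setTongue hook, the parameter names, and the numeric values are assumptions made for this example, not Pink Trombone’s actual API or Qattan’s implementation:

```typescript
interface TonguePose {
  index: number;    // constriction position along the simulated vocal tract
  diameter: number; // how open or closed that constriction is
}

// Rough, illustrative articulatory targets per classified phoneme label.
const poses: Record<string, TonguePose> = {
  aa: { index: 12, diameter: 2.4 }, // open vowel, as in "father"
  iy: { index: 27, diameter: 2.1 }, // close front vowel, as in "see"
  uw: { index: 22, diameter: 1.6 }, // rounded back vowel, as in "boot"
};

// Hypothetical hook into the synthesizer; the real control surface may differ.
declare function setTongue(pose: TonguePose): void;

function onPhoneme(label: string): void {
  const pose = poses[label];
  if (pose) setTongue(pose); // move the virtual tongue toward the target
}
```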

Check out a demonstration of the project in action here:

Controlling Pink Trombone with Phoneme Classifier made in Edge Impulse

Why this matters for animators (and everyone else)

Traditional pipelines involve:

  1. Recording dialogue
  2. Breaking down each sentence into phonemes
  3. Matching each phoneme to a mouth shape or blend shape
  4. Fine-tuning coarticulation, pacing, and emotion

A phoneme classifier paired with an articulatory speech synthesizer automates much of this. Not only does it save time, it also frees animators and sound designers to focus on creative storytelling rather than tedious technical work. Beyond animation, this technology could reshape how we engage with VR, gaming, and broader accessibility use cases.
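As a hedged sketch of what automating steps 2 and 3 might look like, the snippet below turns a sequence of classified phoneme segments into viseme keyframes, reusing the phoneme-to-viseme lookup sketched earlier. The data shapes are illustrative, not a real production format:

```typescript
interface PhonemeSegment { label: string; startMs: number; endMs: number; }
interface VisemeKeyframe { timeMs: number; viseme: string; }

// Collapse a classified phoneme timeline into viseme keyframes.
function toVisemeTrack(segments: PhonemeSegment[]): VisemeKeyframe[] {
  const track: VisemeKeyframe[] = [];
  let previous: string | undefined;
  for (const seg of segments) {
    const viseme = visemeFor(seg.label) ?? "Rest"; // neutral pose for unknown labels
    if (viseme !== previous) {
      track.push({ timeMs: seg.startMs, viseme }); // keyframe only when the shape changes
      previous = viseme;
    }
  }
  return track;
}
```

Each keyframe can then be mapped onto the rig’s blend shapes, leaving step 4 (coarticulation, pacing, and emotion) where the animator’s judgment still matters most.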

From Tedious Lip Sync to Automatic Control

Qattan trained a classification model in Edge Impulse to recognize phonemes as they are spoken and use them to drive Pink Trombone’s articulatory controls automatically, replacing the manual phoneme-by-phoneme matching described above.
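If the classifier runs continuously over incoming audio, as in the demo video, its raw predictions need a little smoothing before they are allowed to move the mouth. The sketch below assumes each classification window yields one confidence score per phoneme label (the exact output format of an Edge Impulse deployment may differ) and applies a simple threshold-and-hold rule:

```typescript
interface Prediction { label: string; value: number; } // value = confidence, 0..1

const CONFIDENCE_THRESHOLD = 0.6; // ignore uncertain frames
const HOLD_FRAMES = 3;            // require a few consistent frames before switching

let candidate: string | undefined;
let candidateCount = 0;
let current: string | undefined;

// Called once per classification window; returns the phoneme currently "held".
function onClassifierFrame(predictions: Prediction[]): string | undefined {
  if (predictions.length === 0) return current;
  const best = predictions.reduce((a, b) => (b.value > a.value ? b : a));
  if (best.value < CONFIDENCE_THRESHOLD) return current;

  if (best.label === candidate) {
    candidateCount += 1;
  } else {
    candidate = best.label;
    candidateCount = 1;
  }
  if (candidateCount >= HOLD_FRAMES) current = candidate;
  return current;
}
```

Without the hold, a single noisy frame could snap the mouth to the wrong shape for an instant, which reads as jitter on screen.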

Beyond Lip Sync

Looking forward

The possibilities are practically endless.

Future inputs might include in-mouth wearables (like the Augmental MouthPad) or EMG earbuds to capture subtle muscle movements around the mouth or neck.

From a production standpoint, think about how you can tweak a character’s performance after the fact without re-recording or re-animating everything. That’s a total game-changer for animation studios and indie creators alike.

From an animator’s perspective, I’m personally excited about how quickly this field is evolving. Not long ago, tweening (generating in-between frames) seemed like magic; now we have so much more. We’re moving toward a future where speech creation, modification, and lip sync can happen almost instantaneously, powered by machine learning models that do the heavy lifting. It’s a stark contrast to the manual workflows I used when first animating for a long-forgotten TV series!

It’s also worth noting that there are connected versions of this approach that rely on cloud APIs such as OpenAI’s, and we are aware of a Blender plugin that does so as well. However, doing this on an edge device is not something I have seen before, and a version that can run on an MCU has far more potential for accessibility than a connected one.

So, if you’ve ever dreaded staying up late to lip-sync a single line of dialogue, or if you’re simply curious about the future of accessibility and animation, take a look at these tools. They might just save you from a few more of those marathon animation sessions and unleash a whole new realm of creative possibilities.

Please share any project you are working on in our forum. We love to hear about the interesting ways our community members are using edge AI.


