Using OpenAI Whisper to Train a Tiny Keyword Spotting Model — in Any Language

Imagine you have a product, and you want it to react to a very specific keyword. Aaand you plan to sell millions of units all over the world. Problem? You don't speak French. Or Chinese. Or Spanish. You could hire lots of people on Mechanical Turk or a similar platform, but that needs a large upfront investment.

Enter Generative AI.

In this video I'll show you how to use Edge Impulse to get all the voice samples you need, train the keyword spotting model, and deploy it to the extremely tiny Cortex-M0+-based Arduino Nano RP2040 Connect. Magical stuff.

Let's jump straight to action.

We're going to use the Whisper API for text-to-speech. It provides high-quality speech synthesis in a few different voices and supports multiple languages.
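If you're curious what a text-to-speech request looks like outside the Studio, here is a minimal sketch. It assumes OpenAI's public speech endpoint with the `tts-1` model and the `alloy` voice; that is my assumption, not necessarily what the Edge Impulse block calls internally. The helper names `build_tts_request` and `synthesize`, and the placeholder key, are hypothetical.

```python
import json
import urllib.request

# Assumption: OpenAI's public speech endpoint. The Edge Impulse block may
# use a different model, voice, or set of parameters under the hood.
TTS_URL = "https://api.openai.com/v1/audio/speech"


def build_tts_request(text, api_key, voice="alloy", model="tts-1", speed=1.0):
    """Build (but do not send) a request for one spoken sample."""
    payload = {
        "model": model,
        "input": text,             # the keyword to speak, e.g. "停"
        "voice": voice,
        "speed": speed,
        "response_format": "wav",  # WAV is convenient for audio datasets
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, api_key, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_tts_request(text, api_key, **kwargs)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path


# Building the request needs no network access, so it is safe to inspect.
request = build_tts_request("停", api_key="sk-placeholder")
print(request.full_url)
print(json.loads(request.data.decode("utf-8")))
```

With a real API key, calling `synthesize("停", api_key, "stop.000.wav")` would write one sample to disk. In the Studio, the Synthetic data tab handles all of this for you.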

First, create a new project in Edge Impulse. This feature is available on the Pro Plan or Enterprise tier only, but don't worry, you can use the Enterprise trial to test it out. Then go to the Data Acquisition tab, choose Synthetic data, and select Whisper Synthetic Voice Generator.

Let's imagine I don't speak Chinese (其实我会说中文,但是我发音不标准: actually, I do speak Chinese, but my pronunciation isn't standard) but I want to use this language. We're going to use the following labels: stop (停), forward (前进), back (撤销), left (左转), and right (右转). These are commands suitable for a mobile robot platform.

In the Synthetic data window, enter each word in Chinese, give it a label, leave the other parameters at their defaults for now, and generate some samples.

Since we're going to use another nifty trick available in Edge Impulse, few-shot keyword spotting, we don't need that many samples. Let's generate 50 for each label.
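To make the numbers concrete, here is a small sketch of what that generation plan adds up to: five labels at 50 samples each. The voice list, the speed values, and the `label.index.wav` file naming are illustrative assumptions on my part, not the settings used in the video (there I left the parameters on default).

```python
from collections import Counter
from itertools import cycle

# Labels and phrases from the walkthrough above.
KEYWORDS = {
    "stop": "停",
    "forward": "前进",
    "back": "撤销",
    "left": "左转",
    "right": "右转",
}
SAMPLES_PER_LABEL = 50

# Assumption: rotate over a few voices and speaking speeds for variety.
VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
SPEEDS = [0.9, 1.0, 1.1]


def build_plan(keywords, samples_per_label, voices, speeds):
    """Return one entry per sample, cycling through voice/speed pairs."""
    plan = []
    for label, phrase in keywords.items():
        variants = cycle([(v, s) for v in voices for s in speeds])
        for index in range(samples_per_label):
            voice, speed = next(variants)
            plan.append({
                "label": label,
                "text": phrase,
                "voice": voice,
                "speed": speed,
                # Hypothetical naming scheme: label first, then an index.
                "filename": f"{label}.{index:03d}.wav",
            })
    return plan


plan = build_plan(KEYWORDS, SAMPLES_PER_LABEL, VOICES, SPEEDS)
counts = Counter(item["label"] for item in plan)
print(len(plan), dict(counts))
```

Each entry in `plan` holds everything one text-to-speech call would need, so the whole dataset stays small enough to review by ear before training.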

We know we're going to run this model on a very constrained device, so let's carefully pick DSP parameters, then train the model.
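Why do DSP parameters matter so much on a small device? A rough back-of-the-envelope sketch: the number of features fed to the model grows with the window length and shrinks with the frame stride. The values below (1 second window, 16 kHz audio, 13 coefficients, 4 bytes per value) are illustrative assumptions, not the settings from the video, and the Studio's exact framing and padding may differ from this textbook formula.

```python
def frame_count(window_s, sample_rate_hz, frame_length_s, frame_stride_s):
    """Number of full frames that fit in one window (no padding)."""
    window = round(window_s * sample_rate_hz)
    frame = round(frame_length_s * sample_rate_hz)
    stride = round(frame_stride_s * sample_rate_hz)
    if window < frame:
        return 0
    return 1 + (window - frame) // stride


def feature_budget(window_s, sample_rate_hz, frame_length_s, frame_stride_s,
                   coefficients, bytes_per_value=4):
    """Return (frames, features, bytes) for one window of audio features."""
    frames = frame_count(window_s, sample_rate_hz,
                         frame_length_s, frame_stride_s)
    features = frames * coefficients
    return frames, features, features * bytes_per_value


# Assumed settings: compare a 20 ms stride against a 10 ms stride.
coarse = feature_budget(1.0, 16000, 0.02, 0.02, coefficients=13)
fine = feature_budget(1.0, 16000, 0.02, 0.01, coefficients=13)
print("20 ms stride:", coarse)
print("10 ms stride:", fine)
```

Halving the stride roughly doubles the feature buffer and the work the model has to do per inference, which is exactly the kind of trade-off to watch on a Cortex-M0+.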

For deployment, there is a precompiled firmware for the Arduino Nano RP2040 Connect, which I show in the video. And if you want to customize things and write your own application code, you can choose the Arduino library, download it, and upload the example sketch with the Arduino IDE.

You'll see in the video that the on-device results are solid. This is an extremely fast and easy way to get keyword spotting in any language onto hardware.

The possibilities for GenAI on edge devices are endless. Take a look at some other things we experimented with recently at Edge Impulse, for example distilling some knowledge from GPT-4o into a much smaller model and running it on a microcontroller!
