The Edge Impulse studio lets you capture data from real devices to build TinyML models easily. Set a label, press ‘Record’, and the raw sensor data is captured. However, some data is hard to capture. If you’re looking to detect a discrete event, like glass breaking, it’s hard to time the recording just right. And if you want to detect keywords, it’s much easier to say the keyword 60 times in a row than to make 60 separate recordings. To solve this, we have added the ability to crop and split data straight from the studio.
Cropping data
To crop data, head to Data acquisition, click ⋮, and select Crop sample. You can set a specific length, or use the drag handles to resize the window, then move the window around to set your selection. Made a wrong crop? No problem: click Crop sample again and move your selection around. To undo the crop, set the sample length to a high number and the whole sample is selected again.
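To illustrate why cropping can be undone, here is a minimal sketch that treats a crop as a window over the raw data rather than an edit of it. The names `CropWindow` and `apply_crop` are hypothetical, and the clamping behavior (a window dragged past the end stays inside the sample, an oversized length selects everything) is an assumption modeled on the description above, not the studio's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class CropWindow:
    """Hypothetical crop selection, stored separately from the raw data."""
    start: int   # first sample index of the selection
    length: int  # requested window length, in samples


def apply_crop(signal, window):
    """Return the selected slice of `signal` without modifying `signal`.

    The window is clamped to the sample bounds, so asking for a length
    longer than the sample selects the whole sample again.
    """
    n = len(signal)
    length = min(window.length, n)
    # Keep the window inside the sample if it is positioned past the end.
    start = max(0, min(window.start, n - length))
    return signal[start:start + length]


raw = list(range(10))
print(apply_crop(raw, CropWindow(start=3, length=4)))       # a 4-sample crop
print(apply_crop(raw, CropWindow(start=0, length=10_000)))  # "undo": whole sample
```

Because only the window is stored, a new crop can be applied to the same raw data at any time.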
Splitting data
Besides cropping, you can now also split data automatically. Perform one motion repeatedly, or say a keyword over and over again, and the individual events are detected and can be stored as separate samples. This makes it easy to quickly build a high-quality dataset of discrete events. To do so, head to Data acquisition, record some new data, click ⋮, and select Split sample. You can set the window length, and all events are detected automatically. If you’re splitting audio data, you can also listen to an event by clicking on its window; the audio player is automatically populated with that specific split.
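The post doesn’t describe how events are detected, so the sketch below swaps in one common, simple technique: frame-level RMS energy thresholding. Frames louder than a threshold are grouped into events, and a fixed-length window is centered on each one. The function names, `frame_len`, and `threshold` are all hypothetical parameters chosen for illustration, not Edge Impulse settings.

```python
import math


def detect_events(signal, frame_len, threshold):
    """Return (start, end) index pairs of loud regions in `signal`.

    A frame counts as active when its RMS exceeds `threshold`;
    a run of consecutive active frames forms one event.
    """
    events = []
    current_start = None
    for i in range(0, len(signal), frame_len):
        frame = signal[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms > threshold and current_start is None:
            current_start = i  # an event begins
        elif rms <= threshold and current_start is not None:
            events.append((current_start, i))  # the event ends
            current_start = None
    if current_start is not None:
        events.append((current_start, len(signal)))  # event runs to the end
    return events


def split_windows(signal, window_len, frame_len, threshold):
    """Center a fixed-length window on each detected event."""
    n = len(signal)
    window_len = min(window_len, n)
    windows = []
    for start, end in detect_events(signal, frame_len, threshold):
        center = (start + end) // 2
        # Clamp so every window lies fully inside the recording.
        w_start = max(0, min(center - window_len // 2, n - window_len))
        windows.append((w_start, w_start + window_len))
    return windows


# Two bursts of "activity" separated by silence.
recording = [0] * 20 + [1] * 10 + [0] * 30 + [1] * 10 + [0] * 30
print(split_windows(recording, window_len=20, frame_len=5, threshold=0.5))
```

Real detectors are usually more robust (adaptive thresholds, minimum gaps between events), but the overall shape is the same: find events, then cut one window per event.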
Samples are automatically centered in the window. This can cause problems for some models, because the neural network could learn a shortcut where data in the middle of the window is always associated with a certain label. To avoid this, select "Shift samples" to automatically move the data around a little within each window.
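One way to picture what "Shift samples" does is the following sketch, which offsets each window by a random amount so the event no longer sits exactly in the center. The function name `shift_windows`, the `max_shift` bound, and the uniform random offset are assumptions for illustration; the post does not say how far or how the studio moves the data.

```python
import random


def shift_windows(windows, n, max_shift, seed=None):
    """Move each (start, end) window by a random offset in [-max_shift, max_shift].

    Windows keep their length and are clamped to stay inside a recording
    of `n` samples, so events are no longer always in the exact center.
    """
    rng = random.Random(seed)  # seeded for reproducible datasets
    shifted = []
    for start, end in windows:
        length = end - start
        offset = rng.randint(-max_shift, max_shift)
        new_start = max(0, min(start + offset, n - length))
        shifted.append((new_start, new_start + length))
    return shifted


print(shift_windows([(15, 35), (55, 75)], n=100, max_shift=5, seed=0))
```

Keeping the shift small keeps the whole event inside the window, while still breaking the "always in the middle" pattern a network could latch onto.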
Like cropping, splitting data is non-destructive. If you’re not happy with a split, click Crop sample and move the selection around.
Now available
Cropping and splitting data is now available for all projects that contain time-series data. If you run into any issues, please let us know on the forums! In the near future we’ll also add more data augmentation features for time-series data, like automatically adding noise to audio data and automatically clipping samples, to make models perform better in the real world, so stay tuned.
Jan Jongboom is the co-founder and CTO of Edge Impulse. He now finally has a device that wakes up when it hears his name.