Building a high-quality dataset is the first step to a successful machine learning project. To make this easier, the Edge Impulse Studio already contains tools to collect data from real devices, upload existing datasets, crop and split data, and determine the quality of your dataset. In addition, we’ve now added options to filter and query your data, and to run batch operations, such as changing labels, on your dataset.
Filtering data
The new filtering feature lets you quickly find samples within your dataset, so you can isolate samples with particular labels or other properties. You can filter by:
Label - Find all samples with matching labels, allowing you to inspect samples by label or remove a label from the dataset
Name - Find all samples with a matching filename
Signature validity - Samples can be signed to ensure their authenticity and to trace exactly where the data originated. Filtering by signature validity allows you to find and remove samples with invalid signatures.
Sample length - Samples can be filtered by their length, so you can find and remove all samples that have an incorrect length for your model, or find longer samples and use the cropping and splitting tools to generate new, shorter samples.
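To make the four criteria above concrete, here is a minimal Python sketch of the same filtering logic applied to a local list of sample records. It is not the Studio's implementation or API: the `Sample` class, its field names, and the `filter_samples` function are all hypothetical and only illustrate how the filters combine (every filter that is set must match).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """Hypothetical local record for one sample (not an Edge Impulse type)."""
    name: str
    label: str
    length_ms: int
    signature_valid: bool


def filter_samples(
    samples: List[Sample],
    label: Optional[str] = None,
    name_contains: Optional[str] = None,
    signature_valid: Optional[bool] = None,
    min_length_ms: Optional[int] = None,
    max_length_ms: Optional[int] = None,
) -> List[Sample]:
    """Return the samples that match every filter that is not None."""
    result = []
    for s in samples:
        if label is not None and s.label != label:
            continue
        if name_contains is not None and name_contains not in s.name:
            continue
        if signature_valid is not None and s.signature_valid != signature_valid:
            continue
        if min_length_ms is not None and s.length_ms < min_length_ms:
            continue
        if max_length_ms is not None and s.length_ms > max_length_ms:
            continue
        result.append(s)
    return result


# Made-up example data, for illustration only.
samples = [
    Sample("idle.01", "idle", 10000, True),
    Sample("wave.01", "wave", 10000, True),
    Sample("wave.02", "wave", 2500, False),
    Sample("snake.01", "snake", 30000, True),
]

wave_samples = filter_samples(samples, label="wave")
invalid_signatures = filter_samples(samples, signature_valid=False)
long_samples = filter_samples(samples, min_length_ms=20000)
```

In this sketch, `long_samples` would be the candidates for cropping or splitting into shorter samples, and `invalid_signatures` the candidates for removal.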
To filter data on the Data acquisition page, click the funnel icon above the list of samples and choose one or more filters. Your selected filters are appended to the URL, so you can easily share or store a filter. The filter icon shows a small orange indicator circle when filters are applied, and you can click 'Clear filters' to remove all filters and see every sample.
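Storing filters in the URL is a general pattern: the active filters are serialized into a query string, and anyone opening the link gets the same view. The Python sketch below shows the idea using only the standard library. The base URL and the parameter names (`label`, `max_length_ms`, `name`) are assumptions made for illustration and are not the query parameters the Studio actually uses.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_filter_url(base_url: str, filters: dict) -> str:
    """Append the filters that are set (not None) to base_url as a query string."""
    active = {key: value for key, value in filters.items() if value is not None}
    if not active:
        return base_url
    return f"{base_url}?{urlencode(active)}"


def read_filters(url: str) -> dict:
    """Recover the filters from a shared URL (all values come back as strings)."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


# Hypothetical base URL and parameter names, for illustration only.
shared_url = build_filter_url(
    "https://studio.example.com/acquisition/training",
    {"label": "wave", "max_length_ms": 5000, "name": None},
)
restored = read_filters(shared_url)
```

Because unset filters are dropped before encoding, a URL without a query string corresponds to the unfiltered view.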
Batch operations
You can now also apply batch operations to your data. Select multiple samples, then delete them, change their labels, or move them to the training or testing dataset with a single click. Selecting all samples lets you apply an operation to an entire dataset or, combined with a filter, to every sample with particular properties. For example, you could filter for all short samples and delete them, or filter by label and move the matching samples to the test dataset.
To modify multiple samples at once, click the checkbox icon in the top right corner to show the batch operation options. Then, click a checkbox next to a sample name to select it. The row of buttons above the table of samples allows you to delete, re-label, and move all selected samples.
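The filter-then-apply workflow described in this section can be sketched in a few lines of Python. This is an illustration on a local list, not the Studio's code or API: the `Sample` class, the `category` field, and the `select`, `batch_relabel`, `batch_move`, and `batch_delete` helpers are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """Hypothetical local record for one sample (not an Edge Impulse type)."""
    name: str
    label: str
    length_ms: int
    category: str  # "training" or "testing"


def select(samples: List[Sample], predicate: Callable[[Sample], bool]) -> List[Sample]:
    """Pick the samples a batch operation should act on."""
    return [s for s in samples if predicate(s)]


def batch_relabel(selected: List[Sample], new_label: str) -> None:
    """Change the label of every selected sample in place."""
    for s in selected:
        s.label = new_label


def batch_move(selected: List[Sample], category: str) -> None:
    """Move every selected sample to the training or testing set in place."""
    if category not in ("training", "testing"):
        raise ValueError(f"unknown category: {category}")
    for s in selected:
        s.category = category


def batch_delete(dataset: List[Sample], selected: List[Sample]) -> List[Sample]:
    """Return a new dataset without the selected samples."""
    to_remove = {id(s) for s in selected}
    return [s for s in dataset if id(s) not in to_remove]


# Made-up example data, for illustration only.
dataset = [
    Sample("idle.01", "idle", 10000, "training"),
    Sample("idle.02", "idle", 400, "training"),
    Sample("wave.01", "wave", 10000, "training"),
    Sample("wave.02", "wave", 10000, "training"),
]

# Filter all short samples and delete them.
dataset = batch_delete(dataset, select(dataset, lambda s: s.length_ms < 1000))

# Filter by label and move the matching samples to the test dataset.
batch_move(select(dataset, lambda s: s.label == "wave"), "testing")
```

Separating selection from the operation mirrors the interface: the same selection step feeds delete, re-label, or move.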
Want to try these new features out? Head to the Data acquisition page for your project in the Studio. And if you have any feedback or questions, please let us know on the forums.
Ross Lowe is a software developer at Edge Impulse.