By harnessing the power of LLMs inside Edge Impulse, you can now clean up massive object detection datasets quickly and with minimal effort.
Training an effective edge AI model requires high-quality data that is relevant to your use case. To train their first prototypes, engineers often reach for public datasets, but this can come with risks. These datasets are often large but contain flaws such as mislabeled data, labels that are not relevant to your use case, or unwanted data augmentation. Manually reviewing all images and labels in a large dataset is time-consuming and error-prone. The newest Transformation Block in Edge Impulse, however, lets you validate your data in minutes and create high-quality datasets using the power of multi-modal LLMs.
Check out this video that walks through the tool in a couple of scenarios.
How it works
Models such as OpenAI’s GPT-4o are getting better and better at interpreting images combined with text prompts. We can make use of this to ask the LLM validation questions about images in our dataset. This transformation block asks you for up to three validation prompts. These prompts represent statements that will result in a data sample being disabled. Some example validation prompts include:
- The bounding boxes and labels do not correspond to the objects in the image
- The image is not clear enough to determine the objects in the image
- There is text overlay visible on this image
- The image is incorrectly rotated
- The image has gaussian blur or salt and pepper noise applied
- The image is not taken by a camera (e.g., it is an illustration, cartoon, or otherwise)
These prompts are then passed to the LLM along with the bounding box label information for the current image. For example:
- Label: helmet, Location: (x: 431, y: 174), Size: (width: 44, height: 76)
- Label: helmet, Location: (x: 321, y: 172), Size: (width: 46, height: 71)
- Label: helmet, Location: (x: 269, y: 146), Size: (width: 50, height: 85)
This extra information allows the LLM to check whether the labels are correct (if one of your prompts asks it to reference the given labels).
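The annotation text shown above can be assembled from the bounding-box metadata with a small helper. This is a hypothetical sketch: the field names and dictionary shape are assumptions for illustration, not Edge Impulse's actual annotation schema.

```python
# Sketch: render bounding-box annotations into the text passed to the LLM
# alongside the image. Field names here are assumptions, not the block's
# actual schema.

def format_boxes(boxes):
    """Render each box as a 'Label/Location/Size' line, one per box."""
    lines = []
    for b in boxes:
        lines.append(
            f"Label: {b['label']}, "
            f"Location: (x: {b['x']}, y: {b['y']}), "
            f"Size: (width: {b['width']}, height: {b['height']})"
        )
    return "\n".join(lines)

boxes = [
    {"label": "helmet", "x": 431, "y": 174, "width": 44, "height": 76},
    {"label": "helmet", "x": 321, "y": 172, "width": 46, "height": 71},
    {"label": "helmet", "x": 269, "y": 146, "width": 50, "height": 85},
]
print(format_boxes(boxes))
```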
The LLM then responds in a structured JSON format that includes the following:
- Validation: valid or invalid — if invalid the sample is then disabled in your data acquisition view
- Reason: Text-based reasoning, which is stored in the metadata for the sample. For example:
- The bounding boxes and labels do not correspond to the objects in the image
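Consuming that response is a simple JSON parse. The exact key names below are assumptions based on the fields described above, not the block's actual schema.

```python
import json

# Sketch: parse the LLM's structured response and decide whether the
# sample stays enabled. The "validation"/"reason" keys are assumed.
raw = (
    '{"validation": "invalid", "reason": "The bounding boxes and labels '
    'do not correspond to the objects in the image"}'
)
result = json.loads(raw)

sample_enabled = result["validation"] == "valid"  # invalid -> sample disabled
metadata_reason = result["reason"]                # stored in sample metadata
print(sample_enabled, metadata_reason)
```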
The result of running this process over a large object detection dataset will be a dataset where any “unclean” data is disabled and not used for training. You can then use the dataset filtering tools in Edge Impulse to delete all disabled samples, or review disabled samples to see if any need re-enabling.
How to use the block
This feature requires an enterprise subscription to Edge Impulse.
- In an Enterprise project, upload the object detection dataset you wish to validate. This can be a public dataset from somewhere like Kaggle. We support a number of industry-standard dataset annotation formats.
- Choose Data acquisition->Data Sources->Add new data source.
- Select Transformation Block and the 'Validate Object Detection Datasets Using GPT-4o' block, fill in your prompts, and run the block. You can run it on a fixed number of samples or over your entire dataset.
You need an OpenAI API key to run the GPT-4o model.
- Any items which are invalid will be disabled and viewable in the data acquisition view. Reasoning will be provided in the metadata for each data item:
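Under the hood, each sample amounts to one vision request to GPT-4o. The following is a minimal sketch using the official openai Python SDK; the prompt wording, helper names, and requested JSON schema are assumptions for illustration, not the block's actual implementation.

```python
import base64
import os

# Sketch: assemble a GPT-4o chat request that asks the model to validate
# one image against the user's validation prompts. Prompt wording and the
# JSON schema requested are assumptions, not the block's real internals.

def build_messages(image_bytes, label_text, validation_prompts):
    """Build the messages payload: text prompt plus inline base64 image."""
    data_url = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode()
    prompt = (
        "If any of the following statements is true for this image, respond "
        'with JSON {"validation": "invalid", "reason": "..."}; otherwise '
        'respond with {"validation": "valid", "reason": ""}.\n'
        + "\n".join(f"- {p}" for p in validation_prompts)
        + f"\nAnnotations:\n{label_text}"
    )
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }]

messages = build_messages(
    b"\xff\xd8",  # placeholder for real JPEG bytes
    "Label: helmet, Location: (x: 431, y: 174), Size: (width: 44, height: 76)",
    ["The image is incorrectly rotated"],
)

# With an OpenAI API key set, the request would be sent like this:
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        response_format={"type": "json_object"},
    )
    print(resp.choices[0].message.content)
```

Requesting `json_object` output keeps the response machine-parseable, which is what lets the block disable samples and write the reasoning into metadata automatically.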
Check out the source code for this block on our GitHub if you want to explore how it works and try it out with an Enterprise Trial.
Here are two example projects to which dataset validation has been applied, along with their dataset sources:
Industrial PPE Detection:
- Project: https://studio.edgeimpulse.com/public/438000/latest/acquisition/training
- Dataset: https://www.kaggle.com/datasets/andrewmvd/hard-hat-detection/data
Truck Detection: