Using LLMs to Analyze and Label Satellite Imagery in Edge Impulse

Recently we launched new Edge Impulse functionality that shows just one of the powerful ways LLMs can be put to work at the edge: using GPT-4o to train a model that is 2,000,000x smaller and runs directly on-device.

Our initial demo showed how this tool could be used to label images for binary classification (presence/absence), but more complex labeling tasks are possible.

In this post and video we explore how visual data from a satellite can be labeled both for cloud type and for cloud coverage, quickly generating two useful edge models from the same dataset.

The challenge of AI for remote applications

Satellites are resource-constrained: power, compute, and storage must all be managed carefully, as expansion is rarely possible. Another issue is data transfer to Earth, which can be costly and bandwidth-limited. Current LLMs, with their billions of parameters, can only run in large data centers, so to use their power, raw data must be sent over the air for processing.

Edge AI is perfectly positioned to address these issues. By running a model on the device where data collection occurs, you can reduce data transmission to just the desired insight. For example, if you're only interested in clear images and a particular satellite image is obscured by clouds, it doesn't need to be sent down to Earth for inspection. These methods can also improve existing edge models: sending the raw data behind on-device classifications that return an "unsure" or "anomaly" result to an LLM for labeling is a good way to improve model performance over time.
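Here is a minimal sketch of that gating pattern in plain Python. The `send_telemetry` and `queue_for_downlink` functions and the confidence threshold are all hypothetical placeholders for whatever transport and tuning your platform provides:

```python
# Sketch of on-device gating: transmit the insight when confident,
# queue the raw frame for LLM relabeling when the model is unsure.
CONFIDENCE_THRESHOLD = 0.6  # assumed cut-off; tune for your application

def send_telemetry(payload: dict) -> None:
    # Hypothetical stand-in for downlinking a few bytes of insight
    print("telemetry:", payload)

def queue_for_downlink(frame: bytes, reason: str) -> None:
    # Hypothetical stand-in for queuing a full raw frame for transmission
    print(f"queued {len(frame)} bytes for LLM relabeling ({reason})")

def handle_frame(scores: dict, frame: bytes) -> None:
    label = max(scores, key=scores.get)
    confidence = scores[label]
    if label in ("unsure", "anomaly") or confidence < CONFIDENCE_THRESHOLD:
        # Uncertain result: the raw image is worth sending for relabeling
        queue_for_downlink(frame, reason=label)
    else:
        # Confident result: a few bytes of insight instead of a full image
        send_telemetry({"label": label, "confidence": confidence})
```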

Cloud type and coverage labeling examples

To show the power of LLM labeling tools, we created two models from one raw dataset by designing two different LLM prompts for labeling. In the above video, I accessed an existing Harvard cloud dataset. Using ChatGPT, I can ask questions about the images in the dataset, such as what type of cloud is shown or how much of the frame is covered by clouds. The latest LLMs excel at this type of image analysis, but running this procedure has high latency and incurs significant cloud costs, especially in a production environment with many source cameras and a high frame rate.
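This kind of labeling boils down to one vision call per image. For readers who want to reproduce it outside Edge Impulse, here is a rough sketch of the equivalent standalone call using the official openai Python package; the coverage prompt is taken from the walkthrough below, while the file paths and token limit are illustrative assumptions:

```python
import base64
from openai import OpenAI  # official openai package, v1 client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def label_cloud_coverage(image_path: str) -> str:
    # Encode the satellite frame so it can be passed inline to the model
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": (
                    "Look at this image from a satellite and respond with a "
                    "percentage of how much of the image is obscured by clouds. "
                    'Respond only with a percentage not including "%" or '
                    '"unsure" if the image is corrupted, or it is not an image '
                    "from a satellite."
                )},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        max_tokens=10,  # the answer is a single number or "unsure"
    )
    return response.choices[0].message.content.strip()
```

Inside Edge Impulse, the transformation block described in step 3 below handles this call and the dataset bookkeeping for you.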

Here's how we can create a smaller model to perform these same tasks, on-device:

Project walkthrough

  1. Data collection: I downloaded the open Harvard cloud satellite-image dataset (originally curated for cloud removal with spatiotemporal networks, a more complex task). Taking only the RGB images, I uploaded them without labels into an Edge Impulse project. This unlabeled dataset forms the basis for both the cloud type classification and cloud coverage models we want to create.
  2. Designing a prompt: I first experimented with labeling a few images in ChatGPT itself, noting the specific output options that worked best for each task.

    • Cloud Classification: My final prompt was “What cloud type is visible in this satellite picture? Respond only with "cirrus", "cirrocumulus", "cirrostratus", "altocumulus", "altostratus", "nimbostratus", "stratocumulus", "stratus", "cumulus", "cumulonimbus". If there is no cloud respond with "no cloud" or "unsure" if you're not sure.”

    • Cloud Coverage: My final prompt was “Look at this image from a satellite and respond with a percentage of how much of the image is obscured by clouds. Respond only with a percentage not including "%" or "unsure" if the image is corrupted, or it is not an image from a satellite.”

  3. Labeling with GPT-4o: Using the new "Label image data using GPT-4o" transformation block inside each Edge Impulse project, I asked GPT-4o to label the images with my prompts. The block discards any blurry or uncertain images, leaving a clean dataset, and it records natural-language reasoning for every labeling decision, which is a powerful tool for debugging model performance.

  4. Model training: For each model type, I designed an impulse with the desired output format. I used a Transfer Learning image classification block for the cloud classification task and a Regression block for the cloud coverage task:

    • Cloud Classification: I trained a small MobileNetV1 classification model on these images. The model is only 1.6 MB, significantly smaller than the original LLM.

    • Cloud Coverage: I trained a small convolutional neural network regression model on these images. The model is only 49.1 KB, significantly smaller than the original LLM.

  5. Deployment: These models can be deployed on MCU hardware from a Cortex-M4 upwards, for example an Arduino Nicla Vision. The classifier accurately detects cloud types on-device, in real time, without requiring cloud services.
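On the Nicla Vision itself the deployment artifact is an exported C++ library, but the shape of the inference loop is easy to prototype on a Linux machine with the Edge Impulse Linux Python SDK. A minimal sketch, assuming the classifier has been exported as an .eim model file (file names here are illustrative):

```python
import cv2  # pip install opencv-python
from edge_impulse_linux.image import ImageImpulseRunner  # pip install edge_impulse_linux

MODEL_PATH = "cloud-type-classifier.eim"  # assumed path to the exported model

with ImageImpulseRunner(MODEL_PATH) as runner:
    runner.init()  # loads the model and its input requirements
    # Load a frame and convert BGR (OpenCV's default) to the RGB the model expects
    frame = cv2.cvtColor(cv2.imread("satellite-frame.jpg"), cv2.COLOR_BGR2RGB)
    features, _cropped = runner.get_features_from_image(frame)
    result = runner.classify(features)
    scores = result["result"]["classification"]  # e.g. {"cumulus": 0.91, ...}
    label = max(scores, key=scores.get)
    print(label, round(scores[label], 2))  # transmit this insight, not the raw image
```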

Results

These two models were created in an incredibly short time. Data labeling is often the most time-consuming and error-prone part of training a model for the edge. By using LLMs you can create your first-pass model much more quickly, and having a natural-language reason attached to each label makes debugging the dataset even easier.

LLMs are still flawed and don't perform perfectly for every use case, but they are a great tool to help subject matter experts develop models faster. As generative AI continues to grow in popularity and capability, these techniques will only get better.

