When you need to keep tabs on a physical variable of some process, the first step is to find something that can be measured. These direct measurements will give a clear picture of the current state of the variable in question. Nice and easy… if there is something obvious to measure, that is. But many times it is not possible to obtain a direct measurement, either because it is impractical to do so, or because there is no sensor available that can do the job. In such cases, an indirect measurement may still be made by measuring proxy variables that correlate with the variable of interest.
Consider the case of an endangered species, for example. Researchers may want to understand how well that species is faring, but since there is no obvious, direct way to measure that, they may use the size of its habitat as a proxy. When using a proxy, relationships between variables can get complex very quickly, making it difficult to understand how they relate to one another. Machine learning enthusiast Christopher Mendez Martinez found himself thinking through these issues and decided to try using a neural network to understand these types of relationships, since such models are exceptionally good at picking patterns out of large datasets.
Towards this end, Martinez set out to uncover a seemingly very unlikely relationship between air quality and accelerometer data. Huh?! How is that possible, you ask? Well, the trick lies in the air purifier that Martinez uses — it automatically adjusts its operating speed based on how much cleaning of the air it needs to do. Since those speed changes alter the vibratory patterns of the purifier, that should provide a measurement that correlates with air quality. And if there is a signal in the data, it follows that a neural network should be able to learn to recognize it.
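To make that intuition concrete, here is a minimal sketch of how a change in fan speed could show up in accelerometer data. It is not taken from Martinez's project: the 100 Hz sampling rate, the three fan speeds, and the function names are all assumptions chosen for illustration, and the vibration is simulated as a clean sine wave rather than measured.

```python
import math

SAMPLE_RATE_HZ = 100  # assumed accelerometer sampling rate (not from the write-up)


def simulate_vibration(fan_hz, n_samples=200, amplitude=1.0):
    """Return a synthetic single-axis vibration trace for a fan spinning at fan_hz."""
    return [
        amplitude * math.sin(2 * math.pi * fan_hz * i / SAMPLE_RATE_HZ)
        for i in range(n_samples)
    ]


def dominant_frequency(samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Find the strongest non-DC frequency with a naive discrete Fourier transform."""
    n = len(samples)
    best_bin, best_power = 0, 0.0
    for k in range(1, n // 2):  # skip bin 0, the constant offset (e.g. gravity)
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        power = re * re + im * im
        if power > best_power:
            best_bin, best_power = k, power
    # Convert the winning bin index back to a frequency in Hz.
    return best_bin * sample_rate_hz / n


# Hypothetical fan speeds (rotations per second) for three purifier modes.
for mode, fan_hz in [("low", 10.0), ("medium", 20.0), ("high", 30.0)]:
    print(mode, dominant_frequency(simulate_vibration(fan_hz)))
```

Each simulated mode produces a different spectral peak, which is the kind of separation a classifier can latch onto. Real purifier vibrations will be far messier, which is where a learned model earns its keep.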
So as you can see, Martinez’s plan stands on solid ground, and he has not fallen off the edge of sanity — but he did want to run the algorithm on the edge, so he tracked down an Adafruit HUZZAH32 development board with an ESP32 microcontroller and 520 KB of SRAM to power the air quality detector. A three-axis accelerometer was also wired in to measure vibrations, which completed the build. While the HUZZAH32 has some pretty substantial horsepower for a microcontroller, it is still very resource-constrained by machine learning standards, so Edge Impulse Studio was leveraged to build an algorithm that is optimized for edge computing devices.
Before building that model, a training dataset needed to be collected, so Martinez placed the assembled device on the air purifier and collected data under four conditions — clean air, slightly polluted, highly polluted, and with the purifier turned off. The HUZZAH32 was linked with the Edge Impulse project, so as data was collected, it was automatically uploaded and made available in the data acquisition tab. A very modest dataset, containing only 90 seconds of samples for each condition, was collected.
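To get a feel for how small this dataset is, the sketch below counts how many training windows 90 seconds of data per condition might yield. Only the 90 seconds and the four conditions come from the project; the 2 second window and 1 second stride are assumed values, and the label names are hypothetical. It also assumes each condition was recorded as one continuous sample, which may not match how the data was actually captured.

```python
CONDITIONS = ["clean", "slightly_polluted", "highly_polluted", "off"]  # hypothetical label names
SECONDS_PER_CONDITION = 90  # stated in the article
WINDOW_S = 2.0              # assumed window length, not from the write-up
STRIDE_S = 1.0              # assumed stride between window starts, not from the write-up


def count_windows(total_s, window_s, stride_s):
    """Number of full sliding windows that fit in a recording of total_s seconds."""
    if total_s < window_s:
        return 0
    return int((total_s - window_s) // stride_s) + 1


def window_starts(total_s, window_s, stride_s):
    """Start time, in seconds, of each full window."""
    return [i * stride_s for i in range(count_windows(total_s, window_s, stride_s))]


per_condition = count_windows(SECONDS_PER_CONDITION, WINDOW_S, STRIDE_S)
total = per_condition * len(CONDITIONS)
print(f"{per_condition} windows per condition, {total} in total")
```

Under these assumptions, each condition contributes 89 overlapping windows, or 356 across all four classes, before any of them are set aside for testing.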
At that point, the impulse could be created to analyze the accelerometer data. A preprocessing step that performs a spectral analysis to extract the most important features in the data was added, followed by a neural network classifier. The network was then trained on the previously uploaded data, after which the accuracy was checked with a confusion matrix. Classification accuracy was found to be sitting at 100%, which is very good, and maybe even a bit too good, Martinez thought. He suspected that the model might have overfit to the training data, so he tried out the live classification tool as another way to validate the model. This allowed the model to be tested on fresh, real-world data from the device, and it showed that the network was performing exactly as expected. With his fears of possible overfitting eased, he moved on to complete the air quality detector build.
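For readers who have not worked with a confusion matrix before, the following sketch shows how one is tallied and how overall accuracy falls out of it. Edge Impulse Studio computes this for you; the code is only an illustration, and the label names and the example predictions are made up rather than taken from Martinez's results.

```python
from collections import Counter

LABELS = ["clean", "slightly_polluted", "highly_polluted", "off"]  # hypothetical label names


def confusion_matrix(actual, predicted, labels=LABELS):
    """Rows are the true labels, columns are the predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. predicted correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Made-up predictions for six windows, with one deliberate mistake.
actual = ["clean", "clean", "slightly_polluted", "highly_polluted", "off", "off"]
predicted = ["clean", "slightly_polluted", "slightly_polluted", "highly_polluted", "off", "off"]

matrix = confusion_matrix(actual, predicted)
for label, row in zip(LABELS, matrix):
    print(f"{label:>18}", row)
print(f"accuracy: {accuracy(matrix):.2f}")
```

A perfect score puts every count on the diagonal. When that score comes from data the model has already seen, it says little about overfitting, which is why checking against fresh samples from the device is a sensible follow-up.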
The deployment tool was used to create a downloadable Arduino library, optimized by the EON compiler, for deployment to the HUZZAH32 development board. With a few clicks this library was imported into the Arduino IDE and flashed to the hardware. Because the algorithm runs directly on the hardware, the device does not rely on a wireless network connection or cloud computing resources to operate.
In the future, Martinez would like to explore running the algorithm on alternate hardware platforms (spoiler: it is as simple as selecting a different hardware target in Edge Impulse’s deployment tool) and also transmitting the classification results to an external server, where they can be viewed in a dashboard application.
Before you run off to turn unlikely data sources into valuable insights for your office or workshop, make sure to check out Martinez’s project write-up for some useful tips and tricks.