Building an Advanced Flood Monitoring System with Edge Impulse

Flood prediction is becoming increasingly critical as climate change continues to reshape our environment. Leveraging machine learning and distributed sensors can offer a proactive solution to this challenge. Deploying models like this at the edge is beneficial because it allows immediate action at the point of data collection, regardless of wider network connectivity, which can be affected by extreme weather. In this post, we’ll explore how to build an advanced flood monitoring system using Edge Impulse, featuring time-series forecasting to predict river levels up to 24 hours into the future.

This project combines distributed sensors, centralized data aggregation and preparation, and edge AI model training, all powered by the Edge Impulse platform. While the focus is on flood monitoring, the architecture and techniques can be adapted to other use cases, such as predictive maintenance in industrial settings, using distributed sensors across a factory floor.

Video Walkthrough

Watch this video for a step-by-step guide on setting up and deploying the flood monitoring system:

The Project Overview

The system uses publicly available data from the UK government’s river level and rainfall sensors, aggregated via an API. These sensors are distributed across the region, providing real-time environmental data. For this project, we focus on predicting water levels at a specific monitoring station on the River Foss in Yorkshire.
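As a rough illustration of reading one station from that API, here is a minimal sketch using only the Python standard library. The URL path and the `_sorted` and `_limit` query parameters follow our reading of the Environment Agency API documentation and should be checked against it. The station reference is a placeholder you must supply, and the helper names are hypothetical rather than taken from the project code.

```python
import json
from urllib.request import urlopen

# Root of the Environment Agency real-time flood monitoring API.
API_ROOT = "https://environment.data.gov.uk/flood-monitoring"


def readings_url(station_ref, limit=10):
    """Build a URL for the most recent readings at one station (assumed path)."""
    return f"{API_ROOT}/id/stations/{station_ref}/readings?_sorted&_limit={limit}"


def parse_readings(payload):
    """Extract (timestamp, value) pairs from a decoded JSON response."""
    pairs = []
    for item in payload.get("items", []):
        value = item.get("value")
        if isinstance(value, (int, float)):  # skip missing or malformed values
            pairs.append((item["dateTime"], float(value)))
    return sorted(pairs)  # ISO 8601 strings sort chronologically


def fetch_latest(station_ref, limit=10):
    """Download and parse recent readings; needs network access."""
    with urlopen(readings_url(station_ref, limit), timeout=30) as resp:
        return parse_readings(json.load(resp))
```

Keeping the parsing separate from the network call makes it easy to test against a saved response before pointing it at the live service.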

Here’s a step-by-step breakdown of the workflow:

Data Collection: Using the Environment Agency Real Time flood monitoring API, we aggregated data from multiple river and rainfall sensors. This raw data was processed and narrowed down to the most relevant inputs, such as nearby river levels, rainfall, and seasonal context (e.g., winter flooding is more likely due to saturated soil).
Pre-Processing: The aggregated data was formatted into time-series chunks, enabling predictions for 1, 2, 3, or even 48 hours into the future. Additional features like the time of year and shifted time labels were added to enhance the model’s accuracy.
Model Training with Edge Impulse: Leveraging Edge Impulse’s tools, we trained time-series regression models for each prediction window (e.g., 1 hour, 8 hours, etc.). Neural networks were used, but classical ML methods like XGBoost or linear regression could also be applied.

Deployment and Visualization: The trained models were deployed to a Linux-based edge device running Docker. The edge device hosts a web server that displays a dashboard of current river levels, rainfall data, and predictions:

On-device prediction graph with full results

The two blue lines on this graph show the up-to-date readings for the target river level and recent rainfall. The other colors show the results of the models for the different time horizons, and the grey line is a mean of all the model inputs. As you can see, the models are fairly accurate up to 8 hours into the future, but the 24-hour model is not accurate at all. This is likely because rainfall is the key driver of this river level rise, but its effects usually last less than 24 hours.

The web server also hosts a simpler graph showing the average of the model outputs, as well as an endpoint for downloading the graph data in comma-separated (CSV) form.

On-device prediction graph with averaged results
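The averaged graph and its CSV download could be produced along these lines. This is only a sketch: it assumes each model's output is stored as a mapping from target timestamp to predicted level, and that the average at a given timestamp is taken over whichever models have a prediction for it. The column names and the function name are hypothetical.

```python
import csv
import io
from statistics import mean


def averaged_csv(predictions):
    """Return CSV text with the mean prediction per target timestamp.

    predictions maps a model name to {target timestamp: predicted level}.
    """
    times = sorted({t for series in predictions.values() for t in series})
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", "mean_prediction", "models"])
    for t in times:
        # Only models that predicted this timestamp contribute to the mean.
        values = [series[t] for series in predictions.values() if t in series]
        writer.writerow([t, round(mean(values), 3), len(values)])
    return buffer.getvalue()
```

A web framework handler would then return this string with a `text/csv` content type.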

Key Challenges and Solutions

1. Data Aggregation and Processing

The raw data from the API contained readings from all monitoring stations across the country. By filtering and processing this data, we created a streamlined dataset focused on relevant inputs for the target monitoring station.

Using Edge Impulse’s transformation blocks, we automated the process of downloading and structuring the data. For example, rainfall data from nearby sensors was synchronized with river levels to improve predictions.
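To make the filtering and synchronization step concrete, the sketch below keeps only the selected measures from a national set of readings and pivots them onto a shared 15-minute grid. The row keys (`dateTime`, `measure`, `value`), the timestamp format, and the 15-minute interval are assumptions about the raw data, and the function names are hypothetical. It is not the project's actual transformation block.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def floor_to_interval(ts, minutes=15):
    """Round a datetime down to the start of its interval (minutes must divide 60)."""
    return ts - timedelta(minutes=ts.minute % minutes,
                          seconds=ts.second,
                          microseconds=ts.microsecond)


def align_readings(rows, wanted, minutes=15):
    """Keep only wanted measures and pivot them onto a shared time grid.

    rows: iterable of dicts with "dateTime", "measure" and "value" keys.
    wanted: maps a measure id to the column name used in the output.
    """
    grid = defaultdict(dict)
    for row in rows:
        column = wanted.get(row["measure"])
        if column is None:
            continue  # reading from a station we do not use
        ts = datetime.strptime(row["dateTime"], "%Y-%m-%dT%H:%M:%SZ")
        grid[floor_to_interval(ts, minutes)][column] = float(row["value"])
    # Keep only timestamps where every selected input has a reading.
    n_columns = len(set(wanted.values()))
    return [
        {"time": t, **cols}
        for t, cols in sorted(grid.items())
        if len(cols) == n_columns
    ]
```

Dropping incomplete timestamps is the simplest policy. Interpolating short gaps would keep more data, at the cost of extra assumptions.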

2. Seasonal Variation

Flood patterns vary significantly between seasons. By incorporating seasonal labels (e.g., winter vs. summer), we accounted for factors like soil saturation and precipitation patterns. This adjustment significantly improved model accuracy.
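One way to express seasonal context as model inputs is sketched below. The October to March definition of winter is an assumption made for illustration. The sine and cosine encoding of the day of year is a common alternative to a hard label, not necessarily what the project uses.

```python
import math
from datetime import datetime


def season_label(ts):
    """Coarse winter/summer label; the October-March split is an assumption."""
    return "winter" if ts.month in (10, 11, 12, 1, 2, 3) else "summer"


def time_of_year_features(ts):
    """Encode the day of year on a circle so 31 December sits next to 1 January."""
    angle = 2 * math.pi * (ts.timetuple().tm_yday - 1) / 365.25
    return {
        "season": season_label(ts),
        "doy_sin": math.sin(angle),
        "doy_cos": math.cos(angle),
    }
```

The circular encoding avoids the artificial jump a raw day-of-year number would show at the turn of the year.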

3. Model Scalability

The system supports multiple prediction models for different time horizons (e.g., 1-hour vs. 48-hour forecasts). Each model is fine-tuned for its respective window to balance accuracy and long-term forecasting capability. The shorter time horizons yielded higher accuracy, and there is a horizon beyond which the model is no longer useful (greater than 24 hours in our testing). This highlights the importance of understanding what influences the measurement you’re trying to predict. In this case, short-term rises in river level often follow rainfall within a few hours, so they cannot be predicted much further ahead than that.
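The per-horizon datasets can be pictured as the same series paired with labels shifted by different amounts. The sketch below assumes evenly spaced 15-minute readings (four steps per hour). The function names and the window length are hypothetical.

```python
STEPS_PER_HOUR = 4  # assumes one reading every 15 minutes


def make_samples(levels, window, horizon):
    """Build (input window, future label) pairs from an evenly spaced series.

    window: number of past readings fed to the model.
    horizon: how many steps after the window's last reading the label lies.
    """
    samples = []
    for end in range(window, len(levels) - horizon + 1):
        inputs = levels[end - window:end]
        label = levels[end - 1 + horizon]
        samples.append((inputs, label))
    return samples


def datasets_by_horizon(levels, window, horizons_hours):
    """One training set per prediction horizon, keyed by horizon in hours."""
    return {
        hours: make_samples(levels, window, hours * STEPS_PER_HOUR)
        for hours in horizons_hours
    }
```

Longer horizons leave fewer usable samples from the same history, because the last `horizon` readings have no label yet.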

System Architecture

The solution leverages a modular and scalable architecture:

Data Source: Public API for river and rainfall levels. This could be replaced with a set of distributed sensors connected to a central server or a data historian. A transformation block pulls raw data from this API into an S3 bucket ready for processing. It is automated to run weekly as new data becomes available.
Data Processing: Transformation blocks process raw data into time-series datasets ready for training.
Edge Impulse: Train ML models for time-series regression with Edge Impulse’s cloud tools.
Edge Device: Models are deployed to a containerized edge server using the Linux EIM executable, enabling real-time predictions and hosting a local dashboard.

This architecture could be adapted to other industrial or environmental monitoring scenarios, such as factory sensor networks or agricultural forecasting. Here is a high-level architecture diagram explaining the proposed solution:

Building the Data Pipeline

One of the standout features of Edge Impulse is its ability to create robust data pipelines. Here’s how the pipeline was designed for this project:

Scraping Data: A transformation block downloads the latest sensor data weekly. If data files already exist, they’re skipped to avoid redundancy. The raw data files store readings from all the sensors nationally.
Processing Data: A custom script refines the raw data, selecting key monitoring stations and transforming readings into usable time-series formats. This step includes generating time-shifted labels for predictions. We can make use of customisable UI elements when running the transformation block to choose different parameters for the processed data.
Training Data Preparation: Data is imported into multiple Edge Impulse projects, each corresponding to a different prediction horizon (e.g., 1-hour, 8-hour, etc.).
Model Deployment: Edge Impulse’s API automates model deployment to the edge device. Updates to the model are fetched and applied programmatically, ensuring the system stays up to date.
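The skip-if-present behaviour of the scraping step can be sketched as follows. The `readings-YYYY-MM-DD.csv` file naming and the `fetch` callable are assumptions made for illustration. In the real pipeline the download would go through the API and the files would live in the S3 bucket rather than a local directory.

```python
from datetime import date, timedelta
from pathlib import Path


def missing_days(data_dir, start, end):
    """Yield (day, target path) for each day in [start, end] with no local file."""
    day = start
    while day <= end:
        target = Path(data_dir) / f"readings-{day.isoformat()}.csv"
        if not target.exists():
            yield day, target
        day += timedelta(days=1)


def sync_archive(data_dir, start, end, fetch):
    """Download only the missing daily files; fetch(day) must return bytes."""
    Path(data_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for day, target in missing_days(data_dir, start, end):
        target.write_bytes(fetch(day))
        written.append(target.name)
    return written
```

Because existing files are never fetched again, a weekly run only costs as many requests as there are new days.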

Model Training and Optimization

For this project, a regression model was trained using the following steps:

Feature Generation: Using Edge Impulse’s raw data processing block, we generated features based on historical readings and shifted time labels.
Model Selection: Neural networks were chosen for their ability to capture complex relationships. The EON Tuner was used to find the optimal parameters such as window size and network architecture. Classical ML architectures like XGBoost or LightGBM could also be effective for this problem.
Evaluation: The trained models were evaluated for each time horizon, balancing accuracy and computational efficiency.
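When comparing horizons, it can help to measure each model against a naive baseline. The sketch below uses root mean squared error and a persistence forecast (the level stays where it is). Both are our choice for illustration, since the post does not state which metric was used.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and the same length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def persistence_rmse(levels, horizon):
    """Error of predicting that the level stays unchanged for `horizon` steps."""
    return rmse(levels[:-horizon], levels[horizon:])


def skill(model_rmse, baseline_rmse):
    """Fraction of baseline error removed by the model; <= 0 means no benefit."""
    return 1.0 - model_rmse / baseline_rmse
```

A model whose skill falls to zero or below at some horizon is no better than assuming nothing changes, which is one practical way to decide where forecasts stop being useful.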

Conclusion and Applications

This advanced flood monitoring system demonstrates how Edge Impulse can enable rapid development of production Edge AI solutions. By integrating distributed sensors, scalable data pipelines, and edge deployment, this project highlights the potential for predictive systems in a wide range of applications.

Whether it’s monitoring river levels, predicting factory equipment failures, or forecasting agricultural yields, the techniques showcased here offer a foundation for tackling complex challenges with machine learning and IoT.

Ready to build your own monitoring system? Check out the project’s GitHub repository for source code, resources, and transformation blocks.