CASE STUDY

Edge Impulse helps the Oura Ring: Go deeper on Deep Sleep

The sleep data we collect is precious. It’s also messy and hard to scrub. Edge Impulse makes it easy to bring data in, combine it, and refine it, so we can get as much value out of it as possible.

Xi Zhang, Head of Health Sensing at Oura

Can machine learning improve users’ sleep and health?


Better health starts with better sleep. To deliver better insights into sleep quality and user readiness, Oura ran a large-scale, international sleep study. Manually processing the raw clinical data to identify and extract relevant insights would have taken months, if not years. By using Edge Impulse’s MLOps infrastructure to source and analyze the data, Oura quickly made sense of the clinical results. They built an optimized dataset containing the specific heart-rate, motion, and temperature readings they needed to improve their algorithm. The Oura Ring now delivers unprecedented sleep-scoring accuracy, and Oura’s developers can easily explore more ways to help people thrive.

Results

Better sleep, better health

Good health often starts with a good night’s sleep. Well-rested people tend to be more alert and more energetic, and they are at lower risk of high blood pressure, obesity, depression, and other serious conditions. Knowing how well you’re sleeping is an important measure of overall health and a useful tool for maintaining everyday wellness. The gold standard for evaluating sleep is the polysomnography (PSG) test, a comprehensive test typically performed in a sleep lab. At the lab, you spend the night with various electrodes attached to your body.


A human scorer monitors your sensor readings for things like brain waves, eye movement, cardiac signals, and muscle movement, and then classifies your sleep stages.

Personal sleep evaluations

Recent advances have given rise to wearable devices that can track and evaluate sleep at home, without electrodes or specialists standing by. The Oura Ring is a sleek, stylish device, worn on the finger, that quantifies activity, sleep, and recovery.

It uses tiny sensors to track movement, body temperature, and heart rate, and it applies mathematical modeling of circadian rhythms to classify sleep stages. As a sleep monitor, it collects data and uses a machine learning (ML) application, supported by your smartphone and the cloud, to score your sleep session.

In a four-stage study conducted in a sleep lab, the human scorer classifies periods of wake, light sleep, deep sleep, and REM sleep (the stage most closely associated with dreaming). Scoring is still an inexact science, and even the most experienced human scorers interpret data differently. In best-in-class results, two human scorers scoring the same dataset typically reach a correlation of 82-83% across the four classes. Oura’s first sleep-analysis application, introduced with the first-generation Ring, was developed using a relatively small dataset and a hand-coded ML algorithm. Yet it performed remarkably well, reaching a correlation accuracy of 62% in four-stage sleep analysis.
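
The article does not define “correlation accuracy.” A common reading in sleep-staging work is epoch-by-epoch agreement. PSG is conventionally scored in 30-second epochs, and the score is the fraction of epochs on which two hypnograms assign the same stage. The sketch below assumes that reading. The function names and toy hypnograms are hypothetical and are not Oura’s code or data. The sketch also shows how four-stage labels collapse to the two-stage (sleep/wake) case.

```python
from typing import Sequence

# Assumption: "correlation accuracy" is read here as epoch-by-epoch percent
# agreement between two hypnograms (one stage label per 30-second epoch).


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Return the fraction of epochs on which two hypnograms assign the same stage."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("hypnograms must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def to_two_stage(hypnogram: Sequence[str]) -> list[str]:
    """Collapse four-stage labels to sleep/wake: light, deep, and REM all count as sleep."""
    return ["wake" if stage == "wake" else "sleep" for stage in hypnogram]


# Toy hypnograms covering eight epochs (four minutes), from two hypothetical scorers.
scorer_a = ["wake", "light", "light", "deep", "rem", "light", "wake", "deep"]
scorer_b = ["wake", "light", "deep", "deep", "rem", "rem", "wake", "deep"]

print(percent_agreement(scorer_a, scorer_b))                              # 0.75
print(percent_agreement(to_two_stage(scorer_a), to_two_stage(scorer_b)))  # 1.0
```

Two-stage agreement is usually higher than four-stage agreement. Once light, deep, and REM sleep are merged, confusions among those stages no longer count as errors.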

Getting better data

Encouraged by the success of their first effort, Oura began considering ways to improve their results. The initial algorithm had been designed around insights from a limited amount of sleep data, collected from local sources. Expanding the dataset with high-quality data from a larger pool of people would improve their trained models and make their results more reliable.

To create a more generalized dataset, representing a broader range of ages, ethnicities, and genders, Oura made an ambitious plan. They partnered with three sleep clinics on three different continents – Asia, Europe, and North America – and enlisted roughly 100 healthy men and women, between the ages of 15 and 73, to participate in a sleep study.

Each participant would undergo standard PSG testing, with electrodes and human scorers, and would also wear a research version of the Oura Ring equipped with enough flash memory to record a full night’s worth of raw data. A side-by-side comparison of time-synchronized PSG and Ring data would make it possible to correlate the human-scored PSG results with the data collected by the Ring and, as a result, create a well-labeled dataset for ML training.
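
To turn time-synchronized PSG and Ring recordings into a labeled training set, each Ring sample is attached to the human-scored stage of the PSG epoch it falls in. The article does not describe how Oura implemented this step. The sketch below is one plausible approach. It assumes the Ring timestamps are already synchronized to the PSG clock and expressed in seconds from the start of the scored recording. It also assumes the conventional 30-second PSG epochs. The function name and example values are hypothetical.

```python
from collections import defaultdict
from typing import Sequence

EPOCH_S = 30.0  # PSG stages are conventionally scored in 30-second epochs


def label_ring_samples(
    ring_times: Sequence[float],
    ring_values: Sequence[float],
    psg_labels: Sequence[str],
    epoch_s: float = EPOCH_S,
) -> list[tuple[int, list[float], str]]:
    """Group Ring samples by PSG epoch and attach the human-scored stage.

    Assumes ring_times are already synchronized to the PSG clock, in seconds
    from the start of the scored recording.
    """
    windows: dict[int, list[float]] = defaultdict(list)
    for t, value in zip(ring_times, ring_values):
        epoch = int(t // epoch_s)
        if 0 <= epoch < len(psg_labels):  # drop samples outside the scored period
            windows[epoch].append(value)
    # Epochs with no Ring samples are simply absent from the result.
    return [(epoch, windows[epoch], psg_labels[epoch]) for epoch in sorted(windows)]


# Hypothetical example: eight Ring samples, three scored PSG epochs (90 seconds).
times = [0.0, 10.0, 20.0, 30.0, 45.0, 59.0, 61.0, 95.0]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
labels = ["wake", "light", "deep"]

for epoch, samples, stage in label_ring_samples(times, values, labels):
    print(epoch, samples, stage)
```

The last sample (95 s) falls after the third scored epoch, so it is dropped instead of being given a label it was never scored with.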

In all, the study would include 440 nights of sleep from 106 individuals and would generate 3,444 hours of combined PSG and Ring data. In their first sleep study, with a limited number of local participants, Oura had collected, managed, and preprocessed their training data largely by hand. With that manual approach, it would have taken months, if not years, to handle the massive amount of highly complex data they were about to face.

Managing complexity

To handle the onslaught of data from the upcoming study, and to ensure they ended up with a high-quality dataset, Oura turned to Edge Impulse. Using the Edge Impulse development environment, Oura created an infrastructure for automated data sourcing and preprocessing. The new infrastructure gathers, scrubs, and prepares data with little to no manual intervention, while providing the security needed to protect sensitive information.

Data Collection

Oura’s research partners (that is, the clinicians conducting the sleep studies) use an Edge Impulse Upload Portal to securely submit PSG results, sleep scores, questionnaires, and other relevant data. Clinicians have no access to dataset content and can’t delete files. Uploaded data is stored in an Amazon Simple Storage Service (S3) bucket managed by Oura. The S3 bucket is scalable, stores all data types in their native formats, and uses native encryption, with access control, to maintain security. Uploaded data remains private and untouched until it is processed for use in training.

Data Management

Since the data in the S3 bucket comes from different sources in different formats, it needs to be consolidated before it can be processed for training. Previously, Oura manually copied and moved data, which was time-consuming and prone to errors. Using the Edge Impulse infrastructure, Oura can consolidate data coming from different sources of truth and automatically store it securely in the data lake. Consolidation is simpler, with fewer steps, and the resulting dataset is cleaner and ready to be used to train new models.
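
The article does not describe how the Edge Impulse infrastructure consolidates these sources. As a generic illustration only, the sketch below merges records from several hypothetical sources into one record per participant and night. When two sources disagree on a field, it raises an error instead of silently overwriting one value with the other. The source names, field names, and file names are all hypothetical, and this is not an Edge Impulse API.

```python
from collections import defaultdict
from typing import Any, Iterable

Record = dict[str, Any]
KEY_FIELDS = ("participant_id", "night")  # hypothetical join key


def consolidate(sources: dict[str, Iterable[Record]]) -> dict[tuple, Record]:
    """Merge per-source records into one record per (participant, night).

    Raises ValueError if two sources disagree on the same field, so conflicts
    are surfaced for review instead of being silently overwritten.
    """
    merged: dict[tuple, Record] = defaultdict(dict)
    origin: dict[tuple, str] = {}  # (key, field) -> name of the source that set it
    for source_name, records in sources.items():
        for record in records:
            key = tuple(record[f] for f in KEY_FIELDS)
            for field, value in record.items():
                if field in KEY_FIELDS:
                    continue
                if field in merged[key] and merged[key][field] != value:
                    raise ValueError(
                        f"{key}: {field!r} is {merged[key][field]!r} in "
                        f"{origin[(key, field)]} but {value!r} in {source_name}"
                    )
                merged[key][field] = value
                origin[(key, field)] = source_name
    return dict(merged)


# Hypothetical sources describing the same participant-night.
sources = {
    "clinic_upload": [{"participant_id": "p001", "night": 1, "psg_file": "p001_n1.edf"}],
    "questionnaire": [{"participant_id": "p001", "night": 1, "age": 34}],
    "ring_export": [{"participant_id": "p001", "night": 1, "ring_file": "p001_n1.bin"}],
}
print(consolidate(sources))
```

Failing loudly on conflicting values is a deliberate choice in this sketch. It mirrors the goal of a cleaner dataset by forcing inconsistencies to be resolved before training, not after.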

Pre-processing Pipeline

Incomplete, noisy, and inconsistent data are an unavoidable part of real-world datasets. The Edge Impulse Data Pipeline takes raw data from the organizational database, runs algorithms on the different files, and merges them into a single file that Oura can analyze. The dataset is scrubbed and prepped for ML training without anyone having to look for gaps in reporting, edit out extraneous information, or resolve inconsistencies. Blocks in the pipeline verify data integrity, bring data into alignment (by comparing the sensor signatures of both devices), and highlight items that researchers may want to evaluate more closely.
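
The article says the pipeline blocks verify integrity and align the two devices by comparing their sensor signatures, but it does not say how. Below is a minimal sketch of two such checks, under stated assumptions. Gaps are flagged when consecutive timestamps are more than 1.5 times the expected sampling interval apart; this tolerance is an arbitrary, hypothetical choice. Alignment is estimated by cross-correlating a motion signal recorded by both devices, which assumes both have already been resampled to a common rate. This is a generic technique, not Edge Impulse’s pipeline API, and all names are hypothetical.

```python
from typing import Sequence


def find_gaps(timestamps: Sequence[float], expected_interval: float,
              tolerance: float = 1.5) -> list[tuple[float, float]]:
    """Flag (previous, current) timestamp pairs that are further apart than expected."""
    limit = expected_interval * tolerance
    return [
        (prev, curr)
        for prev, curr in zip(timestamps, timestamps[1:])
        if curr - prev > limit
    ]


def _centered(signal: Sequence[float]) -> list[float]:
    """Subtract the mean so a constant offset does not dominate the correlation."""
    mean = sum(signal) / len(signal)
    return [x - mean for x in signal]


def estimate_lag(reference: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) at which `other` best matches `reference`.

    A positive lag means an event at reference[i] shows up at other[i + lag],
    i.e. `other` is delayed by `lag` samples.
    """
    if not 0 <= max_lag < min(len(reference), len(other)):
        raise ValueError("max_lag must be non-negative and shorter than both signals")
    ref, oth = _centered(reference), _centered(other)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Indices i where both reference[i] and other[i + lag] exist.
        start = max(0, -lag)
        stop = min(len(ref), len(oth) - lag)
        # Normalize by the overlap length so lags with fewer samples stay comparable.
        score = sum(ref[i] * oth[i + lag] for i in range(start, stop)) / (stop - start)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical motion traces: the same movement burst, seen 2 samples later by the Ring.
psg_motion = [0, 0, 0, 1, 5, 1, 0, 0, 0, 0]
ring_motion = [0, 0, 0, 0, 0, 1, 5, 1, 0, 0]
print(estimate_lag(psg_motion, ring_motion, max_lag=4))  # 2

ring_times = [0, 1, 2, 3, 7, 8, 9, 15]
print(find_gaps(ring_times, expected_interval=1.0))  # [(3, 7), (9, 15)]
```

Normalizing by overlap length makes short overlaps noisy, so `max_lag` should stay well below the signal length. The estimated lag, converted to seconds, would then be used to shift one recording so its samples line up with the PSG clock.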

Impressive Results

Using the Edge Impulse infrastructure to source and process data made it easier for Oura to obtain high-quality data for their ML training. Having sifted through all the available data in a matter of weeks, instead of months or years, Oura was able to narrow their focus and extract only three types of sensor data. The new dataset, covering heart rate, motion, and body temperature, produced higher correlation accuracy across the board. In the most challenging case, correlation accuracy increased by 17 points, from 62% to 79%. This is the best result that any wearable manufacturer has posted to date.

The Oura Ring now yields a best-in-class correlation accuracy of 79%.

Oura partitioned the dataset using fivefold cross-validation and evaluated the models using a standardized framework for assessing sleep-stage classification. These steps give a more reliable picture of how well the models generalize to new data. Two models were used for detection. The simple model used data collected by the Ring’s accelerometer, which measures body movement. The full model added autonomic nervous system (ANS)-mediated peripheral signals (involuntary body functions, such as respiration and heart rate) and circadian features for sleep-stage detection. Both models were used for two-stage detection (sleep/wake) and four-stage detection (wake, light, deep, and REM sleep).
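
The article does not say how the five folds were formed. This sketch assumes a choice common in sleep studies: splitting by participant, so that all of one person’s nights fall in the same fold. The model is then always tested on people it was not trained on. The sketch builds such participant-wise folds with the Python standard library. The participant IDs are hypothetical placeholders for the 106 study participants.

```python
import random
from typing import Sequence


def subject_wise_folds(participant_ids: Sequence[str], k: int = 5, seed: int = 0
                       ) -> list[tuple[list[str], list[str]]]:
    """Split participants into k folds and return (train, test) participant lists.

    Every night from a given participant lands on the same side of each split,
    so no person appears in both the training and the test set.
    """
    people = sorted(set(participant_ids))
    if len(people) < k:
        raise ValueError("need at least k distinct participants")
    random.Random(seed).shuffle(people)        # reproducible shuffle
    folds = [people[i::k] for i in range(k)]   # round-robin into k folds
    return [
        ([p for j, fold in enumerate(folds) if j != i for p in fold], folds[i])
        for i in range(k)
    ]


# Hypothetical IDs standing in for the 106 study participants.
ids = [f"p{n:03d}" for n in range(106)]
splits = subject_wise_folds(ids, k=5)
print([len(test) for _, test in splits])  # [22, 21, 21, 21, 21]
```

To turn the participant lists into night-level training and test sets, select each recorded night whose participant ID appears in the corresponding list.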

Suitable for medical research

With a correlation accuracy of 79%, the Oura Ring now comes close to the 82-83% reached by best-in-class human scorers evaluating PSG data. Researchers now have access to a cost-effective sleep monitor that delivers professional-level results at home, where people are often more comfortable and relaxed. As a result, they can consider working at a larger scale, with greater confidence that their data represents a normal night’s sleep.

Foundation for growth

Now that the Edge Impulse infrastructure is in place, Oura can begin new projects and acquire new types of data to support different areas of wellness. The revised application uses a subset of features. The complete dataset, with its full complement of identified features, remains accessible and usable, so Oura can add features to their algorithm whenever they want or experiment with new capabilities. The Edge Impulse infrastructure can also be expanded to take advantage of additional functionality. For example, the preprocessing pipeline can be configured to feed automatically into a project, so the project’s model is retrained automatically and then used to score all the data. Oura is already exploring this concept for activity detection and has more than a dozen Edge Impulse projects active in this area.
