Introduction
Sleep is important for our overall health and quality of life [1]. Inadequate sleep is often associated
with many negative outcomes, including obesity [2], irritability [2,3], cardiovascular dysfunction [4],
hypotension [5], impaired memory [6] and depression [7]. About one third of the general population in
the United States is affected by insufficient sleep [8]. The prevalence of inadequate sleep results in
large economic costs [9] and continues to increase in various nations [10,11]. Spontaneous sleep
arousals, defined as brief intrusions of wakefulness into sleep [12], are a common characteristic of
brain activity during sleep. Excessive arousals due to disturbances can be harmful, resulting in
fragmented sleep, daytime sleepiness and sleep disorders [13,14]. There are different types of arousing
stimuli [15], including obstructive sleep apneas or hypopneas, respiratory effort-related arousals
(RERA), hyperventilation, bruxism (teeth grinding), snoring, vocalizations, and leg movements. Together
with sleep stages (wakefulness, stage 1, stage 2, stage 3, and rapid eye movement), sleep arousals are
labeled through visual inspection of polysomnographic recordings according to the American Academy of
Sleep Medicine (AASM) scoring manual [16]. Of note, an 8-hour sleep record sampled at 200 Hz with 13
different physiological measurements contains a total of about 75 million data points. Manually
annotating such a large record takes hours.
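The record size quoted above follows directly from the duration, sampling rate, and number of measurements. A quick check of the arithmetic (variable names are illustrative only):

```python
# Back-of-the-envelope size of one polysomnographic record,
# using only the figures quoted in the text.
hours = 8
sampling_rate_hz = 200
n_channels = 13

samples_per_channel = hours * 3600 * sampling_rate_hz
total_points = samples_per_channel * n_channels
resolution_ms = 1000 / sampling_rate_hz  # spacing between consecutive samples

print(samples_per_channel)  # 5760000
print(total_points)         # 74880000, i.e. about 75 million
print(resolution_ms)        # 5.0
```

The 5 ms sample spacing is simply the reciprocal of the 200 Hz sampling rate.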
Considerable effort has gone into developing computational methods for automatic arousal detection
based on polysomnographic recordings [17–21]. These methods mainly focus on 30-second epochs and
extract statistical features in the time and frequency domains through the Fourier transform or in-house
feature engineering. These features and/or the raw signals are subsequently fed into machine learning
models to predict sleep arousals. However, because datasets and evaluation metrics differ widely across
previous studies, it remains unknown how to build an accurate and robust model that quickly delineates
all sleep arousal events within a sleep record at high resolution. In particular, how should the raw
data be preprocessed, or features extracted, before training models? Which types of machine learning
models are well suited? What is the optimal input length (e.g. 30-second epochs or full-length records)?
Which types of physiological signals should be used?
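For orientation, the epoch-based feature extraction described above can be sketched as follows. This is a generic illustration, not the pipeline of any cited study: the function name `epoch_band_powers` is hypothetical, and the frequency band edges are conventional EEG bands assumed here for the example.

```python
import numpy as np

FS = 200          # sampling rate in Hz (from the text)
EPOCH_SEC = 30    # epoch length in seconds (from the text)
# Conventional EEG bands in Hz; the exact edges are an assumption.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def epoch_band_powers(signal, fs=FS, epoch_sec=EPOCH_SEC):
    """Split a 1-D signal into non-overlapping epochs and return an
    (n_epochs, n_bands) array of mean spectral power per band."""
    n = fs * epoch_sec
    n_epochs = len(signal) // n              # drop any trailing partial epoch
    epochs = np.asarray(signal[: n_epochs * n], dtype=float).reshape(n_epochs, n)
    spectrum = np.abs(np.fft.rfft(epochs, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    features = np.empty((n_epochs, len(BANDS)))
    for j, (lo, hi) in enumerate(BANDS.values()):
        mask = (freqs >= lo) & (freqs < hi)
        features[:, j] = spectrum[:, mask].mean(axis=1)
    return features

# Usage: two minutes of a synthetic 10 Hz sine, i.e. four 30-second epochs.
t = np.arange(120 * FS) / FS
feats = epoch_band_powers(np.sin(2 * np.pi * 10 * t))
print(feats.shape)            # (4, 4)
print(feats.argmax(axis=1))   # index 2 (alpha) dominates in every epoch
```

A feature matrix of this kind yields one prediction per 30-second epoch, which is much coarser than the per-sample resolution discussed in this paper.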
Here we investigate these questions and describe a novel deep learning approach, DeepSleep, for
automatic detection of sleep arousals. This approach ranked first in the 2018 “You Snooze, You Win”
PhysioNet/Computing in Cardiology Challenge [22], in which state-of-the-art computational methods were
systematically evaluated for predicting non-apnea sleep arousals on a large held-out test dataset [23].
The workflow of DeepSleep is schematically illustrated in Fig. 1. We built a deep convolutional neural
network (CNN) to capture long-range and short-range interdependencies between time points across an
entire sleep record. Information at different resolutions and scales was integrated to improve
performance. Intriguingly, we found that similar EEG and EMG channels were interchangeable, which
was used as a special augmentation in our approach. DeepSleep is able to delineate the sleep arousal
profile of a sleep record at 5-millisecond resolution within 10 seconds.
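One way to picture a channel-swapping augmentation is sketched below. This is only an illustration under stated assumptions, not the DeepSleep implementation: the function name `swap_similar_channels`, the row layout of the record, and the choice of which rows form a "similar" group are all hypothetical.

```python
import numpy as np

def swap_similar_channels(record, groups, rng):
    """Return a copy of `record` (n_channels, n_samples) in which the rows
    inside each group are randomly permuted; all other rows stay in place."""
    augmented = record.copy()
    for group in groups:
        shuffled = rng.permutation(group)    # random reordering of the group
        augmented[group] = record[shuffled]
    return augmented

# Usage with a toy 13-channel record. The grouping is an assumption:
# rows 0-2 and rows 3-4 are treated as two sets of interchangeable channels.
rng = np.random.default_rng(0)
record = np.arange(13 * 4).reshape(13, 4)
augmented = swap_similar_channels(record, [[0, 1, 2], [3, 4]], rng)
print(augmented.shape)  # (13, 4)
```

Because only the order of rows within a group changes, the labels at each time point remain valid for the augmented record.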
Methods
PhysioNet Challenge Dataset
The dataset used in this study contains a total of 994 polysomnographic sleep records from different
individuals and their corresponding labels at each time point. Specifically, the arousal region is labeled by
This preprint research paper has not been peer reviewed. Electronic copy available at: https://ssrn.com/abstract=3445559