Sequence-to-point learning with neural networks for non-intrusive load
monitoring
Chaoyun Zhang¹, Mingjun Zhong², Zongzuo Wang¹, Nigel Goddard¹, and Charles Sutton¹
¹ School of Informatics, University of Edinburgh, United Kingdom
chaoyun.zhang@ed.ac.uk, {ngoddard,csutton}@inf.ed.ac.uk
² School of Computer Science, University of Lincoln, United Kingdom
mzhong@lincoln.ac.uk
Abstract
Energy disaggregation (a.k.a. non-intrusive load monitoring,
NILM), a single-channel blind source separation problem,
aims to decompose the mains, which records the whole-house
electricity consumption, into appliance-wise readings. This
problem is difficult because it is inherently unidentifiable.
Recent approaches have shown that the identifiability problem
can be reduced by introducing domain knowledge into
the model. Deep neural networks have been shown to be a
promising approach for these problems, but sliding windows
are necessary to handle the long sequences that arise in signal
processing problems, which raises the question of how to
combine predictions from different sliding windows. In this
paper, we propose sequence-to-point learning, where the input
is a window of the mains and the output is a single point of
the target appliance. We use convolutional neural networks as
the model. Interestingly, we systematically show that the
convolutional neural networks can inherently learn the signatures
of the target appliances, which are automatically added
into the model to reduce the identifiability problem. We apply
the proposed neural network approaches to real-world
household energy data and show that the methods achieve
state-of-the-art performance, improving two standard error
measures by 84% and 92%.
Energy disaggregation (Hart 1992) is a single-channel blind
source separation (BSS) problem that aims to decompose
the whole energy consumption of a dwelling into the energy
usage of individual appliances. The purpose is to help
households reduce their energy consumption by showing
them what is causing them to use energy; it has been
shown that disaggregated information can help householders
reduce energy consumption by as much as 5–15%
(Fischer 2008). However, current electricity meters can only
report whole-home consumption data. This creates a demand
for machine-learning tools that infer the appliance-specific
consumption.
Energy disaggregation is unidentifiable and thus a difficult
prediction problem because it is a single-channel BSS
problem; we want to extract more than one source from a
single observation. Additionally, there are a large number
of sources of uncertainty in the prediction problem, including
noise in the data, lack of knowledge of the true power
usage for every appliance in a given household, multiple
devices exhibiting similar power consumption, and simultaneous
switching on/off of multiple devices. Therefore energy
disaggregation has been an active area for the application
of artificial intelligence and machine learning techniques.
Popular approaches have been based on factorial
hidden Markov models (FHMM) (Kolter and Jaakkola 2012;
Parson et al. 2012; Zhong, Goddard, and Sutton 2013; 2014;
2015; Lange and Bergés 2016) and signal processing methods
(Pattem 2012; Zhao, Stankovic, and Stankovic 2015;
2016; Batra, Singh, and Whitehouse 2016; Tabatabaei, Dick,
and Xu 2017).
Copyright © 2018, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.
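The unidentifiability described above can be made concrete with a simple additive formulation. The notation is an assumption introduced here for illustration: $Y_t$ is the mains reading at time $t$, $X_{it}$ is the power drawn by appliance $i$ out of $I$ appliances, and $\epsilon_t$ is measurement noise.

```latex
Y_t = \sum_{i=1}^{I} X_{it} + \epsilon_t, \qquad t = 1, \dots, T
```

Only the $T$ values of $Y_t$ are observed, while there are $I \times T$ unknown appliance readings, so for $I > 1$ many different decompositions fit the same mains signal. For instance, with two appliances and no noise, a mains reading of 150 W is equally consistent with 100 W + 50 W and with 75 W + 75 W.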
Recently, it has been shown that single-channel BSS
can be modelled by using sequence-to-sequence (seq2seq)
learning with neural networks (Grais, Sen, and Erdogan
2014; Huang et al. 2014; Du et al. 2016). In particular, it has
been applied to energy disaggregation (Kelly and Knottenbelt
2015a), where both convolutional (CNN) and recurrent neural
networks (RNN) were employed. The idea of sequence-to-sequence
learning is to train a deep network to map between
an input sequence, such as the mains power readings
in the NILM problem, and an output sequence, such as the
power readings of a single appliance.
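As a rough illustration of this input/output mapping, the sketch below builds aligned training pairs from a mains signal and a single-appliance signal by cutting both into fixed-length stretches. The function name `make_seq2seq_pairs`, its `window` and `stride` parameters, and the toy readings are all hypothetical and chosen only for illustration; this is not the data pipeline used in the paper.

```python
import numpy as np

def make_seq2seq_pairs(mains, appliance, window, stride=1):
    """Cut two aligned 1-D signals into matching fixed-length windows.

    Returns X, Y of shape (n_windows, window): row k of X holds a stretch
    of mains readings and row k of Y holds the appliance readings over
    the same time steps.
    """
    mains = np.asarray(mains, dtype=float)
    appliance = np.asarray(appliance, dtype=float)
    if mains.ndim != 1 or mains.shape != appliance.shape:
        raise ValueError("mains and appliance must be 1-D and equally long")
    if window > mains.size:
        raise ValueError("window is longer than the signal")
    # Start index of every window, then expand to full index rows.
    starts = np.arange(0, mains.size - window + 1, stride)
    idx = starts[:, None] + np.arange(window)[None, :]
    return mains[idx], appliance[idx]

# Toy data (made-up values): one appliance plus a constant 50 W base load.
appliance = np.array([0, 0, 100, 100, 100, 0, 0, 0], dtype=float)
mains = appliance + 50.0

X, Y = make_seq2seq_pairs(mains, appliance, window=4, stride=2)
print(X.shape, Y.shape)
```

A network trained on such pairs would take a row of `X` as input and be asked to reproduce the corresponding row of `Y`.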
A difficulty immediately arises when applying seq2seq in
signal processing applications such as BSS. In these applications,
the input and output sequences can be long; for example,
in one of our data sets, the input and output sequences
are 14,400 time steps long. Such long sequences can make
training computationally difficult, both because of memory
limitations in current graphics processing units (GPUs) and,
with RNNs, because of the vanishing gradient problem. A
common way to avoid these problems is a sliding window
approach, that is, training the network to map a window of
the input signal to the corresponding window of the output
signal. However, this approach has several difficulties.
Each element of the output signal is predicted many
times, once for each sliding window that contains it; an average
of these predictions is naturally used, which consequently smooths
the edges. Further, we expect that some of the sliding windows
will provide a better prediction of a single element
than others, particularly those windows where the element
is near the midpoint of the window rather than the edges, so
that the network can make use of all nearby regions of the
input signal, past and future. But a simple sliding window
arXiv:1612.09106v3 [stat.AP] 18 Sep 2017