3.2. Brief History of Deep Learning on Tabular Data
Tabular data are the oldest form of data. Before the digital collection of text, images, and sound became possible, almost all data were tabular. Therefore, tabular data were the target of early machine learning research. However, deep neural networks became popular in the digital age and were further developed with a focus on homogeneous data. In recent years, various supervised, self-supervised, and semi-supervised deep learning approaches have been proposed that explicitly address the problem of modeling tabular data again. Early works mostly focused on data transformation techniques for preprocessing (Giles et al., 1992; Horne and Giles, 1995; Willenborg and De Waal, 1996), which are still important today (Hancock and Khoshgoftaar, 2020).
A huge stimulus was the rise of e-commerce, which demanded novel solutions, especially in advertising (Richardson et al., 2007; Guo et al., 2017). These tasks required fast and accurate estimation on heterogeneous data sets with many categorical variables, for which traditional machine learning approaches are not well suited (e.g., categorical features with high cardinality can lead to very sparse, high-dimensional feature vectors and non-robust models). As a result, researchers and data scientists started looking for more flexible solutions, e.g., based on deep neural networks, that can capture complex non-linear dependencies in the data.
In particular, the click-through rate prediction problem
has received a lot of attention (Guo et al., 2017; Ke et al.,
2019; Wang et al., 2021). A large variety of approaches
were proposed, most of them relying on specialized neural
network architectures for heterogeneous tabular data. The
most important methods for click-through rate estimation are
included in our survey.
A newer line of research evolved based on the idea that regularization may improve the performance of deep neural networks on tabular data (Kadra et al., 2021). The idea was sparked by Shavitt and Segal (2018) and has led to intensified research on regularization approaches.
Due to the tremendous success of attention-based approaches such as transformers on textual (Brown et al., 2020) and visual data (Dosovitskiy et al., 2021; Khan et al., 2021), researchers have recently started applying attention-based methods and self-supervised learning techniques to tabular data. After the first and most influential work by Arik and Pfister (2019) raised research interest, transformers have quickly gained popularity, especially for large tabular data sets.
3.3. Challenges of Learning With Tabular Data
As mentioned above, deep neural networks are usually
inferior to more traditional (e.g., linear or tree-based) machine
learning methods when dealing with tabular data. However,
it is often unclear why deep learning cannot achieve the
same level of predictive quality as in other domains such
as image classification and natural language processing. In
the following, we identify and discuss four possible reasons:
1. Inappropriate Training Data: Data quality is a common issue for real-world tabular data sets. They often include missing values (Sánchez-Morales et al., 2020), extreme data (outliers) (Pang et al., 2021), and erroneous or inconsistent data (Karr et al., 2006), and they often have a small overall size relative to the high-dimensional feature vectors generated from the data (Xu and Veeramachaneni, 2018). Also, due to the expensive nature of data collection, tabular data are frequently class-imbalanced.
2. Missing or Complex Irregular Spatial Dependencies: There is often no spatial correlation between the
variables in tabular data sets (Zhu et al., 2021), or the
dependencies between features are rather complex and
irregular. Thus, the inductive biases used in popular
models for homogeneous data, such as convolutional
neural networks, are unsuitable for modeling this data
type (Katzir et al., 2021; Rahaman et al., 2019; Mitchell
et al., 2017).
3. Extensive Preprocessing: One of the main challenges when working with tabular data is how to handle categorical features (Hancock and Khoshgoftaar, 2020). In most cases, the first step is to convert the categories into a numerical representation, for example, using a simple one-hot or ordinal encoding scheme. However, since categorical features may have many distinct values (high cardinality), one-hot encoding can produce a very sparse, high-dimensional feature matrix (the curse of dimensionality), while ordinal encoding imposes a synthetic ordering on values that are actually unordered; a minimal sketch contrasting the two encodings follows this item. Hancock and Khoshgoftaar (2020) have analyzed different embedding techniques for categorical variables. Dealing with categorical features is also one of the main aspects we discuss in Section 4.
Applications that work with homogeneous data have
effectively used data augmentation (Perez and Wang,
2017), transfer learning (Tan et al., 2018) and test-
time augmentation (Shanmugam et al., 2020). For
heterogeneous tabular data, these techniques are often
difficult to apply. However, some frameworks for
learning with tabular data, such as VIME (Yoon et al.,
2020) and SAINT (Somepalli et al., 2021), use data
augmentation strategies in the embedding space.
Lastly, note that we often lose information with respect
to the original data when applying preprocessing methods for deep neural networks, leading to a reduction in
predictive performance (Fitkov-Norris et al., 2012).
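To make the encoding trade-off described in item 3 concrete, the following minimal sketch (illustrative only, not taken from any of the surveyed works) uses scikit-learn on a hypothetical categorical column; the column name, its values, and the printed shapes are assumptions for illustration.

    # Minimal sketch: one-hot vs. ordinal encoding of a categorical feature.
    # The toy "city" column is hypothetical; it only illustrates the
    # sparsity / synthetic-ordering trade-off discussed in item 3 above.
    from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

    X = [["Berlin"], ["Tokyo"], ["Lima"], ["Oslo"], ["Tokyo"]]

    # One-hot: one column per category; with thousands of categories this
    # yields a very sparse, high-dimensional feature matrix.
    onehot = OneHotEncoder(handle_unknown="ignore")
    X_onehot = onehot.fit_transform(X)   # sparse matrix of shape (5, 4)
    print(X_onehot.shape)

    # Ordinal: a single integer column, but the alphabetical mapping
    # (Berlin=0, Lima=1, Oslo=2, Tokyo=3) is a synthetic order with no
    # real-world meaning that a model may nevertheless exploit.
    ordinal = OrdinalEncoder()
    X_ordinal = ordinal.fit_transform(X)
    print(X_ordinal.ravel())             # [0. 3. 1. 2. 3.]

Learned embedding layers, one of the techniques analyzed by Hancock and Khoshgoftaar (2020) and revisited in Section 4, sit between these two extremes by mapping each category to a dense, trainable vector.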
4. Model Sensitivity: Deep neural networks can be extremely sensitive to tiny perturbations of the input data (Szegedy et al., 2013; Levy et al., 2020). The smallest
possible change of a categorical (or binary) feature
might already have a large impact on the prediction.
This is usually less problematic for homogeneous
(continuous) data sets.
In contrast to deep neural networks, decision-tree algorithms can handle perturbations exceptionally well by selecting a feature and threshold value and "ignoring"