(such as an individual’s salary or the price of a house) and models that
are used for classification. Classification models are very common in
machine learning. As an example, we will later look at an application of
machine learning where potential borrowers are classified as acceptable
or unacceptable credit risks.
Unsupervised learning is concerned with recognizing patterns in data.
The main objective is not to forecast a particular variable. Rather, it
is to better understand the environment represented by the data.
Consider a company that markets a range of products to consumers. Data
on consumer purchases could be used to determine the characteristics of
the customers who buy different products. This in turn could influence
the way the products are advertised. As we will see in Chapter 2,
clustering is the main tool used in unsupervised learning.
The data for supervised learning contains what are referred to as
features and labels. The labels are the values of the target that is to be
predicted. The features are the variables from which the predictions are
to be made. For example, when predicting the price of a house, the
features could be the square feet of living space, the number of bedrooms,
the number of bathrooms, the size of the garage, whether the basement
is finished, and so on. The label would be the house price. The data for
unsupervised learning consists of features but no labels because the
model is being used to identify patterns, not to forecast something. We
could use an unsupervised learning model to understand the houses
that exist in a certain neighborhood without trying to predict prices. We
might find that there is a cluster of houses with 1,500 to 2,000 square
feet of living space, three bedrooms, and a one-car garage and another
cluster of houses with 5,000 to 6,000 square feet of living area, six
bedrooms, and a two-car garage.
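The distinction between features and labels, and the idea of finding
clusters from features alone, can be illustrated with a minimal Python
sketch. All of the house data below is invented for illustration, and
k-means is used only as one common clustering method; the text does not
say which algorithm would be applied. The function names (k_means,
squared_distance, mean_point) are hypothetical helpers, not part of any
library.

```python
import random

# Hypothetical houses: (living space in square feet, bedrooms, garage spaces).
# These values are invented for illustration only.
features = [
    (1500, 3, 1), (1750, 3, 1), (2000, 3, 1), (1600, 3, 1),
    (5000, 6, 2), (5500, 6, 2), (6000, 6, 2), (5200, 6, 2),
]

# Labels (house prices) are present only in the supervised setting.
# These prices are also invented.
labels = [300_000, 340_000, 390_000, 315_000,
          900_000, 980_000, 1_050_000, 930_000]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of feature tuples."""
    return tuple(sum(column) / len(points) for column in zip(*points))


def k_means(points, k, iterations=20, seed=0):
    """Basic k-means: assign points to the nearest center, then recenter."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the previous center if a cluster is empty
                centers[j] = mean_point(members)
    return assignment, centers


# Unsupervised use: only the features are passed in; the labels are ignored.
assignment, centers = k_means(features, k=2)
for house, cluster in zip(features, assignment):
    print(house, "-> cluster", cluster)
```

Because the features are left unscaled in this sketch, square footage
dominates the distance calculation. In practice the features would
usually be put on comparable scales before clustering.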
Semi-supervised learning is a cross between supervised and
unsupervised learning. It arises when we are trying to predict something
and we have some data with labels (i.e., values for the target) and some
(usually much more) unlabeled data. It might be thought that the
unlabeled data is useless, but this is not necessarily the case. The
unlabeled data can be used in conjunction with the labeled data to
produce clusters that help prediction. For example, suppose we are
interested in predicting whether a customer will purchase a particular
product from features such as age, income level, and so on. Suppose
further that we have a small amount of labeled data (i.e., data that
indicates the features of customers as well as whether they bought or
did not buy the product) and a much larger amount of unlabeled data
(i.e., data that indicates the features of potential customers, but does
not indicate
whether they bought the product). We can apply unsupervised learning