Python Machine Learning
Classification attempts to find the appropriate class label for an input, for example
positive or negative sentiment, male or female persons, benign or malignant tumors,
or secured or unsecured loans.
In supervised learning, the training data comes with descriptions, labels, targets, or
desired outputs, and the objective is to find a general rule that maps inputs to outputs.
This kind of training data is called labeled data. The learned rule is then used to label
new data with unknown outputs.
Supervised learning involves building a machine learning model that is based on labeled
samples. For example, if we build a system to estimate the price of a plot of land or a
house based on various features, such as size, location, and so on, we first need to create
a database and label it. We need to teach the algorithm what features correspond to what
prices. Based on this data, the algorithm will learn how to calculate the price of real estate
using the values of the input features.
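The price example can be sketched in a few lines. This is only an illustration under stated assumptions: the sizes and prices are made-up numbers, a single feature (size) stands in for "various features", and the helper names `fit_line` and `predict` are hypothetical. It fits a straight line by ordinary least squares, which is one simple way to learn a rule from labeled samples.

```python
# Labeled training data: each plot size (input feature) is paired with a
# known price (label). The numbers are invented for illustration.
sizes = [50.0, 80.0, 100.0, 120.0, 150.0]      # square metres
prices = [150.0, 240.0, 300.0, 360.0, 450.0]   # thousands of currency units


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(line, x):
    """Apply the learned rule to a new input."""
    slope, intercept = line
    return slope * x + intercept


line = fit_line(sizes, prices)
estimate = predict(line, 110.0)   # a size that is not in the training data
print(f"Estimated price: {estimate:.1f}")
```

The learning step (`fit_line`) sees only the labeled samples; the prediction step (`predict`) then applies the learned rule to an input whose price is unknown.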
Supervised learning deals with learning a function from available training data. Here, a
learning algorithm analyzes the training data and produces a derived function that can be
used for mapping new examples. There are many supervised learning algorithms, such
as logistic regression, neural networks, support vector machines (SVMs), and Naive
Bayes classifiers.
Common examples of supervised learning include classifying e-mails into spam and not-
spam categories, labeling webpages based on their content, and voice recognition.
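As a rough sketch of the spam example, the following trains a small Naive Bayes classifier on word counts. The six messages are invented for illustration, the function names `train` and `classify` are hypothetical, and add-one (Laplace) smoothing is an assumption made here so that unseen words do not zero out a score. A real spam filter would need far more data and preprocessing.

```python
import math
from collections import Counter

# Tiny hand-made labeled dataset (hypothetical messages, illustration only).
training_data = [
    ("win money now", "spam"),
    ("free prize win", "spam"),
    ("claim free money", "spam"),
    ("meeting at noon", "not-spam"),
    ("lunch meeting tomorrow", "not-spam"),
    ("project report tomorrow", "not-spam"),
]


def train(examples):
    """Count words per label and messages per label."""
    word_counts = {}
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.split():
            counts[word] += 1
            vocabulary.add(word)
    return word_counts, label_counts, vocabulary


def classify(model, text):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_messages in label_counts.items():
        score = math.log(n_messages / total_messages)   # log prior
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one smoothing: every word gets a count of at least 1.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(training_data)
print(classify(model, "free money"))
print(classify(model, "report for meeting"))
```

The classifier never sees a rule such as "the word free means spam"; it derives the word statistics from the labeled messages alone.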
Unsupervised Learning
Unsupervised learning is used to detect anomalies and outliers, such as fraud or defective
equipment, or to group customers with similar behaviors for a sales campaign. Unlike
supervised learning, it works without labeled data.
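One very simple way to flag outliers without any labels is a z-score test, sketched below. The transaction amounts, the function name `find_outliers`, and the threshold of 2.0 are all assumptions chosen for illustration; practical anomaly detection usually relies on more robust methods.

```python
import statistics

# Hypothetical transaction amounts; no labels say which ones are unusual.
amounts = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 95.0]


def find_outliers(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)   # population standard deviation
    if spread == 0:
        return []                        # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]


outliers = find_outliers(amounts)
print(outliers)
```

Here the only value lying more than two standard deviations from the mean is 95.0, so it is the one reported.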
When learning data contains only some indications without any description or labels, it is
up to the coder or to the algorithm to find the structure of the underlying data, to discover
hidden patterns, or to determine how to describe the data. This kind of learning data is
called unlabeled data.
Suppose that we have a number of data points and want to divide them into several
groups. We may not know exactly what the grouping criteria should be. An
unsupervised learning algorithm therefore tries to partition the given dataset into a
certain number of groups in an optimal way.
Unsupervised learning algorithms are powerful tools for analyzing data and for
identifying patterns and trends. They are most commonly used for clustering similar
inputs into logical groups. Unsupervised learning algorithms include K-means,
hierarchical clustering, and so on.
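To make the clustering idea concrete, here is a minimal K-means sketch in plain Python. The six points and the two starting centroids are assumptions picked for illustration, and the helper names `assign`, `update`, and `k_means` are hypothetical. Real implementations usually choose starting centroids randomly or with a seeding heuristic and run several restarts.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if not members:               # keep an empty cluster's centroid in place
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids


def k_means(points, centroids, max_iterations=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Unlabeled 2-D points (invented): nothing says which group each belongs to.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels)
print(centroids)
```

The group indices carry no meaning of their own; the algorithm only discovers that the first three points lie close together and the last three lie close together.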
Semi-supervised Learning
If some training samples are labeled but others are not, the setting is called
semi-supervised learning. It trains on a small amount of labeled data together with
a large amount of unlabeled data. Semi-supervised learning is applied in cases
where it is expensive to acquire a fully labeled dataset but practical to label a small
subset. For example, it often requires skilled experts to label certain remote sensing