Chapter 4, Basics of Machine Learning, helps you to understand the basic theoretical
concepts behind machine learning, such as what exactly machine learning is, how it is used,
examples of its use in real life, and the different forms of machine learning. If you are new to
the field of machine learning, or want to brush up on your existing knowledge, this chapter
is for you. Here I will also show how, as a developer, you should approach a machine
learning problem, including topics on feature extraction, feature selection, model testing,
model selection, and more.
Chapter 5, Regression on Big Data, explains how you can use linear regression to predict
continuous values and how you can do binary classification using logistic regression. A real-
world case study of house price evaluation based on the different features of the house is used
to explain the concepts of linear regression. To explain the key concepts of logistic
regression, a real-life case study of detecting heart disease in a patient based on different
features is used.
Chapter 6, Naive Bayes and Sentiment Analysis, explains a probabilistic machine learning
model called Naive Bayes and also briefly explains another popular model called the support
vector machine. The chapter starts with basic concepts such as Bayes' theorem and then
explains how these concepts are used in Naive Bayes. I then use the model to predict
whether the sentiment in a set of tweets from Twitter is positive or negative. The same case
study is then re-run using the support vector machine model.
Chapter 7, Decision Trees, explains that decision trees are like flowcharts and can be
built programmatically using concepts such as entropy or Gini impurity. The highlight of
this chapter is a case study that shows how we can use decision trees to predict whether a
person's loan application will be approved.
Chapter 8, Ensembling on Big Data, explains how ensembling plays a major role in
improving predictive performance. I cover different concepts related to ensembling in this
chapter, including how multiple models can be combined using bagging or boosting to
improve their predictions. We also cover two highly popular and accurate ensemble
models, random forests and gradient-boosted trees. Finally, we use these models to predict
loan defaults in a real-world dataset from Lending Club (a real online lending company).
Chapter 9, Recommendation Systems, covers a concept that has helped make machine
learning so popular and that directly impacts business. In this chapter, we show what
recommendation systems are, what they can do, and how they are built using machine
learning. We cover both types of recommendation systems, content-based and collaborative,
along with their strengths and weaknesses. Finally, we cover two case studies that use the
MovieLens dataset to recommend movies that users might like to see.
Chapter 10, Clustering and Customer Segmentation on Big Data, discusses clustering and
how a real-world e-commerce store can use it to segment its customers based on how
valuable they are. I cover both k-means clustering and bisecting k-means clustering, and
use both of them in the corresponding case study on customer segmentation.
Chapter 11, Massive Graphs on Big Data, covers an interesting topic, graph analytics. We
start with a refresher on graphs, with basic concepts, and later go on to explore the different
forms of analytics that can be run on the graphs, whether path-based analytics involving