Chapter 6, Naive Bayes and Sentimental Analysis, explains a probabilistic machine
learning model called Naive Bayes and also briefly explains another popular model
called the support vector machine. The chapter starts with basic concepts such as
Bayes' theorem and then explains how these concepts are used in Naive Bayes.
I then use the model to predict the sentiment (positive or negative) of a set
of tweets from Twitter. The same case study is then rerun using the support vector
machine model.
Chapter 7, Decision Trees, explains that decision trees are like flowcharts and can be
built programmatically using concepts such as entropy or Gini impurity. The golden
egg in this chapter is a case study that shows how we can predict whether a person's
loan application will be approved or not using decision trees.
Chapter 8, Ensembling on Big Data, explains how ensembling plays a major role in
improving predictive performance. I cover different concepts related to ensembling
in this chapter, including how multiple models can be combined using bagging or
boosting, thereby improving the predictions. We also cover two highly popular and
accurate ensemble models: random forests and gradient-boosted trees. Finally, we
use these models to predict loan defaults in a real-world dataset from Lending Club
(a real online lending company).
Chapter 9, Recommendation Systems, covers a concept that has helped make
machine learning so popular and that directly impacts business as well. In this chapter,
we show what recommendation systems are, what they can do, and how they are
built using machine learning. We cover both types of recommendation systems,
content-based and collaborative, along with their strengths and weaknesses. Finally,
we work through two case studies using the MovieLens dataset to recommend movies
that users might like to see.
Chapter 10, Clustering and Customer Segmentation on Big Data, discusses clustering
and how a real-world e-commerce store can use it to segment its customers
based on how valuable they are. I cover both k-Means clustering and
bisecting k-Means clustering, and use both of them in the corresponding case
study on customer segmentation.
Chapter 11, Massive Graphs on Big Data, covers an interesting topic: graph analytics.
We start with a refresher on graphs and their basic concepts, and then explore
the different forms of analytics that can be run on graphs, whether path-based
analytics involving algorithms such as breadth-first search, or connectivity analytics
involving degrees of connection. A real-world flight dataset is then used to explore
these forms of graph analytics, demonstrating tasks such as finding
the top airports using the PageRank algorithm.