Preface
Being a data scientist in the tech industry is one of the most rewarding careers on the planet
today. I studied actual job descriptions for data scientist roles at tech companies and
distilled their requirements into the topics that you'll see in this book.
Hands-On Data Science and Python Machine Learning is really comprehensive. We'll start with
a crash course on Python and do a review of some basic statistics and probability, but then
we're going to dive right into over 60 topics in data mining and machine learning. These
include Bayes' theorem, clustering, decision trees, regression analysis, and
experimental design; we'll look at them all. Some of these topics are really fun.
We're going to develop a movie recommendation system using real user movie
rating data. We're going to create a working search engine for Wikipedia data.
We're going to build a spam classifier that can correctly classify spam and non-spam emails
in your email account. We also have a whole section on scaling this work up to a cluster
that runs on big data using Apache Spark.
If you're a software developer or programmer looking to transition into a career in data
science, this book will teach you the hottest skills without all the mathematical notation
and pretense that come along with these topics. We're just going to explain these concepts
and show you some working Python code that you can dive into and mess around
with to make those concepts sink in. If you're working as a data analyst in the
finance industry, this book can also help you make the transition into the tech
industry. All you need is some prior experience in programming or scripting and you
should be good to go.
The general format of this book is as follows. I'll start by explaining each concept over a
few sections with graphical examples. I will introduce you to some of the notation and fancy
terminology that data scientists like to use so you can speak the same language, but the
concepts themselves are generally pretty simple. After that, I'll throw you into some
working Python code that we can run and mess around with, and that will show
you how to apply these ideas to real data. The code is going to be presented as
IPython Notebook files, a format where I can intermix code with notes
that explain what's going on in the concepts. You can take these
notebook files with you after going through this book and use them as a handy quick
reference later on in your career. At the end of each concept, I'll encourage you to
dive into that Python code, make some modifications, and mess around with it, so that you
gain more familiarity by getting hands-on and
seeing the effects your changes have.