PREFACE
data was not assumed to be uniformly spaced in time, and they covered more algorithms but with less rigor. I later realized that similar methods were also being taught in the economics, electrical engineering, and computer science departments.
In early 2009, I graduated and moved to Silicon Valley to work as a software consultant. Over the next two years, I worked with eight companies on a very wide range of technologies and saw two trends emerge that form the central thesis of this book: first, to develop a compelling application you need to do more than just connect data sources; and second, employers want people who both understand theory and can program.
A large portion of a programmer’s job can be compared to connecting pipes, except that what programmers connect is the flow of data. Monstrous fortunes have been made doing exactly that. Let me give you an example. You could build an application that sells things online; the big picture would be giving people a way to post items and to view what others have posted. To do this, you could create a web form that lets users enter data about what they are selling, and this data would then be shipped off to a data store. For other users to see what a user is selling, you would have to pull the data back out of the data store and display it appropriately. I’m sure people will continue to make money this way; however, to make the application really good, you need to add a level of intelligence. This intelligence could do things like automatically remove inappropriate postings, detect fraudulent transactions, direct users to things they might like, and forecast site traffic. To accomplish these objectives, you would need to apply machine learning. The end user would not know that there is magic going on behind the scenes; to them your application “just works,” which is the hallmark of a well-built product.
An organization may choose to hire a group of theoretical people, or “thinkers,” and a set of practical people, or “doers.” The thinkers may have spent a lot of time in academia, and their day-to-day job may be pulling ideas from papers and modeling them with very high-level tools or mathematics. The doers interface with the real world by writing the code and dealing with the imperfections of a non-ideal world, such as machines that break down or noisy data. Separating thinkers from doers is a bad idea, and successful organizations realize this. (One of the tenets of lean manufacturing is for the thinkers to get their hands dirty with actual doing.) When there is a limited amount of money to spend on hiring, who is more likely to get hired: the thinker or the doer? Probably the doer, but in reality employers want both. Things need to get built, but when applications call for more demanding algorithms, it is useful to have someone who can read papers, pull out the idea, implement it in real code, and iterate.
I couldn’t find a book that addressed the problem of bridging the gap between thinkers and doers in the context of machine learning algorithms. The goal of this book is to fill that void and, along the way, to introduce uses of machine learning algorithms so that the reader can build better applications.