to see unfiltered and undetected spam in their inboxes. These results demonstrate an
enormous advance over the simplistic spam filtering techniques developed in the
early days of the internet, which relied on word filtering and email metadata
reputation to achieve modest results.
The fundamental lesson that both researchers and practitioners have taken away from
this battle is the importance of using data to defeat malicious adversaries and improve
the quality of our interactions with technology. Indeed, the story of spam fighting
serves as a representative example for the use of data and machine learning in any
field of computer security. Today, almost all organizations have a critical reliance on
technology, and almost every piece of technology has security vulnerabilities. Driven
by the same core motivations as the spammers from the 1980s (unregulated, cost-free
access to an audience with disposable income and private information to offer),
malicious actors can pose security risks to almost all aspects of modern life. The
fundamental nature of the battle between attacker and defender is the same in all
fields of computer security as it is in spam fighting: a motivated adversary is
constantly trying to misuse a computer system, and each side races to fix or exploit
flaws in design or technique before the other uncovers them. The problem statement has
not changed one bit.
Computer systems and web services have become increasingly centralized, and many
applications have evolved to serve millions or even billions of users. Entities that
become arbiters of information are bigger targets for exploitation, but are also in the
perfect position to make use of the data and their user bases to achieve better security.
With the advent of powerful data-crunching hardware and the development of more
capable data analysis and machine learning algorithms, there has never been a
better time to exploit the potential of machine learning in security.
In this book, we demonstrate applications of machine learning and data analysis
techniques to various problem domains in security and abuse. We explore methods for
evaluating the suitability of different machine learning techniques in different
scenarios, and focus on guiding principles that will help you use data to achieve better
security. Our goal is not to leave you with the answer to every security problem you might
face, but rather to give you a framework for thinking about data and security as well
as a toolkit from which you can pick the right method for the problem at hand.
The remainder of this chapter sets up context for the rest of the book: we discuss
what threats modern computer and network systems face, what machine learning is,
and how machine learning applies to the aforementioned threats. We conclude with a
detailed examination of approaches to spam fighting, which provides a concrete
example of applying machine learning to security that can be generalized to nearly
any domain.