
ABOUT THIS BOOK
can use the library that comes with this book by writing only a few lines of code! Moreover, to ensure the longevity and maintenance of the source code, we’ve created a new project dedicated to it on the Google Code site: http://code.google.com/p/yooreeka/.
Roadmap
The book consists of seven chapters. The first chapter is introductory. Chapters 2 through 6 cover search, recommendations, groupings, classification, and the combination of classifiers, respectively. Chapter 7 brings together the material from the previous chapters, but it also covers new ground in the context of a single application.
While the chapters contain cross-references, the material was written so that you can read chapters 1 through 5 on their own. Chapter 6 builds on chapter 5, so it would be hard to read it by itself. Chapter 7 also has dependencies because it draws on material from the entire book.
Chapter 1 provides an overview of intelligent applications as well as several examples of their value. It provides a practical definition of intelligent web applications and a number of design principles. It presents six broad categories of web applications that can leverage the intelligent algorithms of this book. It also provides background on the origins of the algorithms that we’ll present, and their relation to the fields of artificial intelligence, machine learning, data mining, and soft computing. The chapter concludes with a list of eight design pitfalls that occur frequently in practice.
Chapter 2 begins with a description of searching that relies on traditional information retrieval techniques. It summarizes the traditional approach and paves the way for searching beyond indexing, which includes the most celebrated link analysis algorithm, PageRank. It also includes a section on improving search results by employing user click analysis. This technique learns a user’s preferences toward a particular site or topic, and it can be greatly enhanced and extended to include additional features.
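For readers who want a preview of what link analysis computes, here is a minimal PageRank sketch using power iteration. It is not the book’s implementation: the class name `PageRankSketch`, the adjacency-list input format, and the choice to spread the rank of pages with no out-links evenly across all pages are assumptions made for illustration. The damping factor (commonly 0.85) weighs following links against jumping to a random page.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the book's library.
final class PageRankSketch {

    // links[i] lists the pages that page i links to (assumed input format).
    // Returns an approximate PageRank vector after a fixed number of iterations.
    static double[] pageRank(int[][] links, double alpha, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);                 // start from the uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double danglingMass = 0.0;
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    danglingMass += rank[i];        // assumption: dangling pages spread rank evenly
                } else {
                    double share = rank[i] / links[i].length;
                    for (int j : links[i]) {
                        next[j] += share;           // each out-link receives an equal share
                    }
                }
            }
            for (int i = 0; i < n; i++) {
                // random-jump term plus damped link-following term; ranks still sum to 1
                next[i] = (1.0 - alpha) / n + alpha * (next[i] + danglingMass / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        int[][] links = { {1, 2}, {2}, {0} };       // page 0 -> 1, 2; page 1 -> 2; page 2 -> 0
        System.out.println(Arrays.toString(pageRank(links, 0.85, 50)));
    }
}
```

In this tiny graph, page 2 receives links from both other pages and ends up with a higher rank than page 1.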
Chapter 2 also covers searching documents that aren’t web pages by employing a new algorithm, which we call DocRank. This algorithm has shown some promise, but more importantly it demonstrates that the underlying mathematical theory of link analysis can be readily extended and studied in other contexts through careful modifications. The chapter also covers some of the challenges that may arise when dealing with very large networks. Lastly, chapter 2 covers the issue of credibility and validation for search results.
Chapter 3 introduces the vital concepts of distance and similarity. It presents two broad categories of techniques for creating recommendations: collaborative filtering and the content-based approach. The chapter uses a virtual online music store as its context for developing recommendations. It also presents two more general examples. The first is a hypothetical website that uses the Digg API to retrieve our users’ content in order to recommend unseen articles to them. The second example deals with movie recommendations and introduces the concept of data normalization. In this chapter we also evaluate the accuracy of our recommendations using the root mean squared error (RMSE).
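As a quick reminder of what RMSE measures, the sketch below computes it for a set of predicted and actual ratings: it squares each prediction error, averages the squares, and takes the square root. The class and method names are hypothetical and not taken from the book’s code.

```java
// Hypothetical illustration class; not part of the book's library.
final class RmseSketch {

    // Root mean squared error between predicted and actual ratings.
    static double rmse(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumSq = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sumSq += diff * diff;                    // larger errors are penalized quadratically
        }
        return Math.sqrt(sumSq / predicted.length);
    }

    public static void main(String[] args) {
        double[] predicted = {4.0, 3.5, 2.0};
        double[] actual    = {5.0, 3.5, 1.0};
        System.out.println(rmse(predicted, actual)); // sqrt(2/3), about 0.816
    }
}
```

A lower RMSE means the recommender’s predicted ratings are closer, on average, to the ratings users actually gave.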