The included scripts will call upon a few programming skills in Perl, Python,
or Ruby. You should know the basic syntax of a language, the minimum structural
requirements for a script, how command lines are written, how iterating loops are
structured, how files are opened, read, and written, how values can be assigned to and
retrieved from data structures, how simple regular expressions are interpreted, and
how scripts are launched. The scripts are written in a style that sacrifices elegance for
readability. If your knowledge of Perl, Python, or Ruby is shaky, there are numerous
beginner-level books, and many Web-based tutorials for each of these languages.
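
As a rough gauge of the expected skill level, the short Python sketch below exercises most of the skills just listed: opening a file, looping over its lines, applying a simple regular expression, storing values in a data structure, and writing results to a file. It is an illustration only, not one of the book's scripts; the function names, file names, and search pattern are hypothetical.

```python
import re
from collections import Counter


def count_matches(path, pattern):
    """Read a text file line by line and count each regex match (case-insensitive)."""
    regex = re.compile(pattern)
    counts = Counter()
    with open(path, "r", encoding="utf-8") as infile:
        for line in infile:                      # iterate one line at a time
            for match in regex.finditer(line):   # every match on this line
                counts[match.group(0).lower()] += 1
    return counts


def write_counts(counts, path):
    """Write 'term<TAB>count' lines, sorted alphabetically by term."""
    with open(path, "w", encoding="utf-8") as outfile:
        for term, n in sorted(counts.items()):
            outfile.write(f"{term}\t{n}\n")
```

A call such as `write_counts(count_matches("terms.txt", r"[A-Za-z]*oma\b"), "counts.txt")` would tally words ending in "oma" in a hypothetical file named terms.txt. If each step of this sketch is readable to you, the scripts in the book should be as well.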
The book is divided into four parts: Part I—Fundamental Algorithms and Methods
of Medical Informatics; Part II—Medical Data Resources; Part III—Primary
Tasks of Medical Informatics; and Part IV—Medical Discovery.
Part I—Fundamental Algorithms and Methods of Medical Informatics
(Chapters 1 to 4) provides simple methods for viewing text and image files, and for
parsing through large data sets line by line, retrieving, counting, and indexing selected
items. The primary purpose of these chapters is to introduce the basic computational
subroutines that are used in more complex scripts later in the book. The secondary
purpose of these chapters is to demonstrate that Perl, Python, and Ruby are quite
similar to one another, and provide equivalent functionality.
Part II—Medical Data Resources (Chapters 5 to 13) demonstrates uses of some
freely available biomedical data sets. These data sets have cost hundreds of millions
of dollars to assemble, yet many healthcare workers are unaware of their enormous
clinical value. In these chapters, you will learn the intended uses of data sets, how the
data sets are organized, and how you can select, retrieve, and analyze information from
the files.
Part III—Primary Tasks of Medical Informatics (Chapters 14 to 18) covers some
of the computational methods of biomedical informatics, including autocoding, data
scrubbing, and data deidentification.
A good question is hard to find. Part IV—Medical Discovery (Chapters 19 through
27) provides examples of the kinds of questions that biomedical scientists can ask and
answer with public data and open source programming languages. In these chapters,
we combine methods developed in the earlier chapters, using freely available data
sources to answer specific questions or to develop new medical hypotheses. Many of
the informatics projects that you will use in your biomedical career can be completed
with the basic methods and implementations described in these chapters.
This book is intended to be used as a textbook in medical informatics courses.
Because the methods in the book are generalized, the book will also serve as a
convenient reference source of script snippets that can be freely used by students and
professionals. The scripts are written in a syntax appropriate for the most current popular
version of Perl, Python, or Ruby, and based on the availability of about a dozen large,
public data sets, each with a consistent data structure. Over time, programming
languages change; the availability, Internet location, and organization of the large public