Acknowledgments
This book is the result of the efforts of many individuals. By convention, authors receive explicit credit, and get to
have their names printed on the book cover. But creating this book would not have been possible without a lot of
hard work behind the scenes. We, the authors, would like to express our gratitude to a number of people who
provided substantial contributions, and thus helped define and shape the final result that is Pentaho Kettle Solutions.
First, we’d like to thank those individuals who contributed directly to the material that appears in the book:
Ingo Klose suggested an elegant solution to generate keys starting from a given offset within a single
transformation (this solution is discussed in Chapter 8, “Handling Dimension Tables,” subsection
“Generating Surrogate Keys Based on a Counter,” shown in Figure 8-2).
Samatar Hassan provided text as well as working example transformations to demonstrate Kettle’s RSS
capabilities. Samatar’s contribution is included almost completely and appears in the RSS section of
Chapter 21, “Web Services.”
Thanks to Mike Hillyer and the MySQL documentation team for creating and maintaining the Sakila
sample database, which is introduced in Chapter 4 and appears in many examples throughout this book.
Although only three authors appear on the cover, there was actually a fourth one: We cannot thank Kasper
de Graaf of DIKW-Academy enough for writing the Data Vault chapter, which has benefited greatly from his
deep expertise on this subject. Special thanks also to Johannes van den Bosch, who did a great job
reviewing Kasper’s work and gave another boost to the overall quality and clarity of the chapter.
Thanks to Bernd Aschauer and Robert Wintner, both from Aschauer EDV (http://www.aschauer-edv.at/en), for
providing the examples and screenshots used in the section of Chapter 6, “Data Extraction,” that is
dedicated to SAP.
Daniel Einspanjer of the Mozilla Foundation provided sample transformations for Chapter 7, “Cleansing
and Conforming.”
Thanks for your contributions. This book benefited substantially from your efforts.
Much gratitude goes out to all of our technical reviewers. Providing a good technical review is hard and
time-consuming, and we have been very lucky to find a collection of such talented and seasoned Pentaho and Kettle
experts willing to find some time in their busy schedules to provide us with the kind of quality review required to
write a book of this size and scope.
We’d like to thank the Kettle and Pentaho communities. During and before the writing of this book, individuals
from these communities provided valuable suggestions and ideas to all three authors for topics to cover in a book
that focuses on ETL, data integration, and Kettle. We hope this book will be useful and practical for everybody who
is using or planning to use Kettle. Whether we succeeded is up to the reader, but if we did, we have to thank
individuals in the Kettle and Pentaho communities for helping us achieve it.
We owe many thanks to all contributors and developers of the Kettle software project. The authors are all
enthusiastic users of Kettle: we love it, because it solves our daily data integration problems in a straightforward and
efficient manner without getting in the way. Kettle is a joy to work with, and this is what provided much of the drive
to write this book.
Finally, we’d like to thank our publisher, Wiley, for giving us the opportunity to write this book, and for the
excellent support and management from their end. In particular, we’d like to thank our Project Editor, Sara Shlaer.
Despite the often delayed deliveries from our end, Sara always kept her cool and somehow managed to make
deadlines work out. Her advice, patience, encouragement, care, and sense of humor made all the difference and
formed an important contribution to this book. In addition, we’d like to thank our Executive Editor, Robert Elliot. We
appreciate the trust he put in our small team of authors to do our job, and his efforts to realize Pentaho Kettle
Solutions.
—The authors
Writing a technical book like the one you are reading right now is very hard to do all by yourself. Because of the
extremely busy agenda caused by the release process of Kettle 4, I probably should never have agreed to
co-author. It’s only thanks to the dedication and professionalism of Jos and Roland that we managed to write this book
at all. I thank both friends very much for their invitation to co-author. Even though writing a book is a hard and
painful process, working with Jos and Roland made it all worthwhile.
When Kettle was not yet released as open source code, it often received a lukewarm reaction. The reason was
that nobody was really waiting for yet another closed source ETL tool. From that position, Kettle went on to become the
most widely deployed open source ETL tool in the world. This happened only thanks to the thousands of volunteers
who offered to help out with various tasks. Ever since Kettle was open sourced, it has been a project with an ever-
growing community. It’s impossible to thank this community enough. Without the help of the developers, the
translators, the testers, the bug reporters, the folks who participate in the forums, the people with the great ideas,