
DATA/DATA SCIENCE
Data Science at the Command Line
ISBN: 978-1-491-94785-2
US $39.99 CAN $41.99
“The Unix philosophy of simple tools, each doing one job well, then cleverly
piped together, is embodied by the command line. Jeroen expertly discusses
how to bring that philosophy into your work in data science, illustrating
how the command line is not only the world of file input/output, but also
the world of data manipulation, exploration, and even modeling.”
—Chris H. Wiggins
Associate Professor in the Department of
Applied Physics and Applied Mathematics
at Columbia University and Chief Data
Scientist at The New York Times
“This book explains how to integrate common data science tasks into a
coherent workflow. It's not just about tactics for breaking down problems,
it's also about strategies for assembling the pieces of the solution.”
—John D. Cook
mathematical consultant
Twitter: @oreillymedia
facebook.com/oreilly
This hands-on guide demonstrates how the flexibility of the command line
can help you become a more efficient and productive data scientist. You’ll
learn how to combine small, yet powerful, command-line tools to quickly
obtain, scrub, explore, and model your data.
To get you started—whether you’re on Windows, OS X, or Linux—author
Jeroen Janssens has developed the Data Science Toolbox, an easy-to-install
virtual environment packed with over 80 command-line tools.
Discover why the command line is an agile, scalable, and extensible
technology. Even if you’re already comfortable processing data with, say,
Python or R, you’ll greatly improve your data science workflow by also
leveraging the power of the command line.
■ Obtain data from websites, APIs, databases, and spreadsheets
■ Perform scrub operations on text, CSV, HTML/XML, and JSON
■ Explore data, compute descriptive statistics, and create
visualizations
■ Manage your data science workflow
■ Create reusable command-line tools from one-liners and
existing Python or R code
■ Parallelize and distribute data-intensive pipelines
■ Model data with dimensionality reduction, clustering,
regression, and classification algorithms
Jeroen Janssens, a Senior Data Scientist at YPlan in New York, specializes in
machine learning, anomaly detection, and data visualization. He holds an MSc in
Artificial Intelligence from Maastricht University and a PhD in Machine Learning
from Tilburg University. Jeroen is passionate about building open source tools for
data science.
Jeroen Janssens
Data Science at the Command Line
FACING THE FUTURE WITH TIME-TESTED TOOLS