Python Data Analysis Practice Guide (Second Edition)


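Python for Data Analysis, 2nd Edition is a data analysis book written by Wes McKinney and published in 2017. It aims to help readers learn to use Python for data analysis, particularly with libraries such as pandas, NumPy, and IPython.

**Data analysis fundamentals** The book introduces the basics of data analysis, including loading, cleaning, transforming, and visualizing data. Readers learn to use the pandas library to work with structured data and the NumPy library for numerical computing.

**pandas** pandas is one of the most popular data analysis libraries in Python and provides fast, flexible tools for manipulating and analyzing data. Readers learn to use pandas to read, write, and process large datasets, including merging, grouping, and sorting data.

**NumPy** NumPy is one of the most popular numerical computing libraries in Python and provides efficient array and matrix operations. Readers learn to use NumPy for numerical computing, including matrix multiplication, eigenvalue decomposition, and singular value decomposition.

**IPython** IPython is an enhanced interactive Python shell that makes it quick to run, inspect, and iterate on code. Readers learn to use IPython for interactive computing and data visualization.

**Data visualization** The book also covers data visualization with the matplotlib library, including plots such as scatter plots and bar charts.

**Case studies** The book provides many hands-on examples that show how to use Python libraries to solve real data analysis problems, covering data cleaning, data transformation, and data visualization.

**Target audience** The book suits data analysts and scientific computing practitioners who are new to Python, as well as those already familiar with the language.

**Summary** Python for Data Analysis, 2nd Edition is a highly practical data analysis book that covers the full workflow: loading, cleaning, transforming, visualizing, and analyzing data. Readers learn to use Python libraries for data analysis and build practical data analysis skills.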
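As a minimal sketch of the pandas operations the summary mentions (reading, merging, grouping, sorting, and writing data), the example below assumes pandas is installed. It is not code from the book: the two small tables and their column names (`store_id`, `product`, `units`, `city`) are hypothetical stand-ins for real CSV files.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data standing in for two CSV files on disk.
sales_csv = StringIO(
    "store_id,product,units\n"
    "1,apple,10\n"
    "1,banana,5\n"
    "2,apple,7\n"
    "3,banana,12\n"
)
stores_csv = StringIO(
    "store_id,city\n"
    "1,Boston\n"
    "2,Denver\n"
    "3,Boston\n"
)

# Read: parse each CSV source into a DataFrame.
sales = pd.read_csv(sales_csv)
stores = pd.read_csv(stores_csv)

# Merge: attach each sale's city by joining on the shared store_id key.
merged = pd.merge(sales, stores, on="store_id", how="left")

# Group: total units sold per city.
units_by_city = merged.groupby("city")["units"].sum()

# Sort: largest totals first.
units_by_city = units_by_city.sort_values(ascending=False)
print(units_by_city)

# Write: with no path argument, to_csv returns the CSV text as a string.
csv_text = units_by_city.to_csv()
```

Passing a file path to `pd.read_csv` or `to_csv` works the same way; the in-memory `StringIO` buffers only keep the sketch self-contained.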

To comment or ask technical questions about this book, send email to bookquestions@oreilly.com.

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments

This work is the product of many years of fruitful discussions, collaborations, and assistance with and from many people around the world. I’d like to thank a few of them.

In Memoriam: John D. Hunter (1968–2012)

Our dear friend and colleague John D. Hunter passed away after a battle with colon cancer on August 28, 2012. This was only a short time after I’d completed the final manuscript for this book’s first edition.

John’s impact and legacy in the Python scientific and data communities would be hard to overstate. In addition to developing matplotlib in the early 2000s (a time when Python was not nearly so popular), he helped shape the culture of a critical generation of open source developers who’ve become pillars of the Python ecosystem that we now often take for granted.

I was lucky enough to connect with John early in my open source career in January 2010, just after releasing pandas 0.1. His inspiration and mentorship helped me push forward, even in the darkest of times, with my vision for pandas and Python as a first-class data analysis language.

John was very close with Fernando Pérez and Brian Granger, pioneers of IPython, Jupyter, and many other initiatives in the Python community. We had hoped to work on a book together, the four of us, but I ended up being the one with the most free time. I am sure he would be proud of what we’ve accomplished, as individuals and as a community, over the last five years.

Acknowledgments for the Second Edition (2017)

It has been five years almost to the day since I completed the manuscript for this book’s first edition in July 2012. A lot has changed. The Python community has grown immensely, and the ecosystem of open source software around it has flourished.


This new edition of the book would not exist if not for the tireless efforts of the pandas core developers, who have grown the project and its user community into one of the cornerstones of the Python data science ecosystem. These include, but are not limited to, Tom Augspurger, Joris van den Bossche, Chris Bartak, Phillip Cloud, gfyoung, Andy Hayden, Masaaki Horikoshi, Stephan Hoyer, Adam Klein, Wouter Overmeire, Jeff Reback, Chang She, Skipper Seabold, Jeff Tratner, and y-p.

On the actual writing of this second edition, I would like to thank the O’Reilly staff who helped me patiently with the writing process. This includes Marie Beaugureau, Ben Lorica, and Colleen Toporek. I again had outstanding technical reviewers, with Tom Augspurger, Paul Barry, Hugh Brown, Jonathan Coe, and Andreas Müller contributing. Thank you.

This book’s first edition has been translated into many foreign languages, including Chinese, French, German, Japanese, Korean, and Russian. Translating all this content and making it available to a broader audience is a huge and often thankless effort. Thank you for helping more people in the world learn how to program and use data analysis tools.

I am also lucky to have had support for my continued open source development efforts from Cloudera and Two Sigma Investments over the last few years. With open source software projects more thinly resourced than ever relative to the size of user bases, it is becoming increasingly important for businesses to provide support for development of key open source projects. It’s the right thing to do.

Acknowledgments for the First Edition (2012)

It would have been difficult for me to write this book without the support of a large number of people.

On the O’Reilly staff, I’m very grateful for my editors, Meghan Blanchette and Julie Steele, who guided me through the process. Mike Loukides also worked with me in the proposal stages and helped make the book a reality.

I received a wealth of technical review from a large cast of characters. In particular, Martin Blais and Hugh Brown were incredibly helpful in improving the book’s examples, clarity, and organization from cover to cover. James Long, Drew Conway, Fernando Pérez, Brian Granger, Thomas Kluyver, Adam Klein, Josh Klein, Chang She, and Stéfan van der Walt each reviewed one or more chapters, providing pointed feedback from many different perspectives.

I got many great ideas for examples and datasets from friends and colleagues in the data community, among them: Mike Dewar, Jeff Hammerbacher, James Johndrow, Kristian Lum, Adam Klein, Hilary Mason, Chang She, and Ashley Williams.


into a structured form. As an example, a collection of news articles could be processed into a word frequency table, which could then be used to perform sentiment analysis.
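The word frequency table described above can be sketched with the standard library alone. This is an illustrative example, not the book's code: the three `articles` strings are made-up placeholders, `word_frequencies` is a hypothetical helper, and the letters-only regular expression is a deliberately crude tokenizer. The sentiment analysis step that would consume the table is not shown.

```python
import re
from collections import Counter

# Hypothetical stand-ins for a collection of news articles.
articles = [
    "Markets rallied today as tech stocks gained.",
    "Tech stocks fell sharply; markets ended lower.",
    "Analysts expect markets to remain volatile.",
]


def word_frequencies(docs):
    """Count lowercase word tokens across all documents."""
    counts = Counter()
    for doc in docs:
        # Crude tokenizer: lowercase the text, then keep runs of letters a-z.
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    return counts


freq = word_frequencies(articles)

# Print the table as (word, count) rows, most frequent first.
for word, count in freq.most_common(5):
    print(f"{word:<10}{count}")
```

The resulting `Counter` maps each word to its count, which is exactly the kind of structured table the unstructured text was turned into; it can also be loaded into a pandas object if further tabular processing is needed.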

Most users of spreadsheet programs like Microsoft Excel, perhaps the most widely used data analysis tool in the world, will not be strangers to these kinds of data.

1.2 Why Python for Data Analysis?

For many people, the Python programming language has strong appeal. Since its first appearance in 1991, Python has become one of the most popular interpreted programming languages, along with Perl, Ruby, and others. Python and Ruby have become especially popular since 2005 or so for building websites using their numerous web frameworks, like Rails (Ruby) and Django (Python). Such languages are often called scripting languages, as they can be used to quickly write small programs, or scripts to automate other tasks. I don’t like the term “scripting language,” as it carries a connotation that they cannot be used for building serious software. Among interpreted languages, for various historical and cultural reasons, Python has developed a large and active scientific computing and data analysis community. In the last 10 years, Python has gone from a bleeding-edge or “at your own risk” scientific computing language to one of the most important languages for data science, machine learning, and general software development in academia and industry.

For data analysis, interactive computing, and data visualization, Python will inevitably draw comparisons with other open source and commercial programming languages and tools in wide use, such as R, MATLAB, SAS, Stata, and others. In recent years, Python’s improved support for libraries (such as pandas and scikit-learn) has made it a popular choice for data analysis tasks. Combined with Python’s overall strength for general-purpose software engineering, it is an excellent option as a primary language for building data applications.

Python as Glue

Part of Python’s success in scientific computing is the ease of integrating C, C++, and FORTRAN code. Most modern computing environments share a similar set of legacy FORTRAN and C libraries for doing linear algebra, optimization, integration, fast Fourier transforms, and other such algorithms. The same story has held true for many companies and national labs that have used Python to glue together decades’ worth of legacy software.
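To illustrate Python acting as glue over compiled numerical libraries, the sketch below calls NumPy's `numpy.linalg` and `numpy.fft` modules. It assumes NumPy is installed. `numpy.linalg` typically dispatches to a compiled LAPACK implementation (reference LAPACK is written in Fortran, though which build you get depends on how NumPy was installed), and recent NumPy releases implement `numpy.fft` in compiled code rather than pure Python. The matrix and signal are arbitrary illustrative values; the same calls also cover the matrix multiplication, eigenvalue decomposition, and singular value decomposition mentioned in the summary.

```python
import numpy as np

# Matrix and right-hand side chosen so the exact solution is x = 2, y = 3.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Solve A @ x = b; numpy.linalg hands this to a compiled LAPACK routine.
x = np.linalg.solve(A, b)

# Matrix multiplication with the @ operator checks the solution.
residual = A @ x - b

# Eigenvalue and singular value decompositions, also LAPACK-backed.
eigenvalues, eigenvectors = np.linalg.eig(A)
U, s, Vt = np.linalg.svd(A)

# Fast Fourier transform of a short signal, computed in compiled code.
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)

print("solution:", x)
print("eigenvalues:", eigenvalues)
print("singular values:", s)
print("spectrum:", spectrum)
```

Here the Python layer only builds arrays and routes them to compiled routines; the heavy numerical work happens outside the interpreter, which is the division of labor the paragraph describes.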

Many programs consist of small portions of code where most of the time is spent, with large amounts of “glue code” that doesn’t run often. In many cases, the execution time of the glue code is insignificant; effort is most fruitfully invested in optimizing
