Hands-On Data Science: Implementing First Principles in Python


"《Data Science from Scratch : First Principles with Python》是Joel Grus撰写的一本书，旨在帮助读者深入理解数据科学的基础知识，通过从零开始实现数据科学工具和算法来学习。书中涵盖了Python编程、线性代数、统计学和概率论的基础，并教授如何处理和分析数据，涉及机器学习的基本概念，包括k-近邻算法、朴素贝叶斯、线性回归、逻辑回归、决策树、神经网络和聚类等。此外，还探讨了推荐系统、自然语言处理、网络分析、MapReduce和数据库等相关主题。" 在数据科学领域，掌握基础知识至关重要。这本书首先引导读者熟悉Python这一广泛用于数据科学的编程语言。Python因其简洁的语法和丰富的库而成为数据科学的首选工具，对于初学者来说是理想的起点。接着，作者讲解了线性代数，这是理解许多高级数据科学概念（如矩阵运算和特征向量）的基础。同时，统计学和概率论是数据科学的核心，它们帮助我们理解数据的分布、关联性和随机性，以及如何基于数据进行推断。书中还涵盖了数据预处理的步骤，包括数据收集、探索、清洗、整理和操纵，这些都是实际数据分析项目中不可或缺的部分。数据清洗尤其重要，因为真实世界的数据往往存在缺失值、异常值和不一致性，需要经过处理才能用于后续分析。在机器学习部分，作者介绍了多种常用模型，例如k-Nearest Neighbors (KNN) 是一种基于实例的学习，适用于分类和回归任务；Naive Bayes 则基于贝叶斯定理，常用于文本分类；线性回归和逻辑回归则分别用于连续变量和二分类问题的预测；决策树是一种易于理解和解释的模型，适用于分类和回归任务；神经网络则为复杂问题提供了强大的模型能力；而聚类算法则用于无监督学习，将数据集划分为相似的组。除此之外，书中还涉及了推荐系统，这是大数据和个性化服务的关键技术，用于预测用户可能感兴趣的内容。自然语言处理（NLP）使计算机能够理解和生成人类语言，这对于文本分析和情感分析等领域至关重要。网络分析则关注节点和边构成的关系网络，可用于社交网络、信息传播等领域。MapReduce是大数据处理的一种分布式计算模型，常与Hadoop配合使用，处理大规模数据集。最后，数据库章节将介绍如何存储和查询大量数据，包括关系型数据库和NoSQL数据库的应用。《Data Science from Scratch》是一本适合有一定数学基础和编程经验的学习者入门数据科学的书籍，它不仅讲解了理论知识，还提供了实际动手实践的机会，帮助读者建立起扎实的数据科学基础。通过阅读和实践，读者将具备挖掘数据背后信息的能力，为成为一名合格的数据科学家做好准备。

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/joelgrus/data-science-from-scratch.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Data Science from Scratch by Joel

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

Safari® Books Online

Safari Books Online is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business.


matter how you define data science, you’ll find practitioners for whom the definition is totally, absolutely wrong.

Nonetheless, we won’t let that stop us from trying. We’ll say that a data scientist is someone who extracts insights from messy data. Today’s world is full of people trying to turn data into insight.

For instance, the dating site OkCupid asks its members to answer thousands of questions in order to find the most appropriate matches for them. But it also analyzes these results to figure out innocuous-sounding questions you can ask someone to find out how likely someone is to sleep with you on the first date.

Facebook asks you to list your hometown and your current location, ostensibly to make it easier for your friends to find and connect with you. But it also analyzes these locations to identify global migration patterns and where the fanbases of different football teams live.

As a large retailer, Target tracks your purchases and interactions, both online and in-store. And it uses the data to predictively model which of its customers are pregnant, to better market baby-related purchases to them.

In 2012, the Obama campaign employed dozens of data scientists who data-mined and experimented their way to identifying voters who needed extra attention, choosing optimal donor-specific fundraising appeals and programs, and focusing get-out-the-vote efforts where they were most likely to be useful. It is generally agreed that these efforts played an important role in the president’s re-election, which means it is a safe bet that political campaigns of the future will become more and more data-driven, resulting in a never-ending arms race of data science and data collection.

Now, before you start feeling too jaded: some data scientists also occasionally use their skills for good, using data to make government more effective, to help the homeless, and to improve public health. But it certainly won’t hurt your career if you like figuring out the best way to get people to click on advertisements.

Motivating Hypothetical: DataSciencester

Congratulations! You’ve just been hired to lead the data science efforts at DataSciencester, the social network for data scientists.

Despite being for data scientists, DataSciencester has never actually invested in building its own data science practice. (In fairness, DataSciencester has never really invested in building its product either.) That will be your job! Throughout the book, we’ll be learning about data science concepts by solving problems that you encounter at work. Sometimes we’ll look at data explicitly supplied by users, sometimes we’ll look at data generated through their interactions with the site, and sometimes we’ll even look at data from experiments that we’ll design.
