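Introduction to Data Science with Python: A Complete Tour of Data Processing
Data Science Essentials in Python is a well-regarded introductory text that covers the full workflow of data analysis with Python. It is practice-oriented: each exercise reads like an engaging challenge designed to build the reader's data-handling skills. For beginners and aspiring data scientists it is a must-read. Peter Hampton of Ulster University praises the book for quickly bringing readers up to speed on the common tasks and tools of data science, including scraping, cleaning, analyzing, and storing data. Beyond providing a technical foundation, it helps readers spend more time on actual research rather than on endlessly exploring technical details. Jason Montojo, coauthor of Practical Programming: An Introduction to Computer Science Using Python 3, also rates the book highly. He notes that it is especially suited to people who are curious about problem solving and passionate about discovery in data; it introduces techniques and tools in an accessible way, and its carefully designed examples and exercises keep it practical and readable. Lokesh Kumar Makani, a CASB expert at Skyhigh Networks, is also listed as endorsing the book, although the text of his review is not given here. The book is a good starting point for learners and practitioners of data science, novice or more advanced, who want a solid foundation and practical skills.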
to this differentiation, in this book I, too, use single quotes for single characters
and double quotes for character strings.
The Book Forum
The community forum for this book can be found online at the Pragmatic
Programmers web page for this book.[5] There you can ask questions, post
comments, and submit errata.
Another great resource for questions and answers (not specific to this book)
is the newly created Data Science Stack Exchange forum.[6]
Your Turn
The end of each chapter features a unit called “Your Turn.” This unit has
descriptions of several projects that you may want to accomplish on your own
(or with someone you trust) to strengthen your understanding of the material.
The projects marked with a single star (*) are the simplest. All you need to work
on them is solid knowledge of the functions mentioned in the preceding
chapters. Expect to complete single-star projects in no more than thirty
minutes. You’ll find solutions to them in Appendix 2, Solutions to Single-Star
Projects, on page 175.
The projects marked with two stars (**) are hard(er). They may take you an hour
or more, depending on your programming skills and habits. Two-star projects
involve the use of intermediate data structures and well thought-out algorithms.
Finally, the three-star (***) projects are the hardest. Some of the three-star projects
may not even have a perfect solution, so don’t get desperate if you cannot find
one! Just by working on these projects, you certainly make yourself a better
programmer and a better data scientist. And if you’re an educator, think of
the three-star projects as potential mid-semester assignments.
Now, let’s get started!
Dmitry Zinoviev
dzinoviev@gmail.com
August 2016
5. pragprog.com/book/dzpyds
6. datascience.stackexchange.com
Preface • xvi
report erratum • discuss
CHAPTER 1
It’s impossible to grasp the boundless.
➤ Kozma Prutkov, Russian author
What Is Data Science?
I’m sure you already have an idea about what data science is, but it never
hurts to be reminded! Data science is the discipline of extracting knowledge
from data. It relies on computer science (for data structures, algorithms,
visualization, big data support, and general programming), statistics (for
regressions and inference), and domain knowledge (for asking questions and
interpreting results).
Data science traditionally concerns itself with a number of dissimilar topics,
some of which you may already be familiar with and some of which you’ll
encounter in this book:
• Databases, which provide information storage and integration. You’ll find
information about relational databases and document stores in Chapter
4, Working with Databases, on page 47.
• Text analysis and natural language processing, which let us “compute
with words” by translating qualitative text into quantitative variables.
Interested in tools for sentiment analysis? Look no further than Unit 16,
Processing Texts in Natural Languages, on page 38.
• Numeric data analysis and data mining, which search for consistent patterns
and relationships between variables. These are the subjects of
Chapter 5, Working with Tabular Numeric Data, on page 63 and Chapter
6, Working with Data Series and Frames, on page 83.
• Complex network analysis, which is not complex at all. It is about complex
networks: collections of arbitrary interconnected entities. Chapter 7,
Working with Network Data, on page 121, makes complex network analysis
simpler.
• Data visualization, which is not just cute but is extremely useful, especially
when it comes to persuading your data sponsor to sponsor you again. If
one picture is worth a thousand words, then Chapter 8, Plotting, on page
135, is worth the rest of the book.
• Machine learning (including clustering, decision trees, classification, and
neural networks), which attempts to get computers to “think” and make
predictions based on sample data. Chapter 10, Machine Learning, on page
157, explains how.
• Time series processing and, more generally, digital signal processing, which
are indispensable tools for stock market analysts, economists, and
researchers in audio and video domains.
• Big data analysis, which typically refers to the analysis of unstructured
data (text, audio, video) in excess of one terabyte, produced and captured
at high frequency. Big data is simply too big to fit in this book, too.
Regardless of the analysis type, data science is firstly science and only then
sorcery. As such, it is a process that follows a pretty rigorous basic sequence
that starts with data acquisition and ends with a report of the results. In this
chapter, you’ll take a look at the basic processes of data science: the steps
of a typical data analysis study, where to acquire data, and the structure of
a typical project report.
Unit 1
Data Analysis Sequence
The steps of a typical data analysis study are generally consistent with a
general scientific discovery sequence.
Your data science discovery starts with the question to be answered and the
type of analysis to be applied. The simplest analysis type is descriptive, where
the data set is described by reporting its aggregate measures, often in a
visual form. No matter what you do next, you have to at least describe the
data! During exploratory data analysis, you try to find new relationships
between existing variables. If you have a small data sample and would like
to describe a bigger population, statistics-based inferential analysis is right
for you. A predictive analyst learns from the past to predict the future. Causal
analysis identifies variables that affect each other. Finally, mechanistic data
analysis explores exactly how one variable affects another variable.
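As a rough illustration of the first two analysis types, the sketch below computes aggregate measures of one variable (descriptive analysis) and a correlation between two variables (exploratory analysis). The sample data and the function names `describe` and `pearson` are made up for this example; only the standard library is used.

```python
import math
import statistics

# Hypothetical sample: hours studied and exam scores for six students.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [52.0, 55.0, 61.0, 68.0, 70.0, 78.0]

def describe(values):
    """Descriptive analysis: report aggregate measures of one variable."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

def pearson(xs, ys):
    """Exploratory analysis: linear correlation between two variables."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

summary = describe(scores)
r = pearson(hours, scores)
print(summary)
print(f"correlation(hours, scores) = {r:.3f}")
```

A correlation close to 1 only suggests a linear relationship in this sample; it says nothing about cause, which is the business of causal and mechanistic analysis.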
However, your analysis is only as good as the data you use. What is the ideal
data set? What data has the answer to your question in an ideal world? By
the way, the ideal data set may not exist at all or be hard or infeasible to
obtain. Things happen, but perhaps a smaller or not so feature-rich data set
would still work?
Fortunately, getting the raw data from the web or from a database is not that
hard, and there are plenty of Python tools that assist with downloading and
deciphering it. You’ll take a closer look in Unit 2, Data Acquisition Pipeline,
on page 5.
In this imperfect world, there is no perfect data. “Dirty” data has missing
values, outliers, and other “non-standard” items. Some examples of “dirty”
data are birth dates in the future, negative ages and weights, and email
addresses not intended for use (noreply@). Once you obtain the raw data, the
next step is to use data-cleaning tools and your knowledge of statistics to
regularize the data set.
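The kinds of "dirty" values listed above can be flagged with a few simple checks. The following is a minimal sketch, not a general cleaning tool: the record layout, the field names (`email`, `born`, `weight_kg`), and the helper `problems` are assumptions made for illustration.

```python
from datetime import date

# Hypothetical raw records; the field names are assumptions for illustration.
raw = [
    {"email": "ann@example.com", "born": date(1990, 5, 17), "weight_kg": 61.5},
    {"email": "noreply@example.com", "born": date(1985, 1, 2), "weight_kg": 80.0},
    {"email": "bob@example.com", "born": date(2999, 1, 1), "weight_kg": 75.0},
    {"email": "eve@example.com", "born": date(1978, 9, 30), "weight_kg": -4.0},
    {"email": "dan@example.com", "born": date(2001, 3, 8), "weight_kg": None},
]

def problems(record, today):
    """Return the reasons a record looks dirty (an empty list if it is clean)."""
    found = []
    if record["email"].lower().startswith("noreply@"):
        found.append("address not intended for use")
    if record["born"] > today:
        found.append("birth date in the future")
    weight = record["weight_kg"]
    if weight is None:
        found.append("missing weight")
    elif weight <= 0:
        found.append("non-positive weight")
    return found

today = date(2016, 8, 1)  # fixed reference date so the result is reproducible
clean = [rec for rec in raw if not problems(rec, today)]
dirty = [(rec["email"], problems(rec, today)) for rec in raw if problems(rec, today)]

print(len(clean), "clean record(s)")
for email, reasons in dirty:
    print(email, "->", ", ".join(reasons))
```

Whether a flagged record should be dropped, corrected, or imputed depends on the question being asked; this sketch only separates the suspicious records from the rest.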
With clean data in your files, you then perform descriptive and exploratory
analysis. The output of this step often includes scatter plots (mentioned on
page 143), histograms, and statistical summaries (explained on page 150). They
give you a smell and sense of data—an intuition that is indispensable for
further research, especially if the data set has many dimensions.
And now you are just one step away from prognosticating. Your tools of the
trade are data models that, if properly trained, can learn from the past and
predict the future. Don’t forget about assessing the quality of the constructed
models and their prediction accuracy!
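To make "learn from the past, predict the future, and assess the quality" concrete, here is a small sketch that fits a straight line to earlier observations and measures its error on later ones that the model has not seen. The data, the choice of a least-squares line, and the names `fit_line` and `mean_abs_error` are assumptions for illustration; a real study would pick the model and the quality metric to suit the problem.

```python
import statistics

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def mean_abs_error(model, xs, ys):
    """Average absolute difference between predictions and actual values."""
    slope, intercept = model
    errors = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    return statistics.mean(errors)

# Hypothetical observations: the "past" trains the model,
# the "future" is held out to assess prediction accuracy.
past_x, past_y = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 8.1, 9.8, 12.1]
future_x, future_y = [7, 8], [14.2, 15.9]

model = fit_line(past_x, past_y)
train_error = mean_abs_error(model, past_x, past_y)
test_error = mean_abs_error(model, future_x, future_y)

print(f"slope = {model[0]:.3f}, intercept = {model[1]:.3f}")
print(f"error on past data:   {train_error:.3f}")
print(f"error on future data: {test_error:.3f}")
```

Comparing the two errors is the point of the exercise: a model that fits the past well but does poorly on held-out data has not really learned anything useful.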
At this point you take your statistician and programmer hats off and put a
domain expert hat on. You’ve got some results, but are they domain-significant?
In other words, does anyone care about them and do they make any
difference? Pretend that you’re a reviewer hired to evaluate your own work.
What did you do right, what did you do wrong, and what would you do better
or differently if you had another chance? Would you use different data, run
different types of analysis, ask a different question, or build a different model?
Someone is going to ask these questions—it’s better if you ask them first.
Start looking for the answers when you are still deeply immersed in the context.
Last, but not least, you have to produce a report that explains how and why
you processed the data, what models were built, and what conclusions and
predictions are possible. You’ll take a look at the report structure at the end
of this chapter in Unit 3, Report Structure, on page 7.
As your companion to select areas of data science in the Python language,
this book’s focus is mainly on the earlier, least formalized, and most creative
steps of a typical data analysis sequence: getting, cleaning, organizing, and
sizing the data. Data modeling, including predictive data modeling, is barely
touched. (It would be unfair to leave data modeling out completely, because
that’s where the real magic happens!) In general, interpreting, challenging,
and reporting results are very domain-specific and belong in specialized texts.