Python in Practice: Using Machine Learning to Meet the Challenges of the Data Deluge
Updated 2024-07-19 · PDF, 6.7 MB
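Machine Learning in Action blends theory and practice, and it explains why machine learning matters in today's data-driven world. Data has been called the "raw material" of the new era, a vast and undifferentiated mass, and machine learning is the key tool for working through this sea of information. The author, Peter Harrington, shows how businesses can scale their data analysis well beyond what individual experts can achieve by turning the core algorithms of statistical data analysis, data processing, and visualization into reusable computer code.

The core of the book is about using the Python programming language to build practical tools for classification, forecasting, and recommendation, as well as higher-level features such as summarization and simplification. Through many worked examples, readers learn key concepts such as classification, numeric prediction, and clustering. They also meet important algorithms such as Apriori, which identifies association patterns in large datasets, and AdaBoost, a meta-algorithm that can improve the efficiency of many machine learning tasks.

The book does not just teach theory. It emphasizes hands-on practice, helping readers progress from the basics to more advanced techniques and learn to understand and communicate the complex information that data mining uncovers. It suits professionals interested in machine learning, whether they are data scientists, engineers, or business decision-makers who want to strengthen their data analysis skills.

Readers can get a discount by buying the book through the Manning website, or can contact the special sales department for more information. No part of the book may be reproduced, stored, or transmitted in any form without the publisher's prior written permission. Where designations used by manufacturers and sellers to distinguish their products are claimed as trademarks, they are marked as such in the book.

Machine Learning in Action is a tutorial with both depth and practical value, and a valuable reference for anyone who wants to stand out in the data-driven era. By reading the book and working through its examples, readers build a solid foundation for applying machine learning in real business settings.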
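The description names Apriori as the tool for finding association patterns in large datasets. The sketch below illustrates the idea: it grows candidate itemsets one level at a time and prunes any candidate that has an infrequent subset. This relies on the Apriori property, which says every subset of a frequent itemset must itself be frequent. It is a minimal illustration written for this page, not the book's own code. The function name `apriori`, the `min_support` parameter (a fraction of transactions), and the toy basket data are all assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support."""
    tx_sets = [frozenset(t) for t in transactions]
    n = len(tx_sets)

    def frequent_among(candidates):
        # One pass over the data: count the transactions that contain each candidate.
        counts = {c: 0 for c in candidates}
        for t in tx_sets:
            for c in candidates:
                if c <= t:  # candidate is a subset of this transaction
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Level 1: every single item is a candidate.
    current = frequent_among({frozenset([item]) for t in tx_sets for item in t})
    frequent = dict(current)
    size = 2
    while current:
        # Join step: merge pairs of frequent (size-1)-itemsets into size-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step (Apriori property): drop any candidate with an infrequent
        # (size-1)-subset, because such a candidate cannot be frequent itself.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        current = frequent_among(candidates)
        frequent.update(current)
        size += 1
    return frequent


if __name__ == "__main__":
    # Toy market-basket data, made up for illustration only.
    baskets = [
        ["bread", "milk"],
        ["bread", "diapers", "beer", "eggs"],
        ["milk", "diapers", "beer", "cola"],
        ["bread", "milk", "diapers", "beer"],
        ["bread", "milk", "diapers", "cola"],
    ]
    result = apriori(baskets, 0.6)
    for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

With `min_support=0.6` the toy data yields four frequent single items (bread, milk, diapers, beer) and four frequent pairs, such as {beer, diapers}. It yields no frequent triple, because {bread, milk, diapers} appears in only two of the five baskets. Counting with a full pass per level keeps the sketch short. A production version would typically use a more efficient counting structure.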
CONTENTS
15.6 Example: the Pegasos algorithm for distributed SVMs
  The Pegasos algorithm ■ Training: MapReduce support vector machines with mrjob
15.7 Do you really need MapReduce?
15.8 Summary
appendix A Getting started with Python
appendix B Linear algebra
appendix C Probability refresher
appendix D Resources
index
preface
After college I went to work for Intel in California and mainland China. Originally my plan was to go back to grad school after two years, but time flies when you are having fun, and two years turned into six. At that point I realized I had to go back. I didn’t want to do night school or online learning; I wanted to sit on campus and soak up everything a university has to offer. The best part of college is not the classes you take or the research you do, but the peripheral things: meeting people, going to seminars, joining organizations, dropping in on classes, and learning what you don’t know.
Sometime in 2008 I was helping set up for a career fair. I began talking to someone from a large financial institution, and they wanted me to interview for a position modeling credit risk (figuring out whether someone is going to pay off their loans). They asked me how much stochastic calculus I knew. At the time, I wasn’t sure I knew what the word stochastic meant. They were hiring for a geographic location my body couldn’t tolerate, so I decided not to pursue it any further. But this stochastic stuff interested me, so I went to the course catalog and looked for any class with the word “stochastic” in its title. The class I found was “Discrete-time Stochastic Systems.” I started attending the class without registering, doing the homework and taking tests. Eventually the professor noticed me, and she was kind enough to let me continue, for which I am very grateful. This class was the first time I saw probability applied to an algorithm. I had seen algorithms take an averaged value as input before, but this was different: the variance and mean were internal values in these algorithms. The course was about time-series data, where every data point is a regularly spaced sample. I found another course with Machine Learning in the title. In this class the
data was not assumed to be uniformly spaced in time, and the course covered more algorithms but with less rigor. I later realized that similar methods were also being taught in the economics, electrical engineering, and computer science departments.
In early 2009, I graduated and moved to Silicon Valley to start work as a software consultant. Over the next two years, I worked with eight companies on a very wide range of technologies and saw two trends emerge that make up the major thesis of this book. First, to develop a compelling application you need to do more than just connect data sources. Second, employers want people who understand theory and can also program.
A large portion of a programmer’s job can be compared to connecting pipes, except that instead of pipes, programmers connect the flow of data. Monstrous fortunes have been made doing exactly that. Let me give you an example. Suppose you build an application that sells things online. The big picture is to give people a way to post things and to view what others have posted. To do this, you could create a web form that lets users enter data about what they are selling, and this data would then be shipped off to a data store. For other users to see what a user is selling, you would have to ship the data out of the data store and display it appropriately. I’m sure people will continue to make money this way; however, to make the application really good, you need to add a level of intelligence. This intelligence could do things like automatically remove inappropriate postings, detect fraudulent transactions, direct users to things they might like, and forecast site traffic. To accomplish these objectives, you would need to apply machine learning. The end user would not know that there is magic going on behind the scenes; to them your application “just works,” which is the hallmark of a well-built product.
An organization may choose to hire a group of theoretical people, or “thinkers,” and a set of practical people, or “doers.” The thinkers may have spent a lot of time in academia, and their day-to-day job may be pulling ideas from papers and modeling them with very high-level tools or mathematics. The doers interface with the real world by writing the code and dealing with the imperfections of a non-ideal world, such as machines that break down or noisy data. Separating thinkers from doers is a bad idea, and successful organizations realize this. (One of the tenets of lean manufacturing is for the thinkers to get their hands dirty with actual doing.) When there is a limited amount of money to spend on hiring, who will get hired more readily: the thinker or the doer? Probably the doer, but in reality employers want both. Things need to get built, but when applications call for more demanding algorithms, it is useful to have someone who can read papers, pull out the idea, implement it in real code, and iterate.
I didn’t see a book that addressed the problem of bridging the gap between thinkers and doers in the context of machine learning algorithms. The goal of this book is to fill that void and, along the way, to introduce uses of machine learning algorithms so that the reader can build better applications.
acknowledgments
This is by far the easiest part of the book to write...
First, I would like to thank the folks at Manning. Above all, I would like to thank
my editor Troy Mott; if not for his support and enthusiasm, this book never would
have happened. I would also like to thank Maureen Spencer, who helped polish my
prose in the final manuscript; she was a pleasure to work with.
Next I would like to thank Jennie Si at Arizona State University for letting me sneak into her class on discrete-time stochastic systems without registering. I would also like to thank Cynthia Rudin at MIT for pointing me to the paper “Top 10 Algorithms in Data Mining,”¹ which inspired the approach I took in this book. For indirect contributions I would like to thank Mark Bauer, Jerry Barkely, Jose Zero, Doug Chang, Wayne Carter, and Tyler Neylon.
Special thanks to the following peer reviewers who read the manuscript at different stages during its development and provided invaluable feedback: Keith Kim,
Franco Lombardo, Patrick Toohey, Josef Lauri, Ryan Riley, Peter Venable, Patrick
Goetz, Jeroen Benckhuijsen, Ian McAllister, Orhan Alkan, Joseph Ottinger, Fred Law,
Karsten Strøbæk, Brian Lau, Stephen McKamey, Michael Brennan, Kevin Jackson,
John Griffin, Sumit Pal, Alex Alves, Justin Tyler Wiley, and John Stevenson.
My technical proofreaders, Tricia Hoffman and Alex Ott, reviewed the technical content shortly before the manuscript went to press, and I would like to thank them
¹ Xindong Wu et al., “Top 10 Algorithms in Data Mining,” Knowledge and Information Systems 14, no. 1 (December 2007).