Python Data Mining Guide: Discovering Patterns Hidden in Data

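"Mastering Data Mining with Python - Find patterns hidden in your data" is a professional book by Megan Squire, published by Packt Publishing. It aims to help readers use Python for advanced data analysis and build more powerful data mining applications. The book introduces the core concepts and techniques of data mining in an accessible way, so that readers can explore and understand the complex patterns hidden in everyday data. It covers Python tools and methods for data mining, including but not limited to data preprocessing, feature selection, cluster analysis, association rule mining, classification algorithms, and deep learning. Readers learn the underlying theory as well as the practical steps needed to apply it in real projects. The book is copyright 2016 Packt Publishing; no part of it may be reproduced, stored, or transmitted in any form without prior written permission. Although the author and publisher have tried to ensure the accuracy of the information, it is provided without warranty, either express or implied, and neither accepts liability for damages caused directly or indirectly by the book. This practical guide is suited to data analysts, data scientists, machine learning engineers, and anyone interested in data analysis with Python. Beginners and experienced professionals alike can find useful knowledge and hands-on experience here to strengthen their data mining skills.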

Preface


Chapter 4, Network Analysis, is a tour through the basics of network or graph analysis, as used to describe the relationships between various interconnected groups of entities. We investigate the various types of network and learn how to describe and measure them. Then we put our learning into practice to describe how a network of software developers has changed over time.
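To make "describe and measure" concrete, here is a minimal sketch of two common network measures, node degree and graph density, computed for two snapshots of a collaboration network. It uses plain dictionaries rather than NetworkX so that it runs without extra packages; the developer names and edge lists are hypothetical and are not taken from the book's data.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs (no self-loops assumed)."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree(graph):
    """Number of neighbors for each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def density(graph):
    """Fraction of possible undirected edges that are actually present."""
    n = len(graph)
    if n < 2:
        return 0.0
    edge_count = sum(len(neighbors) for neighbors in graph.values()) / 2
    return edge_count / (n * (n - 1) / 2)


# Hypothetical snapshots of who worked with whom in two different years.
edges_2015 = [("ana", "bo"), ("bo", "cy")]
edges_2016 = [("ana", "bo"), ("bo", "cy"), ("ana", "cy"), ("cy", "dee")]

for year, edges in (("2015", edges_2015), ("2016", edges_2016)):
    g = build_graph(edges)
    print(year, degree(g), round(density(g), 3))
```

Comparing the per-node degrees across snapshots is one simple way to see how a network changes over time, for example which developers gain collaborators from one year to the next.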

Chapter 5, Sentiment Analysis in Text, is the first of four text mining chapters in this book. This chapter serves as an introduction to the growing field of sentiment, or mood, analysis in text. After comparing various approaches to sentiment mining and learning how to evaluate the results, we practice using a machine learning classifier to determine the sentiment of a set of software developer chat logs and e-mail logs.

Chapter 6, Named Entity Recognition in Text, is about finding proper nouns and proper names in text. We spend some time learning why this task is useful, and why finding named entities can sometimes be more difficult than it sounds. At the end of the chapter we implement a named entity recognition system on several different types of real-world text data including e-mail, chat logs, and board meeting minutes. Along the way we apply different techniques for quantifying the success or failure of our results.

Chapter 7, Automatic Text Summarization, presents several strategies for automatically creating condensed summaries of text. This chapter emphasizes extractive summarization tools, which are designed to find the most important sentences in a text sample. To this end, we experiment with three different tools for accomplishing this goal, testing the summarization methods, and learning how they differ. Following the introduction of each tool, we attempt to summarize a common set of text documents and compare the results.
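The idea behind extractive summarization can be sketched in a few lines: score each sentence and keep the highest-scoring ones verbatim. The example below is a simple word-frequency scorer written for illustration only; it is not one of the three tools used in the chapter, and the stopword list, function names, and sample text are all assumptions.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real tools use much larger ones.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "was"}


def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=1):
    """Return the top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Average document-wide frequency of the sentence's words.
        return sum(freq[w] for w in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]


sample = ("Data mining finds patterns in data. The weather was nice. "
          "Mining data reveals hidden patterns.")
print(summarize(sample, max_sentences=1))
```

Because the output sentences are copied from the input rather than generated, this is extraction, not abstraction; different tools mainly differ in how they compute the sentence score.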

Chapter 8, Topic Modeling in Text, shows how to use software tools to reveal what topics or concepts are present in a given text. Can we train a computer program to infer the themes that are present in large amounts of text? In a series of experiments, we learn how to use common topic modeling libraries to reveal the topics present in software developer e-mails, and how those topics change over time.

Chapter 9, Mining for Data Anomalies, is where we learn how to use data mining and statistical techniques to improve our own data mining process. While all of the other chapters in this book deal with finding different types of patterns in data, here we focus on finding data that is anomalous or that does not match a particular pattern. Whether it is because the data is empty, missing, or just plain weird, this chapter presents strategies for finding or fixing this type of data so that the rest of your data can be mined more effectively.
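As a rough illustration of the two kinds of problem data mentioned here, the sketch below flags records with empty or missing fields and flags numeric values that sit far from the rest. The outlier check uses the modified z-score based on the median absolute deviation, a common rule of thumb that is an assumption of this example rather than the chapter's stated method; the field name and sample values are hypothetical.

```python
import statistics


def find_missing(rows, field):
    """Indexes of rows where the field is absent, None, or a blank string."""
    missing = []
    for i, row in enumerate(rows):
        value = row.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(i)
    return missing


def find_outliers(values, threshold=3.5):
    """Values whose modified z-score (median/MAD based) exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # No spread around the median, so this rule cannot rank anything.
        return []
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


# Hypothetical records and measurements.
records = [{"email": "a@example.org"}, {"email": "  "}, {"email": None}, {}]
message_counts = [10, 12, 11, 13, 12, 95]

print(find_missing(records, "email"))
print(find_outliers(message_counts))
```

Using the median instead of the mean keeps a single extreme value from hiding itself by inflating the average and the standard deviation.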


What you need for this book

To complete the projects in this book, you will need Python 3.5 or higher. I recommend using Anaconda Python, but any Python distribution will do as long as it is up to date and contains the following packages: NumPy, Matplotlib, NetworkX, PyMySQL, Gensim, and NLTK. In Chapter 1, Expanding Your Data Mining Toolbox, we will walk through an easy installation of Python and all these libraries, and each time a library is used later in the book, we will install it or upgrade it together.
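A quick way to see whether an existing Python installation meets these requirements is to test the interpreter version and check whether each package can be located. The sketch below is not from the book; the function name `check_environment` is made up for this example, and the list uses the import names of the six packages named above.

```python
import importlib.util
import sys

# Import names for the packages listed above.
REQUIRED = ["numpy", "matplotlib", "networkx", "pymysql", "gensim", "nltk"]


def check_environment(packages=REQUIRED, min_python=(3, 5)):
    """Report whether Python is new enough and which packages can be found."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in packages:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for name, ok in check_environment().items():
    print(name, "OK" if ok else "MISSING")
```

Any package reported as MISSING can then be installed with the distribution's own package manager before starting the projects.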

Because data mining is data-centric, and because the data sets we are working with are sometimes large or require some type of persistent data storage, I chose to implement some of the data mining algorithms alongside a relational database system. I chose MySQL for this since it is an established piece of infrastructure that is easy to download and install. MySQL comes into play with the memory-intensive algorithms in Chapter 2, Association Rule Mining, and Chapter 3, Entity Matching. I also use MySQL for some of the examples in Chapter 9, Mining for Data Anomalies, but it is possible to go through that chapter without MySQL.

Who this book is for

If you picked up a book on mastering data mining, you are probably familiar with the basics of data analysis and you have likely experimented with machine learning techniques such as regression, decision trees, classification, and cluster analysis. If you have intermediate experience with Python, understand basic relational database terminology, have some exposure to basic statistics, and can understand the rudiments of how supervised and unsupervised machine learning techniques work, then you are ready for this book. Let's build on what you already know to learn some more exotic, unusual strategies for mining your data!

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "We can include other contexts through the use of the include directive."
