Mastering Predictive Analytics
with Python
Exploit the power of data in your business by building
advanced predictive modeling applications with Python
Joseph Babcock

Preface vii
Chapter 1: From Data to Decisions – Getting Started with
Analytic Applications 1
Designing an advanced analytic solution 4
Data layer: warehouses, lakes, and streams 6
Modeling layer 8
Deployment layer 14
Reporting layer 15
Case study: sentiment analysis of social media feeds 16
Data input and transformation 17
Sanity checking 18
Model development 18
Scoring 19
Visualization and reporting 19
Case study: targeted e-mail campaigns 19
Data input and transformation 20
Sanity checking 21
Model development 21
Scoring 21
Visualization and reporting 21
Summary 23
Chapter 2: Exploratory Data Analysis and Visualization in Python 25
Exploring categorical and numerical data in IPython 26
Installing IPython notebook 27
The notebook interface 27
Loading and inspecting data 30
Basic manipulations – grouping, filtering, mapping, and pivoting 33
Charting with Matplotlib 38

Time series analysis 46
Cleaning and converting 46
Time series diagnostics 48
Joining signals and correlation 50
Working with geospatial data 53
Loading geospatial data 53
Working in the cloud 55
Introduction to PySpark 56
Creating the SparkContext 56
Creating an RDD 58
Creating a Spark DataFrame 59
Summary 61
Chapter 3: Finding Patterns in the Noise – Clustering and
Unsupervised Learning 63
Similarity and distance metrics 64
Numerical distance metrics 64
Correlation similarity metrics and time series 70
Similarity metrics for categorical data 78
K-means clustering 83
Affinity propagation – automatically choosing cluster numbers 89
k-medoids 93
Agglomerative clustering 94
Where agglomerative clustering fails 96
Streaming clustering in Spark 100
Summary 104
Chapter 4: Connecting the Dots with Models – Regression
Methods 105
Linear regression 106
Data preparation 109
Model fitting and evaluation 114
Statistical significance of regression outputs 119
Generalized estimating equations 124
Mixed effects models 126
Time series data 127
Generalized linear models 128
Applying regularization to linear models 129
Tree methods 132
Decision trees 132
Random forest 138

Scaling out with PySpark – predicting year of song release 141
Summary 143
Chapter 5: Putting Data in its Place – Classification Methods
and Analysis 145
Logistic regression 146
Multiclass logistic classifiers: multinomial regression 150
Formatting a dataset for classification problems 151
Learning pointwise updates with stochastic gradient descent 155
Jointly optimizing all parameters with second-order methods 158
Fitting the model 162
Evaluating classification models 165
Strategies for improving classification models 169
Separating nonlinear boundaries with support vector machines 172
Fitting an SVM to the census data 174
Boosting – combining small models to improve accuracy 177
Gradient boosted decision trees 177
Comparing classification methods 180
Case study: fitting classifier models in PySpark 182
Summary 184
Chapter 6: Words and Pixels – Working with Unstructured Data 185
Working with textual data 186
Cleaning textual data 186
Extracting features from textual data 189
Using dimensionality reduction to simplify datasets 192
Principal component analysis 193
Latent Dirichlet Allocation 205
Using dimensionality reduction in predictive modeling 209
Images 209
Cleaning image data 210
Thresholding images to highlight objects 213
Dimensionality reduction for image analysis 216
Case Study: Training a Recommender System in PySpark 220
Summary 222
Chapter 7: Learning from the Bottom Up – Deep Networks and
Unsupervised Features 223
Learning patterns with neural networks 224
A network of one – the perceptron 224
Combining perceptrons – a single-layer neural network 226
Parameter fitting with back-propagation 229