Stanford CS229 Machine Learning Complete Notes (PDF)
These notes are a detailed English walkthrough of the machine learning course taught by Andrew Ng, originating from the fall 2011 course material on ml-class.org. They cover all of the core content, including (but not limited to) supervised learning, unsupervised learning, neural networks, support vector machines, and ensemble learning. The author originally wrote them for personal study; over time they have grown into a comprehensive reference of more than 40,000 words and many diagrams, aimed at readers with some programming background but assuming no statistics, calculus, or linear algebra. The explanations are clear and accessible, and all diagrams were drawn by the author from the lecture content or taken directly from it. Although the original course's Octave/MATLAB programming sections are not included, the notes are a valuable resource for anyone who wants a deep understanding of machine learning theory: students taking the course, self-learners, and teachers looking for supporting material alike.
17_Large_Scale_Machine_Learning
For many learning algorithms, we derived them by coming up with an optimization objective (cost
function) and using an algorithm to minimize that cost function
When you have a large dataset, gradient descent becomes very expensive
So here we'll define a different way to optimize for large data sets which will allow us to scale the
algorithms
Suppose you're training a linear regression model with gradient descent
Hypothesis: $h_\theta(x) = \sum_{j=0}^{n} \theta_j x_j$
Cost function: $J_{train}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)^2$
If we plot our two parameters vs. the cost function we get something like this
Looks like a bowl-shaped surface plot
Quick reminder - how does gradient descent work?
In the inner loop we repeatedly update the parameters θ:
Repeat { $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)}$ (simultaneously for every $j = 0, \dots, n$) }
We will use linear regression for our algorithmic example here when talking
about stochastic gradient descent, although the ideas apply to other algorithms too, such as
Logistic regression
Neural networks
Below we have a contour plot for gradient descent showing iteration to a global minimum
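To make the reminder concrete, here is a minimal NumPy sketch of one way to implement the batch update above (the function name and defaults are my own, not from the course):

import numpy as np

# Minimal sketch of batch gradient descent for linear regression.
# X is the (m, n) design matrix (first column all 1s for the intercept),
# y is the (m,) vector of targets; alpha and num_iters are illustrative defaults.
def batch_gradient_descent(X, y, alpha=0.01, num_iters=100):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        errors = X @ theta - y                 # h_theta(x^(i)) - y^(i) for all m examples
        theta -= alpha * (X.T @ errors) / m    # each single step touches the WHOLE dataset
    return theta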
As mentioned, if m is large gradient descent can be very expensive
Although so far we just referred to it as gradient descent, this kind of gradient descent is called batch
gradient descent
This just means we look at all the examples at the same time
Batch gradient descent is not great for huge datasets
If you have 300,000,000 records you need to stream all of them in from disk on every iteration,
because you can't store them all in memory
By reading all the records, you can move one step (iteration) through the algorithm
Then repeat for EVERY step
This means it takes a LONG time to converge
Especially because disk I/O is typically a system bottleneck anyway, and this will inevitably
require a huge number of reads
What we're going to do here is come up with a different algorithm which only needs to look at a single
example at a time
Stochastic gradient descent
Define our cost function slightly differently, as
$\mathrm{cost}\big(\theta, (x^{(i)}, y^{(i)})\big) = \frac{1}{2}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2$
So the function represents the cost of θ with respect to a specific example $(x^{(i)}, y^{(i)})$
And we calculate this value as one half times the squared error on that example
Measures how well the hypothesis works on a single example
The overall cost function can now be re-written in the following form:
$J_{train}(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{cost}\big(\theta, (x^{(i)}, y^{(i)})\big)$
This is equivalent to the batch gradient descent cost function
With this slightly modified (but equivalent) view of linear regression we can write out how stochastic
gradient descent works
1) Randomly shuffle (reorder) the training examples
2) Algorithm body:
Repeat {
for i := 1, ..., m {
$\theta_j := \theta_j - \alpha\,\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)}$ (for every $j = 0, \dots, n$)
}
}
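As a minimal NumPy sketch (the function name and defaults are my own), the two steps above look something like this:

import numpy as np

# Minimal sketch of stochastic gradient descent for linear regression.
def stochastic_gradient_descent(X, y, alpha=0.01, num_passes=1):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_passes):
        order = np.random.permutation(m)      # step 1: randomly shuffle the examples
        for i in order:                       # step 2: scan through them one at a time
            error = X[i] @ theta - y[i]       # h_theta(x^(i)) - y^(i), a scalar
            theta -= alpha * error * X[i]     # update every theta_j from this ONE example
    return theta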
So what's going on here?
The term $\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)}$
Is the same as that found in the summation for batch gradient descent
It's possible to show that this term is equal to the partial derivative with respect to the parameter
$\theta_j$ of $\mathrm{cost}\big(\theta, (x^{(i)}, y^{(i)})\big)$
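For completeness, the chain-rule computation behind that claim, using the per-example cost defined earlier:
$\frac{\partial}{\partial \theta_j} \mathrm{cost}\big(\theta, (x^{(i)}, y^{(i)})\big) = \frac{\partial}{\partial \theta_j} \frac{1}{2}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 = \big(h_\theta(x^{(i)}) - y^{(i)}\big) \cdot \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j} = \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)}$
since $h_\theta(x) = \sum_j \theta_j x_j$, so $\partial h_\theta(x^{(i)}) / \partial \theta_j = x_j^{(i)}$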
What the stochastic gradient descent algorithm is doing is scanning through each example
The inner for loop does something like this...
Looking at example 1, take a step with respect to the cost of just the 1st training example
Having done this, we go on to the second training example
Now take a second step in parameter space to try and fit the second training example better
Now move onto the third training example
And so on...
Until it gets to the end of the data
We may now repeat this whole procedure and take multiple passes over the data
The random shuffling at the start ensures the data is in a random order, so we don't bias
the movement
Randomization should speed up convergence a little bit
Although stochastic gradient descent is a lot like batch gradient descent, rather than waiting to sum up
the gradient terms over all m examples, we take just one example and make progress in improving the
parameters right away
Means we update the parameters on EVERY step through the data, instead of at the end of each pass
through all the data
What does the algorithm do to the parameters?
As we saw, batch gradient descent does something like this to get to a global minimum
With stochastic gradient descent every iteration is much faster, but every iteration is fitting just a single
example
What you find is that you "generally" move in the direction of the global minimum, but not
always
It never actually converges like batch gradient descent does, but ends up wandering around
some region close to the global minimum
In practice, this isn't a problem - as long as you're close to the minimum that's probably
OK
One final implementation note
May need to loop over the entire dataset 1-10 times
If you have a truly massive dataset it's possible that by the time you've taken a first pass through the
dataset you may already have a perfectly good hypothesis
In which case the inner loop might only need to happen once if m is very very large
If we contrast this to batch gradient descent
We have to make k passes through the entire dataset, where k is the number of gradient descent
steps taken, and each step requires a full pass through the data
Mini-batch gradient descent
Mini-batch gradient descent is an additional approach which can work even faster than stochastic
gradient descent
To summarize our approaches so far
Batch gradient descent: Use all m examples in each iteration
Stochastic gradient descent: Use 1 example in each iteration
Mini-batch gradient descent: Use b examples in each iteration
b = mini-batch size
So just like batch gradient descent, except we use tiny batches
Typical range for b = 2-100 (10 maybe)
For example
b = 10
Get 10 examples from training set
Perform gradient descent update using the ten examples
Mini-batch algorithm
We for-loop through b-size batches of m; with b = 10 the update looks like:
Repeat {
for i = 1, 11, 21, ..., m - 9 {
$\theta_j := \theta_j - \alpha \frac{1}{10} \sum_{k=i}^{i+9} \big(h_\theta(x^{(k)}) - y^{(k)}\big)\, x_j^{(k)}$ (for every $j = 0, \dots, n$)
}
}
Compared to batch gradient descent this allows us to get through the data in a much more efficient way
After just b examples we begin to improve our parameters
Don't have to update parameters after every example, and don't have to wait until you've cycled
through all the data
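A minimal NumPy sketch of this loop (names and defaults are my own; note that the per-batch update is naturally vectorized, which matters below):

import numpy as np

# Minimal sketch of mini-batch gradient descent for linear regression.
def minibatch_gradient_descent(X, y, b=10, alpha=0.01, num_passes=1):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_passes):
        order = np.random.permutation(m)              # shuffle, as with stochastic gradient descent
        for start in range(0, m, b):
            batch = order[start:start + b]            # the next b examples
            errors = X[batch] @ theta - y[batch]      # b errors in one vectorized operation
            theta -= alpha * (X[batch].T @ errors) / len(batch)
    return theta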
Mini-batch gradient descent vs. stochastic gradient descent
Why should we use mini-batch?
Allows you to have a vectorized implementation
Means implementation is much more efficient
Can partially parallelize your computation (i.e. do 10 at once)
A disadvantage of mini-batch gradient descent is that you have an extra parameter, b, to tune
But this is often worth it!
To be honest, stochastic gradient descent and batch gradient descent are just specific forms of mini-
batch gradient descent, with b = 1 and b = m respectively
For mini-batch gradient descent, b is somewhere in between 1 and m and you can try to optimize
for it!
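Seen through the hypothetical minibatch_gradient_descent sketch above, the relationship is just a choice of b (X and y stand in for your training data):

theta_sgd   = minibatch_gradient_descent(X, y, b=1)        # stochastic gradient descent
theta_mini  = minibatch_gradient_descent(X, y, b=10)       # mini-batch, 1 < b < m
theta_batch = minibatch_gradient_descent(X, y, b=len(y))   # batch gradient descent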