
Mathematics for Machine Learning
Garrett Thomas
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
January 11, 2018
1 About
Machine learning uses tools from a variety of mathematical fields. This document is an attempt to
provide a summary of the mathematical background needed for an introductory class in machine
learning, which at UC Berkeley is known as CS 189/289A.
Our assumption is that the reader is already familiar with the basic concepts of multivariable calculus
and linear algebra (at the level of UCB Math 53/54). We emphasize that this document is not a
replacement for the prerequisite classes. Most subjects presented here are covered rather minimally;
we intend to give an overview and point the interested reader to more comprehensive treatments for
further details.
Note that this document concerns math background for machine learning, not machine learning
itself. We will not discuss specific machine learning models or algorithms except possibly in passing
to highlight the relevance of a mathematical concept.
Earlier versions of this document did not include proofs. We have begun adding in proofs where
they are reasonably short and aid in understanding. These proofs are not necessary background for
CS 189 but can be used to deepen the reader’s understanding.
You are free to distribute this document as you wish. The latest version can be found at
http://gwthomas.github.io/docs/math4ml.pdf. Please report any mistakes to gwthomas@berkeley.edu.

Contents
1 About
2 Notation
3 Linear Algebra
  3.1 Vector spaces
    3.1.1 Euclidean space
    3.1.2 Subspaces
  3.2 Linear maps
    3.2.1 The matrix of a linear map
    3.2.2 Nullspace, range
  3.3 Metric spaces
  3.4 Normed spaces
  3.5 Inner product spaces
    3.5.1 Pythagorean Theorem
    3.5.2 Cauchy-Schwarz inequality
    3.5.3 Orthogonal complements and projections
  3.6 Eigenthings
  3.7 Trace
  3.8 Determinant
  3.9 Orthogonal matrices
  3.10 Symmetric matrices
    3.10.1 Rayleigh quotients
  3.11 Positive (semi-)definite matrices
    3.11.1 The geometry of positive definite quadratic forms
  3.12 Singular value decomposition
  3.13 Fundamental Theorem of Linear Algebra
  3.14 Operator and matrix norms
  3.15 Low-rank approximation
  3.16 Pseudoinverses
  3.17 Some useful matrix identities
    3.17.1 Matrix-vector product as linear combination of matrix columns
    3.17.2 Sum of outer products as matrix-matrix product
    3.17.3 Quadratic forms
4 Calculus and Optimization
  4.1 Extrema
  4.2 Gradients
  4.3 The Jacobian
  4.4 The Hessian
  4.5 Matrix calculus
    4.5.1 The chain rule
  4.6 Taylor’s theorem
  4.7 Conditions for local minima
  4.8 Convexity
    4.8.1 Convex sets
    4.8.2 Basics of convex functions
    4.8.3 Consequences of convexity
    4.8.4 Showing that a function is convex
    4.8.5 Examples
5 Probability
  5.1 Basics
    5.1.1 Conditional probability
    5.1.2 Chain rule
    5.1.3 Bayes’ rule
  5.2 Random variables
    5.2.1 The cumulative distribution function
    5.2.2 Discrete random variables
    5.2.3 Continuous random variables
    5.2.4 Other kinds of random variables
  5.3 Joint distributions
    5.3.1 Independence of random variables
    5.3.2 Marginal distributions
  5.4 Great Expectations
    5.4.1 Properties of expected value
  5.5 Variance
    5.5.1 Properties of variance
    5.5.2 Standard deviation
  5.6 Covariance
    5.6.1 Correlation
  5.7 Random vectors
  5.8 Estimation of Parameters
    5.8.1 Maximum likelihood estimation
    5.8.2 Maximum a posteriori estimation
  5.9 The Gaussian distribution
    5.9.1 The geometry of multivariate Gaussians
References

2 Notation
Notation      Meaning
R             set of real numbers
R^n           set (vector space) of n-tuples of real numbers, endowed with the usual inner product
R^{m×n}       set (vector space) of m-by-n matrices
δ_ij          Kronecker delta, i.e. δ_ij = 1 if i = j, 0 otherwise
∇f(x)         gradient of the function f at x
∇²f(x)        Hessian of the function f at x
A^⊤           transpose of the matrix A
Ω             sample space
P(A)          probability of event A
p(X)          distribution of random variable X
p(x)          probability density/mass function evaluated at x
A^c           complement of event A
A ∪̇ B         union of A and B, with the extra requirement that A ∩ B = ∅
E[X]          expected value of random variable X
Var(X)        variance of random variable X
Cov(X, Y)     covariance of random variables X and Y
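As a quick concrete check of the Kronecker delta notation (a small sketch, using NumPy purely for illustration; the document itself contains no code): the matrix whose (i, j) entry is δ_ij is exactly the identity matrix.

```python
import numpy as np

# The Kronecker delta over indices 0..n-1, arranged as a matrix,
# is the n-by-n identity: delta[i, j] == 1 iff i == j.
n = 4
delta = np.eye(n, dtype=int)

assert delta[2, 2] == 1  # i == j gives 1
assert delta[0, 3] == 0  # i != j gives 0
assert int(delta.trace()) == n  # n diagonal ones
```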
Other notes:
• Vectors and matrices are in bold (e.g. x, A). This is true for vectors in R^n as well as for vectors in general vector spaces. We generally use Greek letters for scalars and capital Roman letters for matrices and random variables.
• To stay focused at an appropriate level of abstraction, we restrict ourselves to real values. In many places in this document, it is entirely possible to generalize to the complex case, but we will simply state the version that applies to the reals.
• We assume that vectors are column vectors, i.e. that a vector in R^n can be interpreted as an n-by-1 matrix. As such, taking the transpose of a vector is well-defined (and produces a row vector, which is a 1-by-n matrix).