Python中的数据数学语言探索：线性代数基础与机器学习

需积分: 0 56 浏览量更新于2024-07-18 收藏 1.34MB PDF 举报

"Discover the Mathematical Language of Data in Python" 在数据科学和机器学习领域，Python语言已经成为了一种不可或缺的工具，因为它提供了丰富的库和强大的计算能力，使得处理和理解复杂的数据变得更为简单。这本书《Discover the Mathematical Language of Data in Python》专注于讲解如何利用Python来探索数据的数学语言，特别是线性代数的基础，这是机器学习中的核心概念。线性代数是数学的一个分支，它研究向量、矩阵、线性映射以及它们在几何和代数中的应用。对于机器学习而言，线性代数提供了解决多元线性问题的框架，并且是理解许多机器学习算法（如线性回归、主成分分析PCA、神经网络等）的基础。以下是一些关键知识点： 1. 向量（Vectors）：向量是具有方向和大小的一维数组，常用来表示物理量或特征。在Python中，可以通过NumPy库创建和操作向量。 2. 矩阵（Matrices）：矩阵是二维数组，用于表示多个变量之间的关系。它们在Python中以二维数组的形式存在，例如使用NumPy的`numpy.array()`创建。 3. 线性组合与线性独立：理解向量或向量组的线性组合和线性独立性是线性代数的基础。线性组合是通过常数系数加权求和得到的向量；线性独立意味着没有一个向量可以表示为其他向量的线性组合。 4. 矩阵运算：包括矩阵加法、乘法、转置和求逆。矩阵乘法尤其重要，因为它在机器学习中用于计算权重和预测。 5. 矩阵分解：如奇异值分解（SVD）、特征值分解等，这些方法在降维、推荐系统和稳定计算中有着广泛应用。 6. 最小二乘法（Least Squares）：这是一种解决过定系统（即方程组比未知数多）的优化方法，常用于线性回归问题。 7. 正交投影：在机器学习中，正交投影用于将数据投射到低维空间，例如主成分分析（PCA）就是一种正交投影的应用。 8. 线性方程组：线性代数提供了求解线性方程组的高效方法，如高斯消元法和LU分解，这些在优化问题和机器学习模型的训练中至关重要。 9. 范数（Norms）：范数衡量向量或矩阵的大小，有L1范数、L2范数等，它们在机器学习中的正则化中起到重要作用。 10. 径向基函数（RBF）：这些函数在构建核方法如支持向量机（SVM）时用到，通过非线性映射将数据转换到高维空间。作者Jason Brownlee强调了准确性和责任自负的原则，提醒读者在应用书中的知识时需谨慎。他还感谢了编辑团队对内容的校对和技术审查。这本书的内容涵盖了线性代数的基础，适合对机器学习感兴趣并希望深入理解其数学基础的读者。

About Further Reading

Each lesson includes a list of further reading resources. This may include:



Books and book chapters.



API documentation.



Articles and Webpages.

Wherever possible, I try to list and link to the relevant API documentation for key functions

used in each lesson so you can learn more about them. I have tried to link to books on Amazon

so that you can learn more about them. I don’t know everything, and if you discover a good

resource related to a given lesson, please let me know so I can update the book.

About Getting Help

You might need help along the way. Don’t worry; you are not alone.



Help with a Technique?

If you need help with the technical aspects of a speciﬁc

operation or technique, see the Further Reading sections at the end of each lesson.



Help with NumPy?

If you need help with using the NumPy library, see the list of

resources in the Further Reading section at the end of each lesson, and also see Appendix



Help with your workstation?

If you need help setting up your environment, I would

recommend using Anaconda and following my tutorial in Appendix B.



Help with the math?

I provided a list of locations where you can search for answers

and ask questions about linear algebra math in Appendix A. You can also see Appendix D

for a crash course on math notation.



Help in general? You can shoot me an email. My details are in Appendix A.

Summary

Are you ready? Let’s dive in!

Next up you will discover a gentle introduction to the ﬁeld of linear algebra.

1.3. Numerical Linear Algebra 3

of numbers called matrices, to create new columns and arrays of numbers. Linear algebra is the

study of lines and planes, vector spaces and mappings that are required for linear transforms.

It is a relatively young ﬁeld of study, having initially been formalized in the 1800s in order

to ﬁnd unknowns in systems of linear equations. A linear equation is just a series of terms and

mathematical operations where some terms are unknown; for example:

y = 4 × x + 1 (1.1)

Equations like this are linear in that they describe a line on a two-dimensional graph. The

line comes from plugging in diﬀerent values into the unknown

to ﬁnd out what the equation

or model does to the value of

. We can line up a system of equations with the same form with

two or more unknowns; for example:

y = 0.1 × x

+ 0.4 × x

y = 0.3 × x

+ 0.9 × x

y = 0.2 × x

+ 0.3 × x

···

(1.2)

The column of

values can be taken as a column vector of outputs from the equation. The

two columns of integer values are the data columns, say

and

, and can be taken as a matrix

. The two unknown values

and

can be taken as the coeﬃcients of the equation and

together form a vector of unknowns b to be solved. We can write this compactly using linear

algebra notation as:

y = A · b (1.3)

Problems of this form are generally challenging to solve because there are more unknowns

(here we have 2) than there are equations to solve (here we have 3). Further, there is often no

single line that can satisfy all of the equations without error. Systems describing problems we

are often interested in (such as a linear regression) can have an inﬁnite number of solutions.

This gives a small taste of the very core of linear algebra that interests us as machine learning

practitioners. Much of the rest of the operations are about making this problem and problems

like it easier to understand and solve.

1.3 Numerical Linear Algebra

The application of linear algebra in computers is often called numerical linear algebra.

“numerical” linear algebra is really applied linear algebra.

— Page ix, Numerical Linear Algebra, 1997.

It is more than just the implementation of linear algebra operations in code libraries; it also

includes the careful handling of the problems of applied mathematics, such as working with the

limited ﬂoating point precision of digital computers. Computers are good at performing linear

algebra calculations, and much of the dependence on Graphical Processing Units (GPUs) by

modern machine learning methods such as deep learning is because of their ability to compute

linear algebra operations fast.

1.4. Linear Algebra and Statistics 4

Eﬃcient implementations of vector and matrix operations were originally implemented in

the FORTRAN programming language in the 1970s and 1980s and a lot of code, or code ported

from those implementations, underlies much of the linear algebra performed using modern

programming languages, such as Python. Three popular open source numerical linear algebra

libraries that implement these functions are:



Linear Algebra Package, or LAPACK.



Basic Linear Algebra Subprograms, or BLAS (a standard for linear algebra libraries).



Automatically Tuned Linear Algebra Software, or ATLAS.

Often, when you are calculating linear algebra operations directly or indirectly via higher-

order algorithms, your code is very likely dipping down to use one of these, or similar linear

algebra libraries. The name of one of more of these underlying libraries may be familiar to you

if you have installed or compiled any of Python’s numerical libraries such as SciPy and NumPy.

1.4 Linear Algebra and Statistics

Linear algebra is a valuable tool in other branches of mathematics, especially statistics.

Usually students studying statistics are expected to have seen at least one semester

of linear algebra (or applied algebra) at the undergraduate level.

— Page xv, Linear Algebra and Matrix Analysis for Statistics, 2014.

The impact of linear algebra is important to consider, given the foundational relationship

both ﬁelds have with the ﬁeld of applied machine learning. Some clear ﬁngerprints of linear

algebra on statistics and statistical methods include:



Use of vector and matrix notation, especially with multivariate statistics.



Solutions to least squares and weighted least squares, such as for linear regression.



Estimates of mean and variance of data matrices.



The covariance matrix that plays a key role in multinomial Gaussian distributions.



Principal component analysis for data reduction that draws many of these elements

together.

As you can see, modern statistics and data analysis, at least as far as the interests of a

machine learning practitioner are concerned, depend on the understanding and tools of linear

algebra.

剩余211页未读，继续阅读

zhouguoqionghai

粉丝: 71
资源: 17

Python中的数据数学语言探索：线性代数基础与机器学习

Basics of Linear Algebra for Machine Learning Discover the Mathematical 无水印原版pdf

Basics of Linear Algebra for Machine Learning (Python)

Python: Advanced Predictive Analytics: 2017-12 python重量级高级大数据分析预测

Advanced Techniques for MySQL Data Cleaning and Preprocessing with Python

The Role of Truth Tables in Artificial Intelligence: Foundations for Logical Reasoning and Decision ...

Introduction to Common Data Science Tools in Jupyter Notebook

Time Series Autoregressive Models: In-depth Exploration and Practical Techniques

软考论文范例解读：信息系统项目管理与设计方法的应用

Markdown 是一种轻量级标记语言，它允许人们使用易读易写的纯文本格式编写文档 .zip

Go语言简易指令树实现.zip

最新资源