by its instructions per cycle (IPC, not to be confused with interprocess communication, which shares the same acronym; we'll look at that topic in Chapter 9), while the latter value is measured by its clock speed.
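As a rough worked example (with made-up figures), a core that retires 4 instructions per cycle at a 3 GHz clock has a theoretical peak of 4 × 3 × 10⁹ = 1.2 × 10¹⁰ instructions per second.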
These two measures often trade off against each other when new computing units
are being designed. For example, the Intel Core series has a high IPC but a lower clock
speed, while the Pentium 4 chip has the reverse. GPUs, on the other hand, reach very
high aggregate instruction throughput by running thousands of simple execution units
in parallel, although each unit runs at a lower clock speed than a typical CPU core;
they also suffer from other problems, which we will outline later.
Furthermore, while increasing clock speed almost immediately speeds up all programs
running on that computational unit (because they are able to do more calculations per
second), a higher IPC can also drastically speed up computation by changing the level
of vectorization that is possible. Vectorization happens when a CPU is provided with
multiple pieces of data at a time and is able to operate on all of them with a single
instruction. This sort of CPU instruction is known as SIMD (Single Instruction,
Multiple Data).
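As a rough illustration in Python (NumPy is assumed to be installed; whether SIMD instructions are actually used depends on the NumPy build and the hardware), an array expression hands the whole batch of data to compiled code, which is where vectorization can happen:

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)
b = np.arange(1_000_000, dtype=np.float64)

# One expression over whole arrays: the loop runs in compiled code
# over contiguous memory, where SIMD instructions can process
# several elements at once.
c = a + b

# The pure-Python equivalent feeds the CPU one element at a time,
# leaving no opportunity to vectorize across elements.
c_slow = [x + y for x, y in zip(a, b)]
```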
In general, computing units have been advancing quite slowly over the past decade (see
Figure 1-1). Clock speeds and IPC have both been stagnant because of the physical
limitations of making transistors smaller and smaller. As a result, chip manufacturers
have been relying on other methods to gain more speed, including hyperthreading,
more clever out-of-order execution, and multicore architectures.
Hyperthreading presents a virtual second CPU to the host operating system (OS), and
clever hardware logic tries to interleave two threads of instructions into the execution
units on a single CPU. When successful, gains of up to 30% over a single thread can be
achieved. Typically this works well when the units of work across both threads use
different types of execution unit—for example, one performs floating-point operations
and the other performs integer operations.
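One way to observe hyperthreading from Python is to compare how many CPUs the OS reports with how many physical cores exist; a minimal sketch, assuming the third-party psutil package is installed:

```python
import os

import psutil  # third-party: pip install psutil

logical = os.cpu_count()                    # CPUs the OS sees, including virtual ones
physical = psutil.cpu_count(logical=False)  # physical cores only

print(f"logical CPUs:  {logical}")
print(f"physical CPUs: {physical}")
# On a hyperthreaded machine, logical is typically twice physical.
```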
Out-of-order execution enables the chip itself, at runtime, to spot that some parts of a
linear program sequence do not depend on the results of a previous piece of work, and
therefore that both pieces of work could potentially occur in any order or at the same
time. As long as sequential results are presented at the right time, the program continues
to execute correctly, even though pieces of work are computed out of their programmed
order. This enables some instructions to execute when others might be blocked (e.g.,
waiting for a memory access), allowing greater overall utilization of the available
resources.
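The dependency idea can be sketched at the source level, though the actual reordering happens inside the CPU's hardware at runtime, not in Python:

```python
data = list(range(1_000_000))

# These two statements are independent: neither reads the other's
# result, so the hardware is free to overlap them. While the element
# load for `a` waits on memory, `b` can already be computed.
a = data[123_456] * 2
b = 3 + 4

# This statement depends on both results, so it must logically come
# last; the CPU still presents the results in program order.
c = a + b
```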
Finally, and most important for the higher-level programmer, is the prevalence of
multicore architectures. These architectures include multiple CPUs within the same
unit, which increases the total capability without running into barriers to making each
individual unit faster. This is why it is currently hard to find any machine with fewer
than two cores—in this case, the computer has two physical computing units that are
connected to each other. While this increases the total number of operations that can
be performed per second, it makes it harder to write code that actually takes advantage
of all the units at once.
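As a minimal sketch of spreading work across cores from Python (count_primes here is a made-up, CPU-bound stand-in, not an API from any library):

```python
from multiprocessing import Pool

def count_primes(limit):
    """CPU-bound stand-in: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # Each chunk of work runs in its own process, so the chunks can
    # execute on separate physical cores rather than sharing one.
    with Pool() as pool:
        results = pool.map(count_primes, [50_000] * 4)
    print(results)
```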