1.1.10 Integer versus float multiplication
The floating-point multiplier gives the highest bits of the product. Integer multiplication gives the
result modulo 2^b where b is the number of bits of the integer type used. As an example we square the
number 111111111 using a 32-bit integer type and floating-point types with 24-bit and 53-bit mantissa
(significand):
a = 111111111 // assignment
a*a == 12345678987654321 // true result
a*a == 1653732529 // result with 32-bit integer multiplication
(a*a)%(2**32) == 1653732529 // ... which is modulo (2**bits_per_int)
a*a == 1.2345679481405440e+16 // result with float multiplication (24 bit mantissa)
a*a == 1.2345678987654320e+16 // result with float multiplication (53 bit mantissa)
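A minimal C sketch of this comparison (the variable names and print formats are our choice; the float output depends on how the platform rounds 111111111 to a 24-bit significand):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t a = 111111111;      /* 32-bit integer */
    float af = 111111111.0f;     /* 24-bit significand */
    double ad = 111111111.0;     /* 53-bit significand */

    printf("integer (mod 2^32): %u\n", a * a);
    printf("float   (24-bit)  : %.16e\n", (double)(af * af));
    printf("double  (53-bit)  : %.16e\n", ad * ad);
    return 0;
}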
1.1.11 Double precision float to signed integer conversion
Conversion of double precision floats that have a 53-bit mantissa to signed integers via [11, p.52-53]
#define DOUBLE2INT(i, d) { double t = ((d) + 6755399441055744.0); i = *((int *)(&t)); }
double x = 123.0;
int i;
DOUBLE2INT(i, x);
can be a faster alternative to
double x = 123.0;
int i = x;
The constant used is 6755399441055744 = 2^52 + 2^51. The method is machine dependent as it relies on the
binary representation of the floating-point mantissa. Here it is assumed that the floating-point number
has a 53-bit mantissa with the most significant bit (which is always one for normalized numbers) omitted,
and that the address of the number points to the mantissa.
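A small self-contained sketch of the underlying mechanism, assuming IEEE 754 doubles and a little-endian memory layout (both assumptions are ours; the memcpy-based bit extraction is our addition to avoid the pointer cast). After the addition the value lies in [2^52, 2^53), so the exponent is fixed and the integer value sits in the low mantissa bits:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define DOUBLE2INT(i, d) { double t = ((d) + 6755399441055744.0); i = *((int *)(&t)); }

int main(void)
{
    double x = -123.0;                   /* a negative value to show the signed case */
    double t = x + 6755399441055744.0;   /* add 2^52 + 2^51 */

    uint64_t bits;
    memcpy(&bits, &t, sizeof(bits));     /* copy the bit pattern of the sum */
    printf("low 32 mantissa bits as int: %d\n", (int32_t)(uint32_t)bits);  /* -123 */

    int i;
    DOUBLE2INT(i, x);
    printf("DOUBLE2INT: %d\n", i);       /* -123 on little-endian machines */
    return 0;
}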
1.1.12 Optimization considerations
Never assume that some piece of code is the ‘fastest possible’. There is always another trick that can still improve
performance. Many factors influence performance, such as the number of CPU registers or the
cost of branches. Code that performs well on one machine might perform badly on another. The old
trick of swapping two variables without using a temporary is pretty much out of fashion today:
// a=0, b=0 a=0, b=1 a=1, b=0 a=1, b=1
a ^= b; // 0 0 1 1 1 0 0 1
b ^= a; // 0 0 1 0 1 1 0 1
a ^= b; // 0 0 1 0 0 1 1 1
// equivalent to: tmp = a; a = b; b = tmp;
However, under some conditions (like extreme register pressure) it may be the way to go. Note that if
both operands are the same memory location, the result is zero rather than a swap.
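A small C sketch (the helper xor_swap is ours, not from the text) shows both the swap and the effect of identical operands:

#include <stdio.h>

/* XOR-swap of two locations; zeroes the value if both pointers refer to the same object. */
static void xor_swap(unsigned *a, unsigned *b)
{
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

int main(void)
{
    unsigned v[2] = { 7, 11 };
    xor_swap(&v[0], &v[1]);
    printf("%u %u\n", v[0], v[1]);   /* 11 7 */
    xor_swap(&v[0], &v[0]);          /* same location: the element is set to zero */
    printf("%u\n", v[0]);            /* 0 */
    return 0;
}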
The only way to find out which version of a function is faster is actual benchmarking (timing).
Performance also depends on the sequence of instructions surrounding the generated machine code, assuming that
all of these low-level functions get inlined. Studying the generated CPU instructions helps to understand
what happens, but can never replace benchmarking. This means that benchmarks of just the isolated
routine can at best give a rough indication; test your application using different versions of the routine
in question.
Never ever delete the unoptimized version of some code fragment when introducing a streamlined one.
Keep the original in the source. If something nasty happens (think of low-level software failures when
porting to a different platform), you will be very grateful for the chance to temporarily resort to the slow
but correct version.
Study the optimization recommendations for your CPU (like [11] and [12] for the AMD64, see also [144]).
You can also learn a lot from the documentation for other architectures.