NIST SP800-22测试：加密应用随机数生成器的统计评估

需积分: 10 91 浏览量更新于2024-07-16 收藏 7.23MB PDF 举报

"NIST SP800-22rev1a.pdf" 是一份由美国国家标准与技术研究所（NIST）发布的关于随机和伪随机数生成器（RNG和PRNG）的统计测试套件，专门针对加密应用。这份文档讨论了在选择和测试这些生成器时的一些关键点，强调了在加密应用中使用生成器的安全性和不可预测性要求。它还涉及统计测试和密码分析的关系，并提供了一些建议的统计测试方法，这些测试可以作为评估生成器是否适合特定加密应用的初步步骤。在加密应用中，随机和伪随机数生成器的重要性不言而喻，因为它们通常用于生成密钥材料等关键安全元素。根据描述，文档指出，用于此类应用的生成器需要满足比其他应用更严格的标准，尤其是其输出必须在不了解输入的情况下保持不可预测。文档中可能会讨论一些评估生成器性能和安全性的标准和准则。 NIST SP800-22测试套件包含了多个统计测试，这些测试旨在检查生成器的输出是否具有足够的随机性。虽然这些测试可以提供一定的保证，但文档也明确指出，没有任何一组统计测试能够绝对证明一个生成器在特定应用中的适用性。这意味着，除了进行统计测试外，还需要进行密码分析以确保生成器的安全性。关键词包括“随机数生成器”、“假设检验”和“P值”，暗示文档将深入探讨如何通过统计方法来评估随机性，包括使用P值来判断生成器输出的随机性是否达到预期的显著水平。P值是统计学中衡量观测结果与零假设相符程度的一个指标，若P值低于预设的显著性水平，通常会拒绝零假设，认为数据展现出的随机性足够。 NIST SP800-22rev1a是加密领域中评估随机和伪随机数生成器质量的重要参考资料，它提供了统计测试的指导和建议，帮助确保用于加密的随机数源的可靠性与安全性。这份文档由NIST的专家团队修订并发布，反映了当前对这一领域的科学理解和实践标准。

A STATISTICAL TEST SUITE FOR RANDOM AND PSEUDORANDOM NUMBER GENERATORS FOR CRYPTOGRAPHIC APPLICATIONS

The probability of a Type I error is often called the level of significance of the test. This probability can

be set prior to a test and is denoted as α. For the test, α is the probability that the test will indicate that the

sequence is not random when it really is random. That is, a sequence appears to have non-random

properties even when a “good” generator produced the sequence. Common values of α in cryptography

are about 0.01.

The probability of a Type II error is denoted as β. For the test, β is the probability that the test will

indicate that the sequence is random when it is not; that is, a “bad” generator produced a sequence that

appears to have random properties. Unlike α, β is not a fixed value. β can take on many different values

because there are an infinite number of ways that a data stream can be non-random, and each different

way yields a different β. The calculation of the Type II error β is more difficult than the calculation of α

because of the many possible types of non-randomness.

One of the primary goals of the following tests is to minimize the probability of a Type II error, i.e., to

minimize the probability of accepting a sequence being produced by a generator as good when the

generator was actually bad. The probabilities α and β are related to each other and to the size n of the

tested sequence in such a way that if two of them are specified, the third value is automatically

determined. Practitioners usually select a sample size n and a value for α (the probability of a Type I

error – the level of significance). Then a critical point for a given statistic is selected that will produce the

smallest β (the probability of a Type II error). That is, a suitable sample size is selected along with an

acceptable probability of deciding that a bad generator has produced the sequence when it really is

random. Then the cutoff point for acceptability is chosen such that the probability of falsely accepting a

sequence as random has the smallest possible value.

Each test is based on a calculated test statistic value, which is a function of the data. If the test statistic

value is

S and the critical value is t, then the Type I error probability is P(S > t || H

is true) = P(reject H

is true), and the Type II error probability is P(S ≤ t || H

is false) = P(accept H

| H

is false). The test

statistic is used to calculate a P-value that summarizes the strength of the evidence against the null

hypothesis. For these tests, each P-value is the probability that a perfect random number generator would

have produced a sequence less random than the sequence that was tested, given the kind of non-

randomness assessed by the test. If a P-value for a test is determined to be equal to 1, then the sequence

appears to have perfect randomness. A P-value of zero indicates that the sequence appears to be

completely non-random. A significance level (

) can be chosen for the tests. If P-value

≥ α

, then the

null hypothesis is accepted; i.e., the sequence appears to be random. If P-value

< α

, then the null

hypothesis is rejected; i.e., the sequence appears to be non-random. The parameter

denotes the

probability of the Type I error. Typically,

is chosen in the range [0.001, 0.01].

• An α of 0.001 indicates that one would expect one sequence in 1000 sequences to be rejected by

the test if the sequence was random. For a P-value ≥ 0.001, a sequence would be considered to be

random with a confidence of 99.9%. For a P-value < 0.001, a sequence would be considered to be non

random with a confidence of 99.9%.

• An α of 0.01 indicates that one would expect 1 sequence in 100 sequences to be rejected. A P-

value ≥ 0.01 would mean that the sequence would be considered to be random with a confidence of 99%.

A P-value < 0.01 would mean that the conclusion was that the sequence is non-random with a confidence

of 99%.

For the examples in this document, α has been chosen to be 0.01. Note that, in many cases, the

parameters in the examples do not conform to the recommended values; the examples are for illustrative

purposes only.

1-4

€

A STATISTICAL TEST SUITE FOR RANDOM AND PSEUDORANDOM NUMBER GENERATORS FOR CRYPTOGRAPHIC APPLICATIONS

1.1.6 Considerations for Randomness, Unpredictability and Testing

The following assumptions are made with respect to random binary sequences to be tested:

1. Uniformity: At any point in the generation of a sequence of random or pseudorandom

bits, the occurrence of a zero or one is equally likely, i.e., the probability of each is

exactly 1/2. The expected number of zeros (or ones) is n/2, where n = the sequence

length.

2. Scalability: Any test applicable to a sequence can also be applied to subsequences

extracted at random. If a sequence is random, then any such extracted subsequence

should also be random. Hence, any extracted subsequence should pass any test for

randomness.

3. Consistency: The behavior of a generator must be consistent across starting values

(seeds). It is inadequate to test a PRNG based on the output from a single seed, or an

RNG on the basis of an output produced from a single physical output.

1.2 Definitions

Term

Definition

Asymptotic Analysis

A statistical technique that derives limiting approximations for functions

of interest.

Asymptotic Distribution

The limiting distribution of a test statistic arising when n approaches

infinity.

Bernoulli Random

Variable

A random variable that takes on the value of one with probability p and the

value of zero with probability 1-p.

Binary Sequence

A sequence of zeroes and ones.

Binomial Distribution

A random variable is binomially distributed if there is an integer n and a

probability p such that the random variable is the number of successes in n

independent Bernoulli experiments, where the probability of success in a

single experiment is p. In a Bernoulli experiment, there are only two

possible outcomes.

Bit String

A sequence of bits.

Block

A subset of a bit string. A block has a predetermined length.

Central Limit Theorem

For a random sample of size n from a population with mean

and

variance

, the distribution of the sample means is approximately normal

with mean

and variance

/n as the sample size increases.

Complementary Error

Function

See Erfc.

Confluent Hypergeometric

Function

The confluent hypergeometric function is defined as

Φ(a; b;z) =

Γ(b)

Γ(a)Γ(b − a)

a−1

(1− t)

b− a−1

∫

,0 < a < b

Critical Value

The value that is exceeded by the test statistic with a small probability

(significance level). A "look-up" or calculated value of a test statistic (i.e.,

a test statistic value) that, by construction, has a small probability of

occurring (e.g., 5 %) when the null hypothesis of randomness is true.

Cumulative Distribution

Function (CDF) F(x)

A function giving the probability that the random variable X is less than or

equal to x, for every value x. That is,

F(x) = P(X ≤ x).

1-5

A STATISTICAL TEST SUITE FOR RANDOM AND PSEUDORANDOM NUMBER GENERATORS FOR CRYPTOGRAPHIC APPLICATIONS

Entropy

A measure of the disorder or randomness in a closed system. The entropy

of uncertainty of a random variable X with probabilities p

, …, p

defined to be H(X) =

i 1

−

∑

p log p

Entropy Source

A physical source of information whose output either appears to be

random in itself or by applying some filtering/distillation process. This

output is used as input to either a RNG or PRNG.

Erfc

The complementary error function erfc(z) is defined in Section 5.5.3. This

function is related to the normal cdf.

Geometric Random

Variable

A random variable that takes the value k, a non-negative integer with

probability p

(1-p). The random variable x is the number of successes

before a failure in an infinite series of Bernoulli trials.

Global Structure/Global

Value

A structure/value that is available by all routines in the test code.

igamc

The incomplete gamma function Q(a,x) is defined in Section 5.5.3.

Incomplete Gamma

Function

See the definition for igamc.

Hypothesis (Alternative)

A statement H

that an analyst will consider as true (e.g., H

: the sequence

is non-random) if and when the null hypothesis is determined to be false.

Hypothesis (Null)

A statement H

about the assumed default condition/property of the

observed sequence. For the purposes of this document, the null hypothesis

is that the sequence is random. If H

is in fact true, then the reference

distribution and critical values of the test statistic may be derived.

Kolmogorov-Smirnov Test

A statistical test that may be used to determine if a set of data comes from

a particular probability distribution.

Level of Significance (

)

The probability of falsely rejecting the null hypothesis, i.e., the probability

of concluding that the null hypothesis is false when the hypothesis is, in

fact, true. The tester usually chooses this value; typical values are 0.05,

0.01 or 0.001; occasionally, smaller values such as 0.0001 are used. The

level of significance is the probability of concluding that a sequence is

non-random when it is in fact random. Synonyms: Type I error,

error.

Linear Dependence

In the context of the binary rank matrix test, linear dependence refers to m-

bit vectors that may be expressed as a linear combination of the linearly

independent m-bit vectors.

Maple

An interactive computer algebra system that provides a complete

mathematical environment for the manipulation and simplification of

symbolic algebraic expressions, arbitrary extended precision mathematics,

two- and three-dimensional graphics, and programming.

MATLAB

An integrated, technical computer environment that combines numeric

computation, advanced graphics and visualization, and a high level

programming language. MATLAB includes functions for data analysis and

visualization; numeric and symbolic computation; engineering and

scientific graphics; modeling, simulation and prototyping; and

programming, application development and a GUI design.

Normal (Gaussian)

Distribution

A continuous distribution whose density function is given by

);( ;













−

πσ

µ σ

exf

, where µ and σ are location and scale

parameters.

1-6

A STATISTICAL TEST SUITE FOR RANDOM AND PSEUDORANDOM NUMBER GENERATORS FOR CRYPTOGRAPHIC APPLICATIONS

P-value

The probability (under the null hypothesis of randomness) that the chosen

test statistic will assume values that are equal to or worse than the

observed test statistic value when considering the null hypothesis. The P-

value is frequently called the “tail probability.”

Poisson Distribution - §3.8

Poisson distributions model (some) discrete random variables. Typically,

a Poisson random variable is a count of the number of rare events that

occur in a certain time interval.

Probability Density

Function (PDF)

A function that provides the "local" probability distribution of a test

statistic. From a finite sample size n, a probability density function will be

approximated by a histogram.

Probability Distribution

The assignment of a probability to the possible outcomes (realizations) of

a random variable.

Pseudorandom Number

Generator (PRNG)

A deterministic algorithm which, given a truly random binary sequence of

length k, outputs a binary sequence of length l >> k which appears to be

random. The input to the generator is called the seed, while the output is

called a pseudorandom bit sequence.

Random Number

Generator (RNG)

A mechanism that purports to generate truly random data.

Random Binary Sequence

A sequence of bits for which the probability of each bit being a “0” or “1”

is ½. The value of each bit is independent of any other bit in the sequence,

i.e., each bit is unpredictable.

Random Variable

Random variables differ from the usual deterministic variables (of science

and engineering) in that random variables allow the systematic

distributional assignment of probability values to each possible outcome.

Rank (of a matrix)

Refers to the rank of a matrix in linear algebra over GF(2). Having

reduced a matrix into row-echelon form via elementary row operations, the

number of nonzero rows, if any, are counted in order to determine the

number of linearly independent rows or columns in the matrix.

Run

An uninterrupted sequence of like bits (i.e., either all zeroes or all ones).

Seed

The input to a pseudorandom number generator. Different seeds generate

different pseudorandom sequences.

SHA-1

The Secure Hash Algorithm defined in Federal Information Processing

Standard 180-1.

Standard Normal

Cumulative Distribution

Function

See the definition in Section 5.5.3. This is the normal function for mean =

0 and variance = 1.

Statistically Independent

Events

Two events are independent if the occurrence of one event does not affect

the chances of the occurrence of the other event. The mathematical

formulation of the independence of events A and B is the probability of the

occurrence of both A and B being equal to the product of the probabilities

of A and B (i.e., P(A and B) = P(A)P(B)).

Statistical Test (of a

Hypothesis)

A function of the data (binary stream) which is computed and used to

decide whether or not to reject the null hypothesis. A systematic statistical

rule whose purpose is to generate a conclusion regarding whether the

experimenter should accept or reject the null hypothesis H

Word

A predefined substring consisting of a fixed pattern/template (e.g., 010,

0110).

1-7

A STATISTICAL TEST SUITE FOR RANDOM AND PSEUDORANDOM NUMBER GENERATORS FOR CRYPTOGRAPHIC APPLICATIONS

1.3 Abbreviations

Abbreviation

Definition

ANSI

American National Standards Institute

FIPS

Federal Information Processing Standard

NIST

National Institute of Standards and Technology

RNG

Random Number Generator

SHA-1

Secure Hash Algorithm

1.4 Mathematical Symbols

In general, the following notation is used throughout this document. However, the tests in this document

have been designed and described by multiple authors who may have used slightly different notation.

The reader is advised to consider the notation used for each test separate from that notation used in other

tests.

Symbol

Meaning





The floor function of x; for a given real positive x,





= x-g, where





is a non-negative integer, and 0

≤

The significance level.

The normalized difference between the observed and expected number of frequency

components. See Sections 2.6 and 3.6.

∇ψ

(obs);

∇

(obs)

A measure of how well the observed values match the expected value. See Sections

2.11 and 3.11.

E[ ]

The expected value of a random variable.

The original input string of zero and one bits to be tested.

The i

bit in the original sequence

The null hypothesis; i.e., the statement that the sequence is random.

log(x)

The natural logarithm of x: log(x) = log

(x) = ln(x).

log

(x)

Defined as

ln( 2 )

ln( x )

, where ln is the natural logarithm.

The number of bits in a substring (block) being tested.

The number of M-bit blocks to be tested.

The number of bits in the stream being tested.

The sum of the log

distances between matching L-bit templates, i.e., the sum of the

number of digits in the distance between L-bit templates. See Sections 2.9 and 3.9.

3.14159… unless defined otherwise for a specific test.

The standard deviation of a random variable =

( )

∫

− f ( x )dxx

The variance of a random variable = (standard deviation)

obs

The observed value which is used as a statistic in the Frequency test.

The n

partial sum for values X

= {-1, +1}; i.e., the sum of the first n values of X

The summation symbol.

Standard Normal Cumulative Distribution Function (see Section 5.5.3).

The total number of times that a given state occurs in the identified cycles. See

Section 2.15 and 3.15.

The elements of the string consisting of ±1 that is to be tested for randomness, where

= 2

-1.

1-8

剩余130页未读，继续阅读

艾米的爸爸

粉丝: 800
资源: 314

NIST SP800-22测试：加密应用随机数生成器的统计评估

NIST SP800-66 Rev1：HIPAA安全规则实施指南

NIST SP800-107 rev1：安全使用批准哈希算法的指南

NIST SP800-67 Rev1：三重数据加密算法（TDEA）

SP800-22rev1a 随机数与伪随机数生成

STM32微控制器的随机数生成器如何通过NIST SP800-22rev1a和SP800-90b测试集进行验证？请详细说明测试流程及意义。

如何确保STM32微控制器的随机数生成器满足NIST SP800-22rev1a和SP800-90b标准？请提供具体的验证步骤和测试流程。

DFT的matlab源代码-sp800_22_tests:SP800-22Rev1aPRNG测试套件的python实现

NIST 随机数测试 中英文

使用NIST统计测试集验证STM32微控制器随机数生成

STM32微控制器随机数生成验证：NIST统计测试实战

最新资源

NIST 随机数测试中英文