**PSEUDO-RANDOM NUMBER
GENERATORS**

**(REPORT I)**

A report submitted in partial fulfilment of the requirements for the

degree of Bachelor of Science

by Z. O’Connell (950786535)

Department of Information Systems and Computing,

Brunel University

January 1999

**Table of Contents**

Chapter 1 – Introduction 3

1.1 Problem Definition 3

1.2 Study Scope 3

1.3 Study Objectives 3

Chapter 2 – Available generators 5

Introduction 5

2.1 Von Neumann mid-square 7

2.2 Congruential Generators 7

2.3 Feedback Shift Registers 9

2.4 Algorithm M 10

2.5 Lagged-Fibonacci 11

Chapter 3 – Analysis of random sequences 12

Introduction 12

3.1 Statistical frequency tests 14

3.2 Non-statistical empirical tests 15

3.3 Marsaglia "Stringent" Tests 17

Chapter 4 – Conclusion 19

References 20

Appendix A: Evaluation 22

A.1 Project Overview 22

A.2 The Approach 22

A.3 Possible Future Work In The Area 23

# Chapter 1 – Introduction

## 1.1 Problem Definition

There are many known algorithms for generating numbers which are,

or at least appear to be, random. There are also a large number of

tests to check that what is produced by such generators does closely

resemble a random stream. There is not, however, a currently freely

available suite of generators and testing routines allowing the user

to eaisly define new generators and testers for experimental

purposes. This report reviews the sources required for implementing

such a program and presents some background on the more fundimental

aspects of random number generation and testing.

## 1.2 Study Scope

This project studies the speed and efficiency of pseudo-random

number generators and tests used to eliminate "flawed"

generators. Methods for converting "flat" integer

distributions into standard real number distributions are also

briefly examined. The ultimate aim of the project is to produce a set

of random numbers generators of various types and a battery of tests

which can be used to "weed out" any flawed generator,

hopefully arriving at a single generator or set of generators for

various applications that are both fast and as random as possible.

Cryptanalysis of random number generators in general is too wide a

field to be able to study in detail in this project, so this project

will concentrate on the algorithmic and simulation modeling

applications of random numbers. Despite this, much of the source

material comes from cryptography primarily because most of the recent

work on PRNGs has been in that area.

The generators we study produce, in general, equally distributed

output between zero and a known value. Although mathematical

functions exist to convert this output into other distributions using

real numbers, such techniques are also not examined here.

## 1.3 Study Objectives

The aim of this report is to briefly summarise the sources of

information required to produce a suite of random numbers. It is not

intended that there should be enough information contained here to do

any theoretical analysis of generators or tests or to implement the

more complicated tests as such a report would be several times the

size of this one.

Generators discussed here are those directly relating to the main

area of study – uses of pseudo-random numbers in simulation modeling

and probabilistic algorithms. It is intended to analyse more

generators than those presented here in the second part of the

project in order to compare these generators to near-perfect or

"secure" (Unpredictable) but slower cryptograpic and

hardware generators, however such generators do not need to be

implemented as source code or generator output is freely available.

# Chapter 2 – Available

generators

## Introduction

Knuth [1] offers the only detailed published non-cryptographic

discussion on what a random number actually is, although much of the

discussion is now out of date. The discussion is, in any case,

mathematical and doesn’t help us understand what a random number is

at a fairly basic level. Most notably, his discussion makes no

reference to the use of random numbers for encryption, which is now

one the main area of research connected to random numbers.

A good definition [22] of a "True" random number

generator is that is should be "Unbiased, unpredictable and

unreproducable". As we are concentrating on probabilistic

algorithms and simulation modeling applications, we are primarily

concerned with the unbiased aspects of random number streams. A

__pseudo__-random number generator will always be predictable and

reproducable given knowledge of the algorithm and initial seed value

used. In both fields, we are not worried about the predictability of

an unbiased algorithm, and unreproducability can be a very

undesirable trait if a particular run needs to be repeated.

A computer, being a purely mathematical machine cannot generate

truly random number sequences. Instead, it has to rely on

mathematical ‘tricks’ to produce a sequence that *appears* to be

random. There are electronic devices, such as ERNIE and Rairfield et

al’s integrated circuit [3] based on "*…the frequency
instability of a free running oscillator*". These systems can

generate random number sequences which are dependent on entropy in

the environment and as they are not practically predictable, are

suitable for use in cryptographic systems or where predictable

numbers are undesirable, such as lotteries. They are not studied here

however as we are interested primarily in generators which are

implementatable in software. Cryptographic

generators are also not looked at as they sacrifice much of their

speed in the interests of security. Similarly, cryptographic

hashes of semi-random sources, such as the SHA hash of entropy gained

from hardware events used by Linux (See [11], also

/usr/src/linux/drivers/char/random.c on

any Linux machine) are also not discussed as they require OS-level

access to the hardware to generate the initial entropy and as such

can not be implemented portably.

It is very easy for anyone to propose a random number generator

without properly understanding what they are doing, as Knuth [1]

mentions, ("*…random numbers should not be generated with a
method chosen at random. Some theory is required*.")

so any generator which doesn’t at least have an analysis using the

stnadard tests isn’t discussed here. For our non-cryptographic

purposes, we require generators that are far quicker than these

cryptographic ones. There are many combination generators and

"shuffling" tecniques which are also mostly not discussed,

such as the McGill "Super-Duper" generator mentioned in

[21], as there are so many that it would take a long time to study

and discuss the relative merits of each. These methods are in general

much slower than a single generator and designed to "defeat"

cryptographic attacks rather than increase randomness.

One of the most important aspects of a

generator is it’s peroid and it’s tendancy to "degenerate"

to sequences such as all zeros. The period of a generator is how long

a sequence it generates without repeating. This is mainly a problem

with generators, such as the Von Neuman middle-squared method, that

use the output of one iteration as the "seed" for the next

one and don’t maintain any hidden "state". With such a

system, a particular seed will always produce the same output

sequence, and if that value reoccurs in the output sequence the

generator repeats. Evaluating the period of a generator with

internal state of more than 32 bits algorithmically requires a

prohibitive amount of memory (512Mb, assuming states that have

occurred are stored in an array of 2^{32} single-bit boolean

values) and knowledge of the internals of the generator, so it is

difficult to do. Fortunatly, the serial test picks up on this well

and other tests tend to produce poorer results for generators that

have a low period or degenerate quickly so we do not need to test for

these attributes directly.

## 2.1 Von Neumann mid-square

The earliest proposed generator, by Von Neuman, is simple and

relatively quick but no longer considered suitable for any

application because of it’s relatively low period and tendancy to

degenerate rapidly. The basis of the generator is to take a seed

value, square it and take as many digits from the middle of the

number as required. Knuth [1] describes this method in details and

points out that in most (not all) cases that a Von Neumann generator

will loop after a relatively small number of iterations. Although he

gives a method for stopping a Von Neumann generator from looping and

gives cases with a very long period, there are generators which are

almost as fast which are for more random.

Paul & Balmer [15] discuss a modified form of Von Neumann

where rather than squaring a number, the product of two numbers is

used and the middle digits are taken but this "mid-product"

method is also flawed and biased in the same way as Von Neumann,

although with longer peroid and less tendancy to degenerate.

## 2.2 Congruential Generators

Congruential generators, those which in some way use the modulus

function, are by far the most common generators in use and are

discussed in every book on the topic. There are several types of

congruential generator available, however.

### 2.2.1 Linear Congruential

Also called a "mixed congruential generator", several

texts describe linear congruential methods in detail, although they

were originally proposed by Lehmer in 1948, although not published

until 1951. [17] Knuth (As usual) presents the best basic discussion

on this method, which is based on the formula f(*x*+1)=(*a*

f(*x*)+*c*)mod*m*,

where *a*, *c* and *m*

are constants chosen to give good speed and period. For this

generator, *r*=*m*,

but as Knuth shows it is a mistake to choose *m*

to be a "convenient" value such as 65536. Knuth [1] and

Banks et al. [13] go into quite some detail over the choice of *a,
c* and

*m*

which are shown to be very important to choose carefully to get a

good generator. In practice these values are usually fixed in a

particular implementation as the potential to produce a very poor

generator by accidently choosing the wrong vaules is very high. When

analysing linear congruential generators each set of choices for

values is generally considered as a separate generator as properties

such as period and even the overall randomness of the output vary

wildly based on them. Schneier [8] gives a table of "good"

values which pass the spectral test, a partly-theoretical test used

to analyse the choice of values for linear congruential generators.

Because of their simplicity, linear congruential generators with good

choices of values are still considered good sources of random numbers

for non-cryptographic applications. The period of a well-chosen LCG

on a 32-bit computer is only 2D10

^{9}

[13] however, so other generators are required for many uses.

### 2.2.2 Multiplicative Congruential

Lehmer, when he originally proposed a congruential generator,

actually gave a linear congruential generator but with *c*=0.

This "multiplictive congruential generator" is not often

used because computationally *c@*0

is trivial and greatly increases the randomness of the generator. [1]

It is used as the basis for other combined generators however, most

notably Wichmann & Hill [5], although they don’t say why they

chose multiplicative over linear congruential generators.

### 2.2.3 Additive Congruential

Additive congruential generators, sometimes called "Fibonacci"

generators, of the form f(*x*)=(f(*x*-1)+f(*x*–*k*))mod

*m* are mentioned briefly by Kleijnen and Groenendaal

[17] although are no longer considered good generators in most cases.

(The name Fibonacci is due to the similarity of the generator to the

fibonacci sequence with *k*=2 rather than any link with the

mathematician of the same name.) The origins of this form of

generator are unclear but Knuth mentions it as having been proposed

in 1950’s with *k*=2 and then

analysed with *k@*2

by Green, Smith and Klem in 1959. *m*=2^{35}

and *k*=97 is mentioned

in [17] as being a good version of this generator. If these values

give a good period and pass the standard randomness tests then this

is a very good generator as it is faster then even Von Neumann – it

is a single addition operation and a modulus that, on a 64-bit

computer, can be implemented in a single AND instruction.

### 2.2.4 Polynomial Congruential

Schneier [8] and Knuth [1] mention, unfortunately without

references for their source, polynomial congruential generators of

the form f(*x*+1)=(*a* f(*x*)^{n}+*b*

f(*x*)^{n-1}…+*c*)mod*m*.

Without knowing where these come from it is impossible to make a

guess as to good values for the generator. As with all congruential

generators, theory is required to select values to produce a good

generator. Although the spectral test (See section 3.4) could

probably be adapted to do this, such work is beyond the scope of this

project.

### 2.2.5 Wichmann & Hill Combined

Wichmann & Hill [5] produced an combined generator based on

three multiplictive congruential sources with different periods.

(There is no reason not to use more source generators, but Weichman &

Hill state that they found three adequate to produce good randomness)

Although their original paper is very short (Just over 3 pages with

spacing similar to this report) and was initially only an attempt to

produce a fast, portable generator for Pascal they managed to find a

new form of generator that produces good random numbers. The main

attraction of this generator is it’s speed as it is designed with

microcomputers in mind. (Specifically, the PDP-11) For a congruential

generator, it also has a very large period (Wichmann & Hill claim

2.78D10^{13}) which should be

sufficient for most applications. Unusually, this generator produces

real numbers between 0 and 1 rather than integers. The production of

real numbers appears too closely tied in with the theory behind the

generator to be removed, and this fact could slow the generator down

under certain circumstances as discussed in the next section.

Unfortunately, Wichmann & Hill don’t say what the resolution of

the output is (I.e. How many decimal places can be regarded as

random) which we will need to know to do accurate testing, so some

theoretical analysis of this generator is probably needed. A serial

test conducted on a generator with an assumption of more resolution

than is actually present will probably fail.

### 2.2.6 L’Ecuyer Combined

L’Ecuyer [7] produced what is now considered the best method of

combining linear congruential generators, and is the only

"non-cryptographic" generator Schneier [8] bothers giving

code for and the only combined generator discussed by Banks et al.

[13] L’Ecuyer’s generator has a far more detailed theoretical base

than Wichmann & Hill present. Although more complex than W&H’s

combined generator, it is still easy to implement and relatively fast

compared to some other generators. Schneider claims a period for his

implementation of "*somewhere in the neighborhood of 10 ^{18}*".

This brings us onto the interesting question of speed versus

portability. Although L’Ecuyers period is much greater than W&Hs

period for most applications W&H is adequate and if it passes

randomness tests adequately there seems no reason to use L’Ecuyer if

it is slower. L’Ecuyer is more complicated however, so likely to be

slower but W&H relys on floating-point calculations, the speed of

which can vary wildly between different processor types. Therefor,

when considering the speed of generators during implementation they

will need to be tested on different platforms to gauge the difference

in performance caused by such details.

### 2.2.7 Inversive Congruential Generators

Although ICGs, which are essentially LCGs that use the

multiplicative inverse of f(*x*-1)

rather than the raw value, are usually ignored as they fail

the Diehard tests [19], Marsaglias implementation in the diehard

suite did not use the best values, [26] which highlights the

importance of choosing the correct values for a generator and the

fact that a specific implementation should be tested before use and

not blindly relied on because it implements a "known good"

generator. There appears to have been no "fixed" version of

this generator tested using the Marsaglia/Diehard tests so it is not

known if this generator performs well or not.

## 2.3 Feedback Shift Registers

The other major type of generator is the feedback shift register,

sometimes called Tausworthe generators. The origin of this generator

is unclear – some sources say it was proposed by Selmer in 1965,

others by Tausworthe in the same year. The generator works by taking

a single bit from the right of a *n*-bit

register, shifting the register right by one bit then calculating the

left-most bit based on the contents of the whole register. The main

attraction of this style of generator lies with the cryptographic

security of certain versions of FSRs and in it’s ease of

implementation in hardware. However, they are also very easy to

implement in software and fast so they are also valuable for other

applications. As with congruential generators, there is more than one

version of this generator.

### 2.3.1 Linear feedback shift register

The fastest form of FSR is the linear feedback shift register,

which uses two or more bits from the register XORed together to give

the new bit. Schneier [8] gives a detailed discussion of this method,

including details on how to choose which bits to XOR for maximum

effect. Schneier claims that a LFSR is non-random if large numbers

are generated although doesn’t give a convincing argument as to why

this should be and Kleijnen and Groenendaal [11] don’t consider the

LFSR flawed compared to congruential generators. There are certain

problems with this style of generator, most notably that the shift

register must be large compared to the size of output word to

approximate random output. It is also relatively easy for the output

to degenerate into all-zeros or other repeating patterns in a similar

manner to Von Neumann generators.

### 2.3.2 Feedback with carry shift register

Feedback with carry shift register generators are relatively

new and were devised for cryptographic uses so there is no study on

their practical use in non-cryptographic applications. The basic

feedback function is the sum of all the "tapped" bits plus

the carry register – that value mod 2 becomes the new bit, and div 2

becomes the new carry register. [8] Like congruential generators with

differing values for *a, c* and

*m*, FCSRs need to have any initial value tested before

use to check that they have reasonable period and don’t degenerate.

Schneier gives a table of "tap sequences" (Which bits in

the FSR need to be used) to give maximum periods – the maximum

periods are typically in the range 2^{n} where *n*

is the size of the feedback shift register. This gives a maximum

period of 3.4D10^{38}

for a 128-bit generator, which is certainly acceptable for most

applications.

### 2.3.3 Mersenne Twisters

Mersenne Twisters are a form of

generator proposed by Matsumoto

and Nishimura in [24] that, they claim, have the rather amazing

period 2^{19937}-1 and pass "*several stringent
statistical tests, including diehard*". They are based on

"Twisted" Feedback-shift-register generators that use

Mersenne Primes to produce longer periods (Most generators,

particularly congruential and combination generators use normal

primes or relatively-prime numbers to achieve a longer period) This

generator is very new (Less that one year at the time of writing) and

although apparently well received on the Internet, not discussed in

other works as yet. The algorithm on initial examination appears to

be quite slow but actually lends itself quite well to computer

implementations and general Internet research (Comments made on

newsgroups) suggest that it probably compares favorably with other

non-cryptographic generators.

### 2.3.3 Other feedback shift register generators

The above generators are by no means the only FSR generators

available. However, they are the only two fast enough to be

considered usable for general applications. For example, the

FSR-based Blum, Blum & Shub generator is often used in

cryptography and considered very good but is far too slow for most

uses. [18, NEWS2/94061801.HTM]

## 2.4 Algorithm M

Algorithm M is a method originally proposed by MacLaren and

Marsaglia for combining two random sequences in a computing-friendly

(I.e. fast) way to produce a "*considerably more random*"

output. [1] It is named after it’s discussion in Knuth where it is

referred to as just "Algorithm M". Unlike combination

generators discussed earlier this generator is designed to work with

any sequence and not just congruential generators. Knuth mentions

that if the two source generators periods are relatively prime with

respect to one another that the period of the product of both

generators, and this is true in the case of most combination

generators. Knuth doesn’t actually offer a proof of this, it is left

as a (relatively easy) exercise for the reader.

## 2.5 Lagged-Fibonacci

The Lagged-Fibonacci generator is dealt with in great detail by

Marsaglia, [21] and as the name implies is closely related to the

additive congruential generator discussed above. The generator has

the formula f(*x*)=f(*x*–*r*)¿f(*x-s*)

where *r* and *s*

are the "lag" constants for the generator and ¿

is a mathematical operation. The additive congruential generator

follows this formula where *r*=1,

*s*=*k*

and ¿ is

addition followed by mod *m*.

Lagged-Fibonacci generators are typically dealt with separately from

congruential generators as the theory behind choosing the correct

values is radically different. Marsaglia goes on to show that

Lagged-fibonacci generators using multiplication are the only

non-combination generator he studies that passes all the "diehard"

[19] tests. However, it is not clear whether the comments in [26]

were made before or after this comment – see section 2.5.2.

### 2.5.1 Combination-lagged-Fibonacci

Marsaglia also gives two combination generators in [21], COMBO and

NCOMBO, based on two specific cases of the Lagged-Fibonacci combined

using simple subtraction. Marsaglia shows that the result is indeed

more random as long as neither generator degenerates, based on work

done by Brown and Solomon and Marshall and Olkin, however the proof

is rather complicated. The theory is applicable to generators other

than Lagged-Fibonacci, and Marsaglia also analyses the McGill

"Super-duper" MCR/LFSR combination generator in the same

paper.

### 2.5.2 "With-carry" generators

A cross between the lagged fibonacci generator and the

Feedback-with-Carry-Shift-Register generators are the "with-carry"

generators, and although usually considered separately from

Lagged-Fibonacci generators are essentially of the same class.

Add-with-carry generators are basic additive congruential generators

that add in the concept of carry as used by FCSR generators.

Subtract-with-borrow and multiply-with-carry generators follow along

the same lines using subtraction and multiplication instead of

addition. Add-with-carry and subtract-with-borrow are not

particularly powerful generators, however multiply-with-carry has

been shown to perform well in tests and have a long period as well as

being very fast. These generators have been studied extensively by

Marsaglia in [18], [19] and [25]. Knechtel writes: [26] "*The
best of [the multiply with carry generators] seem to pass all the
Diehard tests *(See section 3.3)

*,*

even tests that many common pseudorandom number generators fail."

even tests that many common pseudorandom number generators fail.

Little independent study of these generators exist yet, however

Marsaglia himself is bullish: "

*I predict it will be the*

system generator for many future computers", and claims a

system generator for many future computers

period of 2

f(

*x*-1)+

*c*mod 2

^{32}, given if

*a*

is chosen "*from a large set of easily found integers*".

Unfortunately, the comments above are not dated and the diehard

suite [19] has evolved over time; the tests the latest version

employs are no longer the same as the ones discussed in [21]. As

mentioned in [25], Marsaglia didn’t examine this generator in paper

[21] which may mean he was unaware of it at the time.

# Chapter 3 – Analysis of random

sequences

## Introduction

The descriptions of tests given here are necessarily brief – most

of them would take at least two full pages each to describe in

detail. The information here is meant to be sufficient to understand

the comments and discussion on the generator and to give the reader

some understanding of the analysis routines that will be written as

the implementation part of this project. When applying these tests,

it is important to remember that it is almost always possible to

construct a non-random algorithm which a specific test will not pick

up on as non-random – such an approach is not a proof that the test

does not work.

Formulae for the statistical tests and values of S(*x*)

(Needed for statistical analysis of the output results) for tests are

given, as these are often wildly different between texts,

particularly in the case of Kolmogorov-Smirnov, so "standardised"

versions are necessary for implementation. Where this isn’t possible,

due to the complexity of the equations or a requirement for lengthy

lookup tables, a specific work with good examples is cited. Most of

the tests here are "standard" tests used and discussed by

almost every basic textbook on the subject, so specific citation is

only given to unique points of note.

Knuth [1] points out that there is no such thing as a "Random

Number", as you cannot say whether a single number on it’s own

is random or not. For example, is the number 3 random or not? We

cannot answer this question. It is

assumed here that all the generators produce integer values from 0 to

*r*, as this is true in all but one case for the generators we

have looked at. Following on from this, is the sequence 3, 3,

3, 3 random? Possibly, as it is just as likely as any other sequence

of four numbers. When testing random numbers, we test probabilities –

a generator that produces many threes and few other numbers over a

long period of time is highly unlikely to be unbiased and, in

statistical terms, we "reject the hypothesis" that it is

unbiased. As Paul and Balmer [15] note, "Y*ou can try to be
too perfect. Genuine random sequences contain subsequences that are
highly systematic*".

There are several tests not considered here. There has been much

work published on the spectral test (Also known as Fourier analysis)

in many different forms however it has a theoretical base closely

related to choosing good values for linear congruential generators.

Theoretical tests have also not been discussed as they require human

(rather than algorithmic) analysis on the structure of the generator.

It is somewhat surprising to see that almost all the tests have their

origins in work done by Kendall and Babington-Smith in the 1940s and

that Knuth’s [1] work in 1969 is still probably the most valuable

text on empirical testing – Marsaglia actually refers to the standard

tests as "Knuth’s" (See quote in section 3.3) rather than

Kendall and Babington-Smith’s. I have limited statistical tests to

just the three most common which measure different properties of the

generator as any common statistical test could be applied.

In the interests of readability, the following notation is used

throughout, regardless of the notation used in the source documents

as almost every book uses different notation.

*n – *Length of random

sequence being analysed. (I.e. Number of values being analysed)

"Total number of independent observations" in statistical

terminology.

*r – *Number (Mnemonic:*
*Range) of possible outputs. (For

example,

*r*=65536 if possible integer outputs are in the range

0-65535 inclusive)

O* _{r} –
*Observed frequency of value

*r*.

E* _{r} –
*Expected frequency of value

*r*.

F(*x*) – PRNG as a function, 0T*x*<*n*

S(*x*) – C.d.f of expected output

*p* – Probability of a given

distribution output, calculated using S(*x*)

## 3.1 Statistical frequency tests

### 3.1.1 Chi-squared test

The Chi-squared (æ^{2})

test verifies that a random sequence is distributed as we would

expect. In all the generators discussed here, the output is initially

evenly distributed real numbers between two values. To do this, a

count of the number of occurrences of each value *s*

is taken, referred to as *O _{s}*.

Using the equationwe

can measure the "goodness of fit" of a particular

generator. In all cases studied here,

*E*as

_{s}=n/rthe generators raw output are all evenly distributed between two

numbers. Once we have the æ

^{2}

value, the value of *p* can be

calculated to find out how likely such a value is and therefor

if the value is "suspect" or not. Is it possible to

approximate appropriate values of *p* for æ^{2},

(*p* cannot be calculated

exactly) however the formula to do this is very complicated

and most mathematical libraries include code to do this as standard.

If only a "Fail" or "Pass" indication is

required, it is faster to use a table of critical values, such as

those given in most statistical texts, Most books on the subject

assume this is what is required, however for our purposes we will

need the actual values to implement Knuths suggestion of running the

Kolmogorov-Smirnov test on the outputs of many æ^{2}

tests, as discussed below.

Knuth points out that applying this

test to a very large sample of output has it’s problems however, as

although it picks up general long-term bias in the output, it won’t

always detect what Knuth terms "Locally non-random behavior",

for example often-repeated sequences of numbers. These could be

picked up if they are frequent enough and biased heavily towards one

set of numbers, but they could easily be balanced out long-term by

other locally non-random behavior. Although we are constrained by the

general requirement that E_{s}U5

for æ^{2}, [1] it is best

to keep *n* as small as possible and repeat the test several

times. This applies to all the other tests discussed here as well,

although some tests such as the serial test produce a small amount of

output for a large *r* so there is only likely to be enough data

in even a large sample for a single æ^{2}

test.

When applying this test many

times on one generator, we need a method of measuring the output more

formal than simply the standard statistical "reject the

hypothesis" method, as with a large enough number of independent

tests this is likely to occur at least once. This also checks that

the distribution of *p* is reasonable – i.e. the generator is

"globally random". To do this Knuth proposes applying the

Kolmogorov-Smirnov test to the values of *p. *(See section

3.1.2) The æ^{2} test is

inappropriate for this task as it is not well suited to handling

relatively small *n* with infinitely variable real, as opposed

to a fixed number of integer, values. Other authors unfortunately

assume a single test and don’t discuss this.

It is important to note that æ^{2}

requires independent tests. For example, if applying æ^{2}

to the serial test in section 3.3, if *c*=2 and *a*=2, *x*

should be sampled as 0, 1, 4, 5, 8, 9… to avoid using any values

twice. However, Marsaglia [21] has produced modified versions of

many of the standard empirical tests which are suitable for running

with "overlapping" values.

### 3.1.2 Kolmogorov-Smirnov test

The Kolomogorov-Smirnov (KS) test is a measure of the maximum

difference between the expected and actual distribution. It is more

suited to evaluating real numbers rather than discrete integer

values, however it can detect non-randomness that æ^{2}

fails to, so it is a useful test. This gives a formula for KS of

*D*=*n*^{½}maxsF(*x*)-S(*x*)s,

which like æ^{2 }produces

a values which can be calculated to give *p *to see if the value

is too high or low. Fortunately, *p* can be approximated for

large *n* using the formula

S(*x*)=1-e(-2*x*^{2}).

Knuth suggests *n*=1,000,

although this only gives 2 decimal places of accuracy. With modern

computing hardware, *n=*100,000

gives a better compromise between speed and accuracy. (3 d.p.) This

does mean that a large number of KS/æ^{2}

tests will need to be carried out for each generator unless a table

of critical values is used. (The same formula is used for

applying KS to KS results) There are many versions of the KS

test, so care should be taken to use

the correct tables/formula. Knuths formulae from [1] are used

here primarily because his descriptions on KS are better and he

presents the formula for S(*x*),

although the form of equations he uses are not the most

common-version.

Algorithmically, KS is best calculated by ordering the data from

*X _{1}* to

*X*

_{n}(Ascending order if S(*x*) increases as *x* increases and

vise-versa) and finding max(max(*j/n*-S(*X _{j}*),

max(S(

*X*)-(

_{j}*j-1*)/

*n*)),

*1TjTn*.

(The c.d.f. for evenly distributed random numbers from 1 to

*r*-1

is S(

*x*)=

*x/r*, 0T

*x<r*)

The requirement to order the data and the use of floating

point operations means that the test is slower than æ^{2},

particularly^{ }for very large *n*.

There is no reason however to use the same subdivisions of data that

æ^{2} uses, so it is more

efficient to divide the data into smaller segments then apply KS

again to the results.

### 3.1.3 Correlation test

Sometimes referred to as the

"serial correlation test" or "autocorrelation test"

when applied to random number theory, unlike æ^{2
}and KS the correlation test (Specifically, the Pearson Product

Moment Correlation Coefficient) checks the relationship between

successive values – that is, it is dependent on the ordering of the

output values within the test set. The previous two statistical tests

are independent of ordering so won’t detect non-random sequences like

1, 2, 3… as long as the set has the expected distribution. Banks et

al. [13] and Kleijnen and Groenendaal [16] give a full description of

this test when applied to random number sets, and fortunately PPMCC

is a standard statistical test and details of it’s calculation don’t

vary between texts. The correlation test is not often applied however

as it measures the same properties as the serial test discussed in

section 3.2.1. (No source referenced considers both the correlation

test and the serial test)

## 3.2 Non-statistical empirical tests

### 3.2.1 Serial test

The serial tests applies the statistical frequency tests in

section 3.2 to tuples of numbers, (F(*x*),

F(*x*+*a*),

F(*x*+2*a*)…F(*x*+*ca*)).

This is equivalent to evaluating a generator with

*r*=*r*_{generator}* ^{c}*,

so is unfeasible for

*c*>2

for most generators. (Assuming an 8 bit generator, if

*c*=4

then æ

^{2}tests on 21

billion values would be required to satisfy E

_{s}U5.)

Although even books as recently as 1993 [15] regard the serial test

even with

*c*=2 as

computationally too difficult to carry out generally, modern hardware

should be capable of running tests in a reasonable amount of time.

Marsagla [21] proposes the "Overlapping

M-Tuple test", which is a variation of the basic serial test

with the *m* in the title being the *c* of the serial test,

using PPMCC to "modify" the output to be appropriate for æ^{2}

tests despite not being independent values. Marsaglia analyses data

as bit-streams rather than discrete numbers and takes only three bits

of a number at a time for this test which makes use of a larger *c*

feasible.

### 3.2.2 Gap test

The gap test checks the gap between numbers follows the expected

distribution, which Banks et al. [13] show is S(*x*)=1-0.9^{x+1}.

As S(*x*) is non-linear,

it is necessary to vary the size of the frequency counts to satisfy

E_{s}U5. For example, if

*n*=10,000, E_{71}=0.56,

so E_{71+}=5.1 should be used instead. Counting the frequency

of each gap size is algorithmically very easy and fast compared to

some other tests. Kendall and Babington-Smith’s original version only

counted the gap between zero digits [5] although with modern

computing it should be possible to apply this test to all possible

values.

### 3.2.3 Poker test

There are two ways to conduct a poker test. Knuth’s [1] method

involves testing for two, three, four or five of the same number

recurring in sets of five random numbers. With a large *r*,

this is not really feasible however. (If *r*=65536

and *n*=100000 E_{pair}Y1)

The more commonly discussed method, such as in Banks [13] counts the

reoccurrence of the same digit within a random number. It is

important to note however that if analysing in decimal a number

produced by a binary generator, the first digit may not be in the

range 1-10 and the expected distribution should be adjusted as

appropriate. Although only mentioned in an exercise in [1] and not

discussed in detail anywhere, the same test could be carried out

using hexadecimal, octal or even binary digits instead of decimal.

### 3.2.4 Coupon collector’s test

The coupon collectors test evaluates how long a run is required to

get at least one occurrence of every possible number. We cannot apply

this test to all generators, as some may always degenerate or loop

before they generate every possible value and also requires a very

large sample set compared to other tests and may be very

time-consuming on slower generators. (For one output value, at least

*r* random numbers will have

been generated) Knuth [1] discusses this test in detail but other

books (Which all cite Knuth as a reference) have not mentioned it.

This is probably due to the fact that the coupon collectors test is

difficult to implement and, as Wichmann & Hill [5] point out,

measures the same properties as the gap test.

### 3.2.5 Run test

The run test as used by Knuth checks that the length of "runs

up" and "runs down" of random numbers matches the

expected distribution. The formula

to do this is somewhat complicated however, as successive values are

not independent of each other so æ^{2}

cannot be used. Knuth[1] gives formula, which are quite long

and involve tables so are not given here, which allow the calculation

of a statistic which can be subjected to æ^{2}.

Kleijnen and Groenendall [16] state "*We again use a *æ^{2}

statistic to test whether the generator is acceptable. The standard

æ* ^{2} test,
however, does not apply because of the dependence between the
successive [run lengths]*" but do not give the test which is

used.

Kleijnen and Groenendall also have a

variation on this test of "Runs above and below the average"

which functions exactly as the name implies – checking that the

distribution of runs above and below the expected mean. (0.5*r*

in our case) This picks up generators that tend to produce long runs

of numbers higher or lower than average, something most other tests

are only slightly sensitive to.

### 3.2.6 Permutation test

Knuth [1] gives an algorithm for counting the frequency of

orderings of random numbers. For example, if using sets of three

random numbers (*a, b, *c)

together this test counts the frequency of each possible

combination of the three numbers when ordered, of which there are six

(3!) combinations. (*a*<*b*<*c*, *a*<*c*<*b*,

*b*<*a*<*c*, *b*<*c*<*a*,

*c*<*a*<*b* and *c*<*b*<*a*)

This test, along with the Maximum-of-t test, isn’t even mentioned in

any other books or papers on the subject which may mean that this

test is no longer considered useful. Despite this, I can find no

reference that actually shows this specific test to be a poor test.

As with the overlapping-permutation test discussed below, it may

simply be that this test does not pick up any non-random attributes

not found by other tests such as the run test and serial test.

Marsaglia’s version of this test is the overlapping-permutation

test, which as the name implies is a variation of the permutation

test with overlapping sets of data. Marsaglia himself doesn’t regard

this test as particularly good: "*[Overlapping permutation]
tests…are not very stringent; most generators seem to pass them…"*

### 3.2.7 Maximum-of-t

The maximum-of-t test, again only discussed by Knuth [1] tests the

distribution of the maximum of *t*

random numbers is distributed as expected. This is a simple and quick

test to implement although again there is a strange absence of this

test from other books.

### 3.2.8 Maurer test

Although often referred to as a "statistical" test,

Maurers test presented in [11] is not a general statistical functions

like KS or æ^{2}.

It is geared more towards hardware cryptograpical uses, but it does

fail more modern flawed software generators that the earlier tests

don’t pick up, so it still useful in non-cryptographic software

situations. Maurer himself states that although the test is designed

to defeat hardware generators, a good software generator should

certainly pass his test. The basis for the test is trying to detect

whether a generator has detectable "memory" (That is, that

later values are based in some way on earlier ones) Maurer claims his

test will fail any generator that fails another test presented here

and although Maurer presents no empirical or theoretical proof of

this, although it is fairly likely to do this based on the nature of

the test. Being a comparatively recently developed test, there are

few good discussions on it’s relative merits, although it is

generally regarded as a good test.

## 3.3 Marsaglia "Stringent" Tests

In [19] and [21], Marsaglia presents a suite of tests apparently

of his own devising rather more complicated than those previously

studied, which are "*more difficult to pass than [Knuth’s]
tests that have become standard*".

Although there is little independent discussion of these tests in

texts, primarily because they are significantly more difficult to

analyse that other tests, they are considered worthwhile tests. Each

test is discussed briefly, although the fact the formula presented by

Marsaglia are very complicated and the lack of independent discussion

on them means that descriptions are somewhat brief. Two of his tests,

the M-Tuple test and Overlapping-permutations are modifications of

the standard tests and have already been discussed in section 3.2.

Although proof of the tests is given in

most cases, Marsaglia doesn’t discuss __why__

this particular group of tests as a whole is useful, a discussion

which is lacking in most books as well. As more powerful extensions

of the "standard" tests they should perform well in

practice and passing these tests is considered a very good indication

of the suitability of a generator for non-cryptographic uses but it

remains to be seen if any of the more obscure generators and

particularly Maurer pick out poor generators that these "stringent"

tests fail to.

### 3.3.1 Overlapping-pairs-sparce-occupancy (OPSO)

OPSO is an unusual test that tried to overcome the problems of

requiring large amounts of memory and a large number of tests to do

serial tests with large *r* and

*c*=2.

### 3.3.2 Parking-lot and lattice tests

This test actually applies a probabilistic algorithm of "parking

cars" in an *n*-dimensional

space and compares the result with one from a known-good random

source. (There is no theory to solve this problem that is not

probabilistic) Although suggested elsewhere, this is the only actual

discussion about a specific implementation of a probabilistic

problem.

### 3.3.3 Birthday-spacings

The Birthday spacings test as

presented in Marsaglias paper appears to be related to the standard

gap test, although the description presented is somewhat confusing

and incomplete. Marsaglia appears not to have published a proof or

detailed description of it yet, only source code. Apparently, "…*a
detailed discussion of the test will appear elsewhere*"

although no indication is given as to when or where this might be.

### 3.3.4 Ranks

Marsaglia’s description of the "ranks of binary number

matricies" test is also unclear, and his test results suggest

that this is not a powerful test as it does not fail a generator that

is not failed by several other tests. It is, apparently, designed

with simulation modeling in mind and the availability of source code

in [19] may make this test implementable, even though the paper

appears to give insufficient information for this.

# Chapter 4 – Conclusion

At this stage, without any hard empirical results to refer to, it

is difficult to draw any conclusions, particularly with regard the

the suitability of the tests. All the tests discussed above are

"accepted" by the community as valid tests so, barring

implementation problems, none should spuriously fail otherwise valid

generators. There is some obvious overlap in the generators however,

particularly with regard to the standard "Knuth" tests and

Marsaliga’s work. No suite of tests would be complete without those

listed in [1] however, so they have been included nonetheless.

Most of the basic congruential and linear-feedback generators are

now considered fairly poor, so the most likely "winner"

(The fastest generator that passes all tests) will probably be a form

of Lagged-Fibonacci-with-carry generator or one of the two combined

congruential generators. Additive Congruential generators have been

neglected in recent studies and although likely to fail Maurers test

may just prove to be a front-runner despite their simplicity.

Lagged-Mersenne Twisters are too new to say much about, although they

have been reported to pass all the tests given here except perhaps

Maurers.

# References

[1] Knuth, D.E., 1969, *The Art of Computer Programming, Volume 2 –
Seminumerical Algorithms*, Addison-Wesley

[2] Reif, J.H., Tygar, J.D., 1985, *Efficient Parallel
Pseudo-Random Number Generation*, pp.433-446, Advances in

Cryptology – Proceedings of CRYPTO ’85.

[3] Rairfield, R.C., Mortenson, R.L., Coulthart, K.B., 1984, *An
LSI Random Number Generator*, pp.203-230, Advances in Cryptology

– Proceedings of CRYPTO ’84.

[4] Newman, T.G., Odell, P.L., 1971, *The Generation of Random
Variates*, Griffin.

[5] Wichmann, B.A., Hill, I.D., 1982, *A Pseudo-Random Number
Generator, NPL Report DITC 6/82*, National Physical Laboratory.

[6] Neave, H., 1972, *Random Number Package*, University of

Nottingham.

[7] L’Ecuyer, P., 1988, *Efficient and Portable Combined Random
Number Generators,* pp.742-749, 774, Communications of the ACM

Volume 31, Number 6.

[8] Schneier, B., 1996, *Applied Cryptography*, John Wiley &

Sons.

[9] Kelsey, J., Schneier, B., Wagner, D., Hall, C., 1988,

*Cryptanalytic Attacks on Pseudorandom Number Generators*, pp.

168-188, Fast Software Encryption, Fifth International Workshop

Proceedings (March 1998), Springer-Verlag.

[10] Harel, D., 1992, *Algorithmics, The Spirit of Computing, 2 ^{nd}
edition*, Addison-Wesley.

[11] Maurer, U.M., 1992, *A Universal Statistical Test for Random
Bit Generators*, pp.89-105, Journal of Cryptology, Vol.5, No. 2.

[12] Eastlake 3^{rd}, D., Crocker, S., Schiller, J., 1994,

*Request for Comments 1750: Randomness Recommendations for
Security*, Network Working Group.

[13] Banks, J., Carson 2^{nd}, J., Nelson,

B., 1996, *Discrete-Event System Simulation*, Prentice-Hall.

[14] Fishman, G., 1973, *Concepts and Methods in
Discrete Event Digital Simulation*, John Wiley & Sons.

[15] Paul, R., Balmer, D., 1993, *Simulation
Modelling*, Chartwell Bratt.

[16] Kleijnen, J., Groenendaal, W., 1992,

*Simulation – A Statistical Perspective*, John Wiley & Sons.

[17] Lehmer

[18] Ritter, T., *Ciphers by Ritter*, http://www.io.com/~ritter/

[19] DIEHARD, http://stat.fsu.edu/~geo/diehard.html

[20] Haahr, M., 1998, *random.org – introduction*,

http://www.random.org/essay.html

[21] Marsaglia, G., 1984, *A current view of Random Number
Generators*, Computer Science and Statistics: The Proceedings of

the 16

^{th}Symposium on the Interface, Atlanta, Elsevier

Press.

[22] Author unknown, Random number generaton, RNGs and PRNGs,

http://www.helsbreth.org/random/

[23] Moreau, T., 1996, *A practical ‘perfect’ pseudo-random number
generator*, article submitted to Computers in Physics.

[24] M. Matsumoto and T. Nishimura, 1998, *Mersenne Twister: A
623-dimensionally equidistributed uniform pseudorandom number
generator*, ACM Transactions on Modeling and Computer Simulation,

Vol. 8, No. 1, pp.3-30.

[25] Marsaglia, G., Zaman, A., 1991, *A new class of random number
generators,* Annals of applied probability Vol. 1, No. 3,

pp.462-480.

[26] Sullivan, J., 1998, *Numerical Analysis & Associated
Fields Resource Guide*, http://net.indra.com/~sullivan/q10.html

# Appendix A: Evaluation

## A.1 Project Overview

Much of the time taken (Well over one quarter) was spend tracking

down appropriate books and papers, for which the Internet has proved

an invaluable tool. Unfortunately much of the early work done by Von

Neumann and Kendall & Babington-Smith is no longer available and

even very large libraries such as the British Library and Library of

Congress have no record of the books. The tendency for work in this

field to be published purely as conference papers and in journals

rather than in books also proved a hinderance. This seems partly due

to the success of Knuth’s work [1] as no-one seems to have produced

any widely-used text on the subject because his work is regarded as

being "definitive", despite being considerably outdated.

Much confusion was also caused by the lack of standard naming for

many tests and generators and wildly varying notation – it is often

not possible to tell that two complex tests or generators are

identical until some considerable research has been carried out.

Almost all of the time in the initial few weeks was spent

narrowing down the areas covered as it was clear from a fairly early

stage that certain topics, discussed below in section A.2, were not

appropriate or too complex bearing in mind the ultimate

implementation aim of the project.

Most of the remainder of the time was spent reading and

attempting to understand some of the more complicated tests and

generators and following up references given in some of the more

useful works. Some of the maths involved required revising old

mathematical courses or researching mathematical concepts (Such as

Mersenne Primes used by the Mersenne Twister generator) new to the

author. References are not given for books used for this purpose as

they are generally outside the scope of this project and not useful

to the reader.

Once the bulk of the topic had been understood, the actual writing

of the report took relatively little time however, as is to be

expected, this was hardly a linear effort and new ideas or problems

were introduced with almost every writing session, entailing more

research.

## A.2 The Approach

Many areas, most notably theoretical analysis of generators, were

too complicated and would not be useful topics to study for producing

a suite of programs for generating and analysing random numbers.

Cryptographic areas of study were also dismissed fairly early as

being too complicated. More time was initially spent looking at

converting the uniformly distributed output produced by the

generators here to patterns following other distributions however

this is really outside the scope of the project as described in the

initial project proposal and was dropped. Although useful in a few

specific types of simulation modelling, in general an equally

distributed integer generator is sufficient for most applications.

Also, most of the work in this area is primarily mathematical and not

computer-science based.

## A.3 Possible Future Work In The Area

The most obvious extension of the work discussed here is

theoretical analysis of many of the newer generators, particularly

Mersenne Twisters. This is a topic best pursued by a mathematician

rather than a computer scientist, however, as proper theoretical

study is not a case of simple algorithmic work but requires at times

some fairly complex mathematics.

Another approach would be an actual suite of generation and

analysis tools, which is the eventual aim of the implementation

phase. The only suite currently freely available seems to be Diehard

[19], however the choice of tests used appears, in the currently

available version, to be limited to tests devised by the author

himself. However, the suite is widely referenced and does seem to be

regarded as something of a "benchmark" in the field. The

basic design is not particularly expandable, having been converted by

and large automatically from Fortran source and as such does not lend

itself well to experimentation by other researchers.

A somewhat more time-consuming and exhaustive topic would be a

full-blown article or book on the subject of non-cryptographic random

numbers, rather than just a literary review, including analysis of

some of the more recent developments such as work by Fishman, Maurer

and Marsaglia

– essentially an updated version of Knuth’s work in [1]. This is

still the most widely used reference in the field but only discusses

congruential generators and the Kendall & Babbington-Smith tests

and is sadly out of date in many areas.