**PSEUDO-RANDOM NUMBER GENERATORS**

**(REPORT II)**

A report submitted in partial fulfilment of the requirements for the degree of Bachelor of Science

Z. O’Connell (950786535)

Department of Information Systems and Computing,

Brunel University

June 1999

# Table of Contents

Table of Contents 2

1 Background 4

1.1 Definition and Scope 4

1.2 Objectives 4

2 Overall design 5

2.1 Structure 5

2.2 Files 6

2.3 Testing 7

3 Testers 8

3.1 Introduction 8

3.2 API 8

3.3 Library routines 8

3.4 Tests implemented 9

3.5 Unimplemented tests 12

3.6 General observations and conclusion 12

4 Generators 13

4.1 Introduction 13

4.2 API 13

4.3 Library routines 14

4.4 Implemented generators 15

4.5 Unimplemented Generators 19

4.6 General observations and conclusion 19

5 Front-ends 21

5.1 Tester 21

5.2 test-static 21

6 Conclusions 22

References 23

Appendix A – Evaluation 24

A.1 Project Overview 24

A.2 The Approach 24

A.3 Possible Future Work In The Area 24

Appendix B – Tables of results 25

B.1 Empirical test results 25

B.2 Speed test results 26

Appendix C – Source Code 27

# 1 Background

## 1.1 Definition and Scope

There are many known algorithms for generating numbers which are, or at least appear to be, random.

There are also a large number of tests to check that what is produced by such generators does closely

resemble a random stream. There is not, however, a currently freely available suite of generators and

testing routines allowing the user to eaisly define new generators and testers for experimental

purposes. This project implements the framework for such a system.

It would be impossible in the timescale to study every possible generator and test procedure available.

The project does however implement the most common of these and it should be very simple to add

new libraries as required.

## 1.2 Objectives

There are two main objectives:

To produce a portable set of pseudo-random number generators that can be easily used as

part of any application in a variety of programming languages

To test the random number generators for speed and randomness

The main consideration when writing these generators is portability and usefulness. Although in many

cases the generators are coded in such a way that they are slower than they could be trying to speed

up the generators would make the code less readable. The main suite of random number

generators/testers currently available, Diehard, [9 ] consists mostly of highly-optimized FORTRAN code

which has then been automatically converted to C. Even with a detailed knowledge of how the

generators and testers work it is usually impossible to figure out how the code works or adapt it for

other uses.

To achieve the objectives, we can break the problem down into three tasks which need to be

considered:

The generator library and a well-defined API (Application Programmers Interface)

The testing library and a well-defined API

A front-end that can be used to test the generators

# 2 Overall design

## 2.1 Structure

Each generator or tester is presented as a stand-alone file that, with the exception of combination

generators, does not rely on other generators. These libraries are dynamically loadable at run time or

can be compiled into a program, allowing the choice of generator to be specified by the programmer or

end-user as desired. There is also a central library of functions used by all the libraries to avoid

excessive duplication of code.

The code is all written in C. There are several reasons for this choice of language. Firstly, it is fast.

Most other languages are harder to optimize and perform bounds checking on arrays and variables

which, while normally desirable, will slow the generators down quite significantly. Secondly, it is

portable. C code can be compiled on almost any computer currently in existence. This allows most of

the generators, except combination ones which rely on dlopen() as discussed below, to be compiled on

virtually any platform. Finally, C code is easy to build into libraries and most programming languages

allow C code to be called from within them.

The development platform is Unix, (Linux primarily, also tested on Solaris 2.5.1) however as the code is

written with portability in mind and it should be easy to compile on other systems with only minor

modifications. There are two notable exceptions to this – dlopen() and gmp. Dlopen() is used to load

libraries dynamically and is only available on some systems. It is used primarily by the front-end tester

but also by some of the “combination” generators. The combination generators could be modified to link

in their component generators statically although this would take quite a bit of work to allow all the

generators to have “alternative” function names when used as component generators. Combination

generators perform poorly both in terms of speed and randomness however so this doesn’t seem

worthwhile.

The ICG generator uses the GMP GNU multi=precision math library for doing inverse multiplication.

Although this library will compile on the majority of systems it is not generally installed as standard.

Both the generators and testers work on bit-streams rather than floating point numbers. Although most

older random number generator systems generate floating point numbers they are harder to analyze

and reliance on floating point math would slow down the generators. It would be trivial, if required, to

convert the output from this API to a floating point number. Care should be taken to ensure that the

granularity of the numbers obtained from a generator is sufficient however – clearly taking a single bit at

a time is going to produce poor results if three decimal places of randomness is required.

## 2.2 Files

Makefile Instructions for building the programs (See below)

*.c Source code

*.so Compiled dynamic-link libraries

librnd_name.* Generator “name” (See section 4.4)

libtest_name.* Tester “name” (See section 3.4)

random.h Global function definitions, included by every generator.

librandom.* Useful routines used by many generators and testers. (See sections 3.3 and

4.3)

tester Simple testing program that loads in generators and testers dynamically. (See

section 5.1)

test-static Statically linked example showing the display tester and the Mersenne Twister

generator. (See section 5.2)

random.dat Random source used for the file generator and as a seed by other generators.

(See below)

Each generator/tester is a separate source (.c) file and compiles into a separate object (.o) or library

(.so) file.

### 2.2.1 Makefile

The Makefile is used to compile the code by simply typing make on a Unix system. make -k should

normally be used to force make build all the libraries even if some fail as the ICG generator will only

compile if the gmp library is present. There are only two things that need to be changed in the Makefile

in the normal course of events. The first is the CCOPTIONS line. Normal options would be -O6 -Wall,

indicating full optimization and all compiler warnings. Other useful options are -DVERBOSE to turn on

extra output in generators and testers and -g instead of -O6 to enable debugging symbols in the output

files for use with debuggers such as gdb. -DQUIET can be used to suppress all output from testers.

This is for use when testers might want to be included in a larger program although isn’t the most

common case and therefor isn’t the default option.

If desired, the libraries compiled into test-static can be easily changed here, demonstrating how easy it

would be to incorporate a new generator as part of a program using this API. Debuggers don’t handle

dynamic linking particularly well so in most cases it is eaiser to link a specific library statically as part of

test-static for debugging.

Typing “make clean” will delete all the compiled libraries and executable.

The variable LIB_GCC is a workaround for an apparent bug in the gcc and egcs compilers. For some

reason, under certain circumstances involving dynamic linking not all the correct functions for unsigned

long long math are included in librandom.so. Adding the libgcc library specifically resolves this problem.

### 2.2.2 Random.dat

For a “true” random source and as a seed for some generators, this file was downloaded from a

hardware source on the Internet. [10] The web site doesn’t allow large downloads in one go, so around

a hundred separate files were downloaded and concatenated together to produce one larger file.

## 2.3 Testing

Although the generators are tested by the testers, the testers themselves need to be validated to

ensure they give the expected results. To do this, we can use generators known to be “perfect”, such

as cryptographic generators, which should not fail any statistical test. There is a small probability that a

truly random source will produce non-random output however this possibility is very small given the

amount of data being generated and this problem was not encountered.

There are also certain expected results for most tests, as almost every paper discussing a testing

mechanism gives several tests and the results obtained by the author. However,there are many ways of

implementing certain tests and some generators do fail tests they have been shown to pass in other

situations.

# 3 Testers

## 3.1 Introduction

The testers are presented as separate libraries which can either be linked in with a program at compile-time or, if supported on the specific platform, dynamically selected and loaded at run-time. The output is

a boolean pass or fail indication for a particular generator, although all the testers produce additional

human-readable text to showing what results were produced. It is not expected that testers will be

called automatically by programs using the libraries, however in same cases this may be desirable.

Some applications may wish to produce a different set of random numbers each time without saving the

state of the generators between each run. However, some “seed” values for generators may produce

poor output. A good tester, such as Maurer, can be quickly applied to a particular generator with a

certain seed value to check if it produces sane output.

## 3.2 API

There are only two functions required for the tester modules:

int rantest_test(void *state, getbits function)

char *rantest_info_text(void)

Rantest_test takes a pointer to a generators state array and a pointer to a getbits function and

produces a single boolean result to indicate pass or fail. All of the testers produce some sort of

diagnostic output however giving actual p-values where appropriate, and if VERBOSE is defined in the

Makefile they will usually give details of the internal counters. Verbose output is usually unlabelled

however and only really interesting for debugging purposes.

Rantest_info_text simply copies a descriptive string into scratch and returns a pointer to that. It is

implemented this way, rather than by a static text string in the generator, so that extra dynamic

information on the test library could be output, as with the generators, although this is never actually

used.

## 3.3 Library routines

There are two routines in librandom.c called by testing routines. These are:

double ChiSqPN(unsigned long df, double x)

double KolSmirP(unsigned long df, double x)

ChiSqPN takes degrees of freedom and a value and calculates the p-value for the Chi-squared

distribution from x with df degrees of freedom. The code is adapted from poorly documented Javascript

by Ritter [8] as no other readily-implementable formulae or algorithms were available. This routine

works by normalising x and then passing that to NormalP which actually calculates the p-value for the

normal distribution using a fairly simple formulae.

KolSmirP is also adapted from code by Ritter but is far more complicated and difficult to understand.

Similarly to ChiSqPN it calculates the p-value for the Kolmogorov-Smirnov distribution from x with df

degrees of freedom. Unfortunately, the Kolmogorov-Smirnov distribution has even less information

available about it than the Chi-squared distribution and the source of the underlying algorithm isn’t

known.

## 3.4 Tests implemented

### 3.4.1 Display tester

This “display” test simply prints the first 100,000 16-bit numbers from the generator being “tested”. This

is for manually checking the output of a generator is sane, (No leading zeros/all zeros output) doesn’t

have an incredibly short period and for manually checking that a generator is iterating according to the

correct formula.

### 3.4.2 Speed tester

The speed test measures how fast the generator can produce 2,000,000 16 bit numbers and the results

are presented in Appendix A. Three platforms were measured, Intel, Sun and Alpha, although not all

platforms were directly available so only Intel results are given in full. The test is slightly unfair in that

generators that produce results in multiples of 16 bits will perform better as rangen_generic_getbits is

faster when bits are being requested in multiples of the generator bits per iteration. The reflects how the

generators would probably be used in practice however. There are few surprises here, the generators

perform as would generally be expected and in roughly the same order on all platforms. The file

generator appears slightly faster than all zeros on Intel because it loads a 64-bit value into STATE->output every iteration whereas the zero generator only loads 32 bits a time.

The tester actually measures the CPU time elapsed for the process rather than the actual time elapsed

so even on a fairly heavily loaded machine results will be accurate. On faster generators, however, the

granularity of the system clock seems to cause a problem and affect results by about 5%. The

compiler used can make a more significant difference and can affect results by as much as 50%. On all

platforms, the tests used the gcc compiler. On the sun platform however the sunpro compiler was

found to be between 30% and 300% faster than gcc for various generators.

The speed of a generator can heavily affect the choice of generator for an application – a generator

such as the linux /dev/random, whilst highly random and unpredictable, is far too slow if tens of millions

of random numbers are required.

The speed test always passes a generator.

### 3.4.3 Chi-squared

The simple statistical chi-squared test fails a surprising number of generators given that it is

independent of ordering. We test ten million 16-bit values however, which will tend to cause any

generator with a period lower than a few million to fail badly. Almost every generator which fails another

test also fails Chi-squared although this is more likely to be coincidence that poor generators also have

shorter periods that are detected by Chi-squared. Generators with short periods fail because slight

deviations from the expected distribution are exaggerated as they loop and give the same values again.

Chi-squared is a two-tailed test so a low value is as bad as a high one – LCG 2 fails as the values are

too close to the expected distribution to be random. A generator passes this test if 0.005 < p < 0.995 -
i.e. it passes at the two-tailed 1% level.

Because this generator is “two-tailed” it detects sequences that are “too random” – i.e. that fir the

expected distribution improbably well – as well as those that are not random enough. LCG 2 exhibits

this property – it gives a Chi-squared value of zero.

### 3.4.4 Kolmogorov-Smirnov

The KS test is really just a weaker version of the Chi-squared test and fails most of the generators that

Chi-squared picks up. Each result is one-tailed – I.e. it only fails if too high not too low – but two values

are produced and the test fails if either are too high. This tester is weaker than Chi-squared primarily

because calculating the p-value for ten million degrees of freedom is too time-consuming

computationally whereas the million degrees of freedom used are much faster. This test and other tests

based on it passes the test if p < 0.99 - i.e. it passes at the one-tailed 1% level.

### 3.4.5 Gap test

The gap test measures the distributions of gaps between occurrences of a number fits the expected

distribution. This test actually measures using chi-squared the distribution of gaps between each

possible 8-bit number individually and then performs a Kolmogorov-Smirnov test on the results.

Because of this the test proves to be very powerful – it picks up flaws in generators that every other test

passes. LFSRs and FCSRs failing is a surprise, although looking at debug output by uncommenting the

commented out code lines in the tester shows that the two linear feedback generators are failing due to

extremely long gaps between certain numbers. (This produces a **lot** of output)

### 3.4.6 Poker test

The poker tests measures occurrences of the same digit in a four-digit hexadecimal number generated

by the generators. Because we are doing this test on bit streams rather than floating point numbers,

any non-randomness here should also be picked up by the Chi-squared test and this is indeed the

case.

### 3.4.7 Runs up/runs down

This test proved very difficult to implement as Knuth’s description for calculating the matrices required

for this test isn’t clear. The matrixes, a and b, are hard-coded into the program however as they do not

change when other parameters of the tester change. This tests fails weak generators as expected and

is reasonably powerful. One surprise is that it fails the FCSR generator, which no other tester other

than the gap test fails. The two results this tester produces are Chi-squared vlues for runs up and runs

down.

### 3.4.8 Runs above/below the mean

This is a very weak test and fails only the poorest of generators. This generator produces four outputs –

a pair of KS results for each of runs above and below the mean. This test doesn’t fail the zero

generator because of a peculiarity in the formulae which will only occur when the entire input is above

or below the mean.

### 3.4.9 Maurer’s Universal Statistical bit test

Maurer’s test almost lives up to his claim of being able to detect any flaw in a generator detectable by

other means although fails to spot flaws spotted by the gap tester. Unfortunately, his original paper lists

an incorrect expected value when calculated on 16-bit numbers, so the test was conducted only on 8-bit. Maurer’s paper lists the expected value as 15.1670.001 whereas experimental evidence strongly

indicates a value closer to 15.159. The version of the code implemented is a hand-converted C version

of Maurer’s Pascal code [5] and performs flawlessly on 8-bit numbers so the error appears to be in the

expected values provided. The error appears to be mathematical, not typographical, as the same value

is repeated elsewhere within the paper. Unfortunately, calculating the value to the required precision for

testing a million random numbers (Maurer’s code only tests 20,000 8-bit numbers) requires 200,000

iterations of the formula which to achieve in a reasonable amount of time requires more computing

power than readily available. (A bc script to calculate it had not completed after over two days of

running on a PPro200 when the machine was shut down)

### 3.4.10 File tester

This tester simply writes 2 million bytes of the output of the generator as binary data to a file called

output.dat. This file can then either be renamed for use as a seed/input for other generators or used for

some other purpose.

## 3.5 Unimplemented tests

### 3.5.1 Serial correlation

This test isn’t implemented as no exact version of it appears to exist. Knuth gives a version but only

gives conjectured formulae for the expected values which appear not to fit well with actual empirical

results. This test is not included in modern random number packages and appears to have fallen into

disuse, possibly because it measures similar properties to other non-statistical tests such as the run

and gap tests.

3.5.2 Marsagla’s “Stringent” tests

Without further analysis, Marsagla’s tests are not easy to implement. The information supplied with

Diehard [9] is insufficient to easily figure out the algorithm being used for some tests. Unfortunatly, as

has been mentioned previously, the code in Diehard is totally impenetrable and gives no clue as to the

authors intentions.

## 3.6 General observations and conclusion

Implementing some of the testers, particularly the more common “Knuth” types, proved quite difficult

due to conflicting and inaccurate literature. For example, Banks et al. Give the number of degrees of

freedom for a gap test as the sample size whereas in practice it is actually the number of discrete

values the output can have. (256 for an 8-bit output) The error in Maurer’s paper, mentioned previously,

is another example of this problem. Peer-review of papers in this area appears poor, possibly because

most current work concentrates on cryptographic generators and there is currently little interest in non-cryptographic implementations.

When testing new generators, it would seem sensible to test using Maurer’s test or the Gap test first as

they are the most powerful tests which are more likely to reject a generator than any other if it is

flawed. The generator should then be tested against the other, less powerful, tests if desired.

# 4 Generators

## 4.1 Introduction

The generators use a similar API to the testers. The internals of the generator are normally kept hidden

from the calling program so that generators can be changed and the program relinked with no

modification. There are routines for changing the internal variables of a generator however and these

can be used by programs with a greater knowledge of the internals of a specific generator or by a user

to select different values for the generator, either for testing purposes or because specific values might

be more appropriate for a certain applications. Programs can get values from a generator by means of

a routine which extracts a specified number of bits at a time. There is also an information routine in

each generator which will output a textual description of a generator.

## 4.2 API

There are many more routines used by the generators than there are for testers. These are:

char *rangen_info_text(void *state);

unsigned int rangen_info_statesize(void);

void *rangen_create(void);

void rangen_init(void *);

void rangen_iterate(void *state);

char *rangen_info_getparametername(unsigned int number);

unsigned long long rangen_info_getparametervalue(unsigned int parameter void *state);

void rangen_init_setparameter(unsigned int parameter, unsigned long long value, void *state);

unsigned long long rangen_getbits(void *state, unsigned int bits);

In addition, the state structure should begin as follows:

struct state_struct {

unsigned long long output;

unsigned int outputbitsavailable; };

Rangen_info_text takes a pointer to generator state and returns a pointer to a string giving details of

the generator. If VERBOSE is defined in the Makefile this can include details of the current generator

settings. Rangen_info_statesize returns the current size of the state array which can be used to make a

copy of the generator or, if desired, save it’s current state to disk. Generators that cannot be copied or

saved to disk in this way return 0.

Rangen_create allocates memory for a generator state and returns a pointer. It will also usually fill in

some reasonable default values for the generator. Rangen_init should be called once all parameters

have been set using the state pointer to initialize any arrays and values before rangen_getbits is called

for the first time. If parameters are changed and rangen_init is not called before the next call to

rangen_getbits, behavior is undefined. This will make some generators crash as checks for this are not

performed in the interests of speed.

Rangen_info_getparametername returns a pointer to a string describing parameter number. If number

is higher than the maximum parameter number, NULL is returned. If the first three letters of a

parameter are “Alg” (Short for algorithm) the value required is a pointer to a generator function and the

next parameter is a pointer to generator state. Rangen_info_getparametervalue takes a state pointer in

addition to a parameter number and returns the value of that parameter. Similarly,

rangen_info_setparametervalue can be used to set a value. If parameter zero is available it should be a

value appropriate for “seeding” the generator. No guarantees are made as to how well any particular

seed value will perform however.

There is no sanity checking performed on values selected in this way and unpredictable things (Like

crashes) could happen if invalid values are entered.

Rangen_getbits is the most-used routine and returns an appropriate number of bits from the generator.

In all the generators implemented her this is simply a call to the library routine rangen_generic_getbits

discussed below. Rangen_iterate is intended primarily for use by librandom for putting a new value into

STATE->output. STATE->output always contains the current generator output and STATE->outputbits

the number of bits of random number left in STATE->output.

Throughout the generators, STATE is used as a shorthand for ((struct state_struct *)state). This is

because the state is always passed as a void pointer so that programs without knowledge of the

generator internals can pass it around and thus it requires explicit typing every time it is referenced.

## 4.3 Library routines

Two library routines are used by generators. These are:

int countbits(unsigned long long);

unsigned long long rangen_generic_getbits(void *state, unsigned int bits, void (*iterate)(void *));

Countbits is used simply to count the position of the left-most one in a binary number and is used by

generators such as Von Neumann and LCGs either as part of the algorithm or to calculate the number

of random bits generated per iteration.

Rangen_generic_getbits takes a state pointer, which must start in the way described above, a number

of bits desired and a pointer to an iteration function and will iterate an appropriate number of times to

produce the desired number of bits. Every generator implemented here uses this routine.

## 4.4 Implemented generators

### 4.4.1 “Reference” generators

There are three generators which are implemented to allow testing of the testers. These are Linux, File

and Zero – any valid test should pass the first two and fail the zero generator.

The “Linux” generator uses the crypto-random generator present in the Linux operating system kernel

which uses internal hardware sources of the machine to generate an unpredictable bit stream. The

generator actually uses the non-cryptographic file /dev/urandom as this is much faster than the

cryptographically secure generator /dev/random. It still uses cryptographic functions and hardware

sources so is indeed random.

“File” reads random data from a file “random.dat”. The file used in testing was generated from

http://www.random.org/, which gets data from a hardware random source. The reason this generator

fails the Chi-squared test is because the file in use was only 2Mb and twenty million bytes (10,000,000

16-bit numbers) were tested.

Although primarily intended to test the testers, these two generators could be used in real-world

applications if relatively small amount of randomness is required. There are faster and more convenient

generators available however.

“Zero”, as the name implies, simply generates a stream of zeros.

The Zero and File generators also give an indication of the maximum possible speed of a generator –

I.e. how much is lost in function call overhead compared to actual computational time. The file

generator is marginally faster on Intel because it works on 64 bits values whereas the zero generator

only generates one 32-bit per call to rangen_iterate. (The file generator was reading data off a remote

filesystem on the Sun machine, hence the lower speed)

None of these generators take any parameters and their state cannot be saved.

(Rangen_info_stateside returns zero)

### 4.4.2 C Library generators

The “Rand” generator tests the properties of the random number generator random() in the system C

library. Rand 1 is results from a Linux machine running glibc2, which performs admirably and is also

reasonably fast. The source code for the library doesn’t give references as to the source of the

generator but it appears to be a “polynomial” type of some description. On a glibc2 machine, this

generator is certainly more than acceptable for randomness although slower than other generators.

Rand 2 are results from a Solaris 2.5.1 machine and passes all the tests except Chi-squared.

Documentation claims this is a “non-linear additive feedback” generator with a period in the order of

2^31 which if true indicates that this failure isn’t in fact due to a small period in the generator but to a

major flaw in it’s implementation.

This generator takes no parameters.

### 4.4.3 Von Neumann

The Von Neumann generator as implemented performs poorly in almost all tests. The version used is a

“Binary” version – taking the middle binary digits and squaring them – rather than the original decimal

version Von Neumann proposed. A hexadecimal version could also easily be implemented, although

would offer no obvious benefits over the binary version. The original “decimal” version of this generator

would be much slower and also offer no improvement over the binary version.

This generator takes one parameter – number zero is the current internal state of the generator and can

be used to “seed” the generator. The default seed value used is one that has been observed to

perform reasonably well in the Maurer test, which is the most powerful test available, compared to other

seed values. Although this is a relatively fast generator on most platforms it’s failure in over half the

tests makes it unsuitable for use.

### 4.4.4 Linear Congruential Generator

The values used in LCG 1 are those suggested by Knuth and those in LCG 2 are the defaults compiled

into the generator as suggested by Ritter. Neither set performs particularly well, failing even the Chi-squared test suggesting that the period of these generators is quite low. Although other values may

perform better the generator as implemented is reliant to m being a power of two as we are producing

random bit streams rather than random floating point numbers. Converting from values of m other than

powers of two into a random bit stream is very difficult to do without destroying the randomness of the

generator. LCGs are the most common generators in current usage however because although slower

for large batches of numbers they are easily implemented and fast if only a few values are needed.

This generator takes four parameters. Parameter zero is the internal state of the generator and can be

used as a seed. The first, second and third parameters are m, c and a respectively. M is the modules

of the generator, c the value added each cycle (Zero for pure Multiplicative Congruential Generators)

and a the multiplication value.

### 4.4.5 Additive Congruential Generators

ACGs perform incredibly well given their simplicity and speed. The initial table used is based on the first

few values in the file used for the “File” generator with a default “lag” of 97. This is the only generator

actually implemented other than Mersenne Twisters that actually passes all the tests and achieve over

a million numbers per second on a PPro200.

This generator takes two parameters. Parameter one is m, the modules and number two is k, the “lag”.

As this generator cannot be usefully seeded in it’s current incarnation there is no parameter zero.

### 4.4.6 Inversive Congruential Generators

ICGs perform disappointingly, failing every single test, even the very weak runs above/below the mean

tests. Looking at the output it is clear this generator has a very low period – a few hundred – making it

totally useless. There appear to be some suggestions that this is a good generator however I can find

no source that states this with any certainly. Due to it’s use of inverse multiplication it is also very slow,

making it unsuitable for general use. This generator is further hindered by it’s use of the external library

“gmp” – the GNU Multi-Precision math library – for inversive multiplication, which is probably not present

on all systems.

Although it might be possible to speed up this generator and increase it’s period, there seems little

reason to do so. As there is no “hidden” state in this generator it’s maximum period is it’s modules as

with LCGs and there appear to be no benefits of this generator over LCGs despite being slower so the

effort does not seem to be worthwhile.

The parameters of the ICG are the same as for the LCG – zero is the state then one to three are m, c

and a.

### 4.4.7 L’Ecuyer

The L’Ecuyer combination generator also performs poorly. L’Ecuyer doesn’t give a convincing argument

as to why the output should be uniformly distributed and looking at the verbose output of Chi-squared is

clear there is bias towards particularly high or low results. The bias is only significant for a reasonably

large sample size, greater than 10,000, which is probably why L’Ecuyers original testing of his

generator did not spot this flaw.

The API as defined can not handle an arbitrary number of sub-generators, as L’Ecuyer uses, so the

configuration of this generator is static and there are no parameters.

### 4.4.8 Algorithm M

Algorithm M performs comparably to the LCGs – It is based on the same generator as LCG 1 for it’s

actual output, shuffled by another identical LCG generator. Although the output of the generator in

terms of it’s ordering is “more random” than a plain LCG, the LCGs used pass most of the ordering-dependant However, LCGs do fail a Chi-squared test and shuffling the output does not help here as

Algorithm M also fails the Chi-squared test.

Algorithm M has quite a few parameters. Firstly, powerk defines how big a “shuffle box” is required –

the implementation limits this to powers of two so, for example, setting powerk to 8 gives a box of size

256. Parameter two is the number of bits obtained from the generator each cycle. Parameters three

through six are the algorithm and state for the two generators X (Number source) and Y. (Shuffle

source)

### 4.4.9 Linear Feedback Shift Register

This generator passes most of the tests well. Rather surprisingly it fails the gap test, which examination

of verbose output reveals is due to very high gaps between certain reoccurances of the same number.

Unfortunately, because they have to work bit-by-bit it is rather slow to implement in software and are

nearly and order of magnitude slower than Mersenne Twisters and ACGs which perform just as well.

The default settings for the tap bits on this generator is taken from Schneider [4].

There are fiver parameters for this generator. Parameter zero is, as always, the current state and can

be used to seed the generator. (Although a seed of zero will produce a zero output) Parameters one to

four are the tap bit locations, from 0 to 63.

### 4.4.10 Feedback with Carry Shift Register generators

As with LFSRs, FCSRs suffer from the same failure in the gap test and are even slower. They offer no

obvious advantage over FCSRs and take the same parameters.

### 4.4.10 Mersenne Twisters

The code for the Mersenne Twisters is adapted from the original code by Matsumoto and Nishimura

[12] with no major modifications. Despite it’s reliance on a relatively large number of operations per

cycle mathematical operations it is very fast as it generates over eight thousand random bits each time.

Rangen_iterate doesn’t return all generated bits every time it is called but breaks them down into more

manageable 32-bit chunks – the real iterate routine is called as required. This generator passes all the

tests easily without any problems and because of it’s speed seems the generator of choice where

anything more than a few random numbers are required. “Seeding” this generator would be difficult as

the initial state is relatively large (Over 8 thousand random bits) although this could be overcome simply

by taking an offset into an existing random source, such as the random.dat currently used to seed the

generator.

Mersenne Twisters have no configurable parameters.

## 4.5 Unimplemented Generators

### 4.5.1 Wichmann & Hill Combined

This generator, presented in [2] only produces floating point results and there is insufficient information

in the paper to determine the granularity of the generator which would be required to convert the output

to a bit-stream efficiently.

### 4.5.2 Polynomial Congruential

As seen by the results of the Rand test, Polynomal Congruential generators are both reasonably fast

and quite random. Unfortunately, although this syle of generator is used by glibc2 and also mentioned

by Knuth insufficient information is available to attempt a decent implementation.

## 4.6 General observations and conclusion

Implementing the generators proved a much eaiser task than the testers. To construct a good generator

far more theory is required and invariably all the generators include either working code or an algorithm

that can be used to quickly produce working code.

It is clear from the table of results in Appendix B that Mersenne Twisters and ACGs are by far the

generators of choice. They are both very fast and fail none of the tests presented here.

# 5 Front-ends

## 5.1 Tester

The first and most useful front-end, “tester”, takes the name of a generator and tester library and runs

them. This program shows how to dynamically load tester libraries on a Unix system that supports

dlopen(). It is necessary to set the environment variable LD_LIBRARY_PATH to the directory holding

the libraries or install the libraries in the system libraries directory so that the dynamic linker can find

them.

Tester is called with two arguments, firstly the full filename of a generator library and second the full

filename of a tester library, for example:

tester librnd_mersenne.so libtest_display.so

(Assuming “tester” is in the current PATH)

If VERBOSE is defined when compiling a library then tester first outputs a short text description of each

library obtained using the info_text functions.

If there are any configurable parameters for the generator then tester will prompt for them one by one,

showing default values. The default values can be accepted by just pressing enter without entering any

data.Tester cannot handle “Alg” type parameters and ignores them.

## 5.2 test-static

The second tester, test-static, shows how simple it is to include a generator (And tester) in a program.

The libraries included in the static executable can easily be changed by modifying the

STATIC_LIBRARIES variable in the Makefile. Default settings produce a completely static executable,

i.e. one that requires no support libraries other than the standard C libraries. Substituting the .so files

for .o gives a dynamically linked version, where the library files are required to run the program, but

with no modifications to the source. This latter method is the preferred method of linking in the random

number libraries as if a generic name, such as librndgen.so, is used the generator can be changed

merely by replacing the library file with no need to change the source code.

# 6 Conclusions

The results here are not entirely surprising. As noted in the first report, both Mersenne Twisters and

ACGs were expected to do reasonably well. The failure of LFSRs is a surprise however, as these seem

to be regarded as reasonably proficient generators and although slow in software are often used in

hardware implementations. The failure of combined generators is also somewhat surprising, although

recent research work has moved away from this and into totally new generators.

There is little to be said about the testers as they all performed as expected. It is worth reiterating the

poor quality of work on random number testing however, in the hope that someone may eventually

produce a work that actually includes algorithms for implementing tests quickly and accurately.

Algorithms for this are very thin on the ground and often had to be derived from ambiguous statements

and the ambiguities resolved by experimentation.

# References

[1] Knuth, D.E., 1969, The Art of Computer Programming, Volume 2 – Seminumerical Algorithms,

Addison-Wesley

[2] Wichmann, B.A., Hill, I.D., 1982, A Pseudo-Random Number Generator, NPL Report DITC 6/82,

National Physical Laboratory.

[3] L’Ecuyer, P., 1988, Efficient and Portable Combined Random Number Generators, pp.742-749, 774,

Communications of the ACM Volume 31, Number 6.

[4] Schneier, B., 1996, Applied Cryptography, John Wiley & Sons.

[5] Maurer, U.M., 1992, A Universal Statistical Test for Random Bit Generators, pp.89-105, Journal of

Cryptology, Vol.5, No. 2.

[6] Banks, J., Carson 2nd, J., Nelson, B., 1996, Discrete-Event System Simulation, Prentice-Hall.

[7] Fishman, G., 1973, Concepts and Methods in Discrete Event Digital Simulation, John Wiley & Sons.

[8] Ritter, T., Ciphers by Ritter, http://www.io.com/~ritter/

[9] DIEHARD, http://stat.fsu.edu/~geo/diehard.html

[10] Haahr, M., http://www.random.org/

[11] Marsaglia, G., 1984, A current view of Random Number Generators, Computer Science and

Statistics: The Proceedings of the 16th Symposium on the Interface, Atlanta, Elsevier Press.

[12] M. Matsumoto and T. Nishimura, 1998, Mersenne Twister: A 623-dimensionally equidistributed

uniform pseudorandom number generator, ACM Transactions on Modeling and Computer Simulation,

Vol. 8, No. 1, pp.3-30.

[13] Sullivan, J., 1998, Numerical Analysis & Associated Fields Resource Guide,

http://net.indra.com/~sullivan/q10.html

# Appendix A – Evaluation

## A.1 Project Overview

Initial work was on structuring the API in such a way that it was flexible enough to cope with most

generators and testers although still specific enough to allow generators to be interchanged easily in a

program. Once a rough outline had been produced the “reference” generators and display and speed

tester were written as a “proof of concept” and the API refined as a result of this. Constructing the API

and resolving issues surrounding the basic framework took around a quater of the time, primarily

because constructing dynamic-link libraries and particularly using the dlopen() call causes problems not

encountered in normal programming. Debugging also proved a problem – although most generators

can be linked statically combination generators such as L’Ecuyer rely on dlopen() to work which makes

using standard debuggers such as gdb tricky.

Only around ten percent of the time was spent implementing generators as this proved a relatively easy

task given the wide availability of algorithms to perform the most common generation functions. Writing

the testers and running tests took over half the time taken. This was due primarily to the difficulties in

implementing testing libraries that have been discussed at length previously. The remaining time was

spent writing the report.

## A.2 The Approach

There is little to be said about the approach that has not already been covered in the body of the

report. No major assumptions were required for implementation and the reasoning behind the structure

of the project and choice of language has already been covered in section 2.1.

## A.3 Possible Future Work In The Area

This project could be extended quite considerably to encompass far more generators and testers than

are currently implemented. Trying to implement even a significant portion of the available generators

would be quite a significant work however as there are a large number of generators and an almost

infinite number of possible parameters for those generators.

# Appendix B – Tables of results

## B.1 Empirical test results

Chi-sq | KS | Gap | Poker | Runs | Mean | Maurer | ||

Linux | 0.362208 | 0.510559 0.675486 |
0.712583 0.000526 |
0.478271 0.428924 |
0.359596
0.874975 |
0.247679 0.393694 |
0.184395 0.458329 |
7.184869 |

File | 1.000000 | 0.738718 0.948190 |
0.935290 0.001834 |
0.011070 0.011181 |
0.497735 0.132207 |
0.130925 0.047258 |
0.174363 0.272003 |
7.184507 |

Zero | 1.000000 | 1.000000 0.000000 |
0.000000 1.000000 |
1.000000 1.000000 |
1.000000 1.000000 |
0.000000 0.000000 |
0.000000 0.000000 |
0.000001 |

Rand 1 | 0.715177 | 0.820229 0.018466 |
0.480549 0.096585 |
0.414513 0.399850 |
0.138942 0.473569 |
0.252494 0.710310 |
0.138706 0.399274 |
7.184058 |

Rand 2 | 1.000000 | 0.753250 0.382823 |
0.071074 0.162250 |
0.131359
0.109035 |
0.228733 0.801380 |
0.473434 0.175626 |
0.588748 0.818538 |
7.185135 |

Von N. | 1.000000 | 1.000000 1.000000 |
0.000000 1.000000 |
0.996104
0.972135 |
1.000000 1.000000 |
0.859635 0.757239 |
0.931238 0.753966 |
7.196248 |

LCG 1 | 1.000000 | 0.247861 0.804789 |
0.000000 1.000000 |
0.154877
0.061890 |
0.230819 0.159677 |
0.215360 0.324168 |
0.477696 0.260543 |
7.198858 |

LCG 2 | 0.000000 | 0.033461 0.952883 |
0.000000 1.000000 |
0.032041
0.083639 |
0.014468 0.013934 |
0.885685 0.897066 |
0.914775 0.975280 |
7.233509 |

ACG | 0.191041 | 0.771454 0.167969 |
0.376785 0.270558 |
0.022058
0.093272 |
0.292254 0.498318 |
0.180289 0.403473 |
0.231975 0.144920 |
7.183310 |

ICG | 1.000000 | 1.000000 1.000000 |
0.999999 0.998766 |
0.999999
0.998766 |
1.000000 1.000000 |
0.928292 0.739950 |
0.999918 1.000000 |
7.180711 |

L’Ecuyer | 1.000000 | 1.000000 1.000000 |
0.000000 1.000000 |
0.852401
0.554670 |
1.000000 1.000000 |
0.118549 0.250390 |
0.180989 0.411692 |
7.169784 |

Alg. M | 1.000000 | 0.033915 0.811274 |
0.000000 1.000000 |
0.149044
0.061221 |
0.010887 0.498318 |
0.295884 0.919160 |
0.324374 0.510433 |
7.196667 |

LFSR | 0.505779 | 0.657751 0.104576 |
0.000000 1.000000 |
0.033919
0.012633 |
0.160175 0.184342 |
0.138039 0.101322 |
0.291267 0.243762 |
7.184588 |

FCSR | 0.084193 | 0.517227 0.511065 |
0.000000 1.000000 |
0.033539
0.056973 |
0.821765 0.818449 |
0.580329 0.835615 |
0.836863 0.335689 |
7.183938 |

Mer.Tw. | 0.806905 | 0.642702 0.057305 |
0.532591 0.116994 |
0.097397
0.119505 |
0.424862 0.960693 |
0.717156 0.727823 |
0.245693 0.405926 |
7.183670 |

Red: Failure at the 1% level Yellow: Failure at the 5% level Green: Pass

## B.2 Speed test results

Intel PPro200 | Sun Ultra 5/10
(Sun4u 299MHz) |
Alpha 21164 533MHz | |

file | 1851 | 2251 | |

zero | 1770 | 2483 | 6736 |

Mersenne | 1470 | 2125 | |

Von Neumann | 1418 | 727 | 5120 |

ACG | 1190 | 1309 | 4808 |

Rand | 1190 | 1636 | |

LCG | 1058 | 920 | 4367 |

L’Ecuyer | 794 | 336 | 1295 |

Algorithm M | 374 | 304 | 1694 |

LFSR | 160 | 176 | |

FCSR | 141 | 171 | |

ICG | 54 | ||

Linux | 34 |

All values are thousands of 16-bit values per second.