PRNG Report II: Analysis

PSEUDO-RANDOM NUMBER GENERATORS





(REPORT II)





A report submitted in partial fulfilment of the requirements for the degree of Bachelor of Science


Z. O’Connell (950786535)





Department of Information Systems and Computing,


Brunel University


June 1999





Table of Contents






Table of Contents 2



1 Background 4


1.1 Definition and Scope 4


1.2 Objectives 4



2 Overall design 5


2.1 Structure 5


2.2 Files 6


2.3 Testing 7



3 Testers 8


3.1 Introduction 8


3.2 API 8


3.3 Library routines 8


3.4 Tests implemented 9


3.5 Unimplemented tests 12


3.6 General observations and conclusion 12



4 Generators 13


4.1 Introduction 13


4.2 API 13


4.3 Library routines 14


4.4 Implemented generators 15


4.5 Unimplemented Generators 19


4.6 General observations and conclusion 19



5 Front-ends 21


5.1 Tester 21


5.2 test-static 21



6 Conclusions 22



References 23



Appendix A – Evaluation 24


A.1 Project Overview 24


A.2 The Approach 24


A.3 Possible Future Work In The Area 24



Appendix B – Tables of results 25


B.1 Empirical test results 25


B.2 Speed test results 26



Appendix C – Source Code 27



1 Background



1.1 Definition and Scope




There are many known algorithms for generating numbers which are, or at least appear to be, random.
There are also a large number of tests to check that what is produced by such generators does closely
resemble a random stream. There is not, however, a currently freely available suite of generators and
testing routines allowing the user to eaisly define new generators and testers for experimental
purposes. This project implements the framework for such a system.





It would be impossible in the timescale to study every possible generator and test procedure available.
The project does however implement the most common of these and it should be very simple to add
new libraries as required.





1.2 Objectives




There are two main objectives:

To produce a portable set of pseudo-random number generators that can be easily used as
part of any application in a variety of programming languages

To test the random number generators for speed and randomness




The main consideration when writing these generators is portability and usefulness. Although in many
cases the generators are coded in such a way that they are slower than they could be trying to speed
up the generators would make the code less readable. The main suite of random number
generators/testers currently available, Diehard, [9 ] consists mostly of highly-optimized FORTRAN code
which has then been automatically converted to C. Even with a detailed knowledge of how the
generators and testers work it is usually impossible to figure out how the code works or adapt it for
other uses.






To achieve the objectives, we can break the problem down into three tasks which need to be
considered:

The generator library and a well-defined API (Application Programmers Interface)

The testing library and a well-defined API

A front-end that can be used to test the generators





2 Overall design



2.1 Structure

Each generator or tester is presented as a stand-alone file that, with the exception of combination
generators, does not rely on other generators. These libraries are dynamically loadable at run time or
can be compiled into a program, allowing the choice of generator to be specified by the programmer or
end-user as desired. There is also a central library of functions used by all the libraries to avoid
excessive duplication of code.




The code is all written in C. There are several reasons for this choice of language. Firstly, it is fast.
Most other languages are harder to optimize and perform bounds checking on arrays and variables
which, while normally desirable, will slow the generators down quite significantly. Secondly, it is
portable. C code can be compiled on almost any computer currently in existence. This allows most of
the generators, except combination ones which rely on dlopen() as discussed below, to be compiled on
virtually any platform. Finally, C code is easy to build into libraries and most programming languages
allow C code to be called from within them.




The development platform is Unix, (Linux primarily, also tested on Solaris 2.5.1) however as the code is
written with portability in mind and it should be easy to compile on other systems with only minor
modifications. There are two notable exceptions to this – dlopen() and gmp. Dlopen() is used to load
libraries dynamically and is only available on some systems. It is used primarily by the front-end tester
but also by some of the “combination” generators. The combination generators could be modified to link
in their component generators statically although this would take quite a bit of work to allow all the
generators to have “alternative” function names when used as component generators. Combination
generators perform poorly both in terms of speed and randomness however so this doesn’t seem
worthwhile.




The ICG generator uses the GMP GNU multi=precision math library for doing inverse multiplication.
Although this library will compile on the majority of systems it is not generally installed as standard.




Both the generators and testers work on bit-streams rather than floating point numbers. Although most
older random number generator systems generate floating point numbers they are harder to analyze
and reliance on floating point math would slow down the generators. It would be trivial, if required, to
convert the output from this API to a floating point number. Care should be taken to ensure that the
granularity of the numbers obtained from a generator is sufficient however – clearly taking a single bit at
a time is going to produce poor results if three decimal places of randomness is required.



2.2 Files

Makefile Instructions for building the programs (See below)

*.c Source code

*.so Compiled dynamic-link libraries

librnd_name.* Generator “name” (See section 4.4)

libtest_name.* Tester “name” (See section 3.4)

random.h Global function definitions, included by every generator.

librandom.* Useful routines used by many generators and testers. (See sections 3.3 and
4.3)

tester Simple testing program that loads in generators and testers dynamically. (See
section 5.1)

test-static Statically linked example showing the display tester and the Mersenne Twister
generator. (See section 5.2)

random.dat Random source used for the file generator and as a seed by other generators.
(See below)




Each generator/tester is a separate source (.c) file and compiles into a separate object (.o) or library
(.so) file.



2.2.1 Makefile

The Makefile is used to compile the code by simply typing make on a Unix system. make -k should
normally be used to force make build all the libraries even if some fail as the ICG generator will only
compile if the gmp library is present. There are only two things that need to be changed in the Makefile
in the normal course of events. The first is the CCOPTIONS line. Normal options would be -O6 -Wall,
indicating full optimization and all compiler warnings. Other useful options are -DVERBOSE to turn on
extra output in generators and testers and -g instead of -O6 to enable debugging symbols in the output
files for use with debuggers such as gdb. -DQUIET can be used to suppress all output from testers.
This is for use when testers might want to be included in a larger program although isn’t the most
common case and therefor isn’t the default option.




If desired, the libraries compiled into test-static can be easily changed here, demonstrating how easy it
would be to incorporate a new generator as part of a program using this API. Debuggers don’t handle
dynamic linking particularly well so in most cases it is eaiser to link a specific library statically as part of
test-static for debugging.




Typing “make clean” will delete all the compiled libraries and executable.




The variable LIB_GCC is a workaround for an apparent bug in the gcc and egcs compilers. For some
reason, under certain circumstances involving dynamic linking not all the correct functions for unsigned
long long math are included in librandom.so. Adding the libgcc library specifically resolves this problem.



2.2.2 Random.dat

For a “true” random source and as a seed for some generators, this file was downloaded from a
hardware source on the Internet. [10] The web site doesn’t allow large downloads in one go, so around
a hundred separate files were downloaded and concatenated together to produce one larger file.



2.3 Testing

Although the generators are tested by the testers, the testers themselves need to be validated to
ensure they give the expected results. To do this, we can use generators known to be “perfect”, such
as cryptographic generators, which should not fail any statistical test. There is a small probability that a
truly random source will produce non-random output however this possibility is very small given the
amount of data being generated and this problem was not encountered.




There are also certain expected results for most tests, as almost every paper discussing a testing
mechanism gives several tests and the results obtained by the author. However,there are many ways of
implementing certain tests and some generators do fail tests they have been shown to pass in other
situations.





3 Testers



3.1 Introduction

The testers are presented as separate libraries which can either be linked in with a program at compile-time or, if supported on the specific platform, dynamically selected and loaded at run-time. The output is
a boolean pass or fail indication for a particular generator, although all the testers produce additional
human-readable text to showing what results were produced. It is not expected that testers will be
called automatically by programs using the libraries, however in same cases this may be desirable.
Some applications may wish to produce a different set of random numbers each time without saving the
state of the generators between each run. However, some “seed” values for generators may produce
poor output. A good tester, such as Maurer, can be quickly applied to a particular generator with a
certain seed value to check if it produces sane output.



3.2 API

There are only two functions required for the tester modules:




int rantest_test(void *state, getbits function)

char *rantest_info_text(void)




Rantest_test takes a pointer to a generators state array and a pointer to a getbits function and
produces a single boolean result to indicate pass or fail. All of the testers produce some sort of
diagnostic output however giving actual p-values where appropriate, and if VERBOSE is defined in the
Makefile they will usually give details of the internal counters. Verbose output is usually unlabelled
however and only really interesting for debugging purposes.




Rantest_info_text simply copies a descriptive string into scratch and returns a pointer to that. It is
implemented this way, rather than by a static text string in the generator, so that extra dynamic
information on the test library could be output, as with the generators, although this is never actually
used.



3.3 Library routines

There are two routines in librandom.c called by testing routines. These are:




double ChiSqPN(unsigned long df, double x)

double KolSmirP(unsigned long df, double x)




ChiSqPN takes degrees of freedom and a value and calculates the p-value for the Chi-squared
distribution from x with df degrees of freedom. The code is adapted from poorly documented Javascript
by Ritter [8] as no other readily-implementable formulae or algorithms were available. This routine
works by normalising x and then passing that to NormalP which actually calculates the p-value for the
normal distribution using a fairly simple formulae.




KolSmirP is also adapted from code by Ritter but is far more complicated and difficult to understand.
Similarly to ChiSqPN it calculates the p-value for the Kolmogorov-Smirnov distribution from x with df
degrees of freedom. Unfortunately, the Kolmogorov-Smirnov distribution has even less information
available about it than the Chi-squared distribution and the source of the underlying algorithm isn’t
known.



3.4 Tests implemented



3.4.1 Display tester

This “display” test simply prints the first 100,000 16-bit numbers from the generator being “tested”. This
is for manually checking the output of a generator is sane, (No leading zeros/all zeros output) doesn’t
have an incredibly short period and for manually checking that a generator is iterating according to the
correct formula.



3.4.2 Speed tester

The speed test measures how fast the generator can produce 2,000,000 16 bit numbers and the results
are presented in Appendix A. Three platforms were measured, Intel, Sun and Alpha, although not all
platforms were directly available so only Intel results are given in full. The test is slightly unfair in that
generators that produce results in multiples of 16 bits will perform better as rangen_generic_getbits is
faster when bits are being requested in multiples of the generator bits per iteration. The reflects how the
generators would probably be used in practice however. There are few surprises here, the generators
perform as would generally be expected and in roughly the same order on all platforms. The file
generator appears slightly faster than all zeros on Intel because it loads a 64-bit value into STATE->output every iteration whereas the zero generator only loads 32 bits a time.




The tester actually measures the CPU time elapsed for the process rather than the actual time elapsed
so even on a fairly heavily loaded machine results will be accurate. On faster generators, however, the
granularity of the system clock seems to cause a problem and affect results by about 5%. The
compiler used can make a more significant difference and can affect results by as much as 50%. On all
platforms, the tests used the gcc compiler. On the sun platform however the sunpro compiler was
found to be between 30% and 300% faster than gcc for various generators.




The speed of a generator can heavily affect the choice of generator for an application – a generator
such as the linux /dev/random, whilst highly random and unpredictable, is far too slow if tens of millions
of random numbers are required.




The speed test always passes a generator.



3.4.3 Chi-squared

The simple statistical chi-squared test fails a surprising number of generators given that it is
independent of ordering. We test ten million 16-bit values however, which will tend to cause any
generator with a period lower than a few million to fail badly. Almost every generator which fails another
test also fails Chi-squared although this is more likely to be coincidence that poor generators also have
shorter periods that are detected by Chi-squared. Generators with short periods fail because slight
deviations from the expected distribution are exaggerated as they loop and give the same values again.




Chi-squared is a two-tailed test so a low value is as bad as a high one – LCG 2 fails as the values are
too close to the expected distribution to be random. A generator passes this test if 0.005 < p < 0.995 - i.e. it passes at the two-tailed 1% level.




Because this generator is “two-tailed” it detects sequences that are “too random” – i.e. that fir the
expected distribution improbably well – as well as those that are not random enough. LCG 2 exhibits
this property – it gives a Chi-squared value of zero.



3.4.4 Kolmogorov-Smirnov

The KS test is really just a weaker version of the Chi-squared test and fails most of the generators that
Chi-squared picks up. Each result is one-tailed – I.e. it only fails if too high not too low – but two values
are produced and the test fails if either are too high. This tester is weaker than Chi-squared primarily
because calculating the p-value for ten million degrees of freedom is too time-consuming
computationally whereas the million degrees of freedom used are much faster. This test and other tests
based on it passes the test if p < 0.99 - i.e. it passes at the one-tailed 1% level.



3.4.5 Gap test

The gap test measures the distributions of gaps between occurrences of a number fits the expected
distribution. This test actually measures using chi-squared the distribution of gaps between each
possible 8-bit number individually and then performs a Kolmogorov-Smirnov test on the results.
Because of this the test proves to be very powerful – it picks up flaws in generators that every other test
passes. LFSRs and FCSRs failing is a surprise, although looking at debug output by uncommenting the
commented out code lines in the tester shows that the two linear feedback generators are failing due to
extremely long gaps between certain numbers. (This produces a lot of output)



3.4.6 Poker test

The poker tests measures occurrences of the same digit in a four-digit hexadecimal number generated
by the generators. Because we are doing this test on bit streams rather than floating point numbers,
any non-randomness here should also be picked up by the Chi-squared test and this is indeed the
case.



3.4.7 Runs up/runs down

This test proved very difficult to implement as Knuth’s description for calculating the matrices required
for this test isn’t clear. The matrixes, a and b, are hard-coded into the program however as they do not
change when other parameters of the tester change. This tests fails weak generators as expected and
is reasonably powerful. One surprise is that it fails the FCSR generator, which no other tester other
than the gap test fails. The two results this tester produces are Chi-squared vlues for runs up and runs
down.



3.4.8 Runs above/below the mean

This is a very weak test and fails only the poorest of generators. This generator produces four outputs –
a pair of KS results for each of runs above and below the mean. This test doesn’t fail the zero
generator because of a peculiarity in the formulae which will only occur when the entire input is above
or below the mean.



3.4.9 Maurer’s Universal Statistical bit test

Maurer’s test almost lives up to his claim of being able to detect any flaw in a generator detectable by
other means although fails to spot flaws spotted by the gap tester. Unfortunately, his original paper lists
an incorrect expected value when calculated on 16-bit numbers, so the test was conducted only on 8-bit. Maurer’s paper lists the expected value as 15.1670.001 whereas experimental evidence strongly
indicates a value closer to 15.159. The version of the code implemented is a hand-converted C version
of Maurer’s Pascal code [5] and performs flawlessly on 8-bit numbers so the error appears to be in the
expected values provided. The error appears to be mathematical, not typographical, as the same value
is repeated elsewhere within the paper. Unfortunately, calculating the value to the required precision for
testing a million random numbers (Maurer’s code only tests 20,000 8-bit numbers) requires 200,000
iterations of the formula which to achieve in a reasonable amount of time requires more computing
power than readily available. (A bc script to calculate it had not completed after over two days of
running on a PPro200 when the machine was shut down)



3.4.10 File tester

This tester simply writes 2 million bytes of the output of the generator as binary data to a file called
output.dat. This file can then either be renamed for use as a seed/input for other generators or used for
some other purpose.



3.5 Unimplemented tests



3.5.1 Serial correlation

This test isn’t implemented as no exact version of it appears to exist. Knuth gives a version but only
gives conjectured formulae for the expected values which appear not to fit well with actual empirical
results. This test is not included in modern random number packages and appears to have fallen into
disuse, possibly because it measures similar properties to other non-statistical tests such as the run
and gap tests.




3.5.2 Marsagla’s “Stringent” tests

Without further analysis, Marsagla’s tests are not easy to implement. The information supplied with
Diehard [9] is insufficient to easily figure out the algorithm being used for some tests. Unfortunatly, as
has been mentioned previously, the code in Diehard is totally impenetrable and gives no clue as to the
authors intentions.



3.6 General observations and conclusion

Implementing some of the testers, particularly the more common “Knuth” types, proved quite difficult
due to conflicting and inaccurate literature. For example, Banks et al. Give the number of degrees of
freedom for a gap test as the sample size whereas in practice it is actually the number of discrete
values the output can have. (256 for an 8-bit output) The error in Maurer’s paper, mentioned previously,
is another example of this problem. Peer-review of papers in this area appears poor, possibly because
most current work concentrates on cryptographic generators and there is currently little interest in non-cryptographic implementations.




When testing new generators, it would seem sensible to test using Maurer’s test or the Gap test first as
they are the most powerful tests which are more likely to reject a generator than any other if it is
flawed. The generator should then be tested against the other, less powerful, tests if desired.


4 Generators



4.1 Introduction

The generators use a similar API to the testers. The internals of the generator are normally kept hidden
from the calling program so that generators can be changed and the program relinked with no
modification. There are routines for changing the internal variables of a generator however and these
can be used by programs with a greater knowledge of the internals of a specific generator or by a user
to select different values for the generator, either for testing purposes or because specific values might
be more appropriate for a certain applications. Programs can get values from a generator by means of
a routine which extracts a specified number of bits at a time. There is also an information routine in
each generator which will output a textual description of a generator.



4.2 API




There are many more routines used by the generators than there are for testers. These are:




char *rangen_info_text(void *state);

unsigned int rangen_info_statesize(void);

void *rangen_create(void);

void rangen_init(void *);

void rangen_iterate(void *state);

char *rangen_info_getparametername(unsigned int number);

unsigned long long rangen_info_getparametervalue(unsigned int parameter void *state);

void rangen_init_setparameter(unsigned int parameter, unsigned long long value, void *state);

unsigned long long rangen_getbits(void *state, unsigned int bits);




In addition, the state structure should begin as follows:




struct state_struct {

unsigned long long output;

unsigned int outputbitsavailable; };

Rangen_info_text takes a pointer to generator state and returns a pointer to a string giving details of
the generator. If VERBOSE is defined in the Makefile this can include details of the current generator
settings. Rangen_info_statesize returns the current size of the state array which can be used to make a
copy of the generator or, if desired, save it’s current state to disk. Generators that cannot be copied or
saved to disk in this way return 0.




Rangen_create allocates memory for a generator state and returns a pointer. It will also usually fill in
some reasonable default values for the generator. Rangen_init should be called once all parameters
have been set using the state pointer to initialize any arrays and values before rangen_getbits is called
for the first time. If parameters are changed and rangen_init is not called before the next call to
rangen_getbits, behavior is undefined. This will make some generators crash as checks for this are not
performed in the interests of speed.




Rangen_info_getparametername returns a pointer to a string describing parameter number. If number
is higher than the maximum parameter number, NULL is returned. If the first three letters of a
parameter are “Alg” (Short for algorithm) the value required is a pointer to a generator function and the
next parameter is a pointer to generator state. Rangen_info_getparametervalue takes a state pointer in
addition to a parameter number and returns the value of that parameter. Similarly,
rangen_info_setparametervalue can be used to set a value. If parameter zero is available it should be a
value appropriate for “seeding” the generator. No guarantees are made as to how well any particular
seed value will perform however.




There is no sanity checking performed on values selected in this way and unpredictable things (Like
crashes) could happen if invalid values are entered.




Rangen_getbits is the most-used routine and returns an appropriate number of bits from the generator.
In all the generators implemented her this is simply a call to the library routine rangen_generic_getbits
discussed below. Rangen_iterate is intended primarily for use by librandom for putting a new value into
STATE->output. STATE->output always contains the current generator output and STATE->outputbits
the number of bits of random number left in STATE->output.




Throughout the generators, STATE is used as a shorthand for ((struct state_struct *)state). This is
because the state is always passed as a void pointer so that programs without knowledge of the
generator internals can pass it around and thus it requires explicit typing every time it is referenced.



4.3 Library routines

Two library routines are used by generators. These are:




int countbits(unsigned long long);

unsigned long long rangen_generic_getbits(void *state, unsigned int bits, void (*iterate)(void *));




Countbits is used simply to count the position of the left-most one in a binary number and is used by
generators such as Von Neumann and LCGs either as part of the algorithm or to calculate the number
of random bits generated per iteration.




Rangen_generic_getbits takes a state pointer, which must start in the way described above, a number
of bits desired and a pointer to an iteration function and will iterate an appropriate number of times to
produce the desired number of bits. Every generator implemented here uses this routine.



4.4 Implemented generators



4.4.1 “Reference” generators

There are three generators which are implemented to allow testing of the testers. These are Linux, File
and Zero – any valid test should pass the first two and fail the zero generator.




The “Linux” generator uses the crypto-random generator present in the Linux operating system kernel
which uses internal hardware sources of the machine to generate an unpredictable bit stream. The
generator actually uses the non-cryptographic file /dev/urandom as this is much faster than the
cryptographically secure generator /dev/random. It still uses cryptographic functions and hardware
sources so is indeed random.




“File” reads random data from a file “random.dat”. The file used in testing was generated from
http://www.random.org/, which gets data from a hardware random source. The reason this generator
fails the Chi-squared test is because the file in use was only 2Mb and twenty million bytes (10,000,000
16-bit numbers) were tested.




Although primarily intended to test the testers, these two generators could be used in real-world
applications if relatively small amount of randomness is required. There are faster and more convenient
generators available however.




“Zero”, as the name implies, simply generates a stream of zeros.




The Zero and File generators also give an indication of the maximum possible speed of a generator –
I.e. how much is lost in function call overhead compared to actual computational time. The file
generator is marginally faster on Intel because it works on 64 bits values whereas the zero generator
only generates one 32-bit per call to rangen_iterate. (The file generator was reading data off a remote
filesystem on the Sun machine, hence the lower speed)




None of these generators take any parameters and their state cannot be saved.
(Rangen_info_stateside returns zero)



4.4.2 C Library generators

The “Rand” generator tests the properties of the random number generator random() in the system C
library. Rand 1 is results from a Linux machine running glibc2, which performs admirably and is also
reasonably fast. The source code for the library doesn’t give references as to the source of the
generator but it appears to be a “polynomial” type of some description. On a glibc2 machine, this
generator is certainly more than acceptable for randomness although slower than other generators.




Rand 2 are results from a Solaris 2.5.1 machine and passes all the tests except Chi-squared.
Documentation claims this is a “non-linear additive feedback” generator with a period in the order of
2^31 which if true indicates that this failure isn’t in fact due to a small period in the generator but to a
major flaw in it’s implementation.




This generator takes no parameters.



4.4.3 Von Neumann

The Von Neumann generator as implemented performs poorly in almost all tests. The version used is a
“Binary” version – taking the middle binary digits and squaring them – rather than the original decimal
version Von Neumann proposed. A hexadecimal version could also easily be implemented, although
would offer no obvious benefits over the binary version. The original “decimal” version of this generator
would be much slower and also offer no improvement over the binary version.




This generator takes one parameter – number zero is the current internal state of the generator and can
be used to “seed” the generator. The default seed value used is one that has been observed to
perform reasonably well in the Maurer test, which is the most powerful test available, compared to other
seed values. Although this is a relatively fast generator on most platforms it’s failure in over half the
tests makes it unsuitable for use.



4.4.4 Linear Congruential Generator

The values used in LCG 1 are those suggested by Knuth and those in LCG 2 are the defaults compiled
into the generator as suggested by Ritter. Neither set performs particularly well, failing even the Chi-squared test suggesting that the period of these generators is quite low. Although other values may
perform better the generator as implemented is reliant to m being a power of two as we are producing
random bit streams rather than random floating point numbers. Converting from values of m other than
powers of two into a random bit stream is very difficult to do without destroying the randomness of the
generator. LCGs are the most common generators in current usage however because although slower
for large batches of numbers they are easily implemented and fast if only a few values are needed.




This generator takes four parameters. Parameter zero is the internal state of the generator and can be
used as a seed. The first, second and third parameters are m, c and a respectively. M is the modules
of the generator, c the value added each cycle (Zero for pure Multiplicative Congruential Generators)
and a the multiplication value.



4.4.5 Additive Congruential Generators

ACGs perform incredibly well given their simplicity and speed. The initial table used is based on the first
few values in the file used for the “File” generator with a default “lag” of 97. This is the only generator
actually implemented other than Mersenne Twisters that actually passes all the tests and achieve over
a million numbers per second on a PPro200.




This generator takes two parameters. Parameter one is m, the modules and number two is k, the “lag”.
As this generator cannot be usefully seeded in it’s current incarnation there is no parameter zero.



4.4.6 Inversive Congruential Generators

ICGs perform disappointingly, failing every single test, even the very weak runs above/below the mean
tests. Looking at the output it is clear this generator has a very low period – a few hundred – making it
totally useless. There appear to be some suggestions that this is a good generator however I can find
no source that states this with any certainly. Due to it’s use of inverse multiplication it is also very slow,
making it unsuitable for general use. This generator is further hindered by it’s use of the external library
“gmp” – the GNU Multi-Precision math library – for inversive multiplication, which is probably not present
on all systems.



Although it might be possible to speed up this generator and increase it’s period, there seems little
reason to do so. As there is no “hidden” state in this generator it’s maximum period is it’s modules as
with LCGs and there appear to be no benefits of this generator over LCGs despite being slower so the
effort does not seem to be worthwhile.




The parameters of the ICG are the same as for the LCG – zero is the state then one to three are m, c
and a.



4.4.7 L’Ecuyer

The L’Ecuyer combination generator also performs poorly. L’Ecuyer doesn’t give a convincing argument
as to why the output should be uniformly distributed and looking at the verbose output of Chi-squared is
clear there is bias towards particularly high or low results. The bias is only significant for a reasonably
large sample size, greater than 10,000, which is probably why L’Ecuyers original testing of his
generator did not spot this flaw.




The API as defined can not handle an arbitrary number of sub-generators, as L’Ecuyer uses, so the
configuration of this generator is static and there are no parameters.



4.4.8 Algorithm M

Algorithm M performs comparably to the LCGs – It is based on the same generator as LCG 1 for it’s
actual output, shuffled by another identical LCG generator. Although the output of the generator in
terms of it’s ordering is “more random” than a plain LCG, the LCGs used pass most of the ordering-dependant However, LCGs do fail a Chi-squared test and shuffling the output does not help here as
Algorithm M also fails the Chi-squared test.




Algorithm M has quite a few parameters. Firstly, powerk defines how big a “shuffle box” is required –
the implementation limits this to powers of two so, for example, setting powerk to 8 gives a box of size
256. Parameter two is the number of bits obtained from the generator each cycle. Parameters three
through six are the algorithm and state for the two generators X (Number source) and Y. (Shuffle
source)



4.4.9 Linear Feedback Shift Register

This generator passes most of the tests well. Rather surprisingly it fails the gap test, which examination
of verbose output reveals is due to very high gaps between certain reoccurances of the same number.



Unfortunately, because they have to work bit-by-bit it is rather slow to implement in software and are
nearly and order of magnitude slower than Mersenne Twisters and ACGs which perform just as well.
The default settings for the tap bits on this generator is taken from Schneider [4].




There are fiver parameters for this generator. Parameter zero is, as always, the current state and can
be used to seed the generator. (Although a seed of zero will produce a zero output) Parameters one to
four are the tap bit locations, from 0 to 63.



4.4.10 Feedback with Carry Shift Register generators

As with LFSRs, FCSRs suffer from the same failure in the gap test and are even slower. They offer no
obvious advantage over FCSRs and take the same parameters.



4.4.10 Mersenne Twisters

The code for the Mersenne Twisters is adapted from the original code by Matsumoto and Nishimura
[12] with no major modifications. Despite it’s reliance on a relatively large number of operations per
cycle mathematical operations it is very fast as it generates over eight thousand random bits each time.
Rangen_iterate doesn’t return all generated bits every time it is called but breaks them down into more
manageable 32-bit chunks – the real iterate routine is called as required. This generator passes all the
tests easily without any problems and because of it’s speed seems the generator of choice where
anything more than a few random numbers are required. “Seeding” this generator would be difficult as
the initial state is relatively large (Over 8 thousand random bits) although this could be overcome simply
by taking an offset into an existing random source, such as the random.dat currently used to seed the
generator.




Mersenne Twisters have no configurable parameters.



4.5 Unimplemented Generators



4.5.1 Wichmann & Hill Combined

This generator, presented in [2] only produces floating point results and there is insufficient information
in the paper to determine the granularity of the generator which would be required to convert the output
to a bit-stream efficiently.



4.5.2 Polynomial Congruential

As seen by the results of the Rand test, Polynomal Congruential generators are both reasonably fast
and quite random. Unfortunately, although this syle of generator is used by glibc2 and also mentioned
by Knuth insufficient information is available to attempt a decent implementation.



4.6 General observations and conclusion

Implementing the generators proved a much eaiser task than the testers. To construct a good generator
far more theory is required and invariably all the generators include either working code or an algorithm
that can be used to quickly produce working code.




It is clear from the table of results in Appendix B that Mersenne Twisters and ACGs are by far the
generators of choice. They are both very fast and fail none of the tests presented here.


5 Front-ends



5.1 Tester

The first and most useful front-end, “tester”, takes the name of a generator and tester library and runs
them. This program shows how to dynamically load tester libraries on a Unix system that supports
dlopen(). It is necessary to set the environment variable LD_LIBRARY_PATH to the directory holding
the libraries or install the libraries in the system libraries directory so that the dynamic linker can find
them.




Tester is called with two arguments, firstly the full filename of a generator library and second the full
filename of a tester library, for example:




tester librnd_mersenne.so libtest_display.so

(Assuming “tester” is in the current PATH)




If VERBOSE is defined when compiling a library then tester first outputs a short text description of each
library obtained using the info_text functions.




If there are any configurable parameters for the generator then tester will prompt for them one by one,
showing default values. The default values can be accepted by just pressing enter without entering any
data.Tester cannot handle “Alg” type parameters and ignores them.



5.2 test-static

The second tester, test-static, shows how simple it is to include a generator (And tester) in a program.
The libraries included in the static executable can easily be changed by modifying the
STATIC_LIBRARIES variable in the Makefile. Default settings produce a completely static executable,
i.e. one that requires no support libraries other than the standard C libraries. Substituting the .so files
for .o gives a dynamically linked version, where the library files are required to run the program, but
with no modifications to the source. This latter method is the preferred method of linking in the random
number libraries as if a generic name, such as librndgen.so, is used the generator can be changed
merely by replacing the library file with no need to change the source code.



6 Conclusions

The results here are not entirely surprising. As noted in the first report, both Mersenne Twisters and
ACGs were expected to do reasonably well. The failure of LFSRs is a surprise however, as these seem
to be regarded as reasonably proficient generators and although slow in software are often used in
hardware implementations. The failure of combined generators is also somewhat surprising, although
recent research work has moved away from this and into totally new generators.




There is little to be said about the testers as they all performed as expected. It is worth reiterating the
poor quality of work on random number testing however, in the hope that someone may eventually
produce a work that actually includes algorithms for implementing tests quickly and accurately.
Algorithms for this are very thin on the ground and often had to be derived from ambiguous statements
and the ambiguities resolved by experimentation.


References

[1] Knuth, D.E., 1969, The Art of Computer Programming, Volume 2 – Seminumerical Algorithms,
Addison-Wesley

[2] Wichmann, B.A., Hill, I.D., 1982, A Pseudo-Random Number Generator, NPL Report DITC 6/82,
National Physical Laboratory.

[3] L’Ecuyer, P., 1988, Efficient and Portable Combined Random Number Generators, pp.742-749, 774,
Communications of the ACM Volume 31, Number 6.

[4] Schneier, B., 1996, Applied Cryptography, John Wiley & Sons.

[5] Maurer, U.M., 1992, A Universal Statistical Test for Random Bit Generators, pp.89-105, Journal of
Cryptology, Vol.5, No. 2.

[6] Banks, J., Carson 2nd, J., Nelson, B., 1996, Discrete-Event System Simulation, Prentice-Hall.

[7] Fishman, G., 1973, Concepts and Methods in Discrete Event Digital Simulation, John Wiley & Sons.

[8] Ritter, T., Ciphers by Ritter, http://www.io.com/~ritter/

[9] DIEHARD, http://stat.fsu.edu/~geo/diehard.html

[10] Haahr, M., http://www.random.org/

[11] Marsaglia, G., 1984, A current view of Random Number Generators, Computer Science and
Statistics: The Proceedings of the 16th Symposium on the Interface, Atlanta, Elsevier Press.

[12] M. Matsumoto and T. Nishimura, 1998, Mersenne Twister: A 623-dimensionally equidistributed
uniform pseudorandom number generator, ACM Transactions on Modeling and Computer Simulation,
Vol. 8, No. 1, pp.3-30.

[13] Sullivan, J., 1998, Numerical Analysis & Associated Fields Resource Guide,
http://net.indra.com/~sullivan/q10.html


Appendix A – Evaluation




A.1 Project Overview




Initial work was on structuring the API in such a way that it was flexible enough to cope with most
generators and testers although still specific enough to allow generators to be interchanged easily in a
program. Once a rough outline had been produced the “reference” generators and display and speed
tester were written as a “proof of concept” and the API refined as a result of this. Constructing the API
and resolving issues surrounding the basic framework took around a quater of the time, primarily
because constructing dynamic-link libraries and particularly using the dlopen() call causes problems not
encountered in normal programming. Debugging also proved a problem – although most generators
can be linked statically combination generators such as L’Ecuyer rely on dlopen() to work which makes
using standard debuggers such as gdb tricky.




Only around ten percent of the time was spent implementing generators as this proved a relatively easy
task given the wide availability of algorithms to perform the most common generation functions. Writing
the testers and running tests took over half the time taken. This was due primarily to the difficulties in
implementing testing libraries that have been discussed at length previously. The remaining time was
spent writing the report.


A.2 The Approach




There is little to be said about the approach that has not already been covered in the body of the
report. No major assumptions were required for implementation and the reasoning behind the structure
of the project and choice of language has already been covered in section 2.1.



A.3 Possible Future Work In The Area




This project could be extended quite considerably to encompass far more generators and testers than
are currently implemented. Trying to implement even a significant portion of the available generators
would be quite a significant work however as there are a large number of generators and an almost
infinite number of possible parameters for those generators.


Appendix B – Tables of results


B.1 Empirical test results





























































































































Chi-sq KS Gap Poker Runs Mean Maurer
Linux 0.362208 0.510559

0.675486

0.712583



0.000526

0.478271



0.428924

0.359596


0.874975

0.247679



0.393694

0.184395



0.458329

7.184869
File 1.000000 0.738718



0.948190

0.935290



0.001834

0.011070



0.011181

0.497735
0.132207
0.130925



0.047258

0.174363



0.272003

7.184507
Zero 1.000000 1.000000



0.000000

0.000000



1.000000

1.000000



1.000000

1.000000



1.000000

0.000000



0.000000

0.000000



0.000000

0.000001
Rand 1 0.715177 0.820229



0.018466

0.480549



0.096585

0.414513


0.399850

0.138942



0.473569

0.252494



0.710310

0.138706



0.399274

7.184058
Rand 2 1.000000 0.753250



0.382823

0.071074



0.162250

0.131359


0.109035

0.228733



0.801380

0.473434



0.175626

0.588748



0.818538

7.185135
Von N. 1.000000 1.000000



1.000000

0.000000



1.000000

0.996104


0.972135

1.000000



1.000000

0.859635



0.757239

0.931238



0.753966

7.196248
LCG 1 1.000000 0.247861



0.804789

0.000000



1.000000

0.154877


0.061890

0.230819



0.159677

0.215360



0.324168

0.477696



0.260543

7.198858
LCG 2 0.000000 0.033461



0.952883

0.000000



1.000000

0.032041


0.083639

0.014468



0.013934

0.885685



0.897066

0.914775



0.975280

7.233509
ACG 0.191041 0.771454



0.167969

0.376785



0.270558

0.022058


0.093272

0.292254



0.498318

0.180289



0.403473

0.231975



0.144920

7.183310
ICG 1.000000 1.000000



1.000000

0.999999



0.998766

0.999999


0.998766

1.000000



1.000000

0.928292



0.739950

0.999918



1.000000

7.180711
L’Ecuyer 1.000000 1.000000



1.000000

0.000000



1.000000

0.852401


0.554670

1.000000



1.000000

0.118549



0.250390

0.180989



0.411692

7.169784
Alg. M 1.000000 0.033915



0.811274

0.000000



1.000000

0.149044


0.061221

0.010887



0.498318

0.295884



0.919160

0.324374



0.510433

7.196667
LFSR 0.505779 0.657751



0.104576

0.000000



1.000000

0.033919


0.012633

0.160175



0.184342

0.138039



0.101322

0.291267



0.243762

7.184588
FCSR 0.084193 0.517227



0.511065

0.000000



1.000000

0.033539


0.056973

0.821765



0.818449

0.580329



0.835615

0.836863



0.335689

7.183938
Mer.Tw. 0.806905 0.642702



0.057305

0.532591



0.116994

0.097397


0.119505

0.424862



0.960693

0.717156



0.727823

0.245693



0.405926

7.183670





Red: Failure at the 1% level Yellow: Failure at the 5% level Green: Pass


B.2 Speed test results



















































Intel PPro200 Sun Ultra 5/10

(Sun4u 299MHz)

Alpha 21164 533MHz
file 1851 2251
zero 1770 2483 6736
Mersenne 1470 2125
Von Neumann 1418 727 5120
ACG 1190 1309 4808
Rand 1190 1636
LCG 1058 920 4367
L’Ecuyer 794 336 1295
Algorithm M 374 304 1694
LFSR 160 176
FCSR 141 171
ICG 54
Linux 34




All values are thousands of 16-bit values per second.


Appendix C – Source Code


Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.