Volume 2 of Knuth's The Art of Computer Programming is one of the best
discussions of randomness, and of testing for it, that I know of. Two
tests that are spoken of highly are the chi-square and the spectral
tests.
I have done some theoretical reading, which is why I was concerned
about the difference between 31-bit random (apparently used by Darwin
in selecting initial sequence numbers for TCP packets) and 32-bit
random (apparently available from other BSD Unix systems). John Walker
at Fourmilab.ch has offered 'ent' to gauge the quality of putative
random input. Having compiled and run it, I get less entropy per byte
for small reads (but not for large reads) than John shows in the sample
output in his man page at <http://www.fourmilab.ch/random/>. 'ent' is
distributed as source and compiles with one warning, which I do not
recall seeing when I compiled the program previously:
cc -g -c -o randtest.o randtest.c
randtest.c:26: warning: static declaration for `log2' follows non-static
It appears to run fine.
The Results:
----
Run One (one 128-byte block)
----
>dd if=/dev/random count=1 bs=128 | ./ent
1+0 records in
1+0 records out
128 bytes transferred in 0.000307 secs (416825 bytes/sec)
Entropy = 6.591682 bits per byte.
Optimum compression would reduce the size
of this 128 byte file by 17 percent.
Chi square distribution for 128 samples is 240.00, and randomly
would exceed this value 50.00 percent of the times.
Arithmetic mean value of data bytes is 130.5625 (127.5 = random).
Monte Carlo value for Pi is 3.619047619 (error 15.20 percent).
Serial correlation coefficient is -0.086128 (totally uncorrelated =
0.0).
----
Run Two (a bigger sample: 100 1k blocks)
----
>dd if=/dev/random bs=1k count=100|./ent
100+0 records in
100+0 records out
102400 bytes transferred in 0.181526 secs (564107 bytes/sec)
Entropy = 7.998248 bits per byte.
Optimum compression would reduce the size
of this 102400 byte file by 0 percent.
Chi square distribution for 102400 samples is 248.44, and randomly
would exceed this value 50.00 percent of the times.
Arithmetic mean value of data bytes is 127.1300 (127.5 = random).
Monte Carlo value for Pi is 3.117309270 (error 0.77 percent).
Serial correlation coefficient is 0.001780 (totally uncorrelated = 0.0).
----
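A note on the small-read entropy figure. What follows is a minimal sketch, assuming 'ent' reports the usual Shannon entropy of the byte histogram; the function name is hypothetical and not taken from the 'ent' source, and os.urandom stands in for reading /dev/random. With only 128 bytes, at most 128 of the 256 possible byte values can appear, so the estimate cannot exceed log2(128) = 7 bits per byte no matter how good the source is.

```python
import math
import os
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte."""
    n = len(data)
    counts = Counter(data)
    # Sum of -p * log2(p) over the byte values that actually occur.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

if __name__ == "__main__":
    for size in (128, 102400):
        sample = os.urandom(size)  # assumption: stand-in for /dev/random
        ceiling = min(8.0, math.log2(size))  # best score possible at this size
        print(size, round(entropy_bits_per_byte(sample), 6), "ceiling", ceiling)
```

Seen this way, the 6.591682 in Run One is not far below the 7.0 that is the best any source could score on a 128-byte sample.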
I am guessing John's entropy source was a hardware entropy source, as
his site offers entropy to the public from a radioactivity sensor. My
OpenBSD hardware died, so I no longer have much self-collected basis
for comparison to the Darwin prng, and I'm sorry now that I discarded a
file of output from OpenBSD's /dev/random that I made a few years ago;
it didn't survive migration to newer computers :-(. I note from a similar
discussion of prng quality on a PGP list that a Linux 2.0.36 kernel
subjected to examination by 'ent' on 100 1k blocks demonstrated
7.657693 bits per byte (John's man page describes a source with
7.980627), 4% compressibility (John's was 0%), and a chi-square value
that 'ent' says random data would exceed 25% of the time (John's was
0.01%) [note, if I am misreading the correct way to apply the
chi-square test, somebody stop me]. Darwin's prng metrics compare
favorably, with the exception of the chi-square test. Anyone want to
chime in on whether a p of 0.5 says anything important about the output
of Darwin's /dev/random?
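On the chi-square question, here is a minimal sketch under stated assumptions. It computes the statistic over the 256 byte values against a uniform expectation (255 degrees of freedom) and estimates the tail probability with the Wilson-Hilferty normal approximation. That approximation is a substitute chosen for brevity, not necessarily how 'ent' arrives at its percentage, and both function names are hypothetical.

```python
import math
from collections import Counter

def chi_square_bytes(data: bytes) -> float:
    """Chi-square statistic of byte counts against a uniform distribution."""
    if not data:
        raise ValueError("need at least one byte")
    expected = len(data) / 256.0
    counts = Counter(data)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))

def chi_square_tail(stat: float, dof: int = 255) -> float:
    """Approximate P(X >= stat) using the Wilson-Hilferty cube-root transform."""
    var = 2.0 / (9.0 * dof)
    z = ((stat / dof) ** (1.0 / 3.0) - (1.0 - var)) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of a standard normal

if __name__ == "__main__":
    for stat in (240.00, 248.44):  # the two statistics from the runs above
        print(stat, round(chi_square_tail(stat), 3))
    print("expected count per byte value at 128 bytes:", 128 / 256.0)
```

Under this approximation the two statistics come out somewhat above 0.5 (roughly 0.74 and 0.60), so a printed 50.00 percent for both may reflect coarse rounding in the tool rather than an exact p; that would need checking against the 'ent' source. Either way, a p in this middle range is what a good source should produce. It is a p very near 0 or very near 1 that signals trouble. Note also that for the 128-byte run the expected count per byte value is only 0.5, well under the usual rule of thumb of five per category, so the chi-square figure for that run should not be given much weight.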
It is interesting that Darwin's /dev/random yields p of 0.5 on the
chi-square test regardless of whether one reads 128, 256, or zillions
of bytes from it, whereas other measures start looking positively
fantastic as you read more and more: entropy per byte shows an
increasingly negligible delta from 8 bits, the serial correlation
coefficient approaches zero, and the like. The numbers look much better
for large reads, which seems to fly in the face of some reading that
suggested the quality of entropy should get poorer and poorer as one
demands more bits from a prng with an entropy pool of a given size.
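The improvement with larger reads looks like a sample-size effect rather than a change in the quality of the source. Here is a minimal sketch, assuming a plain lag-1 circular serial correlation in the spirit of Knuth's serial correlation test; the function name is hypothetical, the code is not copied from 'ent', and os.urandom stands in for /dev/random. For independent bytes the coefficient scatters around zero with a spread of roughly 1/sqrt(n), so it shrinks as n grows even when the generator is unchanged.

```python
import math
import os

def serial_correlation(data: bytes) -> float:
    """Lag-1 circular correlation between each byte and its successor."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two bytes")
    s1 = sum(data[i] * data[(i + 1) % n] for i in range(n))  # sum of x[i]*x[i+1]
    s2 = sum(data)                                           # sum of x[i]
    s3 = sum(b * b for b in data)                            # sum of x[i]^2
    denom = n * s3 - s2 * s2
    if denom == 0:
        raise ValueError("all bytes identical; correlation undefined")
    return (n * s1 - s2 * s2) / denom

if __name__ == "__main__":
    for n in (128, 102400):
        sample = os.urandom(n)  # assumption: stand-in for /dev/random
        print(n, round(serial_correlation(sample), 6),
              "typical spread ~", round(1 / math.sqrt(n), 6))
```

With n = 128 the typical spread is about 0.088, the same size as the -0.086128 reported in Run One; with n = 102400 it is about 0.003, in line with Run Two's 0.001780. These statistics describe the distribution of the output bytes, not how much fresh entropy was in the pool, so they would not be expected to degrade as more bits are drawn.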
I'm keen to hear anyone's take on the chi-square test, and other tools
designed to examine purportedly random data.