Re: Random number generator without duplicates?
Re: Random number generator without duplicates?
- Subject: Re: Random number generator without duplicates?
- From: email@hidden
- Date: Wed, 18 Apr 2001 22:37:54 -0400
On Wed, 18 Apr 2001 09:41:10 -0300,
Bill Briggs <email@hidden> suggested,
>
At 9:54 AM -0400 17/04/01, email@hidden wrote:
>
>How can I get a list of random numbers that does not include any
>
>duplicates? I would want to use these parameters: six random numbers
>
>from 1 to 56, as a list or as six variables, no duplicate numbers.
>
>
That's a singularly peculiar request. You cannot have a randomly
>
generated list of numbers if you impose such a condition. Random
>
lists of length greater than 1 element will at times produce
>
duplicates (or other multiple instances) in the list. It's a fact.
>
Once you impose your condition of uniqueness of elements, the list
>
you want is *not* a randomly generated list.
"Random" doesn't mean "independent uncorrelated uniformly distributed random
process". A randomly-generated list means just where you can not completely
predict one element of the list from the others.
Random selection of a set without replacement is one of the classic problems of
discrete probability theory. You take an urn (which is defined as a container
for the ashes of dead statisticians). Put three red balls and two black balls
in it. Shake the urn and pick out three balls. What is the probability that
you have three red balls?
If I take a random walk, my position at each step is highly correlated with my
past position. If I argue in the sports bar about two random teams playing, I
don't mean the Giants playing the Giants. When Gallup polls a random sample of
2014 people, they don't call anyone twice. When I state that the Alan Turing
tried random settings on an Enigma's plugboard, I'm not misspeaking because the
plugboard didn't allow two keys to plug into the same slot.
If you as a computer scientist or programmer are asked for a random number
generator, there are many questions you should ask. Are you modeling a discrete
or a continuous process? Is it stationary, or not. Ergodic? What is the
distribution of each sample or value: Gaussian, uniform, Poisson, Bernoulli, or
that cool one used for simulated annealing that has an unbounded second moment,
whose name I can't recall? What is the spectrum of the process in time?
Or maybe its not an engineering-type problem. Is it a cryptology problem, where
you are trying to reduce the ability of an adversary to predict the next value
of the sequence from some sample of it. Or is it signal coding, where you want
a sequence that doesn't correlate with some stochastic noise process in the
environment. (You need a pseudo-random signal, because the receiver and
transmitter have to generate the same sequence independently. At each point in
time, both have to agree on what the next number in the sequence is.)
>
>Isn't the integer random number generator supposed to follow the
>
>same rules as the real, i.e. generate every value once before
>
>repeating?
This generator is a "maximal length sequence". If you are simulating a random
sequence with a pseudo-random sequence, a maximal length sequence has the
often-desirable characteristic that in any run, the population of numbers you
get has statistics that more reasonably match those of the true random sequence.
If the sequence is not maximal length, there are some numbers that are never
generated. That statistical behavior may be undesirable. For example, in the
pseudo-random noise coding used by wireless data systems like AirPort, if you
have a pseudo-random sequence that only hits half of the possible values, you
lose 3 dB of signal processing gain.
But if you are collapsing this sequence down to two values--say you are
simulating the flipping of a coin--you care little about whether .3094882753
appears or not; you just care about whether there are the same numbers above and
below 0.5. And you care whether the sample-to-sample correlation generates the
proper statistics for your simulation.
>
>When done correctly with multipliers and the mod function, the generated
>
>numbers have no obvious sequence, and have a full period, meaning
>
>that every value is generated once before repeating.
>
>
But that is not a random number generator (ignoring any machine
>
generation problems)!!! The sequence you are talking about is based
>
on conditional probabilities, and the random number generator should
>
NOT operate that way. Random numbers are, as I've said, generated as
>
the result of *independent* events. When you toss dice, they don't
>
know what they were the last time they were tossed. They produce
>
random results, and duplicates are certainly possible.
This method of producing pseudo-random numbers is pretty popular, and when done
well produces a pseudo-random sequence that has good statistics for many
applications. Its nice, for testing purposes, to have a (totally deterministic)
pseudo-random sequence, that you can set the seed for, and run over and over and
always get the same result, as you develop, debug, and tune.
However, in many applications, having a truly random number generator is needed.
For a cryptographically-strong 128-bit HTTPS connection to your bank, you want
the session key generated as unpredictable as possible, to minimize the ability
of an eavesdropper to find it easily. That means a number not limited to just a
subset of possible values, uniformly distributed across these values, and not
correlated with previous values, or with the time of day, or the time since the
last restart, or with the number of characters typed. For some demanding
applications, you may have a real random number generator, like a back-biased
diode junction, a lump of radium, or a cup of hot tea hooked to a finite
improbability generator. But for most computers, you need to mix in as many
external sources of randomness (or at least, sources that are hard for an
adversary to measure and predict). Time of day, plus time since the last
reboot, plus keyboard stuff help, but if you regularly reboot at midnight in a
headless server, you need something better. Or if two servers use the same
method, and both reboot after the same power failure, you may have a problem as
well. Wander through memory, check the disk's spin location, listen to
collisions on the Ethernet card. Download SETI@home data.
The real problem for the user or scripter is that it is very hard to engineer a
good random number generator. Bad random number generators are like bad
encryption: they look good, until you work very hard to prove them bad. To the
eye, the data stream from bad random number generator and a data stream made
from the cosmic background noise both look merely like garbage. And to the eye,
5-bit encryption looks just like 128-bit encryption. The general advice to most
scripters and programmers is to use the system's random number package, which is
probably much better characterized and more likely lacking unexpected biases and
oddities than something you write yourself.
Unfortunately, AppleScript's generator in Standard Additions is flawed. Too
bad. I tried running its random number generator into a transcription process,
and all I got was,
To be, or not to be. That is the gozornansplatz.
--
Scott Norton Phone: +1-703-299-1656
DTI Associates, Inc. Fax: +1-703-706-0476
2920 South Glebe Road Internet: email@hidden
Arlington, VA 22206-2768 or email@hidden