Re: Random number generator without duplicates?
Re: Random number generator without duplicates?
- Subject: Re: Random number generator without duplicates?
- From: Bill Briggs <email@hidden>
- Date: Wed, 18 Apr 2001 09:41:10 -0300
... late to the party, but I can't listen to this nonsense any more
At 9:54 AM -0400 17/04/01, email@hidden wrote:
How can I get a list of random numbers that does not include any
duplicates? I would want to use these parameters: six random numbers
from 1 to 56, as a list or as six variables, no duplicate numbers.
That's a singularly peculiar request. You cannot have a randomly
generated list of numbers if you impose such a condition. Random
lists of length greater than 1 element will at times produce
duplicates (or other multiple instances) in the list. It's a fact.
Once you impose your condition of uniqueness of elements, the list
you want is *not* a randomly generated list.
And later on, Bryan Harris chimed in:
I notice after a few trials that the random number function is generating
duplicates after only a few iterations.
So just out of curiosity I generated 56 random numbers "from 1 to
56". Here are the results:
* 39% (22) of the values were never generated
* 28% of the values were generated once
* 1 of the values was generated 4 times
Isn't the integer random number generator supposed to follow the
same rules as the real, i.e. generate every value once before
repeating?
Absolutely not. If that were the case, it would not be a random
number generator. As that late professor Balusubramanian used to say,
"follow me closely".
In a randomly generated sequence (like the toss of a fair die, the
flip of a coin, the spin of a fair roulette wheel) each event is an
independent event. It is not related to any past events (previous
tosses of the die), nor does it affect any future events. The whole
idea of whether events are independent or not is part of the core of
probability theory. And the goal of random number generators is
always to produce a result that is independent of all previous
results. Once you impose a condition on the next event (e.g., that it
can't be one of the previous events), then it is manifestly no longer
random.
A random number generator that generates reals will behave in the
same way; viz. all values are possible with each new number
generated. Duplicates are possible, though on the real number line
the density of possible values is so high that duplicates have a low
likelihood of occurrence except as artifacts of machine generation by
pseudo-random processes.
After which Chris Espinosa wrote:
> No, it's not, because that wouldn't be random. Randomness is "every
time I ask for a number from 1 to 10, I have P=0.1 chance of getting any
number from 1 to 10". If a random number generator were constructed as
you propose, if it generated, say, 5 on the first iteration, then on the
second iteration the probability of generating any number would be
P=0.1111111... except for 5, where it would be P=0. Then if it
generated the number 3, the probability on the third pass would be P=0
for 3 and 5 and P=0.2 for all others, until it got to the tenth cycle,
where it would be P=1 for a certain number and P=0 for all others.
> That's not random.
Which is completely correct.
Then M.S.R.F. Schonewille opined:
Sorry, but that's not true. You confuse the original sample (which is not
random but stochastic) and the random selection from this sample. Compare
choosing numbered pieces of papers from a hat. The last number will have a
probability P=1 of being chosen, the sequence still being perfectly random.
Sorry, but you're the one in error and Chris is perfectly correct.
Selecting randomly from a fixed pool of items without replacement is
NOT the same process as generating random numbers from within a
range. Selecting numbers from a hat exhibits a *conditional
probability* because the probability of each subsequent event is
affected by all of the previous events. Random processes produce
results where each result is from an *independent* event. As Chris
pointed out in a completely clear fashion, the selection process that
you are focused upon is not random but conditional. All strenuous
exhortations to refute this are destined to fail. I urge you to
consult some decent reference book if you are in doubt. (two
excellent references given below)
It is absolutely NOT necessary to invoke a discussion of stochastic
processes to clear up this issue. Stochastic analysis is necessary in
the analysis of ensembles of functions, x(t) in which instances one
needs to look at joint statistics in both the time domain and the
ensemble domain (statistics at fixed values of t across the
ensemble). Stochastic processes is beyond the scope of the current
discussion and an unnecessary obfuscation.
Then Bryan Harris wrote (also in response to CE):
The problem is not this at all. The problem is: How do you get the
computer to generate a value x with probability p? The computer has no
concept of random (except for maybe the user's train of thought =), so the
algorithm must generate pseudo-random values that match the desired
distribution as closely as possible.
The error here is that the distribution of outcomes only levels out
when the number of experiments in the space approaches infinity. If
you want to see the effect, look at the distribution stats for
numbers drawn in the 6/49 lottery that has been running in Canada for
many years. This is not a bad looking distribution (that is, not
severely uneven in terms of draws for each of the numbers), but is
far from level yet, though much more *even* looking than it was
several years ago when they first started publishing the stats.
http://lottery.sympatico.ca/cgi-bin/lottery.cgi
If you really want to see what the random number generator in
AppleScript looks like, then pick a range of values and create a
histogram over a huge number of experiments and see what you get.
Check it at intervals to see what is happening. This is the way to
see if there is a bias in the generator (introduced by it's
pseudo-random process).
When done correctly with multipliers and the mod function, the generated
numbers have no obvious sequence, and have a full period, meaning
that every value is generated once before repeating.
But that is not a random number generator (ignoring any machine
generation problems)!!! The sequence you are talking about is based
on conditional probabilities, and the random number generator should
NOT operate that way. Random numbers are, as I've said, generated as
the result of *independent* events. When you toss dice, they don't
know what they were the last time they were tossed. They produce
random results, and duplicates are certainly possible.
And he went on to opine (also in response to CE):
Just to clear a few things up...
Which it didn't
> There is no such rule.
Oh, no? The measure of the quality of a random variate generator is based
on its performance with regard to exactness (how closely the generated
values follow the specified distribution)
That only matters over VERY LARGE numbers of experiments. We're
talking billions of data points. Think infinity. It does not apply to
56 values generated once in the range of 1 through 56. You are fixed
on a SELECTION process, not a RANDOM GENERATION process.
If your uniform random number generator is not generating every number in
the range before repeating, then it does not have a full period, and it is
not *exact* in producing values from a true uniform distribution. Good
algorithms do this. I have one that's not terribly difficult if you need
it.
This is manifeltly false. Your algorithm is *not* a random number
generator. Each new event (number generated) by your algorithm is
tied to past events. Check your own code. I don't even have to see
it. There are tests performed in there to see what has gone before
and eliminate them as possibilities. This does not constitute a
RANDOM generator, it's a random selector. It's random selection from
a fixed pool, and the subject is combinations and permutations, NOT
random variables.
In your defense, though, you may not have a choice in your implementation.
I'm guessing that your algorithm is simply scaling and rounding
numbers from a U(0,1) distribution. Random variate algorithms
generate streams of random numbers, and thus either AS or the OS is
probably already generating a U(0,1) stream. There's not much point
setting up a new stream every time a script asks for a random number
between 1 and 56 since AS cannot expect that the next request will
be identical.
If the generator reliably spat out every number from 1
to
n in just n iterations, it wouldn't really be random.
Sure it would. Random order, not random distribution as it's giving us now.
No, it wouldn't. For the Nth time, random means that it's an
independent event, not tied by conditional probability to previous
events.
Then Ed Stockly wrote:
It is a
command that will simply generate a single random number every time it's
called and one call has no relation to the next.
And Ed is right. This is the essence of the thing. Independent events.
It is not a "uniform random number generator"
Whatever that might be. It's an oxymoron as far as I'm concerned.
>> Random order, not random distribution as it's giving us now.
Currently it is now giving us random numbers. Both the distribution
and the order of those numbers are, theoretically, random. Each
number having no relation to the number(s) that preceed or follow.
That's all it's designed to do.
Truly random numbers do repeat because every time any number is generated
every number within the range should have an equal probability of being
generated each time a number is generated.
And Ed is absolutely correct.
And then Michelle Steiner wrote:
in response to Bryan Harris <email@hidden> who wrote:
>Isn't the integer random number generator supposed to follow the same rules
as the real, i.e. generate every value once before repeating?
If it did that, it wouldn't be random. True randomness means that every
iteration is distinct from all other iterations. A true random-number
generator is capable of generating the same number 56 times out of 56,
for instance. It would happen once in a zillion times, give or take a
few dozen.
And Michelle is telling you the same thing that I'm saying and that
Chris is saying and the Ed is saying. And she's right.
Assertions made by some on this list that random numbers should
cover a range before repeating are absolutely incorrect. A random
number generator's job is to generate a number within the range each
time it is called to do so. And each time it runs it should generate
any number within the range with the exact same probability as any
other number, irrespective of the history of past numbers generated.
This is the essence of randomness. Independent, not conditional
events.
If you doubt any of the above, I suggest that you get yourself to
the Science library of your nearest university and find a copy of
"Probability, Random Variables, and Stochastic Processes" by
Athanasios Papoulis, or the classic "Stochastic Processes" by Doob
and start reading. The former is more accessible to the novice than
the latter but has a very high information density and it not
bathroom reading, however, perseverance is rewarded and you will
emerge from the reading much enlightened about the concepts or
conditional probabilities, joint probabilities, random processes,
stochastic processes, etc. We use these texts in the graduate school
as foundation courses for the statistical theory of communication.
- web