On Oct 13, 2010, at 2:36 PM, Jean-Daniel Dupas wrote:
On Oct 13, 2010, at 1:24 PM, Andreas Grosam wrote:
On Oct 11, 2010, at 3:57 PM, Jean-Daniel Dupas wrote:
Accessing an array using a negative index is undefined in C++, so it may crash, or may not, depending on the compiler, the OS, the phase of the moon, etc…
The behavior is not undefined. Using negative indices is perfectly valid code (see below).
I don't know about C++, but it is undefined in C99 AFAIK (see section 6.5.6, paragraph 8 of the C99 standard)
It's the same in C++ and you are almost correct, except that we need to differentiate between "indices" in a subscript operator and "referencing elements" in an array.
Say we have an array a:
int a[N]; // N > 0
According to the standard we may now write
int x = a[i];
where 0 <= i < N
In this case, a[i] returns a reference to the i-th element of the array. It's clear that i must not be negative in order to return a valid element.
Syntactically, we can write a[-1] without any compiler warnings, though. This is because a[i] is equivalent to *(a + i). (We can even write i[a].)
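
To make the equivalence of the subscript spellings concrete, here is a minimal sketch in C. The helper name subscript_forms_agree is made up for illustration, and the caller is assumed to pass an index that actually lies inside the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if a[i], *(a + i) and i[a] all read
   the same element. The caller must ensure that i is a valid index. */
bool subscript_forms_agree(const int *a, ptrdiff_t i)
{
    assert(a != NULL);
    return a[i] == *(a + i) && *(a + i) == i[a];
}
```

Calling it with any in-range index of an int array should return true, since the three forms are defined in terms of the same pointer addition.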
The standard also says that if a pointer expression (P) points to the i-th element of an array, then the expressions (P) + N and (P) - N (where N has the value n) point to the (i+n)-th and the (i-n)-th element, provided they exist.
Suppose there is a pointer
int* p = &a[i]; // where 0 <= i < N
then we can have negative indices which still return a valid element:
int a[4] = {0, 1, 2, 3 };
int* p = &a[3];
int x = p[-1]; // x == 2
So, if the result of p[i] returns an element of the array, *negative indices in a subscript operator* are perfectly valid.
Syntactically, they are always valid, since p[i] is equivalent to *(p + i), and this is basically pointer arithmetic. The result of any (p + i) should be used with care though: if it does not point to an element of the same array (or one past its last element), the behavior is undefined, as the standard states.
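
One way to stay on the safe side is to check the target index with plain integer arithmetic before forming p[i]. The following is only a sketch under stated assumptions: the helper read_relative and its parameters are hypothetical, and it assumes the caller knows the start (base) and length (n) of the array that p points into.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: p points at an element of the array that starts at
   base and has n elements. Stores p[offset] in *out and returns true only
   if p + offset still names an element of that array. */
bool read_relative(const int *base, size_t n, const int *p,
                   ptrdiff_t offset, int *out)
{
    assert(base != NULL && p != NULL && out != NULL);
    ptrdiff_t pos = p - base;          /* index of *p within the array */
    ptrdiff_t target = pos + offset;   /* integer arithmetic only, no pointer formed yet */
    if (target < 0 || (size_t)target >= n)
        return false;                  /* p[offset] would be undefined behavior */
    *out = p[offset];                  /* offset may well be negative here */
    return true;
}
```

With int a[4] = {0, 1, 2, 3} and p = &a[3], an offset of -1 is accepted and reads 2, while an offset of +1, or an offset of -1 from &a[0], is rejected.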
If you take a look at the OP's code:
for (j = M;--j;p++)
*p = p[M-N] ^ TWIST(p[0], p[1]); // M = 397, N = 624
it indicates that p is not an array but a pointer, possibly pointing into an array. The pointer p is incremented on every iteration of the loop.
The expression (M - N) may be perfectly valid (syntactically, at least) even when it evaluates to a negative value.
My guess is that M or N (or both) are unsigned. In that case (M - N) does not evaluate to -227 but wraps around to a very large positive value, and the resulting pointer arithmetic plays out differently on 32-bit and 64-bit systems. On a 32-bit system the address computation itself typically wraps around, so the resulting pointer may still land on a valid address within the process. On a 64-bit system this is very unlikely, because the offset is *too* large to yield a real address. I'm wondering, if N or M were unsigned, whether the code was ever strictly correct even on 32-bit systems ;)
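
A small sketch of that guess, under assumptions the OP's code does not confirm: M and N are treated as 32-bit unsigned values, an int is taken to be 4 bytes, and addresses are modeled as plain integers (real out-of-range pointer arithmetic is undefined, so this only imitates what typical hardware would compute). The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { M = 397, N = 624 };   /* values from the OP's loop */

/* M - N in signed arithmetic: a small negative offset. */
int32_t offset_signed(void)
{
    return (int32_t)M - (int32_t)N;
}

/* M - N in 32-bit unsigned arithmetic: wraps around modulo 2^32. */
uint32_t offset_unsigned32(void)
{
    return (uint32_t)M - (uint32_t)N;
}

/* Modeled byte address of p[offset] with 32-bit addresses:
   the multiplication and addition wrap, landing 227 ints below addr. */
uint32_t address_32(uint32_t addr, uint32_t offset)
{
    return addr + offset * 4u;
}

/* The same computation with 64-bit addresses: nothing wraps,
   so the result lies about 16 GiB above addr. */
uint64_t address_64(uint64_t addr, uint32_t offset)
{
    return addr + (uint64_t)offset * 4u;
}
```

In this model the signed offset is -227 and the unsigned one is 4294967069. Starting from an address of 0x10000, the 32-bit computation ends up at 0x10000 - 908, which is the address the signed offset would have produced, while the 64-bit computation ends up far outside any plausible array.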
Andreas
And in the list of undefined behavior:
- Addition or subtraction of a pointer into an array object and an integer type produces a result that does not point into the same array object (6.5.6).
-- Jean-Daniel