union { float f; int32_t i; } u = { x };
// Now u.i is a 32-bit integer containing the encoding of the
// floating-point number f.
This is a very useful alternative to casting. Thank you.
I'd be a little careful with this as well. I got into an argument
about using unions to convert types this way over on the performance
list, and as I recall, I lost. :-( Apparently, the C/C++ standards
do not guarantee that reading i after writing f hands you the bits of
the float: in C++, reading a union member other than the one last
written is undefined behavior. That said, every compiler I have ever
used (including gcc) seems to do what you would expect.
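For what it's worth, one way to get at the bits without leaning on the
union is to memcpy between the two objects. A minimal sketch, assuming
float is 32 bits wide and IEEE 754 encoded; float_bits is just a name
made up for this example:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the thread): copy the bytes of a float
// into a 32-bit integer. memcpy between two objects of the same size is
// well defined, and optimizing compilers typically turn it into a plain
// register move.
static inline std::int32_t float_bits(float x) {
    static_assert(sizeof(float) == sizeof(std::int32_t),
                  "this sketch assumes a 32-bit float");
    std::int32_t i;
    std::memcpy(&i, &x, sizeof i);
    return i;
}
```

On an IEEE 754 machine, float_bits(1.0f) should come back as 0x3f800000.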
You know, this kind of problem largely went away for me once I
vectorized my time-critical loops. You can cast SSE/AltiVec vector
types back and forth as much as you like without having to worry about
the optimizer, and of course crunching four floats at a time can speed
up a loop.
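To illustrate the casting point with something that compiles outside of
Accelerate, here is a minimal sketch that uses the GCC/Clang vector
extensions in place of the SSE/AltiVec types. That substitution is an
assumption of this example, as are the names float4, uint4,
lanes_as_bits and bits_as_lanes:

```cpp
#include <cstdint>

// Assumption: GCC or Clang. Their generic vector types stand in for
// vFloat and vUInt32 here; none of these names come from Accelerate.
typedef float         float4 __attribute__((vector_size(16)));
typedef std::uint32_t uint4  __attribute__((vector_size(16)));

// A cast between vector types of the same size reinterprets the bits.
// It does not convert the values lane by lane.
static inline uint4  lanes_as_bits(float4 v) { return (uint4)v; }
static inline float4 bits_as_lanes(uint4 v)  { return (float4)v; }
```

Casting to the integer vector and back leaves every lane unchanged,
which is what makes the shift-and-subtract trick possible on vectors.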
Just for the fun of it, I redid your algorithm and mine using vector
floats, and compared them to Accelerate's own vrsqrtf function (see
vfp.h):
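#include <cstdio>
#include <Accelerate/Accelerate.h>  // vFloat, vUInt32, vrsqrtf

// All four lanes of each constant are spelled out so the values do not
// depend on how the compiler fills a one-element initializer.

// Magic-number initial guess followed by two Newton-Raphson steps.
inline vFloat vInvSqrt(vFloat x0) {
    vFloat xhalf = (vFloat){0.5f, 0.5f, 0.5f, 0.5f} * x0;
    vUInt32 i = (vUInt32)x0;  // reinterpret the float bits as integers
    vUInt32 halfi;
#if defined( __VEC__ )
    halfi = vec_sr(i, (vUInt32){1, 1, 1, 1});
#elif defined( __SSE2__ )  // _mm_srli_epi32 is an SSE2 intrinsic
    halfi = _mm_srli_epi32(i, 1);
#endif
    i = (vUInt32){0x5f375a86, 0x5f375a86, 0x5f375a86, 0x5f375a86} - halfi;
    vFloat x = (vFloat)i;     // back to floats: the initial guess
    vFloat onept5 = (vFloat){1.5f, 1.5f, 1.5f, 1.5f};
    x = x * (onept5 - xhalf * x * x);
    x = x * (onept5 - xhalf * x * x);
    return x;
}

// Hardware reciprocal-square-root estimate, refined by the same two steps.
inline vFloat vInvSqrt2(vFloat x0) {
    vFloat xhalf = (vFloat){0.5f, 0.5f, 0.5f, 0.5f} * x0;
    vFloat x;
#if defined( __VEC__ )
    x = vec_rsqrte(x0);
#elif defined( __SSE__ )
    x = _mm_rsqrt_ps(x0);
#endif
    vFloat onept5 = (vFloat){1.5f, 1.5f, 1.5f, 1.5f};
    x = x * (onept5 - xhalf * x * x);
    x = x * (onept5 - xhalf * x * x);
    return x;
}

int main (int argc, char * const argv[]) {
    vFloat x = { 4.620768e+00f, 3.263210e+01f, 2.0f, 3.0f };
    for(int i = 0; i < 100000000; i++) {
        // Each pass computes sqrt(x) as x * 1/sqrt(x), then squares it,
        // so x should stay close to its starting value.
        x = vInvSqrt(x) * x;
        // x = vInvSqrt2(x) * x;
        // x = vrsqrtf(x) * x;
        x *= x;
    }
    std::printf("%vf\n", x);  // %vf: vector extension to printf
    return 0;
}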
Here are the numbers:
vInvSqrt:
4.620768 32.517288 2.000000 2.032331
real 0m2.412s
user 0m2.390s
sys 0m0.019s
vInvSqrt2:
4.620768 32.632095 2.000000 3.000000
real 0m2.386s
user 0m2.366s
sys 0m0.012s
vrsqrtf:
4.620768 32.632095 2.000000 3.000000
real 0m2.719s
user 0m2.690s
sys 0m0.022s
Mine was the fastest, but not by much. Yours had good speed, but the
accuracy of the results left something to be desired. (It's possible
I made a mistake somewhere. I don't know what's up with that fourth
value. The results should be close to the input values.) vrsqrtf
suffers a bit from not being inlined, I gather. There is another
function called vvrsqrtf (see vForce.h) which acts across an array of
arbitrary length. That might have been a fairer test given a good-
sized buffer.
The endian issue of the magic number
i = 0x5f375a86L - (i>>1); // gives initial guess y0
You're right, I wasn't thinking. The byte order in the integer would
be the same as in the float, so it doesn't matter. No worries...
-Ted