Re: [apple scitech] Problem with Optimization



On 28-Mar-08, at 8:40 PM, ABE Hiroshi wrote:

On 2008/03/29, at 6:00, Eric Postpischil wrote:
union { float f; int32_t i; } u = { x };
// Now u.i is a 32-bit integer containing the encoding of floating-point number f.

This is a very useful method to use instead of casting. Thank you.


I'd be a little careful with this as well. I got into an argument about using unions to convert types this way over on the performance list, and as I recall, I lost. :-( Apparently, the C and C++ standards do not guarantee that writing f and then reading i gives you the float's bit pattern as written above; in C++ in particular, reading a union member other than the one last written is undefined behavior. That said, every compiler I have ever used (including gcc) seems to do what you'd expect.
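
One way to sidestep the union question is std::memcpy, which copies the object representation and is well defined in both C and C++. A minimal sketch, assuming float is a 32-bit IEEE 754 single; the helper names float_to_bits and bits_to_float are made up for illustration and are not from the thread:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: return the 32-bit encoding of a float.
inline std::uint32_t float_to_bits(float f) {
	static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
	std::uint32_t bits;
	std::memcpy(&bits, &f, sizeof bits);
	return bits;
}

// Hypothetical helper: rebuild a float from its 32-bit encoding.
inline float bits_to_float(std::uint32_t bits) {
	float f;
	std::memcpy(&f, &bits, sizeof f);
	return f;
}

// Quick self-check against two known IEEE 754 encodings.
inline void check_bit_cast() {
	assert(float_to_bits(1.0f) == 0x3f800000u);
	assert(bits_to_float(0x40400000u) == 3.0f);
}
```

Compilers typically reduce a fixed-size memcpy like this to a plain register move, so it should not cost more than the union.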


You know, this kind of problem largely went away for me once I vectorized my time-critical loops. You can cast SSE/AltiVec vector types back and forth as much as you like without having to worry about the optimizer, and of course crunching four floats at a time can speed up a loop.
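
To illustrate the vector cast, here is a sketch that uses the GCC/Clang vector_size extension rather than the Accelerate types, so it is not tied to AltiVec or SSE. The type names v4f and v4u are stand-ins for vFloat and vUInt32 and are assumptions of this example. A cast between two vector types of the same size reinterprets the bits lane by lane. All four lanes of each constant are written out explicitly rather than relying on how a one-element initializer is expanded, which may differ between compilers:

```cpp
#include <cassert>
#include <cstdint>

// Assumption: GCC or Clang, which provide the vector_size attribute.
typedef float v4f __attribute__((vector_size(16)));
typedef std::uint32_t v4u __attribute__((vector_size(16)));

// Reinterpret four floats as four 32-bit integers (no value conversion).
inline v4u bits_of(v4f x) {
	return (v4u)x;
}

// Magic-number initial guess for 1/sqrt(x), four lanes at a time.
inline v4f initial_guess(v4f x) {
	const v4u magic = { 0x5f375a86u, 0x5f375a86u, 0x5f375a86u, 0x5f375a86u };
	const v4u one = { 1u, 1u, 1u, 1u };
	v4u i = (v4u)x;
	i = magic - (i >> one);	// element-wise shift and subtract
	return (v4f)i;
}

// Quick self-check: lane 0 of {1, 2, 3, 4} holds the encoding of 1.0f.
inline void check_lanes() {
	const v4f v = { 1.0f, 2.0f, 3.0f, 4.0f };
	const v4u b = bits_of(v);
	assert(b[0] == 0x3f800000u);
}
```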

Just for the fun of it, I redid your algorithm and mine using vector floats, and compared them to Accelerate's own vrsqrtf (see vfp.h) function:


#include <cstdio>

#include <Accelerate/Accelerate.h>

#if defined( __VEC__ )
	#include <ppc_intrinsics.h>
#endif

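// Reciprocal square root from the integer "magic number" initial guess,
// refined by two Newton-Raphson steps.
inline vFloat vInvSqrt(vFloat x0) {
	vFloat xhalf = (vFloat){0.5f} * x0;
	vUInt32 i = (vUInt32)x0;	// reinterpret the float bits as integers
	vUInt32 halfi;
#if defined( __VEC__ )
	halfi = vec_sr(i, (vUInt32){1});
#elif defined( __SSE__ )
	halfi = _mm_srli_epi32(i, 1);
#endif
	i = (vUInt32){0x5f375a86} - halfi;	// initial guess y0
	vFloat x = (vFloat)i;
	vFloat onept5 = (vFloat){1.5f};
	x = x * (onept5 - xhalf * x * x);	// Newton-Raphson step 1
	x = x * (onept5 - xhalf * x * x);	// Newton-Raphson step 2
	return x;
}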

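// Same refinement, but starting from the hardware reciprocal square root
// estimate instead of the magic number.
inline vFloat vInvSqrt2(vFloat x0) {
	vFloat xhalf = (vFloat){0.5f} * x0;
	vFloat x;
#if defined( __VEC__ )
	x = vec_rsqrte(x0);
#elif defined( __SSE__ )
	x = _mm_rsqrt_ps(x0);
#endif
	vFloat onept5 = (vFloat){1.5f};
	x = x * (onept5 - xhalf * x * x);	// Newton-Raphson step 1
	x = x * (onept5 - xhalf * x * x);	// Newton-Raphson step 2
	return x;
}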

int main (int argc, char * const argv[]) {
	vFloat x = { 4.620768e+00f, 3.263210e+01f, 2.0f, 3.0f };
	for(int i = 0; i < 100000000; i++) {
		x = vInvSqrt(x) * x;
//		x = vInvSqrt2(x) * x;
//		x = vrsqrtf(x) * x;
		x *= x;
	}
	std::printf("%vf\n", x);
	return 0;
}


Here are the numbers:


vInvSqrt:

4.620768 32.517288 2.000000 2.032331

real	0m2.412s
user	0m2.390s
sys	0m0.019s

vInvSqrt2:

4.620768 32.632095 2.000000 3.000000

real	0m2.386s
user	0m2.366s
sys	0m0.012s

vrsqrtf:

4.620768 32.632095 2.000000 3.000000

real	0m2.719s
user	0m2.690s
sys	0m0.022s


Mine (vInvSqrt2) was the fastest, but not by much. Yours (vInvSqrt) had good speed, but the accuracy of the results left something to be desired. (It's possible I made a mistake somewhere. I don't know what's up with that fourth value. Since each pass takes a square root and then squares it again, the results should be close to the input values.) vrsqrtf suffers a bit from not being inlined, I gather. There is another function called vvrsqrtf (see vForce.h) which acts across an array of arbitrary length. That might have been a fairer test given a good-sized buffer.
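
For checking the accuracy of the magic-number version in isolation, a scalar sketch like the following can help. It assumes a 32-bit IEEE 754 float, and the names inv_sqrt_magic and relative_error are made up for this example. It only measures the error of a single call; a small per-call error that is fed back into the loop 100 million times could plausibly add up to the drift seen above, but that is a guess, not a confirmed diagnosis:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Scalar magic-number estimate of 1/sqrt(x0) with `steps` Newton-Raphson
// refinements. Assumes a 32-bit IEEE 754 float and x0 > 0.
inline float inv_sqrt_magic(float x0, int steps) {
	assert(x0 > 0.0f);
	std::uint32_t i;
	std::memcpy(&i, &x0, sizeof i);	// float bits -> integer
	i = 0x5f375a86u - (i >> 1);		// initial guess y0
	float x;
	std::memcpy(&x, &i, sizeof x);	// integer -> float bits
	const float xhalf = 0.5f * x0;
	for (int s = 0; s < steps; ++s)
		x = x * (1.5f - xhalf * x * x);
	return x;
}

// Signed relative error against a double-precision reference.
inline double relative_error(float x0, int steps) {
	const double exact = 1.0 / std::sqrt(static_cast<double>(x0));
	return (inv_sqrt_magic(x0, steps) - exact) / exact;
}
```

Calling relative_error(3.0f, 0), relative_error(3.0f, 1) and relative_error(3.0f, 2) shows how much each refinement step buys for one of the inputs used in the test.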


The endian issue of the magic number
i = 0x5f375a86L - (i>>1); // gives initial guess y0


You're right, I wasn't thinking. The byte order in the integer would be the same as in the float, so it doesn't matter. No worries...
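
A small sketch of why byte order drops out, assuming a 32-bit IEEE 754 float stored with the same byte order as 32-bit integers on the machine. Shifts and masks act on the integer's value, not on its bytes in memory, so the fields come out the same on big-endian and little-endian hosts. The function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy the float's encoding into an integer of the same size.
inline std::uint32_t float_bits(float f) {
	std::uint32_t bits;
	std::memcpy(&bits, &f, sizeof bits);
	return bits;
}

// Biased exponent: bits 23..30 of the encoding.
inline unsigned exponent_field(float f) {
	return (float_bits(f) >> 23) & 0xffu;
}

// Fraction: the low 23 bits of the encoding.
inline std::uint32_t fraction_field(float f) {
	return float_bits(f) & 0x007fffffu;
}

// Quick self-check: 1.0f has biased exponent 127 and a zero fraction.
inline void check_fields() {
	assert(exponent_field(1.0f) == 127u);
	assert(fraction_field(1.0f) == 0u);
}
```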


-Ted





