Mailing Lists: Apple Mailing Lists


Floating-Point-to-Int Magic Number conversion on PPC?



I'm looking at some code that uses a Magic Number conversion algorithm (I'll include it below) that's supposed to help speed up floating-point-to-integer conversions (at least on AMD/Intel chips). The code has been optimized initially for x86 and I'm looking to speed it up for PPC.

Here's the explanation and algorithm from an AMD software optimization guide:

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF

------------------------------------------------------------------------ -v
For double precision operands, the usual way to accomplish truncating conversion involves the following algorithm:


1. Save the current x87 rounding mode (this is usually round to nearest or even).
2. Set the x87 rounding mode to truncation.
3. Load the floating-point source operand and store the integer result.
4. Restore the original x87 rounding mode.


This algorithm is typically implemented through the C run-time library function ftol. While the AMD Athlon 64 and AMD Opteron processors have special hardware optimizations to speed up the changing of x87 rounding modes and therefore ftol, calls to ftol may still tend to be slow.

For situations where very fast floating-point-to-integer conversion is required, the conversion code in Listing 24 on page 53 may be helpful. This code uses the current rounding mode instead of truncation when performing the conversion. Therefore, the result may differ by 1 from the ftol result. The replacement code adds the “magic number” 2^52+2^51 to the source operand, then stores the double precision result to memory and retrieves the lower doubleword of the stored result. Adding the magic number shifts the original argument to the right inside the double precision mantissa, placing the binary point of the sum immediately to the right of the least-significant mantissa bit. Extracting the lower doubleword of the sum then delivers the integral portion of the original argument.

The following conversion code causes a 64-bit store to feed into a 32-bit load. The load is from the lower 32 bits of the 64-bit store, the one case of size mismatch between a store and a dependent load that is specifically supported by the store-to-load-forwarding hardware of the AMD Athlon 64 and AMD Opteron processors.

Examples
Listing 23. Slow

double x; int i;

i = x;

Listing 24. Fast
#define DOUBLE2INT(i, d) \
    {double t = ((d) + 6755399441055744.0); i = *((int *)(&t));}

double x; int i;

DOUBLE2INT(i, x);
------------------------------------------------------------------------ -^
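
To make the effect of the magic number concrete, here is a minimal sketch of the same conversion written without the pointer cast. The function name double_to_int_magic is hypothetical and not from the AMD guide. The sketch assumes IEEE 754 doubles, the default round-to-nearest mode, and a platform where doubles and 64-bit integers share the same byte order. Because it takes the low 32 bits by value rather than by address, it does not care whether the machine is big- or little-endian.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the AMD listing.
 * Adds 2^52 + 2^51 so the integer part of d lands in the low
 * mantissa bits, then returns the low 32 bits of the result. */
static int32_t double_to_int_magic(double d)
{
    double t = d + 6755399441055744.0;   /* 2^52 + 2^51 */
    uint64_t bits;

    assert(sizeof bits == sizeof t);     /* sketch assumes 64-bit doubles */
    memcpy(&bits, &t, sizeof bits);      /* copy the raw bit pattern */

    /* Truncating to 32 bits keeps the low-order word regardless of
     * byte order. The unsigned-to-signed conversion wraps on common
     * two's complement compilers (implementation-defined in C). */
    return (int32_t)(uint32_t)bits;
}
```

With round-to-nearest in effect, double_to_int_magic(2.7) yields 3 while the plain cast (int)2.7 yields 2, which is the "may differ by 1" case the guide mentions.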


Our code uses this algorithm to store the results of one function in an array of doubles, then in a following function it accesses the integer part of each array element.

Example (typed in Mail, probably lots of typos):

------------------------------------------------------------------------ -v
double	*aDoubles;	//	array of doubles, allocated elsewhere
double	dResults;


while (x != 0) {
	--x;
	//	do a bunch of processing here from an input array,
	//	the results of which are in a double, dResults
	aDoubles[x] = dResults + 6755399441055744.0;
}

//	later on in another function

long		nTemp;	//	assumed to be 32 bits wide here

while (x != 0) {
	--x;
	//	we access the integer part of aDoubles[x]
#ifdef macintosh
	//	big-endian PPC: the low 32 bits are the second word
	nTemp = ((long *)&aDoubles[x])[1];
#else
	//	little-endian x86: the low 32 bits are the first word
	nTemp = ((long *)&aDoubles[x])[0];
#endif
	//	do processing with nTemp;
}
------------------------------------------------------------------------ -^
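
For comparison, here is a rough sketch of the same two-phase pattern that avoids the #ifdef by copying the whole 64-bit pattern and truncating it by value. The names MAGIC_BIAS, store_biased and load_biased_int are made up for illustration and are not from our code base. It assumes IEEE 754 doubles, round-to-nearest, and that doubles and 64-bit integers use the same byte order.

```c
#include <stdint.h>
#include <string.h>

#define MAGIC_BIAS 6755399441055744.0   /* 2^52 + 2^51 */

/* Phase 1 (hypothetical): store each result with the bias added,
 * as the first loop in the example does. */
static void store_biased(double *out, const double *results, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = results[i] + MAGIC_BIAS;
}

/* Phase 2 (hypothetical): recover the integer part of one element.
 * Truncating the 64-bit pattern to 32 bits selects the low word by
 * value, so no big-endian/little-endian index is needed. */
static int32_t load_biased_int(const double *biased, size_t i)
{
    uint64_t bits;

    memcpy(&bits, &biased[i], sizeof bits);
    return (int32_t)(uint32_t)bits;      /* wraps on common compilers */
}
```

This version also does not depend on long being 32 bits wide, which the pointer-indexing form does.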


What do you think of this? The code seems to perform all right on the Mac, but do you see it as a likely benefit or a potential bottleneck on PPC? Does the PPC do the same kind of rounding-mode bookkeeping, so that skipping it would be a benefit? Of course I don't think anything beats AltiVec floating-point conversions, but I'll be working on that later.

I'm going to write up a quick program to test it out and see the differences, but I'm wondering what you guys think about this.
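
A minimal sketch of what such a test could look like is below. All names (sum_with_cast, sum_with_magic, time_sum, MAGIC_BIAS) are placeholders, and the timing uses clock() from the C library, which is coarse, so this is only a starting point and not a careful benchmark. The input should hold integer-valued doubles so that the truncating cast and the rounding magic-number path return the same sums.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

#define MAGIC_BIAS 6755399441055744.0   /* 2^52 + 2^51 */

/* Baseline: plain C cast, which truncates toward zero. */
static int64_t sum_with_cast(const double *in, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += (int32_t)in[i];
    return sum;
}

/* Magic-number path: add the bias, take the low 32 bits. */
static int64_t sum_with_magic(const double *in, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        double t = in[i] + MAGIC_BIAS;
        uint64_t bits;
        memcpy(&bits, &t, sizeof bits);
        sum += (int32_t)(uint32_t)bits;
    }
    return sum;
}

/* Runs fn over the input reps times, stores the accumulated sum in
 * *result, and returns the elapsed CPU time in seconds. */
static double time_sum(int64_t (*fn)(const double *, size_t),
                       const double *in, size_t n, int reps,
                       int64_t *result)
{
    clock_t start = clock();
    int64_t total = 0;

    for (int k = 0; k < reps; ++k)
        total += fn(in, n);

    *result = total;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_sum once with sum_with_cast and once with sum_with_magic on the same large array, then comparing both the returned times and the two result values, gives a first impression of the difference and a sanity check that the two paths agree.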

Thanks!
_____________________________

Dave Thorup
Mac Software Engineer
email@hidden

Nikon Inc., Imaging Division
www.nikonusa.com

_______________________________________________
Do not post admin requests to the list. They will be ignored.
PerfOptimization-dev mailing list      (email@hidden)

