I'm looking at some code that uses a Magic Number conversion algorithm
(I'll include it below) that's supposed to help speed up
floating-point-to-integer conversions (at least on AMD/Intel chips).
The code was initially optimized for x86, and I'm looking to speed
it up for PPC.
Here's the explanation and algorithm from an AMD software optimization
guide:
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
-------------------------------------------------------------------------v
For double precision operands, the usual way to accomplish truncating
conversion involves the following algorithm:
1. Save the current x87 rounding mode (this is usually round to
nearest or even).
2. Set the x87 rounding mode to truncation.
3. Load the floating-point source operand and store the integer result.
4. Restore the original x87 rounding mode.
This algorithm is typically implemented through the C run-time library
function ftol. While the AMD Athlon 64 and AMD Opteron processors have
special hardware optimizations to speed up the changing of x87
rounding modes and therefore ftol, calls to ftol may still tend to be
slow.
For situations where very fast floating-point-to-integer conversion is
required, the conversion code in Listing 24 on page 53 may be helpful.
This code uses the current rounding mode instead of truncation when
performing the conversion. Therefore, the result may differ by 1 from
the ftol result. The replacement code adds the “magic number”
2^52+2^51 to the source operand, then stores the double precision
result to memory and retrieves the lower doubleword of the stored
result. Adding the magic number shifts the original argument to the
right inside the double precision mantissa, placing the binary point
of the sum immediately to the right of the least-significant mantissa
bit. Extracting the lower doubleword of the sum then delivers the
integral portion of the original argument.
The following conversion code causes a 64-bit store to feed into a
32-bit load. The load is from the lower 32 bits of the 64-bit store,
the one case of size mismatch between a store and a dependent load
that is specifically supported by the store-to-load-forwarding
hardware of the AMD Athlon 64 and AMD Opteron processors.
Examples
Listing 23. Slow
double x; int i;
i = x;
Listing 24. Fast
#define DOUBLE2INT(i, d) \
{double t = ((d) + 6755399441055744.0); i = *((int *)(&t));}
double x; int i;
DOUBLE2INT(i, x);
-------------------------------------------------------------------------^
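To make the quoted explanation concrete, here is a minimal C sketch of the same trick. It uses memcpy and a mask on the 64-bit pattern instead of the pointer cast from Listing 24, so it does not depend on which half of the double comes first in memory. The function name magic_double_to_int is made up for illustration. The sketch assumes 64-bit IEEE 754 doubles, the default round-to-nearest mode, and an input whose rounded value fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper name; the AMD guide only gives the DOUBLE2INT macro. */
static int32_t magic_double_to_int(double d)
{
    uint64_t bits;
    double t = d + 6755399441055744.0;   /* magic number: 2^52 + 2^51 */

    /* Assumption: double is a 64-bit IEEE 754 value. */
    assert(sizeof t == sizeof bits);

    /* Copy the bit pattern out without a pointer cast. */
    memcpy(&bits, &t, sizeof bits);

    /* The low 32 bits of the mantissa hold the rounded integer. */
    return (int32_t)(uint32_t)(bits & 0xFFFFFFFFu);
}
```

Because the add rounds in the current mode rather than truncating, magic_double_to_int(1.7) gives 2 where a plain (int) cast gives 1. That is the "may differ by 1" caveat in the guide.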
Our code uses this algorithm to store the results of one function in
an array of doubles, then in a following function it accesses the
integer part of each array element.
Example (typed in Mail, probably lots of typos):
-------------------------------------------------------------------------v
double *aDoubles;   // array of intermediate results
double dResults;
while (x != 0) {
    --x;
    // do a bunch of processing here from an input array,
    // the results of which are in a double, dResults
    aDoubles[x] = dResults + 6755399441055744.0;
}
// later on in another function
long nTemp;         // assumes a 32-bit long
while (x != 0) {
    --x;
    // we access the integer part of aDoubles[x]
#ifdef macintosh
    // big-endian (PPC): the low word is the second one
    nTemp = ((long *)&aDoubles[x])[1];
#else
    // little-endian (x86): the low word is the first one
    nTemp = ((long *)&aDoubles[x])[0];
#endif
    // do processing with nTemp;
}
-------------------------------------------------------------------------^
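For comparison, here is a sketch of the same two-pass pattern without the platform #ifdef. It reads the low 32 bits by masking the 64-bit pattern, so the same source should work on both big-endian PPC and little-endian x86. The names MAGIC_BIAS, store_biased and biased_to_int are hypothetical and not from our code base. The sketch assumes 64-bit IEEE 754 doubles and round-to-nearest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAGIC_BIAS 6755399441055744.0   /* 2^52 + 2^51 */

/* First pass: store each result with the bias already added. */
static void store_biased(double *dst, const double *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] + MAGIC_BIAS;
}

/* Second pass: pull the integer out of one biased element. */
static int32_t biased_to_int(double biased)
{
    uint64_t bits;

    /* Assumption: double is a 64-bit IEEE 754 value. */
    assert(sizeof biased == sizeof bits);

    memcpy(&bits, &biased, sizeof bits);

    /* Masking the value (not indexing memory) avoids any endian check. */
    return (int32_t)(uint32_t)(bits & 0xFFFFFFFFu);
}
```

Whether the compiler turns the memcpy into a single 32-bit load on PPC is something I would check in the generated assembly rather than assume.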
What do you think of this? This code seems to perform alright on the
Mac, but do you see this as a possible benefit or bottleneck? Does
the PPC do the same kind of rounding mode bookkeeping so that skipping
it would be a benefit? Of course I don't think anything beats AltiVec
floating point conversions, but I'll be working on that later.
I'm going to write up a quick program to test it out and see the
differences, but I'm wondering what you guys think about this.
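A rough sketch of what such a test program could time is below. It is only a starting point: clock() is coarse, the compiler may optimize either loop differently, and all of the names here (magic_convert, sum_cast, sum_magic, time_passes) are made up for illustration. It assumes 64-bit IEEE 754 doubles and round-to-nearest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define MAGIC_BIAS 6755399441055744.0   /* 2^52 + 2^51 */

/* Magic-number conversion (rounds in the current rounding mode). */
static int32_t magic_convert(double d)
{
    uint64_t bits;
    double t = d + MAGIC_BIAS;

    assert(sizeof t == sizeof bits);   /* assumes a 64-bit double */
    memcpy(&bits, &t, sizeof bits);
    return (int32_t)(uint32_t)(bits & 0xFFFFFFFFu);
}

/* Baseline: plain C casts, which truncate toward zero. */
static int64_t sum_cast(const double *v, size_t n)
{
    int64_t s = 0;
    for (size_t i = 0; i < n; ++i)
        s += (int32_t)v[i];
    return s;
}

/* Candidate: the magic-number conversion. */
static int64_t sum_magic(const double *v, size_t n)
{
    int64_t s = 0;
    for (size_t i = 0; i < n; ++i)
        s += magic_convert(v[i]);
    return s;
}

/* Runs fn over the array reps times, stores the accumulated sum in *out,
   and returns the CPU time used in seconds. */
static double time_passes(int64_t (*fn)(const double *, size_t),
                          const double *v, size_t n, int reps, int64_t *out)
{
    int64_t s = 0;
    clock_t start = clock();

    for (int r = 0; r < reps; ++r)
        s += fn(v, n);
    *out = s;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_passes once with sum_cast and once with sum_magic over a large array gives two timings to compare. The two sums will generally differ, since one path truncates and the other rounds.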
Thanks!
_____________________________
Dave Thorup
Mac Software Engineer
email@hidden
Nikon Inc., Imaging Division
www.nikonusa.com