Mailing Lists: Apple Mailing Lists


Re: float to int (kinda OT)




On Oct 21, 2004, at 6:26 PM, John Stiles wrote:

A naive compiler will branch when generating (remainder255 >= (255 + 128)); that's a comparison, and comparisons generate branches.
A smarter compiler might have a clever way of avoiding the branch, perhaps involving a subtract followed by a cntlzw (?). That's just off the top of my head; I haven't tested it.
If "alpha" is a constant across the whole image operation, you could multiply it by a cleverly-chosen constant so that "(alpha * multiplier) >> something" gives you numbers scaled to any range you prefer. This is a trick I've used in the past with good results. (The constant tends to be something wacky; I remember getting 0x8102 or 0x10203 for various operations in the past.) It has only one caveat: some PowerPCs can multiply tiny numbers (i.e. 8-bit values) a little faster than big numbers. I have no idea if this is ancient history or if G4s and G5s still have this restriction. Still, I'd rather pay an extra two cycles on the multiply than add extra instructions to the inner loop.
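A minimal sketch of the scaled-alpha idea described above, under stated assumptions: the multiplier 0x10101, the shift of 24, and all function names are chosen for this illustration and are not the constants or code from the original message. The 1/255 is folded into alpha once per image, so the per-pixel work is a multiply, an add, and a shift.

```c
#include <stdint.h>

/* Hypothetical constant for this sketch: 0x10101 == 0xFFFFFF / 255, so it
   maps alpha 0..255 onto 0..0xFFFFFF (a 0.24 fixed-point fraction). */
#define ALPHA_MULTIPLIER 0x10101u

/* Done once per image: fold the 1/255 into alpha itself. */
static uint32_t scale_alpha(uint32_t alpha)
{
    return alpha * ALPHA_MULTIPLIER;
}

/* Done per pixel: one multiply, one add, one shift; no divide, no branch.
   The largest intermediate, 0xFFFFFF * 255 + 0x800000, fits in 32 bits. */
static uint32_t blend_scaled(uint32_t scaled_alpha, uint32_t red)
{
    return (scaled_alpha * red + 0x800000u) >> 24;
}

/* Brute-force check against (alpha*red+127)/255; returns the mismatch count. */
static int count_scaled_mismatches(void)
{
    int bad = 0;
    for (uint32_t alpha = 0; alpha < 256; alpha++) {
        uint32_t scaled = scale_alpha(alpha);
        for (uint32_t red = 0; red < 256; red++) {
            if (blend_scaled(scaled, red) != (alpha * red + 127) / 255)
                bad++;
        }
    }
    return bad;
}
```

Calling count_scaled_mismatches() compares every 8-bit alpha/red pair against the reference division. Note that the multiplicand here is 24 bits wide, which is exactly the "big number" case the caveat about multiply latency refers to.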

Fast ways of doing this are not a tightly held secret. The beautiful thing about fixed-point operations is that approximations can turn out to be completely correct after rounding. There is no reason to spend a lot of time doing a divide when a first-order polynomial will do! Multiply by a number sufficiently close to 1/255 to give the right result. You get the right shift for free if you use mulhw(u), which returns only the high 32 bits of the 64-bit product. GCC knows this one. So, for example, for the inner loop of:


#include <stdio.h>

int main( void )
{
        int alpha, red;

        for( alpha = 0; alpha < 256; alpha++ )
        {
             for( red = 0; red < 256; red++ )
             {
                  /* round-to-nearest alpha*red/255 */
                  int correct = (alpha*red+127)/255;
                  printf( "%d\n", correct );
             }
        }

        return 0;
}
 
...GCC does the following:

00002cfc mulhw r7,r30,r27
00002d00 srawi r6,r30,31
00002d04 addis r4,r31,0x0
00002d08 addi r3,r4,0x320
00002d0c add r0,r7,r30
00002d10 srawi r5,r0,7
00002d14 subf r4,r6,r5
00002d18 bl 0x2eac ; symbol stub for: _printf
00002d1c addic. r29,r29,0xffff
00002d20 add r30,r30,r28
00002d24 bge 0x2cfc

So unfortunately, this is one of those cases where if you had just written

    (alpha * red + 127) / 255

it might have been faster. 
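For readers who don't read PowerPC assembly, the GCC disassembly corresponds roughly to the C below. This is a sketch under an assumption: the reciprocal sits in r27 and is loaded outside the loop shown, so its value is not visible in the listing. 0x80808081 with a shift of 7 is the usual reciprocal for signed 32-bit division by 255 and is assumed here. The function names are made up for illustration.

```c
#include <stdint.h>

/* Emulates PowerPC mulhw: the high 32 bits of the signed 64-bit product.
   Right-shifting a negative value is implementation-defined in C; this
   sketch assumes the usual arithmetic shift. */
static int32_t mulhw_emulated(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* n / 255 with truncation toward zero, without a divide instruction. */
static int32_t div255_magic(int32_t n)
{
    /* Assumed constant: 0x80808081 read as a signed 32-bit value,
       i.e. ceil(2^39 / 255) - 2^32. */
    const int32_t magic = -2139062143;

    int32_t q = mulhw_emulated(n, magic); /* mulhw r7,r30,r27 */
    q += n;            /* add r0,r7,r30: corrects for the negative constant */
    q >>= 7;           /* srawi r5,r0,7 */
    q -= (n >> 31);    /* srawi r6,r30,31; subf r4,r6,r5: +1 if n < 0 */
    return q;
}

/* Brute-force check over every alpha/red pair; returns the mismatch count. */
static int count_magic_mismatches(void)
{
    int bad = 0;
    for (int alpha = 0; alpha < 256; alpha++)
        for (int red = 0; red < 256; red++) {
            int n = alpha * red + 127;
            if (div255_magic(n) != n / 255)
                bad++;
        }
    return bad;
}
```

The sign fix-up on the last line of div255_magic is only needed because the operands are declared int; with unsigned operands the compiler could use mulhwu and drop it.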

As it turns out, in the particular case of /255, the result of the multiplication can sometimes be generated by a permute instruction instead of a multiplication, so you might not even need to do a multiply. 
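What makes /255 special is that 1/255 is very close to 0x0101/65536, so the "multiply" amounts to adding a copy of the value shifted by one byte. The scalar sketch below shows that arithmetic; it is not the AltiVec permute form alluded to above, and the function names are hypothetical. It is exact for products of two 8-bit values.

```c
#include <stdint.h>

/* Rounded x / 255 for x = alpha * red with alpha, red in 0..255,
   using only adds and shifts. */
static uint32_t div255_shift_add(uint32_t x)
{
    uint32_t t = x + 128;        /* rounding bias */
    return (t + (t >> 8)) >> 8;  /* t * 0x0101 / 65536, without the multiply */
}

/* Brute-force check against (alpha*red+127)/255; returns the mismatch count. */
static int count_shift_add_mismatches(void)
{
    int bad = 0;
    for (uint32_t alpha = 0; alpha < 256; alpha++)
        for (uint32_t red = 0; red < 256; red++) {
            uint32_t x = alpha * red;
            if (div255_shift_add(x) != (x + 127) / 255)
                bad++;
        }
    return bad;
}
```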

Ian

PerfOptimization-dev mailing list


