Re: Gcc 333 and altivec optimizations

Subject: Re: Gcc 333 and altivec optimizations
From: Shaun Wexler <email@hidden>
Date: Wed, 14 Apr 2004 10:19:25 -0700

On Apr 14, 2004, at 10:12 AM, Shaun Wexler wrote:

On Apr 14, 2004, at 9:36 AM, Shawn Erickson wrote:
On Apr 13, 2004, at 10:39 PM, Marc Van Olmen wrote:
Hi
I have some altivec code:
            t0 = vec_mradds( tr0, r0, c0 );
            t1 = vec_mradds( tr0, r1, c0 );
            t2 = vec_mradds( tr0, r2, c0 );
It is generating the following assembler code:
0x0000b53c  <+0944>  addi    r2,r30,160
0x0000b540  <+0948>  lvx    v12,r0,r2
0x0000b544  <+0952>  addi    r2,r30,304
0x0000b548  <+0956>  lvx    v9,r0,r2
0x0000b54c  <+0960>  vmhraddshs    v11,v0,v12,v9
0x0000b550  <+0964>  addi    r2,r30,176
0x0000b554  <+0968>  lvx    v9,r0,r2
0x0000b558  <+0972>  addi    r2,r30,304
0x0000b55c  <+0976>  lvx    v6,r0,r2
0x0000b560  <+0980>  vmhraddshs    v12,v0,v9,v6
0x0000b564  <+0984>  addi    r2,r30,192
0x0000b568  <+0988>  lvx    v6,r0,r2
0x0000b56c  <+0992>  addi    r2,r30,304
0x0000b570  <+0996>  lvx    v5,r0,r2
0x0000b574  <+1000>  vmhraddshs    v9,v0,v6,v5
To me this looks lousy, because there is no need for all of this extra code...
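For reference, here is a scalar C model of what each vec_mradds call above computes (the vmhraddshs instruction in the listing). For each of the eight signed 16-bit lanes, it multiplies, rounds, keeps the high-order Q15 result, adds the third operand, and saturates. This is a sketch based on the AltiVec programming-model description, not GCC's or Apple's implementation. The names mradds_lane and mradds8 are hypothetical, and the code assumes that `>>` on a negative int32_t is an arithmetic shift, which GCC and Clang guarantee.

```c
#include <stdint.h>

/* Scalar model of one lane of vec_mradds / vmhraddshs (hypothetical helper):
   d = saturate_s16(((a * b + 0x4000) >> 15) + c)
   Assumes >> on a negative int32_t is arithmetic (true for GCC and Clang). */
static int16_t mradds_lane(int16_t a, int16_t b, int16_t c)
{
    int32_t prod = (int32_t)a * (int32_t)b;  /* full 32-bit product */
    int32_t hi   = (prod + 0x4000) >> 15;    /* round, keep the Q15 high part */
    int32_t sum  = hi + c;                   /* add the third operand */
    if (sum > INT16_MAX) sum = INT16_MAX;    /* the "s" in mradds: saturate */
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* Whole-vector model: eight signed 16-bit lanes, like vector signed short. */
static void mradds8(const int16_t a[8], const int16_t b[8],
                    const int16_t c[8], int16_t d[8])
{
    for (int i = 0; i < 8; i++)
        d[i] = mradds_lane(a[i], b[i], c[i]);
}
```

For example, mradds_lane(16384, 100, 10) returns 60. The product 16384 * 100 = 1638400, adding 0x4000 and shifting right by 15 gives 50, and adding 10 gives 60. An overflowing lane such as mradds_lane(-32768, -32768, 0) saturates to 32767.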
What extra instructions do you see? (It's not clear what tr0, r0, c0 and t0 are; are they stack variables?) The only extra work I see is the two additional addi r2,r30,304 instructions. These could be avoided by using an extra register to cache the computed address, along with caching the value loaded from it.

It is not clear which optimization or tuning options you are using. If this is the debug build, then likely no optimization is taking place, so the compiler is translating your code literally, statement by statement.
Probably -O0, at first glance. The CPU has to load tr0 and c0 (once each), plus each r* vector, before the vec_mradds can execute, and lvx has a 3- or 4-cycle latency depending on whether it's a G4 or a G5. The addi is "free", since it's dispatched concurrently to a simple integer execution unit. Rescheduling would probably place all of the lvx instructions first, followed by the three mradds. The addi's can be ignored, because they don't impact performance in the above code snippet, AFAICT. The real costs are the stalls before each mradds while it waits on lvx, and the unnecessary reloads of c0 from memory before the last two mradds.
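The schedule described above can be written out at source level roughly as follows: every load is issued first, tr0 and c0 are loaded once and kept in registers, and then the three independent mradds run. This is a hedged sketch. The function mradds3, the typedef v16s and the helper mradds_v are hypothetical names, and the operands are assumed to live in memory. On an AltiVec build (__ALTIVEC__ defined, e.g. with -maltivec) it calls the real vec_mradds. Elsewhere it falls back to a scalar model so that it still compiles. Statement order in C does not fix instruction order. An optimizing build (-O2/-O3) schedules the loads itself, while at -O0 GCC still keeps these locals on the stack. Rebuilding with optimization therefore remains the first thing to try.

```c
#include <stdint.h>

#ifdef __ALTIVEC__
#include <altivec.h>
/* Real AltiVec path: eight signed 16-bit lanes in one vector register. */
typedef vector signed short v16s;
static inline v16s mradds_v(v16s a, v16s b, v16s c)
{
    return vec_mradds(a, b, c);
}
#else
/* Portable stand-in (assumption) so the sketch builds without AltiVec:
   same lane layout, same multiply-round-add-saturate rule per lane. */
typedef struct { int16_t s[8]; } v16s;
static v16s mradds_v(v16s a, v16s b, v16s c)
{
    v16s d;
    for (int i = 0; i < 8; i++) {
        /* Round the Q15 product; >> is arithmetic on GCC/Clang. */
        int32_t sum = (((int32_t)a.s[i] * b.s[i] + 0x4000) >> 15) + c.s[i];
        if (sum > INT16_MAX) sum = INT16_MAX;
        if (sum < INT16_MIN) sum = INT16_MIN;
        d.s[i] = (int16_t)sum;
    }
    return d;
}
#endif

/* Hypothetical entry point: tr0, c0 and r0..r2 are assumed to live in
   memory, as the r30-relative lvx loads in the listing suggest. */
void mradds3(const v16s *tr0p, const v16s r[3], const v16s *c0p, v16s t[3])
{
    /* Issue every load first so their latencies can overlap, and read
       the shared operands tr0 and c0 exactly once. */
    v16s tr0 = *tr0p;
    v16s c0  = *c0p;
    v16s r0 = r[0], r1 = r[1], r2 = r[2];

    /* Three mutually independent multiply-round-adds. */
    t[0] = mradds_v(tr0, r0, c0);
    t[1] = mradds_v(tr0, r1, c0);
    t[2] = mradds_v(tr0, r2, c0);
}
```

Because the three results do not depend on one another, a scheduler is free to overlap them once their inputs have arrived. This is the same freedom that an -O3 build normally exploits without any hand reordering.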

Ah, at second glance, the addi's probably do cost an extra cycle, since each subsequent lvx depends on their result. Bad code, bad. ;) Clean your target, build using -O3, then re-examine the disassembled code.
--
Shaun Wexler
MacFOH
http://www.macfoh.com

PS: Please ignore the previous post's grammatical typos...
_______________________________________________
xcode-users mailing list | email@hidden

References:
	>Gcc 333 and altivec optimizations (From: Marc Van Olmen <email@hidden>)
	>Re: Gcc 333 and altivec optimizations (From: Shawn Erickson <email@hidden>)
	>Re: Gcc 333 and altivec optimizations (From: Shaun Wexler <email@hidden>)
