Re: Gcc 333 and altivec optimizations
- Subject: Re: Gcc 333 and altivec optimizations
- From: Marc Van Olmen <email@hidden>
- Date: Wed, 14 Apr 2004 13:00:55 -0400
Hi,
Thanks for your reply, it is really appreciated. Thanks also for pointing me
to this doc; I now see how I can add inline assembly.
I added: -mtune=G4 -fast to my release build.
To answer your question:
I was expecting this:
vmhraddshs v11,v0,v6,v9
vmhraddshs v12,v0,v8,v9
vmhraddshs v13,v0,v7,v9
I expected that because my code is in a loop: c0 is a vector initialized
outside the loop, and tr0, r0, r1 and r2 are the results of previous vector
operations just above.
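As a reference point for what each of those instructions does, here is a rough scalar sketch of one vec_mradds (vmhraddshs) operation: per 16-bit lane it multiplies, rounds, keeps the high bits, adds the third operand and saturates. The helper names asr15, mradds_lane and mradds8 are made up for illustration; this is not the actual code from the loop, which uses the AltiVec intrinsic directly.

```c
#include <stdint.h>

/* Arithmetic shift right by 15 (floor division by 32768), written so it
   does not depend on how the compiler shifts negative values. */
static int32_t asr15(int32_t x)
{
    return x >= 0 ? x >> 15 : -((-x + 32767) >> 15);
}

/* One 16-bit lane of vec_mradds(a, b, c):
   saturate(((a * b + 0x4000) >> 15) + c) */
static int16_t mradds_lane(int16_t a, int16_t b, int16_t c)
{
    int32_t sum = asr15((int32_t)a * b + 0x4000) + c;
    if (sum > INT16_MAX) return INT16_MAX;   /* saturate high */
    if (sum < INT16_MIN) return INT16_MIN;   /* saturate low  */
    return (int16_t)sum;
}

/* All eight lanes of a 128-bit vector of signed shorts. */
static void mradds8(const int16_t a[8], const int16_t b[8],
                    const int16_t c[8], int16_t out[8])
{
    int i;
    for (i = 0; i < 8; i++)
        out[i] = mradds_lane(a[i], b[i], c[i]);
}
```

Since the operation only needs its three inputs, nothing in it requires reloading tr0 or c0 from memory between the three calls, which is why I expected them to stay in registers.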
But:
I added the register keyword to my variables, and now the code takes about
16 milliseconds to execute in release mode versus 25 milliseconds in debug.
So some optimization is being done.
Question:
Still, one question remains: how can I see the generated assembler code and
use that as a basis for optimizing?
thx
mvo
> On Apr 13, 2004, at 10:39 PM, Marc Van Olmen wrote:
>
>> Hi
>>
>> I have some altivec code:
>>
>>
>> t0 = vec_mradds( tr0, r0, c0 );
>> t1 = vec_mradds( tr0, r1, c0 );
>> t2 = vec_mradds( tr0, r2, c0 );
>>
>> It is generating the following assembler code:
>>
>> 0x0000b53c <+0944> addi r2,r30,160
>> 0x0000b540 <+0948> lvx v12,r0,r2
>> 0x0000b544 <+0952> addi r2,r30,304
>> 0x0000b548 <+0956> lvx v9,r0,r2
>> 0x0000b54c <+0960> vmhraddshs v11,v0,v12,v9
>> 0x0000b550 <+0964> addi r2,r30,176
>> 0x0000b554 <+0968> lvx v9,r0,r2
>> 0x0000b558 <+0972> addi r2,r30,304
>> 0x0000b55c <+0976> lvx v6,r0,r2
>> 0x0000b560 <+0980> vmhraddshs v12,v0,v9,v6
>> 0x0000b564 <+0984> addi r2,r30,192
>> 0x0000b568 <+0988> lvx v6,r0,r2
>> 0x0000b56c <+0992> addi r2,r30,304
>> 0x0000b570 <+0996> lvx v5,r0,r2
>> 0x0000b574 <+1000> vmhraddshs v9,v0,v6,v5
>>
>>
>> To me this looks lousy because there is no need for all of this extra
>> code...
>
> What extra instructions do you see (it is not clear what tr0, r0, c0 and
> t0 are... are they stack vars?, etc.)? The only extra stuff I see is the
> two additional addi r2,r30,304 that could be avoided if one wanted to use
> an extra register to cache the calculated address, along with caching the
> value loaded from it.
>
> It is not clear what optimizer options or tune option you are using ...
> if the debug build then likely no optimization is taking place. So the
> compiler is being literal in what it is doing when working with your
> code.
>
> Anyway, assuming you have the developer tools installed (Xcode 1.1), the
> GCC3 release notes have some helpful information, as does the supplied
> GCC documentation (which also covers how to write inline assembly).
>
> <file:///Developer/Documentation/ReleaseNotes/DeveloperTools/GCC3.html>
>
> -Shawn