On Mon, 18 Oct 2004, Ian Ollmann wrote:
[...]
> 00002d00 stwx r0,r3,r4
> 00002d04 lwzx r5,r3,r4
> 00002d08 lwzx r6,r3,r4
> 00002d0c lwzx r7,r3,r4
[...]
> ...and then run simg5 on the tt6e trace.
>
> What you see is that all four instructions 0x00002d00 - 0x00002d0c
> execute in the same group, there is no dispatch group reject and the
> only stall is a couple of ERAT misses.

Interesting. This means there must be a special-purpose optimization built
into the hardware. Unfortunately, it doesn't kick in for type conversions.
If you care to waste more time on this, it would be interesting to see
what triggers the optimization and what causes a group rejection. For
example, does the effective address have to be computed exactly the same
way (same addressing mode, same operands)? To check this, you could
reference the same item in memory once indexed from a base pointer and
once through a second pointer that points at it directly.
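A minimal C sketch of the two addressing variants, assuming a compiler targeting the PPC970 (G5). The function names are hypothetical. `volatile` only forces the store and the reload to actually go through memory. Whether the compiler emits an indexed `stwx`/`lwzx` pair for the first variant and a displacement-form `lwz` for the second is up to the compiler, so this has to be confirmed in the disassembly before running the trace.

```c
#include <stdint.h>

/* Variant 1: store and reload with the same base pointer and index.
   On PowerPC this would typically become stwx/lwzx with identical
   operands, as in the quoted trace. */
int32_t reload_same_index(volatile int32_t *base, long i, int32_t v)
{
    base[i] = v;
    return base[i];
}

/* Variant 2: store through base[i], but reload through a second
   pointer that already points at the same element, so the load's
   effective address is formed from different registers. */
int32_t reload_other_pointer(volatile int32_t *base, long i,
                             volatile int32_t *alias, int32_t v)
{
    base[i] = v;
    return *alias;
}
```

Calling it as `reload_other_pointer(buf, 3, &buf[3], v)` gives the second variant an alias to the same word, reached through a different address computation.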

Does the data have to go to the same register file it comes from? If not,
then why are type conversions rejected? I would expect a partial data
overlap in memory to cause a group rejection. But the hardware
optimization might be good enough to kick in whenever address and size
match exactly.
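A minimal C sketch of the other two cases, with hypothetical function names. The first is a type conversion that stores from the floating-point register file and reloads into the integer one. The PPC970 has no direct FPR-to-GPR move, so this round trip through memory is the usual route. The second is a reload from the same start address that covers only part of the stored bytes. The instruction names in the comments are what a G5 compiler would typically emit, not a guarantee, so check the disassembly first.

```c
#include <stdint.h>

/* Type conversion: store from an FPR, reload the same 4 bytes into
   a GPR (different register files, exact address and size match). */
uint32_t float_bits(float f)
{
    volatile union { float f; uint32_t u; } slot;
    slot.f = f;          /* typically stfs */
    return slot.u;       /* typically lwz from the same address */
}

/* Partial overlap: store a 32-bit word, reload only the halfword at
   the lower address (same register file, same address, smaller size). */
uint16_t first_halfword(uint32_t v)
{
    volatile union { uint32_t w; uint16_t h[2]; } slot;
    slot.w = v;          /* typically stw */
    return slot.h[0];    /* typically lhz */
}
```

Which half `h[0]` names depends on byte order. On the big-endian G5, `first_halfword(0x12345678)` yields the high-order half, 0x1234.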
Holger
_______________________________________________
PerfOptimization-dev mailing list (email@hidden)