Mailing Lists: Apple Mailing Lists


Re: Moving Data in Memory



Holger,

On the topic of dispatch group rejection, Apple's G5 optimization webpage ( http://developer.apple.com/hardware/ve/g5.html ) contains the following paragraph:

"Certain dependencies are not allowed within a group. For example, if a load and store in the same group are to the same address (or to addresses with the same lower 16 bits) and store forwarding cannot occur, then it is not correct to execute them concurrently, since the aliasing order might not be correct. When the processor encounters this situation, it aborts processing the instructions and breaks the dispatch group apart, and executes the instructions individually. Unfortunately, the processor can't know whether this has happen until the effective address is calculated for both, which happens near the end of execution. This abort/retry process is very expensive, since it amounts to a pipeline flush. It is seen most frequently in processes that move data from one register file to another wherein there are data size changes, such as int to float conversions. Recent versions of GCC-3.3 and later can detect and avoid this pattern. Shark will also automatically detect this pattern."
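For anyone unfamiliar with the int-to-float pattern that paragraph alludes to, here is a minimal sketch in C of the traditional conversion, which has no direct path from an integer register to a floating-point register and so goes through memory. The function name is made up for illustration, and the code assumes IEEE 754 doubles. It shows the data flow only and is not a claim about what any particular compiler emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert a signed 32-bit integer to double by building the bit pattern
   of a double in memory and then loading it as floating point.
   Hypothetical helper; assumes IEEE 754 binary64 doubles. */
static double int32_to_double_via_memory(int32_t x)
{
    /* 0x4330000000000000 is the bit pattern of 2^52. OR-ing a 32-bit
       value into the low mantissa bits yields exactly 2^52 + value. */
    const uint64_t exponent_bits = 0x4330000000000000ULL;

    /* Flipping the sign bit maps x to x + 2^31 as an unsigned value. */
    uint32_t biased = (uint32_t)x ^ 0x80000000u;
    uint64_t bits = exponent_bits | biased;

    double d;
    assert(sizeof d == sizeof bits);

    /* This copy stands in for the trip through memory: integer-sized
       stores followed by one floating-point load of the same bytes. */
    memcpy(&d, &bits, sizeof d);

    /* Subtract 2^52 + 2^31 to recover x exactly. */
    return d - 4503601774854144.0;
}
```

The load is wider than the stores that produced the data, which is the "data size change" the paragraph mentions: a floating-point load that immediately follows integer stores to the same address.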

My question is: when does store forwarding occur, and can it mitigate the dispatch group rejection problem? Also, are there any rules of thumb for how long to wait between writing to a memory location and reading from it, or is it enough to place the store and the load in separate dispatch groups? For example, if I store a 32-bit word to memory and the next instruction happens to read it back into an integer register, how bad is that? Or is it only transfers between different register files that trigger the problem?
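The example in the question can be sketched in C roughly as follows. The function names are hypothetical, and `volatile` is used only to stop the compiler from removing the reload, so that the first function really does issue a load right after the store. Whether the two end up in the same dispatch group depends on the compiler and the surrounding code.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit word, then immediately load it back from the same
   address into an integer register. This is the case asked about. */
static uint32_t store_then_reload(volatile uint32_t *slot, uint32_t value)
{
    *slot = value;       /* store */
    return *slot + 1u;   /* load from the address just written */
}

/* Same result, but the value is reused from the register, so no load
   depends on the preceding store. */
static uint32_t store_and_reuse_register(volatile uint32_t *slot, uint32_t value)
{
    *slot = value;       /* store */
    return value + 1u;   /* no reload */
}

/* Both variants compute the same thing; only the memory traffic differs. */
static void reload_example(void)
{
    uint32_t slot = 0;
    assert(store_then_reload(&slot, 41u) == 42u);
    assert(store_and_reuse_register(&slot, 41u) == 42u);
}
```

Where the value is still available in a register, the second form sidesteps the question entirely.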

Ben


Holger wrote:

2. dispatch group rejection (G5 specific)
Phew, this isn't easy to explain without a lot of processor
architecture background. I'll try to summarize the effects as visible
in performance 'anomalies'.

The G5 doesn't like it much when a memory read fetches data that was
written to memory immediately before. Ordinarily the G5 would have
tried to perform the memory read ahead of time (simply speaking, this
is what "out of order execution" is all about), so that the data is
ready by the time it is needed. But the dependency on the preceding
memory write prevents it from performing this optimization.

To make matters worse, the G5 does not immediately notice such a worst
case situation. It will still perform the memory read ahead of time and
fetch data which is not correct with respect to program logic. Only
later will this grave mistake be noticed. Then all work done based on
the erroneous value is discarded, and the processor has to start again
from the memory read instruction.

(It's even worse than that: because of other compromises made for
the sake of higher average speed, this worst case can cause the memory
write to be retried as well, which triggers another iteration of an
incorrect memory read. I don't know all the details, but I believe this
can happen up to three times in the absolute worst case before the
correct program state is finally reached. From there, things proceed
at full speed.)

The detection of an address match between a memory write and a
potentially dependent load is not done with the full address (again
for reasons of speed in the common case of a mismatch). So the G5 can
sometimes go through the aforementioned emergency routine even though
it would have been safe to use the data read from memory.
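As a simplified model of that partial comparison, here is a short C sketch. The 16-bit width is taken from the Apple page quoted earlier in this thread ("addresses with the same lower 16 bits"). The function name is made up, and the model ignores access size and everything else the real hardware considers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: a store and a load are treated as a possible
   conflict when their addresses agree in the low 16 bits.
   Hypothetical helper, not a description of the actual circuit. */
static bool low16_match(uintptr_t store_addr, uintptr_t load_addr)
{
    return ((store_addr ^ load_addr) & 0xFFFFu) == 0;
}

static void low16_example(void)
{
    /* Same address: a genuine dependency. */
    assert(low16_match(0x00011234u, 0x00011234u));

    /* Different addresses exactly 64 KB apart: a false positive in
       this model, because the low 16 bits are identical. */
    assert(low16_match(0x00010040u, 0x00020040u));

    /* Neighbouring words differ in the low bits: no match. */
    assert(!low16_match(0x00001000u, 0x00001004u));
}
```

In this model, two buffers whose starting addresses differ by a multiple of 64 KB are the ones at risk of false matches when one is written and the other is read at the same offset.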

Hopefully this answers more questions than it raises. :-) And BTW, the G5
isn't even particularly bad as far as such performance anomalies go. These
are tradeoffs that many modern high-performance CPUs have had to make in
order to make the common case faster while keeping the rare case correct.
In some sense PowerPC used to be the exception, because the G4 achieved its
performance through elegance, with a comparatively simple processor design.
The G5 joins the rest of the pack, for better _and_ worse.

Holger

PerfOptimization-dev mailing list      (email@hidden)





Copyright © 2007 Apple Inc. All rights reserved.