2. dispatch group rejection (G5 specific)
Phew, this isn't easy to explain without a lot of processor
architecture background. I'll try to summarize the effects as visible
in performance 'anomalies'.
The G5 doesn't like it much when a memory read fetches data that was
written to memory immediately before. Ordinarily the G5 would have
tried to perform the memory read ahead of time (simply speaking, this
is what "out of order execution" is all about), so that the data is
ready by the time it is needed. But the dependency on the preceding
memory write prevents it from performing this optimization.
To make matters worse, the G5 does not immediately notice such a worst
case situation. It will still perform the memory read ahead of time and
fetch data which is not correct with respect to program logic. Only
later will this grave mistake be noticed. Then all work done based on
the erroneous value is discarded, and the processor has to start again
from the memory read instruction.
(It's even worse than that, because due to other compromises made for
the sake of higher average speed, this worst case can cause the memory
write to be retried as well, which triggers another iteration of an
incorrect memory read. I don't know all the details, but I believe this
can happen up to three times in the absolute worst case, before the
correct program state is finally reached. From there, things proceed
at full speed.)
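To make the pattern concrete, here is a minimal C sketch of a loop in which each read touches memory that the previous iteration has only just written, next to a variant that carries the value in a local variable instead. The function names are made up for illustration. Whether a given compiler really emits a store followed by a dependent load for the first version (rather than keeping the value in a register on its own) is an assumption you would have to check in the generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: in-place running sum.
   Each iteration reads a[i - 1], which the previous iteration has
   just written, so the read depends on the preceding memory write. */
void running_sum_via_memory(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}

/* Same result, but the previous sum is carried in a local variable,
   so no read has to wait for the write that came just before it. */
void running_sum_via_local(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n == 0)
        return;
    int acc = a[0];
    for (size_t i = 1; i < n; i++) {
        acc += a[i];   /* reads only data that was not just written */
        a[i] = acc;    /* the write is never read back by the loop  */
    }
}
```

Both functions leave the same values in the array; only the second avoids reading back what was stored one step earlier.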
The detection of an address match between a memory write and a
potentially dependent load is not done with the full address (again
for reasons of speed in the common case of a mismatch). So the G5 can
sometimes go through the aforementioned emergency routine even though
it would have been safe to use the data read from memory.
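The effect of comparing only part of the address can be sketched in a few lines of C. This is only a model of the idea, not of the actual hardware: the number of bits compared (COMPARE_BITS) is a made-up value chosen for illustration, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL width: how many low address bits the check looks at.
   Chosen for illustration only; not taken from G5 documentation. */
#define COMPARE_BITS 12u

/* Exact check: the load really reads what the store wrote. */
bool full_match(uintptr_t store_addr, uintptr_t load_addr)
{
    return store_addr == load_addr;
}

/* Cheap check: only the low COMPARE_BITS bits are compared, so two
   different addresses that share those bits look like a match. */
bool partial_match(uintptr_t store_addr, uintptr_t load_addr)
{
    assert(COMPARE_BITS < sizeof(uintptr_t) * 8);
    const uintptr_t mask = ((uintptr_t)1 << COMPARE_BITS) - 1;
    return (store_addr & mask) == (load_addr & mask);
}
```

With these assumptions, the addresses 0x1000 and 0x2000 differ, yet partial_match reports a match because their low 12 bits are identical. That is the kind of false alarm described above: the load was safe, but the cheap check cannot tell.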
Hopefully this answers more questions than it raises. :-) And BTW, the G5
isn't even particularly bad as far as such performance anomalies go. These
are tradeoffs that many modern high performance CPUs had to make in order
to make the common case faster, but keep the rare case correct. In some
sense PowerPC used to be the exception, because the G4 achieved its
performance through elegance, with a comparatively simple processor design.
The G5 joins the rest of the pack, for better _and_ worse.
Holger
PerfOptimization-dev mailing list
References:
> RE: Moving Data in Memory (From: "Gohara, David" <email@hidden>)
> RE: Moving Data in Memory (From: Holger Bettag <email@hidden>)