Mailing Lists: Apple Mailing Lists


Re: Moving Data in Memory




On Oct 18, 2004, at 12:50 PM, Ben Weiss wrote:

My question is, when does store-forwarding occur, and can this mitigate the dispatch-group rejection problem?  Also, are there any rules of thumb for how long to wait between writing to a memory location and reading from it, or is it enough to simply place the store and load in separate dispatch groups?  To give an example: if I store a 32-bit word to memory and the next instruction may incidentally read from it (back into the integer register), how bad is this, or is it only transfers between different register files that trigger the problem?

A dispatch group is a series of up to 5 consecutive instructions from the application's instruction stream. The fifth instruction, if there is one, must be a branch. As long as the store and load are in different dispatch groups you won't get a reject, though other, lesser stalls may still occur. GCC works around the rejects in some cases by padding with up to three nops between the store and the dependent load.  In your own code, you can do this somewhat more efficiently by unrolling the loop by four or more and ordering the memory operations like this:
    
    store1
    store2
    store3
    store4
    load1
    load2
    load3
    load4


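In this way, each dependent load sits at least four instructions after its store. Since a dispatch group holds at most four non-branch instructions, the dependent load is guaranteed not to fall in the same dispatch group as its store.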
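
To make the grouping argument concrete, here is a toy model in C. It is only a sketch: it assumes groups are packed greedily in program order, that a group holds at most four non-branch instructions, and that a branch closes its group. Real group formation on the G5 has more rules than this, and the names (`struct insn`, `count_same_group_pairs`, the `slot` id) are made up for illustration.

```c
#include <stddef.h>

enum kind { OP_STORE, OP_LOAD, OP_BRANCH, OP_OTHER };

struct insn {
    enum kind kind;
    int slot;   /* hypothetical id of the memory location touched by a store/load */
};

/*
 * Assign a group number to every instruction under the toy rule above and
 * return how many store -> load pairs on the same slot share a group.
 * group_of must have room for n entries.
 */
static int count_same_group_pairs(const struct insn *code, size_t n, int *group_of)
{
    int group = 0;
    int used = 0;   /* non-branch instructions already in the current group */

    for (size_t i = 0; i < n; i++) {
        if (code[i].kind != OP_BRANCH && used == 4) {
            group++;            /* group is full, start a new one */
            used = 0;
        }
        group_of[i] = group;
        if (code[i].kind == OP_BRANCH) {
            group++;            /* assumption: a branch ends its group */
            used = 0;
        } else {
            used++;
        }
    }

    int hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].kind != OP_STORE)
            continue;
        /* only later instructions in the same group can collide */
        for (size_t j = i + 1; j < n && group_of[j] == group_of[i]; j++) {
            if (code[j].kind == OP_LOAD && code[j].slot == code[i].slot)
                hits++;
        }
    }
    return hits;
}
```

Feeding it store1..store4 followed by load1..load4 reports no colliding pairs regardless of where the sequence starts within a group, while an interleaved store1, load1, store2, load2 sequence does collide under this model.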

As far as store forwarding goes, as long as the dependent load is not in the same dispatch group, the reject should not occur, so that is what I'd shoot for.  There is possibly some latency between when a store finishes and when the data is available for store forwarding. I haven't measured that latency on G5. The latency on G4 (7450) was about 6 cycles. 6 cycles is a very small problem compared to a dispatch reject (particularly on a highly out-of-order machine like G5), so most of the win comes from fixing the dispatch reject.

Most of the time, this problem happens in int<->float conversions and scalar<->vector data moves.  Apart from cases like those, which require this kind of data movement, the compiler usually does not gratuitously spill registers only to load them back immediately, except perhaps with -O0. With -O0, store forwarding may indeed save the G5's bacon. Personally, I haven't seen a case where store forwarding actually prevents a dispatch reject, but as I just said, the compiler may not emit such sequences frequently and I haven't been looking for them. (Only by mistake would I be tracing code compiled with optimizations off, and I don't recall seeing that pattern outside of the inter-register-file moves described above.)

If you want to experiment, I suggest getting out SimG5 to look at your code. That cycle-accurate CPU simulator should show you exactly what is supposed to happen.  It is installed as part of CHUD.  Some instructions for getting it to work are here:

    http://developer.apple.com/hardware/ve/g5.html#simulator
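
As an illustration of the int<->float case mentioned above, here is a minimal C sketch of an int32 to double conversion that goes through memory, unrolled by four with all stores issued before any load. It uses the well-known magic-number trick (build the bit pattern of 2^52 + 2^31 + x, then subtract 2^52 + 2^31); that is a swapped-in technique for illustration, not necessarily the code under discussion. The names `int32_to_double_x4` and `scratch` are hypothetical, and in portable C the compiler decides the final instruction order, so this only shows the shape of the loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert n int32 values to double, four at a time, via a memory scratch area. */
static void int32_to_double_x4(const int32_t *in, double *out, size_t n)
{
    const double magic = 4503601774854144.0;         /* 2^52 + 2^31 */
    const uint64_t exp_bits = 0x4330000000000000ULL; /* bit pattern of 2^52 */
    uint64_t scratch[4];
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* four stores: flipping the sign bit biases x by 2^31 */
        scratch[0] = exp_bits | ((uint32_t)in[i + 0] ^ 0x80000000u);
        scratch[1] = exp_bits | ((uint32_t)in[i + 1] ^ 0x80000000u);
        scratch[2] = exp_bits | ((uint32_t)in[i + 2] ^ 0x80000000u);
        scratch[3] = exp_bits | ((uint32_t)in[i + 3] ^ 0x80000000u);

        /* four loads: each is four operations after its own store */
        double d0, d1, d2, d3;
        memcpy(&d0, &scratch[0], sizeof d0);
        memcpy(&d1, &scratch[1], sizeof d1);
        memcpy(&d2, &scratch[2], sizeof d2);
        memcpy(&d3, &scratch[3], sizeof d3);

        /* remove the 2^52 + 2^31 bias to recover the signed value */
        out[i + 0] = d0 - magic;
        out[i + 1] = d1 - magic;
        out[i + 2] = d2 - magic;
        out[i + 3] = d3 - magic;
    }

    /* leftover elements: plain typecast */
    for (; i < n; i++)
        out[i] = (double)in[i];
}
```

Whether the generated code keeps the store/load grouping is something to confirm in the disassembly or in SimG5 rather than assume from the source.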

When writing hand-tuned code that requires this sort of data motion, I typically let the data rest after the store for a whole loop iteration: data stored in iteration N is loaded back in iteration N+2. This typically happens in code that has been software pipelined (http://developer.apple.com/hardware/ve/software_pipelining.html), and is accomplished by simply inserting a software stage between store and load that does nothing. Such code is typically for int<->float conversions.  The best I've done is scalar code that outperforms the compiler (with naive int<->float conversion by typecast) by 33-fold. I think that was before gcc-3.3, though. More recently, I've been beating it by 6-12x for this sort of simple function. Most of that win, however, no doubt came from using fctiwz to do saturation clipping (typically a requirement of such functions) rather than relying on the compiler to do it, so the win from avoiding the reject alone is likely smaller than that, perhaps 2x.
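
The "rest for one iteration" idea can be sketched in C as a three-stage software pipeline: stage 1 converts and stores, stage 2 does nothing, stage 3 loads the value stored two iterations earlier. This is a hedged illustration, not the hand-tuned code described above. `sat_trunc` only imitates in plain C the saturating, truncating behaviour attributed to fctiwz; returning INT32_MIN for NaN is an assumption of this sketch; `ring` and `convert_pipelined` are hypothetical names. The scratch buffer is marked volatile so the compiler really emits the stores and loads.

```c
#include <stddef.h>
#include <stdint.h>

/* Truncate toward zero with saturation at the int32 limits. */
static int32_t sat_trunc(double x)
{
    if (x != x)
        return INT32_MIN;       /* NaN: arbitrary choice for this sketch */
    if (x >= 2147483647.0)
        return INT32_MAX;
    if (x <= -2147483648.0)
        return INT32_MIN;
    return (int32_t)x;
}

/* Convert n doubles to int32, loading each result two iterations after storing it. */
static void convert_pipelined(const double *in, int32_t *out, size_t n)
{
    volatile int32_t ring[3];   /* one slot per pipeline stage */

    for (size_t i = 0; i < n + 2; i++) {
        /* stage 3: load the value stored in iteration i - 2 */
        if (i >= 2)
            out[i - 2] = ring[(i - 2) % 3];

        /* stage 1: convert element i and store it */
        if (i < n)
            ring[i % 3] = sat_trunc(in[i]);

        /* stage 2: intentionally empty; element i - 1 just rests in memory */
    }
}
```

Three slots are enough because at any time only three elements are in flight: one being stored, one resting, and one being loaded.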

Ian
PerfOptimization-dev mailing list      (email@hidden)




Copyright © 2007 Apple Inc. All rights reserved.