On Mon, 18 Oct 2004, Ben Weiss wrote:
> My question is, when does store-forwarding occur, and can this mitigate
> the dispatch-group rejection problem?
Store forwarding cannot help with dispatch group rejection, because:
1. a group is not only dispatched atomically, it must also complete
atomically (i.e. the results and side effects of all instructions of
one group are either all committed or not at all, where "commit" means
they turn "real" in the sense that they change from vague future
possibilities into immutable historic events).
2. store forwarding can only occur after a store has been committed.
So the problem arises that the load would need forwarding to be correct,
and the store would need to commit to be able to forward its data, but
they are both in a single dispatch group. This leads to a circular
dependency. It is resolved by replaying execution with a different
instruction grouping, such that the store and the dependent load can commit
independently of each other.
I _believe_ (there is still no public documentation available) that the
"emergency mode" is for the dispatcher to form one special group with a
single instruction, then resume normally with up to four regular
instructions plus possibly a branch per group. So if the store was
instruction number three and the load was instruction number four, there
will be another group rejection, but we have advanced by one instruction,
so eventually the store-load dependency is broken and properly resolved.
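To make that guess concrete, here is a toy model of the replay behaviour
I am assuming. It is only a sketch, not documented hardware behaviour:
groups are taken to be exactly four non-branch instructions, branches are
ignored, and a rejected group is replayed as one single-instruction group
followed by normal grouping. The name count_rejections and the indexing
from 0 are made up for illustration.

```c
enum { GROUP_SIZE = 4 };  /* assumed: four non-branch instructions per group */

/*
 * Toy model: instructions are numbered from 0 and grouping starts at 0.
 * Returns how many group rejections occur before the store and the
 * dependent load end up in different dispatch groups.
 * Precondition: store_idx < load_idx.
 */
int count_rejections(int store_idx, int load_idx)
{
    int start = 0;       /* index of the first instruction of the next group */
    int rejections = 0;

    for (;;) {
        int end = start + GROUP_SIZE;  /* group covers [start, end) */
        int store_in = store_idx >= start && store_idx < end;
        int load_in = load_idx >= start && load_idx < end;

        if (store_in && load_in) {
            /* Store and load share a group: reject, then dispatch a
             * single-instruction group and resume with the next one. */
            rejections++;
            start += 1;
        } else if (store_in || store_idx < start) {
            /* The store is dispatched without the load, or was already
             * dispatched on its own: the dependency is broken. */
            return rejections;
        } else {
            start = end;  /* neither instruction reached yet */
        }
    }
}
```

With the store as instruction number three and the load as number four
(indices 2 and 3), this model gives three rejections; with the store in
the last slot (index 3) it gives none.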
> Also, are there any rules of
> thumb for how long to wait between writing to a memory location and
> reading from it, or is it enough to simply place the store and load in
> separate dispatch groups?
Placing them in separate dispatch groups is enough to avoid the worst-case
group rejection. It could still happen that the group with the load is replayed,
because the store might not have progressed far enough to forward its data
(again, no hard facts on this are publicly available ... yet?) but at
least the store has been committed and continues to make progress even if
the load were to be replayed.
So in general, if there are at least three instructions between the store
and the dependent load, no worst-case rejections can happen, because a
group holds at most four non-branch instructions.
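The counting behind that rule can be sketched as follows. This assumes
that every group is filled with exactly four non-branch instructions and
that the store is equally likely to land in any of the four slots; both
are simplifications, and same_group_cases is just a name I made up.

```c
enum { GROUP_SLOTS = 4 };  /* assumed: four non-branch slots per group */

/*
 * Counts in how many of the four possible store slots the dependent load
 * lands in the same dispatch group, given `gap` unrelated instructions
 * between the store and the load.
 */
int same_group_cases(int gap)
{
    int cases = 0;

    for (int store_slot = 0; store_slot < GROUP_SLOTS; store_slot++) {
        int load_slot = store_slot + gap + 1;  /* slot relative to group start */
        if (load_slot < GROUP_SLOTS)
            cases++;                           /* load shares the store's group */
    }
    return cases;
}
```

Under these assumptions same_group_cases(0) is 3 (a load right after the
store collides in three of four slots) and same_group_cases(3) is 0.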
> To give an example: if I store a 32-bit word
> to memory and the next instruction may incidentally read from it (back
> into the integer register), how bad is this, or is it only transfers
> between different register files that trigger the problem?
>
In three of four cases, that's as bad as described, and in the remaining
case you are lucky enough to have the store and load end up in different
groups. The source and target register files are not relevant, as far as I
know.
I think Jon Stokes's article on Ars Technica is still one of the best
available sources of information so far:
http://arstechnica.com/cpu/02q2/ppc970/ppc970-1.html
Holger