Re: Problem on dual-processor Macs
- Subject: Re: Problem on dual-processor Macs
- From: Eric Dahlman <email@hidden>
- Date: Thu, 25 Sep 2003 10:27:53 -0500
Howdy,
Kaelin, Wade and Dan have all given good answers to this question at
the hardware level; there are just a couple of small compiler- and
algorithm-level points I would like to add.
On Thursday, September 25, 2003, at 04:47 AM, Tim Hewett wrote:
[snip]
These indexes are C 'int' data types, so are signed 32-bit
integers at the CPU instruction level. I would expect that
incrementing a 32-bit integer to be an atomic operation,
similarly setting one to 0 being atomic too. The increment-
or-wrap code is:
    if ( out == bufsize - 1 )
        out = 0;
    else
        out++;
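For comparison with the plain-int version above, here is a minimal
sketch of doing the whole increment-or-wrap step atomically with a
compare-and-swap loop. It assumes a C11 compiler with <stdatomic.h>
(which postdates this thread), and advance_index_atomic is a
hypothetical helper name, not an existing API. The point is that out++
is really a load, an add and a store, so the test-and-wrap has to be
retried as one unit if the other processor changes the index in
between.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: atomically advance a ring index by one, wrapping
 * to 0 after bufsize - 1. Returns the new index value. */
static int advance_index_atomic(atomic_int *index, int bufsize)
{
    assert(bufsize > 0);
    int current = atomic_load(index);
    int next;
    do {
        /* The same increment-or-wrap test as the original code. */
        next = (current == bufsize - 1) ? 0 : current + 1;
        /* If another thread changed *index since we read it, the exchange
         * fails, current is refreshed with the latest value, and we retry. */
    } while (!atomic_compare_exchange_weak(index, &current, next));
    return next;
}
```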
Point #1: You did not say whether you had declared out and bufsize to
be volatile or not. If they are not declared volatile, then the compiler
is free to keep the values in registers and sync up the memory
locations at its convenience. If you are running this code in a loop,
keeping the value in a register is actually a good idea from the
compiler's point of view, since it saves going out to memory.
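To make the volatile point concrete, here is a small sketch (the names
out, advance_out and wait_until_out_is are hypothetical). Declaring the
index volatile forces every access to go to memory, so a polling loop
sees updates made elsewhere. Note what it does not do: volatile neither
makes the read-modify-write atomic nor controls the order in which the
other processor observes your writes; on a dual-processor machine that
still takes atomic operations or memory barriers.

```c
#include <assert.h>

/* Hypothetical ring-buffer index shared with another thread. Because it
 * is volatile, the compiler may not cache it in a register. */
static volatile int out = 0;

/* Still NOT atomic: one volatile load, then one volatile store. Another
 * processor can change out between the two. */
static void advance_out(int bufsize)
{
    assert(bufsize > 0);
    int current = out;           /* exactly one load from memory */
    if (current == bufsize - 1)
        out = 0;                 /* one store to memory */
    else
        out = current + 1;       /* one store to memory */
}

/* A bounded polling loop. Without volatile, an optimizer could load out
 * once before the loop and keep testing a stale register copy. */
static int wait_until_out_is(int target, long max_spins)
{
    for (long i = 0; i < max_spins; i++)
        if (out == target)
            return 1;
    return 0;
}
```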
Point #2: C and its descendants do not make any promises about when the
results of assignments within an expression occur. Instead, they talk
about sequence points (I'll call them sync points here), which
generally fall between statements. That means something like
    .                                 <- sync point
    total = a++ + b++ + ++c - ++d;
    .                                 <- sync point
will change the values of all the variables involved, but there is no
rule about when the changes within the expression occur, just that
they will be done by the time the program reaches the next sync point.
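A small worked example of that expression (the C standard calls these
sync points "sequence points"). Because each variable is modified only
once, the result is well defined even though the order of the four
stores is not: with a = 1, b = 2, c = 3, d = 4 the value is
1 + 2 + 4 - 5 = 2, and afterwards a, b, c, d are 2, 3, 4, 5. The
function name and pointer parameters are just for illustration.

```c
#include <assert.h>

/* Hypothetical demo: evaluates the expression from the text on four
 * caller-owned ints and returns the total. */
static int demo_sequence_points(int *a, int *b, int *c, int *d)
{
    /* The pointers must be distinct: modifying the same object twice
     * between sequence points would be undefined behavior. */
    assert(a != b && a != c && a != d && b != c && b != d && c != d);

    /* Within this statement the compiler may perform the four increments
     * in any order, or defer them; all are done by the terminating ';'. */
    int total = (*a)++ + (*b)++ + ++(*c) - ++(*d);
    return total;
}
```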
Point #3: On multiprocessor machines, keeping memory synced between
processors is "expensive". The reason is that if you have two
processors fussing over the same word in memory, neither can simply
keep that word in its cache: each processor has a separate cache which
the other cannot see, so every write by one has to invalidate or update
the copy held by the other. There are lots of ways this is addressed,
and I don't know which route was chosen for this architecture, but it
really doesn't matter, because however it was done it will be
"expensive" compared with a processor working out of its own cache.
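One practical consequence is to keep data that each processor writes on
its own cache line, so a write by one processor does not keep
invalidating the line the other is using (often called false sharing).
A minimal C11 sketch; the 128-byte line size is an assumption (aligning
to 128 also separates data on machines with 64- or 32-byte lines), and
the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* ASSUMPTION: 128-byte cache lines; adjust for the target processor. */
#define CACHE_LINE 128

/* Each counter starts on its own cache line and is padded to fill it. */
struct per_cpu_counter {
    alignas(CACHE_LINE) unsigned long count;
};

/* The producer's and consumer's state never share a line. */
struct shared_state {
    struct per_cpu_counter producer;   /* written only by the producer */
    struct per_cpu_counter consumer;   /* written only by the consumer */
};

static_assert(sizeof(struct per_cpu_counter) == CACHE_LINE,
              "each counter should fill exactly one cache line");

/* Called only by the processor that owns c, so no other processor
 * writes to this line. */
static void bump(struct per_cpu_counter *c)
{
    c->count++;
}
```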
Design your algorithm with this constraint in mind. In your present
case, instead of a single ring buffer it may well be a better design to
use a ring of buffers. Then for most operations each processor would
be working in its own space and not stepping on the other's toes;
only the operations of moving from one buffer to the next need to be
properly synced.
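A minimal sketch of the ring-of-buffers idea, assuming one producer
thread, one consumer thread and C11 atomics. Each side fills or drains
a whole buffer privately, and only the hand-off of a buffer touches
shared state: a release store of a counter, paired with an acquire load
on the other side. The sizes, the float element type and every
function name are assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* ASSUMPTIONS: single producer, single consumer; sizes are hypothetical. */
#define NUM_BUFFERS 4
#define BUFFER_LEN  256

/* A power of two keeps head % NUM_BUFFERS consistent when size_t wraps. */
static_assert((NUM_BUFFERS & (NUM_BUFFERS - 1)) == 0,
              "NUM_BUFFERS must be a power of two");

struct buffer_ring {
    float buffers[NUM_BUFFERS][BUFFER_LEN];
    size_t lengths[NUM_BUFFERS];
    atomic_size_t head;   /* buffers handed over; stored only by producer */
    atomic_size_t tail;   /* buffers given back; stored only by consumer */
};

/* Producer: a buffer to fill privately, or NULL if all are in use. */
static float *acquire_write_buffer(struct buffer_ring *r)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == NUM_BUFFERS)
        return NULL;
    return r->buffers[head % NUM_BUFFERS];
}

/* Producer: hand the filled buffer over. The release store makes the
 * buffer contents visible before the new head value. */
static void publish_write_buffer(struct buffer_ring *r, size_t len)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    assert(len <= BUFFER_LEN);
    r->lengths[head % NUM_BUFFERS] = len;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
}

/* Consumer: the oldest filled buffer, or NULL if none is ready. */
static const float *acquire_read_buffer(struct buffer_ring *r, size_t *len)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return NULL;
    *len = r->lengths[tail % NUM_BUFFERS];
    return r->buffers[tail % NUM_BUFFERS];
}

/* Consumer: give the buffer back once it has been fully read. */
static void release_read_buffer(struct buffer_ring *r)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
}
```

The producer calls acquire_write_buffer, fills up to BUFFER_LEN
elements, then calls publish_write_buffer; the consumer mirrors this
with acquire_read_buffer and release_read_buffer. The shared counters
are touched once per buffer rather than once per element.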
Now it may well be that having a little bit o' buffering is not
acceptable in your application and you really need to run both sides in
the same ring buffer, with all its hardware implications. Well then, I
would suggest that you look at slightly altering the way it works to be
more forgiving of synchronization issues. A simple solution here may
be to introduce a gap into the buffer so you can tolerate the possible
error in what you are reading. Say you are keeping track of the first
and last elements, and each value might be off by one depending on
where the other process is in its execution. If you always keep a gap
of two elements free in your data structure, you would not need to
synchronize at all.
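A minimal sketch of the gap idea for a single producer and a single
consumer, using C11 atomics for the two indexes (the names and sizes
are hypothetical). Each index is written by only one side, and a stale
copy of the other side's index can only make the ring look fuller (to
the producer) or emptier (to the consumer) than it really is, so the
error is always on the safe side. On weakly ordered processors such as
the PowerPC, the element still has to be written before the index that
publishes it; the release/acquire pair in the sketch provides that
ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* ASSUMPTIONS: one producer thread, one consumer thread; sizes hypothetical. */
#define RING_SIZE 8
#define RING_GAP  2   /* slots always left free */

static_assert(RING_GAP >= 1 && RING_GAP < RING_SIZE, "invalid gap");

struct gap_ring {
    int data[RING_SIZE];
    atomic_int in;    /* next slot to write; stored only by the producer */
    atomic_int out;   /* next slot to read; stored only by the consumer */
};

/* Occupied slots, given a snapshot of both indexes. */
static int ring_count(int in, int out)
{
    return (in - out + RING_SIZE) % RING_SIZE;
}

static bool ring_put(struct gap_ring *r, int value)
{
    int in = atomic_load_explicit(&r->in, memory_order_relaxed);
    int out = atomic_load_explicit(&r->out, memory_order_acquire);
    /* A stale out only overstates the count: a safe error. */
    if (ring_count(in, out) >= RING_SIZE - RING_GAP)
        return false;
    r->data[in] = value;
    /* Release: the element is written before the new index is visible. */
    atomic_store_explicit(&r->in, in == RING_SIZE - 1 ? 0 : in + 1,
                          memory_order_release);
    return true;
}

static bool ring_get(struct gap_ring *r, int *value)
{
    int out = atomic_load_explicit(&r->out, memory_order_relaxed);
    int in = atomic_load_explicit(&r->in, memory_order_acquire);
    /* A stale in only understates the count: a safe error. */
    if (ring_count(in, out) == 0)
        return false;
    *value = r->data[out];
    atomic_store_explicit(&r->out, out == RING_SIZE - 1 ? 0 : out + 1,
                          memory_order_release);
    return true;
}
```

With release/acquire index updates a single free slot already tells
full from empty; RING_GAP is set to 2 to match the suggestion, and any
value from 1 up to RING_SIZE - 1 works.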
So I guess the short of it is that the others told you how to guarantee
atomicity, and I am telling you to alter your algorithm to tolerate a
bit of inaccuracy in place of atomicity. Both approaches have their
good and bad points, so you need to make a rational decision about which
way you will go and why.
Hope that helps,
-Eric
_______________________________________________
cocoa-dev mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/cocoa-dev
Do not post admin requests to the list. They will be ignored.