Re: Threading - How its done?
- Subject: Re: Threading - How its done?
- From: Paul Sargent <email@hidden>
- Date: Fri, 9 May 2008 14:52:40 +0100
Sorry, long non-Cocoa post, but maybe there's some useful info for
someone.
On 7 May 2008, at 18:33, Army Research Lab wrote:
> Pay particular attention to the section titled "HDL and programming
> languages".
> Chip designers have had to contend with these problems for years, and
> developed languages for expressing parallelism with implicit threading
> already (everything in an HDL is parallel unless you carefully force
> it to be sequential). We should be using ideas from those languages.
As somebody whose day job is writing HDL I'd like to just repeat this
for emphasis, but it's not the languages that solve the race
conditions, it's the architectures employed by the engineers. I think
a lot of the problems software engineers have with threading come
from architectures that are bad when viewed from a parallel-execution
point of view.
Locks and semaphores are the workaround for this, and (good) hardware
engineers (almost) never use them.
For example: pipe-lining. If faced with a set of tasks that need to be
performed sequentially on some data blocks, a software engineer might
decompose the problem like this (MIGHT, I said MIGHT):
(PA means process A, D1 means data block 1)
Thread 1: PA - D1 | PB - D1 | PC - D1 | PD - D1
Thread 2: PA - D2 | PB - D2 | PC - D2 | PD - D2
Thread 3: PA - D3 | PB - D3 | PC - D3 | PD - D3
Thread 4: PA - D4 | PB - D4 | PC - D4 | PD - D4
The thing to note is how each thread is running the same code (which
must therefore be re-entrant) on different data.
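In rough C++ terms that decomposition might look like the sketch
below: one thread per data block, each running every stage. (This is
just a minimal sketch; Block and the processA..processD bodies are my
own placeholders, not anything from the original post.)

#include <thread>
#include <vector>

struct Block { int value; };

// Every worker thread runs all four stages, so each stage must be
// re-entrant: no mutable state shared between calls.
void processA(Block& b) { b.value += 1; }
void processB(Block& b) { b.value *= 2; }
void processC(Block& b) { b.value -= 3; }
void processD(Block& b) { b.value ^= 1; }

int main() {
    std::vector<Block> blocks(4);      // D1..D4
    std::vector<std::thread> workers;  // Thread 1..4
    for (auto& b : blocks)
        workers.emplace_back([&b] {
            processA(b); processB(b); processC(b); processD(b);
        });
    for (auto& t : workers) t.join();
}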
A hardware engineer would probably do this:
Thread 1: PA - D1 | PA - D2 | PA - D3 | PA - D4
Thread 2:           PB - D1 | PB - D2 | PB - D3 | PB - D4
Thread 3:                     PC - D1 | PC - D2 | PC - D3 | PC - D4
Thread 4:                               PD - D1 | PD - D2 | PD - D3 | PD - D4
(Time runs left to right; the empty slots at the start are the
pipeline filling up.)
Note how the data is passed from thread to thread so only one thread
owns the data at any time (no locks necessary), and how no process is
being run in more than one thread at a time so code doesn't have to
worry about being re-entrant.
Granted there's a start-up / shut-down cost where full parallelism
isn't achieved (which is overwhelming in this example, but give it
more data blocks and it becomes negligible), and this doesn't work for
all problems, but it's a useful pattern for data-processing. The other
thing is to make sure that your stages are of similar complexity, as
the slowest stage will define the performance of the system.
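To make that concrete, here's how the pipelined version might be
sketched in C++ (all names are mine; the hand-off uses a plain
mutex-guarded queue for brevity, and a lock-free variant is sketched
after the next paragraph):

#include <condition_variable>
#include <mutex>
#include <optional>   // C++17; stands in for an "end of stream" sentinel
#include <queue>
#include <thread>

template <typename T>
class Channel {
    std::queue<T> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool closed_ = false;
public:
    void push(T v) {
        { std::lock_guard<std::mutex> l(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    void close() {                     // upstream stage has finished
        { std::lock_guard<std::mutex> l(m_); closed_ = true; }
        cv_.notify_all();
    }
    std::optional<T> pop() {           // blocks; empty means stream ended
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        T v = std::move(q_.front());
        q_.pop();
        return v;
    }
};

struct Block { int value; };

int main() {
    Channel<Block> aToB, bToC, cToD;
    long sum = 0;

    std::thread pa([&] {               // stage PA: sources the blocks
        for (int i = 0; i < 100; ++i) aToB.push(Block{i});
        aToB.close();
    });
    std::thread pb([&] {               // each stage owns a block only
        while (auto b = aToB.pop()) {  // between pop() and push()
            b->value *= 2; bToC.push(*b);
        }
        bToC.close();
    });
    std::thread pc([&] {
        while (auto b = bToC.pop()) {
            b->value += 1; cToD.push(*b);
        }
        cToD.close();
    });
    std::thread pd([&] {               // stage PD: sinks the results
        while (auto b = cToD.pop()) sum += b->value;
    });

    pa.join(); pb.join(); pc.join(); pd.join();
}

Each stage body runs in exactly one thread, so none of it needs to be
re-entrant, and no block is visible to two stages at once.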
Passing the ownership of data from thread to thread would be done with
FIFOs, which can also be written without locks, with some care (e.g.
http://msmvps.com/blogs/vandooren/archive/2007/01/05/creating-a-thread-safe-producer-consumer-queue-in-c-without-using-locks.aspx
but read the comments, esp. w.r.t. out-of-order execution).
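For what it's worth, a minimal single-producer / single-consumer ring
buffer along the lines of that article might look like this with
C++11 atomics (which post-date this thread; the acquire/release pairs
below are exactly what deal with the out-of-order execution issue
raised in those comments, and the capacity/index scheme is my own
choice):

#include <array>
#include <atomic>
#include <cstddef>

template <typename T, std::size_t N>   // N must be a power of two
class SpscFifo {
    std::array<T, N> buf_;
    std::atomic<std::size_t> head_{0}; // next slot to read  (consumer)
    std::atomic<std::size_t> tail_{0}; // next slot to write (producer)
public:
    bool push(const T& v) {            // call from the producer only
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N)
            return false;              // full
        buf_[t % N] = v;
        // release: the slot write above becomes visible before the new
        // tail does, so the consumer never sees a half-written slot
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }
    bool pop(T& out) {                 // call from the consumer only
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire))
            return false;              // empty
        out = buf_[h % N];
        head_.store(h + 1, std::memory_order_release);
        return true;
    }
};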
Yes, there can be issues with something like this in software (passing
data between NUMA processors and non-shared caches), but believe me...
It makes code far, far, far easier to read, write and DEBUG (unit
tests for each stage).