Re: Newbie query re multithreading
- Subject: Re: Newbie query re multithreading
- From: "McLaughlin, Michael P." <email@hidden>
- Date: Tue, 18 Aug 2009 15:18:27 -0400
- Thread-topic: Newbie query re multithreading
On 8/18/09 1:46 PM, "Bill Bumgarner" <email@hidden> wrote:
> On Aug 18, 2009, at 10:19 AM, McLaughlin, Michael P. wrote:
>
>> On 8/18/09 11:34 AM, "Bill Bumgarner" <email@hidden> wrote:
>>
>>> On Aug 18, 2009, at 7:25 AM, McLaughlin, Michael P. wrote:
>>>
>>>> 1) With 1 CPU (NSOperation not used), I get 50% of CPU time devoted
>>>> to auto_collection_thread, which I presume means garbage collection.
>>>> Is this normal? It seems excessive.
>>>
>>> It sounds like your algorithm is doing a tremendous amount of memory
>>> thrashing as it runs, allocating and deallocating tons of objects.
>>> Under GC on Leopard, the collector will spend a ton of CPU cycles
>>> trying to keep up.
>>
>> I suspect that this much is true. This code contains a huge number of
>> matrices and vectors, allocated locally and allowed to go out of scope
>> without explicit freeing (given GC). I am not sure of the internals of
>> the Eigen library, but it seems to be nearly all templated headers.
>
> The only way that C++ objects would be visible to the collector is if
> you overrode the class(es) constructor to allocate memory from the
> collector's zone via NSAllocateCollectable().
>
> Otherwise, the C++ objects aren't visible to the collector.
>
> Are your matrices and vectors allocated as Objective-C objects?
No. I just re-read the Garbage Collection docs, which state that the
malloc zone is ignored by the collector, and the Eigen matrix library is
pure C++. On the other hand, this means the code performs thousands of
mallocs.
>>> Do you have something in your concurrent code that is sending
>>> messages back to the main event loop or posting events to other
>>> threads' run loops? This can be problematic, too, in that you can
>>> end up consuming tons of CPU cycles in message passing.
>>
>> At the end of each NSOperation, there is a single enqueueNotification
>> just before the thread main() exits [for higher-level bookkeeping]. I
>> use enqueue instead of post because it is asynchronous. Otherwise, the
>> calls would pile up and the caller stack frame would not terminate
>> until the whole program was finished.
>
> OK -- that sounds reasonable.
>
> How about data access? I.e. do multiple threads update some subset of
> objects simultaneously? Do you have lock contention along those code
> paths?
Actually, there *is* one MPCriticalRegion [for occasional writing] but, when
I stubbed it out, the poor performance was unchanged and Instruments showed
0% CPU for that function.
>> The caller, at each timestep, looks like this:
>>
>>     Nwaiting = numProcessors;
>>     for (int op = 0; op < numProcessors; op++) {
>>         compOperation *thisOp = [[compOperation alloc]
>>             initWithCompData:&compData
>>                           ID:op
>>                         step:timeNdx
>>                        user0:firstUser[op]
>>                        userN:lastUser[op]
>>                        queue:[NSNotificationQueue defaultQueue]];
>>         [opQueue addOperation:thisOp];
>>     }
>>
>
> That looks generally reasonable.
>
> The one red flag is numProcessors. Generally, you shouldn't design
> concurrency around the # of processors on Mac OS X. The operation
> queue (and, in Snow Leopard, Grand Central Dispatch --
> http://www.apple.com/macosx/technology/#grandcentral
> ) should throttle appropriately to maximize throughput and do so in
> consideration of other applications on the system.
This is something of a holdover from my earlier, C++/Carbon version. There,
I declared std::vector<Assistant>, one for each processor and, at the
beginning of the program, gave them all copies of the global data plus their
portion of the input data. Thus, they were independent from that point on.
That code runs 18x faster than my present Cocoa version.
Logically, the choice is
i) NItems/Ncpus compOperations (see above)
vs.
ii) NItems compOperations.
Since each operation requires some setup, I thought that proliferating
operations would just increase overhead proportionately. Currently, one
compOperation processes a list of Items [each = 2-3 pages of matrix
equations].
>
>>> Do you have access to Snow Leopard? If so, use it. The tool chain
>>> for analyzing and improving concurrent applications is vastly
>>> improved. Even if you are going to continue to target Leopard, the
>>> analysis and debugging improvements will make your job easier.
>>>
>>
>> Yes and no. I do have it, at home, but have not installed it because
>> I have only a single Intel Mac and I was unsure of the feasibility of
>> installing two Developer folders on a single hard-drive partition.
>> Sounded like trouble to me ;-)
>
> To run the SL dev tools, you need to have SL installed.
>
> Mac OS X will boot quite happily from an external hard drive, a second
> partition or -- even -- an SDHC card shoved in a USB based SD reader
> (I occasionally boot my MacBook Pro from an SD card shoved in an
> ExpressCard/34 reader).
>
>> Here, at work where I develop the app under discussion, I have only
>> a G5 and am unlikely to improve on that in the foreseeable future.
>
> You wouldn't happen to be writing a 64 bit application, would you?
No.
I'll have to see if I have a spare hard-drive somewhere to use to boot SL.
Looks like my real problem is to identify synchronization bottlenecks. Is
there an Instrument that makes these apparent, i.e., traceable back to code?
Otherwise, I will have to peruse my 26 source files and hope to spot them
somehow.
--
Mike McLaughlin
_______________________________________________
Cocoa-dev mailing list (email@hidden)