Re: Newbie query re multithreading
- Subject: Re: Newbie query re multithreading
- From: Bill Bumgarner <email@hidden>
- Date: Tue, 18 Aug 2009 10:46:21 -0700
On Aug 18, 2009, at 10:19 AM, McLaughlin, Michael P. wrote:
On 8/18/09 11:34 AM, "Bill Bumgarner" <email@hidden> wrote:
On Aug 18, 2009, at 7:25 AM, McLaughlin, Michael P. wrote:
1) With 1 CPU (NSOperation not used), I get 50% of CPU-time devoted to
auto_collection_thread which I presume means Garbage Collection. Is this
normal? It seems excessive.
It sounds like your algorithm is doing a tremendous amount of memory
thrashing as it runs; allocating and deallocating tons of objects.
Under GC on Leopard, the collector will spend a ton of CPU cycles
trying to keep up.
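For a rough illustration (not your actual code -- just the shape of the
problem), a hot loop that churns out short-lived collected objects will
keep auto_collection_thread busy under Leopard GC:

for (NSUInteger i = 0; i < 1000000; i++) {
    // Each iteration allocates scratch objects and drops them; the
    // collector has to reclaim all of it on its own thread.
    NSMutableArray *scratch = [NSMutableArray arrayWithCapacity: 16];
    [scratch addObject: [NSNumber numberWithUnsignedInteger: i]];
}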
I suspect that this much is true. This code contains a huge number of
matrices and vectors, allocated locally and allowed to go out of scope
without explicit freeing (given GC). I am not sure of the internals of
the Eigen library but that seems to be nearly all templated headers.
(Forgive me if I'm rehashing some basics here -- I just want to make
sure we're on the same page.)
The only way that C++ objects would be visible to the collector is if
you overrode the class's allocation (operator new) to get memory from
the collector's zone via NSAllocateCollectable().
Otherwise, the C++ objects aren't visible to the collector.
Are your matrices and vectors allocated as Objective-C objects?
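For illustration, making a C++ class's storage visible to the collector
would look roughly like this (a sketch only -- the class name is made
up, and this has to be compiled as Objective-C++):

#import <Foundation/Foundation.h>

class CollectableMatrix {
public:
    // Allocate from the collected zone; NSScannedOption tells the
    // collector to scan the block for references it may contain.
    void *operator new(size_t size) {
        return NSAllocateCollectable(size, NSScannedOption);
    }
    void operator delete(void *) {
        // Nothing to do -- the collector reclaims the block.
    }
    // matrix storage and methods would go here
};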
2) With 2 CPUs, I get 26% of CPU-time for auto_collection_thread and
another 26% for mach_msg_trap --> CFRunLoopSpecific (apparently back to
Carbon again).
CF is CoreFoundation, not Carbon.
Yes, but CFRunLoopSpecific leads to RunCurrentEventLoopInMode which is
HIToolbox (supposedly Carbon).
The internals of CFRunLoop are an implementation detail subject to
change at any time. :)
Do you have something in your concurrent code that is sending messages
back to the main event loop or posting events to other threads' run
loops? This can be problematic, too, in that you can end up consuming
tons of CPU cycles in message passing.
At the end of each NSOperation, there is a single enqueueNotification
just before the thread main() exits [for higher-level bookkeeping]. I
use enqueue instead of post because it is asynchronous. Otherwise, the
calls would pile up and the caller stack frame would not terminate until
the whole program was finished.
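Roughly, that asynchronous enqueue looks like this (a sketch -- the
notification name is a placeholder, and notificationQueue stands for the
queue handed to the operation at init time):

// NSPostASAP queues the notification for delivery on the next pass of
// the run loop rather than posting it synchronously.
NSNotification *done = [NSNotification notificationWithName: @"CompOperationFinished"
                                                     object: self];
[notificationQueue enqueueNotification: done postingStyle: NSPostASAP];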
OK -- that sounds reasonable.
How about data access? I.e. do multiple threads update some subset of
objects simultaneously? Do you have lock contention along those code
paths?
If your compute threads are synchronously messaging between threads --
waitUntilDone: YES'ing -- you can quite easily see a "reverse scaling"
performance profile like the one you are seeing now.
Not really sure what this means. The caller, at each timestep, looks
like this:
A common -- and often fatally non-performant -- pattern is to use:

[someView performSelectorOnMainThread: @selector(updateNowPlease:)
                           withObject: self
                        waitUntilDone: YES];
If the main event loop is processing lots of these or is otherwise
blocked, this pattern very quickly becomes a massive bottleneck.
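If the worker doesn't need an answer back, the non-blocking variant
avoids stalling the compute thread while the main thread catches up:

[someView performSelectorOnMainThread: @selector(updateNowPlease:)
                           withObject: self
                        waitUntilDone: NO];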
Nwaiting = numProcessors;
for (int op = 0; op < numProcessors; op++) {
    compOperation *thisOp = [[compOperation alloc] initWithCompData: &compData
                                                                 ID: op
                                                               step: timeNdx
                                                              user0: firstUser[op]
                                                              userN: lastUser[op]
                                                              queue: [NSNotificationQueue defaultQueue]];
    [opQueue addOperation: thisOp];
}
More-or-less copied from Apple's NSOperation sample code. As noted, each
operation enqueues an "I'm finished" message to this caller. Nwaiting is
decremented upon receipt. compData is a read-only structure containing
global data.
That looks generally reasonable.
The one red flag is numProcessors. Generally, you shouldn't design
concurrency around the # of processors on Mac OS X. The operation queue
(and, in Snow Leopard, Grand Central Dispatch --
http://www.apple.com/macosx/technology/#grandcentral) should throttle
appropriately to maximize throughput and do so in consideration of other
applications on the system.
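For example, you could enqueue one operation per natural chunk of work
(however many that turns out to be) and let the queue decide how many
run at once -- a sketch, with numberOfChunks standing in for whatever
your problem dictates:

// The default lets NSOperationQueue throttle concurrency on its own;
// spelled out here only for emphasis.
[opQueue setMaxConcurrentOperationCount: NSOperationQueueDefaultMaxConcurrentOperationCount];

for (int chunk = 0; chunk < numberOfChunks; chunk++) {
    [opQueue addOperation: [[compOperation alloc] initWithCompData: &compData
                                                                ID: chunk
                                                              step: timeNdx
                                                             user0: firstUser[chunk]
                                                             userN: lastUser[chunk]
                                                             queue: [NSNotificationQueue defaultQueue]]];
}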
Do you have access to Snow Leopard? If so, use it. The tool chain
for analyzing and improving concurrent applications is vastly
improved. Even if you are going to continue to target Leopard, the
analysis and debugging improvements will make your job easier.
Yes and no. I do have it, at home, but have not installed it because I
have only a single Intel Mac and I was unsure of the feasibility of
installing two Developer folders on a single hard-drive partition.
Sounded like trouble to me ;-)
To run the SL dev tools, you need to have SL installed.
Mac OS X will boot quite happily from an external hard drive, a second
partition, or -- even -- an SDHC card shoved in a USB-based SD reader (I
occasionally boot my MacBook Pro from an SD card shoved in an
ExpressCard/34 reader).
Here, at work where I develop the app under discussion, I have only a
G5 and am unlikely to improve on that in the foreseeable future.
You wouldn't happen to be writing a 64-bit application, would you?
b.bum