Please do file a RADAR bug on that.
If stack growth hits a linear search and lock, that can be a big problem for recursive functions in threads (quicksort is the easiest example).
(If it were just alloca, I wouldn't worry; it's known to be evil.)
Many things that are "thread safe" are not necessarily "thread friendly".
I got bitten by memset -- something trivial, and completely thread safe. But someone optimized memset by calling into the VM system and setting pages to constant values, which introduces locks and completely hoses the performance of threads that call memset with more than 4k of data at a time. It was a cute speedup when using a single processor, and didn't show a big slowdown with 2 processors, but on 4 and more it really becomes a problem. memcmp and memcpy appear to have similar optimizations.
Unfortunately, the only way to find these things is to try them - then file bug reports when you find bogus behavior.
Oh, yeah -- and you'll find very different behavior on each OS, and sometimes different versions of each OS.
Chris
-----Original Message-----
From: perfoptimization-dev-bounces+ccox=email@hidden on behalf of Ron Avitzur
Sent: Tue 4/15/2008 10:04 AM
To: email@hidden
Subject: Re: StackSpace / pthread_get_stackaddr_np contention
I received the following reply. While working around this particular
problem will be easy, it makes me wonder what other system calls I
use, though thread-safe, might in the future introduce contention.
At 9:27 AM +0100 4/15/08, Quinn wrote:
>>Did StackSpace or pthread_get_stackaddr_np change recently on
>>8-core machines?
>
>It looks like it changed significantly in 10.5. In 10.4 it was a
>simple field access.
>
><http://www.opensource.apple.com/darwinsource/10.4/Libc-391/pthreads/pthread.c>
>
>In 10.5 it's doing a linear search of the threads list (which is
>protected by a spin lock).
>
><http://www.opensource.apple.com/darwinsource/10.5/Libc-498/pthreads/pthread.c>
>
>Not good for you I'm afraid. This slowdown is definitely bugworthy
>IMHO; while I'm sure there was a good reason for the change, such a
>radical slowdown on a routine that is likely to be called often is
>just bad.
>
>As to a workaround, I suggest you call this routine once at the top
>level, cache the result in a per-thread variable, and then do your
>own calculations based on that and the current stack pointer.
>
>S+E
>--
>Quinn "The Eskimo!" <http://www.quinn.echidna.id.au/Quinn/WWW/>
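For reference, the workaround Quinn describes might be sketched roughly as
below. This is only an illustration, not tested against 10.5. The names
cached_stack_limit, cache_stack_bounds and my_stack_space are hypothetical.
The sketch assumes the stack grows downward, uses the __thread compiler
extension for the per-thread variable (pthread_getspecific would do where
that is unavailable), and takes the address of a local variable as an
approximation of the current stack pointer. The non-Apple branch is there
only so the sketch builds on glibc systems.

```c
#define _GNU_SOURCE            /* only needed for the non-Apple fallback */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Per-thread cache of the lowest stack address; 0 means "not cached yet". */
static __thread uintptr_t cached_stack_limit = 0;

/* Hypothetical helper: do the expensive lookup once per thread. */
static void cache_stack_bounds(void)
{
#ifdef __APPLE__
    pthread_t self = pthread_self();
    /* On Mac OS X this returns the high end of a downward-growing stack. */
    uintptr_t top = (uintptr_t)pthread_get_stackaddr_np(self);
    cached_stack_limit = top - pthread_get_stacksize_np(self);
#else
    /* Fallback for glibc: pthread_attr_getstack reports the low end. */
    pthread_attr_t attr;
    void *low = NULL;
    size_t size = 0;
    if (pthread_getattr_np(pthread_self(), &attr) == 0) {
        pthread_attr_getstack(&attr, &low, &size);
        pthread_attr_destroy(&attr);
    }
    cached_stack_limit = (uintptr_t)low;
#endif
    assert(cached_stack_limit != 0);   /* the lookup is assumed to succeed */
}

/* Hypothetical StackSpace stand-in: approximate bytes of stack remaining. */
size_t my_stack_space(void)
{
    char marker;                       /* its address approximates the SP */
    if (cached_stack_limit == 0)
        cache_stack_bounds();          /* slow path, taken once per thread */
    return (size_t)((uintptr_t)&marker - cached_stack_limit);
}
```

After the first call in each thread, my_stack_space() is a thread-local
read and a subtraction, so it should not touch the thread list or its
spin lock.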