Re: thread-local storage, especially on x86
- Subject: Re: thread-local storage, especially on x86
- From: Paolo Molaro <email@hidden>
- Date: Mon, 12 Sep 2005 12:13:08 +0200
- Mail-followup-to: email@hidden
On 09/11/05 Matt Watson wrote:
> So you would like our toolchain to support the __thread storage
> class? That is a reasonable request. Note that the implementation
I don't have access to OSX-x86, but if it doesn't support __thread
storage, I second the request.
> details in the paper are not designed to be exploited by application
> developers, but, rather, as a design for toolchain writers. It was
The implementation details are not interesting to normal application
developers, but the availability of a fast implementation of __thread is
important to many (in the Mono runtime it can bring a performance
benefit of 5-10% in some workloads, and it's also used in garbage
collectors, OpenGL drivers etc.).
> As you can probably understand, allowing developers to inline the TLS
> routines greatly limits the ability to provide release-to-release
> binary compatibility. It would be tantamount to setting the
> implementation in stone for the lifetime of the ABI.
The __thread support detailed in the paper _is_ part of the ABI, just
like the calling convention is. Note also that __thread support has no
relation to pthread_getspecific(), except that they do similar things.
So you can keep the implementation details of pthread_getspecific()
hidden in the libc^WlibSystem library, but the details of the __thread
implementation need to be public, since they are part of the ABI.
I hope Apple will just reuse the existing design and support for
__thread that exists on x86 Linux (at least the gcc code should be
easy to share, though of course the loader and linker will need changes too).
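To illustrate the difference, here is a minimal sketch in C, assuming gcc on
x86 Linux where __thread is available. The names (tls_counter, counter_key and
the two bump functions) are made up for this example and do not come from any
real code base. With __thread the compiler typically emits a segment-relative
memory access, while the pthread version has to go through a library call on
every access:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* ABI-level TLS: the compiler and linker resolve the access themselves;
   no library call is involved (hypothetical variable name). */
static __thread int tls_counter = 0;

/* Library-level TLS: every access goes through pthread calls. */
static pthread_key_t counter_key;
static pthread_once_t counter_key_once = PTHREAD_ONCE_INIT;

static void make_counter_key(void)
{
    /* free() is registered as destructor for each thread's slot. */
    int rc = pthread_key_create(&counter_key, free);
    assert(rc == 0);
    (void)rc;
}

/* Increment the per-thread counter through the __thread variable. */
int bump_with_thread_keyword(void)
{
    return ++tls_counter;
}

/* Same operation through pthread_getspecific(); returns -1 on failure. */
int bump_with_getspecific(void)
{
    int *slot;

    pthread_once(&counter_key_once, make_counter_key);
    slot = pthread_getspecific(counter_key);
    if (slot == NULL) {
        /* First use on this thread: allocate and register the slot. */
        slot = calloc(1, sizeof *slot);
        if (slot == NULL)
            return -1;
        if (pthread_setspecific(counter_key, slot) != 0) {
            free(slot);
            return -1;
        }
    }
    return ++*slot;
}
```

Both functions do the same thing from the caller's point of view; only the
first one depends on the TLS part of the ABI rather than on the internals of
the thread library.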
> On other platforms, such as Linux, developers are expected to
> recompile frequently. This is not the case with Mac OS X, so we are
> limited in the ways we can optimize such routines, and we must make
> certain concessions to performance to accommodate this requirement.
The TLS support is part of the ABI; as such it is fixed, and no
recompilation is required.
> More directly, is the overhead of pthread_getspecific() really that
> bad? Has it shown up in Shark samples or as a bottleneck in a
> critical routine for your software?
On Mono it shows up in profiles if __thread is not supported (or
disabled manually). I bet any JIT with a good garbage collector will
suffer from this, since, for example, allocating an object on the
thread-local heap will change from a pointer increment to a function call
to get the pointer first. In the Mono runtime we also have other uses for
TLS data (appdomain isolation), but other runtimes will have similar issues.
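As a rough sketch of the allocation case (this is not Mono's actual
allocator; LOCAL_HEAP_SIZE, local_heap and tl_alloc are hypothetical names,
and gcc's __thread and aligned attribute are assumed), a thread-local bump
allocator can look like this:

```c
#include <stddef.h>

#define LOCAL_HEAP_SIZE 4096   /* hypothetical per-thread heap size */

/* Each thread gets its own buffer and its own fill mark. */
static __thread unsigned char local_heap[LOCAL_HEAP_SIZE]
    __attribute__((aligned(16)));
static __thread size_t local_heap_used = 0;

/* Allocate size bytes from the calling thread's heap, rounded up to a
   multiple of 8. Returns NULL when the local heap is exhausted. */
void *tl_alloc(size_t size)
{
    void *obj;

    if (size > LOCAL_HEAP_SIZE)
        return NULL;
    size = (size + 7u) & ~(size_t)7u;
    if (size > LOCAL_HEAP_SIZE - local_heap_used)
        return NULL;

    /* The fast path: with __thread this is an add on a thread-local
       variable, with no call needed to find the heap first. */
    obj = local_heap + local_heap_used;
    local_heap_used += size;
    return obj;
}
```

Without __thread, the two thread-local variables would have to be fetched
through pthread_getspecific() on every allocation, which is where the
function call mentioned above comes from.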
lupus
--
-----------------------------------------------------------------
email@hidden debian/rules
email@hidden Monkeys do it better