On 09/11/05 Matt Watson wrote:
> So you would like our toolchain to support the __thread storage class? That is a reasonable request. Note that the implementation
I don't have access to OSX-x86, but if it doesn't support __thread storage, I second the request.
> details in the paper are not designed to be exploited by application developers, but, rather, as a design for toolchain writers. It was
The implementation details are not interesting to normal application developers, but the availability of a fast implementation of __thread is important to many (in the Mono runtime this can have a performance benefit of 5-10% in some workloads, but it's also used in garbage collectors, OpenGL drivers, etc.).
> As you can probably understand, allowing developers to inline the TLS routines greatly limits the ability to provide release-to-release binary compatibility. It would be tantamount to setting the implementation in stone for the lifetime of the ABI.
The __thread support detailed in the paper _is_ part of the ABI, just like the call convention is. Note also that __thread support has no relation to pthread_getspecific(), except that they do similar things. So you can keep the implementation details of pthread_getspecific() hidden in the libc^WlibSystem library, but the details of the __thread implementation need to be public, since they are part of the ABI. I hope Apple will just reuse the existing design and support for __thread that exists on x86 Linux (at least the gcc code should be readily shared, of course the loader and linker will need changes too).
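To make the distinction concrete, here is a minimal sketch of the same per-thread bump allocator written both ways. It assumes GCC's __thread extension and POSIX threads; the arena size and the names alloc_tls/alloc_key are made up for the example and are not taken from Mono or from any Apple library.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread arena size, chosen only for this sketch. */
#define ARENA_SIZE 4096

/* Variant 1: __thread storage class. The compiler, linker and loader
 * cooperate to give every thread its own copy of these variables. */
static __thread char tls_arena[ARENA_SIZE];
static __thread size_t tls_used;

/* Bump allocation: bounds check plus pointer increment, no library call.
 * (No alignment handling, to keep the sketch short.) */
static void *alloc_tls(size_t n)
{
    if (n > ARENA_SIZE - tls_used)
        return NULL;
    void *p = tls_arena + tls_used;
    tls_used += n;
    return p;
}

/* Variant 2: pthread keys. Every allocation first has to call
 * pthread_getspecific() to find the current thread's arena. */
struct arena {
    size_t used;
    char bytes[ARENA_SIZE];
};

static pthread_key_t arena_key;
static pthread_once_t arena_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* free() runs as the destructor when a thread exits. */
    pthread_key_create(&arena_key, free);
}

static void *alloc_key(size_t n)
{
    pthread_once(&arena_once, make_key);
    struct arena *a = pthread_getspecific(arena_key);
    if (a == NULL) {
        /* First allocation on this thread: create its arena lazily. */
        a = calloc(1, sizeof *a);
        if (a == NULL)
            return NULL;
        if (pthread_setspecific(arena_key, a) != 0) {
            free(a);
            return NULL;
        }
    }
    if (n > ARENA_SIZE - a->used)
        return NULL;
    void *p = a->bytes + a->used;
    a->used += n;
    return p;
}
```

Both functions hand out consecutive chunks from a private arena per thread. The difference is only in how the arena is located: the first variant leaves that to the toolchain, the second pays for a library call on every allocation.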
> On other platforms, such as Linux, developers are expected to recompile frequently. This is not the case with Mac OS X, so we are limited in the ways we can optimize such routines, and we must make certain concessions to performance to accommodate this requirement.
The TLS support is part of the ABI; as such it is fixed, and no recompilation is required.
> More directly, is the overhead of pthread_getspecific() really that bad? Has it shown up in Shark samples or as a bottleneck in a critical routine for your software?
On Mono it shows up in profiles if __thread is not supported (or disabled manually). I bet any JIT with a good garbage collector will suffer from this since, for example, allocating an object on the thread-local heap changes from a pointer increment to a function call to get the pointer first. In the Mono runtime we also have other uses for TLS data (appdomain isolation), and other runtimes will have similar issues.

lupus

-- 
-----------------------------------------------------------------
lupus@debian.org                                     debian/rules
lupus@ximian.com                             Monkeys do it better