Mailing Lists: Apple Mailing Lists


Re: thread-local storage, especially on x86



Here's one from the archives: a few years back there was a thread about adding direct support for thread-local storage to the OS X toolchain (as distinct from pthread support). See the mail below or this archive URL for the context:

  http://lists.apple.com/archives/darwin-dev/2005/Sep/threads.html#00005

Matt Watson's comments at the time sounded moderately positive about the possibility of adding this support, but I haven't seen any more recent developments. Have there been any? Is there any prospect of support for a __thread storage class in Apple's GCC in the foreseeable future? I presume it would be useful to file it in Radar regardless?

Thanks,

Ian

On Sun, 11 Sep 2005, Gary Byers wrote:
On Sun, 11 Sep 2005, Matt Watson wrote:


On Sep 11, 2005, at 11:12 AM, Gary Byers wrote:
As I thought I explained in my message, I'm concerned about performance.

I'm sorry, that wasn't clear from your message, to me at least.

There's a difference between "having a dedicated register point to TLS" and "having an API for fast access to TLS." An example of the latter is described in:


<http://people.redhat.com/drepper/tls.pdf>


So you would like our toolchain to support the __thread storage class? That is a reasonable request. Note that the implementation details in the paper are not designed to be exploited by application developers, but, rather, as a design for toolchain writers. It was not clear from your initial message that you were interested in this from any other aspect than as an application developer. Though I see now from your web site that your organization does a Common Lisp implementation, so you may fall into the latter category.
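For readers unfamiliar with the storage class under discussion, here is a minimal sketch of what __thread looks like on a toolchain that supports it (GCC or Clang with POSIX threads is assumed). The names tls_counter, bump_worker and tls_storage_class_demo are hypothetical and chosen only for illustration; nothing here reflects how any particular toolchain implements the feature internally.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread counter. __thread is the GCC/Clang storage
 * class discussed above; every thread gets its own copy, starting at 0. */
static __thread int tls_counter = 0;

/* Worker: increments its own copy n times, then reports the final value
 * back through the argument pointer. */
static void *bump_worker(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++)
        tls_counter++;            /* touches only this thread's copy */
    *(int *)arg = tls_counter;    /* this thread's final count */
    return NULL;
}

/* Runs two workers. Returns 0 if each worker saw only its own increments
 * and the calling thread's copy was left untouched, 1 if not, and -1 if
 * a thread could not be created. */
int tls_storage_class_demo(void)
{
    int a = 1000, b = 2500;
    pthread_t ta, tb;

    if (pthread_create(&ta, NULL, bump_worker, &a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, bump_worker, &b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    return (a == 1000 && b == 2500 && tls_counter == 0) ? 0 : 1;
}
```

The point of the sketch is that the variable is accessed like an ordinary global, with no explicit key lookup in the source; how the compiler and linker resolve that access is exactly the toolchain-level design the paper describes.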

Ideally, yes, I'd like to see Apple's toolchain support TLS. I recognize that this would be a major undertaking, and want to try to find a short-term solution.


As you can probably understand, allowing developers to inline the TLS routines greatly limits the ability to provide release-to-release binary compatibility. It would be tantamount to setting the implementation in stone for the lifetime of the ABI.


On other platforms, such as Linux, developers are expected to recompile frequently. This is not the case with Mac OS X, so we are limited in the ways we can optimize such routines, and we must make certain concessions to performance to accommodate this requirement.

I think that it's possible to expose a fast TLS mechanism, keep the pthread implementation details opaque, and maintain ABI compatibility. Achieving these goals requires careful design; a lot of effort and review and redesign went into the Linux TLS design, and I think that it does a good job of balancing these goals. (If you ask IBM and Novell and RedHat whether they expect their developers to recompile frequently and pass the costs of doing so on to their customers, my guess is that they'd say "ummm, no.") I would also expect this to take significant time, as it should.


More directly, is the overhead of pthread_getspecific() really that bad? Has it shown up in Shark samples or as a bottleneck in a critical routine for your software?

The PowerPC version of my software is able to keep some very frequently accessed thread-specific data in registers and can afford to keep more things that are slightly less frequently accessed in a structure addressed by a dedicated register. Those options aren't (all) available on IA-32, and I have serious doubts about the viability of a design that would require very frequent calls to pthread_getspecific().

Some of my concerns are certainly application-specific, and I wouldn't
expect those concerns to drive Apple's development efforts.  I
-suspect- (but don't know) that some other developers would have
difficulty agreeing with the assertion that "pthread_getspecific() is
good enough", and some of this may be hard to measure: people may
structure programs differently depending on whether direct TLS access
is available or not.
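As a rough illustration of that last point, the sketch below shows one way code tends to get restructured when pthread_getspecific() is the only route to per-thread data: the lookup is hoisted out of the hot loop and the resulting pointer is reused. The struct thread_ctx type and the functions current_ctx, count_naive and count_hoisted are hypothetical names invented for this example; only the pthread calls are real API, and no claim is made here about their actual cost on any platform.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread context; the field is illustrative only. */
struct thread_ctx {
    long allocations;
};

static pthread_key_t ctx_key;
static pthread_once_t ctx_once = PTHREAD_ONCE_INIT;

/* Creates the key once; free() releases a thread's context at exit. */
static void make_key(void)
{
    pthread_key_create(&ctx_key, free);
}

/* Returns the calling thread's context, allocating it on first use.
 * Returns NULL if allocation or registration fails. */
static struct thread_ctx *current_ctx(void)
{
    pthread_once(&ctx_once, make_key);

    struct thread_ctx *ctx = pthread_getspecific(ctx_key);
    if (ctx == NULL) {
        ctx = calloc(1, sizeof *ctx);
        if (ctx != NULL && pthread_setspecific(ctx_key, ctx) != 0) {
            free(ctx);
            ctx = NULL;
        }
    }
    return ctx;
}

/* One key lookup per iteration: the access pattern that is cheap with
 * direct TLS but costs a function call each time through the API. */
long count_naive(long n)
{
    for (long i = 0; i < n; i++) {
        struct thread_ctx *ctx = current_ctx();
        if (ctx == NULL)
            return -1;
        ctx->allocations++;
    }
    struct thread_ctx *ctx = current_ctx();
    return ctx != NULL ? ctx->allocations : -1;
}

/* One lookup in total: the pointer is fetched once and reused. */
long count_hoisted(long n)
{
    struct thread_ctx *ctx = current_ctx();
    if (ctx == NULL)
        return -1;
    for (long i = 0; i < n; i++)
        ctx->allocations++;
    return ctx->allocations;
}
```

Both functions return the calling thread's running total, so calling count_naive(10) and then count_hoisted(5) on the same thread yields 10 and then 15. Hoisting works when the hot path is a loop in one function; it is much harder to apply when the per-thread data is touched from many small, separately compiled routines.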

In the short term, I've pretty much convinced myself that I have to
burn a register.  The register that I'd most -like- to burn is %fs,
since it's just kind of sitting there uselessly.  In order to be
able to use it for lower-cost TLS that I think would be of critical
importance, I -think- that the Mach kernel's "interface for the
user level settable LDT entry feature" - the machine-dependent kernel
call "thread_set_user_ldt()" - would need to be extended to make the
entries it creates per-thread.  Comments in the description of that
function indicate that such an extension is considered as a future
enhancement, but express doubt that there are real reasons for that
extension.  I certainly see such reasons, and hope to be able to
convince whoever's responsible for extending thread_set_user_ldt()
to consider doing so.

Sorry if my original message didn't express all of that clearly;
I hope that it's clear now.

Gary Byers
email@hidden
www.clozure.com
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-dev mailing list      (email@hidden)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/darwin-dev/email@hidden

This email sent to email@hidden



Copyright © 2007 Apple Inc. All rights reserved.