Re: thread-local storage, especially on x86
- Subject: Re: thread-local storage, especially on x86
- From: Ian Lister <email@hidden>
- Date: Wed, 2 Jul 2008 08:54:03 +1000 (EST)
Here's one from the archives: a few years back there was a thread about
adding direct support for thread-local storage to the OS X toolchain (as
distinct from pthread support). See the mail below or this archive URL for
the context:
http://lists.apple.com/archives/darwin-dev/2005/Sep/threads.html#00005
Matt Watson's comments at the time sounded moderately positive about the
possibility of adding this support, but I haven't seen any more recent
developments. Have there been any? Is there any prospect of support for a
__thread storage class in Apple's GCC in the foreseeable future? I presume
it would be useful to file it in Radar regardless?
Thanks,
Ian
On Sun, 11 Sep 2005, Gary Byers wrote:
On Sun, 11 Sep 2005, Matt Watson wrote:
On Sep 11, 2005, at 11:12 AM, Gary Byers wrote:
As I thought I explained in my message, I'm concerned about performance.
I'm sorry, that wasn't clear from your message, to me at least.
There's a difference between "having a dedicated register point to TLS"
and "having an API for fast access to TLS." An example of
the latter is described in:
<http://people.redhat.com/drepper/tls.pdf>
So you would like our toolchain to support the __thread storage class? That
is a reasonable request. Note that the implementation details in the paper
are not designed to be exploited by application developers, but, rather, as
a design for toolchain writers. It was not clear from your initial message
that you were interested in this from any other aspect than as an
application developer. Though I see now from your web site that your
organization does a Common Lisp implementation, so you may fall into the
latter category.
Ideally, yes, I'd like to see Apple's toolchain support TLS. I recognize
that this would be a major undertaking, and want to try to find a
short-term solution.
As you can probably understand, allowing developers to inline the TLS
routines greatly limits the ability to provide release-to-release binary
compatibility. It would be tantamount to setting the implementation in
stone for the lifetime of the ABI.
On other platforms, such as Linux, developers are expected to recompile
frequently. This is not the case with Mac OS X, so we are limited in the
ways we can optimize such routines, and we must make certain concessions to
performance to accommodate this requirement.
I think that it's possible to expose a fast TLS mechanism, keep the
pthread implementation details opaque, and maintain ABI compatibility.
Achieving these goals requires careful design; a lot of effort and
review and redesign went into the Linux TLS design, and I think that
it does a good job of balancing these goals. (If you ask IBM and
Novell and RedHat whether they expect their developers to recompile
frequently and pass the costs of doing so on to their customers, my
guess is that they'd say "ummm, no.") I would also expect this to
take significant time: it should.
More directly, is the overhead of pthread_getspecific() really that bad?
Has it shown up in Shark samples or as a bottleneck in a critical routine
for your software?
The PowerPC version of my software is able to keep some very
frequently accessed thread-specific data in registers and can afford
to keep more things that're slightly-less-frequently accessed in a
structure addressed by dedicated register. Those options aren't (all)
available on IA-32, and I have serious doubts about the viability of a
design that would require very frequent calls to pthread_getspecific().
Some of my concerns are certainly application-specific, and I wouldn't
expect those concerns to drive Apple's development efforts. I
-suspect- (but don't know) that some other developers would have
difficulty agreeing with the assertion that "pthread_getspecific() is
good enough", and some of this may be hard to measure: people may
structure programs differently depending on whether direct TLS access
is available or not.
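
To make the two access styles under discussion concrete, here is a
minimal sketch in C. It is an illustration only, not code from this
thread: the names counter_key, fast_counter, worker and tls_demo, and
the NTHREADS/NITERS values, are hypothetical, and the __thread
declaration assumes a compiler and runtime that implement that storage
class (which is exactly what is being requested for Apple's toolchain).
Each thread increments its own counter once through
pthread_getspecific() and once through a __thread variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4        /* hypothetical values, chosen for illustration */
#define NITERS   100000

static pthread_key_t counter_key;    /* API-based TLS slot */
static __thread long fast_counter;   /* assumes __thread is supported */

/* Each worker bumps its own counters NITERS times via both mechanisms. */
static void *worker(void *arg)
{
    long slow_counter = 0;           /* per-thread data reached through the key */
    (void)arg;

    if (pthread_setspecific(counter_key, &slow_counter) != 0)
        return (void *)1;

    for (int i = 0; i < NITERS; i++) {
        /* A library call on every access. */
        long *p = pthread_getspecific(counter_key);
        assert(p != NULL);
        (*p)++;

        /* Direct access; how cheap this is depends on the toolchain's
           TLS implementation. */
        fast_counter++;
    }

    /* NULL means this thread saw only its own updates. */
    return (slow_counter == NITERS && fast_counter == NITERS)
               ? NULL : (void *)1;
}

/* Returns 0 if every thread's counters were private to that thread,
   a positive failure count otherwise, or -1 on a setup error. */
int tls_demo(void)
{
    pthread_t threads[NTHREADS];
    int created = 0, failures = 0;

    if (pthread_key_create(&counter_key, NULL) != 0)
        return -1;

    for (; created < NTHREADS; created++)
        if (pthread_create(&threads[created], NULL, worker, NULL) != 0)
            break;

    for (int i = 0; i < created; i++) {
        void *result = NULL;
        if (pthread_join(threads[i], &result) != 0 || result != NULL)
            failures++;
    }

    pthread_key_delete(counter_key);
    return (created == NTHREADS) ? failures : -1;
}
```

Calling tls_demo() from a program linked with the pthread library
should return 0. The sketch shows only that both mechanisms give each
thread private storage; it says nothing about their relative cost,
which would have to be measured on the platform in question.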
In the short term, I've pretty much convinced myself that I have to
burn a register. The register that I'd most -like- to burn is %fs,
since it's just kind of sitting there uselessly. In order to be
able to use it for lower-cost TLS that I think would be of critical
importance, I -think- that the Mach kernel's "interface for the
user level settable LDT entry feature" - the machine-dependent kernel
call "thread_set_user_ldt()" - would need to be extended to make the
entries it creates per-thread. Comments in the description of that
function indicate that such an extension is considered as a future
enhancement, but express doubt that there are real reasons for that
extension. I certainly see such reasons, and hope to be able to
convince whoever's responsible for extending thread_set_user_ldt()
to consider doing so.
Sorry if my original message didn't express all of that clearly;
I hope that it's clear now.
Gary Byers
email@hidden
www.clozure.com
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden