Re: thread-local storage, especially on x86
- Subject: Re: thread-local storage, especially on x86
- From: Gary Byers <email@hidden>
- Date: Sun, 11 Sep 2005 14:27:39 -0600 (MDT)
On Sun, 11 Sep 2005, Matt Watson wrote:
On Sep 11, 2005, at 11:12 AM, Gary Byers wrote:
As I thought I explained in my message, I'm concerned about performance.
I'm sorry, that wasn't clear from your message, to me at least.
There's a difference between "having a dedicated register point to TLS" and
"having an API for fast access to TLS." An example of
the latter is described in:
<http://people.redhat.com/drepper/tls.pdf>
So you would like our toolchain to support the __thread storage class? That
is a reasonable request. Note that the implementation details in the paper
are not designed to be exploited by application developers, but, rather, as a
design for toolchain writers. It was not clear from your initial message that
you were interested in this from any other aspect than as an application
developer. Though I see now from your web site that your organization does a
Common Lisp implementation, so you may fall into the latter category.
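
For readers unfamiliar with the __thread storage class mentioned above, here is a minimal sketch of what it looks like on a toolchain that already supports it (GCC on Linux, for example). The names tls_hits, tls_bump, tls_worker and tls_copies_are_independent are hypothetical and used only for illustration; nothing here describes Apple's toolchain.

```c
#include <assert.h>
#include <pthread.h>

/* __thread is a compiler extension: each thread gets its own copy. */
static __thread long tls_hits;

/* Hypothetical helper: increments the calling thread's private copy.
   The access is an ordinary variable reference, not a library call. */
static long tls_bump(void)
{
    return ++tls_hits;
}

/* Bumps the counter *arg times and writes the final value back. */
static void *tls_worker(void *arg)
{
    assert(arg != NULL);
    long n = *(long *)arg;
    long last = 0;
    for (long i = 0; i < n; i++)
        last = tls_bump();
    *(long *)arg = last;
    return NULL;
}

/* Returns 1 if two threads each saw only their own counter and the
   calling thread's copy was left untouched, 0 otherwise. */
int tls_copies_are_independent(void)
{
    long before = tls_hits;
    long a_count = 4, b_count = 7;
    pthread_t a, b;

    if (pthread_create(&a, NULL, tls_worker, &a_count) != 0)
        return 0;
    if (pthread_create(&b, NULL, tls_worker, &b_count) != 0) {
        pthread_join(a, NULL);
        return 0;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return a_count == 4 && b_count == 7 && tls_hits == before;
}
```

How the compiler and linker turn the reference to tls_hits into an actual address is exactly the kind of implementation detail the paper addresses for toolchain writers; application code only sees the variable.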
Ideally, yes, I'd like to see Apple's toolchain support TLS. I recognize
that this would be a major undertaking, and want to try to find a
short-term solution.
As you can probably understand, allowing developers to inline the TLS
routines greatly limits the ability to provide release-to-release binary
compatibility. It would be tantamount to setting the implementation in stone
for the lifetime of the ABI.
On other platforms, such as Linux, developers are expected to recompile
frequently. This is not the case with Mac OS X, so we are limited in the ways
we can optimize such routines, and we must make certain concessions to
performance to accommodate this requirement.
I think that it's possible to expose a fast TLS mechanism, keep the
pthread implementation details opaque, and maintain ABI compatibility.
Achieving these goals requires careful design; a lot of effort and
review and redesign went into the Linux TLS design, and I think that
it does a good job of balancing these goals. (If you ask IBM and
Novell and RedHat whether they expect their developers to recompile
frequently and pass the costs of doing so on to their customers, my
guess is that they'd say "ummm, no.") I would also expect this to
take significant time: it should.
More directly, is the overhead of pthread_getspecific() really that bad? Has
it shown up in Shark samples or as a bottleneck in a critical routine for
your software?
The PowerPC version of my software is able to keep some very
frequently accessed thread-specific data in registers and can afford
to keep more things that're slightly-less-frequently accessed in a
structure addressed by a dedicated register. Those options aren't (all)
available on IA-32, and I have serious doubts about the viability of a
design that would require very frequent calls to pthread_getspecific().
Some of my concerns are certainly application-specific, and I wouldn't
expect those concerns to drive Apple's development efforts. I
-suspect- (but don't know) that some other developers would have
difficulty agreeing with the assertion that "pthread_getspecific() is
good enough", and some of this may be hard to measure: people may
structure programs differently depending on whether direct TLS access
is available or not.
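
To make the cost being discussed concrete, here is a minimal sketch of the portable pthread_getspecific() pattern. Only the pthread_* calls are real API; counter_key, make_counter_key, bump_counter, key_worker and counters_are_per_thread are hypothetical names, and storing a small integer directly in the key's void * slot is an assumption made to keep the example short.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static pthread_key_t counter_key;                 /* hypothetical key */
static pthread_once_t counter_key_once = PTHREAD_ONCE_INIT;

/* Creates the key exactly once, with no destructor. */
static void make_counter_key(void)
{
    int rc = pthread_key_create(&counter_key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Every access goes through calls into the pthread library; this is
   the per-access overhead that direct TLS access would avoid. */
static intptr_t bump_counter(void)
{
    pthread_once(&counter_key_once, make_counter_key);
    /* A key that was never set reads back as NULL, i.e. 0 here. */
    intptr_t n = (intptr_t)pthread_getspecific(counter_key) + 1;
    pthread_setspecific(counter_key, (void *)n);
    return n;
}

/* Bumps the calling thread's counter arg times, returns the last value. */
static void *key_worker(void *arg)
{
    intptr_t n = (intptr_t)arg;
    intptr_t last = 0;
    for (intptr_t i = 0; i < n; i++)
        last = bump_counter();
    return (void *)last;
}

/* Returns 1 if two threads each counted only their own bumps. */
int counters_are_per_thread(void)
{
    pthread_t a, b;
    void *ra = NULL, *rb = NULL;

    if (pthread_create(&a, NULL, key_worker, (void *)(intptr_t)3) != 0)
        return 0;
    if (pthread_create(&b, NULL, key_worker, (void *)(intptr_t)5) != 0) {
        pthread_join(a, NULL);
        return 0;
    }
    pthread_join(a, &ra);
    pthread_join(b, &rb);
    return (intptr_t)ra == 3 && (intptr_t)rb == 5;
}
```

A program that touches thread-specific state on nearly every operation would execute something like bump_counter() constantly, which is why the per-call overhead matters more here than it would for occasional use.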
In the short term, I've pretty much convinced myself that I have to
burn a register. The register that I'd most -like- to burn is %fs,
since it's just kind of sitting there uselessly. In order to be
able to use it for lower-cost TLS that I think would be of critical
importance, I -think- that the Mach kernel's "interface for the
user level settable LDT entry feature" - the machine-dependent kernel
call "thread_set_user_ldt()" - would need to be extended to make the
entries it creates per-thread. Comments in the description of that
function indicate that such an extension is considered as a future
enhancement, but express doubt that there are real reasons for that
extension. I certainly see such reasons, and hope to be able to
convince whoever's responsible for extending thread_set_user_ldt()
to consider doing so.
Sorry if my original message didn't express all of that clearly;
I hope that it's clear now.
Gary Byers
email@hidden
www.clozure.com
Darwin-dev mailing list (email@hidden)