Re: thread-local storage, especially on x86
Here's one from the archives: a few years back there was a thread about adding direct support for thread-local storage to the OS X toolchain (as distinct from pthread support). See the mail below or this archive URL for the context:

http://lists.apple.com/archives/darwin-dev/2005/Sep/threads.html#00005

Matt Watson's comments at the time sounded moderately positive about the possibility of adding this support, but I haven't seen any more recent developments. Have there been any? Is there any prospect of support for a __thread storage class in Apple's GCC in the foreseeable future? I presume it would be useful to file it in Radar regardless?

Thanks,
Ian

On Sun, 11 Sep 2005, Gary Byers wrote:

> On Sun, 11 Sep 2005, Matt Watson wrote:
>
>> On Sep 11, 2005, at 11:12 AM, Gary Byers wrote:
>>
>>> As I thought I explained in my message, I'm concerned about performance.
>>
>> I'm sorry, that wasn't clear from your message, to me at least.
>>
>>> There's a difference between "having a dedicated register point to TLS" and "having an API for fast access to TLS." An example of the latter is described in:
>>>
>>> <http://people.redhat.com/drepper/tls.pdf>
>>
>> So you would like our toolchain to support the __thread storage class? That is a reasonable request. Note that the implementation details in the paper are not designed to be exploited by application developers, but, rather, as a design for toolchain writers. It was not clear from your initial message that you were interested in this from any other aspect than as an application developer. Though I see now from your web site that your organization does a Common Lisp implementation, so you may fall into the latter category.
>
> Ideally, yes, I'd like to see Apple's toolchain support TLS. I recognize that this would be a major undertaking, and want to try to find a short-term solution.
>
>> As you can probably understand, allowing developers to inline the TLS routines greatly limits the ability to provide release-to-release binary compatibility. It would be tantamount to setting the implementation in stone for the lifetime of the ABI. On other platforms, such as Linux, developers are expected to recompile frequently. This is not the case with Mac OS X, so we are limited in the ways we can optimize such routines, and we must make certain concessions to performance to accommodate this requirement.
>
> I think that it's possible to expose a fast TLS mechanism, keep the pthread implementation details opaque, and maintain ABI compatibility. Achieving these goals requires careful design; a lot of effort and review and redesign went into the Linux TLS design, and I think that it does a good job of balancing these goals. (If you ask IBM and Novell and RedHat whether they expect their developers to recompile frequently and pass the costs of doing so on to their customers, my guess is that they'd say "ummm, no.") I would also expect this to take significant time: it should.
>
>> More directly, is the overhead of pthread_getspecific() really that bad? Has it shown up in Shark samples or as a bottleneck in a critical routine for your software?
>
> The PowerPC version of my software is able to keep some very frequently accessed thread-specific data in registers and can afford to keep more things that're slightly-less-frequently accessed in a structure addressed by a dedicated register. Those options aren't (all) available on IA-32, and I have serious doubts about the viability of a design that would require very frequent calls to pthread_getspecific().
>
> Some of my concerns are certainly application-specific, and I wouldn't expect those concerns to drive Apple's development efforts. I -suspect- (but don't know) that some other developers would have difficulty agreeing with the assertion that "pthread_getspecific() is good enough", and some of this may be hard to measure: people may structure programs differently depending on whether direct TLS access is available or not.
>
> In the short term, I've pretty much convinced myself that I have to burn a register. The register that I'd most -like- to burn is %fs, since it's just kind of sitting there uselessly. In order to be able to use it for lower-cost TLS that I think would be of critical importance, I -think- that the Mach kernel's "interface for the user level settable LDT entry feature" - the machine-dependent kernel call "thread_set_user_ldt()" - would need to be extended to make the entries it creates per-thread. Comments in the description of that function indicate that such an extension is considered as a future enhancement, but express doubt that there are real reasons for that extension. I certainly see such reasons, and hope to be able to convince whoever's responsible for extending thread_set_user_ldt() to consider doing so.
>
> Sorry if my original message didn't express all of that clearly; I hope that it's clear now.
>
> Gary Byers
> gb@clozure.com
> www.clozure.com
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Darwin-dev mailing list (Darwin-dev@lists.apple.com)
> Help/Unsubscribe/Update your Subscription:
> http://lists.apple.com/mailman/options/darwin-dev/list-darwin-dev%40lister.d...
> This email sent to list-darwin-dev@lister.dnsalias.net
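
For readers weighing the two access styles argued over in this thread, here is a minimal sketch in C. It assumes a GCC-compatible toolchain that accepts the __thread storage class, which is exactly the support being requested above, so treat it as an illustration under that assumption and not as something Apple's toolchain is known to build. All identifiers (counter_key, slow_counter, fast_counter, tls_demo) are hypothetical names invented for the example. It says nothing about the relative cost of the two styles; that would have to be measured.

```c
/* Sketch only: two ways to keep a per-thread counter.
 * Assumes POSIX threads and a compiler that supports __thread. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Style 1: pthread_getspecific(). Every access is a library call. */
static pthread_key_t counter_key;
static pthread_once_t counter_once = PTHREAD_ONCE_INIT;

static void make_key(void) {
    /* free() is the destructor run at thread exit for non-NULL values. */
    int rc = pthread_key_create(&counter_key, free);
    assert(rc == 0);
    (void)rc;
}

static long *slow_counter(void) {
    pthread_once(&counter_once, make_key);
    long *p = pthread_getspecific(counter_key);
    if (p == NULL) {                 /* first use on this thread */
        p = calloc(1, sizeof *p);
        if (p == NULL) abort();
        if (pthread_setspecific(counter_key, p) != 0) abort();
    }
    return p;
}

/* Style 2: __thread. The compiler and linker generate the access. */
static __thread long fast_counter;

struct result { long slow; long fast; };

static void *worker(void *arg) {
    struct result *r = arg;
    for (int i = 0; i < 1000; i++) {
        ++*slow_counter();           /* lookup by key on each iteration */
        ++fast_counter;              /* direct per-thread variable */
    }
    r->slow = *slow_counter();
    r->fast = fast_counter;
    return NULL;
}

/* Runs two threads; returns 0 if each thread saw only its own increments,
 * 1 if a count is wrong, -1 if a thread could not be started. */
int tls_demo(void) {
    pthread_t t[2];
    struct result r[2] = {{0, 0}, {0, 0}};
    int started = 0;
    for (int i = 0; i < 2; i++) {
        if (pthread_create(&t[i], NULL, worker, &r[i]) != 0) break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    if (started != 2) return -1;
    for (int i = 0; i < 2; i++)
        if (r[i].slow != 1000 || r[i].fast != 1000) return 1;
    return 0;
}
```

Calling tls_demo() from main() and checking for a zero return exercises both paths. The point of the sketch is only that the first style funnels every access through pthread_getspecific(), while the second leaves the access mechanism to the toolchain, which is the ABI trade-off the quoted messages are debating.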
participants (1)
- Ian Lister