Re: Does Mac OS X support interior pointers?
- Subject: Re: Does Mac OS X support interior pointers?
- From: John Engelhart <email@hidden>
- Date: Mon, 7 Sep 2009 05:32:58 -0400
On Sun, Sep 6, 2009 at 9:20 PM, Quincey Morris
<email@hidden> wrote:
> On Sep 6, 2009, at 17:36, John Engelhart wrote:
>
>> So, since the Mac OS X documentation uses the term "interior pointer" in a
>> totally non-standard way, and I can't find anything with regard to what I'm
>> looking for, my question is:
>>
>> Does the Mac OS X garbage collector support interior pointers (as defined
>> at http://www.memorymanagement.org/glossary/i.html#interior.pointer )?
>>
>
> Huh?
>
> You know it doesn't. You've beaten up on GC before, with reference to
> *both* senses of "interior pointer".
To my recollection, I have never discussed this problem on the list. I have
discussed the following two issues (both related to
http://developer.apple.com/mac/library/documentation/Cocoa/Conceptual/GarbageCollection/Articles/gcUsing.html#//apple_ref/doc/uid/TP40008006-SW7):
1) The problems of "Mac OS X interior pointers", insofar as the pointer
returned by [data bytes] does not keep 'data' live.
2) The compiler, in GC mode, generates code that violates C99 section
6.2.4: Storage duration of objects.
Point 1 is not contested by anyone (as far as I know). The only point of
contention is the degree to which this causes problems.
Point 2 is not up for debate: one can write a test case in Objective-C
that, when compiled with -fobjc-gc, produces assembly output that can be
checked to see whether it violates section 6.2.4. Whether or
not the assembly output violates 6.2.4 is unambiguous and not open to
interpretation. The referenced Mac OS X GC section essentially says this is
what happens. In fact, the referenced section effectively says that points 1
and 2 both happen.
The problem I've brought up is neither point 1 nor point 2.
>> I certainly hope this isn't the case, because it essentially means that
>> it is fundamentally impossible to write programs that execute in a
>> deterministic fashion when using GC. It is basically impossible to write
>> code that guarantees that the base pointer remains visible to the collector,
>> particularly when __weak and/or the optimizer is used. Things work the vast
>> majority of the time because there is (usually) a very small window of time
>> where it could actually cause a problem, and one of two things is true: 1)
>> when the compiler generates code that, as a side effect, uses only interior
>> pointers (this happens much, much more frequently than you would think),
>> "something" else happens to keep a base pointer visible, covering up this
>> fundamental error. This is invariably due to a happy set of coincidences and
>> rarely the explicit, intentional result of the programmer's work. 2) The
>> collector is not collecting during the window of vulnerability.
>>
>
> You also know that no one else on this list agrees with you on this
> subject, so I don't understand why you're asking about it.
>
Well, since point 1 is (assumed) uncontested, that leaves point 2.
Those who disagree with me implicitly acknowledge that:
A) The compiler is generating code that violates the C standard.
B) Because of this, it significantly complicates writing code that uses
garbage collection.
... and I gotta say, hats off to those folks! Even if you put A aside, that
still leaves the effects of B. The dedication to make your life harder and
your programming more difficult just to 'be right' is impressive. Garbage
collection is supposed to make things simpler and easier, after all.
> *Of course* every programmer explicitly and intentionally ensures that the
> lifetimes of "base" pointers continue for as long as "derived" or "interior"
> pointers need to be valid. That's the Mac OS X garbage collection
> architecture, for now.
>
From this statement I'm going to assume that you have very little experience
with the back ends of compilers. Either that or you didn't bother to read
my message before you responded.
You don't seem to understand that what you've just asserted is, for all
practical purposes (and especially once the optimizer kicks in), *IMPOSSIBLE*.
For example:
// Assume that 'data' is an NSData containing a NUL-terminated ASCII string.
NSData *data;
const char *cString = [data bytes];
size_t length = strlen(cString);
char cStringCopy[length];
for(NSUInteger x = 0UL; x < length; x++) { cStringCopy[x] = cString[x]; }
Outside of a declaration, '[]' is a pointer operator in C: 'a[i]' is defined
as '*(a + i)'. Any use of 'cStringCopy' after the declaration is equivalent
to '(&cStringCopy[0])', a pointer to the first (index 0) element of the
array. This means the for loop is equivalent to:
for(NSUInteger x = 0UL; x < length; x++) { *(cStringCopy + x) = *(cString + x); }
It's useful to note that both 'cStringCopy' and 'cString' are "loop
invariant", usually a prime target for optimization. Once the optimizer
gets hold of this, it will transform the code into something like:
char *p = cStringCopy;
for(NSUInteger x = 0UL; x < length; x++) { *p++ = *cString++; }
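To make the transformation concrete, here is a minimal plain-C sketch of the strength-reduced loop (no Objective-C runtime involved; the function name copy_strength_reduced is hypothetical). It returns the final value of the source pointer, which shows that once the loop has run, the variable that used to hold the base address of the buffer holds only a pointer past its last copied byte.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the optimized loop above: 'p' and 'src'
   are bumped on every iteration instead of indexing from a fixed base. */
const char *copy_strength_reduced(char *dst, const char *src, size_t length)
{
    char *p = dst;
    for (size_t x = 0; x < length; x++) {
        *p++ = *src++;
    }
    /* 'src' now equals the original src + length: it no longer points at
       the start of the buffer, only one past the last copied byte. */
    return src;
}
```

Nothing in the loop body needs the original base address, so an optimizer has no reason to keep it anywhere.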
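Do you understand what the problem is? After the first iteration, 'cString'
no longer holds the address returned by [data bytes]; it holds an interior
pointer into data's bytes, and 'data' itself may already be dead. It is
extremely difficult to keep the base pointer 'live' for arbitrary pointer
types, particularly those that are indexed via the '[]' operator. The paper at
http://www.hpl.hp.com/personal/Hans_Boehm/gc/papers/boecha.ps.gz gives the
following example of optimized code generated for SPARC: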
#include <stdlib.h>

int f(void)
{
    int *a, *b, i, sum = 0;
    a = malloc(100000 * sizeof(int));
    b = malloc(100000 * sizeof(int));
    for(i = 0; i < 100000; i++) {
        sum += a[i] + b[i];
    }
    return sum;
}
This compiles into something resembling:
diff = b - a;
/* diff reuses b's register, which is now dead. */
for(aptr = a; aptr < a + 100000; aptr++) {
    sum += *aptr + *(aptr + diff);
}
It should be obvious that there are additional optimization opportunities
present. In particular, when using GC, there's no need to call free(a) or
free(b), so the normal 'lifetime extension' past the for loop goes away... which
creates additional optimization opportunities, and the 'b' pointer
completely vanishes. From experience, the optimizer is pretty much
guaranteed to see right through any attempts to artificially extend the
lifetime of a or b past the for loop.
On some architectures and/or with some compiler options, reading an ivar (such
as __strong char *charP) may not be atomic. The typical sequence is:
1) Load self into a register.
2) Add the offset of charP to self.
3) Dereference the computed (self + offset) pointer.
So, to generate code for something like:
-(char *)bytes { return(charP); }
This may end up resembling the following C pseudo-code:
char **selfRegister = self;
selfRegister += offsetof(charP);
charPRegister = *selfRegister;
return(charPRegister);
Or, slightly tweaked after some pass of the optimizer:
char **selfRegister; // Already contains the value of self; also the only remaining ptr to self.
selfRegister += offsetof(charP);
selfRegister = *selfRegister;
return(selfRegister);
This is not an unusual code sequence to see pop out the back end of a
compiler.
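The same three steps can be sketched in plain C. This is only an analogy, assuming a hypothetical struct Object that stands in for an object with a charP ivar (it is not how the Objective-C runtime is implemented); the function name read_charP is likewise hypothetical. The point is the intermediate value: after step 2 the register holds an address into the middle of the object, not its start.

```c
#include <stddef.h>

/* Hypothetical stand-in for an object with a __strong char *charP ivar. */
struct Object {
    void *isa;   /* placeholder for the class pointer */
    char *charP; /* the instance variable being read */
};

/* Reads obj->charP using the load / add offset / dereference steps. */
char *read_charP(struct Object *obj)
{
    char *reg = (char *)obj;               /* 1) load self into a register */
    reg += offsetof(struct Object, charP); /* 2) add the offset of charP */
    /* An optimizing compiler may have reused obj's register by now, so
       'reg' (an interior pointer) may be the only reference to *obj. */
    return *(char **)reg;                  /* 3) dereference self + offset */
}
```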
If the GC system does not support interior pointers (in the "common" sense),
the above code has a race condition right after 'selfRegister +=
offsetof(charP);' and before 'selfRegister = *selfRegister;'. At that point
in time no register or stack slot holds a base pointer to self, only the
interior pointer self + offset. So, regarding your statement (duplicated from above):
> *Of course* every programmer explicitly and intentionally ensures that the
> lifetimes of "base" pointers continue for as long as "derived" or "interior"
> pointers need to be valid. That's the Mac OS X garbage collection
> architecture, for now.
>
... there is no way for you to write code at the C source level that
explicitly and intentionally ensures that the lifetimes of the "base" pointers
continue for as long as "derived" or "interior" pointers need to be valid.
If you suspect even slightly that this is possible, I strongly
suggest you spend some time auditing the assembly that comes out of the
compiler for your carefully crafted code. The optimizer does a fantastic
job of finding and eliminating code like this.
Even if the compiler precisely implemented the C standard and its abstract
machine, it may not be possible to write C code that works correctly with the
garbage collector if the collector does not support interior pointers in the
registers and on the stack. There's too much leeway at the sequence points.
And once you turn the optimizer on, all bets are off. In fact, this
condition is likely to be the first casualty of most optimizations. Even
tail end peephole optimizations are likely to whack away at it.
> Of course it's possible to forget to maintain the base pointer (or fail to
> realize that it's necessary). Of course there are a few cases where it's
> much easier to forget than is reasonable. Of course forgetting to do it opens
> up a window of vulnerability. Of course that sucks.
>
You know, I'm of the opinion that:
Garbage Collection Memory Management Difficulty and Effort <= Manual Memory Management Difficulty and Effort.
If GC effort and difficulty are ever greater than those of manual memory
management, what's the point?
I mean, seriously... when using Mac OS X GC, it's no longer enough to write
source code that's correct; you also need to make sure the code that comes
out of the optimizer reflects the intent of the source code and that crucial
bits haven't been removed by dead code elimination.
> But it's no more broken now than the last time the list discussed this.
> There's nothing new here. Time to move on.
>
No, this is an entirely different problem from any that I've discussed
before. I also don't recall this problem ever being broached on this list.
In fact, until today, I had always assumed Mac OS X's GC system supported
(the "common" definition of) interior pointers. The effect of C's
interior pointers on garbage collection is a well-known, well-studied
problem. An excerpt from the paper "A Proposal for Garbage-Collector-Safe C
Compilation" (circa 1992, available at
http://www.hpl.hp.com/personal/Hans_Boehm/gc/papers/boecha.ps.gz ):
Base Pointers and Derived Pointers
The rest of this proposal is independent of the precise definition of a base
pointer, since that depends on the particular style of garbage collector.
As illustrations, we will refer to two possibilities. First, we identify a
restrictive base pointer definition, in which only pointer values returned
by malloc (or realloc) are considered to be valid base pointers. This has
the advantage that there is a low probability of accidentally misidentifying
non-pointer data as pointers, and thus unnecessarily retaining memory.
Second, we consider a liberal base pointer definition, in which a pointer to
any position inside an object, or to one past the end of the object, is
considered a valid base pointer. This requires somewhat more sophisticated
support by the collector and the allocator to be practical, and may require
more memory. But it has the advantage that otherwise arbitrary C programs
that strictly conform to [1] can be used with a garbage collector.
[1] ANSI. Programming Language C, X3.159-1989 (aka C89)
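The two definitions in the excerpt can be written down as predicates over a single allocated block. This is a sketch for illustration only; the function names are hypothetical, and addresses are treated as plain integers (uintptr_t) so the comparisons are well defined.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Restrictive: only the exact address returned by malloc/realloc counts. */
bool is_base_restrictive(uintptr_t block, size_t size, uintptr_t addr)
{
    (void)size; /* the block's size is irrelevant under this definition */
    return addr == block;
}

/* Liberal: any address inside the block, or one past its end, counts. */
bool is_base_liberal(uintptr_t block, size_t size, uintptr_t addr)
{
    return addr >= block && addr <= block + size;
}
```

Under the restrictive definition, a pointer such as the one memchr returns in the example further down would not keep its block live; under the liberal one it would.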
It also proposes a way to satisfy the 'restrictive base pointer' definition
while keeping the 'liberal base pointer' semantics. Essentially: any time a
base pointer becomes a derived pointer, make sure that the original base
pointer remains visible. The paper uses a combination of preprocessor macros
and the semantics of volatile, but this is a trivial transformation for a
compiler to make: calculate how many derived pointers are created from base
pointers in a basic block and allocate enough automatic storage space to
store the base pointers. Then, just before the derived pointer is created
from the base pointer, store the base pointer in its reserved slot, make
the store + derived pointer code its own basic block, and mark that block
with some kind of internal 'do not optimize' flag. Simple and cheap.
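A rough source-level sketch of the reserved-slot idea follows. It is not the paper's actual macros (those are not reproduced here), and the function name copy_with_visible_base is hypothetical. The base pointer is stored in a volatile automatic variable before the derived pointer is created, and that slot is read back after the derived pointer's last use; the intent, assuming a conservative collector that scans the stack, is that a copy of the base pointer stays in memory for the whole loop.

```c
#include <stddef.h>

/* Hypothetical helper: copies len bytes from a heap block while keeping
   the block's base pointer in a reserved, volatile stack slot. */
size_t copy_with_visible_base(char *dst, const char *base, size_t len)
{
    /* Reserved slot: the store is a volatile access and cannot be elided. */
    const char *volatile base_slot = base;
    const char *p = base_slot; /* derived pointer created from the slot */
    for (size_t i = 0; i < len; i++) {
        dst[i] = *p++;
    }
    /* Volatile read after the derived pointer's last use, meant to keep
       the slot in use until 'p' is no longer needed. */
    const char *kept = base_slot;
    return (size_t)(p - kept); /* number of bytes walked */
}
```

Doing this by hand for every derived pointer is exactly the kind of bookkeeping a compiler could do mechanically.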
If these kinds of transformations to guarantee 'restrictive base pointer'
visibility of all 'liberal base pointers' are not made for the collector,
then the collector has to punt and assume the worst-case 'liberal base
pointer' definition. The nature of C, typical coding style, and the way
most compiler intermediate stages and back ends work pretty much require that
a C garbage collector use the 'liberal base pointer' definition. This leaves
a C garbage collection system with one of two pragmatic choices:
1) Modify the compiler to ensure that the base pointer of a derived pointer
is always written to the stack before the derived pointer is used.
2) The collector must deal with interior pointers itself.
Even choice #2 does not necessarily guarantee correctness, because the
construction of interior pointers is not atomic on all architectures (e.g.,
the split HI/LO address loads on PowerPC).
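Choice #2 means the collector must map an arbitrary address it finds in a register or on the stack back to the start of the block containing it. A minimal sketch, assuming a hypothetical toy collector that keeps its allocations in an array sorted by base address (real collectors use more elaborate structures), is a binary search that applies the liberal definition:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocation record kept by a toy collector. */
typedef struct {
    uintptr_t base; /* address returned by the allocator */
    size_t size;    /* size of the block in bytes */
} Block;

/* blocks[] is sorted by base and non-overlapping. Returns the index of the
   block that addr points into (anywhere inside, or one past the end), or
   -1 if addr points into no block. If two blocks are adjacent, a
   one-past-the-end address equals the next block's base and is attributed
   to the next block. */
ptrdiff_t find_base(const Block *blocks, size_t count, uintptr_t addr)
{
    size_t lo = 0, hi = count;
    /* Binary search for the number of blocks whose base is <= addr. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (blocks[mid].base <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return -1; /* addr is below every block */
    const Block *b = &blocks[lo - 1];
    return (addr <= b->base + b->size) ? (ptrdiff_t)(lo - 1) : -1;
}
```

Even with such a lookup in place, an address that is only half-constructed (as in the split-load case) may not fall inside any block at all.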
There is a third choice: push the responsibility on to the programmer. This
requires an almost inhuman, prescient awareness of how the compiler (and
optimizer!) interact with your source code. It horrifically complicates
programming, and even if you try to do it, you won't know if your code is
buggy because the bugs are likely only to be tickled under extreme
conditions (ie, heavy collector pressure). Furthermore, from experience, I
can pretty much guarantee that the optimizer is going to see right through
your feeble attempts. The optimizer happens to be particularly adept at
finding and removing bits of code that contribute nothing useful to the
basic block(s) being analyzed, and any shenanigans on your part to
artificially extend the lifetime of a given pointer are, almost by
definition, useless work. Even worse, you will get no indication or warning
that your carefully crafted pointer lifetime extending code was tossed out
by the optimizer, exposing you to the problem you were trying to avoid in
the first place... and the resulting optimized code will work just fine 99%
of the time.
While tracking down yet another GC-related mystery problem, it suddenly
occurred to me that the problems I was dealing with would be explained
if my assumption about how Mac OS X dealt with interior pointers was wrong.
Hence my posting asking if anyone could clarify whether interior pointers (of
the "common" definition variety) break Mac OS X garbage collection.
And you know, since I asked politely and you got all sanctimonious and
implied that this is a well-known and documented issue,
> *Of course* every programmer explicitly and intentionally ensures that the
> lifetimes of "base" pointers continue for as long as "derived" or "interior"
> pointers need to be valid. That's the Mac OS X garbage collection
> architecture, for now.
>
I think you owe me a pointer to where the following limitation is explicitly
and unambiguously documented:
__strong char *cStringP = NSAllocateCollectable(1024, 0UL);
strcpy(cStringP, "Hello, world!-Goodbye, world!");
__strong char *otherCStringP = memchr(cStringP, '-', strlen(cStringP));
In particular, why 'cStringP' must remain live and visible, and why
'otherCStringP' is not enough to keep the allocation that 'cStringP' points
to live. I've been over the documentation several times, and I admittedly
skimmed parts of it (like the section on finalize), but I can't find
anything.
This is an entirely different problem than the one documented at
http://developer.apple.com/mac/library/documentation/Cocoa/Conceptual/GarbageCollection/Articles/gcUsing.html#//apple_ref/doc/uid/TP40008006-SW7
_______________________________________________
Cocoa-dev mailing list (email@hidden)