Re: Garbage collector vs variable lifetime
- Subject: Re: Garbage collector vs variable lifetime
- From: John Engelhart <email@hidden>
- Date: Sat, 7 Jun 2008 11:03:37 -0400
On Jun 6, 2008, at 7:27 PM, Quincey Morris wrote:
> On Jun 6, 2008, at 15:48, Bill Bumgarner wrote:
>> The garbage collector does not currently interpret inner pointers
>> as something that can keep an encapsulating object alive. Thus,
>> the behavior is undefined and that it changes between debug and
>> release builds -- between optimization levels of the compiler -- is
>> neither surprising nor considered to be a bug.
> Yes, it's certainly not the "inner" pointer's job to keep the object
> alive. I can't help feeling that it would make sense for the
> lifetime of the object pointer variable to be the same as its scope.
> (I have no idea if there's any language specification that deals
> with this.) If there's any compiler bug, it's the optimization that
> shortens the object pointer variable lifetime that's at fault.
In my personal experience, I've had so many problems when using
Leopard's GC system that I consider it to be 'unusable'. This isn't
meant as a slight against the GC team, as bolting GC on to a
pre-existing and well-established C system is 'non-trivial', and that's
an understatement. It's also not any single issue or quirk, either, but
the fact that they unintentionally conspire together to make the sum
of the problems greater than its individual parts.
One recent programming project of mine is TransactionKit
(http://transactionkit.sourceforge.net/), which is a completely
lockless multi-reader multi-writer hash table. Complex lockless data
structures and GC go hand in hand as GC neatly and cleanly solves the
sticky issue of when it is safe to reclaim a piece of memory used to
hold the table's data: keeping track of which threads are referencing a
piece of the structure isn't simple, and freeing an allocation while a
thread is still using it has some obvious detrimental consequences.
Bill contacted me and asked about my problems. In the end, I gave it
another go. However, the amount of work I sank into making things
work so that it could run for minutes at a time without crashing was
substantial, on the order of a solid week and a half.
Problems ranged from the mundane: The core is written in C for
portability. Pointers are tagged with a #define macro to selectively
add __strong to them, so all the pointers in the C code were
__strong. You'd figure that turning GC on in Xcode and recompiling is
all that it takes, since your pointers are properly marked as
__strong. Nope. Since .c files are C (surprise!), Xcode doesn't pass
the compiler -fobjc-gc for these files. However, you can still add
__strong to pointers in 'C' files, but it's silently discarded by the
compiler. The unintended consequence of all these interactions is that
you can easily be lulled into thinking .c files are being properly
compiled with GC support, and that all your __strong tags are working.
On top of that, single one-shot tests will almost certainly execute
flawlessly as a lot of individual tests aren't likely to trigger a
collection. The chances that you'll 'get lucky' and have something on
the stack 'shadow cover' an allocation are also pretty high. In my
particular case the stress tests caused a virtually instantaneous crash
and it only took a few minutes to figure out why. The answer is that
you can't add -fobjc-gc to .c files as that just results in a warning;
you need to tell the compiler to treat the .c file as an Objective-C
.m file.
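One way to make this silent failure loud is a compile-time guard around the tagging macro. The sketch below is only an illustration: TK_STRONG, TK_REQUIRE_GC, and struct tk_node are hypothetical names, not TransactionKit's actual macros, and it assumes the compiler predefines __OBJC_GC__ when -fobjc-gc is in effect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tagging macro: expands to __strong only when the
 * translation unit is really being compiled with GC support.
 * Assumption: __OBJC_GC__ is predefined under -fobjc-gc. */
#if defined(__OBJC_GC__)
#define TK_STRONG __strong
#define TK_GC_ENABLED 1
#else
#define TK_STRONG
#define TK_GC_ENABLED 0
#endif

/* Hypothetical switch set by the GC build configuration. If the build
 * expects GC but this file is compiled without it, stop immediately
 * instead of silently dropping every __strong tag. */
#if defined(TK_REQUIRE_GC) && !TK_GC_ENABLED
#error "GC build requested, but this file is not compiled with -fobjc-gc"
#endif

struct tk_node {
    TK_STRONG void *value;            /* tagged pointer field */
    struct tk_node *TK_STRONG next;   /* tagged link to the next node */
};

/* Reports whether the tags above actually expanded to __strong. */
int tk_compiled_with_gc(void)
{
    return TK_GC_ENABLED;
}

void tk_node_init(struct tk_node *node, void *value)
{
    assert(node != NULL);
    node->value = value;
    node->next = NULL;
}
```

Compiled as plain C, tk_compiled_with_gc() returns 0, which is exactly the situation described above: the code builds cleanly and none of the tags mean anything. Defining TK_REQUIRE_GC in the GC build configuration would turn that into a hard error.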
Other problems included a bug in the GC atomic CAS pointer swap
function. If you call objc_atomicCompareAndSwapGlobalBarrier() and
the address where the CAS will take place is on the stack, libauto
will fault as soon as that thread exits. The reason it faults is that
it attempts to read something from the old thread's stack, but the
thread's stack has already been unmapped by the system and no longer
exists.
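As a rough illustration of the safer shape, the sketch below keeps the CAS target in static storage so that it outlives any thread that touches it. It uses C11 atomics purely as a stand-in; it does not call objc_atomicCompareAndSwapGlobalBarrier(), and shared_head, swap_shared_head, and load_shared_head are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* The CAS target lives in static storage, not in any thread's stack
 * frame, so its address stays valid after the calling thread exits. */
static _Atomic(void *) shared_head = NULL;

/* Installs 'desired' only if the slot still holds 'expected'.
 * Returns true when the swap took place. */
bool swap_shared_head(void *expected, void *desired)
{
    assert(desired != NULL); /* this sketch treats NULL as "empty" */
    return atomic_compare_exchange_strong(&shared_head, &expected, desired);
}

void *load_shared_head(void)
{
    return atomic_load(&shared_head);
}
```

A second swap_shared_head(NULL, ...) after a successful first one fails, because the slot no longer holds NULL.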
Another quirk is that the compiler doesn't warn you when you
'downcast' a __strong qualified pointer to a non-__strong qualified
pointer. I spent countless hours hunting down a single missed
__strong qualification that was causing random crashes. And the
effects of missing a __strong qualification are completely
unpredictable: sometimes the test suite would complete without
crashing (taking a few minutes), other times it would crash right
after starting up. The reason is that missing the __strong
qualification turns it into a race condition bug: sometimes you get
lucky and never run into the conditions which force it to become a
problem, other times you run smack into it.
The fact that these tests were heavily multi-threaded and
purpose-built to stress test short-lived objects that were handed off
between threads via a central hash table makes you wonder what would
happen under less demanding conditions. The window in which there is
a problem is often very small, and the vast majority of code is not
going to trigger a condition that will cause the GC system to perform
a collection during these windows.
Once I got things to the point where they were largely stable, I then
had another 'problem'. The performance numbers that the GC version
was returning were dramatically less than the non-GC version, on the
order of five to ten times slower. After a bit of digging the cause
of the performance problem became obvious. To understand why requires
an understanding of how the compiler implements write barriers for
__strong pointers.
Code that is emitted by the compiler for a pointer assignment usually
maps down to a single 'store' like instruction, and for the purposes
of this analogy it's a close enough approximation. Leopard's GC system
uses write-barriers to keep the collector informed of changes in the
heap related to pointers, and thus liveness. This requires altering
the compiler and intercepting any 'simple single instruction stores'
with a function call to objc_assign_strongCast (or one of the other
assign functions) that performs the actual store and updates the
collector.
The performance killer isn't the function call per se, it's the fact
that in order to be multi-threading safe, 'updating the collector'
means acquiring the GC mutex spin lock. Therefore what was once a
single instruction to store a pointer without GC turns into a
function call that acquires and releases an OSSpinLock, and that
doesn't even include any of the 'update' work. The compiler manages
to avoid this if it can reason that the address that holds the pointer
is guaranteed to be on the stack. Such assurances are hard to come by
when crossing function boundaries, so the worst has to be assumed.
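The cost difference can be sketched in plain C. This is a simulation under stated assumptions, not libauto's implementation: sim_assign_strong, sim_gc_lock, and sim_barrier_calls are hypothetical names, a C11 atomic_flag stands in for the OSSpinLock, and a counter stands in for the collector bookkeeping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* One global lock, standing in for the collector's spin lock. */
static atomic_flag sim_gc_lock = ATOMIC_FLAG_INIT;
static unsigned long sim_barrier_count; /* protected by sim_gc_lock */

static void sim_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&sim_gc_lock,
                                             memory_order_acquire)) {
        /* spin until the current holder releases the lock */
    }
}

static void sim_unlock(void)
{
    atomic_flag_clear_explicit(&sim_gc_lock, memory_order_release);
}

/* Without a collector: the assignment is a single store. */
void plain_assign(void **slot, void *value)
{
    *slot = value;
}

/* With a write barrier: a function call that takes the global lock,
 * performs the store, and records the change for the collector. */
void *sim_assign_strong(void *value, void **slot)
{
    assert(slot != NULL);
    sim_lock();
    *slot = value;
    sim_barrier_count++; /* stand-in for "update the collector" */
    sim_unlock();
    return value;
}

unsigned long sim_barrier_calls(void)
{
    sim_lock();
    unsigned long n = sim_barrier_count;
    sim_unlock();
    return n;
}
```

Every thread that stores through sim_assign_strong() funnels through the same lock, so the stores serialize no matter how many cores are available.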
Anyone who is using multi-threading for performance reasons and using
Leopard's GC to simplify the complicated allocation tracking issues
that pop up in multi-threading should carefully consider how many
write barrier pointer stores each thread is performing, because it can
very quickly become the primary bottleneck limiting concurrency. Only a
single thread in an app can be performing a __strong pointer store at
any one time; all other threads will block.
It's been said that Leopard's GC system is 'mostly for Objective-C
objects', with the implication that it's not really meant to be a
replacement for malloc() and friends. At the end of all my Leopard GC
work, however, it was really driven home just how little use I have
for garbage collecting 'Objective-C Objects'. Virtually all of my
memory management woes are caused by malloc(), even in 100% Cocoa
application programming. I rarely, if ever, have allocation problems
using the autorelease/retain/release methodology. There are some use
cases that it doesn't work well in (circular references), but they're
usually not show stoppers. Using malloc() allocations correctly, on
the other hand, is still a pain in the ass, just as it's always been.
Even GC allocations from NSAllocateCollectable() are essentially
useless because it requires anything that touches the allocation to
properly __strong qualify its usage, which in practice is pretty much
impossible to achieve. On top of that, handing any GC pointer to a
'plain C' function or library is virtually guaranteed to cause you
nothing but pain and heartache at some point.
>> In any case, you need to make sure that <data> stays alive
>> throughout the entire lifespan of the bytes pointer.
> But the puzzling question is: how? Anything involving stack
> variables could potentially be optimized away. I could move the
> collectable object pointer into a global or instance variable, but
> that solution comes unglued as soon as I hit a case where the
> containing method is called recursively. I suppose a file static
> NSMutableSet temporarily holding collectable objects would work.
Even a translation unit static global might not save you. Since the
compiler has at its disposal all possible uses of that variable, it
can potentially eliminate it if it can reason that it's safe to do so,
such as only being used in a single function.
Speaking from (painful, many hours of debugging) experience, using
Leopard's GC system actually requires a substantial shift in how you
program if you want things to work correctly in a deterministic
manner. Once you know what to look for and start looking, you start
to realize that there are some pretty serious holes in Leopard's GC
system.
The fact of the matter is that under GC, void * is not necessarily the
same as a different void *. Under C, any qualified (i.e., volatile,
restrict, etc) pointer of any type is the same as a similarly
qualified void * pointer, and assignments to and from a like qualified
void * pointer can be done without loss. Assignments from different
qualified pointers will at least raise a warning, and possibly an
error depending on the specific use details. The logic behind this is
self-evident. However, because __strong was implemented as a GCC
__attribute__(()), it does not fall under the same rules. Therefore,
it's just fine to assign a __strong void * pointer to a plain void *
pointer. The consequences of dropping the __strong qualification can
potentially be disastrous, though, because the compiler will no longer
protect assignments/stores of the new pointer with a write barrier.
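For comparison, here is how an ordinary C qualifier behaves, as a minimal sketch. It uses volatile as the stand-in for "a qualifier the compiler enforces"; device_flag, qualified_view, and unqualified_view are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

static volatile int device_flag = 0;

/* Keeps the qualifier: callers know the pointee is volatile. */
volatile void *qualified_view(void)
{
    return &device_flag;
}

/* Drops the qualifier. With a real C qualifier this has to be spelled
 * out: remove the (void *) cast below and the compiler diagnoses the
 * return statement as discarding 'volatile'. */
void *unqualified_view(void)
{
    volatile void *vp = &device_flag;
    assert(vp != NULL);
    return (void *)vp;
}
```

The explicit cast is the visible marker that a qualifier is being thrown away on purpose. The complaint above is that dropping __strong needs no such marker and produces no diagnostic.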
Technically, type qualifiers / storage class specifiers like volatile,
restrict, auto, and register (good ole' forgotten register, does
anyone use you any more?) are often just 'hints' to the compiler which
may be ignored by the compiler's implementation (though volatile was
tightened up in C99, I think it's still 'implementation dependent').
__strong, on the other hand, is not a 'hint', and ignoring or dropping
it can result in non-deterministic and incorrect program execution
behavior. It only takes a single 10 hour long debugging session of
tracking down a single, silently discarded __strong pointer downcast
that causes impossible to replicate random crashes to question the
wisdom of this default behavior.
Consider the following:
{
    char *ptr = NULL;
    {
        NSMutableData *data = [NSMutableData dataWithLength:4096];
        ptr = [data mutableBytes];
        /* force collection */
        ptr[0] = 'Z';
    }
    ptr[1] = 0;
}
and
{
    char *ptr = NULL;
    {
        ptr = [[NSMutableData dataWithLength:4096] mutableBytes];
        /* force collection */
        ptr[0] = 'Z';
    }
    ptr[1] = 0;
}
Technically, these two examples are identical. The first example's
declaration of 'NSMutableData *data' is really a symbolic declaration
for our benefit. In practice, these 'automatic storage declaration
variables' wind up as a spot on the stack for 99% of the C compilers
out there (the only examples that readily come to mind where this
isn't true are 'small device / embedded' C compilers where a stack is
often an expensive luxury). As you pointed out, and because of the
fact that these two fragments of code are identical, the optimizer can
convert the first example into the second example. In an overly
pedantic technical sense, there is no requirement in the C language
specs that says that block local 'auto' variables must reside on the
stack. Searching ISO/IEC 9899:TC2 / C99 spec for 'stack' turns up
zero matches.
Thus one is stuck with a GC paradox: What's to stop the collector
from collecting the anonymous NSMutableData if there is no pointer to
it, either in the heap or the stack? Nothing. If the collector runs
between the point where we assign ptr and we're done with it, ptr will
no longer be pointing to a valid allocation.
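To make the hazard concrete, here is a toy model in plain C. It is an illustration only and says nothing about how libauto is actually implemented: toy_data, toy_data_new, and toy_collect are hypothetical names, and the "collector" is reduced to a single rule, namely that an object survives only if some root holds its exact address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for an object that owns a separately allocated buffer. */
typedef struct {
    char *bytes;   /* the "inner" buffer handed out to callers */
    size_t length;
} toy_data;

toy_data *toy_data_new(size_t length)
{
    toy_data *d = malloc(sizeof *d);
    if (d == NULL)
        return NULL;
    d->bytes = calloc(length, 1);
    if (d->bytes == NULL) {
        free(d);
        return NULL;
    }
    d->length = length;
    return d;
}

/* Toy "collection" pass over one object. The object survives only if a
 * root holds the object's own address; a root that holds the inner
 * buffer address does not count. A dead object is finalized: its
 * buffer is freed and cleared. Returns true if the object survived. */
bool toy_collect(toy_data *obj, void *const *roots, size_t nroots)
{
    assert(obj != NULL);
    for (size_t i = 0; i < nroots; i++) {
        if (roots[i] == (void *)obj)
            return true;
    }
    free(obj->bytes);
    obj->bytes = NULL;
    obj->length = 0;
    return false;
}
```

If the only root is the pointer returned from the buffer accessor, toy_collect() finalizes the owner and the caller is left holding a dangling pointer, which is the situation in both fragments above once the optimizer has dropped 'data'.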
Now, typical usage patterns are such that the collector will very,
very rarely spool up between the point in time where we get the
'mutableBytes' pointer and the point in time where we're finished with
it. The dominant usage pattern for allocations is one shot, short
lived, very much like the transient lifetime of the pointer to
[NSMutableData dataWithLength:]. Because of this, there are rarely any
problems in practice, but that's a far cry from 'impossible by design'.
What about the 'char *ptr' pointer, though? Well, if NSMutableData
happens to get the backing allocation from the collector, say from
NSAllocateCollectable(), then you'll likely manage to avoid the
collector reclaiming the allocation. However, an important caveat to
that is 'and the compiler allocated space on the stack into which it
stored the pointer'. As has already been shown, transformations by
the optimizer can render even that assumption moot. You're pretty
much screwed if NSMutableData is using a malloc() pointer that it
free()s when -(void)finalize is called, though.
To add to the list of problems, consider the case of calling a
standard C library function on the powerpc architecture that is passed
a GC pointer. The standard assumption is that any such pointer is
protected because 'everything on the stack is considered live'. The
problem is, you have no idea what happens to that pointer once a
function is called. The fact of the matter is that under the powerpc
ABI, the vast majority of function calls will have all their arguments
passed via registers. Consider the following:
BOOL matchByRegex(char *regex, char *bufferToScan, size_t bufferLength);
The first thing that matchByRegex does is call the regex library's
regex creation function with the regex argument. An opaque pointer that
contains the created regex is returned, and then the buffer is
scanned. Once a result is determined, the created regex is released
via the library's provided function, and the final result is returned.
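For reference, a function of that shape might look like the sketch below. The post does not name the regex library, so POSIX <regex.h> is an assumption made here, as are the const qualifiers and the use of bool in place of BOOL. The point to notice is that the pattern pointer is handed to library code that is free to copy it wherever it likes.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

bool matchByRegex(const char *regex, const char *bufferToScan,
                  size_t bufferLength)
{
    assert(regex != NULL && bufferToScan != NULL);

    /* Create the compiled regex from the pattern string. */
    regex_t compiled;
    if (regcomp(&compiled, regex, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    /* regexec() wants a NUL-terminated string, so scan a private copy
     * of exactly bufferLength bytes. */
    char *text = malloc(bufferLength + 1);
    if (text == NULL) {
        regfree(&compiled);
        return false;
    }
    memcpy(text, bufferToScan, bufferLength);
    text[bufferLength] = '\0';

    bool matched = (regexec(&compiled, text, 0, NULL, 0) == 0);

    /* Release the scan copy and the compiled regex. */
    free(text);
    regfree(&compiled);
    return matched;
}
```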
Then, we do the following:
BOOL matched = matchByRegex([[NSString stringWithFormat:@"(one|two|%s|%d)",
                             ptr, num] UTF8String], buffer, length);
This is called from a heavily multithreaded app that is making quite a
few allocations from the memory system while this is taking place.
Without GC, things run just fine. With GC turned on, the program
occasionally crashes.
The reason? Because the pointer to UTF8String is passed via a
register, it never reaches the stack in any meaningful way. Inside
the regex library there is a function that mallocs the struct that
contains all the nitty gritty details and copies the UTF8String
pointer to that struct. Thus, while the pointer got spilled to the
stack at some point, those frames have long since popped by the time
the heavy lifting gets under way of compiling the regex from the
UTF8String pointer, which now exists solely in a chunk of unscanned
heap memory. For extra fun, compiling with -g 'fixes' the problem
because it forces the argument to be saved in the parameter area of
the stack to aid in debugging (lets you see what the arguments were
even after the original value in the register has been squashed).
Depending on the stack to "keep a pointer live" is pretty dubious at
best. It's surprisingly easy to violate this in practice, ESPECIALLY
on the powerpc architecture where arguments are passed via registers
most of the time. Called functions/methods are under no obligation to
spill those arguments to the stack, and such an argument might be live
for only a few dozen instructions in a register before it's snuffed
out. The language itself makes zero guarantees that the pointer
returned by UTF8String will ever be stored on the stack (or for
that matter that the stack will be used for block local auto variables
at all), regardless of how it was declared. Unlike NSMutableData,
keeping the root NSString around doesn't extend its liveness because
it's "a null-terminated UTF8 representation of the receiver."
Using Leopard's GC system correctly requires quite a bit of discipline
and nearly constant vigilance of 'pointer visibility' on the
programmer's part. There is also no practical way to track pointer
dependencies under the current system as far as I know, such as
realizing that the pointer from mutableBytes being live implies that
the NSMutableData that returned it must still be live as well. The
only solution I've found is constantly asking "Ok, what if.." and then
defensively coding to prevent the possibility. This usually means you
must carefully and meticulously structure things in such a way that a
GC pointer (including its roots!) is guaranteed to be visible even in
the face of aggressive compiler optimizations. This is non-trivial. You
can even be forced into doing goofy things like creating an
NSMutableSet object at the start of a function and adding objects you
need to remain live along the way. Then some kind of 'dummy' call is
added at the end of the function so the holding object remains live
right up to the end and the optimizer can't optimize it away or
shorten its liveness.
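The 'dummy call' idea can be sketched in plain C with a volatile sink. This is a generic C technique offered as an illustration, not an API from the GC system: keep_alive, keep_alive_sink, toy_owner, and fill_and_measure are hypothetical names. A store to a volatile object cannot be elided, so the owner's value must still exist somewhere at that point in the function; whether a particular collector would look where the compiler happens to keep it is not something the C language promises.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The store to this volatile object is the 'dummy' use. */
static void *volatile keep_alive_sink;

void keep_alive(void *object)
{
    keep_alive_sink = object;
}

void *last_kept_alive(void)
{
    return keep_alive_sink;
}

/* Toy owner of a buffer, standing in for an object such as
 * NSMutableData. */
typedef struct {
    char *bytes;
    size_t length;
} toy_owner;

/* Works on the inner pointer, then touches the owner after the last
 * use of that pointer so the owner's value cannot be dropped early. */
size_t fill_and_measure(toy_owner *owner)
{
    assert(owner != NULL && owner->bytes != NULL && owner->length > 0);
    char *ptr = owner->bytes;
    memset(ptr, 'Z', owner->length - 1);
    ptr[owner->length - 1] = '\0';
    size_t n = strlen(ptr);
    keep_alive(owner); /* dummy use placed after the last use of ptr */
    return n;
}
```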
In the case of UTF8String, the only way that I can think of to
universally and deterministically prevent the allocation from being
snatched from under you is to basically force the collector to shut
down, get the UTF8String pointer, make a copy to a malloc() based
allocation, then let the collector start up again. Or, if you can see
that its liveness is just a few lines long, then just bracket it with
calls to stop and start the collector. Sometimes you can get lucky
and just tag a pointer with __strong, which seems to work pretty
consistently for char * pointers returned by UTF8String, but this
makes use of 'undocumented side effect' behavior. Regardless, by
using the GC system you sign yourself up to constantly anticipating
and guarding against loss of pointer visibility, with the worst
offenders being "non object" pointers like UTF8String and mutableBytes.
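The 'make a copy' step is ordinary C, sketched below under the assumption that the caller has already made sure the source string stays valid for the duration of the copy (for example by bracketing the call as described above). copy_c_string is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies a transient NUL-terminated string into a malloc() allocation
 * whose lifetime is entirely under the caller's control. Returns NULL
 * if the allocation fails; the caller releases the copy with free(). */
char *copy_c_string(const char *transient)
{
    assert(transient != NULL);
    size_t size = strlen(transient) + 1; /* include the terminator */
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, transient, size);
    return copy;
}
```

The copy can then be handed to plain C code without caring what the collector does, at the price of an explicit free() when the work is finished.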