Re: Garbage collector vs variable lifetime

Subject: Re: Garbage collector vs variable lifetime
From: John Engelhart <email@hidden>
Date: Sat, 7 Jun 2008 11:03:37 -0400


On Jun 6, 2008, at 7:27 PM, Quincey Morris wrote:

> On Jun 6, 2008, at 15:48, Bill Bumgarner wrote:
>> The garbage collector does not currently interpret inner pointers as something that can keep an encapsulating object alive. Thus, the behavior is undefined and that it changes between debug and release builds -- between optimization levels of the compiler -- is neither surprising nor considered to be a bug.
> Yes, it's certainly not the "inner" pointer's job to keep the object alive. I can't help feeling that it would make sense for the lifetime of the object pointer variable to be the same as its scope. (I have no idea if there's any language specification that deals with this.) If there's any compiler bug, it's the optimization that shortens the object pointer variable lifetime that's at fault.

In my personal experience, I've had so many problems when using Leopard's GC system that I consider it to be 'unusable'. This isn't meant as a slight against the GC team, as bolting GC on to a pre-existing and well-established C system is 'non-trivial', and that's an understatement. It's also not any single issue or quirk, but the fact that they unintentionally conspire to make the whole problem greater than the sum of its parts.

One recent programming project of mine is TransactionKit (http://transactionkit.sourceforge.net/), which is a completely lockless multi-reader multi-writer hash table. Complex lockless data structures and GC go hand in hand, as GC neatly and cleanly solves the sticky issue of when it is safe to reclaim a piece of memory used to hold the table's data: keeping track of which threads are referencing a piece of the structure isn't simple, and freeing an allocation while a thread is still using it has some obvious detrimental consequences.

Bill contacted me and asked about my problems. In the end, I gave it another go. However, the amount of work I sunk into making things work so that it could run for minutes at a time without crashing was substantial, on the order of a solid week and a half.

Problems ranged from the mundane: The core is written in C for portability. Pointers are tagged with a #define macro to selectively add __strong to them, so all the pointers in the C code were __strong. You'd figure that turning GC on in Xcode and recompiling is all that it takes, since your pointers are properly marked as __strong. Nope. Since .c files are C (surprise!), Xcode doesn't pass the compiler -fobjc-gc for these files. However, you can still add __strong to pointers in C files; it's just silently discarded by the compiler. The end result of all these interactions is that you can easily be lulled into thinking .c files are being properly compiled with GC support and that all your __strong tags are working. On top of that, single one-shot tests will almost certainly execute flawlessly, as a lot of individual tests aren't likely to trigger a collection. The chances that you'll 'get lucky' and have something on the stack 'shadow cover' an allocation are also pretty high. In my particular case the stress tests caused a virtually instantaneous crash and it only took a few minutes to figure out why. The answer is that you can't just add -fobjc-gc to .c files, as that only results in a warning; you need to tell the compiler to treat the .c file as an Objective-C .m file.
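
The silent-discard trap can be turned into a build-time check. What follows is a minimal sketch, assuming the compiler predefines __OBJC_GC__ only when GC code generation is really enabled for the translation unit. TK_STRONG, TK_GC_BARRIERS, TK_REQUIRE_GC, struct tk_node and tk_gc_barriers_enabled() are hypothetical names for illustration, not TransactionKit's actual macros.

```c
/* Hypothetical names throughout: nothing here is taken from a real project. */
#if defined(__OBJC_GC__)
  /* Assumption: __OBJC_GC__ is predefined only when GC code generation
   * is actually enabled for this translation unit. */
  #define TK_STRONG __strong
  #define TK_GC_BARRIERS 1
#else
  /* Without GC support the qualifier would be ignored, so expand to nothing. */
  #define TK_STRONG
  #define TK_GC_BARRIERS 0
#endif

/* Optional hard stop: build with -DTK_REQUIRE_GC to refuse a non-GC compile. */
#if defined(TK_REQUIRE_GC) && !TK_GC_BARRIERS
  #error "TK_STRONG is a no-op here: compile this file as Objective-C with -fobjc-gc"
#endif

struct tk_node {
    TK_STRONG void *key;     /* pointers the collector is meant to trace */
    TK_STRONG void *value;
};

/* Lets a test suite check at startup that barriers are really in place. */
int tk_gc_barriers_enabled(void)
{
    return TK_GC_BARRIERS;
}
```

A stress test can call tk_gc_barriers_enabled() first and bail out when it returns 0, instead of discovering the problem through a crash.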

Other problems included a bug in the GC atomic CAS pointer swap function. If you call objc_atomicCompareAndSwapGlobalBarrier() and the address where the CAS will take place is on the stack, libauto will fault as soon as that thread exits. The reason it faults is that it attempts to read something from the old thread's stack, but that stack has already been unmapped by the system and no longer exists.
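
One way to stay clear of this is to make sure the word being swapped never lives in a stack frame. The sketch below uses C11 atomics as a stand-in; it is not objc_atomicCompareAndSwapGlobalBarrier(), and shared_head and publish_if_empty() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* File-scope storage: its address stays valid after any thread exits,
 * unlike the address of a local variable in that thread's stack frame. */
static _Atomic(void *) shared_head = NULL;

/* Install new_value only if the slot is still empty.
 * Returns true when this caller won the race. */
bool publish_if_empty(void *new_value)
{
    /* The comparand may live on the stack; only the CAS target must not. */
    void *expected = NULL;
    return atomic_compare_exchange_strong(&shared_head, &expected, new_value);
}
```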

Another quirk is that the compiler doesn't warn you when you 'downcast' a __strong qualified pointer to a non-__strong qualified pointer. I spent countless hours hunting down a single missed __strong qualification that was causing random crashes. And the effects of missing a __strong qualification are completely unpredictable: sometimes the test suite would complete without crashing (taking a few minutes), other times it would crash right after starting up. The reason is that missing the __strong qualification turns it into a race condition bug: sometimes you get lucky and never run into the conditions which force it to become a problem, other times you run smack into it.

Considering that these tests were heavily multi-threaded and purpose-built to stress short-lived objects that were handed off between threads via a central hash table, you have to wonder what would happen under less demanding conditions. The window in which there is a problem is often very small, and the vast majority of code is not going to trigger a collection during these windows.

Once I got things to the point where they were largely stable, I then had another 'problem'. The performance numbers that the GC version was returning were dramatically worse than the non-GC version's: on the order of five to ten times slower. After a bit of digging the cause of the performance problem became obvious. To understand why requires an understanding of how the compiler implements write barriers for __strong pointers.

Code that is emitted by the compiler for a pointer assignment usually maps down to a single 'store'-like instruction, and for the purposes of this analogy it's a close enough approximation. Leopard's GC system uses write barriers to keep the collector informed of changes in the heap related to pointers, and thus liveness. This requires altering the compiler and replacing any 'simple single instruction store' with a function call to objc_assign_strongCast (or one of the other assign functions) that performs the actual store and updates the collector.

The performance killer isn't the function call per se, it's the fact that in order to be multi-threading safe, 'updating the collector' means acquiring the GC mutex spin lock. Therefore what was once a single instruction to store a pointer without GC turns into a function call that acquires and releases an OSSpinLock, and that doesn't even include any of the 'update' work. The compiler manages to avoid this if it can reason that the address that holds the pointer is guaranteed to be on the stack. Such assurances are hard to come by when crossing function boundaries, so the worst has to be assumed.
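
To make the cost concrete, here is a toy model of the two kinds of store. It is only a sketch under stated assumptions: it is not libauto's implementation, the C11 atomic_flag stands in for the OSSpinLock, and collector_lock, barrier_stores, plain_store() and barrier_store() are hypothetical names.

```c
#include <stdatomic.h>

/* One process-wide lock, standing in for the collector's spin lock. */
static atomic_flag collector_lock = ATOMIC_FLAG_INIT;
static unsigned long barrier_stores;   /* bookkeeping, protected by the lock */

/* Without GC: a pointer assignment is a single store. */
static inline void plain_store(void **slot, void *value)
{
    *slot = value;
}

/* With GC (toy model): the same assignment becomes a call that takes the
 * global lock, performs the store, updates the collector, and unlocks. */
void *barrier_store(void **slot, void *value)
{
    while (atomic_flag_test_and_set_explicit(&collector_lock,
                                             memory_order_acquire)) {
        /* spin: every other thread doing a barrier store waits here */
    }
    *slot = value;
    barrier_stores++;               /* stand-in for "update the collector" */
    atomic_flag_clear_explicit(&collector_lock, memory_order_release);
    return value;
}
```

Because there is a single collector_lock, two threads calling barrier_store() on unrelated slots still serialize against each other, which is the bottleneck described above.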

Anyone who is using multi-threading for performance reasons and using Leopard's GC to simplify the complicated allocation tracking issues that pop up in multi-threading should carefully consider how many write barrier pointer stores each thread is performing, because it can very quickly become the primary bottleneck limiting concurrency. Only a single thread in an app can be performing a __strong pointer store at any one time; all other threads will block.

It's been said that Leopard's GC system is 'mostly for Objective-C objects', with the implication that it's not really meant to be a replacement for malloc() and friends. At the end of all my Leopard GC work, however, it was really driven home just how little use I have for garbage collecting 'Objective-C Objects'. Virtually all of my memory management woes are caused by malloc(), even in 100% Cocoa application programming. I rarely, if ever, have allocation problems using the autorelease/retain/release methodology. There are some use cases where it doesn't work well (circular references), but they're usually not show stoppers. Using malloc() allocations correctly, on the other hand, is still a pain in the ass, just as it's always been. Even GC allocations from NSAllocateCollectable() are essentially useless because they require anything that touches the allocation to properly __strong qualify its usage, which in practice is pretty much impossible to achieve. On top of that, handing any GC pointer to a 'plain C' function or library is virtually guaranteed to cause you nothing but pain and heartache at some point.

>> In any case, you need to make sure that <data> stays alive throughout the entire lifespan of the bytes pointer.
> But the puzzling question is: how? Anything involving stack variables could potentially be optimized away. I could move the collectable object pointer into a global or instance variable, but that solution comes unglued as soon as I hit a case where the containing method is called recursively. I suppose a file static NSMutableSet temporarily holding collectable objects would work.

Even a translation unit static global might not save you. Since the compiler has at its disposal all possible uses of that variable, it can potentially eliminate it if it can reason that it's safe to do so, such as when it is only used in a single function.

Speaking from (painful, many hours of debugging) experience, using Leopard's GC system actually requires a substantial shift in how you program if you want things to work correctly in a deterministic manner. Once you know what to look for and start looking, you start to realize that there are some pretty serious holes in Leopard's GC system.

The fact of the matter is that under GC, void * is not necessarily the same as a different void *. Under C, any qualified (i.e. volatile, restrict, etc.) pointer of any type is the same as a similarly qualified void * pointer, and assignments to and from a like qualified void * pointer can be done without loss. Assignments between differently qualified pointers will at least raise a warning, and possibly an error depending on the specific use details. The logic behind this is self-evident. However, because __strong was implemented as a GCC __attribute__(()), it does not fall under the same rules. Therefore, it's just fine to assign a __strong void * pointer to a plain void * pointer. The consequences of dropping the __strong qualification can potentially be disastrous, though, because the compiler will no longer protect assignments/stores of the new pointer with a write barrier. Technically, type qualifiers / storage class specifiers like volatile, restrict, auto, and register (good ole' forgotten register, does anyone use you any more?) are often just 'hints' to the compiler which may be ignored by the compiler's implementation (though volatile was tightened up in C99, I think it's still 'implementation dependent'). __strong, on the other hand, is not a 'hint', and ignoring or dropping it can result in non-deterministic and incorrect program execution behavior. It only takes a single 10 hour long debugging session of tracking down a single, silently discarded __strong pointer downcast that causes impossible to replicate random crashes to question the wisdom of this default behavior.

Consider the following:

{
  char *ptr = NULL;
  {
    NSMutableData *data = [NSMutableData dataWithLength:4096];
    ptr = [data mutableBytes];
    /* force collection */
    ptr[0] = 'Z';
  }
  ptr[1] = 0;
}

and

{
  char *ptr = NULL;
  {
    ptr = [[NSMutableData dataWithLength:4096] mutableBytes];
    /* force collection */
    ptr[0] = 'Z';
  }
  ptr[1] = 0;
}

Technically, these two examples are identical. The first example's declaration of 'NSMutableData *data' is really a symbolic declaration for our benefit. In practice, these 'automatic storage declaration variables' wind up as a spot on the stack for 99% of the C compilers out there (the only examples that readily come to mind where this isn't true are 'small device / embedded' C compilers where a stack is often an expensive luxury). As you pointed out, and because these two fragments of code are identical, the optimizer can convert the first example into the second example. In an overly pedantic technical sense, there is no requirement in the C language specs that says that block local 'auto' variables must reside on the stack. Searching the ISO/IEC 9899:TC2 / C99 spec for 'stack' turns up zero matches.

Thus one is stuck with a GC paradox: What's to stop the collector from collecting the anonymous NSMutableData if there is no pointer to it, either in the heap or the stack? Nothing. If the collector runs between the point where we assign ptr and we're done with it, ptr will no longer be pointing to a valid allocation.
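
The paradox can be reduced to a toy model in plain C. This is only a sketch under stated assumptions: it is not how libauto works internally, it sets a flag instead of freeing so the demonstration stays safe, and struct toy_data, roots, is_rooted() and toy_collect() are all hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an object that owns a byte buffer, like NSMutableData. */
struct toy_data {
    size_t length;           /* header, so bytes is not at offset 0 */
    char   bytes[16];
    bool   collected;        /* set instead of freeing, to keep the demo safe */
};

#define MAX_ROOTS 4
static const void *roots[MAX_ROOTS];   /* stand-in for scanned stack slots */

/* A reference counts only if it holds the object's base address. */
static bool is_rooted(const struct toy_data *obj)
{
    for (size_t i = 0; i < MAX_ROOTS; i++)
        if (roots[i] == (const void *)obj)
            return true;
    return false;
}

/* "Collect" the object unless some root holds its base address. */
void toy_collect(struct toy_data *obj)
{
    if (!is_rooted(obj))
        obj->collected = true;
}
```

With roots[0] set to an object's bytes member, toy_collect() still marks the object as collected; with roots[0] set to the object itself, it survives. The inner pointer alone keeps nothing alive.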

Now, typical usage patterns are such that the collector will very, very rarely spool up between the point in time where we get the 'mutableBytes' pointer and the point in time where we're finished with it. The dominant usage pattern for allocations is one shot, short lived, very much like the transient lifetime of the pointer to [NSMutableData dataWithLength:]. Because of this, there are rarely any problems in practice, but that's a far cry from 'impossible by design'.

What about the 'char *ptr' pointer, though? Well, if NSMutableData happens to get the backing allocation from the collector, say from NSAllocateCollectable(), then you'll likely manage to avoid the collector reclaiming the allocation. However, an important caveat to that is 'and the compiler allocated space on the stack into which it stored the pointer'. As has already been shown, transformations by the optimizer can render even that assumption moot. You're pretty much screwed if NSMutableData is using a malloc() pointer that it free()s when -(void)finalize is called, though.

To add to the list of problems, consider the case, on the powerpc architecture, of calling a standard C library function that is passed a GC pointer. The standard assumption is that any such pointer is protected because 'everything on the stack is considered live'. The problem is, you have no idea what happens to that pointer once a function is called. The fact of the matter is that under the powerpc ABI, the vast majority of function calls will have all their arguments passed via registers. Consider the following:

BOOL matchByRegex(char *regex, char *bufferToScan, size_t bufferLength);

The first thing that matchByRegex does is call the regex library's regex creation function with the regex argument. An opaque pointer that contains the created regex is returned, and then the buffer is scanned. Once a result is determined, the created regex is released via the library's provided function, and the final result is returned.
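
For readers who want something concrete, the three steps could look roughly like the sketch below. It assumes POSIX <regex.h> as the regex library (the library actually used is not named here), so the "opaque pointer" becomes a caller-provided regex_t, and it uses bool and const char * in place of BOOL and char *. match_by_regex() is a hypothetical stand-in for matchByRegex.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

bool match_by_regex(const char *regex, const char *buffer_to_scan,
                    size_t buffer_length)
{
    regex_t compiled;                       /* the library's opaque handle */

    /* Step 1: hand the pattern pointer to the library. */
    if (regcomp(&compiled, regex, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    /* regexec() wants a NUL-terminated string, so copy the buffer. */
    char *text = malloc(buffer_length + 1);
    if (text == NULL) {
        regfree(&compiled);
        return false;
    }
    memcpy(text, buffer_to_scan, buffer_length);
    text[buffer_length] = '\0';

    /* Step 2: scan the buffer. */
    bool matched = (regexec(&compiled, text, 0, NULL, 0) == 0);

    /* Step 3: release the compiled regex through the library's own call. */
    free(text);
    regfree(&compiled);
    return matched;
}
```

The point to notice is step 1: once the pattern pointer has been handed over, the caller has no say in where the library keeps it.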

Then, we do the following:

BOOL matched = matchByRegex([[NSString stringWithFormat:@"(one|two|%s| %d)", ptr, num] UTF8String], buffer, length);

This is called from a heavily multithreaded app that is making quite a few allocations from the memory system while this is taking place. Without GC, things run just fine. With GC turned on, the program occasionally crashes.

The reason? Because the pointer to UTF8String is passed via a register, it never reaches the stack in any meaningful way. Inside the regex library there is a function that mallocs the struct that contains all the nitty gritty details and copies the UTF8String pointer to that struct. Thus, while the pointer got spilled to the stack at some point, those frames have long since popped by the time the heavy lifting gets under way of compiling the regex from the UTF8String pointer, which now exists solely in a chunk of unscanned heap memory. For extra fun, compiling with -g 'fixes' the problem because it forces the argument to be saved in the parameter area of the stack to aid in debugging (lets you see what the arguments were even after the original value in the register has been squashed).

Depending on the stack to "keep a pointer live" is pretty dubious at best. It's surprisingly easy to violate this in practice, ESPECIALLY on the powerpc architecture where arguments are passed via registers most of the time. Called functions/methods are under no obligation to spill those arguments to the stack, and such an argument might be live for only a few dozen instructions in a register before it's snuffed out. The language itself makes zero guarantees that the pointer returned by UTF8String will ever be stored on the stack (or for that matter that the stack will be used for block local auto variables at all), regardless of how it was declared. Unlike NSMutableData, keeping the root NSString around doesn't extend its liveness because it's "a null-terminated UTF8 representation of the receiver."

Using Leopard's GC system correctly requires quite a bit of discipline and nearly constant vigilance of 'pointer visibility' on the programmer's part. There is also no practical way to track pointer dependencies under the current system as far as I know, such as realizing that the pointer from mutableBytes being live implies that the NSMutableData that returned it must still be live as well. The only solution I've found is constantly asking "Ok, what if.." and then defensively coding to prevent the possibility. This usually means you must carefully and meticulously structure things in such a way that a GC pointer (including its roots!) is guaranteed to be visible even in the face of aggressive compiler optimizations. This is non-trivial. You can even be forced into doing goofy things like creating an NSMutableSet object at the start of a function and adding objects you need to remain live along the way. Then some kind of 'dummy' call is added at the end of the function so the holding object remains live right up to the end and the optimizer can't optimize it away or shorten its liveness.
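
The 'dummy' call can be as small as a store to a volatile object, which the compiler is not allowed to drop. This is a sketch in plain C, not an Apple-sanctioned mechanism: it only forces the pointer's value to exist at the point of the call, where that value sits (stack or register) is still the compiler's choice, and a sink shared between threads would need to be per-thread to avoid a data race. keep_alive_sink, keep_alive(), struct owner and first_byte_after_write() are hypothetical names.

```c
#include <stddef.h>

/* Volatile object: every write to it is an observable side effect that
 * the compiler may not remove. */
static const void *volatile keep_alive_sink;

/* "Dummy" use: call this after the last use of anything derived from p. */
void keep_alive(const void *p)
{
    keep_alive_sink = p;
}

/* Usage sketch: the owner must outlive the inner pointer taken from it. */
struct owner { char bytes[32]; };

char first_byte_after_write(struct owner *o)
{
    char *inner = o->bytes;   /* inner pointer, as with mutableBytes */
    inner[0] = 'Z';
    char result = inner[0];
    keep_alive(o);            /* the value of o is still needed here */
    return result;
}
```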

In the case of UTF8String, the only way that I can think of to universally and deterministically prevent the allocation from being snatched from under you is to basically force the collector to shut down, get the UTF8String pointer, make a copy to a malloc() based allocation, then let the collector start up again. Or, if you can see that its liveness is just a few lines long, then just bracket it with calls to stop and start the collector. Sometimes you can get lucky and just tag a pointer with __strong, which seems to work pretty consistently for char * pointers returned by UTF8String, but this makes use of 'undocumented side effect' behavior. Regardless, by using the GC system you sign yourself up for constantly anticipating and guarding against loss of pointer visibility, with the worst offenders being "non object" pointers like UTF8String and mutableBytes.
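
The stop, copy, restart sequence might be shaped like the sketch below. collector_disable() and collector_enable() are hypothetical stubs that only track nesting depth; in a real GC build they would be replaced by whatever actually pauses and resumes collection (NSGarbageCollector's -disable and -enable look like the candidates, but treat that as an assumption). c_string_getter, literal_getter() and copy_c_string_safely() are likewise made-up names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical hooks: stand-ins that only track nesting depth. */
static int collector_disabled_depth;
static void collector_disable(void) { collector_disabled_depth++; }
static void collector_enable(void)  { collector_disabled_depth--; }

/* Fetches a collector-owned C string, e.g. a wrapper around
 * [string UTF8String]; context carries whatever the wrapper needs. */
typedef const char *(*c_string_getter)(void *context);

/* Trivial getter for demonstration: treats context as the string itself. */
const char *literal_getter(void *context)
{
    return (const char *)context;
}

/* Returns a malloc() copy the caller must free(), or NULL on failure. */
char *copy_c_string_safely(c_string_getter get, void *context)
{
    char *copy = NULL;

    collector_disable();                 /* nothing should be reclaimed now */
    const char *transient = get(context);
    if (transient != NULL) {
        size_t size = strlen(transient) + 1;
        copy = malloc(size);
        if (copy != NULL)
            memcpy(copy, transient, size);
    }
    collector_enable();                  /* transient may vanish after this */

    return copy;
}
```

The copy is ordinary malloc() memory, so it can be handed to any plain C library and then released with free() once the library is done with it.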

Cocoa-dev mailing list (email@hidden)
