Use of Mac OS X 10.5 / Leopards Garbage Collection Considered Harmful
- Subject: Use of Mac OS X 10.5 / Leopards Garbage Collection Considered Harmful
- From: John Engelhart <email@hidden>
- Date: Sun, 3 Feb 2008 20:57:29 -0500
This is bound to be an inflammatory subject. That is not my intent,
and I mean no disrespect to the programmers who worked on the GC
system. I'm quite sure that adding GC to Cocoa is a non-trivial, near
impossible task, filled with trade-offs between "really bad" and "even
worse." Also understand that a bit of 'authoritative' documentation
or instruction can outright negate some of the points I make below as
I have had only the publicly available documentation and my
(relatively brief 2-3 months) experiences with the 10.5 GC system to
form these opinions....
I've had several reservations about Leopard's GC system since I
started working with it. There is very little documentation on
Leopard's GC system, so the following has been pieced together by
inference and observations of how the garbage collection system seems
to work. My first concern was with the use of "compiler assisted
write barriers". The current public documentation is extremely vague
as to what a 'write barrier' is, and I'm sure that the majority of
you, like me, assumed the term referred to an "atomic write barrier /
fence" used to ensure that all CPUs past the write barrier would see
the same data at a given location. See `man 3 barrier` for a
description of the OSMemoryBarrier() function that performs this
operation. This would make some sense for a GC system: it would ensure
that the use of a pointer is visible to the collector no matter what
thread or CPU is using the pointer. From what I can tell, the term
'write barrier' as it is used by the GC documentation has absolutely
nothing to do with this traditional meaning of the term.
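For reference, the traditional meaning can be sketched in portable C. This is only an illustration: it uses the C11 atomic_thread_fence() call rather than OSMemoryBarrier(), and the names publish(), try_consume(), payload and ready are made up for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: a plain value plus a flag announcing it. */
static int payload;
static atomic_bool ready;

/* Writer side: store the data, then fence, then raise the flag.
 * The release fence keeps the payload store from being reordered
 * after the flag store. */
static void publish(int value) {
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader side: check the flag, then fence, then read the data.
 * The acquire fence pairs with the release fence above, so a reader
 * that sees the flag also sees the payload written before it. */
static bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

A writer thread would call publish(42); any reader for which try_consume() returns true is then guaranteed to read 42 as well. Nothing here tells a collector anything about pointers, which is the point of the distinction.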
Anyone who's used garbage collection with C is probably familiar with
the Boehm Garbage Collector. I believe that the Boehm GC library
embodies what most people would expect of a garbage collection
system: the programmer is freed from having to worry about memory
allocations and about when or where pointers to allocated memory exist. It's
the collector's job to find those pointers, piece together what
memory is still pointed to by active pointers, and reclaim the memory
which has no live pointers referencing it. Very roughly, it does this
by starting with a collection of root allocations. It scans these
allocations looking for pointers, then follows those pointers
and scans the blocks of memory they point to. It builds a graph of references
from these root objects, and when all the memory has been scanned,
memory allocations that are not part of this 'liveness' graph can be
reclaimed. It makes no particular demands of the programmer or
compiler; in fact, it can be used as a drop-in replacement for malloc()
and free(), requiring no changes.
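Very roughly, the tracing approach described above looks like the following toy sketch in C. It is not the Boehm collector's code: the real library scans memory conservatively for anything that looks like a pointer, while this example assumes each block lists its outgoing pointers explicitly. ToyBlock, toy_mark() and toy_collect() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_REFS 4

/* A toy heap block: up to TOY_MAX_REFS outgoing pointers and a mark bit. */
typedef struct ToyBlock {
    struct ToyBlock *refs[TOY_MAX_REFS];
    bool marked;
} ToyBlock;

/* Mark a block and, depth-first, everything reachable from it. */
static void toy_mark(ToyBlock *block) {
    if (block == NULL || block->marked)
        return;
    block->marked = true;
    for (size_t i = 0; i < TOY_MAX_REFS; i++)
        toy_mark(block->refs[i]);
}

/* Clear all marks, trace from the roots, and return how many heap
 * blocks ended up unreachable (the ones a collector could reclaim). */
static size_t toy_collect(ToyBlock *heap[], size_t heap_count,
                          ToyBlock *roots[], size_t root_count) {
    assert(heap != NULL || heap_count == 0);
    size_t garbage = 0;
    for (size_t i = 0; i < heap_count; i++)
        heap[i]->marked = false;
    for (size_t i = 0; i < root_count; i++)
        toy_mark(roots[i]);
    for (size_t i = 0; i < heap_count; i++)
        if (!heap[i]->marked)
            garbage++;
    return garbage;
}
```

Note that the program being collected never calls into the collector when it stores a pointer; liveness is worked out entirely by the scan.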
From what I've pieced together, Leopard's GC system is nothing like
this. While the Boehm GC system detects liveness passively by
scanning memory and looking for and tracing pointers, Leopard's GC
system does no scanning and requires /active/ notification of changes
to the heap. This, I believe, is what a 'write-barrier' actually is:
it is a function call to the GC system so that it can update its
internal state as to what memory is live. It relies, I suspect
exclusively, on these function calls to track memory allocations.
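If that reading is right, a write barrier would amount to something like the following C sketch. To be clear, this is a guess at the idea and not Apple's implementation: toy_assign_strong(), toy_is_live() and the slot table are invented for illustration and are not the real runtime functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SLOTS 16

/* Pointer slots the toy collector has been told about via the barrier. */
static void **toy_known_slots[TOY_MAX_SLOTS];
static size_t toy_known_count = 0;

/* Hypothetical write barrier: perform the store AND tell the collector
 * which slot now holds a pointer it must follow. */
static void toy_assign_strong(void **slot, void *value) {
    bool seen = false;
    for (size_t i = 0; i < toy_known_count; i++)
        if (toy_known_slots[i] == slot)
            seen = true;
    if (!seen) {
        assert(toy_known_count < TOY_MAX_SLOTS);
        toy_known_slots[toy_known_count++] = slot;
    }
    *slot = value;
}

/* The toy collector treats a block as live only if some slot that
 * went through the barrier currently points at it. */
static bool toy_is_live(const void *block) {
    for (size_t i = 0; i < toy_known_count; i++)
        if (*toy_known_slots[i] == block)
            return true;
    return false;
}
```

Under this model, a pointer stored with a plain assignment (slot = value;) never reaches the table, so toy_is_live() reports its target as dead even though the program still holds a valid pointer to it.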
If this is indeed the case, it's my opinion that the 10.5 GC system is
fundamentally and fatally flawed. In fact, its use should be actively
discouraged. I'll now outline the reasoning behind this, including an
example that highlights the magnitude of the problem.
This would explain the need for 'dual mode' frameworks, and that an
application that uses GC must be linked to frameworks that are all GC
capable. This is because a non-GC framework would not actively inform
the GC system of its use of pointers, leading to random crashes and
whatnot as the GC system reclaimed memory that was actively in use.
In order for Leopard's GC system to function properly, the compiler
must be aware of all pointers that have been allocated by the GC
system so that it can wrap all uses of the pointer with the
appropriate GC notification functions (objc_assign*). Note that this
is subtly different from the definitions and examples used in 'Garbage
Collection Programming Guide'. From 'Garbage Collection Programming
Guide', 'Language Support':
__strong
Specifies a reference that is visible to (followed by) the garbage
collector (see “How the Garbage Collector Works”).
__strong modifies an instance variable or struct field declaration to
inform the compiler to unconditionally issue a write-barrier to write
to memory. __strong is implicitly part of any declaration of an
Objective-C object reference type. You must use it explicitly if you
need to use Core Foundation types, void *, or other non-object
references (__strong modifies pointer assignments, not scalar
assignments).
----
This is a deceptive description. /ANY/ pointer that holds a pointer
to memory that MAY be allocated from the garbage collector must be
marked __strong. The compiler attempts to 'automagically' add
__strong to certain types of pointer references, specifically 'id' and
derivatives of 'id', namely class pointers (NSString *).
Realistically, to properly add __strong to a pointer, you need to know
if that allocation came from the garbage collector. This information
is essentially impossible to know a priori, so the only practical
course of action is to defensively qualify all pointers as __strong.
The consequence of using a pointer that is not properly qualified as
__strong is that the GC system may determine that the allocation is no
longer live and reclaim it, even if there is still a valid pointer out
there. Therefore, all pointer references which have the possibility
of referencing an allocation from the garbage collection system must
treat that pointer as __strong. If any piece of code, at any level,
at any point in time fails to satisfy this condition, you are in for a
world of hurt. The fact of the matter is that, for all practical
purposes, it is impossible to guarantee this. It is also trivial to
get wrong, and the only indication that there's a problem is an
occasional random error or crash. Most of the time things will work,
but every once in a while... and these 'bugs' are virtually impossible
to track down. (In fact, this message is the result of having to
track down Yet Another GC Problem where something, somewhere, did
something wrong... maybe).
I believe I have a succinct example that illustrates these issues:
----
#import <Foundation/Foundation.h>

@interface GCTest : NSObject {
    const char *title;  // note: not qualified with __strong
}
- (void)setTitle:(const char *)newTitle;
- (const char *)title;
@end

@implementation GCTest
- (void)setTitle:(const char *)newTitle
{
    printf("Setting title. Old title: %p, new title %p = '%s'\n",
           title, newTitle, newTitle);
    title = newTitle;
}
- (const char *)title
{
    return title;
}
@end

int main(int argc, char *argv[]) {
    GCTest *gcConstTitle = NULL, *gcUTF8Title = NULL;
    gcConstTitle = [[GCTest alloc] init];
    gcUTF8Title = [[GCTest alloc] init];
    // String constant: lives in the binary, not in GC memory.
    [gcConstTitle setTitle:"Hello, world!"];
    // Buffer returned by UTF8String: suspected to be a GC allocation.
    [gcUTF8Title setTitle:[[NSString stringWithUTF8String:"Hello, world\xC2\xA1"] UTF8String]];
    // Force a full collection before reading the titles back.
    [[NSGarbageCollector defaultCollector] collectExhaustively];
    NSLog(@"GC test");
    printf("gcConstTitle title: %p = '%s'\n", [gcConstTitle title],
           [gcConstTitle title]);
    printf("gcUTF8Title title: %p = '%s'\n", [gcUTF8Title title],
           [gcUTF8Title title]);
    return(0);
}
----
[johne@LAPTOP_10_5] GC% gcc -framework Foundation -fobjc-gc-only gc.m -o gc
[johne@LAPTOP_10_5] GC% ./gc
Setting title. Old title: 0x0, new title 0x1ed4 = 'Hello, world!'
Setting title. Old title: 0x0, new title 0x1011860 = 'Hello, world¡'
2008-02-03 19:07:58.911 gc[6191:807] GC test
gcConstTitle title: 0x1ed4 = 'Hello, world!'
gcUTF8Title title: 0x1011860 = '??0?" '
[johne@LAPTOP_10_5] GC%
The problem is with the pointer returned by UTF8String. From
NSString.h:
- (const char *)UTF8String; // Convenience to return null-terminated UTF8 representation
I strongly suspect the pointer that UTF8String returns is a pointer to
an allocation from the garbage collector. In fact, changing the
'title' ivar to include __strong 'solves' the problem.
And herein lies the reason why I believe Leopard's GC system is
fundamentally and fatally flawed, and should in fact not be used at
all. There are several possible 'solutions' to this, but you'd better
get it right or you're going to be stuck with race conditions of the
most insidious nature imaginable. Adding fuel to the fire, it's not
clear what the 'right' solution is, or if there even is one.
One might argue that, per the __strong documentation, the ivar
requires the __strong type qualifier. This is, at best, non-obvious,
and considering that the documentation makes references to 'objects'
almost exclusively, one can also argue that this pointer does not
qualify. But this points to a much bigger problem: anyone who has
used UTF8String and not qualified it as __strong has a race condition
just waiting to happen. This is also not a problem that can be fixed
with a patch to Foundation in the next Mac OS X version: every program
that has not qualified its use of UTF8String with __strong must be
recompiled and re-released as there is nothing a shared library fix
can do about this. Add to this the fact that the published
documentation is essentially silent on the topic and offers no
guidance. In fact, it's possible that adding __strong to the 'title'
ivar is just an observable side effect of something else that seems to
fix the problem. I'm not sure what you'd do in that case because at
that point just calling methods that return a pointer that you need
becomes an exercise in luck and race conditions.
This is but one example. I don't think I need to point out that there
are others. A lot of others. And most of them are non-obvious. A
consequence of all of this is that you must not pass pointers that may
have been allocated by the garbage collector to any C function in a
library. For example,
printf("String: %s\n", [@"Hello, world!" UTF8String]);
passes a GC allocated pointer to a C library function, which almost
assuredly does not have the proper write barrier logic in place to
properly guard the pointer. This example is innocent enough, and
likely to work due to its short-lived nature, but it's easy to think
of examples where the pointer passed to a C function, say an SQLite3
call, can cause no end of problems if that pointer happens to be
reclaimed in the middle of the function call.
This is the basis for my opinion that the 10.5 GC should not be used.
In order to properly use the GC system, one must guarantee that all
uses of GC allocated pointers have compiler assisted write barrier
logic. This is beyond non-trivial in practice as the passing of
pointers is part of most function calls. Those functions call other
functions, and at some point that pointer is likely to pass through a
C library function.
Since Leopard's GC system places the burden of keeping the state of the
GC system up to date onto the compiler, and in turn onto every line of
code that uses a pointer, this increases the possible locations for GC
bugs to every single pointer-using line of code. There's a
considerable amount of code that's been added to GCC to facilitate all
of this, and bugs and code being what they are, there are bound to be
bugs in there. Code compiled with those bugs is frozen; the only way
to fix it is to recompile. This means that anyone, /anyone/, who
created GC enabled code needs to recompile their code in order to
receive the bug fix. This is an unalterable consequence of the
decision to move the GC logic into the compiler.