Re: Use of Mac OS X 10.5 / Leopards Garbage Collection Considered Harmful
- Subject: Re: Use of Mac OS X 10.5 / Leopards Garbage Collection Considered Harmful
- From: Alastair Houghton <email@hidden>
- Date: Mon, 4 Feb 2008 13:11:59 +0000
On 4 Feb 2008, at 01:57, John Engelhart wrote:
> I've had several reservations about Leopard's GC system since I
> started working with it. There is very little documentation on
> Leopard's GC system, so the following has been pieced together by
> inference and observations of how the garbage collection system
> seems to work. My first concern was with the use of "compiler
> assisted write barriers". The current public documentation is
> extremely vague as to what a 'write barrier' is,
[snip]
> From what I can tell, the term 'write barrier' as it is used by the
> GC documentation has absolutely nothing to do with this traditional
> meaning of the term.
The meaning of "write barrier" in this context is the traditional one
in the world of garbage collection, which has been around a lot longer
than the other meaning. It's certainly traditional, though; see e.g.
<ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps>
or the excellent book "Garbage Collection: Algorithms for Automatic
Dynamic Memory Management" by Jones and Lins (Wiley 1997, ISBN
0-471-94148-4 <http://www.amazon.co.uk/dp/0471941484>).
The GC docs actually explain what the write barrier is used for here:
<http://developer.apple.com/documentation/Cocoa/Conceptual/GarbageCollection/Articles/gcArchitecture.html#//apple_ref/doc/uid/TP40002451-SW4>
> Anyone who's used garbage collection with C is probably familiar
> with the Boehm Garbage Collector. I believe that the Boehm GC
> library embodies what most people would expect of a garbage
> collection system: The programmer is freed from having to worry
> about memory allocations
[snip]
> It makes no particular demands of the programmer or compiler; in
> fact, it can be used as a drop-in replacement for malloc() and
> free(), requiring no changes.
> From what I've pieced together, Leopard's GC system is nothing like
> this. While the Boehm GC system detects liveness passively by
> scanning memory and looking for and tracing pointers, Leopard's GC
> system does no scanning and requires /active/ notification of
> changes to the heap. This, I believe, is what a 'write-barrier'
> actually is: it is a function call to the GC system so that it can
> update its internal state as to what memory is live. It relies, I
> suspect exclusively, on these function calls to track memory
> allocations.
The Boehm GC and the Leopard Cocoa GC have very different design
goals. In the case of Boehm's collector, it's a requirement that the
collector work without any assistance from the compiler; as a result,
it has to use "conservative" techniques, which may in general result
in leaks of arbitrary amounts of memory simply because of a stray
value that *looks like* a pointer to something. The lack of compiler
assistance means that it's almost impossible to write a collector that
will run in the background (the Boehm collector has to stop *all* the
other threads in your program every so often if you run it in the
background), and it's difficult to implement generational behaviour
without relying on platform-specific features such as access to dirty
bits from the system page table... Even in that case, use of dirty
bits is woefully inefficient compared to compiler co-operation, since
a single dirty bit means you must re-scan an entire page of memory.
The Boehm GC is very clever, certainly, but it has to cope with these
limitations (and more besides).
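The "looks like a pointer" problem can be made concrete with a toy conservative mark phase in C. Everything in it (the fixed-size `heap` array, `block_for_word()`, `mark_range()`) is hypothetical and is not how the Boehm collector is actually implemented. It only shows that, without type information, any word whose value falls inside the heap must be treated as a pointer, so a stray integer keeps a block alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy heap: a fixed array of equally sized blocks (hypothetical layout). */
#define HEAP_BLOCKS 8
#define BLOCK_WORDS 4

uintptr_t heap[HEAP_BLOCKS][BLOCK_WORDS];
bool marked[HEAP_BLOCKS];

/* Conservative test: the collector has no type information, so any word
   whose value lies inside the heap's address range is assumed to be a
   pointer into the block that contains that address. Returns -1 if not. */
int block_for_word(uintptr_t word)
{
    uintptr_t lo = (uintptr_t)heap;
    uintptr_t hi = lo + sizeof heap;
    if (word < lo || word >= hi)
        return -1;
    return (int)((word - lo) / sizeof heap[0]);
}

/* Mark every block reachable from a range of words, recursively scanning
   the contents of each newly marked block in the same conservative way. */
void mark_range(const uintptr_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        int b = block_for_word(words[i]);
        if (b >= 0 && !marked[b]) {
            marked[b] = true;
            mark_range(heap[b], BLOCK_WORDS);
        }
    }
}
```

For example, suppose one root genuinely points at block 2, which in turn points at block 3. Suppose another root is an unrelated integer that merely happens to equal the address of block 5. Scanning the roots marks all three blocks, so block 5 is retained even though nothing really refers to it.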
Cocoa GC, on the other hand, is able to co-operate with the compiler,
and that's what the write barriers are. You have mis-interpreted
their function; they exist to track inter-generational pointers, not
to enable some sort of behind-the-scenes reference counting as I think
you imply. They may also be used to help the collector to obtain a
consistent view of the mutator's objects in spite of running in the
background... I don't know whether the Leopard GC does that or not.
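Here is a rough idea of what an inter-generational write barrier does, as a toy C sketch. It assumes a two-generation heap and a remembered set. The names (`write_barrier`, `remembered_set`, the `object` layout) are hypothetical. Leopard's real barriers are the objc_assign* functions the compiler emits, and their internals are not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy object model: each object records its generation and has a
   single pointer field. */
typedef enum { YOUNG, OLD } generation;

typedef struct object {
    generation gen;
    struct object *field;
    bool remembered; /* already recorded in the remembered set? */
} object;

#define MAX_REMEMBERED 16
object *remembered_set[MAX_REMEMBERED];
size_t remembered_count = 0;

/* Hypothetical barrier: a compiler would emit a call like this in place
   of the plain store "obj->field = value". It performs the store, then
   records any old object that now points at a young one, so a minor
   (young-generation) collection can treat it as an extra root instead
   of rescanning the whole old generation. */
void write_barrier(object *obj, object *value)
{
    assert(obj != NULL);
    obj->field = value;
    if (obj->gen == OLD && value != NULL && value->gen == YOUNG
        && !obj->remembered) {
        assert(remembered_count < MAX_REMEMBERED);
        obj->remembered = true;
        remembered_set[remembered_count++] = obj;
    }
}
```

Storing a young object into another young object records nothing. Storing it into an old object adds the old object to the remembered set exactly once. A minor collection then needs to look only at the remembered set plus the usual roots. Page dirty bits can only approximate this, since they work at the granularity of a whole page.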
(Incidentally, there is also a read barrier, which is used to help
implement zeroing weak references; the compiler only generates that
for variables marked __weak.)
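Zeroing weak references can be sketched the same way. This toy C version assumes the collector keeps a registry of weak slots and reclaims an object in two steps: first it decides the object is garbage, then it zeroes the registered slots. The read barrier `weak_read()` is one way to keep the program from picking up a condemned object between those two steps. All names here are hypothetical, and this is not the Leopard implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy object: the collector clears "alive" once it decides the object
   is unreachable. */
typedef struct {
    bool alive;
} object;

#define MAX_WEAK 16
/* Registry of weak slots, so the collector can find and zero them. */
object **weak_slots[MAX_WEAK];
size_t weak_count = 0;

/* Hypothetical weak store: perform the store and register the slot. */
void weak_store(object **slot, object *value)
{
    assert(weak_count < MAX_WEAK);
    *slot = value;
    weak_slots[weak_count++] = slot;
}

/* Hypothetical read barrier: never hand out an object that the collector
   has already condemned, even if its slot has not been zeroed yet. */
object *weak_read(object **slot)
{
    object *value = *slot;
    return (value != NULL && value->alive) ? value : NULL;
}

/* Collector step 1: the object has been found to be unreachable. */
void collector_condemn(object *obj)
{
    obj->alive = false;
}

/* Collector step 2: zero every registered weak slot that refers to obj. */
void collector_zero_weak(object *obj)
{
    for (size_t i = 0; i < weak_count; i++)
        if (*weak_slots[i] == obj)
            *weak_slots[i] = NULL;
}
```

Between the two collector steps, the slot still holds the old address. Reads that go through the barrier nevertheless see NULL, and once the slots are zeroed, plain loads see NULL as well.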
I think, perhaps, that it would be worth your while reading through
the literature on garbage collection, as you might then understand the
various trade-offs involved better.
> In order for Leopard's GC system to function properly, the compiler
> must be aware of all pointers that have been allocated by the GC
> system so that it can wrap all uses of the pointer with the
> appropriate GC notification functions (objc_assign*).
Yep.
[snip]
> Realistically, to properly add __strong to a pointer, you need to
> know if that allocation came from the garbage collector. This
> information is essentially impossible to know a priori, so the only
> practical course of action is to defensively qualify all pointers as
> __strong.
No. Cocoa GC mostly deals with objects (which may include Core
Foundation objects). That's why the default assumption, which is that
object pointers are strong, is enough for most situations.
That only changes if you have pointers of non-object types that happen
to point to things that were allocated with the GC, *and only then* if
they are stored in locations that are not scanned by default. This is
an unusual situation, since few methods return things that are
allocated by GC and that are not objects. -UTF8String is probably the
most common example, but since you tend not to store the result of
that method, there would rarely, if ever, be a problem.
> The consequence of using a pointer that is not properly qualified as
> __strong is that the GC system may determine that the allocation is
> no longer live and reclaim it, even if there is still a valid
> pointer out there.
Only if there is no copy of the pointer in any of the locations that
are scanned by default (e.g. on the stack, in registers, or in global
variables).
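The point about default roots can be modelled with a toy mark phase in C, using hypothetical names throughout. Only pointers reachable from the root set are followed. A copy of a pointer held solely in memory the collector does not scan therefore does nothing to keep its target alive. Per the discussion here, one example of such memory is a non-__strong C-pointer ivar.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4

/* Toy heap node with a mark bit and one outgoing pointer. */
typedef struct node {
    bool marked;
    struct node *next;
} node;

node pool[POOL_SIZE];

/* Clear all marks, then mark every node reachable from the roots by
   following next pointers. Anything left unmarked would be reclaimed,
   regardless of copies of its address held in unscanned memory. */
void mark_from_roots(node **roots, size_t nroots)
{
    for (size_t i = 0; i < POOL_SIZE; i++)
        pool[i].marked = false;
    for (size_t i = 0; i < nroots; i++)
        for (node *n = roots[i]; n != NULL && !n->marked; n = n->next)
            n->marked = true;
}
```

For example, suppose node 0 is a root and points at node 1, while node 2 is referenced only from unscanned memory. A call to mark_from_roots() marks nodes 0 and 1 and leaves node 2 unmarked. Node 2 would then be reclaimed even though a pointer to it still exists. Keeping that pointer somewhere that is scanned, such as the stack, is enough to keep it alive.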
> It is also trivial to get wrong, and the only indication that
> there's a problem is an occasional random error or crash.
In most cases, because GC'd things are objects, it's trivial to get
*right*.
It's only in special cases, where you're using C pointer types to
point to GC'd memory, that you need worry about this kind of thing.
> I believe I have a succinct example that illustrates these issues:
[snip]
> I strongly suspect the pointer that UTF8String returns is a pointer
> to an allocation from the garbage collector. In fact, changing
> the 'title' ivar to include __strong 'solves' the problem.
Yes, that's your bug. It doesn't just 'solve' the problem; the lack
of __strong here *is* the problem, but only because this is an ivar
and not e.g. a function argument or a stack-based variable.
> But this points to a much bigger problem: anyone who has used
> UTF8String and not qualified it as __strong has a race condition
> just waiting to happen.
No, because stack variables and registers are included in the set of
GC roots.
> This is but one example. I don't think I need to point out that
> there are others. A lot of others. And most of them are
> non-obvious. A consequence of all of this is that you must not pass
> pointers that may have been allocated by the garbage collector to
> any C function in a library. For example,
> printf("String: %s\n", [@"Hello, world!" UTF8String]);
That code is fine. The reference is on the stack (or, before that, in
the register that holds the return value of -UTF8String). It will be
followed, so the memory won't be released until the printf() function
has finished with it.
> passes a GC-allocated pointer to a C library function, which almost
> assuredly does not have the proper write barrier logic in place to
> properly guard the pointer.
The write barrier is nothing to do with it. The write barrier is for
inter-generational pointers, and possibly also to help the collector
to scan in the background safely.
Kind regards,
Alastair.
--
http://alastairs-place.net
_______________________________________________
Cocoa-dev mailing list (email@hidden)