Re: Use of Mac OS X 10.5 / Leopards Garbage Collection Considered Harmful
- Subject: Re: Use of Mac OS X 10.5 / Leopards Garbage Collection Considered Harmful
- From: Greg Parker <email@hidden>
- Date: Mon, 4 Feb 2008 13:13:30 -0800
John Engelhart wrote:
> My first concern was with the use of "compiler assisted write
> barriers". The current public documentation is extremely vague as
> to what a 'write barrier' is, and I'm sure that the majority of you,
> like me, assumed the term referred to an "atomic write barrier /
> fence" used to ensure that all CPUs past the write barrier would
> see the same data at a given location. See `man 3 barrier` for a
> description of the OSMemoryBarrier() function that performs this
> operation. This would make some sense for a GC system; it would
> ensure that the use of a pointer is visible to the collector no
> matter what thread or CPU is using the pointer. From what I can
> tell, the term 'write barrier' as it is used by the GC documentation
> has absolutely nothing to do with this traditional meaning of the
> term.
In the GC literature, a "write barrier" is simply the code used to
write a pointer value. It is unrelated to memory barriers, though
sometimes the write barrier code includes a memory barrier.
Most garbage collectors use a write barrier that does more than just a
store instruction. Why? Performance. Without a write barrier (or,
rarely, a read barrier), a garbage collector has no choice but to stop
all threads and scan all memory. This is too slow for programs with
large heaps or threads with responsiveness constraints (e.g. audio
playback). With a write barrier, more sophisticated algorithms can be
used to reduce the amount of scanning or limit thread-stopped time.
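To make the idea concrete, here is a minimal sketch of one textbook write barrier, card marking, in plain C. It is an illustration under stated assumptions, not Leopard's implementation: the "heap" is a fixed array, and `gc_write_pointer`, `card_table`, and `scan_dirty_cards` are hypothetical names. The barrier does the store and then marks the small region ("card") containing the slot as dirty, so the collector can rescan only dirty cards instead of the whole heap. Note that it contains no memory fence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy heap: an array of pointer-sized slots split into cards. */
#define HEAP_SLOTS     1024
#define SLOTS_PER_CARD 64
#define NUM_CARDS      (HEAP_SLOTS / SLOTS_PER_CARD)

static void *heap[HEAP_SLOTS];
static unsigned char card_table[NUM_CARDS]; /* 1 = card was written since last scan */

/* The write barrier: every pointer store into the heap goes through this
   function instead of being a bare assignment. It performs the store, then
   records which card changed. Unlike OSMemoryBarrier(), it does nothing to
   order memory operations between CPUs (a multithreaded collector might
   add a fence here; this single-threaded sketch does not). */
static void gc_write_pointer(size_t slot, void *value)
{
    assert(slot < HEAP_SLOTS);
    heap[slot] = value;
    card_table[slot / SLOTS_PER_CARD] = 1;
}

/* Collector side: examine only the slots in dirty cards, then clean them.
   Counts non-NULL pointers seen (a real collector would mark their targets)
   and returns how many slots were examined. */
static size_t scan_dirty_cards(size_t *pointers_found)
{
    size_t examined = 0;
    for (size_t card = 0; card < NUM_CARDS; card++) {
        if (!card_table[card])
            continue; /* clean card: skipped without scanning */
        for (size_t s = card * SLOTS_PER_CARD;
             s < (card + 1) * SLOTS_PER_CARD; s++) {
            examined++;
            if (heap[s] != NULL)
                (*pointers_found)++;
        }
        card_table[card] = 0; /* clean until the next barrier-mediated store */
    }
    return examined;
}
```

After two stores that land in different cards, one scan examines 128 of the 1024 slots, and a second scan with no intervening stores examines none; without the barrier, the collector would have no way to skip the untouched 896 slots.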
The Boehm collector does not use a write barrier, in order to be
compatible with arbitrary C compilers.
> Anyone who's used garbage collection with C is probably familiar
> with the Boehm Garbage Collector. I believe that the Boehm GC
> library embodies what most people would expect of a garbage
> collection system: the programmer is freed from having to worry
> about memory allocations and about when or where pointers to allocated
> memory are. It's the collector's job to find those pointers, piece
> together what memory is still pointed to by active pointers, and
> reclaim the memory which has no live pointers referencing it. Very
> roughly, it does this by starting with a collection of root
> allocations. It scans these allocations looking for pointers, then
> follows those pointers and scans those blocks of memory.
> It builds a graph of references from these root objects, and when
> all the memory has been scanned, memory allocations that are not
> part of this 'liveness' graph can be reclaimed. It makes no
> particular demands of the programmer or compiler; in fact it can be
> used as a drop-in replacement for malloc() and free(), requiring no
> changes.
> From what I've pieced together, Leopard's GC system is nothing like
> this. While the Boehm GC system detects liveness passively by
> scanning memory and looking for and tracing pointers, Leopard's GC
> system does no scanning and requires /active/ notification of
> changes to the heap. This, I believe, is what a 'write barrier'
> actually is: it is a function call to the GC system so that it can
> update its internal state as to what memory is live. It relies, I
> suspect exclusively, on these function calls to track memory
> allocations.
Both Leopard's GC and the Boehm-Demers-Weiser GC are conservative
scanning collectors.
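"Conservative" means the collector cannot always tell a real pointer from an integer that merely looks like one, so any word whose value falls inside the heap is treated as a possible pointer and the block it points into is kept alive. A minimal sketch, assuming a single contiguous heap range; `may_be_heap_pointer` and `conservative_scan` are hypothetical names, not part of either collector's API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if 'word' could be a pointer into [heap_lo, heap_hi).
   A conservative collector cannot know whether it really is a pointer,
   so it must assume it is. */
static int may_be_heap_pointer(uintptr_t word, uintptr_t heap_lo, uintptr_t heap_hi)
{
    return word >= heap_lo && word < heap_hi;
}

/* Scan a region of memory (for example a thread's stack) word by word and
   count the words that must be treated as heap pointers. A real collector
   would mark the block each such word points into. */
static size_t conservative_scan(const uintptr_t *words, size_t nwords,
                                uintptr_t heap_lo, uintptr_t heap_hi)
{
    assert(heap_lo <= heap_hi);
    size_t candidates = 0;
    for (size_t i = 0; i < nwords; i++)
        if (may_be_heap_pointer(words[i], heap_lo, heap_hi))
            candidates++;
    return candidates;
}
```

A small integer such as 42 is rejected only because its value lies outside the heap range; an integer that happened to fall inside the range would be kept as a "pointer", which is the price of being conservative.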
Like the Boehm collector, Leopard's GC traces from a set of roots.
Unlike the Boehm collector, the root set is a more limited set of
memory; for example, non-strong global variables are not part of the
root set.
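Tracing from a set of roots can be sketched as a generic mark phase over a small object graph. This is a textbook mark-and-sweep outline with hypothetical types and functions (`struct obj`, `mark`, `collect`), not the actual code of either collector. It shows why the size of the root set matters: an object referenced only from somewhere that is not a root (for example, a non-strong global under Leopard's GC) is never marked and would be reclaimed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 4

/* Hypothetical heap object: a mark bit plus outgoing references. */
struct obj {
    bool marked;
    size_t nrefs;
    struct obj *refs[MAX_REFS];
};

/* Mark phase: everything reachable from 'o' is live. Already-marked
   objects stop the recursion, so cycles are handled. Recursive for
   brevity; real collectors typically use an explicit mark stack. */
static void mark(struct obj *o)
{
    if (o == NULL || o->marked)
        return;
    o->marked = true;
    assert(o->nrefs <= MAX_REFS);
    for (size_t i = 0; i < o->nrefs; i++)
        mark(o->refs[i]);
}

/* Trace from every root, then count objects left unmarked: these are what
   a sweep phase would reclaim. Mark bits are cleared for the next cycle. */
static size_t collect(struct obj **roots, size_t nroots,
                      struct obj *all, size_t nall)
{
    for (size_t i = 0; i < nroots; i++)
        mark(roots[i]);
    size_t garbage = 0;
    for (size_t i = 0; i < nall; i++) {
        if (!all[i].marked)
            garbage++;
        all[i].marked = false;
    }
    return garbage;
}
```

With four objects where only the first is a root, a root-reachable cycle (objects 0 and 1) survives, while objects 2 and 3, linked only to each other from outside the root set, count as garbage.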
Unlike the Boehm collector, Leopard's GC uses a write barrier; both
"scanning" and "active notification" are required for performance.
Unlike the Boehm collector, Leopard's GC does not scan memory
allocated with malloc(), nor does it manage blocks allocated with
malloc().
Unlike the Boehm collector, Leopard's GC never stops all threads at
once, and stops each thread for only a short period of time (just
enough to scan the thread registers and stack).
The Leopard GC is designed and optimized for Objective-C; it can be
used for ordinary C code, but it's not as easy to use for C code as
the Boehm collector.
> A consequence of all of this is that you must not pass pointers that
> may have been allocated by the garbage collector to any C function
> in a library.
>     printf("String: %s\n", [@"Hello, world!" UTF8String]);
> passes a GC-allocated pointer to a C library function, which almost
> assuredly does not have the proper write barrier logic in place to
> properly guard the pointer.
Not true. This example is perfectly safe, assuming printf() does not
store that pointer for use after the printf() call returns.
Leopard's GC includes every local variable and function parameter in
the root set. No write barriers are required for stack memory.
Ordinary C code can use GC pointers on the stack and as parameters.
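The distinction here is between using a pointer during a call and storing it somewhere that outlives the call. In the plain C sketch below (hypothetical function names; no collector is involved), `print_now` only reads its argument while the caller's stack still holds it, just as printf() does, so a collector that scans stacks sees the object as live for the whole call. `remember_for_later` copies the pointer into a global; that is the case where, under Leopard's GC, the global would need to be declared __strong.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Safe pattern: the pointer is used only while this call is running.
   The caller's stack and this parameter (both GC roots) keep the object
   alive meanwhile. Returns the string length for convenience. */
static size_t print_now(const char *s)
{
    assert(s != NULL);
    printf("String: %s\n", s);
    return strlen(s);
}

/* Risky pattern under a GC whose root set excludes non-strong globals:
   after this returns, 'saved' may be the only reference to the object.
   Under Leopard's GC the global would need the __strong qualifier
   (not shown, since this sketch is plain C with no collector). */
static const char *saved;

static void remember_for_later(const char *s)
{
    saved = s;
}
```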
> /ANY/ variable that holds a pointer to memory that MAY be allocated
> from the garbage collector must be marked __strong. The compiler
> attempts to 'automagically' add __strong to certain types of pointer
> references, specifically 'id' and derivatives of 'id', namely class
> pointers (NSString *).
> Realistically, to properly add __strong to a pointer, you need to
> know if that allocation came from the garbage collector. This
> information is essentially impossible to know a priori, so the only
> practical course of action is to defensively qualify all pointers as
> __strong.
Luckily, it's easier than you describe for most code.
If your code uses only NSObjects and NSArrays, you don't have to do
anything.
If your code uses C pointers that you allocate and free yourself, you
don't have to do anything.
If your code uses C pointers, and you don't know where they came from,
but you only store those pointers on the stack, you don't have to do
anything.
If your code stores Objective-C pointers or other pointers of unknown
provenance into malloc blocks or global variables or Objective-C
ivars, then you do need to do extra work. Usually "extra work" means
"mark the variable as __strong".
--
Greg Parker email@hidden Runtime Wrangler
_______________________________________________
Cocoa-dev mailing list (email@hidden)