Re: Use of Mac OS X 10.5 / Leopards Garbage Collection Considered Harmful

  • Subject: Re: Use of Mac OS X 10.5 / Leopards Garbage Collection Considered Harmful
  • From: Greg Parker <email@hidden>
  • Date: Mon, 4 Feb 2008 13:13:30 -0800

John Engelhart wrote:
> My first concern was with the use of "compiler assisted write barriers". The current public documentation is extremely vague as to what a 'write barrier' is, and I'm sure that the majority of you, like me, assumed the term referred to an "atomic write barrier / fence" used to ensure that all CPUs past the write barrier would see the same data at a given location. See `man 3 barrier` for a description of the OSMemoryBarrier() function that performs this operation. This would make some sense for a GC system: it would ensure that the use of a pointer is visible to the collector no matter what thread or CPU is using the pointer. From what I can tell, the term 'write barrier' as it is used by the GC documentation has absolutely nothing to do with this traditional meaning of the term.

In the GC literature, a "write barrier" is simply the code used to write a pointer value. It is unrelated to memory barriers, though sometimes the write barrier code includes a memory barrier.


Most garbage collectors use a write barrier that does more than just a store instruction. Why? Performance. Without a write barrier (or, rarely, a read barrier), a garbage collector has no choice but to stop all threads and scan all memory. This is too slow for programs with large heaps or threads with responsiveness constraints (e.g. audio playback). With a write barrier, more sophisticated algorithms can be used to reduce the amount of scanning or limit thread-stopped time.
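
As a rough illustration of "a store that does a little extra work", here is a minimal C sketch of one common write barrier style, card marking. It is not Leopard's actual implementation; the toy heap, the card size, and the names gc_write_barrier() and dirty_card_count() are all hypothetical.

```c
#include <stddef.h>

#define HEAP_WORDS 1024                     /* toy heap size, in pointer slots */
#define CARD_WORDS 64                       /* slots covered by one card */
#define NUM_CARDS  (HEAP_WORDS / CARD_WORDS)

static void *heap[HEAP_WORDS];              /* toy heap of pointer slots */
static unsigned char dirty[NUM_CARDS];      /* one "modified" flag per card */

/* Hypothetical write barrier: do the store, then record which region of
   the heap changed so a collector could rescan only that region. */
static void gc_write_barrier(void **slot, void *value)
{
    *slot = value;                          /* the ordinary pointer store */
    size_t index = (size_t)(slot - heap);   /* slot number within the heap */
    dirty[index / CARD_WORDS] = 1;          /* remember the modified card */
}

/* Number of cards a collector would have to rescan. */
static size_t dirty_card_count(void)
{
    size_t n = 0;
    for (size_t i = 0; i < NUM_CARDS; i++)
        n += dirty[i];
    return n;
}
```

A collector built this way can skip every card whose flag is still clear, which is the kind of reduced scanning described above.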

The Boehm collector does not use a write barrier, in order to be compatible with arbitrary C compilers.


> Anyone who's used garbage collection with C is probably familiar with the Boehm Garbage Collector. I believe that the Boehm GC library embodies what most people would expect of a garbage collection system: the programmer is freed from having to worry about memory allocations and when or where pointers to allocated memory are. It's the collector's job to find those pointers, piece together what memory is still pointed to by active pointers, and reclaim the memory which has no live pointers referencing it. Very roughly, it does this by starting with a collection of root allocations. It scans these allocations looking for pointers, then follows those pointers and scans those blocks of memory. It builds a graph of references from these root objects, and when all the memory has been scanned, memory allocations that are not part of this 'liveness' graph can be reclaimed. It makes no particular demands of the programmer or compiler; in fact, it can be used as a drop-in replacement for malloc() and free(), requiring no changes.


> From what I've pieced together, Leopard's GC system is nothing like this. While the Boehm GC system detects liveness passively by scanning memory and looking for and tracing pointers, Leopard's GC system does no scanning and requires /active/ notification of changes to the heap. This, I believe, is what a 'write-barrier' actually is: it is a function call to the GC system so that it can update its internal state as to what memory is live. It relies, I suspect exclusively, on these function calls to track memory allocations.

Both Leopard's GC and the Boehm-Demers-Weiser GC are conservative scanning collectors.
Like the Boehm collector, Leopard's GC traces from a set of roots.
Unlike the Boehm collector, the root set is a more limited set of memory; for example, non-strong global variables are not part of the root set.
Unlike the Boehm collector, Leopard's GC uses a write barrier; both "scanning" and "active notification" are required for performance.
Unlike the Boehm collector, Leopard's GC does not scan memory allocated with malloc(), nor does it manage blocks allocated with malloc().
Unlike the Boehm collector, Leopard's GC never stops all threads at once, and stops each thread for only a short period of time (just enough to scan the thread registers and stack).
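
To make "conservative scanning from a set of roots" concrete, here is a toy C sketch of the marking phase. It is an illustration only, not code from either collector: the fixed-size block heap and the names block_for_word(), mark_word() and mark_from_roots() are hypothetical. "Conservative" here means that any word whose value falls inside the heap is treated as a possible pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_BLOCKS  8
#define BLOCK_WORDS 4

/* Toy heap: fixed-size blocks, each holding a few word-sized slots. */
static uintptr_t blocks[NUM_BLOCKS][BLOCK_WORDS];
static unsigned char marked[NUM_BLOCKS];

/* Conservative test: does this word look like an address inside the heap?
   Returns the block index, or -1 if it cannot be a heap pointer. */
static int block_for_word(uintptr_t word)
{
    uintptr_t start = (uintptr_t)&blocks[0][0];
    uintptr_t end   = start + sizeof blocks;
    if (word < start || word >= end)
        return -1;
    return (int)((word - start) / sizeof blocks[0]);
}

/* Mark the block a word may point into, then scan that block's slots. */
static void mark_word(uintptr_t word)
{
    int b = block_for_word(word);
    if (b < 0 || marked[b])
        return;                     /* not a heap pointer, or already seen */
    marked[b] = 1;
    for (size_t i = 0; i < BLOCK_WORDS; i++)
        mark_word(blocks[b][i]);
}

/* Trace from the root set; blocks left unmarked afterwards are garbage. */
static void mark_from_roots(const uintptr_t *roots, size_t count)
{
    for (size_t i = 0; i < NUM_BLOCKS; i++)
        marked[i] = 0;
    for (size_t i = 0; i < count; i++)
        mark_word(roots[i]);
}
```

Anything that is not in the root set and not reachable from it stays unmarked, which is why the choice of root set matters so much in the comparison above.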


The Leopard GC is designed and optimized for Objective-C; it can be used for ordinary C code, but it's not as easy to use for C code as the Boehm collector.


> A consequence of all of this is that you must not pass pointers that may have been allocated by the garbage collector to any C function in a library.
>
> printf("String: %s\n", [@"Hello, world!" UTF8String]);
>
> passes a GC-allocated pointer to a C library function, which almost assuredly does not have the proper write barrier logic in place to properly guard the pointer.

Not true. This example is perfectly safe, assuming printf() does not store that pointer for use after the printf() call returns.


Leopard's GC includes every local variable and function parameter in the root set. No write barriers are required for stack memory. Ordinary C code can use GC pointers on the stack and as parameters.
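
The distinction can be sketched in plain C. The two functions below are hypothetical stand-ins for library code: the first only ever holds the pointer in a parameter and a local, so the stack rule covers it; the second stores the pointer somewhere that outlives the call, which is the case that needs extra care.

```c
#include <stddef.h>

/* Covered by the stack rule: the pointer lives only in a parameter and a
   local variable, and nothing keeps it after the function returns. */
static size_t count_spaces(const char *text)
{
    size_t n = 0;
    for (const char *p = text; *p != '\0'; p++)
        if (*p == ' ')
            n++;
    return n;
}

/* Not covered by the stack rule: the pointer escapes into a global that
   outlives the call. Under GC, an ordinary global like this one would not
   keep the memory alive. */
static const char *last_text;

static void remember_text(const char *text)
{
    last_text = text;
}
```

A function like printf() behaves like count_spaces() here, as long as it does not keep the pointer after it returns.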


> /ANY/ pointer that holds a pointer to memory that MAY be allocated from the garbage collector must be marked __strong. The compiler attempts to 'automagically' add __strong to certain types of pointer references, specifically 'id' and derivatives of 'id', namely class pointers (NSString *).
>
> Realistically, to properly add __strong to a pointer, you need to know if that allocation came from the garbage collector. This information is essentially impossible to know a priori, so the only practical course of action is to defensively qualify all pointers as __strong.

Luckily, it's easier than you describe for most code.

If your code uses only NSObjects and NSArrays, you don't have to do anything.

If your code uses C pointers that you allocate and free yourself, you don't have to do anything.

If your code uses C pointers, and you don't know where they came from, but you only store those pointers on the stack, you don't have to do anything.

If your code stores Objective-C pointers or other pointers of unknown provenance into malloc blocks or global variables or Objective-C ivars, then you do need to do extra work. Usually "extra work" means "mark the variable as __strong".
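
For the global-variable case, "mark the variable as __strong" might look roughly like the sketch below. The variable and function names are hypothetical. The #ifndef block is only there so the snippet also builds as plain C with a compiler that does not know the qualifier; in a GC-enabled Objective-C build the real __strong would be used.

```c
#include <stddef.h>

/* __strong is the GC type qualifier discussed above. When building as
   plain C without GC support, define it away so the sketch compiles. */
#ifndef __strong
#define __strong
#endif

/* A global that may hold a pointer of unknown provenance. Marking it
   __strong asks a GC-enabled compiler to treat stores to it as strong
   references. */
static __strong const char *g_current_title;

static void set_current_title(const char *title)
{
    g_current_title = title;    /* the store that needs the qualifier */
}

static const char *current_title(void)
{
    return g_current_title;
}
```

Code that only passes the pointer around on the stack, without a store like the one in set_current_title(), needs no such annotation.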


-- Greg Parker email@hidden Runtime Wrangler


_______________________________________________

Cocoa-dev mailing list (email@hidden)


