Re: Use of Mac OS X 10.5 / Leopards Garbage Collection Considered Harmful


  • Subject: Re: Use of Mac OS X 10.5 / Leopards Garbage Collection Considered Harmful
  • From: Alastair Houghton <email@hidden>
  • Date: Mon, 4 Feb 2008 13:11:59 +0000

On 4 Feb 2008, at 01:57, John Engelhart wrote:

I've had several reservations about Leopard's GC system since I started working with it. There is very little documentation on Leopard's GC system, so the following has been pieced together by inference and observations of how the garbage collection system seems to work. My first concern was with the use of "compiler assisted write barriers". The current public documentation is extremely vague as to what a 'write barrier' is,

[snip]

From what I can tell, the term 'write barrier' as it is used by the GC documentation has absolutely nothing to do with this traditional meaning of the term.

The meaning of "write barrier" in this context is the traditional one in the world of garbage collection, which has been around a lot longer than the other meaning. It's certainly traditional; e.g. see


  <ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps>

or the excellent book "Garbage Collection: Algorithms for Automatic Dynamic Memory Management" by Jones and Lins (Wiley 1997, ISBN 0-471-94148-4 <http://www.amazon.co.uk/dp/0471941484>).

The GC docs actually explain what the write barrier is used for here:

<http://developer.apple.com/documentation/Cocoa/Conceptual/GarbageCollection/Articles/gcArchitecture.html#//apple_ref/doc/uid/TP40002451-SW4>

Anyone who's used garbage collection with C is probably familiar with the Boehm Garbage Collector. I believe that the Boehm GC library embodies what most people would expect of a garbage collection system: The programmer is freed from having to worry about memory allocations

[snip]

It makes no particular demands of the programmer or compiler, in fact it can be used as a drop in replacement for malloc() and free(), requiring no changes.

From what I've pieced together, Leopard's GC system is nothing like this. While the Boehm GC system detects liveness passively by scanning memory and looking for and tracing pointers, Leopard's GC system does no scanning and requires /active/ notification of changes to the heap. This, I believe, is what a 'write-barrier' actually is: it is a function call to the GC system so that it can update its internal state as to what memory is live. It relies, I suspect exclusively, on these function calls to track memory allocations.

The Boehm GC and the Leopard Cocoa GC have very different design goals. In the case of Boehm's collector, it's a requirement that the collector work without any assistance from the compiler; as a result, it has to use "conservative" techniques, which may in general result in leaks of arbitrary amounts of memory simply because of a stray value that *looks like* a pointer to something. The lack of compiler assistance means that it's almost impossible to write a collector that will run in the background (the Boehm collector has to stop *all* the other threads in your program every so often if you run it in the background), and it's difficult to implement generational behaviour without relying on platform-specific features such as access to dirty bits from the system page table... Even in that case, use of dirty bits is woefully inefficient compared to compiler co-operation, since a single dirty bit means you must re-scan an entire page of memory. The Boehm GC is very clever, certainly, but it has to cope with these limitations (and more besides).
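To make the "stray value that *looks like* a pointer" problem concrete, here is a minimal sketch in C of the test a conservative collector has to apply. It is not the Boehm collector's code; `toy_heap`, `looks_like_heap_pointer` and `count_conservative_refs` are hypothetical names invented for this illustration, and a real collector also checks alignment and block boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy heap: one contiguous arena owned by the "collector". */
static unsigned char toy_heap[1024];

/* Without compiler help the collector cannot tell an integer from a
   pointer, so any word whose value lands inside the heap has to be
   treated as a reference. */
static int looks_like_heap_pointer(uintptr_t word)
{
    uintptr_t lo = (uintptr_t)toy_heap;
    uintptr_t hi = lo + sizeof toy_heap;
    return word >= lo && word < hi;
}

/* Scan a range of words and count those that would keep heap memory
   alive, whether or not they were ever intended as pointers. */
static size_t count_conservative_refs(const uintptr_t *words, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (looks_like_heap_pointer(words[i]))
            hits++;
    }
    return hits;
}
```

An integer that happens to equal an address inside `toy_heap` is counted exactly like a genuine pointer, which is how a conservative collector can end up retaining memory that nothing really references.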


Cocoa GC, on the other hand, is able to co-operate with the compiler, and that's what the write barriers are. You have misinterpreted their function; they exist to track inter-generational pointers, not to enable some sort of behind-the-scenes reference counting as I think you imply. They may also be used to help the collector to obtain a consistent view of the mutator's objects in spite of running in the background... I don't know whether the Leopard GC does that or not.
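For readers who haven't met the term, a textbook generational write barrier looks roughly like the C sketch below. This is only an illustration of the general technique; it is not how Leopard's collector or the objc_assign* functions are implemented, and `object`, `barrier_store` and `remembered` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum generation { YOUNG, OLD };

typedef struct object {
    enum generation gen;
    struct object *field;        /* a single pointer field, for brevity */
} object;

/* Remembered set: old objects known to point at young objects. */
#define REMEMBERED_MAX 64
static object *remembered[REMEMBERED_MAX];
static size_t remembered_count;

/* Hypothetical write barrier: the compiler would emit a call to this
   in place of a plain `holder->field = value` store. */
static void barrier_store(object *holder, object *value)
{
    holder->field = value;

    /* Only old-to-young stores matter to a generational collector. */
    if (value == NULL || holder->gen != OLD || value->gen != YOUNG)
        return;

    /* Record each holder at most once. */
    for (size_t i = 0; i < remembered_count; i++) {
        if (remembered[i] == holder)
            return;
    }
    assert(remembered_count < REMEMBERED_MAX);
    remembered[remembered_count++] = holder;
}
```

When collecting only the young generation, the collector can treat the entries in `remembered` as extra roots instead of re-scanning every old object. Note that nothing here counts references; the barrier merely records where interesting pointers were written.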

(Incidentally, there is also a read barrier, which is used to help implement zeroing weak references; the compiler only generates that for variables marked __weak.)

I think, perhaps, that it would be worth your while reading through the literature on garbage collection, as you might then understand the various trade-offs involved better.

In order for Leopard's GC system to function properly, the compiler must be aware of all pointers that have been allocated by the GC system so that it can wrap all uses of the pointer with the appropriate GC notification functions (objc_assign*).

Yep.

[snip]

Realistically, to properly add __strong to a pointer, you need to know if that allocation came from the garbage collector. This information is essentially impossible to know a priori, so the only practical course of action is to defensively qualify all pointers as __strong.

No. Cocoa GC mostly deals with objects (which may include Core Foundation objects). That's why the default assumption, which is that object pointers are strong, is enough for most situations.


That only changes if you have pointers of non-object types that happen to point to things that were allocated with the GC, *and only then* if they are stored in locations that are not scanned by default. This is an unusual situation, since few methods return things that are allocated by GC and that are not objects. -UTF8String is probably the most common example, but since you tend not to store the result of that method, there would rarely, if ever, be a problem.

The consequence of using a pointer that is not properly qualified as __strong is that the GC system may determine that the allocation is no longer live and reclaim it, even if there is still a valid pointer out there.

Only if there is no copy of the pointer in any of the locations that are scanned by default (e.g. the stack, in registers, in global variables).
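The rule can be modelled in a few lines of C. This is a toy model under stated assumptions, not the collector's real data structures: `block`, `slot` and `survives` are hypothetical names, and the `scanned` flag stands in for "this location is the stack, a register, a global, or a field the collector knows to be strong".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A collectable allocation with a mark bit. */
typedef struct block {
    bool marked;
} block;

/* A memory location that may hold a reference to a block. */
typedef struct slot {
    block *ref;
    bool scanned;   /* hypothetical flag: does the collector look here? */
} slot;

/* Mark every block reachable from a scanned slot; unscanned slots are
   invisible to the collector even if they hold a valid pointer. */
static void mark_from_slots(const slot *slots, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (slots[i].scanned && slots[i].ref != NULL)
            slots[i].ref->marked = true;
    }
}

/* Returns true if the block would survive a collection cycle. */
static bool survives(block *b, const slot *slots, size_t n)
{
    b->marked = false;
    mark_from_slots(slots, n);
    return b->marked;
}
```

A block referenced only from an unscanned slot is reclaimed, but as soon as a single copy of the pointer sits in a scanned slot (a local variable, say) the block stays alive regardless of how many unscanned copies exist.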


It is also trivial to get wrong, and the only indication that there's a problem is an occasional random error or crash.

In most cases, because GC'd things are objects, it's trivial to get *right*.


It's only in special cases, where you're using C pointer types to point to GC'd memory, that you need worry about this kind of thing.

I believe I have a succinct example that illustrates these issues:

[snip]

I strongly suspect the pointer that UTF8String returns is a pointer to an allocation from the garbage collector. In fact, changing the 'title' ivar to include __strong 'solves' the problem.

Yes, that's your bug. It doesn't just 'solve' the problem; the lack of __strong here *is* the problem, but only because this is an ivar and not e.g. a function argument or a stack-based variable.


But this points to a much bigger problem: anyone who has used UTF8String and not qualified it as __strong has a race condition just waiting to happen.

No, because stack variables and registers are included in the set of GC roots.


This is but one example. I don't think I need to point out that there are others. A lot of others. And most of them are non-obvious. A consequence of all of this is that you must not pass pointers that may have been allocated by the garbage collector to any C function in a library. For example,

printf("String: %s\n", [@"Hello, world!" UTF8String]);

That code is fine. The reference is on the stack (or, before that, in the register that holds the return value of -UTF8String). It will be followed, so the memory won't be released until the printf() function has finished with it.


passes a GC allocated pointer to a C library function, which almost assuredly does not have the proper write barrier logic in place to properly guard the pointer.

The write barrier is nothing to do with it. The write barrier is for inter-generational pointers, and possibly also to help the collector to scan in the background safely.


Kind regards,

Alastair.

--
http://alastairs-place.net

