Re: Garbage collector vs variable lifetime
- Subject: Re: Garbage collector vs variable lifetime
- From: Peter Duniho <email@hidden>
- Date: Sat, 7 Jun 2008 17:15:23 -0700
On Sat, 7 Jun 2008 23:24:22 +0100, "Hamish Allan" <email@hidden> wrote:
> Whenever you write documentation in a natural language, there is
> scope for ambiguity. This particular technical specification is
> only mentioned in a single sentence:
>
> "The root set is comprised of all objects reachable from root
> objects and all possible references found by examining the call
> stacks of every Cocoa thread."
Is that the actual specification for the garbage collector? Or just
some documentation provided by Apple?
In any case...
> Are you seriously telling me that "all *possible* references found
> by examining the call stacks" unambiguously means "all *actual*
> references found by examining the rest of the code in the block"?
I'm not telling you that at all. I don't actually know how the Obj-C
GC works, but from what's been written here, it appears to me that
it's not looking at code, but rather it's looking at data.
It doesn't need to examine the code to know whether there's a value
in the stack that refers to the object. If it walks all the stack
frames and doesn't find a reference to the object, then no code can
possibly be referencing the object via a local variable.
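To illustrate, here's a minimal, hypothetical sketch (the names are
made up, not from any real project):

    #import <Foundation/Foundation.h>

    void Example(void)
    {
        NSData *data = [NSData dataWithBytes:"hello" length:5];
        const char *bytes = [data bytes];

        // Nothing below this point reads 'data' again, so an
        // optimizing compiler is free to reuse its register or stack
        // slot. Once it does, a walk of the stack frames finds no
        // reference to the object, and the collector may reclaim it --
        // even though 'bytes', a raw interior pointer the collector
        // doesn't trace, is still in use.
        NSLog(@"first byte: %c", bytes[0]);
    }

The collector never has to look at this code; it only has to look at
what's actually in the stack and registers at the moment it runs.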
>>> It is available for my use until the end of that scope. The fact
>>> that I don't actually make use of it, and the optimising compiler
>>> thinks I'm done with it: *that*'s an implementation detail.
>>
>> No, it's not. It's a _semantic_ detail. If you do in fact not use
>> the reference, then it is in fact _unused_. Garbage collection is
>> all about what's being used or not.
> You'll notice that I was replying to a specific point Michael made:
> "the promise that is being made is that objects which you're still
> pointing to won't go away".
He can speak for himself, but I believe that statement was made in
context of the run-time status. And if the compiler optimizes out a
variable, indeed it may be that you're not still "pointing to" that
particular object.
> I don't disagree that "have a pointer" means something different to
> "use a pointer".
I don't think it's necessary to consider those to have different
meanings for the behavior of the compiler to make sense, even in the
context of garbage collection. If you have no code that "uses the
pointer", then the compiler is free to use that memory location for
something else, at which point you no longer "have a pointer". The
two are very much related.
> Which part of the documentation unambiguously specifies that
> "Garbage collection is all about what's being used or not"?
Ahhh...please, I'm not trying to defend the documentation. If you
find something in the documentation that is misleading or incomplete,
that's no surprise to me. That happens in documentation (and not
just Apple's) all the time. Especially for relatively new features.
As far as I'm concerned, the only question is whether the behavior of
the compiler and its effect on the GC makes sense. And it's my
opinion that it does.
>> Even if the compiler did not optimize away the use of the variable
>> past that point in the code, if the GC system could otherwise
>> determine you weren't ever going to use it again, it would _still_
>> be valid for the GC system to collect the object.
> If you could write a garbage collector that could reliably detect
> what was being used, you'd have no need to specify __strong or
> __weak any more, and you'd have solved the halting problem to boot.
> We're not quite at that stage yet :)
So? My point is that whatever the theoretical capabilities of the
GC, as long as you haven't written code that will actually use a
reference to an object, the GC should be allowed to collect the
object. The fact that writing such an advanced GC is theoretically
impossible is immaterial.
That's why we call these "hypothetical" examples. :)
>>> I agree with you that if it returned collected memory, it would be
>>> more powerful than under retain/release/autorelease. But power and
>>> straightforwardness do not always go hand in hand ;)
>>
>> Well, in this case I believe they do. The real issue here is that a
>> garbage collection system is trying to co-exist peacefully with a
>> non-GC system.
> I disagree. The real issue here is that code optimisations cause
> different behaviour in the GC,
Define "different". In every GC system I've ever used, there are
never any guarantees about when an object may be collected, except
that it won't be collected until it's no longer in use. The fact
that in the debug build, an object is collected _later_ than would
theoretically be possible in an optimized build doesn't mean that the
GC system is broken. It just means that it's non-deterministic,
which is always true anyway.
> whereas if the GC behaved according to rules based on the
> *semantics* of what the programmer writes (i.e. what would happen if
> that variable were really on the stack, rather than optimised away
> into a register), there wouldn't be a problem (the optimisation
> could still happen, of course, but the compiler would flag that
> reference as strong).
We will simply have to agree to disagree. I am not of the opinion
that the scoping of a variable is a declaration of semantics. That's
syntax.
Maybe it's just because I've been using GC systems more than you
have, and so I've gotten used to thinking about memory allocations in
the paradigm that GC systems impose. Maybe this is a subjective
question that cannot be answered. But you aren't going to get me to
agree that scoping of a variable is a semantic quality of the code.
My opinion is that it's how the variable is actually used, not how
it's declared, that defines the semantics of its use.
[...]
>> That said, even in .NET (for example), the GC system has a
>> "GC.KeepAlive()" method for this very purpose. It doesn't actually
>> do anything, but it interrupts optimizations the compiler might
>> make that would otherwise allow an object to be collected (it would
>> be used exactly as the proposed "[data self]" call would be). This
>> is to allow for situations where the only reference to the object
>> is in "unmanaged" code -- that is, it's been passed across the
>> interface from .NET to the older non-GC'ed Windows API.
> Again, I think that signalling to the GC that you don't want the
> object collected by creating a stack variable reference to it --
> whether or not that variable ever actually ends up on the stack due
> to optimisations -- is quite enough.
If that's what you were signaling to the GC system, then yes...that
would be enough. But that's not a signal to the GC system at all.
Hence the "problem".
> No need for GC.KeepAlive(), [data self], CFRetain(),
> disableCollectorForPointer:, or any other hack.
Nope. Even if you "fixed" the "optimized-away" problem, you'd still
have other problems. That's because the root of this problem isn't
the optimizing compiler. It's the fact that garbage collection is
being used alongside other forms of memory management. It's only
because there are things that are allocated outside of the garbage
collection system that this even comes up.
[snip]
>> Not only is it not a bug for the compiler to not concern itself
>> with the issue, the fact is that the extant example here is just
>> the tip of the iceberg. While you might argue that the compiler
>> _could_ prevent this particular problem, a) it requires the
>> compiler to get into the business of memory management
> The compiler is already in that business -- hence the modifiers
> __strong and __weak.
Not really. In fact, if anything those are proof that the compiler
is _not_ in the business of memory management. Instead, it provides
a way for YOU to give it information that is used to control memory
management. If the compiler were in the business of memory
management, it would infer those properties itself.
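Just to be concrete about what kind of information that is, here's a
minimal sketch (a hypothetical class with made-up names) of how those
modifiers are used under Obj-C GC:

    #import <Foundation/Foundation.h>

    @interface Cache : NSObject
    {
        __weak id delegate;    // not treated as a keep-alive reference;
                               // zeroed when the target is collected
        __strong void *buffer; // a raw pointer the collector is told to
                               // treat as a reference, keeping the
                               // pointed-to block alive
    }
    @end

In both cases it's you telling the collector how to treat the slot,
not the collector figuring it out for itself.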
>> and b) there will still be lots of other scenarios that are
>> similar, but not solvable by the compiler.
> Could you please elaborate on this?
I'm surprised I need to.
But, consider the NSData example we've been using. Suppose the byte*
in question weren't just used locally, but rather passed to some
other data structure. Even if we "fixed" the compiler so that it
didn't optimize away the variable referencing the "data" object, the
byte* would still potentially be released when that variable goes out
of scope as the method returns and the "data" object is collected.
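A hypothetical example (made-up names) of the shape of that problem:

    #import <Foundation/Foundation.h>

    // Returns a pointer that outlives the only object keeping it valid.
    const char *FirstBytes(NSString *path)
    {
        NSData *data = [NSData dataWithContentsOfFile:path];
        const char *bytes = [data bytes];
        // Even if the compiler kept 'data' alive all the way to here,
        // the object becomes unreachable the moment we return. The
        // collector may then reclaim it, and the pointer we hand back
        // dangles.
        return bytes;
    }

No fix to the optimizer helps there; the variable has legitimately
gone out of scope.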
As I said before, the original example is simply a subset of a more
general class of problems. These problems come up any time you mix
GC and non-GC, because GC-able objects may reference non-GC-able
objects, or vice versa. The former is a problem because a GC-able
object, when it is itself collected, could wind up releasing a
resource that someone else is still using; the latter is a problem
because the GC-able object could be referenced from a location the
GC system doesn't know to look at.
Either way, you get some data being released while it's still
theoretically in use.
A pure GC system would never have this problem, because it has a
complete view of the memory referencing situation.
> Sure, you could design NSData differently to mask a design problem
> in GC. But GC won't be easier to use than retain/release/autorelease
> without simple rules like "if you declare it on the stack, it's in
> the root set, regardless of whether the compiler optimises it into a
> register".
Well, I disagree. It's already easier to use, without a contrived
rule like that.
The design change required for NSData isn't for the purpose of
masking a "problem in GC". There's not a problem in GC, there's a
problem with NSData. It should not be returning a reference that
could go away at some time in the future.
Solving this problem is non-trivial, at least if I understand the
other comments about what kinds of references NSData could return.
If it's just returning a random block of allocated memory, then the
fix would be as simple as making that block also allocated by the GC
system. But if the reference can be, for example, a pointer to a
memory-mapped file, then some additional housekeeping needs to happen
in order to make sure that a) the pointer isn't invalidated when the
NSData object is released, and b) the pointer _is_ invalidated when
it's really and truly no longer used.
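For the simple case, the fix might look something like this sketch (a
hypothetical function, not a description of how NSData actually
works; NSAllocateCollectable is Foundation's allocator for the
collected heap):

    #import <Foundation/Foundation.h>
    #include <string.h>

    // Copy the bytes into a block allocated from the GC heap, so that
    // the block's lifetime is governed by the collector rather than
    // by the NSData object's lifetime.
    void *GCCopyOfBytes(NSData *data)
    {
        NSUInteger length = [data length];
        void *block = NSAllocateCollectable(length, 0); // 0: the block
                                                        // holds no
                                                        // object refs
        memcpy(block, [data bytes], length);
        return block; // lives as long as a scanned reference points
                      // to it
    }

The memory-mapped case is the one that would need the extra
housekeeping.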
> Consider the following:
>
>     if ([data length] > 2)
>     {
>         char *cs = (char *)[data bytes];
>         char c = *cs;            // statement 1
>         void *p = (void *)data;  // statement 2
>         NSLog(@"First character of data object at %p is %c", p, c);
>     }
>
> Now, who is to say that it wouldn't suit the optimising compiler to
> switch round statements 1 and 2, which it can plainly see have no
> dependency on one another? So referencing "data" to ensure it
> remains alive after I reference its inner pointers doesn't
> necessarily work.
No one said anything about _referencing_ "data". You have to _use_
it in a way that the compiler _can't_ rearrange. That's why
something like "[data self]" works, but just assigning the value to
some other variable doesn't.
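Applied to the snippet above, the idiom would look like this (the
same hypothetical example, with one line added):

    if ([data length] > 2)
    {
        char *cs = (char *)[data bytes];
        char c = *cs;
        void *p = (void *)data;
        NSLog(@"First character of data object at %p is %c", p, c);
        [data self]; // an actual use of 'data': a message send the
                     // compiler can't elide or hoist above the lines
                     // that use the inner pointer
    }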
> I agree with you that there are a variety of solutions. I'm just
> proposing one that I think makes memory management more
> straightforward for the programmer than any others I've heard so
> far. If you have any specific objections to it, I'd like to hear
> them.
By "proposing one", you mean changing the compiler so that it doesn't
optimize away the variable?
My specific objection to that is that it's a valuable optimization,
and there's no good reason for the mere fact that GC is in use to
mean that we do without the optimization.
>> And it seems to me that with GC being a relatively new addition to
>> Obj-C, the likelihood of running into such situations is going to
>> be greater than in a more-evolved environment. It's just something
>> that needs to be kept in mind, just as in the older retain/release
>> paradigm there were a number of rules that needed to be kept in
>> mind. IMHO, inasmuch as the need to keep this particular issue in
>> mind comes up less frequently than the retain/release rules needed
>> to, the GC system is more accommodating (though I suppose it also
>> means it's harder to get used to keeping the rule in mind :) ).
> I think it also means it'll be harder to track certain bugs down.
> And unnecessarily so!
Having garbage collection introduces a whole new class of bugs, yes.
But it also removes a whole other class of bugs. Frankly, I find
that the class of bugs it removes is WAY more common than the one it
introduces, and as the rest of the framework catches up to the GC
paradigm, this will only become more and more true.
YMMV.
Pete
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden