Re: Garbage collector vs variable lifetime
- Subject: Re: Garbage collector vs variable lifetime
- From: Peter Duniho <email@hidden>
- Date: Sat, 7 Jun 2008 17:15:23 -0700
On Sat, 7 Jun 2008 23:24:22 +0100, "Hamish Allan" <email@hidden> wrote:
> Whenever you write documentation in a natural language, there is
> scope for ambiguity. This particular technical specification is
> only mentioned in a single sentence:
>
> "The root set is comprised of all objects reachable from root
> objects and all possible references found by examining the call
> stacks of every Cocoa thread."
Is that the actual specification for the garbage collector? Or just
some documentation provided by Apple?
In any case...
> Are you seriously telling me that "all *possible* references found
> by examining the call stacks" unambiguously means "all *actual*
> references found by examining the rest of the code in the block"?
I'm not telling you that at all. I don't actually know how the Obj-C
GC works, but from what's been written here, it appears to me that
it's not looking at code, but rather it's looking at data.
It doesn't need to examine the code to know whether there's a value
in the stack that refers to the object. If it walks all the stack
frames and doesn't find a reference to the object, then no code can
possibly be referencing the object via a local variable.
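To illustrate, here's a minimal, hypothetical sketch (the names are
made up, not from any real project):

    #import <Foundation/Foundation.h>

    void Example(void)
    {
        NSData *data = [NSData dataWithBytes:"hello" length:5];
        const char *bytes = [data bytes];

        // Nothing below this point reads 'data' again, so an
        // optimizing compiler is free to reuse its register or stack
        // slot. Once it does, a walk of the stack frames finds no
        // reference to the object, and the collector may reclaim it --
        // even though 'bytes', a raw interior pointer the collector
        // doesn't trace, is still in use.
        NSLog(@"first byte: %c", bytes[0]);
    }

The collector never has to look at this code; it only has to look at
what's actually in the stack and registers at the moment it runs.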
>>> It is available for my use until the end of that scope. The fact
>>> that I don't actually make use of it, and the optimising compiler
>>> thinks I'm done with it: *that*'s an implementation detail.
>>
>> No, it's not. It's a _semantic_ detail. If you do in fact not use
>> the reference, then it is in fact _unused_. Garbage collection is
>> all about what's being used or not.
> You'll notice that I was replying to a specific point Michael made:
> "the promise that is being made is that objects which you're still
> pointing to won't go away".
He can speak for himself, but I believe that statement was made in
context of the run-time status. And if the compiler optimizes out a
variable, indeed it may be that you're not still "pointing to" that
particular object.
> I don't disagree that "have a pointer" means something different to
> "use a pointer".
I don't think it's necessary to consider those to have different
meanings for the behavior of the compiler to make sense, even in the
context of garbage collection. If you have no code that "uses the
pointer", then the compiler is free to use that memory location for
something else, at which point you no longer "have a pointer". The
two are very much related.
> Which part of the documentation unambiguously specifies that
> "Garbage collection is all about what's being used or not"?
Ahhh...please, I'm not trying to defend the documentation. If you
find something in the documentation that is misleading or incomplete,
that's no surprise to me. That happens in documentation (and not
just Apple's) all the time. Especially for relatively new features.
As far as I'm concerned, the only question is whether the behavior of
the compiler and its effect on the GC makes sense. And it's my
opinion that it does.
>> Even if the compiler did not optimize away the use of the variable
>> past that point in the code, if the GC system could otherwise
>> determine you weren't ever going to use it again, it would _still_
>> be valid for the GC system to collect the object.
> If you could write a garbage collector that could reliably detect
> what was being used, you'd have no need to specify __strong or
> __weak any more, and you'd have solved the halting problem to boot.
> We're not quite at that stage yet :)
So? My point is that whatever the theoretical capabilities of the
GC, as long as you haven't written code that will actually use a
reference to an object, the GC should be allowed to collect the
object. The fact that writing such an advanced GC is theoretically
impossible is immaterial.
That's why we call these "hypothetical" examples. :)
>>> I agree with you that if it returned collected memory, it would be
>>> more powerful than under retain/release/autorelease. But power and
>>> straightforwardness do not always go hand in hand ;)
>>
>> Well, in this case I believe they do. The real issue here is that a
>> garbage collection system is trying to co-exist peacefully with a
>> non-GC system.
> I disagree. The real issue here is that code optimisations cause
> different behaviour in the GC,
Define "different". In every GC system I've ever used, there are
never any guarantees about when an object may be collected, except
that it won't be collected until it's no longer in use. The fact
that in the debug build, an object is collected _later_ than would
theoretically be possible in an optimized build doesn't mean that the
GC system is broken. It just means that it's non-deterministic,
which is always true anyway.
> whereas if the GC behaved according to rules based on the
> *semantics* of what the programmer writes (i.e. what would happen if
> that variable were really on the stack, rather than optimised away
> into a register), there wouldn't be a problem (the optimisation
> could still happen, of course, but the compiler would flag that
> reference as strong).
We will simply have to agree to disagree. I am not of the opinion
that the scoping of a variable is a declaration of semantics. That's
syntax.
Maybe it's just because I've been using GC systems more than you
have, and so I've gotten used to thinking about memory allocations in
the paradigm that GC systems impose. Maybe this is a subjective
question that cannot be answered. But you aren't going to get me to
agree that scoping of a variable is a semantic quality of the code.
My opinion is that it's how the variable is actually used, not how
it's declared, that defines the semantics of its use.
[...]
>> That said, even in .NET (for example), the GC system has a
>> "GC.KeepAlive()" method for this very purpose. It doesn't actually
>> do anything, but it interrupts optimizations the compiler might
>> make that would otherwise allow an object to be collected (it would
>> be used exactly as the proposed "[data self]" call would be). This
>> is to allow for situations where the only reference to the object
>> is in "unmanaged" code -- that is, it's been passed across the
>> interface from .NET to the older non-GC'ed Windows API.
> Again, I think that signalling to the GC that you don't want the
> object collected by creating a stack variable reference to it --
> whether or not that variable ever actually ends up on the stack due
> to optimisations -- is quite enough.
If that's what you were signaling to the GC system, then yes...that
would be enough. But that's not a signal to the GC system at all.
Hence the "problem".
> No need for GC.KeepAlive(), [data self], CFRetain(),
> disableCollectorForPointer:, or any other hack.
Nope. Even if you "fixed" the "optimized-away" problem, you'd still
have other problems. That's because the root of this problem isn't
the optimizing compiler. It's the fact that garbage collection is
being used alongside other forms of memory management. It's only
because there are things that are allocated outside of the garbage
collection system that this even comes up.
[snip]
>> Not only is it not a bug for the compiler to not concern itself
>> with the issue, the fact is that the extant example here is just
>> the tip of the iceberg. While you might argue that the compiler
>> _could_ prevent this particular problem, a) it requires the
>> compiler to get into the business of memory management
> The compiler is already in that business -- hence the modifiers
> __strong and __weak.
Not really. In fact, if anything those are proof that the compiler
is _not_ in the business of memory management. Instead, it provides
a way for YOU to give it information that is used to control memory
management. If the compiler were in the business of memory
management, it would infer those properties itself.
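Just to be concrete about what kind of information that is, here's a
minimal sketch (a hypothetical class with made-up names) of how those
modifiers are used under Obj-C GC:

    #import <Foundation/Foundation.h>

    @interface Cache : NSObject
    {
        __weak id delegate;    // not treated as a keep-alive reference;
                               // zeroed when the target is collected
        __strong void *buffer; // a raw pointer the collector is told to
                               // treat as a reference, keeping the
                               // pointed-to block alive
    }
    @end

In both cases it's you telling the collector how to treat the slot,
not the collector figuring it out for itself.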
>> and b) there will still be lots of other scenarios that are
>> similar, but not solvable by the compiler.
> Could you please elaborate on this?
I'm surprised I need to.
But, consider the NSData example we've been using. Suppose the byte*
in question weren't just used locally, but rather passed to some
other data structure. Even if we "fixed" the compiler so that it
didn't optimize away the variable referencing the "data" object, the
byte* would still potentially be released when that variable goes out
of scope as the method returns and the "data" object is collected.
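A hypothetical example (made-up names) of the shape of that problem:

    #import <Foundation/Foundation.h>

    // Returns a pointer that outlives the only object keeping it valid.
    const char *FirstBytes(NSString *path)
    {
        NSData *data = [NSData dataWithContentsOfFile:path];
        const char *bytes = [data bytes];
        // Even if the compiler kept 'data' alive all the way to here,
        // the object becomes unreachable the moment we return. The
        // collector may then reclaim it, and the pointer we hand back
        // dangles.
        return bytes;
    }

No fix to the optimizer helps there; the variable has legitimately
gone out of scope.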
As I said before, the original example is simply a subset of a more
general class of problems. These problems come up any time you mix
GC and non-GC, because GC-able objects may reference non-GC-able
objects, or vice versa. The former is a problem because a GC-able
object, when it is itself collected, could wind up releasing a
resource that someone else is still using; the latter is a problem
because the GC-able object could be referenced from a location the
GC system doesn't know to look at.
Either way, you get some data being released while it's still
theoretically in use.
A pure GC system would never have this problem, because it has a
complete view of the memory referencing situation.
> Sure, you could design NSData differently to mask a design problem
> in GC. But GC won't be easier to use than retain/release/autorelease
> without simple rules like "if you declare it on the stack, it's in
> the root set, regardless of whether the compiler optimises it into a
> register".
Well, I disagree. It's already easier to use, without a contrived
rule like that.
The design change required for NSData isn't for the purpose of
masking a "problem in GC". There's not a problem in GC, there's a
problem with NSData. It should not be returning a reference that
could go away at some time in the future.
Solving this problem is non-trivial, at least if I understand the
other comments about what kinds of references NSData could return.
If it's just returning a random block of allocated memory, then the
fix would be as simple as making that block also allocated by the GC
system. But if the reference can be, for example, a pointer to a
memory-mapped file, then some additional housekeeping needs to happen
in order to make sure that a) the pointer isn't invalidated when the
NSData object is released, and b) the pointer _is_ invalidated when
it's really and truly no longer used.
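For the simple case, the fix might look something like this sketch (a
hypothetical function, not a description of how NSData actually
works; NSAllocateCollectable is Foundation's allocator for the
collected heap):

    #import <Foundation/Foundation.h>
    #include <string.h>

    // Copy the bytes into a block allocated from the GC heap, so that
    // the block's lifetime is governed by the collector rather than
    // by the NSData object's lifetime.
    void *GCCopyOfBytes(NSData *data)
    {
        NSUInteger length = [data length];
        void *block = NSAllocateCollectable(length, 0); // 0: the block
                                                        // holds no
                                                        // object refs
        memcpy(block, [data bytes], length);
        return block; // lives as long as a scanned reference points
                      // to it
    }

The memory-mapped case is the one that would need the extra
housekeeping.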
> Consider the following:
>
>     if ([data length] > 2)
>     {
>         char *cs = (char *)[data bytes];
>         char c = *cs;            // statement 1
>         void *p = (void *)data;  // statement 2
>         NSLog(@"First character of data object at %p is %c", p, c);
>     }
>
> Now, who is to say that it wouldn't suit the optimising compiler to
> switch round statements 1 and 2, which it can plainly see have no
> dependency on one another? So referencing "data" to ensure it
> remains alive after I reference its inner pointers doesn't
> necessarily work.
No one said anything about _referencing_ "data". You have to _use_
it in a way that the compiler _can't_ rearrange. That's why
something like "[data self]" works, but just assigning the value to
some other variable doesn't.
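Applied to the snippet above, the idiom would look like this (the
same hypothetical example, with one line added):

    if ([data length] > 2)
    {
        char *cs = (char *)[data bytes];
        char c = *cs;
        void *p = (void *)data;
        NSLog(@"First character of data object at %p is %c", p, c);
        [data self]; // an actual use of 'data': a message send the
                     // compiler can't elide or hoist above the lines
                     // that use the inner pointer
    }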
> I agree with you that there are a variety of solutions. I'm just
> proposing one that I think makes memory management more
> straightforward for the programmer than any others I've heard so
> far. If you have any specific objections to it, I'd like to hear
> them.
By "proposing one", you mean changing the compiler so that it doesn't
optimize away the variable?
My specific objection to that is that it's a valuable optimization,
and there's no good reason for the mere fact that GC is in use to
mean that we do without the optimization.
>> And it seems to me that with GC being a relatively new addition to
>> Obj-C, the likelihood of running into such situations is going to
>> be greater than in a more-evolved environment. It's just something
>> that needs to be kept in mind, just as in the older retain/release
>> paradigm there were a number of rules that needed to be kept in
>> mind. IMHO, inasmuch as the need to keep this particular issue in
>> mind comes up less frequently than the retain/release rules needed
>> to, the GC system is more accommodating (though I suppose it also
>> means it's harder to get used to keeping the rule in mind :) ).
> I think it also means it'll be harder to track certain bugs down.
> And unnecessarily so!
Having garbage collection introduces a whole new class of bugs, yes.
But it also removes a whole other class of bugs. Frankly, I find
that the class of bugs it removes is WAY more common than the one it
introduces, and as the rest of the framework catches up to the GC
paradigm, this will only become more and more true.
YMMV.
Pete
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden