Garbage collection and variable lifetime on the stack

While tracking down something related to garbage collection, I came
across the following line in "Garbage Collection Programming Guide" >
Using Garbage Collection > Interior Pointers:

----
The compiler can reuse stack slots it determines are no longer used
(see “Root Set and Reference Types”). This can mean that objects are
collected more quickly than you might expect—when a local variable is
removed from the stack and hence the corresponding object not
considered rooted. This has implications for situations in which you
access data held by a local variable after the last direct reference
to that variable. To illustrate, consider the following example:

NSData *myData = [someObject getMyData];
const uint8_t *bytes = [myData bytes];
NSUInteger offset = 0, length = [myData length];

while (offset < length) {
  // if you never reference myData again, bytes is a dangling pointer.
}

Suppose that after you send myData the length message, you do not
reference it again directly. The compiler may reuse the stack slot for
myData. myData may then become eligible for collection (see “Root Set
and Reference Types”); if it is collected, then bytes becomes invalid.
----

Is this actually happening? Here is a slightly more specific version of the example above:

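#import <Foundation/Foundation.h>

void someFunction(id someObject) {
  // getMyData is assumed to return an NSData *, as in the guide's example.
  NSData *myData = [someObject getMyData];
  const uint8_t *bytes = [myData bytes];
  NSUInteger offset = 0, length = [myData length];

  // Last direct use of myData: print the variable's address and its value.
  printf("myData @ %p, myData = %p\n", (void *)&myData, (void *)myData);

  int sum = 0;
  while (offset < length) {
    // if you never reference myData again, bytes is a dangling pointer.
    sum += bytes[offset++];
  }

  int *intP = &sum;

  // If the slot were reused, &intP could print the same address as &myData.
  printf("sum: %d, intP @ %p, intP: %p, *intP: %d\n", sum, (void *)&intP, (void *)intP, *intP);
}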

Specifically, what the documentation seems to imply is that after the
last use of myData, the compiler is free to re-use that stack slot to
store another variable... that the lifetime of myData is only
guaranteed up until its last use.  In the above example, that would
mean that &intP could potentially be the same as &myData, since the
last use of myData was at the start, in the printf statement.
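
For anyone who wants to check the address question without a Foundation
dependency, here is a minimal plain-C sketch of the same comparison. The
function name and the stand-in variables are hypothetical; "first" plays
the role of myData and "second" the role of intP.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if two locals declared in the same block
   have different addresses while both are still in scope. */
static int locals_have_distinct_addresses(void) {
    const char *first = "stand-in for myData";  /* not used directly again */
    int sum = 0;
    int *second = &sum;                         /* stand-in for intP */

    printf("first @ %p, second @ %p\n", (void *)&first, (void *)&second);

    /* Both objects are inside their lifetimes here, so the standard
       requires their addresses to compare unequal. */
    int distinct = (void *)&first != (void *)&second;
    assert(distinct);
    return distinct;
}
```

Note that taking the address of both variables is what makes the
comparison possible in the first place. This sketch says nothing about
variables whose address is never taken.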

Is the compiler really doing this?  Does anyone have an example
of it? If so, this is a pretty blatant compiler bug, at least under
the C89 and C99 specifications.  Both specs have very specific rules
about the duration in which automatic variables must remain valid, at
the same address in memory, and retain the last value written to them:
right up until the closing brace, just as one might intuitively
expect.

The relevant section for C99 is 6.2.4, Storage durations of objects:

1   An object has a storage duration that determines its lifetime.
There are three storage durations: static, automatic, and allocated.
Allocated storage is described in 7.20.3.

2   The lifetime of an object is the portion of program execution
during which storage is guaranteed to be reserved for it. An object
exists, has a constant address, and retains its last-stored value
throughout its lifetime. If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes
indeterminate when the object it points to reaches the end of its
lifetime.

3   An object whose identifier is declared with external or internal
linkage, or with the storage-class specifier static has static storage
duration. Its lifetime is the entire execution of the program and its
stored value is initialized only once, prior to program startup.

4   An object whose identifier is declared with no linkage and without
the storage-class specifier static has automatic storage duration.

5   For such an object that does not have a variable length array
type, its lifetime extends from entry into the block with which it is
associated until execution of that block ends in any way. (Entering an
enclosed block or calling a function suspends, but does not end,
execution of the current block.) If the block is entered recursively,
a new instance of the object is created each time. The initial value
of the object is indeterminate. If an initialization is specified for
the object, it is performed each time the declaration is reached in
the execution of the block; otherwise, the value becomes indeterminate
each time the declaration is reached.

6   For such an object that does have a variable length array type,
its lifetime extends from the declaration of the object until
execution of the program leaves the scope of the declaration. If the
scope is entered recursively, a new instance of the object is created
each time. The initial value of the object is indeterminate.
----

At least according to the spec, myData must retain its last-stored
value throughout its lifetime (p2), and for an automatic variable that
lifetime runs until execution of the enclosing block ends (p5), i.e.
the closing brace.
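
As a rough illustration of that guarantee, here is a minimal plain-C
sketch. All names are hypothetical; "data" plays the role of myData, and
sum_bytes stands in for whatever work happens after the last direct use.
It only checks behavior the program itself can observe, which is what
6.2.4 talks about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the loop that walks the bytes. */
static int sum_bytes(const unsigned char *bytes, size_t length) {
    int sum = 0;
    for (size_t offset = 0; offset < length; offset++) {
        sum += bytes[offset];
    }
    return sum;
}

/* Returns 1 if `data` still holds its last-stored value when it is read
   back through a pointer just before the closing brace. */
static int value_survives_until_block_end(void) {
    static const unsigned char payload[] = {1, 2, 3, 4};
    const unsigned char *data = payload;   /* plays the role of myData */
    const unsigned char **slot = &data;    /* address of the variable itself */

    int sum = sum_bytes(data, sizeof payload);  /* last direct use of `data` */
    assert(sum == 10);

    /* No further direct reference to `data`: read it through its address. */
    return *slot == payload;
}
```

Under the paragraphs quoted above, the final read through slot has to see
the value stored at the top of the block.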

Old ANSI C (C89) had essentially identical requirements, from 2.1.2.3
Program Execution:

An instance of each object with automatic storage duration is
associated with each entry into a block.  Such an object exists and
retains its last-stored value during the execution of the block and
while the block is suspended (by a call of a function or receipt of a
signal).
----

Unless I'm severely misinterpreting the "Interior Pointers" section of
the GC guide, that entire section is fallacious.  The language itself
makes hard guarantees, even in the presence of optimization, so a
workaround such as "You can ensure that the data object remains valid
until you’ve finished using it by sending it a message" is completely
unnecessary.
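
For reference, the workaround the guide describes amounts to adding one
more direct use of the owning object after the loop. Here is a plain-C
analog, as a sketch only: struct blob, keep_alive and keep_alive_sink are
hypothetical names, and in Objective-C the keep_alive call would be a
message send to myData instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical owner of a byte buffer; stands in for NSData. */
struct blob {
    const unsigned char *bytes;
    size_t length;
};

/* Volatile sink so the store in keep_alive() cannot be optimized away. */
static const void *volatile keep_alive_sink;

/* Hypothetical helper: counts as a direct use of `p` at the call site. */
static void keep_alive(const void *p) {
    keep_alive_sink = p;
}

static int sum_blob(const struct blob *owner) {
    assert(owner != NULL);
    const unsigned char *bytes = owner->bytes;  /* interior pointer */
    size_t length = owner->length;
    int sum = 0;

    for (size_t offset = 0; offset < length; offset++) {
        sum += bytes[offset];
    }

    keep_alive(owner);  /* direct reference to the owner after the loop */
    return sum;
}
```

The extra call costs one store. Whether it is needed at all is exactly
the question here.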

Once again, with the relevant sections of the standard in mind,
consider this text from the GC guide:

"Suppose that after you send myData the length message, you do not
reference it again directly. The compiler may reuse the stack slot for
myData. myData may then become eligible for collection (see “Root Set
and Reference Types”); if it is collected, then bytes becomes
invalid."

In particular, "The compiler may reuse the stack slot for myData."
If the declaration is at the top of a function (i.e., myFunc() { NSData
*myData = ...; }), then the compiler most definitely cannot reuse the
stack slot for myData: it must remain valid and retain the last value
that was stored until the closing brace, at least according to the
language spec. "The compiler can reuse stack slots it determines are
no longer used. This can mean that objects are collected more quickly
than you might expect" is simply not possible in a
standards-conforming C compiler.
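
To make the "closing brace" point concrete, here is a small plain-C
sketch (the function and variable names are hypothetical). Under 6.2.4p5
the only locals whose storage the spec lets go of early are those
declared in a nested block that has already ended. A function-scope
variable like total stays alive until the final brace.

```c
#include <assert.h>

/* Adds 1..n in one nested block and 2, 4, ..., 2n in a sibling block. */
static int sum_in_sibling_blocks(int n) {
    int total = 0;  /* function scope: lifetime runs to the final brace */
    assert(n >= 0);

    {
        int first_partial = 0;  /* lifetime ends at this block's brace */
        for (int i = 1; i <= n; i++) {
            first_partial += i;
        }
        total += first_partial;
    }

    {
        /* first_partial's lifetime is over, so this variable may legally
           occupy the same storage. */
        int second_partial = 0;
        for (int i = 1; i <= n; i++) {
            second_partial += i * 2;
        }
        total += second_partial;
    }

    return total;
}
```

With n = 4 the first block contributes 10 and the second 20, so the
function returns 30.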

Just from a practical standpoint, why would the compiler even try to
optimize stack slot usage like this?  It would take an enormous effort
in lifetime, alias, and escape analysis to be able to safely reuse a
stack slot, after which you would essentially have to do
"register allocation" for the stack slots in order to actually reuse
them. After all this extremely non-trivial analysis the end result
would be absolutely no performance improvement.  It's the stack and
stack space is so cheap, it's essentially free.  All that effort so
that you can shave maybe 4 to 64 bytes off the stack frame with no
measurable improvement in performance?  It makes no sense to even try
to do this kind of optimization.

Does anyone have an example of source code that clearly and
unambiguously shows the compiler reusing stack slots in a manner that
is inconsistent with the spec?  An example that clobbers the last
written value for myData before the function exits that isn't an
obvious programming error (i.e., writing past the bounds of an array)?