half-initialized objects during decoding (Re: Persistance)
- Subject: half-initialized objects during decoding (Re: Persistance)
- From: Chris Kane <email@hidden>
- Date: Wed, 10 Oct 2001 13:21:01 -0700
Below I describe the 'half-initialized object' problems I was referring
to. Encoding is in several ways the easy direction. Decoding is
harder, and has a couple of hard problems that I'm going to discuss here,
in the hope that somebody has a clever idea.
The current Cocoa archiving system does not solve these problems, and
it's quite possible your classes are experiencing (or inducing) these
problems without your realizing it, since code generally seems to get
away with them. If you do run into them, however, you may well just
be hosed, or have to jump through hoops to fix the problem.
I'm going to use NS(Un)Archiver terminology here, but the problems are
general.
1) decodeObject can return an object which is not fully initialized, and
thus possibly not usable
In the general case, archiving has to deal with an object graph
with cycles in it. During decoding, imagine that an object A is in
initWithCoder: and calls decodeObject. That call to decodeObject
begins to decode the requested object, which in turn decodes the
objects it points to, and so on recursively. There is a cycle in the
graph involving A, however. During the recursive decoding of some object
B, decodeObject in its initWithCoder: is invoked, and the decoder finds
that this was a reference to A (at encode time), and now needs to decode
A. A is already being decoded, however, and is not yet done. If the
decoder attempted to decode a new object as A (a different A), there
would be infinite recursion, with the decoder following this cycle
forever (and it would break the object reference as well).
So it returns the A it had already made (as it would with any object
already decoded). This allows all the references to be restored to the
original ones in the encoded graph. However, if B tries to invoke some
methods on A to find out about it in order to proceed with its
unarchiving, trouble! This is more common than you might think, but
fortunately usually happens with leaf objects (objects which contain no
references to other objects).
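
To make the scenario concrete, here is a minimal sketch in Python. It
is a toy model, not the NS(Un)Archiver API: ToyDecoder, decode_field,
init_with_coder and the classes A and B are all hypothetical names
invented for the illustration. The decoder registers each object before
running its initializer, which is what lets it break the cycle, and
also what lets B see A half-initialized.

```python
class ToyDecoder:
    """Toy stand-in for an unarchiver (assumption: not the real Cocoa API)."""

    def __init__(self, archive):
        self.archive = archive   # object id -> (class, {field name: referenced id})
        self.decoded = {}        # object id -> instance, registered BEFORE init runs
        self.current = []        # stack of field tables for objects being decoded

    def decode_object(self, obj_id):
        if obj_id in self.decoded:
            # Already created; it may still be inside its init_with_coder.
            return self.decoded[obj_id]
        cls, fields = self.archive[obj_id]
        obj = cls.__new__(cls)   # allocate without initializing
        self.decoded[obj_id] = obj
        self.current.append(fields)
        try:
            obj.init_with_coder(self)
        finally:
            self.current.pop()
        return obj

    def decode_field(self, name):
        # Decode the object that the current object's field refers to.
        return self.decode_object(self.current[-1][name])


class A:
    def init_with_coder(self, coder):
        self.b = coder.decode_field("b")   # triggers the decode of B
        self.name = "A"                    # set only after B is done


class B:
    def init_with_coder(self, coder):
        self.a = coder.decode_field("a")   # gets the half-initialized A
        # Asking A about itself at this point finds nothing there yet.
        self.saw_name = getattr(self.a, "name", None)


archive = {1: (A, {"b": 2}), 2: (B, {"a": 1})}
a = ToyDecoder(archive).decode_object(1)
print(a.b.a is a)      # the cycle is restored
print(a.b.saw_name)    # what B observed while A was still being decoded
print(a.name)          # what A looks like once decoding has finished
```

In this sketch the reference itself comes back intact; the only thing
that goes wrong is B's attempt to read A's state too early.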
A common example is in the AppKit: NSWindows reference the contained
views (via the top-level content view) and the views all reference the
window they're in. The window starts being decoded, and it calls
decodeObject to get the content view, which recursively decodes its
subviews etc. When a view is decoded, it decodes its window, but the
window is still in the middle of its own -initWithCoder:, and in fact
won't return there until ALL the views have been decoded and created.
But the -decodeObject (to get the window) called by the NSViews has to
return _something_, and per the previous discussion, returns the
half-initialized window. Fortunately, the NSViews just want to
reference the window; they don't need to get any information out of the
window, so AppKit gets away with this. But if NSView ever wanted to
call a method on the window, there could be trouble.
One historically insidious manifestation of this problem is when an object
A implements internal reference counting in one of its fields. During
decoding, B gets a reference to A (with -decodeObject) and retains it,
invoking -retain on the half-initialized A. A's initWithCoder:
proceeds, and at some point sets the ref count field to 0 (initializing
it), but this blows away B's retain (and anybody else's). The problems
here do not only occur with fields which are object references, as this
example shows.
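
The reference-count case can be simulated the same way. This is again a
hedged Python sketch rather than Cocoa code: Counted, Holder, retain and
ref_count are hypothetical names, and the toy decoder is an assumption
about how such a decoder might work. The count defaults to 0 before
initialization to mimic zero-filled memory from allocation.

```python
class ToyDecoder:
    """Toy stand-in for an unarchiver (assumption: not the real Cocoa API)."""

    def __init__(self, archive):
        self.archive = archive   # object id -> (class, {field name: referenced id})
        self.decoded = {}        # instances, registered before init runs
        self.current = []        # field tables of objects being decoded

    def decode_object(self, obj_id):
        if obj_id in self.decoded:
            return self.decoded[obj_id]
        cls, fields = self.archive[obj_id]
        obj = cls.__new__(cls)
        self.decoded[obj_id] = obj
        self.current.append(fields)
        try:
            obj.init_with_coder(self)
        finally:
            self.current.pop()
        return obj

    def decode_field(self, name):
        return self.decode_object(self.current[-1][name])


class Counted:
    """Object A: keeps its reference count in one of its own fields."""

    def retain(self):
        # Before init_with_coder finishes, the field reads as 0.
        self.ref_count = getattr(self, "ref_count", 0) + 1
        return self

    def init_with_coder(self, coder):
        self.holder = coder.decode_field("holder")
        self.ref_count = 0   # "initializes" the field, wiping earlier retains


class Holder:
    """Object B: retains the object it decodes."""

    def init_with_coder(self, coder):
        self.target = coder.decode_field("target").retain()


archive = {1: (Counted, {"holder": 2}), 2: (Holder, {"target": 1})}
counted = ToyDecoder(archive).decode_object(1)
print(counted.holder.target is counted)   # B really does hold A
print(counted.ref_count)                  # but A no longer remembers B's retain
```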
2) it is not legitimate to return a new object (object other than
original self) from initWithCoder: or awake* type methods
Again, during unarchiving, we consider what happens as the decoder
tries to restore the references between objects in the object graph when
there is a cycle. Suppose object A, in its initWithCoder: is decoding
stuff, and this triggers the decode of B (as above), which completes. B
has now been given a reference to A. The recursion returns over and
over again to A's -initWithCoder: as it calls decodeObject, and finally
it is done. A then computes a new object A1, deallocates the original
self (which the decoder knew as A) and returns A1. Maybe the decoder
updates its tables and A1 will now be returned for references to A. But
any references to A _already returned_ are now broken -- the object may
have been forcefully dealloc'd instead of released, in which case the
holders of those references now point at invalid memory. At best, they
have one version of A (which may not
be fully initialized, if initWithCoder: wasn't stuffing all the decoded
material into ivars, which it isn't required to do), and other objects
have the new (final) version. The references have not been completely
restored by the decoding process.
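
The substitution problem can be sketched in the same toy Python model.
All names here (ToyDecoder, A, A1, B, init_with_coder) are hypothetical,
and the decoder is assumed to be the "helpful" kind that updates its
table when an initializer returns a different object. Python cannot
model the forced dealloc, so this shows only the best case: B is left
holding the old A while everyone else gets A1.

```python
class ToyDecoder:
    """Toy stand-in for an unarchiver (assumption: not the real Cocoa API)."""

    def __init__(self, archive):
        self.archive = archive   # object id -> (class, {field name: referenced id})
        self.decoded = {}        # instances, registered before init runs
        self.current = []        # field tables of objects being decoded

    def decode_object(self, obj_id):
        if obj_id in self.decoded:
            return self.decoded[obj_id]
        cls, fields = self.archive[obj_id]
        obj = cls.__new__(cls)
        self.decoded[obj_id] = obj
        self.current.append(fields)
        try:
            result = obj.init_with_coder(self)
        finally:
            self.current.pop()
        # Assumption: the decoder updates its table to the returned object,
        # so later references resolve to the substitute.
        self.decoded[obj_id] = result
        return result

    def decode_field(self, name):
        return self.decode_object(self.current[-1][name])


class A1:
    """The replacement object that A's initializer decides to return."""

    def __init__(self, b):
        self.b = b


class A:
    def init_with_coder(self, coder):
        b = coder.decode_field("b")   # B receives a reference to this A
        return A1(b)                  # ...and then A hands back a different object


class B:
    def init_with_coder(self, coder):
        self.a = coder.decode_field("a")
        return self


archive = {1: (A, {"b": 2}), 2: (B, {"a": 1})}
decoder = ToyDecoder(archive)
root = decoder.decode_object(1)
print(type(root).__name__)              # the decoder hands out the substitute
print(type(root.b.a).__name__)          # B still holds the original
print(decoder.decode_object(1) is root) # later lookups see the substitute
```

Updating the table fixes references decoded afterwards, but nothing in
this design can reach back into B and repair the reference it already
received.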
There are a couple approaches to solving these, but I don't want to
spoil anybody's fun in thinking about these problems (for those that are
inclined to ponder). Unfortunately, the techniques I know of (that
would even be "correct") are either a pain for implementors of
initWithCoder: methods, or involve two passes during decoding, or solve
one of the problems but not both, or result in object graphs which
simulate but are not the actual original object graphs and induce a
performance hit thereafter.
Chris Kane
Cocoa Frameworks, Apple