Re: Help DTrace gurus: suggestions for capturing a mis-allocated NSData object on a customer's system
Re: Help DTrace gurus: suggestions for capturing a mis-allocated NSData object on a customer's system
- Subject: Re: Help DTrace gurus: suggestions for capturing a mis-allocated NSData object on a customer's system
- From: James Bucanek <email@hidden>
- Date: Wed, 24 Nov 2010 20:20:40 -0700
Ken Thomases <mailto:email@hidden> wrote (Sunday,
November 21, 2010 6:00 PM -0600):
On Nov 21, 2010, at 12:56 PM, James Bucanek wrote:
Ken Thomases <mailto:email@hidden> wrote (Sunday, November 21, 2010 6:13 AM -0600):
Note that valgrind is not saying that the data was being overwritten, but
that NSConcreteData merely "depends on" (a.k.a. "reads") information from an
address space that formerly belonged to the stack frame of
-readNamePackagesOp, which should now be considered as invalid/uninitialized.
My point is that I suspect something is doing a copy of some uninitialized
bytes from a stack variable over top of NSConcreteData's internal state.
Later, when NSConcreteData accesses it, it's basing a jump or move on this
corrupted data, which Valgrind knows comes from uninitialized data.
(Valgrind knows that a copy from uninitialized data propagates the
uninitialized-ness, even though the destination bytes have been written to.)
But what code could that be? The NSData object's internal variables are
allocated on the heap by +alloc. The only reference my code has to that is
the NSData* data variable.
There are only three other references to this variable after it's created,
and those are all used to send it a message. The only messages sent are, in
order:
-autorelease
-retain
-bytes
-release
That's total extent of interaction with the NSData object.
Well, I had been thinking that a) NSConcreteData might store the data buffer
inline, using some obscure technique, like the extraBytes parameter of
NSAllocateObject;
It does, and I think this is the source of the uninitialized value.
I had some time to experiment again this evening, and found that
for allocations less than 16K, the NSConcreteData class stores
the data in-line. The -bytes property always returns an address
32 bytes offset from the address of the NSData* pointer. Details
in a second...
and b) overrunning or underrunning the buffer would thus
clobber the object's internal state. That would have required writing to the
read-only contents, which you've said is definitely not happening.
I agree. It fits the symptoms so well, that I was unwilling to
take this off the table; On the other hand, I just couldn't see
how it could be getting corrupted.
So, it's
probably a dead end.
Probably, but not for those reasons.
I wrote some code to dump the address of the NSData objects, the
-bytes property, the size of the calculated structure, and
(using some crude introspection) dumped the internal structure
of the NSData object to a log file.
I found that for large (>16K) allocations, NSData malloc()'s a
block to store the data and keep a reference to it.
This is how the NSData object appears immediately after it's created:
2010-11-24 19:26:42.297 -0700 #debug# -[PackageNames
readNamePackagesOp] batch=0x10077cbd0->0x111008000,
data=0x10f30cc60, count=1024, length=16400,
*data=NSConcreteData<0x10f30cc60>{_isInline=(b1)2,_shouldFree=(b1)2,_hasVM=(b1)2,_retainCount=(b29)2,_length=16400,_capacity=16400,_bytes=(v*)0x111008000,_space=type([12C])}
[1.734100.6572.24]
batch: the address of the RecordsBatch structure on the stack,
and its new address returned by [data bytes]
data: the address of the new NSData object
count: the number of records in RecordsBatch
length: the length of the RecordsBatch header structure + the
number of records (this is all correct, I checked the
calculations by hand)
*data: is a dump of the NSData object using some Obj-C
introspection. (The Obj-C Ivar stuff doesn't break apart bit
fields, so that's why the first few values are all the same.)
See that the _bytes field points to the allocated block of
memory that holds the copy of the original buffer.
Now, just before the object gets released, it looks like this:
2010-11-24 19:26:42.303 -0700 #debug# -[InsertPackageNamesOp
dealloc] batch=0x111008000, data=0x10f30cc60, count=1024,
length=16400,
*data=NSConcreteData<0x10f30cc60>{_isInline=(b1)2,_shouldFree=(b1)2,_hasVM=(b1)2,_retainCount=(b29)2,_length=16400,_capacity=16400,_bytes=(v*)0x111008000,_space=type([12C])}
[1.734100.6572.37]
Basically, everything is the same.
Similarly, when the allocation is small, this is what happens:
2010-11-24 19:26:42.295 -0700 #debug# -[PackageNames
readNamePackagesOp] batch=0x10df28bd0->0x10152d7a0,
data=0x10152d780, count=7, length=128,
*data=NSConcreteData<0x10152d780>{_isInline=(b1)1,_shouldFree=(b1)1,_hasVM=(b1)1,_retainCount=(b29)1,_length=128,_capacity=128,_bytes=(v*)0x0,_space=type([12C])}
[1.734100.6572.21]
2010-11-24 19:26:42.297 -0700 #debug# -[InsertPackageNamesOp
dealloc] batch=0x10152d7a0, data=0x10152d780, count=7,
length=128,
*data=NSConcreteData<0x10152d780>{_isInline=(b1)1,_shouldFree=(b1)1,_hasVM=(b1)1,_retainCount=(b29)1,_length=128,_capacity=128,_bytes=(v*)0x0,_space=type([12C])}
[1.734100.6572.23]
As you can see, the internal _bytes value is zero, and the
address returned by -bytes (0x10152d7a0) is 32 bytes after the
address of the NSData object (0x10152d780).
By dumping the internal structure of the NSData object
immediately after it was created and again just before it was
destroyed, I was able to convince myself that the NSData object
is not getting corrupted.
Now, here where things get weird, and where I was sure there was
a problem. After a few of the inline NSData objects are created,
I start to get ones that look like this:
2010-11-24 19:26:42.297 -0700 #debug# -[PackageNames
readNamePackagesOp] batch=0x10df28bd0->0x10152b990,
data=0x10152b970, count=38, length=320,
*data=NSConcreteData<0x10152b970>{_isInline=(b1)1,_shouldFree=(b1)1,_hasVM=(b1)1,_retainCount=(b29)1,_length=320,_capacity=320,_bytes=(v*)0x1,_space=type([12C])}
[1.734100.6572.22]
2010-11-24 19:26:42.298 -0700 #debug# -[InsertPackageNamesOp
dealloc] batch=0x10152b990, data=0x10152b970, count=38,
length=320,
*data=NSConcreteData<0x10152b970>{_isInline=(b1)1,_shouldFree=(b1)1,_hasVM=(b1)1,_retainCount=(b29)1,_length=320,_capacity=320,_bytes=(v*)0x1,_space=type([12C])}
[1.734100.6572.25]
Same as before, but note that the internal _bytes field is 0x1
(not 0x0). A bad address for sure. And this seems to be the
cause of the valgrind warnings, because I soon as my
introspection code executes it spits out this error:
==5787== Thread 8:
==5787== Conditional jump or move depends on uninitialised value(s)
==5787== at 0x100EBD1F5: __vfprintf (in /usr/lib/libSystem.B.dylib)
==5787== by 0x100EE475F: snprintf_l (in /usr/lib/libSystem.B.dylib)
==5787== by 0x101132C76: _CFStringAppendFormatAndArgumentsAux
(in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==5787== by 0x10113127C:
_CFStringCreateWithFormatAndArgumentsAux (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==5787== by 0x100745FC0: -[NSPlaceholderString
initWithFormat:locale:arguments:] (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==5787== by 0x1007483A2: +[NSString stringWithFormat:] (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==5787== by 0x1000EC49A: ObjectDetail (PackageNames.m:2044)
==5787== by 0x1000E5244: -[PackageNames readNamePackagesOp] (PackageNames.m:693)
==5787== by 0x10118596B: __invoking___ (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==5787== by 0x10118583C: -[NSInvocation invoke] (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==5787== by 0x10084DD93: -[NSInvocationOperation main] (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==5787== by 0x10076FDE3: -[__NSOperationInternal start] (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==5787== Uninitialised value was created by a stack allocation
==5787== at 0x1000E4A36: -[PackageNames readNamePackagesOp] (PackageNames.m:595)
The same error it was complaining about -[NSData dealloc], but
this time it's complaining about my ObjectDetails function, at
the point where it is using object_getIvar() to read the value
of the _bytes variable.
So here's my guess as to what's going on: When +[NSData
dataWithBytes:length:] is given a buffer that's less than 16K,
it calls a special allocation routine that mallocs (not callocs)
the space for the object and all of the data. It then copies the
data into itself, but never sets the _bytes variable. When the
object is released, the code probably reads the _bytes variable
in some common dealloc code, but then decides that's there's no
memory block to release. But valgrind doesn't know that. It just
sees the code load the _bytes property and it knows that it's uninitliazed.
The reason the problem went away when I switch to malloc blocks,
or always copied the entire structure, is because that always
forces NSData to create a reference to an external memory block.
So it would appear that this entire exercise was chasing a red
hering, although I did learn a lot about valgrind. :)
Have you considered Valgrind's Ptrcheck tool? I don't actually know if it's
available for Mac OS X, but if it is, it could provide a different view on
this problem.
I will pursue that next.
James
James Bucanek
____________________________________________________________________
Author of Professional Xcode 3 ISBN: 9780470525227
<http://www.proxcode3.com/>
and Learn Objective-C for Java Developers ISBN: 9781430223696
<http://objectivec4java.com/>
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Xcode-users mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden