Re: Help DTrace gurus: suggestions for capturing a mis-allocated NSData object on a customer's system
- Subject: Re: Help DTrace gurus: suggestions for capturing a mis-allocated NSData object on a customer's system
- From: James Bucanek <email@hidden>
- Date: Sat, 20 Nov 2010 10:58:07 -0700
Ken,
Thanks for weighing in on this. All good questions, which I'll
try to answer.
Ken Thomases <mailto:email@hidden> wrote (Saturday, November 20, 2010 12:09 AM -0600):
On Nov 19, 2010, at 5:26 PM, James Bucanek wrote:
The problem seems to be that NSConcreteData is accessing, or trying to do
something with, the address of the buffer used to initialize
the NSData object. But the address is a stack frame automatic, and the NSData
object was created with +[NSData dataWithBytes:length:], which should copy the
contents of the bytes parameter, not hang onto it.
Except that wouldn't explain why it's uninitialized. Or, at least, I don't
see how it does. (I see how that would break things, of course.)
I think the term "uninitialized" in this context is a valgrind
concept. Valgrind runs your application in a simulator. It
associates a "valid-value" bit with every byte of memory. If
your application ever tries to read a byte of data that hasn't
been written yet, it catches it:
<http://valgrind.org/docs/manual/mc-manual.html#mc-manual.machine>
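As a rough illustration of the "valid-value" idea, here is a toy model in C that keeps one "defined" flag per byte of a small arena. Everything in it (ShadowArena, arena_write, arena_copy, arena_is_defined, ARENA_SIZE) is a hypothetical name invented for this sketch, not part of valgrind. Memcheck itself tracks validity per bit rather than per byte, and it only complains when an undefined value affects a conditional jump, an address, or a system call, so treat this as a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARENA_SIZE 64   /* hypothetical size, chosen only for the sketch */

/* Toy model: one "defined" flag per data byte (Memcheck uses one per bit). */
typedef struct {
    uint8_t data[ARENA_SIZE];
    uint8_t defined[ARENA_SIZE];   /* 1 = written at least once, 0 = never written */
} ShadowArena;

/* A fresh arena behaves like a new stack allocation: nothing is defined yet. */
static void arena_init(ShadowArena *a)
{
    memset(a->data, 0, sizeof a->data);
    memset(a->defined, 0, sizeof a->defined);
}

/* Writing a byte stores the value and marks that byte as defined. */
static void arena_write(ShadowArena *a, size_t off, uint8_t value)
{
    assert(off < ARENA_SIZE);
    a->data[off] = value;
    a->defined[off] = 1;
}

/* Copying carries the flags along with the data and reports nothing. */
static void arena_copy(ShadowArena *a, size_t dst, size_t src, size_t len)
{
    assert(dst + len <= ARENA_SIZE && src + len <= ARENA_SIZE);
    memmove(&a->data[dst], &a->data[src], len);
    memmove(&a->defined[dst], &a->defined[src], len);
}

/* Returns 1 if every byte in [off, off + len) has been written, else 0. */
static int arena_is_defined(const ShadowArena *a, size_t off, size_t len)
{
    assert(off + len <= ARENA_SIZE);
    for (size_t i = 0; i < len; i++) {
        if (!a->defined[off + i]) {
            return 0;
        }
    }
    return 1;
}
```

The point of arena_copy is that undefinedness travels with the bytes: a copy of never-written memory is still flagged long after the original buffer is gone, which is why a report can name a stack allocation as the origin of a value that is later read somewhere else.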
<reordering your reply a little...>
Which line of your real code corresponds to:
==78805== Uninitialised value was created by a stack allocation
==78805==    at 0x1000E4BA7: -[PackageNames readNamePackagesOp] (PackageNames.m:590)
The line in question is:
NSData* data = [NSData dataWithBytes:&batch length:offsetof(ReadBatch,set)+sizeof(PackageNameRecord)*batch.count];
Valgrind knows that &batch refers to an address that was
"created by a stack allocation". When the function returns,
valgrind marks all the bytes in that stack frame as invalid. Any
future attempt to read from or write to those addresses is caught
as an error:
==78805== Thread 9:
==78805== Conditional jump or move depends on uninitialised value(s)
==78805== at 0x1002AAD08: -[NSConcreteData dealloc] (in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation)
==78805== by 0x1000EA4B4: -[InsertPackageNamesOp dealloc] (PackageNames.m:1824)
Note that valgrind is not saying that the data was being
overwritten, but that NSConcreteData merely "depends on" (a.k.a.
"reads") information from an address space that formerly
belonged to the stack frame of -readNamePackagesOp, which should
now be considered as invalid/uninitialized.
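For anyone following along, here is a minimal C sketch of what the call at PackageNames.m:590 is expected to do: compute the length of the used prefix of the batch and copy that many bytes to the heap. The struct layouts, MAX_RECORDS, batch_used_length and batch_copy_used are all hypothetical, since the real ReadBatch and PackageNameRecord definitions are not shown in this thread. The memcpy stands in for the copy that +[NSData dataWithBytes:length:] is documented to make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RECORDS 64              /* hypothetical capacity */

typedef struct {                    /* hypothetical record layout */
    uint64_t id;
    char     name[24];
} PackageNameRecord;

typedef struct {                    /* hypothetical batch layout */
    uint32_t count;                 /* number of entries of set.names in use */
    struct {
        PackageNameRecord names[MAX_RECORDS];
    } set;
} ReadBatch;

/* Same arithmetic as the line in question: header bytes plus used records. */
static size_t batch_used_length(uint32_t count)
{
    assert(count <= MAX_RECORDS);
    return offsetof(ReadBatch, set) + sizeof(PackageNameRecord) * count;
}

/*
 * Copy only the used prefix of a (typically stack-allocated) batch to the
 * heap. The copy owns its bytes, so it stays valid after the source frame
 * is gone. Returns NULL if allocation fails; the caller frees the result.
 */
static void *batch_copy_used(const ReadBatch *batch, size_t *out_len)
{
    size_t len = batch_used_length(batch->count);
    void *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, batch, len);
    }
    *out_len = len;
    return copy;
}
```

One thing the sketch makes visible: with this hypothetical layout on a typical 64-bit ABI, offsetof(ReadBatch, set) covers padding bytes after count that ordinary field assignments never write, and the copy carries those bytes along unchanged. Whether the real structs have such padding is not something this thread establishes.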
I have no idea what NSConcreteData is doing with the information
it reads, but it looks a lot like it's the source of the problem
when my application crashes. This is from the crash log of the
customer who is encountering problems:
Thread 3 Crashed:
0   libSystem.B.dylib         0x00007fff85784e4e __semwait_signal_nocancel + 10
1   libSystem.B.dylib         0x00007fff85784d50 nanosleep$NOCANCEL + 129
2   libSystem.B.dylib         0x00007fff857e16a2 usleep$NOCANCEL + 57
3   libSystem.B.dylib         0x00007fff85800c75 __abort + 113
4   libSystem.B.dylib         0x00007fff85800cd9 abort_report_np + 0
5   libSystem.B.dylib         0x00007fff857186f5 free + 128
6   com.apple.Foundation      0x00007fff8637dd48 -[NSConcreteData dealloc] + 90
7   com.apple.CoreFoundation  0x00007fff854c91e6 _CFAutoreleasePoolPop + 230
8   com.apple.Foundation      0x00007fff8636e04b -[NSAutoreleasePool release] + 158
I suspect that the reason NSData is getting released from the
autorelease pool, instead of in -[InsertPackageNamesOp dealloc],
in the crash is due to timing, but I haven't confirmed that for
sure. Valgrind's simulator runs in a single thread and emulates
a multi-processor environment, so it's not surprising that a
release race might come out differently. Or it could be that the
first instance of this problem is caught and logged by valgrind,
but wouldn't crash the process under normal execution. There are
still lots of questions...
Do you ever take the -bytes of that data object and cast it to (RecordBatch*),
as opposed to (const RecordBatch*)? Likewise, accessing the 'set' member,
even from a (const RecordBatch*), will give you a non-const (Record*). Does
the code then, perhaps, accidentally modify the pointed-to data? And could it
write past the end of the 'set' field (not past the declared size, but of the
dynamic size in the 'count' field)? That might modify the NSConcreteData
internals, breaking things.
No, no, no, and no. The body of InsertPackageNamesOp is:
- (void)main
{
    ...
    const RecordBatch* batch = (const RecordBatch*)[data bytes];
    SortedIndexSlowLock(namesIndex);
    [namesIndex insertCount:batch->count
                    records:batch->set.names
        excludingDuplicates:YES];
    SortedIndexSlowUnlock(namesIndex);
}
'records:' is a (const PackageNameRecord*) parameter (I
simplified this to just Record* in the earlier code example).
The contents of this array are never modified by -insertCount:records:excludingDuplicates:.
Also, are you sure that -[InsertNamesOp initWithBatchData:] is properly
retaining the passed-in NSData? Have you run the static analyzer on your
code? Have you tried NSZombieEnabled=YES? Have you tried MallocScribble=1?
I can assure you this is not a retain/release/zombie problem.
I've run with zombies enabled, looked at the problem in
ObjectAlloc, with malloc scribble, reviewed the malloc history,
and even tried guard malloc. This NSData object is properly
retained and is being destroyed exactly when and where it should
be, in the -dealloc method of the InsertPackageNamesOp object
that it was assigned to.
James Bucanek
____________________________________________________________________
Author of Professional Xcode 3 ISBN: 9780470525227
<http://www.proxcode3.com/>
and Learn Objective-C for Java Developers ISBN: 9781430223696
<http://objectivec4java.com/>