Re: archiving report
- Subject: Re: archiving report
- From: Kyle Sluder <email@hidden>
- Date: Wed, 27 Feb 2013 23:41:00 -0800
On Feb 27, 2013, at 10:46 PM, "Gerriet M. Denkmann" <email@hidden> wrote:
>
> On 28 Feb 2013, at 02:28, Tony Parker <email@hidden> wrote:
>
>> On Feb 26, 2013, at 10:56 AM, Gerriet M. Denkmann <email@hidden> wrote:
>>>
>>> On 27 Feb 2013, at 01:00, Gwynne Raskind <email@hidden> wrote:
>>>
>>>>> 2. NSKeyedArchiver seems to be ok.
>>>>> But it does create unnecessary data. E.g. in the case of an array containing identical objects, like:
>>>>> NSArray *a = @[ @"a", @"a", ...., @"a"];
>>>>> With 1 000 000 items it creates 10,000,395 bytes - my version creates only 1 000 332 bytes
>>>>> and the output is still readable by NSKeyedUnarchiver.
>>>>
>>>> Are you sure this is happening? NSKeyedArchiver is documented as doing deduplication of objects. If this is true, it's definitely a bug and there is no reason Apple wouldn't want it fixed.
>>>
>>> Just try it yourself:
>>> #define NBR 1000000
>>> NSMutableArray *m = [ NSMutableArray array ];
>>> for ( NSUInteger i = 0; i < NBR; i++ ) [ m addObject: @"a" ];
>>> NSData *dataKeyed = [ NSKeyedArchiver archivedDataWithRootObject: m ];
>>> NSLog(@"%s NSKeyedArchiver created %lu bytes", __FUNCTION__, (unsigned long)[dataKeyed length]);
>>> Then change NBR to 1000001 and compare.
>>
>> Out of curiosity, what do you expect to happen if your string is @"ab" or something even longer, but repeated 1 million times? Your test implies that the answer is 2,000,000 but in fact the answer is that it only grows one more byte.
> True.
>
> But what happens when you increase the number of items in the array from 1 000 000 to 1 000 001?
> Answer: the output of NSKeyedArchiver grows by 10 bytes.
>
> There is just one StringThing like 0x61 'a' at index 0x9 (arbitrary number). Not a million of them. Good.
>
> But then: for every item in the array there is one ObjectReference. They are all identical, like 0x80 0x09, and all reference the thing at index 9, which is the StringThing "a" (actually referencing $objects[9], which I assume contains the index 9 of our StringThing).
> These 1 000 000 ObjectReference things have the indices 10 ... 1000010.
>
> Then there is an ArrayThing like 0xaf 12 00 0f 42 40 followed by 1 000 000 indices, which are all 0x00 00 00 0a (= 10), referencing the first ObjectReference at index 0xa, which points to the StringThing "a".
>
> At the end there is a table with the offsets for all 1000011 things.
>
> Note: There are 999 999 unused and useless ObjectReferences (each 2 bytes) (at indices 11 ... 1000010).
> And there are 999 999 unused and useless offsets (each 4 bytes) for these.
>
> Also: because of the useless 999 999 things, all 1 000 000 references in the array have to be 4 bytes long (otherwise 1 byte would be enough).
>
> See bug: NSKeyedArchiver creates bloated archives. The tracking number for this issue is Bug ID# 13303422.
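For what it's worth, the 10 bytes per element quoted above can be reproduced with a little arithmetic. What follows is a rough sketch in plain C (which also compiles as Objective-C), not a description of how NSKeyedArchiver is actually implemented. The function names are made up for illustration, and the assumption that index widths come in 1, 2, 4 or 8 bytes is inferred from the numbers in this thread, not from Apple's sources.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest width in bytes (1, 2, 4 or 8) that can hold the indices
   0 .. object_count-1. ASSUMPTION: only these widths are used. */
unsigned ref_width(uint64_t object_count)
{
    if (object_count <= 0x100ULL)       return 1;
    if (object_count <= 0x10000ULL)     return 2;
    if (object_count <= 0x100000000ULL) return 4;
    return 8;
}

/* Bytes added per array element when every element gets its own
   ObjectReference, as described in the quoted message:
   one 2-byte ObjectReference + one offset-table entry + one array slot. */
unsigned bytes_per_item_with_duplicates(uint64_t object_count,
                                        unsigned offset_width)
{
    assert(offset_width == 1 || offset_width == 2 ||
           offset_width == 4 || offset_width == 8);
    const unsigned object_reference = 2; /* marker byte + 1-byte index */
    return object_reference + offset_width + ref_width(object_count);
}

/* Bytes added per array element when all elements share a single
   ObjectReference: only the array slot itself grows. */
unsigned bytes_per_item_shared(uint64_t object_count)
{
    return ref_width(object_count);
}
```

With the 1000011 things and 4-byte offsets mentioned above, `bytes_per_item_with_duplicates(1000011, 4)` gives 2 + 4 + 4 = 10, matching the observed growth. With roughly a dozen things in total, `bytes_per_item_shared(11)` gives 1, which is consistent with the 1 000 332 byte figure for a million elements.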
All you have shown is that NSKeyedArchiver is not optimized for your contrived use case. Where are your stats about serializing and deserializing real-world archives? For NIBs versus the kind of object graphs that are sent over DO connections?
How often do you honestly think someone creates an archive containing a million references to equal but distinct string objects? Tony has already made it clear that this is a case NSKeyedArchiver is not optimized for.
>
> And please note also that after removing these useless bytes the archive is still readable by the current NSKeyedUnarchiver.
This isn't guaranteed. Perhaps at some point Apple determined the effort required to thin out the offset map wasn't worth it for real-world use cases, and so removed it from NSKeyedArchiver while maintaining read-compatibility in NSKeyedUnarchiver. Or perhaps leaving the big table of offsets enables some other useful optimization. It's my understanding that NSKeyedUnarchiver does some crazy tricks with in-place object instantiation.
>
>>>
>>> I have filed the $null bug. It came back as a duplicate with a very low ID number, meaning this bug has been known to Apple for several years. Still no fix.
>>
>> Thank you for your bug reports. Yes, we do get them and we do listen to them.
> Listening is nice. Acting would be kind of better though.
This is a real bug with actual potential security and stability implications, and you're right to be concerned about it. But your other points are high on bluster and low on data. A contrived case of a million letter A's is not enlightening.
--Kyle Sluder
_______________________________________________
Cocoa-dev mailing list (email@hidden)