Re: archiving report
- Subject: Re: archiving report
- From: Tony Parker <email@hidden>
- Date: Wed, 27 Feb 2013 11:28:51 -0800
Hi Gerriet,
On Feb 26, 2013, at 10:56 AM, Gerriet M. Denkmann <email@hidden> wrote:
>
> On 27 Feb 2013, at 01:00, Gwynne Raskind <email@hidden> wrote:
>
>> On Feb 26, 2013, at 12:47 PM, Gerriet M. Denkmann <email@hidden> wrote:
>>> My investigations regarding archiving on OS X:
>>>
>>> 1. NSArchiver stores all strings in UTF-8.
>>> This is inefficient for strings which contain mainly non-European characters (e.g. Chinese or Thai), as one character will use 3 bytes (UTF-16 would use only 2).
>>> Corollary: It cannot store strings which contain illegal Unicode chars.
>>
>> And then in UTF-16, strings which contain mostly ASCII/European characters are wasting 2x space. Six of one, half dozen of the other. This is a very old debate and I'm grateful that Apple chose UTF-8 for storage, as UTF-16 makes things much more complicated.
>
> Or one could (as NSKeyedArchiver seems to do) choose the shortest representation,
>
>
>>> 2. NSKeyedArchiver seems to be ok.
>>> But it does create unnecessary data. E.g. in the case of an array containing identical objects, like:
>>> NSArray *a = @[ @"a", @"a", ...., @"a"];
>>> With 1 000 000 items it creates 10 000 395 bytes - my version creates only 1 000 332 bytes
>>> and the output is still readable by NSKeyedUnarchiver.
>>
>> Are you sure this is happening? NSKeyedArchiver is documented as doing deduplication of objects. If this is true, it's definitely a bug and there is no reason Apple wouldn't want it fixed.
>
> Just try it yourself:
> #define NBR 1000000
> NSMutableArray *m = [ NSMutableArray array ];
> for ( NSUInteger i = 0; i < NBR; i++ ) [ m addObject: @"a" ];
> NSData *dataKeyed = [ NSKeyedArchiver archivedDataWithRootObject: m ];
> NSLog(@"%s NSKeyedArchiver created %lu bytes", __FUNCTION__, (unsigned long)[dataKeyed length]);
> Then change NBR to 1000001 and compare.
>
Out of curiosity, what do you expect to happen if your string is @"ab" or something even longer, but repeated 1 million times? Your test implies that the answer is 2,000,000, but in fact the archive only grows by one more byte.

The string is being de-duplicated, but there is overhead associated with each object in the archive. The amount seems egregious for an object that is so small (a string with one character), but real-world archives are rarely 1-character strings repeated 1 million times.

Could the overhead be improved? Probably, but there are many tradeoffs to make. For example, keyed archives are used for nib files, which are read far more often than they are written. So we've made a decision to prefer read performance over write performance when we have to choose between them.

Regardless, I will look at the bug you filed and see if there are some improvements that can be made for this case without unduly impacting other, more common use cases.
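
For anyone who wants to check the point above, here is a minimal sketch of the comparison, assuming a Foundation command-line tool built with ARC. ArchivedLength is a hypothetical helper, not part of Foundation: it archives an array holding `count` references to the same string and returns the size of the archive. No particular byte counts are assumed; the exact numbers may differ between OS releases.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): archive an array that holds
// `count` references to the same string and return the archive size in bytes.
static NSUInteger ArchivedLength(NSString *string, NSUInteger count) {
    NSMutableArray *items = [NSMutableArray arrayWithCapacity:count];
    for (NSUInteger i = 0; i < count; i++) {
        [items addObject:string];
    }
    NSData *data = [NSKeyedArchiver archivedDataWithRootObject:items];
    return [data length];
}

int main(void) {
    @autoreleasepool {
        NSUInteger count = 1000000;

        // Same count, longer string: shows what the string contents cost.
        NSLog(@"\"a\"  x %lu: %lu bytes", (unsigned long)count,
              (unsigned long)ArchivedLength(@"a", count));
        NSLog(@"\"ab\" x %lu: %lu bytes", (unsigned long)count,
              (unsigned long)ArchivedLength(@"ab", count));

        // Same string, one more reference: shows the per-reference overhead.
        NSLog(@"\"a\"  x %lu: %lu bytes", (unsigned long)(count + 1),
              (unsigned long)ArchivedLength(@"a", count + 1));
    }
    return 0;
}
```

If the string is de-duplicated as described, the first two sizes should differ only slightly, and the difference between the first and third lines is the cost of one additional reference in the array.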
>>
>>> 3. NSKeyedUnarchiver has several bugs.
>>> The string $null is converted to nil.
>>> Very harmful if this string is part of a collection (like array, set or dictionary).
>>
>> It should have already been mangled by NSKeyedArchiver.
>
> Strings (other than keys) do NOT get mangled by NSKeyedArchiver.
>
>>
>>> If the key in encodeXXX:forKey: starts with a "$", NSKeyedArchiver correctly mangles it by prefixing
>>> another "$". But NSKeyedUnarchiver does not find these mangled keys and returns nil or 0.
>>
>> You can, as a workaround, consider keys prefixed by $ as reserved, however this is certainly a bug. The fact that no one has reported it/gotten it fixed in so much time shows that it's probably not a major issue, though.
>>
>>> I have not reported these bugs, as I am convinced that Apple has no interest in fixing these problems.
>>
>> This is the exact attitude that causes Apple to be perceived as not having interest. Please file the bugs - the engineers reading this list can't give high priority to things that developers don't report, as much as they'd probably like to.
>
> I have filed the $null bug. It came back as a duplicate with a very low ID number, meaning this bug has been known to Apple for several years. Still no fix.
>
> Gerriet.
>
Thank you for your bug reports. Yes, we do get them and we do listen to them.
- Tony
_______________________________________________
Cocoa-dev mailing list (email@hidden)