Re: NSString accessing characters
- Subject: Re: NSString accessing characters
- From: Andreas Grosam <email@hidden>
- Date: Fri, 17 Jun 2011 13:19:03 +0200
On Jun 17, 2011, at 11:05 AM, Quincey Morris wrote:
> On Jun 17, 2011, at 00:46, Andreas Grosam wrote:
>
>> Given an NSString as input source, what is the fastest method to "feed" the parser?
>
> As usual, Ken's answer is better than the one I was composing, but I don't think *any* answer is of use to you unless you specify what representation your parser is converting *to*.
It does not convert at all. It can understand any Unicode encoding form, though.
Technically, the parser is a C++ template class with the iterator type and the encoding form as template arguments. Encoding-specific behavior is handled through specializations, function overloading and argument-dependent name lookup.
>
> If it's something that NSString can encode to, then you're probably better off not using your parser.
The parser does not encode at all; it just iterates over the text. Its purpose is to parse JSON text. The result of the parser is a JSON container, which for Objective-C is implemented with standard Foundation containers.
>
> If not, it might be fastest to first convert the string to whatever encoding your parser is itself fastest at parsing.
This will probably be UTF-32 or UTF-16, but it doesn't matter much. Compared to the cost of creating the JSON container, parsing is very fast. Converting an NSString's content to a different encoding before feeding the parser would be a noticeable cost. Even determining the length of an NSString in a given encoding is a cost that must be considered, since it is O(n).
A conversion may also populate the autorelease pool, which is undesirable.
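To show why the length in another encoding cannot be obtained in constant time, here is a minimal sketch that computes the number of UTF-8 bytes needed for UTF-16 text. A `std::u16string` stands in for the string's UTF-16 content, and the function name `utf8_length` is hypothetical; this is not how NSString itself is implemented.

```cpp
#include <cstddef>
#include <string>

// Number of UTF-8 bytes needed to encode a UTF-16 string.
// Every code unit has to be inspected, so the cost is O(n).
std::size_t utf8_length(const std::u16string& s) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t u = s[i];
        if (u < 0x80) {
            bytes += 1;                      // ASCII
        } else if (u < 0x800) {
            bytes += 2;
        } else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < s.size() &&
                   s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            bytes += 4;                      // surrogate pair: one code point
            ++i;                             // skip the low surrogate
        } else {
            bytes += 3;                      // rest of the BMP (a lone surrogate
                                             // is counted as 3 bytes here)
        }
    }
    return bytes;
}
```

The loop touches every code unit once, and that is before any bytes have been converted or copied.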
>
> Or, if your parser is very fast for all the encodings it accepts, try using '[NSString fastestEncoding]' to see if it's one that your parser can handle.
I've thought about that as well. But the documentation does not say whether the NSString's content has to be converted to that encoding before I can actually access it in this "fastest" way. It says that fastestEncoding returns the encoding in which character retrieval is fastest and in which there is no loss of information. It does not say anything about the cost of a possible conversion.
Otherwise, I don't fully understand the method.
> I'd bet that for very large strings (the only ones you'd care about from a performance perspective, probably) 'fastestEncoding' is almost always UTF-8 or UTF-16, because such strings are statistically likely to have originated in a file in one of those two representations.
You are right about this. The parser's input almost always originates from a network connection. And here, the JSON text is most likely encoded in UTF-8.
However, if I have to deal with "very large input", I guess NSString is not an option anyway, not only for performance reasons but also from a memory footprint perspective (iff my current understanding of NSString is correct, namely that its content is stored in memory).
So for potentially large input I cannot simply keep the whole text in memory, given that the parser might be used on iPhones and similar devices. Instead, large text is provided as a stream from a network connection. The parser must support this form of input as well, so that it can parse while the input is still being read, without wasting memory on buffering the whole text or CPU cycles on saving it to disk. The encoding will be determined at runtime at the very start of the stream, which is possible. Most likely, as you mentioned, it is UTF-8.
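One way the encoding can be determined from the start of a stream is the rule in RFC 4627, section 3: the first two characters of a JSON text are always ASCII, so the pattern of zero octets in the first four octets identifies the encoding form. The sketch below assumes that approach; whether the parser discussed here works this way is an assumption, and the names `json_encoding` and `detect_json_encoding` are hypothetical. It does not handle a byte order mark.

```cpp
#include <cstddef>

enum class json_encoding { utf8, utf16be, utf16le, utf32be, utf32le, unknown };

// Inspect the first four octets of a JSON text (RFC 4627, section 3).
// Sketch only: inputs shorter than four octets are reported as unknown,
// and a leading byte order mark is not recognized.
json_encoding detect_json_encoding(const unsigned char* p, std::size_t n) {
    if (n < 4) return json_encoding::unknown;
    const bool z0 = p[0] == 0, z1 = p[1] == 0, z2 = p[2] == 0, z3 = p[3] == 0;
    if ( z0 &&  z1 &&  z2 && !z3) return json_encoding::utf32be;  // 00 00 00 xx
    if ( z0 && !z1 &&  z2 && !z3) return json_encoding::utf16be;  // 00 xx 00 xx
    if (!z0 &&  z1 &&  z2 &&  z3) return json_encoding::utf32le;  // xx 00 00 00
    if (!z0 &&  z1 && !z2 &&  z3) return json_encoding::utf16le;  // xx 00 xx 00
    if (!z0 && !z1 && !z2 && !z3) return json_encoding::utf8;     // xx xx xx xx
    return json_encoding::unknown;
}
```

Only the first four octets need to be available before parsing can begin, so this fits input that arrives incrementally from a network connection.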
Regards
Andreas