Re: NSString accessing characters
- Subject: Re: NSString accessing characters
- From: Andreas Grosam <email@hidden>
- Date: Fri, 17 Jun 2011 12:35:24 +0200
Thank you, Ken, for your valuable tips.
On Jun 17, 2011, at 10:40 AM, Ken Thomases wrote:
> On Jun 17, 2011, at 2:46 AM, Andreas Grosam wrote:
>
>> Given an NSString as input source, what is the fastest method to "feed" the parser?
>>
>> Also worth mentioning is the possibility of hidden autoreleased objects, for instance when retrieving C strings from the NSString object or when converting the NSString's internal encoding to some specified external form.
>
> First, you're probably prematurely optimizing. The chance that accessing the string contents will be a significant portion of the time taken by parsing is small.
If possible, I would prefer to avoid any conversions NSString performs as a result of accessing its content. The parser is capable of parsing any Unicode encoding form, so if possible I would just take the NSString's content as is, provided it is encoded in a Unicode form and, of course, that I am able to figure out which encoding it actually is.
Given the parser's speed, any encoding conversion made by NSString would be a significant performance penalty.
A priori I cannot determine which UTF encoding form the original source is encoded in, but in almost all cases it is UTF-8.
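When the raw source bytes are available (rather than an already-constructed NSString), a byte order mark is one way to narrow down the encoding form. The following is a minimal C++ sketch under that assumption; UtfForm and detect_bom are hypothetical names, not part of the parser or of Cocoa. Data without a BOM is reported as Unknown, which a caller could then treat as UTF-8, the common case.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical set of encoding forms a parser might accept.
enum class UtfForm { Unknown, UTF8, UTF16BE, UTF16LE, UTF32BE, UTF32LE };

// Inspect the first bytes of a buffer for a byte order mark (BOM).
// Returns the detected form and stores the BOM length in *bom_len
// (0 if no BOM was found).
// UTF-32LE is tested before UTF-16LE because its BOM (FF FE 00 00)
// starts with the UTF-16LE BOM (FF FE).
UtfForm detect_bom(const unsigned char* p, std::size_t n, std::size_t* bom_len) {
    *bom_len = 0;
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF) {
        *bom_len = 4;
        return UtfForm::UTF32BE;
    }
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00) {
        *bom_len = 4;
        return UtfForm::UTF32LE;
    }
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) {
        *bom_len = 3;
        return UtfForm::UTF8;
    }
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) {
        *bom_len = 2;
        return UtfForm::UTF16BE;
    }
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) {
        *bom_len = 2;
        return UtfForm::UTF16LE;
    }
    return UtfForm::Unknown;  // no BOM: the caller decides, e.g. assume UTF-8
}
```

A BOM is optional, so this only helps when the producer wrote one; it cannot distinguish BOM-less UTF-8 from other encodings.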
>
> That said, I guess you should try CFStringGetCharactersPtr() and CFStringGetCStringPtr() first. If either returns non-NULL, then that's about as fast as you're going to get.
Aha! I read the docs and that's probably what I need.
If I understand the functions correctly, CFStringGetCharactersPtr() returns non-NULL if the internal representation is UTF-16 (in host byte order, I guess). If CFStringGetCStringPtr(theString, kCFStringEncodingUTF8) returns non-NULL, the internal representation can be read directly as UTF-8.
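The lookup order Ken suggests can be sketched in plain C++. This is a hypothetical illustration that does not call CoreFoundation itself: the utf16 and utf8 arguments stand in for the results of CFStringGetCharactersPtr(str) and CFStringGetCStringPtr(str, kCFStringEncodingUTF8), utf16_len stands in for CFStringGetLength(str) (which counts UTF-16 code units), and feed, Path and the three callbacks are made-up names. UniChar is a 16-bit unsigned type, so std::uint16_t is used for it here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Which path was taken; useful for logging or measuring how often
// the zero-copy cases actually hit.
enum class Path { Utf16Direct, Utf8Direct, Fallback };

// Try the zero-copy cases first, in the order suggested above.
// Either pointer may be null; the fallback must always exist.
template <class OnUtf16, class OnUtf8, class OnFallback>
Path feed(const std::uint16_t* utf16, std::size_t utf16_len, const char* utf8,
          OnUtf16 on_utf16, OnUtf8 on_utf8, OnFallback fallback) {
    if (utf16 != nullptr) {
        // Direct access to 16-bit storage: hand [begin, end) to the parser.
        on_utf16(utf16, utf16 + utf16_len);
        return Path::Utf16Direct;
    }
    if (utf8 != nullptr) {
        // Direct access to a NUL-terminated 8-bit C string.
        on_utf8(utf8, utf8 + std::strlen(utf8));
        return Path::Utf8Direct;
    }
    // Slow path: copy and/or convert the contents first.
    fallback();
    return Path::Fallback;
}
```

Since both CF calls are allowed to return NULL for reasons that depend on how the string was created, the ordering only decides which cheap case is tried first; the fallback still has to be written.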
Now, there is the possibility that the original source from which the NSString was initialized is provided as UTF-32, but this is rather unlikely. In that case, I guess, NSString will convert it internally to UTF-16 or some other representation, so I may need to convert to a UTF encoding form myself before feeding the parser.
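If the conversion from UTF-32 ever had to be done by hand rather than by NSString, it is short. This is a minimal standard C++ sketch; utf32_to_utf16 is a hypothetical helper, not a Cocoa API. It splits code points above U+FFFF into surrogate pairs and rejects values that are not Unicode scalar values.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Convert UTF-32 code points to UTF-16 code units.
// Throws std::invalid_argument for surrogate code points (U+D800..U+DFFF)
// and for values above U+10FFFF.
std::u16string utf32_to_utf16(const std::u32string& in) {
    std::u16string out;
    out.reserve(in.size());  // at least one unit per code point
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (cp < 0x10000) {
            // Basic Multilingual Plane: one code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary planes: 20 bits split across a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
        }
    }
    return out;
}
```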
> After that, maybe use CFStringInitInlineBuffer() and CFStringGetCharacterFromInlineBuffer(), although that doesn't fit with your iterator interface. You could wrap an iterator around them easily, though.
Yes, I can wrap any kind of iterators around the pointers. The parser just requires the semantics of an Input Iterator. In fact, internally the parser applies an iterator adapter anyway to support byte-swapping, if needed.
(The iterator interface exists to support streams. I haven't considered using NSStreams so far, but hopefully that will work as well.)
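A byte-swapping adapter of the kind mentioned above might look roughly like the following C++ sketch. ByteSwap16Iterator is a hypothetical name, not the parser's actual code; it assumes 16-bit code units and provides only what the Input Iterator requirements ask for (dereference, pre- and post-increment, equality).

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>

// Wraps an iterator over 16-bit code units and swaps the two bytes of each
// unit on dereference, e.g. to read UTF-16BE data on a little-endian host.
template <class It>
class ByteSwap16Iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::uint16_t;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = std::uint16_t;  // yields a value, not a reference

    explicit ByteSwap16Iterator(It it) : it_(it) {}

    // Swap high and low byte of the current unit.
    std::uint16_t operator*() const {
        std::uint16_t u = static_cast<std::uint16_t>(*it_);
        return static_cast<std::uint16_t>((u << 8) | (u >> 8));
    }
    ByteSwap16Iterator& operator++() { ++it_; return *this; }
    ByteSwap16Iterator operator++(int) { ByteSwap16Iterator tmp = *this; ++it_; return tmp; }

    friend bool operator==(const ByteSwap16Iterator& a, const ByteSwap16Iterator& b) {
        return a.it_ == b.it_;
    }
    friend bool operator!=(const ByteSwap16Iterator& a, const ByteSwap16Iterator& b) {
        return !(a == b);
    }

private:
    It it_;
};
```

For example, UTF-16BE data loaded into a uint16_t array on a little-endian Mac could be handed to the parser as a pair of ByteSwap16Iterator<const std::uint16_t*> objects, while host-order data (such as the pointer from CFStringGetCharactersPtr()) would be passed unwrapped.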
I don't fully understand the purpose of the "inline buffer" facility, though. Is accessing an NSString's content through an inline buffer faster than reading it through a raw pointer whose encoding is known?
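For reference, the CFString.h header declares CFStringGetCharacterFromInlineBuffer() as an inline function that reads straight from the string's storage when a direct pointer is available, and otherwise refills a small local buffer with CFStringGetCharacters(). Its gain is therefore mainly over calling CFStringGetCharacterAtIndex() once per character, not over a raw pointer. The C++ sketch below models only the buffered part of that idea; ChunkedReader and its fetch callback are hypothetical, the chunk size of 64 is an arbitrary choice here, and CF's real implementation differs in details.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

// Chunked access: characters are copied in fixed-size blocks into a small
// local array, so most reads are plain array accesses instead of calls.
class ChunkedReader {
public:
    // Hypothetical bulk-copy callback: copy `count` code units starting at
    // `start` into `dst` (stands in for something like CFStringGetCharacters).
    using Fetch = std::function<void(std::size_t start, std::size_t count, std::uint16_t* dst)>;

    ChunkedReader(std::size_t length, Fetch fetch)
        : length_(length), fetch_(std::move(fetch)) {}

    // Returns the code unit at index i. Precondition: i < length.
    std::uint16_t at(std::size_t i) {
        if (i < start_ || i >= start_ + filled_) {
            // Miss: refill the buffer starting at i.
            start_ = i;
            filled_ = (length_ - i < kChunk) ? length_ - i : kChunk;
            fetch_(start_, filled_, buf_);
            ++fetches_;
        }
        return buf_[i - start_];
    }

    // Number of bulk copies performed so far.
    std::size_t fetches() const { return fetches_; }

private:
    static constexpr std::size_t kChunk = 64;
    std::size_t length_;
    Fetch fetch_;
    std::uint16_t buf_[kChunk] = {};
    std::size_t start_ = 0;
    std::size_t filled_ = 0;
    std::size_t fetches_ = 0;
};
```

Reading 200 units sequentially through this reader costs four bulk copies instead of 200 per-character calls; with a direct pointer, even those copies disappear.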
> After that, it probably doesn't matter much. You'll be doing some combination of allocation and encoding conversion no matter what. So, go with the most convenient method, which will probably be back in Objective-C.
>
OK, thank you very much for the tips!
Andreas
> Regards,
> Ken
>
_______________________________________________
Cocoa-dev mailing list (email@hidden)