Re: NSString accessing characters
- Subject: Re: NSString accessing characters
- From: Jens Alfke <email@hidden>
- Date: Fri, 17 Jun 2011 08:46:41 -0700
On Jun 17, 2011, at 3:35 AM, Andreas Grosam wrote:
> If possible, I would prefer to avoid any conversions performed by NSString as a result of accessing the content in any way. The parser is capable of parsing any Unicode encoding form, so if possible, I would just take the NSString's content "as is" - if it is encoded in a Unicode form, and - of course - if I am able to figure out what the actual encoding is.
An NSString internally uses one of two encodings. One is UTF-16, the other is the process’s “default encoding”. The default encoding varies by locale, I think, but is generally the highly obsolete MacRoman*. Yes, this means that NSString never stores its contents in UTF-8. :(
[Disclaimer: This may have changed since circa 2006, the last time I looked at CFString’s internals.]
Given your requirements, I think you’re best off reading the string as UTF-16. Probably the fastest way to go is CFStringGetCharacterFromInlineBuffer, an optimized, inlined accessor that pulls characters out of the string in chunks and serves them from a local buffer.
> Given the parser's speed, any encoding conversion made by NSString is a significant performance penalty.
I believe you. Back when I studied compilers in college, I was told that something like half the parse time is spent in the lexer, simply because it operates on every character rather than every token.
> The parser does not encode at all, it just iterates over the text. The purpose is parsing a JSON text. The result of the parser is a JSON container, which for Objective-C is implemented in standard foundation containers.
You may be reinventing the wheel: there are several different JSON parsers for Cocoa already. The one I use is SBJson <http://stig.github.com/json-framework/>. If you already know about these and think yours is better, I’d love to know about it (and maybe help out)!
—Jens
* An 8-bit encoding based on ASCII but not at all the same as ISO-8859. (I think it predates it.)
_______________________________________________
Cocoa-dev mailing list (email@hidden)