How to convert a UTF-8 byte offset into an NSString character offset?
- Subject: How to convert a UTF-8 byte offset into an NSString character offset?
- From: Jens Alfke <email@hidden>
- Date: Mon, 05 May 2014 12:06:14 -0700
How can I map a byte offset in a UTF-8 string back to the corresponding character offset in the NSString it came from?
I’m writing an Objective-C wrapper around a C text-tokenizer API that takes a UTF-8 string as input, and as part of its output returns byte ranges of words that it found. So my API takes an NSString, converts it to UTF-8, passes that to the C API, and then gets these byte offsets that it needs to convert into character offsets in the NSString.
I’ve looked through both the NSString and CFString APIs and didn’t see anything relevant to this. I know UTF-8 isn’t rocket science and I could pretty easily write my own function to scan through it counting characters, but I suspect I’d run into the differences between Unicode characters (code points) and the UTF-16 code units that NSString actually considers “characters”. I’d much rather let CF do this for me in an internally consistent way.
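For reference, here is a minimal sketch of the hand-rolled scan I have in mind. It assumes the input is valid UTF-8 and that the byte offset falls on a character boundary. The function name is hypothetical, not anything from CF or Foundation. It counts one UTF-16 code unit per UTF-8 sequence, or two for 4-byte sequences, since those encode code points outside the BMP and become surrogate pairs in an NSString.

```c
#include <stddef.h>

/* Hypothetical helper (not a CF/Foundation API): returns the number of
 * UTF-16 code units that correspond to the first `byteOffset` bytes of
 * `utf8`. Assumes valid UTF-8 and an offset on a character boundary. */
size_t utf16_offset_for_utf8_offset(const char *utf8, size_t byteOffset)
{
    size_t units = 0;
    for (size_t i = 0; i < byteOffset; i++) {
        unsigned char b = (unsigned char)utf8[i];
        if ((b & 0xC0) == 0x80) {
            continue;   /* continuation byte: its lead byte was already counted */
        }
        /* Lead bytes 0xF0 and above start 4-byte sequences, which map to a
         * surrogate pair (2 code units); everything else maps to 1. */
        units += (b >= 0xF0) ? 2 : 1;
    }
    return units;
}
```

Calling this once per token rescans from the start of the buffer each time. If the tokenizer returns ranges in increasing order, the scan could instead resume from the previous offset and add to a running count.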
—Jens