Re: How to count composed characters in NSString?
- Subject: Re: How to count composed characters in NSString?
- From: David Niemeijer <email@hidden>
- Date: Sun, 28 Sep 2008 20:17:26 +0200
Michael,
On 28 sep 2008, at 14:41, Michael Gardner wrote:
> Upon further investigation, I may be wrong. I based my assertion
> upon Apple's NSString documentation ("Returns the number of Unicode
> characters in the receiver"), and upon some quick tests I ran. But
> this reply made me look into the issue in greater depth.
> I re-did my tests more thoroughly, and it does appear that -length
> returns the number of 16-bit words (code units), not the number of
> Unicode characters (code points), in the string. If this is true, I
> would call it a bug either in the code or in the documentation,
> which David should submit to Apple.
I think the docs are clear. In the discussion section for "length" it
says: "The number returned includes the individual characters of
composed character sequences, so you cannot use this method to
determine if a string will be visible when printed or how long it will
appear."
I did file a bug (ID 6253075) as you suggested, because I think there
should be a simple API for this.
> I apologize for the apparent misinformation in my previous, hasty
> reply.
Well, I made an error too. I suggested that on 10.5 the
CFStringTokenizer could be used, but only now noticed that it only
supports larger units (words and up). Thus there is no easy API to
count the number of characters in a way that treats surrogate pairs or
other "long" Unicode characters as a single character.
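As a minimal sketch of what counting code points rather than code units involves, the plain C function below walks a UTF-16 buffer and counts each valid surrogate pair once. The function name, and the idea of first copying the string's contents into a buffer (for instance with -getCharacters:range:), are assumptions for illustration and were not proposed in this thread. It still counts a base letter followed by a combining mark as two, so it does not yet give user-perceived characters; for that one would step through the string with -rangeOfComposedCharacterSequenceAtIndex: instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count Unicode code points in a UTF-16 buffer
   holding `len` code units. A high surrogate immediately followed by
   a low surrogate counts as one code point; an unpaired surrogate is
   counted as one on its own. */
size_t count_code_points(const uint16_t *units, size_t len)
{
    size_t count = 0;
    size_t i = 0;

    while (i < len) {
        int is_high = units[i] >= 0xD800 && units[i] <= 0xDBFF;
        int next_is_low = (i + 1 < len) &&
                          units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF;

        /* Skip both halves of a valid pair, otherwise a single unit. */
        i += (is_high && next_is_low) ? 2 : 1;
        count++;
    }
    return count;
}
```

For the four code units 0x0041, 0xD834, 0xDD1E, 0x0042 ("A", U+1D11E, "B") this returns 3 where -length would report 4. For 0x0065, 0x0301 ("e" plus combining acute accent) it returns 2, which is the limitation noted above.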
> In the meantime, David, perhaps you can find a library that can
> work with UTF-8 strings. What are you using the length values for?
I need to be able to display the number of characters to the user in a
way that makes sense to them. If they see 3 I should report 3. I also
need it to cut off certain input at a given number of "real"
characters, and it should not generate results that only make sense
for a language like English, where each 16 bits equals a single
character.
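The cut-off requirement can be sketched the same way. Assuming the UTF-16 code units are available in a buffer, the hypothetical C helper below returns how many code units to keep so that at most a given number of code points survive and no surrogate pair is split. It works at the code point level only; keeping combining sequences intact would additionally need composed character sequence boundaries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the number of leading UTF-16 code units
   that contain at most `max_points` code points, never cutting between
   the two halves of a valid surrogate pair. */
size_t prefix_units_for_code_points(const uint16_t *units, size_t len,
                                    size_t max_points)
{
    size_t i = 0;
    size_t points = 0;

    while (i < len && points < max_points) {
        int is_high = units[i] >= 0xD800 && units[i] <= 0xDBFF;
        int next_is_low = (i + 1 < len) &&
                          units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF;

        /* Advance past a whole pair or a single unit. */
        i += (is_high && next_is_low) ? 2 : 1;
        points++;
    }
    return i;
}
```

With the code units 0x0041, 0xD834, 0xDD1E, 0x0042 and a limit of 2, the result is 3: the "A" plus both halves of the surrogate pair. The returned value could then serve as the length of the range to keep.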
Using some kind of UTF-8 library may be possible, but that would
require converting all the time between UTF-16 and UTF-8, which is not
efficient for a program that has to do a lot of these kinds of
calculations.
david.