Re: How to count composed characters in NSString?
- Subject: Re: How to count composed characters in NSString?
- From: "Michael Ash" <email@hidden>
- Date: Mon, 29 Sep 2008 23:33:36 -0400
On Mon, Sep 29, 2008 at 12:52 AM, Michael Gardner <email@hidden> wrote:
> But composed character sequences aren't the problem; surrogate pairs are.
> Composed character sequences can be taken care of by using either
> -precomposedStringWithCanonicalMapping or
> -precomposedStringWithCompatibilityMapping. In my opinion, -length should
> take surrogate pairs into account, which is what the docs seem to imply.
The NSString API is inherently either UCS-2 or UTF-16. As UCS-2
doesn't cover all of Unicode, it ends up being UTF-16.
The API defines NSString as an ordered collection of 16-bit unichars.
The length is necessarily the number of 16-bit unichars in the string;
nothing else would really make sense. Short of creating a new API that
works on pure Unicode code points, the only thing to do is to document
the fact that -length gives you the number of UTF-16 code units, not
the number of Unicode characters.
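
To make the difference concrete, here is a minimal sketch in plain C (which also compiles as Objective-C) of how one could count code points in a buffer of 16-bit code units. The helper name count_code_points is hypothetical and not part of Foundation, and it assumes an unpaired surrogate should count as one code point:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Foundation: counts Unicode code points
   in a buffer of UTF-16 code units, the same 16-bit units -length counts. */
static size_t count_code_points(const uint16_t *units, size_t length)
{
    size_t count = 0;
    size_t i = 0;

    while (i < length) {
        /* A high surrogate (0xD800-0xDBFF) followed by a low surrogate
           (0xDC00-0xDFFF) encodes a single code point in two units. */
        if (units[i] >= 0xD800 && units[i] <= 0xDBFF &&
            i + 1 < length &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            i += 2;
        } else {
            /* A BMP character, or (by assumption) an unpaired surrogate
               counted as one code point. */
            i += 1;
        }
        count++;
    }
    return count;
}
```

For the three units 0x0061, 0xD834, 0xDD1E ("a" followed by U+1D11E), -length reports 3, while this helper returns 2. Note that this still counts code points, not composed character sequences.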
(As an aside, changing the API to work with Unicode code points is
something I don't think is really worthwhile. Aside from having to
support the old API, which would no doubt be a great deal of hassle,
Unicode code points are pretty useless on their own anyway. You always
end up having to convert and deal with precomposed characters and all
the rest of the Unicode mess regardless. Adding surrogate pairs to all
of that really doesn't increase the burden any further.)
Mike
_______________________________________________
Cocoa-dev mailing list (email@hidden)