NSString's handling of Unicode extension B (and C) characters

  • Subject: NSString's handling of Unicode extension B (and C) characters
  • From: Ryan Homer <email@hidden>
  • Date: Thu, 5 Nov 2009 10:39:19 -0500

Unicode 3.1 (2001) brought us Extension B (AFAIK) and the recent Unicode 5.2 (2009-10-01) brings us Extension C. It seems to me that NSString's length method/property does not return the proper length for these characters.

Starting with a small example,

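	// \u takes exactly four hex digits; code points above U+FFFF need \U and eight digits.
	NSString *s = @"\u7075\U000247e5";
	NSLog(@"length=%lu", (unsigned long)s.length);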

you'd think that the result would be 2. It is, however, 3. The first character (U+7075) is a Chinese character from the CJK Unified Ideographs range in the Han category. The second one (U+247E5) is from the Han Extension B range. In my very limited testing, this only seems to occur for Extension B and C characters, not Extension A. I'm wondering if this is a bug in the way NSString handles Extension B and C characters.

There are many characters that require more than one byte for their internal Unicode representation. As far as I understand it, though, NSString still counts each such character as ONE character, regardless of the number of bytes. So I was surprised to get a length of 3 in the above example.
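
One possible explanation, offered as a sketch rather than a definitive answer: NSString's length is documented as a count of UTF-16 code units, and any code point above U+FFFF (which includes Extensions B and C, but not Extension A) takes two code units, a surrogate pair. The plain C below (it also compiles as Objective-C) illustrates that arithmetic. The helper names utf16_encode and utf16_length are made up for this illustration and are not Foundation API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
 * Writes 1 or 2 code units into out and returns how many were written.
 * Returns 0 for values that are not valid Unicode scalar values. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2]) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0; /* out of range, or a lone surrogate */
    }
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp; /* BMP: a single code unit */
        return 1;
    }
    cp -= 0x10000; /* 20 bits remain, split 10/10 */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate */
    return 2;
}

/* Hypothetical helper: total UTF-16 code units for an array of code
 * points, which is the kind of count that -[NSString length] reports. */
static size_t utf16_length(const uint32_t *cps, size_t count) {
    uint16_t units[2];
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += utf16_encode(cps[i], units);
    }
    return total;
}
```

For the two code points in my example, U+7075 and U+247E5, utf16_length gives 1 + 2 = 3, and U+247E5 encodes as the pair 0xD851 0xDFE5. If that is what is happening, the length of 3 would be expected behaviour rather than a bug, and something like rangeOfComposedCharacterSequenceAtIndex: may be the way to step through a string one user-visible character at a time.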

Can someone provide any insight on this? I am thinking of filing a bug with Apple, but I would like to hear what other people think about this situation first, as I'm not very well versed in the intricacies of Unicode.
_______________________________________________


Cocoa-dev mailing list (email@hidden)
