Re: characters in cocoa


  • Subject: Re: characters in cocoa
  • From: Douglas Davidson <email@hidden>
  • Date: Mon, 10 Sep 2007 10:19:00 -0700


On Sep 10, 2007, at 8:21 AM, Clark Cox wrote:

> Ah, but UTF-16 code units are not characters; the term "UTF-16
> character" is meaningless. For the BMP, there *is* a one-to-one
> correspondence between UTF-16 code units and Unicode code points, but
> this is not true in the general case. Outside of the BMP, it takes two
> UTF-16 code units to represent a single Unicode code point.

We have this terminology problem for historical reasons; characterAtIndex: antedates the introduction of surrogate pairs. Whatever the terminology, NSStrings are conceptually UTF-16, and -length and the related methods reflect that. (This is a common practice in other frameworks as well.)
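
To make the code unit versus code point distinction concrete, here is a minimal sketch in Python rather than Cocoa. The helper name utf16_units is made up for this illustration. Note that Python's own len() counts code points, so the UTF-16 code unit count (which is what a UTF-16 based length reports) has to be computed by encoding the string first.

```python
def utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s (hypothetical helper, not a Cocoa API)."""
    data = s.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


e_acute = "\u00e9"      # U+00E9, inside the BMP
g_clef = "\U0001D11E"   # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP

# Inside the BMP: one code point, one UTF-16 code unit.
print(len(e_acute), [hex(u) for u in utf16_units(e_acute)])
# prints: 1 ['0xe9']

# Outside the BMP: one code point, two code units (a surrogate pair).
print(len(g_clef), [hex(u) for u in utf16_units(g_clef)])
# prints: 1 ['0xd834', '0xdd1e']
```

A length defined in terms of UTF-16 code units would report 2 for the second string, even though it holds a single code point.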


Fortunately, as I mentioned, most developers should not have to worry about this. If you work with ranges and substrings rather than with individual characters, and use the NSString methods that deal with ranges, they should automatically handle not only most issues with surrogate pairs, but also the more common cases of combining characters, Hangul, etc.
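
As a rough illustration of why range-aware operations are safer than indexing individual code units, the sketch below (again Python, not Cocoa) truncates a sequence of UTF-16 code units without cutting a surrogate pair in half. The names utf16_units and safe_prefix_length are hypothetical, and the sketch handles surrogate pairs only; the range-based NSString methods go further and also respect combining sequences.

```python
def utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s (hypothetical helper)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def safe_prefix_length(units: list[int], limit: int) -> int:
    """Largest n <= limit such that units[:n] does not end inside a surrogate pair."""
    n = max(0, min(limit, len(units)))
    if 0 < n < len(units):
        ends_on_high = 0xD800 <= units[n - 1] <= 0xDBFF
        next_is_low = 0xDC00 <= units[n] <= 0xDFFF
        if ends_on_high and next_is_low:
            n -= 1  # back up so the whole pair is excluded rather than split
    return n


units = utf16_units("a\U0001D11Eb")  # [0x61, 0xD834, 0xDD1E, 0x62]
print(safe_prefix_length(units, 2))  # prints: 1
print(safe_prefix_length(units, 3))  # prints: 3
```

Cutting naively at 2 code units would leave a lone high surrogate at the end of the prefix, which is not valid text.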

Chapter 2 of the Unicode 5 book has a very good discussion of "text elements", which explains in great detail why it is that the elements that are important for most text processes are in general sequences of characters rather than single characters. Single characters are important for the fundamental definitional purposes of the standard, but in practice what one wishes to deal with for text processing is a sequence of characters constituting a cluster or larger unit.
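
The idea that a text element is a sequence of characters can be sketched as follows. This is a deliberately simplified grouping, in Python, that attaches each combining mark (nonzero canonical combining class) to the preceding character. It is an assumption-laden stand-in, not the full Unicode grapheme cluster rules and not what NSString does internally; in particular it does not group conjoining Hangul jamo. The function name simple_clusters is made up for this example.

```python
import unicodedata


def simple_clusters(s: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Simplification: only characters with a nonzero canonical combining
    class are treated as extending the previous cluster.
    """
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch   # extend the current cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


text = "e\u0301a"  # 'e' + COMBINING ACUTE ACCENT, then 'a'
print(len(text))                   # prints: 3
print(len(simple_clusters(text)))  # prints: 2
```

The string has three code points but only two units that a user would regard as characters, which is why cluster-level ranges are the useful unit for most text processing.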

Douglas Davidson



References: 
 >Re: characters in cocoa (From: "Gerriet M. Denkmann" <email@hidden>)
 >Re: characters in cocoa (From: Uli Kusterer <email@hidden>)
 >Re: characters in cocoa (From: "Clark Cox" <email@hidden>)
