Re: characterAtIndex: method and composite characters

  • Subject: Re: characterAtIndex: method and composite characters
  • From: Deborah Goldsmith <email@hidden>
  • Date: Fri, 6 Apr 2007 18:30:46 -0700

On Apr 4, 2007, at 9:42 AM, Douglas Davidson wrote:
On Apr 4, 2007, at 8:05 AM, Ewan Delanoy wrote:

- When an NSString or NSAttributedString (let's call it s) appears on-screen as, say, "(a with tilde)(other characters ...)", is it guaranteed that [s characterAtIndex: 0] will be "a with tilde", and not "a" (with the tilde as a second character)?

- If this is not the case, I need a more accurate version of "characterAtIndex:". Is this already built-in?

Yes. The characterAtIndex: method should be avoided wherever possible; with Unicode strings, examining a single unichar (one UTF-16 code unit) usually is not sufficient. Instead, use methods like compare:options:range:, rangeOfString:options:range:, and rangeOfCharacterFromSet:options:range:, which will give you the Unicode-conformant operations you are looking for, with a wide variety of options.
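To make the distinction concrete outside Cocoa, here is a minimal Python sketch. Python is used only for illustration; this is not the NSString API. NSString indexes UTF-16 code units (unichar values), which is what characterAtIndex: returns, and the hypothetical helper `utf16_units` emulates that. The sketch shows that "a with tilde" can be stored either as one code point or as "a" plus a combining tilde. It also shows that a normalization-based comparison treats the two as equal. That comparison (`canonically_equal`, also hypothetical) stands in for a Unicode-aware compare and is not a reimplementation of compare:options:range:.

```python
import unicodedata

def utf16_units(s):
    """Hypothetical helper: the UTF-16 code units of s, i.e. the unichar values
    that characterAtIndex: would return for each index."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def canonically_equal(a, b):
    """Compare under canonical equivalence by normalizing both sides to NFD.
    A stand-in for a Unicode-aware comparison, not Cocoa's compare: method."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

precomposed = "\u00e3"   # LATIN SMALL LETTER A WITH TILDE, one code point
decomposed = "a\u0303"   # "a" + COMBINING TILDE; renders the same on screen

print([hex(u) for u in utf16_units(precomposed)])  # ['0xe3']
print([hex(u) for u in utf16_units(decomposed)])   # ['0x61', '0x303']
print(precomposed == decomposed)                   # False (different code units)
print(canonically_equal(precomposed, decomposed))  # True
```

With the decomposed spelling, index 0 holds the plain "a", which is exactly the case the original question worries about.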


If you need to extract substrings, be sure to use rangeOfComposedCharacterSequenceAtIndex: to make sure that you are not dividing a composed character sequence. If you wish to replace substrings in a mutable string, try replaceOccurrencesOfString:withString:options:range:.
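The idea behind rangeOfComposedCharacterSequenceAtIndex: can be approximated in Python. This is a simplified sketch under a stated assumption: a "composed sequence" is treated as one base code point plus any following code points with a nonzero canonical combining class (via `unicodedata.combining`), with surrogate pairs kept together. It is not Apple's implementation and does not follow the full grapheme-cluster rules. `composed_sequence_range` is a hypothetical name. Indices and the returned (location, length) pair are in UTF-16 code units, mirroring an NSRange.

```python
import unicodedata

def composed_sequence_range(s, index):
    """Approximate composed-sequence range around a UTF-16 index of s.
    Assumption: base code point + trailing marks with nonzero combining
    class; this is NOT the full UAX #29 grapheme-cluster algorithm."""
    # Record (utf16_start, utf16_width, code point) for every code point.
    spans = []
    offset = 0
    for ch in s:
        width = 2 if ord(ch) > 0xFFFF else 1   # surrogate pair vs single unit
        spans.append((offset, width, ch))
        offset += width
    if not 0 <= index < offset:
        raise IndexError("index out of range")
    # Code point containing the index (an index on a low surrogate maps to its pair).
    i = next(k for k, (start, width, _) in enumerate(spans)
             if start <= index < start + width)
    # Walk back over combining marks to the base character.
    first = i
    while first > 0 and unicodedata.combining(spans[first][2]):
        first -= 1
    # Walk forward over the combining marks that follow.
    last = i + 1
    while last < len(spans) and unicodedata.combining(spans[last][2]):
        last += 1
    location = spans[first][0]
    end = spans[last - 1][0] + spans[last - 1][1]
    return location, end - location   # (location, length), like an NSRange

s = "xa\u0303\u0301y"   # "x", then "a" + two combining marks, then "y"
print(composed_sequence_range(s, 2))   # (1, 3)
t = "a\U0001F600b"       # the emoji occupies UTF-16 units 1 and 2
print(composed_sequence_range(t, 2))   # (1, 2)
```

Cutting a substring only at the boundaries returned here avoids separating a base letter from its marks or splitting a surrogate pair, within the limits of the simplification.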

NSString does have methods to precompose or decompose an entire string, but these methods are really useful only in special circumstances--for example, when you are dealing with existing code that for some reason requires one form or the other. Bear in mind that most combinations of base characters and combining marks do not have precomposed forms. In general, you are better off using the methods mentioned above for Unicode-conformant comparisons.
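The whole-string conversions mentioned here are Unicode normalization forms. NSString exposes them as precomposedStringWithCanonicalMapping (NFC) and decomposedStringWithCanonicalMapping (NFD). The following short Python sketch shows the same round trip, using the standard `unicodedata` module as a stand-in.

```python
import unicodedata

decomposed = "Ma\u0303e"   # Portuguese "Mãe" spelled with a combining tilde
nfc = unicodedata.normalize("NFC", decomposed)   # precompose
nfd = unicodedata.normalize("NFD", nfc)          # decompose again

print(len(decomposed), len(nfc))   # 4 3
print(nfc == "M\u00e3e")           # True: "a" + tilde became U+00E3
print(nfd == decomposed)           # True: NFD splits it back apart
```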

In addition to what Doug says, bear in mind that even precomposed Unicode cannot be accessed one "unichar" at a time. First, there may still be surrogate pairs (two consecutive UTF-16 code units used to represent a character outside the Basic Multilingual Plane, that is, a code point above U+FFFF). Second, there are some characters that cannot be represented by a single Unicode code point, even in the canonical precomposed form of Unicode (NFC, Normalization Form C), because Unicode does not contain a precomposed version of the character in question.
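Both points can be checked with a small Python sketch. The hypothetical `surrogate_pair` helper applies the standard UTF-16 formula for code points above U+FFFF, and its result is cross-checked against Python's own UTF-16 encoder. The second part uses DIGIT ONE followed by U+20E3 COMBINING ENCLOSING KEYCAP. Unicode has no precomposed character for this sequence, so NFC still leaves two code points.

```python
import unicodedata

def utf16_units(s):
    """Hypothetical helper: the UTF-16 code units of s, as NSString stores them."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def surrogate_pair(code_point):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates."""
    assert 0x10000 <= code_point <= 0x10FFFF
    v = code_point - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

# 1. A character outside the BMP occupies two unichars.
grinning = "\U0001F600"   # GRINNING FACE
high, low = surrogate_pair(ord(grinning))
assert utf16_units(grinning) == [high, low]
print(hex(high), hex(low))   # 0xd83d 0xde00

# 2. No precomposed form exists, so NFC cannot reduce this to one code point.
keycap = unicodedata.normalize("NFC", "1\u20e3")
print(len(keycap))   # 2
```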


Finally, even if there are no individual characters that require multiple unichars, some languages have linguistic units consisting of multiple characters that shouldn't be broken apart.
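The last point can be illustrated with a sketch that assumes Python's `unicodedata` module. The Devanagari conjunct क्ष (KA, VIRAMA, SSA) involves no surrogate pairs and nothing for NFC to precompose. Even so, cutting it after two units leaves KA followed by a bare VIRAMA instead of the complete unit a reader sees.

```python
import unicodedata

ksha = "\u0915\u094d\u0937"   # क्ष: KA + VIRAMA + SSA, read as one conjunct

# Nothing to precompose: NFC leaves all three code points in place.
assert unicodedata.normalize("NFC", ksha) == ksha
print(len(ksha.encode("utf-16-le")) // 2)   # 3 UTF-16 code units

# Naive truncation by unit count splits the conjunct.
truncated = ksha[:2]   # slicing by code point is safe here: all are in the BMP
print([unicodedata.name(c) for c in truncated])
# ['DEVANAGARI LETTER KA', 'DEVANAGARI SIGN VIRAMA']
```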

Deborah Goldsmith
Internationalization, Unicode liaison
Apple Inc.
email@hidden


_______________________________________________

Cocoa-dev mailing list (email@hidden)
