Re: characterAtIndex: method and composite characters

Subject: Re: characterAtIndex: method and composite characters
From: Deborah Goldsmith <email@hidden>
Date: Fri, 6 Apr 2007 18:30:46 -0700

On Apr 4, 2007, at 9:42 AM, Douglas Davidson wrote:

> On Apr 4, 2007, at 8:05 AM, Ewan Delanoy wrote:
>
>> - When an NSString or NSAttributedString (let's call it s) appears on-screen as, say, "(a with tilde)(other characters ...)", is it guaranteed that [s characterAtIndex: 0] will be "a with tilde", and not "a" (with "tilde" as a second character)?
>> - If this is not the case, I need a more accurate version of "characterAtIndex:". Is this already built-in?
>
> Yes. The characterAtIndex: method should be avoided wherever possible; with Unicode strings, examining a single character usually is not sufficient. Instead, use methods like compare:options:range:, rangeOfString:options:range:, and rangeOfCharacterFromSet:options:range:, which will give you the Unicode-conformant operations you are looking for, with a wide variety of options.
>
> If you need to extract substrings, be sure to use rangeOfComposedCharacterSequenceAtIndex: to make sure that you are not dividing a composed character sequence. If you wish to replace substrings in a mutable string, try replaceOccurrencesOfString:withString:options:range:.
>
> NSString does have methods to precompose or decompose an entire string, but these methods are really useful only in special circumstances--for example, when you are dealing with existing code that for some reason requires one form or the other. Bear in mind that most combinations of base characters and combining marks do not have precomposed forms. In general, you are better off using the methods mentioned above for Unicode-conformant comparisons.
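
To make the advice above concrete, here is a small language-neutral sketch in Python rather than Cocoa. It is not NSString API: composed_range and canonically_equal are hypothetical helpers invented for illustration. composed_range only groups a base character with the combining marks that follow it, which is a simplification of what rangeOfComposedCharacterSequenceAtIndex: handles, and Python indexes code points where NSString indexes UTF-16 code units (the two agree for the characters used here).

```python
import unicodedata

# Two canonically equivalent spellings of "a with tilde" followed by "bc".
decomposed = "a\u0303bc"   # 'a' + COMBINING TILDE (U+0303)
precomposed = "\u00e3bc"   # LATIN SMALL LETTER A WITH TILDE (U+00E3)

# Indexing a single element of the decomposed string yields only the base letter.
first = decomposed[0]      # 'a', not "a with tilde"


def composed_range(s, index):
    """Return (location, length) of the base character plus trailing
    combining marks that contains `index`. Simplified assumption: a mark is
    any code point with a nonzero canonical combining class."""
    start = index
    while start > 0 and unicodedata.combining(s[start]) != 0:
        start -= 1
    end = start + 1
    while end < len(s) and unicodedata.combining(s[end]) != 0:
        end += 1
    return start, end - start


def canonically_equal(a, b):
    """Compare two strings after normalizing both to the same form (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)
```

With these definitions, composed_range(decomposed, 0) and composed_range(decomposed, 1) both report the same two-element range, so a substring cut there never separates the tilde from its base letter. A plain == on the two spellings fails, while canonically_equal treats them as the same text.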

In addition to what Doug says, bear in mind that even precomposed Unicode cannot be accessed one "unichar" at a time. First, there may still be surrogate pairs (two consecutive UTF-16 code units used to represent a character above U+FFFF, outside the Basic Multilingual Plane), and second, there are some characters that cannot be represented by a single Unicode code point, even in the canonical precomposed form of Unicode (NFC == Normalization Form C). This is because Unicode does not contain a precomposed version of the character in question.
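
Both points can be checked with a short Python sketch (again, an illustration of the Unicode behavior, not Cocoa code). The helper utf16_units is a hypothetical name, and the sample characters are my own choices: U+1D11E (MUSICAL SYMBOL G CLEF) as a character above U+FFFF, and "q with tilde" as a combination that, to my knowledge, has no precomposed code point.

```python
import unicodedata


def utf16_units(s):
    """Number of UTF-16 code units (what NSString calls unichars) in s."""
    return len(s.encode("utf-16-le")) // 2


# 1. A character above U+FFFF needs a surrogate pair in UTF-16.
clef = "\U0001D11E"                      # MUSICAL SYMBOL G CLEF
clef_units = utf16_units(clef)           # 2 code units for one character

# 2. NFC composes only when a precomposed code point exists.
a_tilde_nfc = unicodedata.normalize("NFC", "a\u0303")
q_tilde_nfc = unicodedata.normalize("NFC", "q\u0303")

a_len = len(a_tilde_nfc)                 # 1: composes to U+00E3
q_len = len(q_tilde_nfc)                 # 2: stays 'q' + U+0303
```

So even after normalizing to NFC, a string can contain user-visible characters that span more than one code unit, which is why the range-based methods are the safer choice.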

Finally, even if there are no individual characters that require multiple unichars, some languages have linguistic units consisting of multiple characters that shouldn't be broken apart.

Deborah Goldsmith
Internationalization, Unicode liaison
Apple Inc.
email@hidden


_______________________________________________

Cocoa-dev mailing list (email@hidden)





