
Re: characters in cocoa


  • Subject: Re: characters in cocoa
  • From: "Gerriet M. Denkmann" <email@hidden>
  • Date: Mon, 10 Sep 2007 13:38:01 +0200


On 7 Sep 2007, at 21:02, email@hidden wrote:


I would be obliged to hear from the experts what is considered the most appropriate way to handle characters in Cocoa programming. Thanks in advance.

We try to discourage developers from working at the level of individual characters wherever possible, primarily because in Unicode the individual character is usually not the appropriate level at which to operate. This is something that's difficult for those of us who were raised on char *'s to get used to, but it's important to get right. In Unicode the appropriate object on which to operate for most semantic purposes is (at least) a character cluster, such as a base character and its combining marks, or a block of Hangul jamo.

In Cocoa terms this is a range of characters in an NSString; suitable
ranges can be obtained using such methods as
rangeOfComposedCharacterSequenceAtIndex:.  This will also cover the
case of surrogate pairs that arises from NSString's use of UTF-16.
NSString/CFString supply a great variety of methods/functions that
operate on character ranges in a Unicode-conformant fashion:  the
rangeOfCharacterFromSet:... methods, the rangeOfString: methods, the
compare:... methods, and so forth.  They also provide a long list of
Unicode operations, such as casing, normalization, and other
transformations.
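To illustrate the range-based approach described above, here is a minimal sketch that walks a string one composed character sequence at a time. The helper name ComposedSequences is hypothetical and chosen only for this example; the NSString methods it calls are the ones named in the text.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): splits a string into its
// composed character sequences, so that a base character plus its combining
// marks, or a surrogate pair, stays together as one substring.
static NSArray<NSString *> *ComposedSequences(NSString *string) {
    NSMutableArray<NSString *> *clusters = [NSMutableArray array];
    NSUInteger index = 0;
    while (index < string.length) {
        // Range of the whole cluster that contains the UTF-16 unit at index.
        NSRange range = [string rangeOfComposedCharacterSequenceAtIndex:index];
        [clusters addObject:[string substringWithRange:range]];
        index = NSMaxRange(range); // jump to the start of the next cluster
    }
    return clusters;
}
```

For the string @"e\u0301\U00010400" (an "e" followed by a combining acute accent, then a character outside the Basic Multilingual Plane), length reports 4, while this helper should return 2 substrings.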

Even in apparently simple operations such as casing, the need for
operating on more than a single character is apparent.  For example,
in German we have ß->SS on uppercasing, going from one character to
two; when we get to Greek, the complications increase significantly,
and there are many other examples from less prominent languages.
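The German case can be checked with a short sketch. The function name LengthChangeOnUppercasing is made up for this example; uppercaseString is the standard NSString method.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: reports how many UTF-16 units a string gains (or
// loses) when uppercased, showing that casing is not a one-to-one mapping.
static NSInteger LengthChangeOnUppercasing(NSString *string) {
    NSString *upper = [string uppercaseString];
    return (NSInteger)upper.length - (NSInteger)string.length;
}
```

With @"stra\u00dfe" ("straße") the uppercased form is "STRASSE", so the helper should return 1. Code that uppercases in place, one unichar at a time, has no room for the extra character.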

The basic recommendation for dealing with characters is to work with
strings, and ranges in strings, and substrings, and as much as
possible to use the NSString methods that deal with these; that lets
the kit handle all of the difficult Unicode issues.  For those who
need to do their own low-level processing, and who are willing to
handle Unicode complications themselves, we provide access to UTF-16
directly via characterAtIndex: et al., and to other representations
with getBytes:... and related methods.

This is an excellent summary.

One might add that -[NSString length], which the documentation says "Returns the number of Unicode characters in the receiver", does no such thing: it returns the number of 16-bit code units (unichars) in the string's NSUnicodeStringEncoding (UTF-16) representation.
For example, [[NSString stringWithUTF8String: "𐐀"] length] returns 2, although the string clearly contains one character. (In case your software cannot handle Unicode, like the mail digest software at Apple: this is U+10400 DESERET CAPITAL LETTER LONG I.)
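A small sketch makes the difference between the two counts concrete. CodePointCount is a hypothetical helper; it assumes that asking for the byte length in a BOM-free UTF-32 encoding and dividing by four is an acceptable way to count code points.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts Unicode code points rather than UTF-16 units.
// UTF-32 uses exactly four bytes per code point, and the explicit
// little-endian variant is used so that no byte-order mark is counted.
static NSUInteger CodePointCount(NSString *string) {
    return [string lengthOfBytesUsingEncoding:NSUTF32LittleEndianStringEncoding] / 4;
}
```

For @"\U00010400" (the Deseret letter above), length is 2 and CodePointCount should give 1. Note that a code point count is still not a count of user-perceived characters: @"e\u0301" has two code points but is one composed character sequence.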


One should also note that characterAtIndex: does not do what its name indicates: it returns the 16-bit UTF-16 code unit at that index, which may be only half of a surrogate pair.
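For those who do need the full code point at an index, one possible sketch combines characterAtIndex: with the surrogate helpers declared in CFString.h. CodePointAtIndex is a hypothetical name, and the fallback for an unpaired surrogate (returning the lone unit unchanged) is a choice made for this example only.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the code point that starts at the given
// UTF-16 index, combining a high and low surrogate when both are present.
static UTF32Char CodePointAtIndex(NSString *string, NSUInteger index) {
    unichar first = [string characterAtIndex:index];
    if (CFStringIsSurrogateHighCharacter(first) && index + 1 < string.length) {
        unichar second = [string characterAtIndex:index + 1];
        if (CFStringIsSurrogateLowCharacter(second)) {
            return CFStringGetLongCharacterForSurrogatePair(first, second);
        }
    }
    // Either a BMP character or an unpaired surrogate: return the unit as is.
    return first;
}
```

For @"\U00010400", characterAtIndex:0 yields 0xD801 and characterAtIndex:1 yields 0xDC00, while CodePointAtIndex(string, 0) should yield 0x10400.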

Similarly, getCharacters: is documented as "Returns by reference the characters from the receiver."; the documentation really should mention the encoding in which these characters are copied.

Maybe the documentation could be slightly improved: it is confusing if it says "character" when it means "unsigned short int in a specific (but unspecified) encoding".

Kind regards,

Gerriet.


