Re: characters in cocoa

Subject: Re: characters in cocoa
From: "Gerriet M. Denkmann" <email@hidden>
Date: Mon, 10 Sep 2007 13:38:01 +0200


On 7 Sep 2007, at 21:02, email@hidden wrote:


I would be obliged to hear from the experts what is considered the
most appropriate way to handle characters in Cocoa programming.
Thanks in advance.


We try to discourage developers from working at the level of
individual characters wherever possible, primarily because in Unicode
the individual character is usually not the appropriate level at
which to operate.  This is something that's difficult for those of us
who were raised on char *'s to get used to, but it's important to get
right.  In Unicode the appropriate object on which to operate for
most semantic purposes is (at least) a character cluster, such as a
base character and its combining marks, or a block of Hangul jamo.

In Cocoa terms this is a range of characters in an NSString; suitable
ranges can be obtained using such methods as
rangeOfComposedCharacterSequenceAtIndex:.  This will also cover the
case of surrogate pairs that arises from NSString's use of UTF-16.
NSString/CFString supply a great variety of methods/functions that
operate on character ranges in a Unicode-conformant fashion:  the
rangeOfCharacterFromSet:... methods, the rangeOfString: methods, the
compare:... methods, and so forth.  They also provide a long list of
Unicode operations, such as casing, normalization, and other
transformations.
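To make the idea of a character cluster concrete, here is a minimal sketch in Python rather than Cocoa, chosen only because it is easy to run anywhere. The helper name simple_clusters is hypothetical, and the grouping rule (attach every combining mark to the preceding base character) is a deliberate simplification: it does not handle Hangul jamo or the other cases that rangeOfComposedCharacterSequenceAtIndex: is described as covering.

```python
import unicodedata


def simple_clusters(text):
    """Group each base code point with the combining marks that follow it.

    Simplified illustration only: real cluster segmentation has more rules.
    """
    clusters = []
    for ch in text:
        # A non-zero combining class marks a combining character.
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


word = "e\u0301tude"  # 'e' + COMBINING ACUTE ACCENT, followed by "tude"
print(len(word))                   # 6 code points
print(len(simple_clusters(word)))  # 5 clusters; the first is 'e' plus its accent
```

The point is the one made above: the unit a user would call "a character" is a range, not a single element of the string.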

Even in apparently simple operations such as casing, the need for
operating on more than a single character is apparent.  For example,
in German we have ß->SS on uppercasing, going from one character to
two; when we get to Greek, the complications increase significantly,
and there are many other examples from less prominent languages.
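The casing example can be checked outside Cocoa as well. The following sketch uses Python's built-in string casing, which applies Unicode full case mappings; it only illustrates the Unicode behaviour and says nothing about how NSString implements it. The Greek line shows one of the complications, assuming the usual final-sigma rule: capital sigma lowercases differently at the end of a word.

```python
german = "stra\u00dfe"  # "straße"
upper = german.upper()
print(upper)                    # STRASSE
print(len(german), len(upper))  # 6 7: one character became two

greek = "\u039f\u0394\u039f\u03a3"  # "ΟΔΟΣ"
lower = greek.lower()
# The last letter becomes final sigma (U+03C2), not the medial form U+03C3.
print(lower[-1] == "\u03c2")    # True
```

A routine that uppercases one character at a time into a buffer of the same length cannot produce the first result at all.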

The basic recommendation for dealing with characters is to work with
strings, and ranges in strings, and substrings, and as much as
possible to use the NSString methods that deal with these; that lets
the kit handle all of the difficult Unicode issues.  For those who
need to do their own low-level processing, and who are willing to
handle Unicode complications themselves, we provide access to UTF-16
directly via characterAtIndex: et al., and to other representations
with getBytes:... and related methods.


This is an excellent summary.

One might add that -[NSString length], which the documentation says "Returns the number of Unicode characters in the receiver.", does nothing of the kind: it returns the number of 16-bit units (unsigned shorts) that the string occupies in NSUnicodeStringEncoding (that is, UTF-16). For example, [[NSString stringWithUTF8String: "𐐀"] length] returns 2, although the string clearly contains one character. (For anyone whose software cannot handle Unicode, like the mail digest software at Apple: the character is DESERET CAPITAL LETTER LONG I, U+10400.)

And one should also note that "characterAtIndex:" does not do what its name indicates: it returns the 16-bit UTF-16 unit at that index, which may be only half of a surrogate pair.
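The arithmetic behind both observations can be reproduced without Cocoa. This is a small Python sketch (the helper utf16_units is hypothetical) that encodes the same Deseret letter as UTF-16 and lists the 16-bit units, which are what a UTF-16 based length and a per-index accessor would see.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


s = "\U00010400"  # DESERET CAPITAL LETTER LONG I
units = utf16_units(s)
print(len(s))                   # 1 code point
print(len(units))               # 2 UTF-16 units
print([hex(u) for u in units])  # ['0xd801', '0xdc00']
```

Neither 0xD801 nor 0xDC00 is a character by itself; they are the high and low halves of a surrogate pair and only mean something together.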

getCharacters: "Returns by reference the characters from the receiver." - the documentation really should mention in which encoding these characters will be copied.

Maybe the documentation could be slightly improved: it is confusing if it says "character" when it means "unsigned short int in a specific (but unspecified) encoding".

Kind regards,

Gerriet.
