Re: characters in cocoa
- Subject: Re: characters in cocoa
- From: Douglas Davidson <email@hidden>
- Date: Fri, 7 Sep 2007 09:22:52 -0700
On Sep 7, 2007, at 2:17 AM, Hans van der Meer wrote:
> This is a question about the treatment of characters in Cocoa.
> Because my program will do a lot with individual characters, both
> as a basic type and encapsulated in objects, I am anxious to
> get it right from the start. I feel I should avoid working with
> plain C chars in Cocoa as much as possible, especially in light
> of the (unsigned char) vs. (signed char) dilemma. Is that
> correct?
>
> The NSString class works with characters in Unicode, as described in
> the "String Programming Guide for Cocoa". Its method
> characterAtIndex: returns its contents as type unichar, which is
> typedef'ed in NSString.h as unsigned short. The NSNumber class, however,
> has methods like numberWithUnsignedShort: but no more descriptive
> ones for unichar characters, such as numberWithUnichar, although
> Unicode is otherwise so prominent in NSString.
> Does anyone know why this is missing?
>
> In fact I would prefer to handle character objects with something
> like NSCharacter (for example as a subclass of NSNumber), but there
> does not seem to be one among the Foundation classes. I don't feel
> up to the task of subclassing within a class cluster. Has anyone
> already written one as a subclass of NSNumber?
>
> I would be grateful to hear from the experts what is considered the
> most appropriate way to handle characters in Cocoa programming.
> Thanks in advance.

We try to discourage developers from working at the level of
individual characters wherever possible, primarily because in Unicode
the individual character is usually not the appropriate level at
which to operate. This is something that's difficult for those of us
who were raised on char *'s to get used to, but it's important to get
right. In Unicode the appropriate object on which to operate for
most semantic purposes is (at least) a character cluster, such as a
base character and its combining marks, or a block of Hangul jamo.
In Cocoa terms this is a range of characters in an NSString; suitable
ranges can be obtained using such methods as
rangeOfComposedCharacterSequenceAtIndex:. This also covers surrogate
pairs, which arise because NSString uses UTF-16: a character outside
the Basic Multilingual Plane occupies two unichars.
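
To make the cluster idea concrete, here is a minimal sketch in Swift rather than the Objective-C of this thread; Foundation exposes the same NSString method there as rangeOfComposedCharacterSequence(at:). The helper composedSequences(of:) is hypothetical, written only for illustration. It walks a string one composed character sequence at a time, so a base letter with its combining accent, and a surrogate pair, each come back as a single piece.

```swift
import Foundation

// Hypothetical helper: split an NSString into its composed character
// sequences using rangeOfComposedCharacterSequence(at:), the Swift name
// for -rangeOfComposedCharacterSequenceAtIndex:.
func composedSequences(of string: NSString) -> [String] {
    var result: [String] = []
    var index = 0
    while index < string.length {
        // Range, in UTF-16 units, of the whole cluster containing `index`.
        let range = string.rangeOfComposedCharacterSequence(at: index)
        result.append(string.substring(with: range))
        // Jump past the entire cluster, never into the middle of it.
        index = NSMaxRange(range)
    }
    return result
}

// "e" + U+0301 COMBINING ACUTE ACCENT, then U+1D11E MUSICAL SYMBOL G CLEF
// (stored by NSString as a surrogate pair), then "x".
let sample = NSString(string: "e\u{0301}\u{1D11E}x")
let pieces = composedSequences(of: sample)
print(sample.length)  // 5 UTF-16 units
print(pieces.count)   // 3 composed sequences
```

Advancing by NSMaxRange(range) rather than by 1 is what keeps the loop from ever stopping halfway through a cluster.
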
NSString/CFString supply a great variety of methods/functions that
operate on character ranges in a Unicode-conformant fashion: the
rangeOfCharacterFromSet:... methods, the rangeOfString: methods, the
compare:... methods, and so forth. They also provide a long list of
Unicode operations, such as casing, normalization, and other
transformations.
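
As an illustration of why the compare: and normalization methods matter, the Swift sketch below (assuming Foundation's NSString as bridged into Swift) stores "café" in two ways that Unicode treats as canonically equivalent. A literal, unichar-by-unichar comparison reports them as different; the default comparison, and canonical decomposition via decomposedStringWithCanonicalMapping, handle the equivalence for you.

```swift
import Foundation

// The same visible text stored as two different unichar sequences:
// precomposed U+00E9, or "e" followed by U+0301 COMBINING ACUTE ACCENT.
let precomposed = NSString(string: "caf\u{00E9}")
let decomposed = "cafe\u{0301}"

// .literal (NSLiteralSearch) compares unichar by unichar...
let literal = precomposed.compare(decomposed, options: .literal)
// ...while the default comparison treats canonical equivalents as equal.
let canonical = precomposed.compare(decomposed)

print(literal == .orderedSame)    // false
print(canonical == .orderedSame)  // true

// Canonical decomposition makes the hidden difference visible.
print(precomposed.length)                                           // 4
print(precomposed.decomposedStringWithCanonicalMapping.utf16.count) // 5
```
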
Even in apparently simple operations such as casing, the need for
operating on more than a single character is apparent. For example,
in German we have ß->SS on uppercasing, going from one character to
two; when we get to Greek, the complications increase significantly,
and there are many other examples from less prominent languages.
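
A small Swift sketch of the German case (uppercased is the Swift name for -uppercaseString). The point is the length change: the result is one unichar longer than the input, so no routine that upper-cases one unichar at a time, in place, could produce it.

```swift
import Foundation

// U+00DF LATIN SMALL LETTER SHARP S uppercases to the two letters "SS".
let german = NSString(string: "straße")
let upper = german.uppercased  // Swift name for -uppercaseString

print(upper)                             // STRASSE
print(german.length, upper.utf16.count)  // 6 7
```
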
The basic recommendation for dealing with characters is to work with
strings, and ranges in strings, and substrings, and as much as
possible to use the NSString methods that deal with these; that lets
the kit handle all of the difficult Unicode issues. For those who
need to do their own low-level processing, and who are willing to
handle Unicode complications themselves, we provide access to UTF-16
directly via characterAtIndex: et al., and to other representations
with getBytes:... and related methods.
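
For the low-level route, a minimal Swift sketch of what characterAtIndex: (character(at:) in Swift) actually hands back. A character outside the BMP arrives as two surrogate unichars, and code working at this level has to recognize and recombine them itself; the arithmetic below is the standard UTF-16 decoding, written out by hand only to show what the kit otherwise does for you.

```swift
import Foundation

// "a" followed by U+1D11E MUSICAL SYMBOL G CLEF.
let text = NSString(string: "a\u{1D11E}")

// Each character(at:) call returns one UTF-16 code unit (a unichar).
var units: [unichar] = []
for i in 0..<text.length {
    units.append(text.character(at: i))
}
print(units.map { String($0, radix: 16) })  // ["61", "d834", "dd1e"]

// The G clef came back as a high (lead) and low (trail) surrogate.
precondition(UTF16.isLeadSurrogate(units[1]))
precondition(UTF16.isTrailSurrogate(units[2]))

// Recombine the pair into a single Unicode scalar value by hand.
let high = UInt32(units[1]) - 0xD800
let low = UInt32(units[2]) - 0xDC00
let scalar = 0x10000 + (high << 10) + low
print(String(scalar, radix: 16))  // 1d11e
```
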
In practice, I have found that many operations for which I had
expected to have to use individual character operations (probably due
to habits of thought acquired in the days of char *'s) actually could
be done fairly simply with a little thought and a suitable
combination of rangeOfCharacterFromSet:..., rangeOfString:...,
compare:..., and related methods.
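
As one example of that approach, here is a Swift sketch that pulls runs of decimal digits out of a string using only rangeOfCharacter(from:options:range:) (the Swift name for -rangeOfCharacterFromSet:options:range:) and an inverted character set, with no per-unichar loop. The function digitRuns(in:) is a hypothetical helper for illustration, not a Foundation API.

```swift
import Foundation

// Hypothetical helper: collect every run of decimal digits.
// CharacterSet.decimalDigits covers all Unicode decimal digits, not just 0-9.
func digitRuns(in string: NSString) -> [String] {
    let digits = CharacterSet.decimalDigits
    let nonDigits = digits.inverted
    var runs: [String] = []
    var searchStart = 0
    while searchStart < string.length {
        // Find where the next run of digits begins.
        let searchRange = NSRange(location: searchStart,
                                  length: string.length - searchStart)
        let start = string.rangeOfCharacter(from: digits, options: [], range: searchRange)
        if start.location == NSNotFound { break }

        // Find where it ends: the first non-digit after the start, or the end.
        let rest = NSRange(location: start.location,
                           length: string.length - start.location)
        let end = string.rangeOfCharacter(from: nonDigits, options: [], range: rest)
        let stop = end.location == NSNotFound ? string.length : end.location

        runs.append(string.substring(with: NSRange(location: start.location,
                                                   length: stop - start.location)))
        searchStart = stop
    }
    return runs
}

let runs = digitRuns(in: NSString(string: "Order 66 shipped on 2007-09-07"))
print(runs)  // ["66", "2007", "09", "07"]
```

The character-set search does the scanning, so the code never has to decide for itself what counts as a digit.
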
Douglas Davidson

Cocoa-dev mailing list (email@hidden)