Re: characters in cocoa


  • Subject: Re: characters in cocoa
  • From: Douglas Davidson <email@hidden>
  • Date: Fri, 7 Sep 2007 09:22:52 -0700


On Sep 7, 2007, at 2:17 AM, Hans van der Meer wrote:

> This is a question about the treatment of characters in Cocoa. Because my program will do a lot with individual characters, both as a basic type and encapsulated in objects, I am anxious to get it right from the start. I feel I should refrain as much as possible in Cocoa from working with plain C chars, especially in light of the (unsigned char) vs. (signed char) dilemma. Is that correct?
>
> The NSString class works with characters in Unicode, as described in the "String Programming Guide for Cocoa". Its method characterAtIndex: retrieves its contents as type unichar, which is typedef'ed in NSString.h as unsigned short. The NSNumber class has methods like numberWithUnsignedShort: but no more descriptive ones for unichar characters, such as numberWithUnichar:, although the use of Unicode is otherwise so clearly present in NSString. Does anyone know why this is missing?
>
> In fact I would prefer handling character objects with something like NSCharacter (for example as a subclass of NSNumber), but there does not seem to be one among the Foundation classes. I do not feel up to the task of subclassing within a class cluster. Maybe someone has already made one as a subclass of NSNumber?
>
> I would be obliged to hear from the experts what is considered the most appropriate way to handle characters in Cocoa programming. Thanks in advance.

We try to discourage developers from working at the level of individual characters wherever possible, primarily because in Unicode the individual character is usually not the appropriate level at which to operate. This is something that's difficult for those of us who were raised on char *'s to get used to, but it's important to get right. In Unicode the appropriate object on which to operate for most semantic purposes is (at least) a character cluster, such as a base character and its combining marks, or a block of Hangul jamo.
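
The gap between "16-bit unit" and "character" can be seen in a short sketch. It is written in Swift purely for illustration (Swift postdates this 2007 thread). Swift's String treats each Character as an extended grapheme cluster, which is close to the "character cluster" described above, and its utf16 view yields the same 16-bit units that characterAtIndex: returns. The helper name describeCounts is hypothetical.

```swift
// Sketch: three ways of "counting characters" in the same text.
// Assumes Swift 5 or later; uses only the standard library.

/// Hypothetical helper: (grapheme clusters, Unicode scalars, UTF-16 code units).
func describeCounts(_ s: String) -> (clusters: Int, scalars: Int, utf16Units: Int) {
    return (s.count, s.unicodeScalars.count, s.utf16.count)
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT: one visible character, two scalars.
let decomposedE = "e\u{301}"

// Hangul jamo U+1100, U+1161, U+11A8 (leading, vowel, trailing) form one syllable.
let jamoSyllable = "\u{1100}\u{1161}\u{11A8}"

for sample in [decomposedE, jamoSyllable] {
    let c = describeCounts(sample)
    print(sample, c.clusters, c.scalars, c.utf16Units)
}
```

Both samples count as a single cluster, yet they occupy two and three UTF-16 units respectively, so stepping through them one unichar at a time would split what a reader sees as one character.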


In Cocoa terms this is a range of characters in an NSString; suitable ranges can be obtained using such methods as rangeOfComposedCharacterSequenceAtIndex:. This will also cover the case of surrogate pairs that arises from NSString's use of UTF-16. NSString/CFString supply a great variety of methods/functions that operate on character ranges in a Unicode-conformant fashion: the rangeOfCharacterFromSet:... methods, the rangeOfString: methods, the compare:... methods, and so forth. They also provide a long list of Unicode operations, such as casing, normalization, and other transformations.
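
The following sketch, again in Swift for illustration, walks a string one composed character sequence at a time and reports each sequence as a (location, length) pair in UTF-16 units, the same kind of range that NSString's rangeOfComposedCharacterSequenceAtIndex: returns. It relies on the Swift standard library's grapheme segmentation rather than on that NSString method, so boundaries could differ in edge cases. The function name composedSequenceRanges is hypothetical.

```swift
// Sketch: UTF-16 (location, length) of every composed character sequence,
// similar in spirit to repeatedly calling rangeOfComposedCharacterSequenceAtIndex:.
// Uses only the Swift standard library (Swift 5 or later).

/// Hypothetical helper: one (location, length) pair per grapheme cluster,
/// measured in UTF-16 code units, the indexing unit NSString uses.
func composedSequenceRanges(in s: String) -> [(location: Int, length: Int)] {
    var result: [(location: Int, length: Int)] = []
    var index = s.startIndex
    while index < s.endIndex {
        let next = s.index(after: index)          // advances by one whole Character
        let location = index.utf16Offset(in: s)
        let length = next.utf16Offset(in: s) - location
        result.append((location: location, length: length))
        index = next
    }
    return result
}

// "a", then U+1F600 (a surrogate pair in UTF-16), then "e" + combining acute.
let text = "a\u{1F600}e\u{301}"
for r in composedSequenceRanges(in: text) {
    print(r.location, r.length)
}
```

For this sample the ranges are (0, 1), (1, 2), and (3, 2): the emoji needs a surrogate pair, and the accented letter is a base character plus a combining mark, so both span two UTF-16 units.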

Even in apparently simple operations such as casing, the need for operating on more than a single character is apparent. For example, in German we have ß->SS on uppercasing, going from one character to two; when we get to Greek, the complications increase significantly, and there are many other examples from less prominent languages.
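
A short Swift sketch (an illustration, not Cocoa API) of the German case: uppercasing a six-character word yields seven characters, and a per-scalar view shows the single scalar ß mapping to the two-letter string "SS". It uses String.uppercased() and Unicode.Scalar.Properties.uppercaseMapping from the Swift standard library.

```swift
// Sketch: full Unicode case mapping can change string length, so uppercasing
// one 16-bit unit at a time into a same-sized buffer cannot be correct.
// Swift standard library only (Swift 5 or later).

let word = "Stra\u{DF}e"                  // "Straße"; U+00DF is LATIN SMALL LETTER SHARP S
let upper = word.uppercased()            // full case mapping of the whole string
print(upper, word.count, upper.count)

// Per-scalar mappings: most letters map to one letter, but ß maps to "SS".
let perScalar = word.unicodeScalars.map { $0.properties.uppercaseMapping }
print(perScalar)
```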

The basic recommendation for dealing with characters is to work with strings, and ranges in strings, and substrings, and as much as possible to use the NSString methods that deal with these; that lets the kit handle all of the difficult Unicode issues. For those who need to do their own low-level processing, and who are willing to handle Unicode complications themselves, we provide access to UTF-16 directly via characterAtIndex: et al., and to other representations with getBytes:... and related methods.
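
For the low-level route, a Swift sketch (illustrative only) shows one string through two encodings. Its utf16 view holds the same values characterAtIndex: would return as unichar, and its utf8 view is one example of the other representations that getBytes:... and related methods can produce. Counting lead and trail surrogates shows why code consuming these units must handle pairs itself.

```swift
// Sketch: the same string seen through different low-level encodings.
// Swift standard library only (Swift 5 or later).

let sample = "n\u{E9}\u{1F600}"          // "n", precomposed "é" (U+00E9), and U+1F600

let utf16Units = Array(sample.utf16)     // [UInt16], like a buffer of unichar
let utf8Bytes = Array(sample.utf8)       // [UInt8]

print(sample.count, utf16Units.count, utf8Bytes.count)
print(utf16Units.map { String($0, radix: 16) })

// Units in the surrogate range are halves of one character, not characters themselves.
let surrogateCount = utf16Units.filter {
    UTF16.isLeadSurrogate($0) || UTF16.isTrailSurrogate($0)
}.count
print(surrogateCount)
```

Three user-visible characters become four UTF-16 units and seven UTF-8 bytes, and two of the UTF-16 units are surrogates that only make sense as a pair.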

In practice, I have found that many operations for which I had expected to have to use individual character operations (probably due to habits of thought acquired in the days of char *'s) actually could be done fairly simply with a little thought and a suitable combination of rangeOfCharacterFromSet:..., rangeOfString:..., compare:..., and related methods.
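
As an illustration of that approach, the following Swift sketch performs three small tasks that might otherwise be written as character-by-character loops. It uses the Swift Foundation counterparts of the NSString methods named above: rangeOfCharacter(from:), range(of:options:), components(separatedBy:), and compare(_:options:). The sample sentence is made up for the example.

```swift
import Foundation

// Sketch: set- and range-based string operations instead of hand-rolled loops.

let line = "Order 42: Größe M, color Blue"

// Find the first decimal digit without walking the string by hand.
if let digitRange = line.rangeOfCharacter(from: .decimalDigits) {
    print("first digit at UTF-16 offset", digitRange.lowerBound.utf16Offset(in: line))
}

// Case-insensitive substring search.
if let r = line.range(of: "BLUE", options: .caseInsensitive) {
    print("found:", line[r])
}

// Split on anything that is not a letter, mark, or digit; drop empty pieces.
let words = line.components(separatedBy: CharacterSet.alphanumerics.inverted)
    .filter { !$0.isEmpty }
print(words)

// Case-insensitive comparison.
let same = "color".compare("COLOR", options: .caseInsensitive) == .orderedSame
print(same)
```

The split keeps "Größe" intact because CharacterSet.alphanumerics covers non-ASCII letters; a hand-written loop that tested only ASCII ranges would break the word apart.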

Douglas Davidson

Cocoa-dev mailing list (email@hidden)


