Re: does NSTextField always use UTF8 encoding

  • Subject: Re: does NSTextField always use UTF8 encoding
  • From: Alastair Houghton <email@hidden>
  • Date: Wed, 18 Jun 2008 17:38:09 +0100

On 18 Jun 2008, at 07:25, Andrew Farmer wrote:

> NSStrings are encoding-independent. They represent strings, not sequences of bytes.

Not *entirely*. The docs are a little sloppy on this, unfortunately, for both Cocoa and Core Foundation; in both cases they talk about "Unicode characters" and suggest that these are 16 bits in size. There was a point in the past when Unicode (as opposed to ISO 10646, which later merged with it, if I've got my history right) was indeed a 16-bit-per-"character" encoding, which is probably why the docs read the way they do. That isn't true today, though, so it's best not to think of it that way.


Perhaps more accurately, NSString is a sequence of UTF-16 code units, which is not the same thing at all (in fact, the word "character" is generally one to avoid because it's often unclear what you mean when you use it).

In particular, -characterAtIndex: can return either half of a surrogate pair. For example, if you have a string containing a non-BMP code point such as MUSICAL SYMBOL G CLEF (U+1D11E), which is encoded as D834 DD1E according to Character Palette, you might get 0xD834 or 0xDD1E, but you won't ever get 0x1D11E. Nor is that the only trap for the unwary: -characterAtIndex: can also return various types of Unicode control codes, as well as several kinds of combining characters (the most common group is probably accents).
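
To make the surrogate-pair arithmetic concrete, here is a minimal sketch. It is written in Python rather than Objective-C purely as a neutral illustration; the helper names utf16_code_units and surrogate_pair are made up for this example and are not Cocoa API. The code units it prints are the values one would expect -characterAtIndex: to hand back, one per index, for the same string.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")  # big-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a non-BMP code point (U+10000..U+10FFFF) into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is in the BMP or out of range")
    offset = code_point - 0x10000          # 20 bits remain after the subtraction
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


g_clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF
units = utf16_code_units(g_clef)

print(len(g_clef))              # 1 code point
print([hex(u) for u in units])  # ['0xd834', '0xdd1e'], i.e. 2 code units
print([hex(u) for u in surrogate_pair(0x1D11E)])  # same pair, computed by hand
```

Neither 0xD834 nor 0xDD1E means anything on its own; only the pair taken together identifies U+1D11E.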

The String Programming Guide does warn about this to some extent:

"If you need to access string objects character-by-character, you must understand the Unicode character encoding—specifically, issues related to composed character sequences."
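
As a rough illustration of what "composed character sequences" means, the sketch below (Python, used only as a neutral illustration) builds an accented letter from a base letter plus a combining accent, then groups each base code point with the combining marks that follow it. The split_composed helper is hypothetical and deliberately simplistic: it only looks at the Unicode combining class, which is far less than full grapheme cluster segmentation. In Cocoa, NSString's -rangeOfComposedCharacterSequenceAtIndex: is the method intended for finding these boundaries.

```python
import unicodedata

decomposed = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
precomposed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9


def split_composed(text: str) -> list[str]:
    """Group each base code point with the combining marks that follow it.

    Simplified: relies only on the canonical combining class, so it does not
    handle every kind of sequence a real text system has to cope with.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch   # attach the mark to the preceding base
        else:
            clusters.append(ch)  # start a new sequence
    return clusters


print(len(decomposed), len(precomposed))  # 2 1
print(len(split_composed("e\u0301a")))    # 2 sequences from 3 code points
```

Both forms display as the same accented letter, yet indexing the decomposed one element by element yields a bare "e" followed by a stray accent.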

Anyway, this is often not a big deal, but in some applications it can be, so it's worth bearing in mind.

Kind regards,

Alastair.

--
http://alastairs-place.net

References: 
 >does NSTextField always use UTF8 encoding (From: "Wayne Shao" <email@hidden>)
 >Re: does NSTextField always use UTF8 encoding (From: Andrew Farmer <email@hidden>)