Re: How to get array of characters from NSString



On Aug 12, 2008, at 8:41 AM, Kyle Sluder wrote:
> On Mon, Aug 11, 2008 at 9:30 PM, Deborah Goldsmith <email@hidden> wrote:
>> Anyone who is considering writing code that looks through the contents of an
>> NSString (as opposed to just treating the whole string as a unit) needs to
>> learn the basics of processing Unicode.
>
> Joel Spolsky has a great primer on just how deep the Unicode rabbit hole goes, entitled "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)":
>
> http://www.joelonsoftware.com/articles/Unicode.html

That article is missing several concepts that are essential for understanding Unicode. Like many programmers, Mr. Spolsky thinks of Unicode as "wide ASCII", which it is not. The article doesn't cover surrogate pairs (his use of the term UCS-2 instead of UTF-16 shows he's not up to date) or combining sequences (grapheme clusters). If you're going to go groveling through Unicode text, you need to understand both.
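
To make those two concepts concrete, here is a minimal sketch. It is written in Python rather than Cocoa purely for brevity, and the helper names (utf16_units, rough_clusters) are made up for this illustration. The cluster grouping is a deliberate simplification that only attaches combining marks to the preceding character; it is not the full Unicode text segmentation algorithm.

```python
import unicodedata

def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

def rough_clusters(s):
    """Group each combining mark with the character before it.

    A rough approximation of grapheme clusters for illustration only;
    real segmentation has many more rules.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch      # attach the mark to its base character
        else:
            clusters.append(ch)     # start a new cluster
    return clusters

# Surrogate pair: one code point outside the BMP takes two UTF-16 units.
clef = "\U0001D11E"                 # MUSICAL SYMBOL G CLEF
print([hex(u) for u in utf16_units(clef)])   # ['0xd834', '0xdd1e']

# Combining sequence: "e" + COMBINING ACUTE ACCENT is two code points
# but one user-visible character.
e_acute = "e\u0301"
print(len(e_acute))                          # 2
print(len(rough_clusters(e_acute + "a")))    # 2 clusters: "é" and "a"
print(unicodedata.normalize("NFC", e_acute) == "\u00e9")  # True
```

The same caveats apply to NSString: its length counts UTF-16 code units, so splitting a string at arbitrary indexes can cut a surrogate pair or a combining sequence in half. In Cocoa, -rangeOfComposedCharacterSequenceAtIndex: is the usual starting point for walking a string one user-visible character at a time.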


This article is a bit stuffy, but also more complete, and is even shorter (I think):

http://unicode.org/standard/principles.html

This is also good:

http://icu-project.org/userguide/unicodeBasics.html

Also, Unicode does not, and likely never will, contain the Klingon script. A proposal to encode it was rejected because the Klingon user community (yes, it exists: http://www.amazon.com/Klingon-Hamlet-Lawrence-Schoen/dp/0671035789/) does not use the script: they write Klingon using ASCII (e.g., "tlhIngan Hol"). Scripts don't get encoded in Unicode unless there is actually a user community for them.

That doesn't mean that fictional scripts are prohibited. There are proposals to encode Tengwar and Cirth, for example, as these have (small) user communities. :-)

http://std.dkuug.dk/JTC1/SC2/WG2/docs/n1641/n1641.htm

They've been languishing since 1997 due to more pressing work for the Unicode Technical Committee, so I wouldn't plan on writing Quenya or Sindarin in Unicode any time soon...

Deborah Goldsmith
Apple Inc.
email@hidden

References: 
 >How to get array of characters from NSString (From: "SridharRao M" <email@hidden>)
 >Re: How to get array of characters from NSString (From: Phil Faber <email@hidden>)
 >Re: How to get array of characters from NSString (From: Deborah Goldsmith <email@hidden>)
 >Re: How to get array of characters from NSString (From: "Kyle Sluder" <email@hidden>)


