Re: How to get array of characters from NSString
- Subject: Re: How to get array of characters from NSString
- From: Deborah Goldsmith <email@hidden>
- Date: Tue, 12 Aug 2008 09:13:23 -0700
On Aug 12, 2008, at 8:41 AM, Kyle Sluder wrote:
On Mon, Aug 11, 2008 at 9:30 PM, Deborah Goldsmith
<email@hidden> wrote:
Anyone who is considering writing code that looks through the contents
of an NSString (as opposed to just treating the whole string as a unit)
needs to learn the basics of processing Unicode.
Joel Spolsky has a great primer on just how deep the Unicode rabbit
hole goes, entitled "The Absolute Minimum Every Software Developer
Absolutely, Positively Must Know About Unicode and Character Sets (No
Excuses!)":
http://www.joelonsoftware.com/articles/Unicode.html
That article is missing several concepts which are essential for
understanding Unicode; like many programmers, Mr. Spolsky thinks of
Unicode as "wide ASCII", which it is not. The article doesn't cover
surrogate pairs (the fact that he uses the term UCS-2 instead of
UTF-16 shows he's not up to date) or combining sequences (grapheme
clusters). If you're going to go groveling through Unicode text, you
need to understand both.
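To make the two concepts concrete, here is a minimal sketch in Python (not Cocoa code) that counts the same short string three ways: in UTF-16 code units, which is what NSString's length and characterAtIndex: work in; in code points; and in a rough approximation of user-perceived characters. The helper names utf16_units and rough_clusters are made up for this illustration, and the grouping rule is a simplification that only attaches combining marks to the preceding base character. It is not the full grapheme cluster algorithm from the Unicode standard.

```python
import unicodedata


def utf16_units(s):
    """Count UTF-16 code units, the unit that NSString's length reports."""
    # Each UTF-16 code unit is two bytes; a character outside the BMP
    # is encoded as a surrogate pair and so takes two units.
    return len(s.encode("utf-16-le")) // 2


def rough_clusters(s):
    """Group each base character with the combining marks that follow it.

    Simplified on purpose: real grapheme cluster segmentation covers
    more cases (Hangul jamo, emoji sequences, and so on).
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to the previous base
        else:
            clusters.append(ch)
    return clusters


# 'e' + COMBINING ACUTE ACCENT (a combining sequence), followed by
# MUSICAL SYMBOL G CLEF, U+1D11E (needs a surrogate pair in UTF-16).
text = "e\u0301\U0001D11E"

print(utf16_units(text))           # 4 UTF-16 code units
print(len(text))                   # 3 code points
print(len(rough_clusters(text)))   # 2 user-perceived characters
```

Code that indexes by UTF-16 unit would see four "characters" here and could split either the accent from its base or one half of the surrogate pair from the other. In Cocoa, rangeOfComposedCharacterSequenceAtIndex: is the NSString method intended for finding those boundaries.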
This article is a bit stuffy, but also more complete, and is even
shorter (I think):
http://unicode.org/standard/principles.html
This is also good:
http://icu-project.org/userguide/unicodeBasics.html
Also, Unicode does not, and likely never will, contain the Klingon
script. While there was a proposal to encode it, it was rejected
because the Klingon user community (yes, it exists:
http://www.amazon.com/Klingon-Hamlet-Lawrence-Schoen/dp/0671035789/)
does not use the script: they write Klingon using ASCII (e.g.,
"tlhIngan Hol"). Scripts don't get encoded in Unicode unless there is
actually a community that uses them.
That doesn't mean that fictional scripts are prohibited. There are
proposals to encode Tengwar and Cirth, for example, as these have
(small) user communities. :-)
http://std.dkuug.dk/JTC1/SC2/WG2/docs/n1641/n1641.htm
They've been languishing since 1997 due to more pressing work for the
Unicode Technical Committee, so I wouldn't plan on writing Quenya or
Sindarin in Unicode any time soon...
Deborah Goldsmith
Apple Inc.
email@hidden
_______________________________________________
Cocoa-dev mailing list (email@hidden)