Re: Working with 32-bit Unicode (NSString stringWithUTF32String: (const UTF32Char *) bytes needed)
- Subject: Re: Working with 32-bit Unicode (NSString stringWithUTF32String: (const UTF32Char *) bytes needed)
- From: Dietrich Epp <email@hidden>
- Date: Mon, 6 Jan 2003 23:18:04 -0800
On Monday, January 6, 2003, at 06:29 , Andrew Thompson wrote:
On Monday, Jan 6, 2003, at 15:55 America/New_York, Aki Inoue wrote:
Douglas Davidson
We use UTF-16, so you can just use surrogate pairs. NSLayoutManager et
al. understand them. I'm not sure if we have anything public for
bulk-converting 32-bit data, though.
As I understand it that would require I have a Big Honking Table (tm)
of composed char <-> surrogate pair mappings, if I want to be able to
handle the general case and not just a few characters of interest. Is
such a thing readily available?
E.g.,
is the mapping a mathematical function?
Or is it a big lookup table?
or does it require lots of context and domain knowledge about the
script in question?
If it is a table, is it available in machine readable form anywhere?
Why would it require a table?
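Surrogates are covered in section 3.7 of the Unicode standard. Given a
character, C, which is at least $10000 and at most $10FFFF (the $ prefix
denotes hexadecimal), the surrogate pair is:
(C - $10000)/$400 + $D800, (C - $10000)%$400 + $DC00
where / is integer division and % is the remainder. The first value is
the high (leading) surrogate and the second is the low (trailing) one.
This cannot encode most 32-bit numbers, since a surrogate pair carries
only 20 bits. But it should be enough, as characters beyond that range
aren't actually assigned (except some for private use).
Surrogates are illegal when unpaired, and they are illegal altogether in
UTF-8 and UTF-32.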
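To make the arithmetic above concrete, here is a minimal C sketch of the mapping in both directions. The function names utf32_to_surrogates and surrogates_to_utf32 are hypothetical (they are not part of Cocoa or CoreFoundation), and plain uint32_t/uint16_t stand in for UTF32Char and unichar.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split a supplementary code point (U+10000..U+10FFFF)
 * into a UTF-16 surrogate pair. Returns false if c is outside that range,
 * in which case *high and *low are left untouched. */
static bool utf32_to_surrogates(uint32_t c, uint16_t *high, uint16_t *low)
{
    assert(high != NULL && low != NULL);
    if (c < 0x10000 || c > 0x10FFFF)
        return false;
    uint32_t v = c - 0x10000;                 /* 20-bit offset */
    *high = (uint16_t)(0xD800 + v / 0x400);   /* top 10 bits */
    *low  = (uint16_t)(0xDC00 + v % 0x400);   /* bottom 10 bits */
    return true;
}

/* Hypothetical inverse: combine a valid high/low surrogate pair back into
 * a code point. The caller is assumed to have checked that high is in
 * 0xD800..0xDBFF and low is in 0xDC00..0xDFFF. */
static uint32_t surrogates_to_utf32(uint16_t high, uint16_t low)
{
    return 0x10000
         + (uint32_t)(high - 0xD800) * 0x400
         + (uint32_t)(low - 0xDC00);
}
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF) gives the offset $D11E, which splits into the pair $D834, $DD1E. No table is involved; the mapping is pure arithmetic.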
Aki Inoue
As Doug mentioned here, you can access the whole 32 bit Unicode space
via surrogates.
And we have no plan to change the fundamental concept that
NSString/CFString is a wrapper for UTF-16 character array.
We're considering adding a direct mapping between UTF-32 bytes and
NSString/CFString in the future.
That would be great. Adding something like kCFStringEncodingUTF32 would
allow the creation of NSString and CFStrings from UTF32Char[] data very
easily.
I'd also like to see something targeted at the character level, though.
Maybe on NSString:
+ (unichar *)decompose:(const UTF32Char *)bytes;
  // convert 32-bit characters to surrogate pairs where needed
+ (UTF32Char *)compose:(const unichar *)bytes;
  // convert UTF-16/unichar data to composed characters where needed
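As a rough illustration of what such character-level conversions might do, here is a plain C sketch of bulk conversion in both directions. Everything in it is an assumption for illustration: utf32_to_utf16, utf16_to_utf32 and CONVERT_ERROR are hypothetical names, not NSString or CoreFoundation API, and the buffers are caller-supplied rather than allocated. Note that "compose" and "decompose" here mean joining and splitting surrogate pairs, not Unicode normalization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CONVERT_ERROR ((size_t)-1)   /* hypothetical error sentinel */

/* UTF-32 -> UTF-16. Writes at most cap units into dst and returns the
 * number written, or CONVERT_ERROR on an invalid code point (a surrogate
 * value or anything above 0x10FFFF) or if dst is too small. */
static size_t utf32_to_utf16(const uint32_t *src, size_t n,
                             uint16_t *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = src[i];
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return CONVERT_ERROR;
        if (c < 0x10000) {                    /* fits in one unit */
            if (out + 1 > cap) return CONVERT_ERROR;
            dst[out++] = (uint16_t)c;
        } else {                              /* needs a surrogate pair */
            if (out + 2 > cap) return CONVERT_ERROR;
            c -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 + (c >> 10));
            dst[out++] = (uint16_t)(0xDC00 + (c & 0x3FF));
        }
    }
    return out;
}

/* UTF-16 -> UTF-32. Returns the number of code points written, or
 * CONVERT_ERROR on an unpaired surrogate or if dst is too small. */
static size_t utf16_to_utf32(const uint16_t *src, size_t n,
                             uint32_t *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t u = src[i];
        if (u >= 0xD800 && u <= 0xDBFF) {     /* high surrogate */
            if (i + 1 >= n || src[i + 1] < 0xDC00 || src[i + 1] > 0xDFFF)
                return CONVERT_ERROR;         /* no low surrogate follows */
            u = 0x10000 + ((u - 0xD800) << 10) + (uint32_t)(src[i + 1] - 0xDC00);
            i++;                              /* consume the low surrogate */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return CONVERT_ERROR;             /* stray low surrogate */
        }
        if (out >= cap) return CONVERT_ERROR;
        dst[out++] = u;
    }
    return out;
}
```

Sizing the destination is simple under these assumptions: UTF-32 to UTF-16 needs at most two units per input code point, and UTF-16 to UTF-32 needs at most one code point per input unit.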
Would it help if I enter something in the Bug Reporter?
At this point I am exceedingly curious regarding your reasons for using
32-bit encodings.