Re: Working with 32-bit Unicode (NSString stringWithUTF32String: (const UTF32Char *) bytes needed)
  • Subject: Re: Working with 32-bit Unicode (NSString stringWithUTF32String: (const UTF32Char *) bytes needed)
  • From: Dietrich Epp <email@hidden>
  • Date: Mon, 6 Jan 2003 23:18:04 -0800

On Monday, January 6, 2003, at 06:29, Andrew Thompson wrote:

On Monday, Jan 6, 2003, at 15:55 America/New_York, Aki Inoue wrote:

Douglas Davidson wrote:

We use UTF-16, so you can just use surrogate pairs. NSLayoutManager et
al. understand them. I'm not sure if we have anything public for
bulk-converting 32-bit data, though.


As I understand it, that would require a Big Honking Table (tm) of composed char <-> surrogate pair mappings if I want to handle the general case and not just a few characters of interest. Is such a thing readily available?

E.g.:

Is the mapping a mathematical function?
Or is it a big lookup table?
Or does it require lots of context and domain knowledge about the script in question?

If it is a table, is it available in machine-readable form anywhere?

Why would it require a table?

Surrogates are covered in section 3.7 of the Unicode standard. Given a character, C, which is at least $10000, the surrogate pair is:
(C - $10000)/$400 + $D800, (C - $10000)%$400 + $DC00

This cannot encode many 32-bit values, because a surrogate pair carries only 20 bits of offset: the scheme reaches from $10000 up to $10FFFF and no further. That should be enough, since Unicode defines no code points above $10FFFF, so every valid UTF-32 value fits. Surrogate code units are illegal when unpaired in UTF-16, and illegal anywhere in UTF-8 or UTF-32.
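To make the arithmetic concrete, here is a minimal sketch in C of the formula above and its inverse. The function names encode_utf16 and decode_surrogates are hypothetical, chosen only for illustration; they are not part of any Apple API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point c as UTF-16 into out[].
 * Returns the number of 16-bit units written (1 or 2), or 0 if c is
 * not encodable (a surrogate value, or above 0x10FFFF). */
size_t encode_utf16(uint32_t c, uint16_t out[2])
{
    assert(out != NULL);
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    if (c < 0x10000) {                 /* fits in a single unit */
        out[0] = (uint16_t)c;
        return 1;
    }
    c -= 0x10000;                      /* 20-bit offset */
    out[0] = (uint16_t)(0xD800 + c / 0x400);   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + c % 0x400);   /* low surrogate */
    return 2;
}

/* Inverse: combine a high and a low surrogate into one code point.
 * The caller is assumed to have checked that hi is in D800..DBFF
 * and lo is in DC00..DFFF. */
uint32_t decode_surrogates(uint16_t hi, uint16_t lo)
{
    return 0x10000 + (uint32_t)(hi - 0xD800) * 0x400 + (uint32_t)(lo - 0xDC00);
}
```

For example, $1D11E gives the pair $D834, $DD1E, and decoding that pair returns $1D11E. No table is involved in either direction.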


Aki Inoue wrote:

As Doug mentioned, you can access the whole Unicode code space via surrogates.
We have no plan to change the fundamental concept that NSString/CFString is a wrapper for a UTF-16 character array.

We're considering adding a direct mapping between UTF-32 bytes and NSString/CFString in the future.

That would be great. Adding something like kCFStringEncodingUTF32 would allow the creation of NSString and CFStrings from UTF32Char[] data very easily.

I'd also like to see something targeted at the character level though.

Maybe on NSString

+ (unichar *)decompose:(const UTF32Char *)bytes;  // convert 32-bit characters to surrogate pairs where needed
+ (UTF32Char *)compose:(const unichar *)bytes;    // convert UTF-16/unichar data to composed characters where needed
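As a rough illustration of what such a pair of conversions might do, here is a sketch in plain C rather than Objective-C. The names utf32_to_utf16 and utf16_to_utf32, the explicit length parameters, and the SIZE_MAX error convention are all assumptions made for this example; nothing like this is confirmed to exist in Foundation or CoreFoundation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "decompose": convert n UTF-32 values to UTF-16.
 * dst must have room for 2 * n units. Returns the number of units
 * written, or SIZE_MAX if src contains an invalid value. */
size_t utf32_to_utf16(const uint32_t *src, size_t n, uint16_t *dst)
{
    assert(src != NULL && dst != NULL);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = src[i];
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return SIZE_MAX;               /* not a valid character */
        if (c < 0x10000) {
            dst[out++] = (uint16_t)c;      /* single unit */
        } else {
            c -= 0x10000;                  /* split into a surrogate pair */
            dst[out++] = (uint16_t)(0xD800 + c / 0x400);
            dst[out++] = (uint16_t)(0xDC00 + c % 0x400);
        }
    }
    return out;
}

/* Hypothetical "compose": convert n UTF-16 units to UTF-32.
 * dst must have room for n values. Returns the number of values
 * written, or SIZE_MAX if src contains an unpaired surrogate. */
size_t utf16_to_utf32(const uint16_t *src, size_t n, uint32_t *dst)
{
    assert(src != NULL && dst != NULL);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t u = src[i];
        if (u >= 0xD800 && u <= 0xDBFF) {  /* high surrogate: need a low one next */
            if (i + 1 >= n || src[i + 1] < 0xDC00 || src[i + 1] > 0xDFFF)
                return SIZE_MAX;
            u = 0x10000 + (u - 0xD800) * 0x400 + (uint32_t)(src[i + 1] - 0xDC00);
            i++;                           /* consume the low surrogate too */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return SIZE_MAX;               /* low surrogate with no high before it */
        }
        dst[out++] = u;
    }
    return out;
}
```

The sketch takes explicit lengths instead of relying on a terminator, since a bare pointer as in the declarations above gives the callee no way to know where the data ends.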

Would it help if I enter something in the Bug Reporter?

At this point I am exceedingly curious regarding your reasons for using 32-bit encodings.
_______________________________________________
cocoa-dev mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/cocoa-dev
Do not post admin requests to the list. They will be ignored.