Re: endian problems with UTF16 on Intel Macs [SOLVED]


  • Subject: Re: endian problems with UTF16 on Intel Macs [SOLVED]
  • From: Donald Hall <email@hidden>
  • Date: Tue, 29 Aug 2006 23:14:03 -0600

Many thanks to Rick and Chris for their feedback. Here is what I ended up with:

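- (NSData *)dataRepresentationOfType:(NSString *)aType
{
    [self updateString];

    // Big-endian (PPC) builds keep the original encoding; Intel builds
    // ask for big-endian UTF-16 explicitly.
    CFStringBuiltInEncodings stringEncoding;
#if __BIG_ENDIAN__
    stringEncoding = kCFStringEncodingUnicode;
#elif __LITTLE_ENDIAN__
    stringEncoding = kCFStringEncodingUTF16BE;
#endif

    // '?' is the loss byte used for characters that cannot be converted.
    CFDataRef myDataRef = CFStringCreateExternalRepresentation (kCFAllocatorDefault, (CFStringRef)string, stringEncoding, '?');

    // The Create call returns an owned reference; autorelease it to avoid a leak.
    return [(NSData *)myDataRef autorelease];
}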


Since 'kCFStringEncodingUTF16BE' isn't available in Panther, I needed the conditional compile. Panther only runs on big-endian hardware, so it can use what I originally had, 'kCFStringEncodingUnicode'; endianness is not an issue on PPC Macs, whether running Panther or Tiger. Now when my Cocoa program saves the Unicode text file, it will be in the same format that my AppleScript application uses when adding to the file.
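
For illustration only, the idea of writing the same byte order on every host can be sketched in plain C without CoreFoundation. The helper names utf16_to_be_bytes and host_is_little_endian are hypothetical and are not part of any Apple API; this is a minimal sketch that assumes the text is already available as an array of 16-bit code units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a CoreFoundation call: writes each UTF-16
   code unit high byte first, so the output is big-endian no matter
   what the host byte order is. 'out' must hold 2 * count bytes. */
size_t utf16_to_be_bytes(const uint16_t *units, size_t count, uint8_t *out)
{
    size_t i;

    assert(units != NULL && out != NULL);

    for (i = 0; i < count; ++i) {
        out[2 * i]     = (uint8_t)(units[i] >> 8);   /* high byte first */
        out[2 * i + 1] = (uint8_t)(units[i] & 0xFF); /* then low byte   */
    }
    return 2 * count; /* number of bytes written */
}

/* Hypothetical runtime check: returns 1 on a little-endian host and 0
   on a big-endian one, by looking at the first byte of the value 1. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}
```

Because the shifts operate on values rather than on memory layout, the byte sequence comes out the same on PPC and Intel, which is the property the conditional compile above is after.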

Don


On Tuesday, August 29, 2006, at 06:57AM, Chris Suter <email@hidden> wrote:



On 29/08/2006, at 9:42 PM, Ricky Sharp wrote:


On Tuesday, August 29, 2006, at 00:59AM, Chris Suter <email@hidden> wrote:



On 29/08/2006, at 3:47 PM, Donald Hall wrote:

 Furthermore, I understood that "external representation" was always
 big endian.

No. The representation is as dictated by the encoding. Some encodings don't have an endian aspect to them (UTF-8 for example). I'm guessing if you pick kCFStringEncodingUTF16, OS X is free to choose big-endian or little-endian.

Not quite. According to <http://www.unicode.org/faq/utf_bom.html#36>, unmarked UTF-16 and UTF-32 use big-endian by default. I would expect the Cocoa frameworks to honor that default.

But Cocoa can write the byte order mark, and as it turns out, I'm right.

#include <CoreFoundation/CoreFoundation.h>
#include <stdio.h>

int main ()
{
   CFDataRef data;

   data = CFStringCreateExternalRepresentation (NULL, CFSTR ("test"), kCFStringEncodingUTF16, 0);

   int i;

   /* Dump each byte of the external representation as two hex digits. */
   for (i = 0; i < CFDataGetLength (data); ++i)
     printf ("%02x ", CFDataGetBytePtr (data)[i]);

   putchar ('\n');

   CFRelease (data);

   return 0;
}

produces:

ff fe 74 00 65 00 73 00 74 00

on an Intel machine.

Here, the actual encoding being used is UTF-16LE. By placing the BOM in there, receivers of the data know how to interpret it. But in cases where 16-bit data is unmarked, you should interpret the bytes as being big-endian by default.
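
As a rough sketch of the receiving side of that rule, the following plain C function checks for a byte order mark and falls back to big-endian when the data is unmarked. The name utf16_decode is hypothetical and is not a CoreFoundation or Cocoa call; the sketch assumes the caller supplies an output buffer of at least len / 2 code units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes UTF-16 bytes into 16-bit code units.
   A leading FF FE selects little-endian, a leading FE FF selects
   big-endian, and unmarked data is read as big-endian. The BOM itself
   is not copied to the output. Returns the number of code units. */
size_t utf16_decode(const uint8_t *bytes, size_t len, uint16_t *out)
{
    int little = 0;   /* default for unmarked data: big-endian */
    size_t pos = 0;
    size_t n = 0;

    assert(bytes != NULL && out != NULL);

    if (len >= 2) {
        if (bytes[0] == 0xFF && bytes[1] == 0xFE) {
            little = 1;
            pos = 2;
        } else if (bytes[0] == 0xFE && bytes[1] == 0xFF) {
            pos = 2;
        }
    }

    /* A trailing odd byte, if any, is ignored. */
    for (; pos + 1 < len; pos += 2) {
        if (little)
            out[n++] = (uint16_t)(bytes[pos] | (bytes[pos + 1] << 8));
        else
            out[n++] = (uint16_t)((bytes[pos] << 8) | bytes[pos + 1]);
    }
    return n;
}
```

Fed the ten bytes shown above (ff fe 74 00 65 00 73 00 74 00), this returns the four code units for "test"; fed the unmarked bytes 00 74, it returns the single code unit 0x0074.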


--
Rick Sharp
Instant Interactive(tm)


--
Donald S. Hall, Ph.D.
Apps & More Software Design, Inc.
email@hidden
http://www.appsandmore.com

