Re: endian problems with UTF16 on Intel Macs [SOLVED]
- Subject: Re: endian problems with UTF16 on Intel Macs [SOLVED]
- From: Donald Hall <email@hidden>
- Date: Tue, 29 Aug 2006 23:14:03 -0600
Many thanks to Rick and Chris for their feedback. Here is what I ended up with:
- (NSData *)dataRepresentationOfType:(NSString *)aType
{
    [self updateString];

    CFStringBuiltInEncodings stringEncoding;
#if __BIG_ENDIAN__
    stringEncoding = kCFStringEncodingUnicode;
#elif __LITTLE_ENDIAN__
    stringEncoding = kCFStringEncodingUTF16BE;
#endif

    CFDataRef myDataRef = CFStringCreateExternalRepresentation(
        kCFAllocatorDefault, (CFStringRef)string, stringEncoding, '?');

    /* CFStringCreateExternalRepresentation follows the Create Rule and
       returns a +1 reference, so autorelease it before handing it back. */
    return [(NSData *)myDataRef autorelease];
}
Since 'kCFStringEncodingUTF16BE' isn't available in Panther, I needed
the conditional compile. Panther only runs on big-endian PPC machines,
so it can use what I originally had, 'kCFStringEncodingUnicode', and
endianness is not an issue on PPC Macs whether running Panther or
Tiger. Now when my Cocoa program saves the Unicode text file, it will
be in the same format my AppleScript application uses when appending
to the file.
Don
On Tuesday, August 29, 2006, at 06:57AM, Chris Suter
<email@hidden> wrote:
On 29/08/2006, at 9:42 PM, Ricky Sharp wrote:
On Tuesday, August 29, 2006, at 00:59AM, Chris Suter
<email@hidden> wrote:
On 29/08/2006, at 3:47 PM, Donald Hall wrote:
Furthermore, I understood that "external representation" was always
big endian.
No. The representation is as dictated by the encoding. Some encodings
don't have an endian aspect to them (UTF-8 for example). I'm guessing
if you pick kCFStringEncodingUTF16, OS X is free to choose big-endian
or little-endian.
Not quite. According to <http://www.unicode.org/faq/utf_bom.html#36>,
unmarked UTF-16 and UTF-32 use big-endian by default. I would expect
the Cocoa frameworks to honor that default.
But Cocoa can write a byte order mark, and as it turns out, that's
exactly what it does:
#include <CoreFoundation/CoreFoundation.h>
#include <stdio.h>

int main(void)
{
    CFDataRef data;
    int i;

    data = CFStringCreateExternalRepresentation(NULL, CFSTR("test"),
                                                kCFStringEncodingUTF16, 0);
    for (i = 0; i < CFDataGetLength(data); ++i)
        printf("%02x ", CFDataGetBytePtr(data)[i]);
    putchar('\n');

    CFRelease(data);
    return 0;
}
produces:
ff fe 74 00 65 00 73 00 74 00
on an Intel machine.
Here, the actual encoding being used is UTF-16LE. By placing the
BOM in there, receivers of the data know how to interpret it. But
in cases where 16-bit data is unmarked, you should interpret the
bytes as being big-endian by default.
--
Rick Sharp
Instant Interactive(tm)
--
Donald S. Hall, Ph.D.
Apps & More Software Design, Inc.
email@hidden
http://www.appsandmore.com
Cocoa-dev mailing list (email@hidden)