Re: Displaying Unicode
- Subject: Re: Displaying Unicode
- From: Ali Ozer <email@hidden>
- Date: Thu, 29 Nov 2001 10:11:51 -0800
Well, the stream you have looks like plain old UCS-2, that is,
NSUnicodeStringEncoding. One thing that looks wrong is that the byte
order mark (BOM) seems reversed: it implies the characters are stored
big endian, but they really are little endian. The BOM is U+FEFF, and
here its bytes appear as fe ff, which implies the chars are 0xB930,
0xC830, etc. Those seem fishy (they are all over the map). If you
instead swap the two bytes of each character, you get the much more
reasonable 0x30B9, 0x30C8, etc. Doing this, and loading the resulting
byte stream into TextEdit, I get a reasonable-looking Japanese string
(I have no idea what it says though). In addition,
-[NSString initWithContentsOfFile:] should just read this fine.
No need to use TEC...
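To make the byte-order argument concrete, here is a minimal sketch in plain C (no Cocoa or TEC calls). The names byte_order, read_unit, and bom_order are made up for illustration and are not part of any Apple API; the code only shows how the same two bytes yield different code units depending on the order you assume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for this sketch: the two possible byte orders. */
typedef enum { ORDER_BIG, ORDER_LITTLE } byte_order;

/* Assemble one 16-bit code unit from two bytes in the given byte order. */
uint16_t read_unit(const uint8_t *p, byte_order order)
{
    assert(p != NULL);
    return order == ORDER_BIG ? (uint16_t)((p[0] << 8) | p[1])
                              : (uint16_t)((p[1] << 8) | p[0]);
}

/* Report the byte order the BOM claims: bytes FE FF mean big endian,
   FF FE mean little endian. Returns 1 if a BOM was found, 0 otherwise. */
int bom_order(const uint8_t *p, size_t len, byte_order *out)
{
    assert(p != NULL && out != NULL);
    if (len < 2)
        return 0;
    if (p[0] == 0xFE && p[1] == 0xFF) { *out = ORDER_BIG; return 1; }
    if (p[0] == 0xFF && p[1] == 0xFE) { *out = ORDER_LITTLE; return 1; }
    return 0;
}
```

For the posted bytes fe ff b9 30 c8 30, bom_order reports big endian. Reading the bytes after the BOM as big endian gives 0xB930 and 0xC830, while reading them as little endian gives 0x30B9 and 0x30C8, both of which fall in the Katakana block (U+30A0 to U+30FF).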
Ali
On Thursday, November 29, 2001, at 09:06 AM, Bill Bumgarner wrote:
Ahh... yes... I hadn't taken the concept of 'zero cost bridging' to its
ultimate conclusion. That's fixed now.
I still can't display the unicode strings properly, and it appears to
be a conversion problem -- i.e. I'm not coming up with the correct
string for the actual display, but the display itself is working
correctly.
The latest incarnation does something like the following (without
working).
The test byte sequence is (copy/pasted from a file in a random
Japanese.lproj -- it will make very little sense, I suspect):
fe ff b9 30 c8 30 dc 30 fc 30 c9 30 6b 30 b9 30 c8 30 ea 30 f3 30 b0
30 4c 30 42 30 8a 30 7e 30 5b 30 93 30 00 00
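One rough way to sanity-check the byte order of a sequence like this, independent of the BOM, is to count how many distinct values each byte position takes. This is only a heuristic sketch in plain C under the assumption that the text sits mostly in one script block, so that the high byte of each 16-bit unit barely varies; the function names distinct_bytes and looks_little_endian are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count how many distinct values one byte position takes across the
   16-bit units of buf. offset 0 is the first byte of each unit,
   offset 1 is the second byte. */
size_t distinct_bytes(const uint8_t *buf, size_t len, size_t offset)
{
    uint8_t seen[256];
    size_t i, count = 0;

    assert(buf != NULL && offset < 2);
    memset(seen, 0, sizeof seen);
    for (i = offset; i < len; i += 2) {
        if (!seen[buf[i]]) {
            seen[buf[i]] = 1;
            count++;
        }
    }
    return count;
}

/* Heuristic guess (an assumption, not a general rule): if the second
   byte of each unit varies less than the first, the second byte is
   probably the high byte, i.e. the data is little endian. Returns 1
   for "looks little endian", 0 otherwise (including ties). */
int looks_little_endian(const uint8_t *buf, size_t len)
{
    return distinct_bytes(buf, len, 1) < distinct_bytes(buf, len, 0);
}
```

Applied to the 36 bytes that follow the fe ff marker above, the first byte of each unit takes 16 distinct values while the second takes only 2 (0x30 and the terminating 0x00), so the heuristic reports little endian even though the marker says otherwise.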
It renders as this in iTunes:
c9ccc<c c+c9cc*c3c0ccc
c>cc
But renders as this in my app if I use kCFStringEncodingUnicode as shown
in the original code.
k$0l 0m00o00l$0f,0k$0l 0n(0o0k0d00d0h(0g80e,0i0
The code below fails completely because no conversion from
kTextEncodingISO10646_1993 to kTextEncodingUnicodeDefault -- direct or
indirect -- can be deduced by TECCreateConverter.
b.bum
+ (TECObjectRef) _unicodeID3v2ToMacTextConverter
{
    static TECObjectRef id3v2ToMacTextConverter = NULL;
    if (id3v2ToMacTextConverter == NULL) {
        // Create the converter once and cache it for later calls.
        OSStatus status;
        status = TECCreateConverter(&id3v2ToMacTextConverter,
            kTextEncodingISO10646_1993, kTextEncodingUnicodeDefault);
        if (status != noErr) {
            NSLog(@"TECCreateConverter() error %d", (int)status);
            id3v2ToMacTextConverter = NULL;
            return NULL; // TECObjectRef is not an object; use NULL, not nil
        }
    }
    return id3v2ToMacTextConverter;
}
+ (CFStringRef /* caller owns the result and must release it */) _convertUnicodeText:
    (ConstTextPtr) textBuffer ofLength: (unsigned int) length
{
    ByteCount actualInputConsumed;
    ByteCount actualOutputProduced;
    UInt8 outputBuffer[1024];
    CFMutableStringRef returnString = CFStringCreateMutable(NULL, 0);
    TECObjectRef textConverter = [self _unicodeID3v2ToMacTextConverter];
    if (textConverter == NULL) {
        CFRelease(returnString);
        return NULL;
    }
    do {
        OSStatus status;
        status = TECConvertText(textConverter, textBuffer, length,
            &actualInputConsumed, outputBuffer, sizeof(outputBuffer),
            &actualOutputProduced);
        if (status != noErr) {
            NSLog(@"TECConvertText() error %d", (int)status);
            CFRelease(returnString); // don't leak the partial result
            return NULL;
        }
        // actualOutputProduced is a byte count, but
        // CFStringAppendCharacters expects a count of UniChars.
        CFStringAppendCharacters(returnString, (const UniChar *)outputBuffer,
            actualOutputProduced / sizeof(UniChar));
        length = length - actualInputConsumed;
        textBuffer = textBuffer + actualInputConsumed;
    } while (length > 0);
    return returnString;
}
On Thursday, November 29, 2001, at 03:54 AM, Ali Ozer wrote:
returnString = [NSString stringWithString: (NSString *)
uniString]; // works-- but not without the cast... why?
You don't need this step, as CFStrings and NSStrings are toll-free
bridged, meaning they are equivalent. (A cast is all you need.)
CFDataRef stringData = CFDataCreateWithBytesNoCopy(NULL,
    (const UInt8 *) parsePtr, stringLength, kCFAllocatorNull);
uniString = CFStringCreateFromExternalRepresentation(NULL, stringData,
    kCFStringEncodingUnicode);
CFRelease(stringData);
These other steps look fine, and assuming the original data was in
UCS-2 format, should work. You should open the original data in
TextEdit (or copy/paste) and see how it shows up...
Ali