Re: Displaying Unicode

  • Subject: Re: Displaying Unicode
  • From: Ali Ozer <email@hidden>
  • Date: Thu, 29 Nov 2001 10:11:51 -0800

Well, the stream you have looks like plain old UCS-2, that is, NSUnicodeStringEncoding. One thing that looks wrong is that the byte order mark (BOM) seems reversed: it implies the characters are stored big-endian, but they really are little-endian. Your stream starts with FE FF, that is, the BOM 0xFEFF in big-endian order, which implies the chars are 0xB930, 0xC830, etc. Those seem fishy (they are all over the map). If you instead reverse the bytes, you get the much more reasonable 0x30B9, 0x30C8, etc. Doing this, and loading the resulting byte stream into TextEdit, I get a reasonable-looking Japanese string (I have no idea what it says, though). In addition, [NSString initWithContentsOfFile:] should just read this fine. No need to use TEC...
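
To make the byte-order argument concrete, here is a minimal sketch in plain C (no CoreFoundation) that decodes the test bytes from the quoted message both ways. The names kSample, read_unit, order_from_bom, decode_units and count_kana are made up for this illustration, and the "does it land in the kana blocks" check is only a rough plausibility heuristic, not a standard detection rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte order of a 16-bit Unicode stream. */
typedef enum { ORDER_BIG, ORDER_LITTLE } ByteOrder;

/* The test bytes from the quoted message (BOM, 17 units, 0x0000 terminator). */
static const uint8_t kSample[] = {
    0xfe, 0xff, 0xb9, 0x30, 0xc8, 0x30, 0xdc, 0x30, 0xfc, 0x30,
    0xc9, 0x30, 0x6b, 0x30, 0xb9, 0x30, 0xc8, 0x30, 0xea, 0x30,
    0xf3, 0x30, 0xb0, 0x30, 0x4c, 0x30, 0x42, 0x30, 0x8a, 0x30,
    0x7e, 0x30, 0x5b, 0x30, 0x93, 0x30, 0x00, 0x00
};

/* Assemble one 16-bit code unit from two bytes in the given order. */
static uint16_t read_unit(const uint8_t *p, ByteOrder order)
{
    return order == ORDER_BIG ? (uint16_t)((p[0] << 8) | p[1])
                              : (uint16_t)((p[1] << 8) | p[0]);
}

/* What the BOM claims: FF FE means little-endian; FE FF (or no BOM,
   an assumption made here) means big-endian. */
static ByteOrder order_from_bom(const uint8_t *bytes, size_t len)
{
    if (len >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        return ORDER_LITTLE;
    return ORDER_BIG;
}

/* Decode the units after a leading BOM, stopping at a 0x0000 terminator.
   Returns the number of units written to out. */
static size_t decode_units(const uint8_t *bytes, size_t len, ByteOrder order,
                           uint16_t *out, size_t out_cap)
{
    size_t i = 0, n = 0;

    assert(bytes != NULL && out != NULL);
    if (len >= 2 && ((bytes[0] == 0xFE && bytes[1] == 0xFF) ||
                     (bytes[0] == 0xFF && bytes[1] == 0xFE)))
        i = 2;                       /* skip the BOM in either order */
    for (; i + 1 < len && n < out_cap; i += 2) {
        uint16_t unit = read_unit(bytes + i, order);
        if (unit == 0x0000)
            break;
        out[n++] = unit;
    }
    return n;
}

/* Heuristic only: count units in Hiragana/Katakana (U+3040..U+30FF). */
static size_t count_kana(const uint16_t *units, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (units[i] >= 0x3040 && units[i] <= 0x30FF)
            hits++;
    return hits;
}
```

Decoding kSample with the order the BOM claims (ORDER_BIG) gives units starting 0xB930, 0xC830, none of which count_kana accepts; decoding with ORDER_LITTLE gives 0x30B9, 0x30C8, and all 17 units fall in the kana range.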

Ali


On Thursday, November 29, 2001, at 09:06 AM, Bill Bumgarner wrote:

Ahh... yes... I hadn't taken the concept of 'zero cost bridging' to its
ultimate conclusion. That's fixed now.

I still can't display the Unicode strings properly, and it appears to be a
conversion problem, i.e. I'm not coming up with the correct string for
the actual display, but the display itself is working correctly.

The latest incarnation does something like the following (without working).

The test byte sequence is (copy/pasted from a file in a random
Japanese.lproj -- it will make very little sense, I suspect):

fe ff b9 30 c8 30 dc 30 fc 30 c9 30 6b 30 b9 30 c8 30 ea 30 f3 30 b0
30 4c 30 42 30 8a 30 7e 30 5b 30 93 30 00 00

It renders as this in iTunes:

c9ccc<c c+c9cc*c3c0c cc
c>cc

But it renders as this in my app if I use kCFStringEncodingUnicode, as shown
in the original code:

k$0l 0m00o00l$0f,0k$0l 0n(0o 0k0d00d0h(0g80e,0i 0

The code below fails completely because no conversion from
kTextEncodingISO10646_1993 to kTextEncodingUnicodeDefault -- direct or
indirect -- can be deduced by TECCreateConverter.

b.bum

+ (TECObjectRef) _unicodeID3v2ToMacTextConverter
{
    // Created on first use and cached for later calls.
    static TECObjectRef id3v2ToMacTextConverter = NULL;

    if (id3v2ToMacTextConverter == NULL) {
        OSStatus status;

        status = TECCreateConverter(&id3v2ToMacTextConverter,
            kTextEncodingISO10646_1993, kTextEncodingUnicodeDefault);
        if (status != noErr) {
            NSLog(@"TECCreateConverter() error %d", (int)status);
            id3v2ToMacTextConverter = NULL;
            return NULL;
        }
    }

    return id3v2ToMacTextConverter;
}

+ (CFStringRef /* implies caller retain */) _convertUnicodeText: (ConstTextPtr) textBuffer ofLength: (unsigned int) length
{
    ByteCount actualInputConsumed;
    ByteCount actualOutputProduced;
    UInt8 outputBuffer[1024];
    CFMutableStringRef returnString;
    TECObjectRef textConverter = [self _unicodeID3v2ToMacTextConverter];

    // Nothing to do if the converter could not be created.
    if (textConverter == NULL)
        return NULL;

    returnString = CFStringCreateMutable(NULL, 0);

    do {
        OSStatus status;

        status = TECConvertText(textConverter, textBuffer, length,
            &actualInputConsumed, outputBuffer, sizeof(outputBuffer),
            &actualOutputProduced);
        if (status != noErr) {
            NSLog(@"TECConvertText() error %d", (int)status);
            CFRelease(returnString);    // don't leak the partial result
            return NULL;
        }

        // TEC reports bytes; CFStringAppendCharacters wants a UniChar count.
        CFStringAppendCharacters(returnString, (const UniChar *)outputBuffer,
            actualOutputProduced / sizeof(UniChar));

        length = length - actualInputConsumed;
        textBuffer = textBuffer + actualInputConsumed;
    } while (length > 0);

    return returnString;
}
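
A detail that is easy to miss in a loop of this shape: TECConvertText reports its output length in bytes, while CFStringAppendCharacters takes a count of UniChars, so the byte count has to be divided by the size of a code unit before appending. The sketch below shows the same pattern in plain C. Here swap_convert and append_units are hypothetical stand-ins for the converter and the string append (they are not TEC or CoreFoundation calls), and the tiny chunk buffer is only there to force several passes through the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a converter: turns little-endian byte pairs
   into host-order 16-bit units. Like TECConvertText, it reports how much
   it consumed and produced in BYTES. Returns 0 on success. */
static int swap_convert(const uint8_t *in, size_t in_len, size_t *consumed,
                        uint8_t *out, size_t out_cap, size_t *produced)
{
    size_t n = in_len < out_cap ? in_len : out_cap;

    n &= ~(size_t)1;                     /* whole 16-bit units only */
    for (size_t i = 0; i < n; i += 2) {
        uint16_t unit = (uint16_t)(in[i] | (in[i + 1] << 8));
        memcpy(out + i, &unit, sizeof unit);
    }
    *consumed = n;
    *produced = n;
    return (n == 0 && in_len > 0) ? -1 : 0;   /* no progress is an error */
}

/* Hypothetical stand-in for a string that appends by code-unit COUNT. */
typedef struct {
    uint16_t units[64];
    size_t count;
} UnitBuffer;

static int append_units(UnitBuffer *buf, const uint16_t *chars, size_t num_chars)
{
    if (buf->count + num_chars > 64)
        return -1;
    memcpy(buf->units + buf->count, chars, num_chars * sizeof(uint16_t));
    buf->count += num_chars;
    return 0;
}

/* Convert the whole input chunk by chunk. Returns 0 on success. */
static int convert_all(const uint8_t *in, size_t len, UnitBuffer *result)
{
    uint16_t chunk[4];                   /* deliberately small */

    result->count = 0;
    while (len > 0) {
        size_t consumed = 0, produced = 0;

        if (swap_convert(in, len, &consumed,
                         (uint8_t *)chunk, sizeof chunk, &produced) != 0)
            return -1;

        /* Bytes in, code units out: divide before appending. */
        assert(produced % sizeof(uint16_t) == 0);
        if (append_units(result, chunk, produced / sizeof(uint16_t)) != 0)
            return -1;

        in += consumed;
        len -= consumed;
    }
    return 0;
}
```

Passing the byte count straight through as a unit count would append twice as many units as were produced, reading past the converted data in the chunk buffer.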





On Thursday, November 29, 2001, at 03:54 AM, Ali Ozer wrote:

returnString = [NSString stringWithString: (NSString *) uniString]; // works-- but not without the cast... why?

You don't need this step, as CFStrings and NSStrings are toll-free
bridged, meaning they are equivalent. (A cast is all you need.)

CFDataRef stringData = CFDataCreateWithBytesNoCopy(NULL,
    (const UInt8 *) parsePtr, stringLength, kCFAllocatorNull);
uniString = CFStringCreateFromExternalRepresentation(NULL, stringData,
    kCFStringEncodingUnicode);
CFRelease(stringData);

These other steps look fine, and assuming the original data was in UCS-2
format, should work. You should open the original data in TextEdit (or
copy/paste) and see how it shows up...

Ali