Re: Displaying Unicode Followup (And Rendering Bug)
- Subject: Re: Displaying Unicode Followup (And Rendering Bug)
- From: Bill Bumgarner <email@hidden>
- Date: Thu, 29 Nov 2001 14:14:39 -0500
The implementation I ended up with is as follows.
It is interesting to note that this works, but it assumes that ALL ID3 tags
will have the byte order mark in reverse order. That is likely not the case,
but I can't currently verify it, as I don't have access to MP3 files with
ID3 tags that are both Unicode and NOT created by iTunes (Audion doesn't
even attempt to display Unicode ID3 tags).
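For reference, the check that this assumption skips can be sketched in plain C. This is only an illustration: detect_bom and the bom_kind enum are hypothetical names, not part of the posted code or of any Apple API. It relies on the standard UTF-16 convention that a buffer starting with the bytes FE FF is big-endian and one starting with FF FE is little-endian.

```c
#include <stddef.h>

/* Byte orders a UTF-16 buffer can announce through a leading BOM. */
typedef enum {
    BOM_NONE,          /* no byte order mark present */
    BOM_BIG_ENDIAN,    /* buffer starts with FE FF */
    BOM_LITTLE_ENDIAN  /* buffer starts with FF FE */
} bom_kind;

/* Hypothetical helper: classify the first two bytes of buf. */
bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    if (len < 2)
        return BOM_NONE;
    if (buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_BIG_ENDIAN;
    if (buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_LITTLE_ENDIAN;
    return BOM_NONE;
}
```

A parser that branched on this result would not need to assume that every tag carries its mark in the same order.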
I am continuing to use CFStringCreateWithBytes() because it allows for
specifying the encoding when creating a string from a buffer-- something
NSString cannot do without first creating an NSData object containing the
data.
I'm still having display problems, likely because the first two bytes
should NOT be swapped. Specifically, the byte sequence...
ff fe 2a 59 7d 96 00 00
... causes the AppKit's glyph rendering engine to lock up if the first two
bytes are swapped. If the first two are not swapped, everything renders
fine, but I have no way of verifying whether the display is correct. The
display in iTunes appears to be screwed up.
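One way to sanity-check the unswapped reading is to decode the bytes by hand. The sketch below is plain C and purely illustrative: decode_utf16_units is a hypothetical helper, and falling back to big-endian when no mark is present is an assumption of this example. It follows the usual UTF-16 convention that a leading FF FE announces little-endian data and FE FF announces big-endian data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a UTF-16 byte buffer into 16-bit code units,
 * honoring a leading byte order mark if one is present. Returns the number
 * of code units written to out; the BOM itself is not written. */
size_t decode_utf16_units(const unsigned char *buf, size_t len,
                          uint16_t *out, size_t out_cap)
{
    int little = 0;   /* assumption: big-endian when there is no BOM */
    size_t i = 0;
    size_t n = 0;

    if (len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE) {
            little = 1;
            i = 2;    /* skip the little-endian BOM */
        } else if (buf[0] == 0xFE && buf[1] == 0xFF) {
            little = 0;
            i = 2;    /* skip the big-endian BOM */
        }
    }

    /* Combine each remaining byte pair in the announced order. */
    for (; i + 1 < len && n < out_cap; i += 2) {
        out[n++] = little
            ? (uint16_t)(buf[i] | (buf[i + 1] << 8))
            : (uint16_t)((buf[i] << 8) | buf[i + 1]);
    }
    return n;
}
```

Applied to the eight bytes above, this yields the code units 0x592A, 0x967D and 0x0000. The first two fall in the CJK Unified Ideographs block (U+4E00 to U+9FFF), which is at least consistent with the unswapped buffer rendering fine.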
So, it would seem that there is a bug in iTunes: if you copy/paste
Japanese characters into the ID3 tag editing window, it will write the tag
out with the byte order mark backwards. If this sounds reasonable, I will
file a bug report.
b.bum
- (NSString *) _consumeString: (unsigned char **) parsePtrPtr
                     ofLength: (unsigned int) stringLength
                    asUnicode: (BOOL) unicodeFlag
{
    unsigned char *parsePtr = *parsePtrPtr;
    unsigned char *muckABuf;
    NSString *uniString;

    // damnit! There doesn't seem to be a way to create an NSString in a
    // given encoding without first creating an NSData containing the bytes.
    // Dipping into CoreFoundation provides a solution....
    if (unicodeFlag) {
        unsigned char t;

        // Work on a stack copy so the caller's buffer is left untouched.
        muckABuf = alloca(stringLength);
        memcpy(muckABuf, parsePtr, stringLength);

        // Swap the first two bytes (the byte order mark), if present.
        if (stringLength >= 2) {
            t = muckABuf[0];
            muckABuf[0] = muckABuf[1];
            muckABuf[1] = t;
        }
        parsePtr = muckABuf;

        // isExternalRepresentation is true: the bytes may begin with a BOM.
        uniString = [(NSString *)CFStringCreateWithBytes(NULL, parsePtr,
            stringLength, kCFStringEncodingUnicode, true) autorelease];
    } else {
        uniString = [(NSString *)CFStringCreateWithBytes(NULL, parsePtr,
            stringLength, kCFStringEncodingISOLatin1, false) autorelease];
    }

    // Advance the caller's parse pointer past the consumed bytes.
    (*parsePtrPtr) += stringLength;
    return uniString;
}
On Thursday, November 29, 2001, at 01:11 PM, Ali Ozer wrote:
Well, the stream you have looks like plain old UCS-2, that is,
NSUnicodeStringEncoding. One thing that looks wrong is that the byte
order mark (BOM) seems reversed --- That is, it implies the characters
are stored big endian, but they really are little endian. BOM is supposed
to be 0xFEFF, which implies the chars are 0xB930, 0xC830, etc, which seem
fishy (they are all over the map). If you instead reverse them, you get
the much more reasonable 0x30B9, 0x30C8, etc. Doing this, and loading the
resulting byte stream into TextEdit, I get a reasonable looking Japanese
string (I have no idea what it says though). In addition, [NSString
initWithContentsOfFile:] should just read this fine. No need to use TEC...
Ali
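The byte swap Ali describes can be illustrated with a few lines of plain C. This is a sketch only: swap16 and is_katakana are hypothetical helper names, and the range test uses the Katakana block of the Unicode standard (U+30A0 to U+30FF).

```c
#include <stdint.h>

/* Hypothetical helper: exchange the two bytes of a 16-bit code unit. */
uint16_t swap16(uint16_t u)
{
    return (uint16_t)((u << 8) | (u >> 8));
}

/* Hypothetical helper: nonzero if u lies in the Katakana block. */
int is_katakana(uint16_t u)
{
    return u >= 0x30A0 && u <= 0x30FF;
}
```

With these, swap16(0xB930) gives 0x30B9 and swap16(0xC830) gives 0x30C8. Both swapped values land in the Katakana block, while the unswapped ones do not, which matches the observation that the reversed reading looks more reasonable for a Japanese string.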