Re: (unicode -> shift-jis) encoding conversion bug?
- Subject: Re: (unicode -> shift-jis) encoding conversion bug?
- From: Douglas Davidson <email@hidden>
- Date: Thu, 24 Jan 2002 09:36:55 -0800
On Thursday, January 24, 2002, at 06:22 AM, Jody Fairchild wrote:
> NSString *example;
> unichar in, out;
> NSString *s;
> NSData *d;
> unsigned i;
>
> for (i = 0; i < [example length]; i++)
> {
>     in = [example characterAtIndex:i];
>     s = [[NSString alloc] initWithCharacters:&in length:1];
>     d = [s dataUsingEncoding:NSShiftJISStringEncoding
>         allowLossyConversion:NO];
>     [d getBytes:&out];
>     NSLog(@"unicode = %X, sjis = %X", in, out);
> }
The offending line is [d getBytes:&out]. You are taking a stream of
bytes in shift-JIS and trying to put it into a unichar. Since in the
particular case you are interested in, the data is only one byte long,
it is not surprising that the contents of the other byte of your unichar
are not determined.
Try something like this:
NSLog(@"the sjis output has length %lu and contents %@",
      (unsigned long)[d length], d);
If you want to examine the contents byte by byte, use [d bytes]. You
will also want to check whether d is nil; if the string cannot be
converted into shift-JIS and you do not allow lossy conversion, d will
be nil.
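As a rough sketch of that advice, the helper below checks for nil and then walks the converted bytes one at a time instead of copying them into a unichar. The function name SJISHexString is hypothetical and not part of Foundation; only the NSString, NSData, and NSMutableString calls are real API.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (the name is an assumption, not Foundation API):
// returns the Shift-JIS bytes of `s` as uppercase hex digits, or nil if
// `s` has no lossless Shift-JIS representation.
static NSString *SJISHexString(NSString *s)
{
    NSData *d = [s dataUsingEncoding:NSShiftJISStringEncoding
                allowLossyConversion:NO];
    if (d == nil)
        return nil;   // conversion failed; there are no bytes to inspect

    // Treat the result as a byte stream, never as a unichar.
    const unsigned char *bytes = (const unsigned char *)[d bytes];
    unsigned long i, n = (unsigned long)[d length];
    NSMutableString *hex = [NSMutableString string];

    for (i = 0; i < n; i++)
        [hex appendFormat:@"%02X", (unsigned int)bytes[i]];
    return hex;
}
```

For @"a" this gives 61, a single byte, which is why a two-byte unichar filled by getBytes: ends up with one byte that was never written.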
> should not the conversion be returning 0x0061 for regular "a"? i thought
> part of the beauty of this unicode stuff was that we wouldn't have to
> worry about when something should be treated as one byte or two ...
Sure, as long as you are working with Unicode. When you start
converting into shift-JIS, you have to follow the rules for shift-JIS.
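To make "the rules for shift-JIS" concrete: it is a variable-width encoding, so the first byte of each character tells you whether a second byte follows. The sketch below assumes the commonly documented lead-byte ranges (0x81-0x9F and 0xE0-0xFC); vendor variants differ at the edges, and the function name SJISSequenceLength is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: number of bytes in the Shift-JIS character that
// starts with `lead`. Assumes the usual lead-byte ranges; some vendor
// variants treat a few bytes at the edges differently.
static unsigned SJISSequenceLength(unsigned char lead)
{
    if ((lead >= 0x81 && lead <= 0x9F) || (lead >= 0xE0 && lead <= 0xFC))
        return 2;   // first byte of a double-byte character
    return 1;       // ASCII range or half-width katakana (0xA1-0xDF)
}
```

So 0x61 ("a") stands alone as one byte, while a byte such as 0x83 announces a two-byte character.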
Douglas Davidson