(unicode -> shift-jis) encoding conversion bug?


  • Subject: (unicode -> shift-jis) encoding conversion bug?
  • From: email@hidden (Jody Fairchild)
  • Date: Thu, 24 Jan 2002 23:22:59 +0900

Here's a situation in which converting characters from Unicode to Shift-JIS
seems to produce incorrect (or at least counterintuitive) results.

Let's run the following code snippetoid:

NSString *example;   /* string entered in an NSTextField */
unsigned i;
unichar in, out;
NSString *s;
NSData *d;

for (i = 0; i < [example length]; i++)
{
    in = [example characterAtIndex:i];
    s = [[NSString alloc] initWithCharacters:&in length:1];
    d = [s dataUsingEncoding:NSShiftJISStringEncoding
        allowLossyConversion:NO];
    [d getBytes:&out];   /* copy the converted bytes into out */

    NSLog(@"unicode = %X, sjis = %X", in, out);
    [s release];
}

Take an example string containing two characters, entered as Unicode via an
NSTextField: plain lowercase "a" and hiragana "あ" (the Japanese phonetic
character for the "ah" sound). We get something like the following output:

unicode = 61, sjis = 6114 (for regular "a")
unicode = 3042, sjis = 82A0 (for hiragana "a")

The problem is that regular "a" should be 0x61 in both Unicode _and_
Shift-JIS, but the converted character gets a garbage byte tacked onto the
end of it. This garbage byte is essentially random and tends to change each
time the code is run. Note that the first byte of the unichar holds the
correct value. Note also that the conversion works for a regular
double-byte character: hiragana "a" is indeed 0x3042 in Unicode and 0x82A0
in Shift-JIS.

Shouldn't the conversion be returning 0x0061 for regular "a"? I thought
part of the beauty of this Unicode stuff was that we wouldn't have to worry
about when something should be treated as one byte or two...

Is this a bug in the conversion code, or am I missing something?
Opinions? Any encoding gurus out there care to point out some fatal flaw
in my approach?

thanks,
-jf


  • Follow-Ups:
    • Re: (unicode -> shift-jis) encoding conversion bug?
      • From: Greg Titus <email@hidden>
    • Re: (unicode -> shift-jis) encoding conversion bug?
      • From: Douglas Davidson <email@hidden>