Re: converting unicode text representation to unichar
- Subject: Re: converting unicode text representation to unichar
- From: Daniel Child <email@hidden>
- Date: Tue, 11 Aug 2009 13:30:59 -0400
Thank you. I think I'm almost there, though I'm getting incorrect
values. I tried both %c and %C. These yield, respectively, î and
nothing visible, both of which are incorrect.
2009-08-11 13:24:58.879 ParseTest[566:10b] The num is U+4E01
2009-08-11 13:24:58.889 ParseTest[566:10b] codeItself is 4E01
2009-08-11 13:24:58.891 ParseTest[566:10b] charAsString is 19969
2009-08-11 13:24:58.892 ParseTest[566:10b] strc is î
2009-08-11 13:24:58.893 ParseTest[566:10b] strC is
UnicodeRecordParsingStrat *urps = [[UnicodeRecordParsingStrat alloc] init];
theUniWord = [urps parseUnicodeWord: unicodeLine]; // yields @"U+4E01"
codeItself = [urps theCharacterFromCode: theUniWord]; // yields @"4E01"
NSString *ox = @"0x";
NSString *hexString;
hexString = [ox stringByAppendingString: codeItself]; // yields @"0x4E01"
NSScanner *scanner = [NSScanner scannerWithString: hexString];
NSString *charAsString;
unsigned value;
if ([scanner scanHexInt:&value]) {
    charAsString = [NSString stringWithFormat: @"%u", value]; // yields 19969
    NSLog(@"charAsString is %@\n", charAsString);
    NSString *strc = [NSString stringWithFormat: @"%c", &value];
    NSLog(@"strc is %@\n", strc);
    NSString *strC = [NSString stringWithFormat: @"%C", &value];
    NSLog(@"strC is %@\n", strC);
} else {
    NSLog( @"Hex reading failed." );
}
Seems like a lot of code for a simple conversion.....
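A likely cause of the wrong output is that both format calls pass
&value, the address of the variable, where %c and %C expect the
character value itself. The parsing half can also be shortened. The
following is only a sketch in plain C (which compiles inside an
Objective-C file), using strtoul() as suggested earlier in the thread;
parse_codepoint is a hypothetical helper name, not a Cocoa API, and it
assumes the input is either "U+XXXX" or the bare hex digits.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Cocoa): parse "U+4E01" or "4E01"
 * into a Unicode code point. Returns 0 on success, -1 on bad input. */
int parse_codepoint(const char *text, unsigned long *out)
{
    assert(text != NULL && out != NULL);

    /* Skip the optional "U+" prefix; no "0x" needs to be prepended. */
    if (strncmp(text, "U+", 2) == 0) {
        text += 2;
    }

    /* strtoul() would tolerate leading whitespace or a sign, so insist
     * that the first character is a hex digit. */
    if (!isxdigit((unsigned char)*text)) {
        return -1;
    }

    char *end = NULL;
    errno = 0;
    unsigned long value = strtoul(text, &end, 16);

    /* Reject overflow, trailing junk, and values past U+10FFFF. */
    if (errno != 0 || *end != '\0' || value > 0x10FFFFUL) {
        return -1;
    }

    *out = value;
    return 0;
}
```

From Objective-C this could be called with [theUniWord UTF8String] as
the first argument. The resulting number, not its address, is what
would then be handed on to %C or to a unichar.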
On Aug 11, 2009, at 11:12 AM, Alastair Houghton wrote:
On 11 Aug 2009, at 15:40, Daniel Child wrote:
Unihan.txt provides text files showing characters in the format
U+XXXX.
If I scan these in, naturally I can obtain the NSString
representation XXXX.
But I need to convert this text to genuine unichars OR NSStrings
(the actual characters represented).
Two questions:
1. I didn't see any relevant conversion methods under NSString or
NSNumber. Are there Cocoa functions to perform this easily?
Use NSScanner's -scanHexInt: (or similar) to scan the hexadecimal
part, then stick that in a unichar and either use
+stringWithFormat:'s %C format code, or NSString's
-initWithCharacters:length:/+stringWithCharacters:length: methods.
Or you can just scan the hex part yourself manually; it isn't hard.
You could even turn it into UTF-8 and use strtoul() if you were
feeling mildly masochistic.
2. I am assuming I have to convert to the format '0xXXXX', but is
it also possible to work with U+XXXX directly in cocoa? I got error
messages for all of the following formats:
unichar uch = 0x0041;
NSString *str = [NSString stringWithCharacters:&uch length:1];
NSLog (@"str is \"%@\".", str);
/* Output:
str is "A" */
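One caveat when sticking a scanned value into a unichar: a unichar is
a 16-bit UTF-16 code unit, and Unihan.txt also lists characters above
U+FFFF (for example in the U+2XXXX range), which need two unichars.
The sketch below, in plain C, shows one way to fill a small buffer
that could then be passed to +stringWithCharacters:length:.
encode_utf16 is a hypothetical helper name, and uint16_t stands in
for unichar here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Cocoa): encode one Unicode code
 * point as UTF-16. Writes one or two code units into out and returns
 * how many were written, or 0 if cp is not a valid scalar value. */
size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);

    /* Surrogate code points and values past U+10FFFF are not valid. */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;
    }

    /* Basic Multilingual Plane: the code point is the code unit. */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Supplementary planes: split the 20-bit offset into a
     * high and a low surrogate. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));
    return 2;
}
```

With the returned count n, the Objective-C side would be along the
lines of [NSString stringWithCharacters:buf length:n], where buf is
the filled buffer.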
Kind regards,
Alastair.
--
http://alastairs-place.net