[Q] UTF-8 stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding weirdness
- Subject: [Q] UTF-8 stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding weirdness
- From: JongAm Park <email@hidden>
- Date: Wed, 18 Jun 2008 11:49:02 -0700
Hello, all.
I found some very interesting and strange behaviour in
-[NSString stringByAddingPercentEscapesUsingEncoding:].
I got a UTF-8 string from a Final Cut Pro project file, which was
exported as XML.
There is a video clip named "자연", which means "Nature" in Korean.
And its pathurl is
file://localhost/Users/young/Movies/%E1%84%8C%E1%85%A1%E1%84%8B%E1%85%A7%E1%86%AB.mov
The 자연 part is %E1%84%8C%E1%85%A1%E1%84%8B%E1%85%A7%E1%86%AB.
So, it is a percent-escaped string.
I then tried getting a UTF-8 version of "자연" in either of these ways:
1. NSString *anUTF16String = [NSString stringWithString:@"자연"];
const char *anUTF8CString = [anUTF16String UTF8String]; // UTF8String returns a C string, not an NSString
or
2. NSString *anUTF8String = [NSString stringWithUTF8String:"자연"];
Both returned the same data.
Then I made a percent-escaped string by calling:
NSString *anUTF8PercentEscapedString = [anUTF8String
stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
And I tried reverting it back to the original string by calling:
NSString *revertedUTF8String = [anUTF8PercentEscapedString
stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
This gave me the same data as the original string from 1 or 2 above.
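To make the escaping step concrete, here is a minimal C sketch of what percent-escaping UTF-8 data amounts to: each byte that is not a plain URL character is written as `%` followed by two hex digits. The function name `percent_escape` is made up for illustration, and the set of characters left unescaped here (the RFC 3986 "unreserved" set) is an assumption; it is not claimed to match the exact set that stringByAddingPercentEscapesUsingEncoding: uses.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Percent-escapes every byte of `in` that is not an RFC 3986 "unreserved"
 * character (ASCII letters, digits, '-', '.', '_', '~').
 * `out` must have room for 3 * strlen(in) + 1 bytes. */
void percent_escape(const char *in, char *out)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *p;

    for (p = (const unsigned char *)in; *p != 0; p++) {
        if (*p < 0x80 && (isalnum(*p) || strchr("-._~", *p) != NULL)) {
            *out++ = (char)*p;          /* safe ASCII: copy unchanged */
        } else {
            *out++ = '%';               /* anything else: %XX per byte */
            *out++ = hex[*p >> 4];
            *out++ = hex[*p & 0x0F];
        }
    }
    *out = '\0';
}
```

Escaping works on bytes, not on characters, so the escaped text depends entirely on which UTF-8 bytes the string holds. For the bytes EC 9E 90 EC 97 B0 this sketch yields %EC%9E%90%EC%97%B0.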
Then I checked which bytes it contains by calling:
3. char *revertedCStringOne = (char *)[revertedUTF8String
cStringUsingEncoding:NSUTF8StringEncoding];
It was: EC 9E 90 EC 97 B0
As I mentioned above, the pathurl string in the FCP project looks different
from result 3.
So, I tried converting the Korean part of the pathurl by calling:
// unsigned char keeps bytes >= 0x80 from being sign-extended (FFFFFFE1) by %X
unsigned char test[] = { 0xE1, 0x84, 0x8C, 0xE1, 0x85, 0xA1, 0xE1, 0x84, 0x8B,
    0xE1, 0x85, 0xA7, 0xE1, 0x86, 0xAB, 0 };
length = strlen( (const char *)test );
for( i = 0; i < length; i++ )
{
    NSLog(@"%X", test[i] );
}
printf("\n");
// 4. It prints the same "자연"
NSString *questionedString = [NSString stringWithUTF8String:(const char *)test];
NSLog(@"Questioned String = %@", questionedString );
Then, when questionedString is converted to a percent-escaped string
by calling:
NSString *questionedPercentEscapedString = [questionedString
stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSLog(@"%@", questionedPercentEscapedString);
It was the same as the one in the FCP project pathurl, i.e.
%E1%84%8C%E1%85%A1%E1%84%8B%E1%85%A7%E1%86%AB
Can anyone tell me why the two different data sources are displayed as
the same "자연" when the bytes they contain are different?
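One way to see what the two byte sequences actually hold is to decode them from UTF-8 into Unicode code points. The sketch below does that for the two sequences quoted in this message; the names `utf8_decode`, `precomposed_bytes` and `fcp_bytes` are made up for illustration, and the decoder assumes well-formed input and does no validation.

```c
#include <stddef.h>
#include <stdint.h>

/* The bytes obtained from the @"자연" literal in this message. */
const unsigned char precomposed_bytes[] =
    { 0xEC, 0x9E, 0x90, 0xEC, 0x97, 0xB0, 0 };

/* The bytes taken from the Final Cut Pro pathurl in this message. */
const unsigned char fcp_bytes[] =
    { 0xE1, 0x84, 0x8C, 0xE1, 0x85, 0xA1, 0xE1, 0x84, 0x8B,
      0xE1, 0x85, 0xA7, 0xE1, 0x86, 0xAB, 0 };

/* Hypothetical helper: decodes NUL-terminated, well-formed UTF-8 into
 * code points. Returns how many code points were written (at most max). */
size_t utf8_decode(const unsigned char *s, uint32_t *out, size_t max)
{
    size_t n = 0;

    while (*s != 0 && n < max) {
        uint32_t cp;
        int extra;                        /* continuation bytes to read */

        if (*s < 0x80)      { cp = *s;        extra = 0; }
        else if (*s < 0xE0) { cp = *s & 0x1F; extra = 1; }
        else if (*s < 0xF0) { cp = *s & 0x0F; extra = 2; }
        else                { cp = *s & 0x07; extra = 3; }
        s++;

        while (extra-- > 0 && *s != 0) {
            cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
        }
        out[n++] = cp;
    }
    return n;
}
```

Decoded this way, `precomposed_bytes` is two code points, U+C790 and U+C5F0, which are precomposed Hangul syllables. `fcp_bytes` is five code points, U+110C U+1161 U+110B U+1167 U+11AB, which are the individual conjoining jamo of the same two syllables. Unicode treats the two sequences as canonically equivalent (composed versus decomposed form), which would explain why they render identically while their bytes, and therefore their percent-escaped forms, differ.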
I would like to send an Apple event to Final Cut Pro, but I'm not
sure whether it is OK to send the percent-escaped form from 1 or 2, or
whether it has to be the form found in the FCP project. (I don't know how
to generate the form used in the FCP project XML file.)
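As a sketch of how the FCP-style bytes could be generated: the code points U+110C U+1161 U+110B U+1167 U+11AB behind the pathurl bytes are the canonical decomposition of the precomposed syllables U+C790 U+C5F0, and the Unicode standard defines that decomposition arithmetically for Hangul. The C below implements only that arithmetic plus a small UTF-8 encoder; the names `hangul_decompose` and `utf8_encode` are made up for illustration. In Cocoa, -[NSString decomposedStringWithCanonicalMapping] should give the same decomposed form without any hand-written code, though whether Final Cut Pro requires that form in an Apple event is not something this sketch can confirm.

```c
#include <stddef.h>
#include <stdint.h>

/* Constants of the Hangul syllable decomposition in the Unicode standard. */
enum {
    S_BASE = 0xAC00, L_BASE = 0x1100, V_BASE = 0x1161, T_BASE = 0x11A7,
    V_COUNT = 21, T_COUNT = 28, S_COUNT = 19 * 21 * 28
};

/* Hypothetical helper: splits one precomposed Hangul syllable into its
 * 2 or 3 conjoining jamo. Any other code point is copied unchanged.
 * Returns the number of code points written to out. */
size_t hangul_decompose(uint32_t cp, uint32_t out[3])
{
    uint32_t s, t;

    if (cp < S_BASE || cp >= S_BASE + S_COUNT) {
        out[0] = cp;
        return 1;
    }
    s = cp - S_BASE;
    out[0] = L_BASE + s / (V_COUNT * T_COUNT);             /* leading consonant */
    out[1] = V_BASE + (s % (V_COUNT * T_COUNT)) / T_COUNT; /* vowel */
    t = s % T_COUNT;
    if (t == 0) {
        return 2;                                          /* no final consonant */
    }
    out[2] = T_BASE + t;                                   /* final consonant */
    return 3;
}

/* Hypothetical helper: encodes one code point as UTF-8 (1 to 4 bytes).
 * Returns the number of bytes written to out. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

Decomposing U+C790 gives U+110C U+1161, and decomposing U+C5F0 gives U+110B U+1167 U+11AB; encoding those five code points with `utf8_encode` reproduces the bytes E1 84 8C E1 85 A1 E1 84 8B E1 85 A7 E1 86 AB found in the project file.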
I also tried a Java applet,
http://www.profitcode.net/resources/tools/utf8_encoder_applet.html,
and its result is the same as the one from 1 or 2 above. It is different
from the one in the FCP project.
I would appreciate any help.
Thank you.
P.S. My whole code is here, just in case.
-----------------------------------------------------------------------------------------------------------------------------------------------
NSString *anUTF16String = [NSString stringWithString:@"자연"];
//NSString *anUTF16PercentEscapedString = [anUTF16String stringByAddingPercentEscapesUsingEncoding:NSUTF16StringEncoding];
char *UTF16CString = (char *)[anUTF16String
cStringUsingEncoding:NSUTF16StringEncoding];
// 1. Making an NSString object with a UTF8 encoding
NSString *anUTF8String = [NSString stringWithUTF8String:"자연"];
NSString *anUTF8PercentEscapedString = [anUTF8String
stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSString *revertedUTF8String = [anUTF8PercentEscapedString
stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
char *revertedCStringOne = (char *)[revertedUTF8String
cStringUsingEncoding:NSUTF8StringEncoding];
NSLog(@"Unicode 16 : %@", anUTF16String );
//NSLog(@"Unicode 16 Percent Escaped : %@", anUTF16PercentEscapedString );
NSLog(@"Unicode 8 : %@", anUTF8String );
NSLog(@"Unicode 8 Percent Escaped : %@", anUTF8PercentEscapedString );
NSLog(@"Reverted from Unicode 8 Percent Escaped : %@", revertedUTF8String );
NSLog(@"bytes : %s", revertedCStringOne );
// 2. The data : EC 9E 90 EC 97 B0
int length = strlen( revertedCStringOne );
int i;
for( i = 0; i < length; i++ )
{
NSLog(@"%X", (unsigned char)revertedCStringOne[i] ); // cast avoids sign extension of bytes >= 0x80
}
printf("\n");
// 3. Data from a Final Cut Pro XML project file, which displays the same as "자연"
// This looks very different from what you can see in // 2.
// unsigned char keeps bytes >= 0x80 from being sign-extended by %X
unsigned char test[] = { 0xE1, 0x84, 0x8C, 0xE1, 0x85, 0xA1, 0xE1, 0x84, 0x8B,
    0xE1, 0x85, 0xA7, 0xE1, 0x86, 0xAB, 0 };
length = strlen( (const char *)test );
for( i = 0; i < length; i++ )
{
    NSLog(@"%X", test[i] );
}
printf("\n");
// 4. It prints the same "자연"
NSString *questionedString = [NSString stringWithUTF8String:(const char *)test];
NSLog(@"Questioned String = %@", questionedString );
// 5. Its percent-escaped representation is the same as that of // 3, not // 2
NSString *questionedPercentEscapedString = [questionedString
stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSLog(@"%@", questionedPercentEscapedString);