Re: Filename encodings
- Subject: Re: Filename encodings
- From: Tito Ciuro <email@hidden>
- Date: Sun, 11 Jul 2004 01:15:13 -0400
Hello again,
The following excerpt is taken from Apple's System Overview PDF. If
you scroll to the File Encodings and Fonts section, you'll find this
explanation:
[...]
Although Unicode is considered the native encoding for Mac OS X, there
is no file encoding that is the default for all situations. The file
encoding that is (or should be) used depends on what you want to do, on
the API you use, and on the underlying file system. For example, the
encoding used for filenames differs among the various file systems. Mac
OS Extended (HFS+) uses one particular form of Unicode for filenames:
canonically decomposed Unicode 2.1 in UTF-16 format (a sequence of
16-bit codes). The UFS file system uses a different form of Unicode for
filenames; it allows any character from Unicode 2.1 or later, but uses
UTF-8 format (a sequence of 8-bit codes). And Mac OS Standard (HFS)
uses legacy Mac encodings, such as MacRoman. Note that, because of
implementation differences, erroneous Unicode in filenames on HFS+
volumes displays correctly in Mac OS 9 systems, but always appears with
garbled characters in Mac OS X. In addition, all code that calls BSD
system routines should ensure that the const char * parameters of these
routines are in UTF-8 encoding. All BSD system functions expect their
string parameters to be in UTF-8 encoding and nothing else. An
additional caveat is that string parameters for files, paths, and other
file-system entities must be in canonical UTF-8. In a canonical UTF-8
Unicode string, all decomposable characters are decomposed; for
example, é (0x00E9) is represented as e (0x0065) + ´ (0x0301). To put
things in canonical UTF-8 encoding, use the "file-system
representation" APIs defined in Cocoa and Carbon (including Core
Foundation). For example, to get a canonical UTF-8 character string in
Cocoa, use NSString's fileSystemRepresentation method; for
noncanonical UTF-8 strings, use NSString's UTF8String method.
[...]
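
To make the distinction in the excerpt concrete, here is a minimal sketch
in plain C (which also compiles as Objective-C). It only shows that the
precomposed and the canonically decomposed spellings of é are different
UTF-8 byte sequences, so a byte-wise comparison between a path returned by
the file system and a precomposed string will not match. The names
kPrecomposed, kDecomposed, same_bytes and hex_dump are made up for this
illustration; none of them come from Apple's APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Precomposed form: U+00E9, encoded in UTF-8 as the bytes C3 A9.
   (Hypothetical name, for illustration only.) */
const char kPrecomposed[] = "\xC3\xA9";

/* Canonically decomposed form: U+0065 ('e') followed by U+0301
   (combining acute accent), encoded in UTF-8 as the bytes 65 CC 81. */
const char kDecomposed[] = "e\xCC\x81";

/* Returns 1 when the two C strings are byte-for-byte identical, else 0.
   This is the kind of comparison BSD-level code effectively performs. */
int same_bytes(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Writes the bytes of s into out as space-separated uppercase hex pairs.
   Stops early, leaving a valid string, if out is too small. */
void hex_dump(const char *s, char *out, size_t out_size)
{
    size_t used = 0;

    assert(s != NULL && out != NULL);
    if (out_size == 0)
        return;
    out[0] = '\0';

    for (; *s != '\0'; s++) {
        int n = snprintf(out + used, out_size - used, "%s%02X",
                         used ? " " : "", (unsigned)(unsigned char)*s);
        if (n < 0 || (size_t)n >= out_size - used) {
            out[used] = '\0';   /* drop the partially written pair */
            break;
        }
        used += (size_t)n;
    }
}
```

With a 32-byte buffer, hex_dump(kDecomposed, ...) produces "65 CC 81" and
hex_dump(kPrecomposed, ...) produces "C3 A9", and same_bytes() on the two
returns 0 even though both render as the same character on screen.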
Regards,
-- Tito
On 11 jul 2004, at 0:09, Mark A. Stratman wrote:
> Would it be safe to just use NSUTF8StringEncoding? Or are there
> problems with that as well, that I'm not aware of?
> I've always used UTF8-encoded strings and have never had a problem
> with any languages.
>
> -mark
>
> On Jul 10, 2004, at 10:38 PM, Tito Ciuro wrote:
>
> > Hello Adam,
> >
> > If I understand correctly, I believe you should let the OS handle that
> > by using any of the following methods:
> >
> > NSString's - (const char *)fileSystemRepresentation
> > NSFileManager's - (const char *)fileSystemRepresentationWithPath:(NSString *)path
> >
> > I hope this helps,
> >
> > -- Tito
> >
> > On 10 jul 2004, at 20:49, Adam Maxwell wrote:
> >
> >> I've written a program to do find and replace on a list of filenames.
> >> It works quite well, but testing revealed problems with accented
> >> characters such as é in a filename. I've come up with a workaround,
> >> but I don't understand why it's necessary, and am curious if I'm
> >> missing something.
> >>
> >> I run an NSOpenPanel, allow multiple file selection, get the selected
> >> file list as an NSArray, and send it to this method:
> >>
> >> - (void)buildPathsArrayWithArray:(NSArray *)array{
> >>     NSEnumerator *e = [array objectEnumerator];
> >>     NSMutableDictionary *dict = [NSMutableDictionary
> >>         dictionaryWithCapacity:2];
> >>     NSString *oldPath = nil;
> >>     NSString *encodedPath = nil;
> >>     NSData *data;
> >>
> >>     while(oldPath = [e nextObject]){
> >>
> >>         data = [oldPath dataUsingEncoding:NSMacOSRomanStringEncoding];
> >>         encodedPath = [[[NSString alloc] initWithData:data
> >>             encoding:NSMacOSRomanStringEncoding] autorelease];
> >>         // NSLog(@"encodedPath is %@", encodedPath);
> >>         [dict setObject:encodedPath forKey:@"Old Path"];
> >>         [dict setObject:@"" forKey:@"New Path"];
> >>         ... do more stuff...
> >>         [pathsArray addObject:[dict copy]];
> >>         [dict removeAllObjects];
> >>     }
> >> }
> >>
> >> Unless I use NSMacOSRomanStringEncoding, things don't work right (I'm
> >> using the AGRegex framework for find/replace). I originally used the
> >> oldPath variable directly, and for a filename of "éáçSGAMMA" (randomly
> >> chosen), my [pathsArray description] gave
> >> "e\U0301a\U0301c\U0327SGAMMA". After my NSMacOSRomanStringEncoding
> >> stuff, the array has "\U00e9\U00e1\U00e7SGAMMA" which actually works.
> >> Has anyone else run across this, or know of a better way to handle it?
> >>
> >> thanks,
> >> Adam Maxwell