Re: renaming a file with special/reserved characters in name
- Subject: Re: renaming a file with special/reserved characters in name
- From: Shawn Erickson <email@hidden>
- Date: Sat, 28 Feb 2009 09:43:48 -0800
On Sat, Feb 28, 2009 at 8:58 AM, Shawn Erickson <email@hidden> wrote:
> On Sat, Feb 28, 2009 at 8:45 AM, Clark Cox <email@hidden> wrote:
>
>>>... not sure what Michael is
>>> talking about.
>>
>> On Leopard, invalid bytes will indeed be escaped:
>
> Ah going back over the email chain I now get the context of the
> conversation when Michael made his comment about escaping.
>
> Anyway, I was mostly pointing out that it isn't HFS+ doing this; it is
> the POSIX APIs, which expect UTF-8 and presumably now escape invalid
> (non-UTF-8) bytes somewhere. HFS+, as I noted, doesn't work with UTF-8.

Ah, so the escaping comes from utf8_decodestr (vfs_utfconv.c, shared by
all file systems) when the caller passes the UTF_ESCAPE_ILLEGAL option.
Without that option, an illegal sequence makes it return EINVAL instead.

/*
 * utf8_decodestr - Decodes a UTF-8 string into Unicode
 *
 * This function takes an UTF-8 input string, utf8p, of utf8len bytes
 * and produces the Unicode output into a buffer of buflen bytes pointed
 * to by ucsp.  The size of the output in bytes (not including a NULL
 * termination byte) is returned in ucslen.  Both buffers must reside
 * in kernel memory.
 *
 * If '/' chars are allowed in the Unicode output then an alternate
 * (replacement) char must be provided in altslash.
 *
 * FLAGS
 *    UTF_REV_ENDIAN:   Unicode byte order is opposite current runtime
 *
 *    UTF_BIG_ENDIAN:   Unicode byte order is always big endian
 *
 *    UTF_LITTLE_ENDIAN:   Unicode byte order is always little endian
 *
 *    UTF_DECOMPOSED:   generate fully decomposed output (NFD)
 *
 *    UTF_PRECOMPOSED:   generate precomposed output (NFC)
 *
 *    UTF_ESCAPE_ILLEGAL:   percent escape any illegal UTF-8 input
 *
 * ERRORS
 *    ENAMETOOLONG: output did not fit; only ucslen bytes were decoded.
 *
 *    EINVAL: illegal UTF-8 sequence encountered.
 */
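
To make the two behaviours concrete, here is a rough user-space sketch of
the idea. It is not the XNU code: sanitize_utf8_name and utf8_seq_len are
names I made up, the output stays UTF-8 rather than being converted to
HFS+ Unicode, and the uppercase "%XX" form is my assumption about what
the escape looks like. With escape_illegal set, each byte that does not
start a well-formed UTF-8 sequence is percent escaped; without it, the
function gives up with EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length (1-4) of the well-formed UTF-8 sequence
 * starting at s, or 0 if the bytes at s do not form one. */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    unsigned char b0 = s[0];
    unsigned long cp, min;
    size_t need;

    if (b0 < 0x80)
        return 1;                                   /* plain ASCII */
    else if ((b0 & 0xE0) == 0xC0) { need = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { need = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { need = 4; cp = b0 & 0x07; min = 0x10000; }
    else
        return 0;           /* stray continuation or invalid lead byte */

    if (avail < need)
        return 0;                                   /* truncated sequence */
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                               /* not a continuation */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and values past U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    return need;
}

/* Hypothetical sketch: copy a name into out, NUL terminated.
 * Returns 0, EINVAL (illegal byte and escape_illegal == 0) or
 * ENAMETOOLONG (out is too small; out holds what fit so far). */
int sanitize_utf8_name(const unsigned char *in, size_t inlen,
                       char *out, size_t outsize, int escape_illegal)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    assert(in != NULL && out != NULL && outsize > 0);

    for (size_t i = 0; i < inlen; ) {
        size_t n = utf8_seq_len(in + i, inlen - i);

        if (n > 0) {                        /* valid sequence: copy as is */
            if (o + n >= outsize) { out[o] = '\0'; return ENAMETOOLONG; }
            memcpy(out + o, in + i, n);
            o += n;
            i += n;
        } else if (escape_illegal) {        /* illegal byte: emit %XX */
            if (o + 3 >= outsize) { out[o] = '\0'; return ENAMETOOLONG; }
            out[o++] = '%';
            out[o++] = hex[in[i] >> 4];
            out[o++] = hex[in[i] & 0x0F];
            i++;
        } else {                            /* no escaping requested */
            out[o] = '\0';
            return EINVAL;
        }
    }
    out[o] = '\0';
    return 0;
}
```

So a Latin-1 name such as "caf\xE9" comes out as "caf%E9" when escaping
is requested, and is rejected with EINVAL when it is not, while the
valid UTF-8 spelling "caf\xC3\xA9" passes through untouched either way.
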
At this time it looks like only the HFS+ file system code specifies
this flag, when converting incoming UTF-8 names to the HFS+ Unicode
encoding. Interestingly, it isn't applied universally when HFS+ gets
UTF-8 names from its callers. It appears to happen only on catalog
entry creation and lookup; it isn't used for attribute names or for
post-creation name comparison. (I only took a very quick look over the
HFS+ code in XNU, so I could be misunderstanding the pathways a little.)

-Shawn
_______________________________________________
Cocoa-dev mailing list (email@hidden)