Re: Determining the canonical case for a file name on HFS+ (complete solution)
- Subject: Re: Determining the canonical case for a file name on HFS+ (complete solution)
- From: Camillo Lugaresi <email@hidden>
- Date: Thu, 22 Dec 2005 23:43:56 +0100
I'll reply to my own message so that the solution will be in the
archives of the list.
On 30 Nov 2005, at 22:05, Camillo Lugaresi wrote:
I am looking for a way to determine the canonical form for a file
name. Suppose that there is a file named "FILE" in the directory
/folder/, which is on an HFS+ volume: then the canonical form of the
path "/folder/file" would be "/folder/FILE". Both paths refer to
the same file, but the representation on disk is FILE.
This can be done using realpath (man 3 realpath). For example:
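#include <stdio.h>
#include <stdlib.h>
#include <sys/param.h>

int main (int argc, char **argv)
{
    char real_path[PATH_MAX];

    if (argc < 2)
        return 1;
    /* realpath returns NULL on failure; do not pass that to printf */
    if (realpath(argv[1], real_path) == NULL) {
        perror(argv[1]);
        return 1;
    }
    printf("%s\n", real_path);
    return 0;
}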
I need to know this for the purpose of comparison, and I cannot
simply do a case-insensitive comparison because I do not know if
/folder/ is on a case-insensitive or case-sensitive volume.
I have thought of opening the file and then using fstat on the
descriptor, but is there a better way to do it?
From a broader point of view, what I need is a reliable solution
for determining if two paths refer to the same file, without
assuming anything about the filesystem on which it is located. I
have searched the archives of this list, but I could not find an
answer to this question.
Here is the solution I ended up using in a cross-platform program.
On Unix-like systems, call stat on both paths and compare the inode
numbers and device IDs. The POSIX standard says:
The st_ino and st_dev fields taken together uniquely identify the
file within the system.
[...]
Networked implementations of a POSIX-conforming system must
guarantee that all files visible within the file tree (including
parts of the tree that may be remotely mounted from other machines
on the network) on each individual processor are uniquely
identified by the combination of the st_ino and st_dev fields.
(from <http://www.opengroup.org/onlinepubs/009695399/basedefs/sys/stat.h.html>)
In my testing, Mac OS X and Linux implemented this correctly for
both local and network file systems.
In general, since this behaviour is specified in POSIX, it is
expected to work on all modern Unix-like systems. The one exception I
have heard of is when a file system whose inode numbers are too large
for the host's ino_t is mounted: in that case, a hash of the real
inode number is returned in st_ino, and there is the possibility of a
collision. But that is a very rare scenario, and such a collision
would be a mere annoyance in the context of the application I was
working on.
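The comparison described above can be sketched roughly as follows. This is only a minimal illustration, not the code from the actual program: the helper name same_file and its return convention are assumptions made for this example.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not from the original program): returns 1 if
 * both paths resolve to the same file, 0 if they resolve to different
 * files, and -1 if either path cannot be stat'ed (errno is left as
 * set by stat).
 */
int same_file(const char *path_a, const char *path_b)
{
    struct stat a, b;

    /* stat() follows symbolic links, so a link and its target match. */
    if (stat(path_a, &a) != 0 || stat(path_b, &b) != 0)
        return -1;

    /* Per POSIX, st_dev and st_ino together identify the file. */
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

With the example from the original question, same_file("/folder/file", "/folder/FILE") would be expected to return 1 on a case-insensitive HFS+ volume. On a case-sensitive volume it would return -1 if only one of the two names exists, or 0 if both exist as separate files.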
On Windows, stat is not POSIX-compliant:
st_ino
Number of the information node (the inode) for the file
(UNIX-specific). On UNIX file systems, the inode describes the file
date and time stamps, permissions, and content. When files are
hard-linked to one another, they share the same inode. The inode, and
therefore st_ino, has no meaning in the FAT, HPFS, or NTFS file
systems.
(from <http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_crt__stat.2c_._wstat.2c_._stati64.2c_._wstati64.asp>)
In practice, st_ino is 0 for all files on most file systems. Windows
can return file IDs for local file systems using
GetFileInformationByHandle, but unlike in POSIX, there are no
guarantees for network file systems. So, on Windows we fall back on
comparing names, using the technique recommended here:
<http://blogs.msdn.com/michkap/archive/2005/10/17/481600.aspx>
I hope this information will be helpful to others who face a similar
problem. :-)
Camillo
Darwin-dev mailing list (email@hidden)