Re: First timer: Finder Copy vs. cp
- Subject: Re: First timer: Finder Copy vs. cp
- From: Terry Lambert <email@hidden>
- Date: Thu, 10 Aug 2006 19:04:29 -0700
On Aug 10, 2006, at 5:44 PM, Dan Shoop wrote:
- and not the time that the original was created. On the other
hand, if the intent of the copy was to create an identical copy,
then you'd need to copy the creation date as well.
As I point out in my article there are philosophical aspects to what
metadata should be maintained depending on what "copy" means to you.
This is understandable.
However the behavior on the Mac of creation date has been rather
well codified and further reinforced through precedent along with
how un-Mac-like environments should maintain things like creation
date using techniques like Apple Doubles where necessary.
OS X's BSD environment improperly creates Apple Doubles (making them
in fact not true Apple Double representations of the original file)
as it munges the modified date into the creation date field in
opposition to the specs that Apple Double defines should be the
behavior.
What it's actually getting is the real creation time of the file,
which is set to the modification time of the file, since there is no
separate VFS layer semantic for "create". Part of the problem here is
that the POSIX specification states what semantics shall be applied to
file modification times (i.e. "SHALL be marked for update" vs. "SHALL
be updated", etc.). For everything else, there's only "historical
behaviour, whatever that was; hope you have a test suite handy".
The BSD environment only creates Apple Double files for file metadata
that is not possible to represent (or in the case of network FS
protocols, round trip transit) between the Finder and the underlying
FS. In other words, only if you attempt to store data not natively
supported by the underlying FS.
My personal take on this is that setting dates on files should
require ownership of them, and setting creation dates on files is
(effectively) asking the system to lie to the next person who looks
at the file (the POSIX utime() system call doesn't actually
understand the concept of "create time" separate from the other
times, since it only understands 3 timestamps at the API level; in
HFS terms, that's ca_atime, ca_mtime, and ca_ctime, and neither of
ca_itime or ca_btime).
This is an interesting approach, requiring ownership to set dates.
However this falls apart for several reasons:
(1) Creation Date is not ctime so it's apples[sic] and oranges. So
it's not lying at all as it can change ctime and not affect creation
date and stays completely consistent.
The main problem is backup software that does not backup copies of
files because the ca_btime is later than the ca_itime, because we
faithfully copied it. We can either be entirely faithful (and your
file doesn't get backed up - so if you delete the original, and have a
problem, the file is gone), or we can be partially faithful.
(2) The Finder doesn't respect BSD ownerships anyway when making
copies.
This depends on whether you are talking about a volume where ownership
is being ignored (e.g. an external thumb drive or FireWire drive), or
one where it is not. Unless you specifically exercise administrative
privilege, it *will* respect BSD ownership. The reason is that in
order to do anything, the Finder will make system calls, and I
guarantee you that those system calls will check the credentials of
the calling process, and those credentials will be used to determine
whether or not the calling process is permitted to make the given call
on the given object.
(3) It ignores the traditional behavior that Mac users have come to
expect
(4) The philosophies are in fact not at odds at all they can coexist
since ctime and creation date are distinct
They are distinct, and I never said differently; I specifically named
the five cnode fields in the on disk structure that HFS uses to store
time stamp information.
From a mainframe perspective, a creation date is inviolable, and
almost as holy as an inode (cnode) number, or a disk offset.
The philosophy issue is one of interpretation; should ca_itime reflect
the creation time of the cnode, as it is cnode metadata, or should it
reflect the creation time of the document?
The problem arises because of the conflation of "application data" vs.
"filesystem metadata".
The problem with the historical point of view is that the place Finder
stores this is not some resource off in the resource fork of the file,
but in the actual FS metadata.
Unfortunately, this problem will only get worse for people who want to
take the view that this is application data, as vendors move their
software off the "exchange data" API in favor of the "safe save" API
so that they can reasonably support FS's that don't support an
"exchange data" operation - like the FAT FS you will find on every
digital camera and most other keychain-type drives, as well as cell
phones, and so on.
In this case, by definition, the metadata will not remain unmodified
over a contents update operation, since instead of keeping the cnode
and changing out the extent pointer, what will happen is that a newly
written file will be renamed over the top of the original as an atomic
operation. A
rename, in this case, changes the pointer part of the name/pointer
pair in the directory to point to a completely different cnode (one
with its own creation date).
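A minimal sketch of that "safe save" pattern, using only POSIX calls, shows why the metadata cannot survive: the path ends up pointing at a different inode. The names safe_save() and inode_of() and the ".tmp" suffix are hypothetical, chosen for illustration; a production version would also fsync() before the rename, which is omitted here.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: write `data` to a temporary sibling file, then
 * rename() it over `path`. Returns 0 on success, -1 on failure. */
int safe_save(const char *path, const char *data)
{
    char tmp[1024];

    assert(path != NULL && data != NULL);

    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    if (fputs(data, f) == EOF) {
        fclose(f);
        remove(tmp);
        return -1;
    }
    if (fclose(f) != 0) {
        remove(tmp);
        return -1;
    }

    /* The directory entry for `path` now points at the new file's inode. */
    if (rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}

/* Hypothetical helper: inode number currently behind `path`, 0 on error. */
ino_t inode_of(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? st.st_ino : 0;
}
```

Calling safe_save() twice on the same path and comparing inode_of() before and after the second call should show two different inode numbers, which is exactly the "completely different cnode" case described above.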
At this point, I'd call this an application level problem: it needs to
save the information via getattrlist() and reset it via setattrlist(),
if it doesn't want it changed.
For a putative backup utility, for example, following a restore, I'd
expect the backup archive to have the ca_btime that it stored the
files in the archive as archive metadata, and when a file is restored
from backup, I would expect the ca_btime field of its cnode to be set
to *that* date, rather than the date at the time of the archive.
Now you could argue that this should be done in the backed up copy of
the metadata, as a metadata modification prior to storing to the
backup media. But you could at the same time argue that that
destroys information, and you really want to be able to go to a
backup, and find out the last backup date of the files on it, to find
out how important it was for you to not reuse that archive for
something else.
Me, I'd argue it should be an option settable by the user. Obviously
this butts heads with the UI design goal of keeping things simple...
8-).
The BSD behavior for creation dates is to munge it with modified
date, even when creating what are arguably not Apple Doubles
(since the Apple Double spec says that this metadata should be
preserved in the Apple Double) either explicitly based on their
necessity due to foreign file systems or even 'internally' when
copying to HFS based volumes.
FWIW, this may or may not be on purpose
I'd suggest it's from a failure to understand the differences of
ctime and creation date coupled with the nature of how unix treats
ctime, further coupled with a failure to understand the heavy
reliance that Mac users traditionally place on this metadatum.
No, that's not it. I have not examined the Libc copyfile() code in
detail in this case (I maintain kernel code, and that lives in user
space), but if the utime() system call updated both, and the
setattrlist() was able to deal with it individually, and the code were
ordered such that utime() was called after setattrlist() rather than
before, it could put the "correct" (from your point of view) value in
there, and then overwrite it with the "correct" (from my point of
view) value.
As I said, this may or may not be intentional behaviour.
it depends on whether the person who wrote the POSIX copy engine
shares my philosophy on things,
Your philosophy is interesting, but I don't see how it applies here.
Moreover this isn't a btime issue either.
It is. If you have code such that:
// Decide whether or not to back up file...
if ( cnode.ca_btime < cnode.ca_itime ||   // backup earlier than create
     cnode.ca_btime < cnode.ca_mtime ||   // backup earlier than modify
     cnode.ca_btime < cnode.ca_ctime) {   // backup earlier than metadata change
// backup the file
...
} else {
// don't backup the file (access time is not important data)
...
}
then a metadata bit-for-bit faithful copy of an already backed-up file
will result in a copy that doesn't get backed up.
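To make that concrete, here is a small self-contained sketch of the same test. The struct file_times is a hypothetical stand-in for the five cnode timestamps, not the real kernel structure, and partial_copy() stamping the creation time with "now" is just one assumed way of being "partially faithful".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the five HFS cnode timestamps. */
struct file_times {
    time_t ca_atime;  /* last access */
    time_t ca_mtime;  /* last data modification */
    time_t ca_ctime;  /* last metadata change */
    time_t ca_itime;  /* creation */
    time_t ca_btime;  /* last backup */
};

/* True when the last backup predates creation, modification,
 * or metadata change. Access time is deliberately ignored. */
bool needs_backup(const struct file_times *t)
{
    assert(t != NULL);
    return t->ca_btime < t->ca_itime ||
           t->ca_btime < t->ca_mtime ||
           t->ca_btime < t->ca_ctime;
}

/* Bit-for-bit faithful copy: every timestamp, ca_btime included,
 * is carried over unchanged. */
struct file_times faithful_copy(const struct file_times *src)
{
    assert(src != NULL);
    return *src;
}

/* Partially faithful copy (assumed policy): keep the other times,
 * but record the moment of the copy as the creation time. */
struct file_times partial_copy(const struct file_times *src, time_t now)
{
    assert(src != NULL);
    struct file_times dst = *src;
    dst.ca_itime = now;
    return dst;
}
```

With an original whose ca_btime is later than its other times, needs_backup() is false for the faithful copy as well, while the partial copy made afterwards trips the first comparison and is picked up.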
or if the utime() system call update occurs after the setattrlist()
and carries things along with it, or if updating the creation
time outside of an actual create operation by the FS is considered
privileged - i.e. you have to be "root" to be permitted to cause
the system to lie to the user - or some other reason I haven't
thought of, etc..
Clearly the Finder has no issues maintaining this in userspace. Why
would it be any less possible for BSD?
It's *NOT* "less possible", though the command line tools are not
built with CF; they aren't built with anything for which we do not
publish source code; failure to do that would make Darwin pretty
useless, unless we gave out sources to CF, etc.
It would take more work to deal with this; we're not averse to doing
more work, we've just got to agree on what that work should be; for
example, how many people now depend on the new behaviour? Is there
backup software that depends on it? Etc.? I personally don't know.
Any satisfactory solution that permits backup software to work is
going to need to have ca_btime < ca_itime; effectively, this will mean
munging one or the other of them to make any backup software the user
uses recognize the need for a backup.
-- Terry