On Feb 2, 2011, at 3:36 PM, James Bucanek wrote: One customer writes to say that my application takes forever to get started when the destination file is on his file server. On further investigation, the call to FSSetForkSize is percolating down and calling pwrite, which is writing gigabytes of data over the network:
(snipped backtrace)
I thought it might be an unusual file system or server, but they assure me that it's a PowerMac and Mini, both running 10.6, over Ethernet. The Mini is connected via FireWire to an Iomega RAID (0) formatted HFS+.
It may have something to do with the network protocol being used (especially AFP vs. NFS or SMB or other). I confirmed that this is happening with a similar configuration here, which really surprises me. On local volumes, an FSSetForkSize is as fast as a delete, and simply assigns free blocks to the file.
Sort of. If you then closed the fork on the local volume (HFS), the close would take a long time as the kernel was busy writing out the gigabytes of zeroes.
On Mac OS X, HFS follows the POSIX behavior of requiring that all unwritten ranges of a file must return zeroes when read back. Since HFS doesn't support sparse files, while the file is still open, it keeps track of which ranges of the file have been written and which haven't. This is how it knows to return zeroes for that allocated and unwritten part of the file. But when the file is closed, that in-memory state needs to be deallocated, so HFS is forced to zero fill all of those newly allocated but never written ranges.
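To make that POSIX requirement concrete, here is a minimal sketch (path and size are placeholders, not from the original report) that extends a file without writing to it and reads back one of the unwritten bytes. A sparse-capable file system returns the zero essentially for free; per the explanation above, HFS instead tracks the unwritten range while the file is open and pays for the real zero-filling when the file is closed.

/*
 * Minimal sketch of the semantics described above: any range that was
 * never written must read back as zeroes. Placeholder path and size.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char  *path = "/tmp/grow_test";        /* placeholder path */
    const off_t  newSize = 1024LL * 1024 * 1024; /* grow to 1 GB */
    char         byte = 0x7f;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Extend the logical EOF without writing any data. */
    if (ftruncate(fd, newSize) != 0) { perror("ftruncate"); close(fd); return 1; }

    /* POSIX requires this unwritten byte to read back as zero. */
    if (pread(fd, &byte, 1, newSize / 2) == 1)
        printf("byte in the middle of the file: %d\n", byte);   /* prints 0 */

    /* On HFS, the deferred zero-filling described above happens by close time. */
    close(fd);
    return 0;
}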
The File Manager APIs tend to have somewhat different implementations for HFS-like file systems and the rest. I'll bet that you're seeing that extra writing of zeroes because the File Manager can't tell whether the network file system (and the file system on the server itself) supports sparse files, and because it is trying to be compatible with the historic behavior of FSSetForkSize on HFS, where it not only sets the size of the fork but also allocates all of that new space.
Confused by that last sentence? Consider a file system like UFS which supports sparse files. Should FSSetForkSize merely set the end of file, or should it also allocate the space between the old EOF and the new EOF? The only file-system-agnostic way to do that is to write zeroes to the newly allocated space. Unfortunately, there's no standard way of getting a file system which supports sparse files to then allocate some range that was previously sparse, other than writing to it.

Next I tried to use FSAllocateFork on the network volume. It returned almost instantly, but it also doesn't appear to do anything. Testing the free space on the volume before and after the FSAllocateFork call returns the same value. Setting the fork size after the FSAllocateFork call produces the same results as before. So it doesn't appear that anything was actually allocated by FSAllocateFork.
From the documentation for FSAllocateFork: The FSAllocateFork function attempts to allocate requestCount bytes of physical storage starting at the offset specified by the positionMode and positionOffset parameters. For volume formats that support preallocated space, you can later write to this range of bytes (including extending the size of the fork) without requiring an implicit allocation.
Notice the "attempts to allocate" wording. HFS is one of the few file systems that is capable of having the physical file size be significantly larger than the logical size (more than just rounding up to a multiple of an allocation block). Most file systems cannot physically allocate extra space without also changing the logical EOF. In the case of the network volume, I would expect the actualCount output to be zero, indicating that it didn't actually allocate any additional physical space.
I repeated these same tests on a local hard drive and got exactly what I expected: FSAllocateFork immediately reduced the available disk space, and FSSetForkSize immediately allocated the space to the file and returned.
That's because HFS is capable of allocating physical space to a file well beyond its logical EOF.

Can anyone shed some light on what's going on? I'd really like to avoid this, and there are several other situations where I set the size of a fork and don't want massive amounts of data to be written over a network.
Sorry, but with the wide variety of file systems out there, it's pretty much impossible to guarantee that you have set aside space for a file to grow, without actually writing to the file. If you were going through the POSIX APIs instead, you could potentially set the file size without incurring I/O, but then the file might be sparse and a subsequent write could fail because you ran out of space.
We've generally advised applications to use FSAllocateFork (or the equivalent at other API layers) to try to reserve the space up front, and then just write to the file when they need to, and handle errors appropriately. When you happen to be using HFS, you end up with the behavior you desire. For other file systems, avoiding the FSSetForkSize prevents the gigantic write of zeroes that you've observed, but may lead to out-of-space errors as you write the file.
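As a sketch of that advice at the POSIX layer, assuming fcntl's F_PREALLOCATE as the rough counterpart of FSAllocateFork on Mac OS X (the original thread doesn't name it): treat the preallocation as a best-effort hint, then write normally and handle out-of-space errors as they occur. Path and sizes are placeholders.

/*
 * Sketch: best-effort reservation up front, then ordinary writes with
 * explicit error handling (e.g. ENOSPC). Placeholder path and sizes.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer, reporting ENOSPC (or any other error) to the caller. */
static int write_fully(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;                  /* e.g. ENOSPC: handle/report it */
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

int main(void)
{
    int fd = open("/Volumes/Server/backup.data",   /* placeholder path */
                  O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Best-effort reservation; on volumes that can't preallocate this may
       fail or reserve nothing, so treat the result as advisory only. */
    fstore_t store = { F_ALLOCATEALL, F_PEOFPOSMODE, 0,
                       1024LL * 1024 * 1024, 0 };
    (void)fcntl(fd, F_PREALLOCATE, &store);

    char buf[65536];
    memset(buf, 0xAA, sizeof(buf));
    if (write_fully(fd, buf, sizeof(buf)) != 0)
        perror("write");                /* may still be ENOSPC despite the reserve */

    close(fd);
    return 0;
}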
-Mark