Re: How can I access mnt_devblocksize from user space?
- Subject: Re: How can I access mnt_devblocksize from user space?
- From: Jim Luther <email@hidden>
- Date: Tue, 23 Sep 2008 08:39:45 -0700
I think getattrlist()'s ATTR_VOL_MINALLOCATION is as close as you'll
get to mnt_devblocksize in userland from the filesystem. I'll let
someone who knows how to talk to the device layer below the filesystem
advise you on how to get the value from there (i.e., I'm not going
to guess :-).
- Jim
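For reference, a minimal sketch of the getattrlist() query suggested above. The helper name vol_min_allocation is hypothetical, and the buffer layout (a 32-bit length followed by an off_t, 4-byte aligned) is assumed from the getattrlist(2) man page. ATTR_VOL_MINALLOCATION reports the volume's minimum allocation size, which may or may not match mnt_devblocksize on a given volume, so treat the result as an approximation. The non-Apple branch exists only so the file compiles elsewhere.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#ifdef __APPLE__
#include <sys/attr.h>
#include <unistd.h>
#endif

/* Hypothetical helper: stores the minimum allocation size of the volume
 * containing `path` in *out. Returns 0 on success, -1 on failure with
 * errno set. Older systems may require `path` to be the volume root. */
int vol_min_allocation(const char *path, int64_t *out)
{
#ifdef __APPLE__
    struct attrlist al;
    /* Assumed reply layout: total length, then the requested attribute. */
    struct {
        u_int32_t length;
        off_t     min_alloc;
    } __attribute__((aligned(4), packed)) buf;

    memset(&al, 0, sizeof al);
    al.bitmapcount = ATTR_BIT_MAP_COUNT;
    /* ATTR_VOL_INFO must accompany any volume attribute request. */
    al.volattr = ATTR_VOL_INFO | ATTR_VOL_MINALLOCATION;

    if (getattrlist(path, &al, &buf, sizeof buf, 0) != 0)
        return -1;

    *out = (int64_t)buf.min_alloc;
    return 0;
#else
    /* getattrlist() is not available here; report that explicitly. */
    (void)path;
    (void)out;
    errno = ENOTSUP;
    return -1;
#endif
}
```

A caller would pass a path on the volume of interest and, on success, use the returned value as a candidate alignment for file offsets.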
On Sep 22, 2008, at 11:19 PM, Sam Vaughan wrote:
When getting direct I/O running as fast as possible, it's important
to align the file offsets of every request to avoid the kernel
having to call cluster_copy_upl_data to uiomove everything. The
performance penalty of that is very high and should be easily
avoidable.
I wrote a very simple C test program that opens a file, sets
F_NOCACHE on it and issues 1MB preads starting from an offset
passed on the command line, stopping when it hits EOF.
The destination buffer for all the reads is always page aligned.
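The test tool described above might look roughly like the following. This is a sketch, not the original program: the function name read_uncached and the READ_SIZE constant are made up for illustration, and F_NOCACHE is only set where the platform defines it (Mac OS X), so elsewhere the reads go through the cache as usual.

```c
#define _DEFAULT_SOURCE /* glibc only: expose pread/posix_memalign in strict modes */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define READ_SIZE (1024 * 1024) /* 1MB per pread, as in the description */

/* Hypothetical helper: reads `path` from offset `start` to EOF using
 * 1MB preads into a page-aligned buffer. Returns the number of bytes
 * read, or -1 on error. */
long long read_uncached(const char *path, off_t start)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

#ifdef F_NOCACHE
    /* Ask the kernel to bypass the buffer cache for this descriptor. */
    (void)fcntl(fd, F_NOCACHE, 1);
#endif

    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    if (page <= 0 || posix_memalign(&buf, (size_t)page, READ_SIZE) != 0) {
        close(fd);
        return -1;
    }

    long long total = 0;
    off_t off = start;
    for (;;) {
        ssize_t n = pread(fd, buf, READ_SIZE, off);
        if (n < 0) {          /* read error */
            total = -1;
            break;
        }
        if (n == 0)           /* EOF */
            break;
        total += n;
        off += n;
    }

    free(buf);
    close(fd);
    return total;
}
```

Wrapping this in a main() that parses the offset from argv and running it under 'time' with different starting offsets should reproduce the comparison described next.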
Running the test tool with the 'time' Bash built-in or monitoring it
with Shark or dtrace quickly shows the problem. If the initial
offset is zero, the reads are fast and the kernel CPU usage is very
low. If the initial offset is something nasty,
cluster_copy_upl_data gets involved, kernel CPU usage shoots up and
the reads are slow.
For a long time I'd simply assumed that as long as the memory was
page aligned and the disk offset was 512 byte sector aligned, no
copies would ever be needed. Then about a year ago I was working on
code to read 2k uncompressed video and I discovered that on many
RAIDs, the alignment needs to be to 4k offsets in the file to avoid
the copies occurring.
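To make the alignment arithmetic concrete, here is a small sketch of how a request could be widened to a given alignment before it is issued. The names align_down, align_up, aligned_range and expand_to_alignment are illustrative only, and the alignment is assumed to be a nonzero power of two (512 or 4096 in the cases above).

```c
#include <stdint.h>

/* Round `off` down to a multiple of `align` (a nonzero power of two). */
static inline int64_t align_down(int64_t off, int64_t align)
{
    return off & ~(align - 1);
}

/* Round `off` up to a multiple of `align` (a nonzero power of two). */
static inline int64_t align_up(int64_t off, int64_t align)
{
    return (off + align - 1) & ~(align - 1);
}

typedef struct {
    int64_t start;  /* aligned file offset to read from */
    int64_t length; /* aligned length covering the original request */
} aligned_range;

/* Widen the byte range [off, off + len) so that both ends fall on
 * `align` boundaries. The caller reads the widened range and then
 * skips (off - start) bytes into the buffer to find its data. */
aligned_range expand_to_alignment(int64_t off, int64_t len, int64_t align)
{
    aligned_range r;
    r.start = align_down(off, align);
    r.length = align_up(off + len, align) - r.start;
    return r;
}
```

With 4k alignment, a 100 byte request at offset 5000 becomes a 4096 byte read at offset 4096. With 512 byte alignment, a request that already starts and ends on sector boundaries is left unchanged.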
What I'd like to know is whether this alignment requirement for any
given volume is easily accessible from user space, because I'd like
to set it dynamically.
Empirical testing using my little C program shows that my build
machine's local disk only requires 512 byte alignment to avoid the
copies, but my laptop, my home machine's software RAID and my test
machine's hardware RAID all require 4k alignment.
I've been using a dtrace script to detect calls to
cluster_copy_upl_data because the backtraces in Shark (and indeed
the output from a call to stack() in dtrace) seem so untrustworthy.
They both claim that cluster_read_ext calls cluster_pageout, for
instance! Anyway, after reading some of the vfs_cluster code I added
a line to my dtrace script to save off vp->v_mount->mnt_devblocksize
when cluster_read_ext is entered. Sure enough, it contains the
correct magic value wherever I run my test. (dtrace really is
awesome :o)
Looking in stat, statfs and getattrlist, I haven't been able to find
a field that exposes this value to user space. Browsing through xnu
in cscope, the getvolattrlist function looks promising, but it turns
out that it will only return mnt_devblocksize if the user asked for
f_bsize and the file system doesn't support that attribute.
I'm wondering if I've missed something obvious in the above APIs, or
whether there's a better way to get at the mnt_devblocksize field of
a mount_t structure from user space. Has anyone tried to do this
before, or is the general idea to simply go with 4k alignment and
leave it at that?
Thanks in advance for any ideas,
Sam
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Filesystem-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden