Re: How can I access mnt_devblocksize from user space?
- Subject: Re: How can I access mnt_devblocksize from user space?
- From: Kevin Elliott <email@hidden>
- Date: Tue, 23 Sep 2008 09:38:58 -0700
mnt_devblocksize corresponds to the IOKit property "Preferred Block
Size". "Preferred" is a small misnomer there: generally speaking,
storage hardware is not byte oriented but block oriented. You can
only talk to the hardware in chunks of some fixed size. Typically
hard drives use a block size of 512 bytes, removable devices have
generally used 2k or 4k, and CD burners have used a selection of odd
sizes depending on the data involved. RAID systems often use a larger
page size to simplify dividing the data among devices.
Grabbing the block size using IOKit is fairly simple, but I'm not
finding a simple example of it on our website.
The basic overview is:
-Find the dev node for the volume you're interested in.
-Find the IOKit object corresponding to that dev node.
-Grab the "Preferred Block Size" property attached to the IOKit object
you found.
-And you're done...
Check out the sample below for a rough example of how to do what I'm
describing:
http://developer.apple.com/samplecode/VolumeToBSDNode/index.html
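The three steps above can be sketched roughly as follows. This is a minimal sketch, not the code from the linked sample: the helper name preferred_block_size_for_path is made up for illustration, and it assumes the volume is backed by a local /dev node as reported in statfs's f_mntfromname. On Mac OS X it would need to be linked with -framework IOKit -framework CoreFoundation; on other platforms the stub simply returns 0.

```c
#include <stdint.h>
#include <string.h>
#ifdef __APPLE__
#include <CoreFoundation/CoreFoundation.h>
#include <IOKit/IOKitLib.h>
#include <IOKit/IOBSD.h>
#include <IOKit/storage/IOMedia.h>
#include <sys/param.h>
#include <sys/mount.h>
#endif

/* Hypothetical helper: returns the "Preferred Block Size" of the device
 * backing the volume that contains `path`, or 0 on any failure. */
uint32_t preferred_block_size_for_path(const char *path)
{
#ifdef __APPLE__
    /* Step 1: find the dev node for the volume. */
    struct statfs sfs;
    if (statfs(path, &sfs) != 0)
        return 0;
    const char *bsd_name = sfs.f_mntfromname;      /* e.g. "/dev/disk0s2" */
    if (strncmp(bsd_name, "/dev/", 5) != 0)
        return 0;                                  /* assumption: not a local disk */
    bsd_name += 5;                                 /* IOKit wants "disk0s2" */

    /* Step 2: find the IOKit object for that dev node. */
    CFMutableDictionaryRef match =
        IOBSDNameMatching(kIOMasterPortDefault, 0, bsd_name);
    if (match == NULL)
        return 0;
    /* IOServiceGetMatchingService consumes the matching dictionary. */
    io_service_t media = IOServiceGetMatchingService(kIOMasterPortDefault, match);
    if (media == IO_OBJECT_NULL)
        return 0;

    /* Step 3: grab the "Preferred Block Size" property. */
    uint32_t size = 0;
    CFTypeRef prop = IORegistryEntryCreateCFProperty(
        media, CFSTR(kIOMediaPreferredBlockSizeKey), kCFAllocatorDefault, 0);
    if (prop != NULL) {
        if (CFGetTypeID(prop) == CFNumberGetTypeID())
            CFNumberGetValue((CFNumberRef)prop, kCFNumberSInt32Type, &size);
        CFRelease(prop);
    }
    IOObjectRelease(media);
    return size;
#else
    (void)path;   /* IOKit is not available here */
    return 0;
#endif
}
```

A return value of 0 means the lookup failed, in which case a caller would presumably fall back to a conservative alignment.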
Hopefully that helps!
-Kevin Elliott
On Sep 23, 2008, at 8:39 AM, Jim Luther wrote:
I think getattrlist()'s ATTR_VOL_MINALLOCATION is as close as you'll
get to mnt_devblocksize in userland from the filesystem. I'll let
someone who knows how to talk to the device layer below the
filesystem give you advice how to get the value from there. (i.e.,
I'm not going to guess :-)
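For reference, asking getattrlist() for ATTR_VOL_MINALLOCATION might look something like the sketch below. The helper name volume_min_allocation is invented for illustration, and the result buffer layout (a 32-bit length followed by an off_t) is an assumption based on the getattrlist man page. Note that this reports the filesystem's minimum allocation size, which is not necessarily the same value as mnt_devblocksize.

```c
#include <stdint.h>
#ifdef __APPLE__
#include <string.h>
#include <sys/attr.h>
#include <sys/types.h>
#include <unistd.h>
#endif

/* Hypothetical helper: minimum allocation size, in bytes, of the volume
 * containing `path`, or -1 on failure (or on non-Apple platforms). */
int64_t volume_min_allocation(const char *path)
{
#ifdef __APPLE__
    struct attrlist request;
    /* getattrlist packs results on 4-byte boundaries. */
    struct {
        u_int32_t length;          /* total size of the returned data */
        off_t     min_allocation;  /* ATTR_VOL_MINALLOCATION */
    } __attribute__((aligned(4), packed)) result;

    memset(&request, 0, sizeof request);
    request.bitmapcount = ATTR_BIT_MAP_COUNT;
    /* ATTR_VOL_INFO must accompany any volume attribute request. */
    request.volattr = ATTR_VOL_INFO | ATTR_VOL_MINALLOCATION;

    if (getattrlist(path, &request, &result, sizeof result, 0) != 0)
        return -1;
    return (int64_t)result.min_allocation;
#else
    (void)path;   /* getattrlist() is not available here */
    return -1;
#endif
}
```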
- Jim
On Sep 22, 2008, at 11:19 PM, Sam Vaughan wrote:
When getting direct I/O running as fast as possible, it's important
to align the file offsets of every request to avoid the kernel
having to call cluster_copy_upl_data to uiomove everything. The
performance penalty of that is very high and should be easily
avoidable.
I wrote a very simple C program to play around with that opens a
file, sets F_NOCACHE on it and starts issuing 1MB preads from an
offset passed in from the command line, stopping when it hits EOF.
The destination buffer for all the reads is always page aligned.
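A test tool along these lines could be sketched as below. This is a reconstruction from the description above, not the original program: the function name read_uncached is made up, and F_NOCACHE is applied only where the platform defines it.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define READ_SIZE (1024 * 1024)   /* 1MB per pread, as described above */

/* Hypothetical helper: reads `path` from `offset` to EOF with caching
 * disabled where possible. Returns total bytes read, or -1 on error. */
long long read_uncached(const char *path, off_t offset)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
#ifdef F_NOCACHE
    /* Mac OS X: ask the kernel not to cache this file's data. */
    fcntl(fd, F_NOCACHE, 1);
#endif

    /* The destination buffer is always page aligned. */
    void *buf = NULL;
    if (posix_memalign(&buf, (size_t)sysconf(_SC_PAGESIZE), READ_SIZE) != 0) {
        close(fd);
        return -1;
    }

    long long total = 0;
    ssize_t n;
    /* Issue reads until pread returns 0 (EOF) or a negative value (error). */
    while ((n = pread(fd, buf, READ_SIZE, offset)) > 0) {
        offset += n;
        total += n;
    }

    free(buf);
    close(fd);
    return (n < 0) ? -1 : total;
}
```

Wrapping this in a main() that parses the path and starting offset from argv gives something that can be run under 'time' with different offsets.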
Running the test tool with the 'time' Bash built-in or monitoring
it with Shark or dtrace quickly shows the problem. If the initial
offset is zero, the reads are fast and the kernel CPU usage is very
low. If the initial offset is something nasty,
cluster_copy_upl_data gets involved, kernel CPU usage shoots up and
the reads are slow.
For a long time I'd simply assumed that as long as the memory was
page aligned and the disk offset was 512 byte sector aligned, no
copies would ever be needed. Then about a year ago I was working
on code to read 2k uncompressed video and I discovered that on many
RAIDs, the alignment needs to be to 4k offsets in the file to avoid
the copies occurring.
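Once the required alignment is known (512, 4k, or whatever the volume reports), adjusting a request offset is simple arithmetic. The helpers below are an illustrative sketch; their names are not from any system header.

```c
#include <stdint.h>

/* Round `offset` down to a multiple of `alignment` (must be non-zero). */
static uint64_t align_down(uint64_t offset, uint64_t alignment)
{
    return offset - (offset % alignment);
}

/* Round `offset` up to a multiple of `alignment` (must be non-zero). */
static uint64_t align_up(uint64_t offset, uint64_t alignment)
{
    uint64_t remainder = offset % alignment;
    return remainder ? offset + (alignment - remainder) : offset;
}

/* Non-zero if `offset` is already a multiple of `alignment`. */
static int is_aligned(uint64_t offset, uint64_t alignment)
{
    return (offset % alignment) == 0;
}
```

For example, to read from byte 5000 on a volume that needs 4k alignment, issue the read at align_down(5000, 4096), which is 4096, and skip the first 904 bytes of the result.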
What I'd like to know is whether this alignment requirement for any
given volume is easily accessible from user space, because I'd like
to set it dynamically.
Empirical testing using my little C program shows that my build
machine's local disk only requires 512 byte alignment to avoid the
copies, but my laptop, my home machine's software RAID and my test
machine's hardware RAID all require 4k alignment.
I've been using a dtrace script to detect calls to
cluster_copy_upl_data because the backtraces in Shark (and indeed
the output from a call to stack() in dtrace) seem so
untrustworthy. They both claim that cluster_read_ext calls
cluster_pageout for instance! Anyway, after reading some
cluster_vfs code I added a line to my dtrace script to save off
vp->v_mount->mnt_devblocksize when cluster_read_ext is entered. Sure
enough, it contains the correct magic value wherever I run my
test. (dtrace really is awesome :o)
Looking in stat, statfs and getattrlist, I haven't been able to
find a field that exposes this value to user space. Browsing
through xnu in cscope, the getvolattrlist function looks promising,
but it turns out that it will only return mnt_devblocksize if the
user asked for f_bsize and the file system doesn't support that
attribute.
I'm wondering if I've missed something obvious in the above APIs,
or whether there's a better way to get at the mnt_devblocksize
field of a mount_t structure from user space. Has anyone tried to
do this before, or is the general idea to simply go with 4k
alignment and leave it at that?
Thanks in advance for any ideas,
Sam
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Filesystem-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden