Re: How can I access mnt_devblocksize from user space?

  • Subject: Re: How can I access mnt_devblocksize from user space?
  • From: Sam Vaughan <email@hidden>
  • Date: Thu, 25 Sep 2008 12:12:35 +1000

On 25/09/2008, at 4:27 AM, Kevin Elliott wrote:

> It's a lot less fragile than you think. Functionally speaking, that key name is equivalent to a fcntl selector: changing it would break lots of code, and lots of code relies on it being there. It isn't going anywhere.

That's good to know. The concern from my perspective is that named properties in IOKit nodes don't make for as concrete an interface as structures and enumerations in BSD header files. They're simply too easy to change. In a past life as a clustered file system weenie I was burnt several times by changes to the layout of the IOKit registry. I understand that things are more mature now, but old wounds take time to heal. :o)


> Understand, the BSD layer sits on top of IOKit, not the other way around. Any of the fcntls that provide data on hardware get that data using IOKit. Indeed, if you look at the source for IOMediaBSDClient you'll find a big switch statement that maps fcntl selectors to IOKit keys, then uses those keys to get the requested data.

I understand the layering of BSD over IOKit. I just think that in this case there's a hole in the BSD layer, forcing me to reach through and do some messy IOKit stuff when it could all be so much simpler. I doubt there are many programmers who wouldn't choose a one-line BSD call over the IOKit alternative. That's why I was asking whether an enhancement request might be well received.


A field in struct statfs would probably be more appropriate than a new fcntl since there are so many other similar properties there already. It's a bit odd really when you think about it. There's not much point advertising an optimal transfer block size if you don't also point out that it will only give good results if the file offsets used are aligned to 4k boundaries.
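
For reference, here is a minimal sketch of reading the block-size properties that struct statfs already exposes. The helper name fs_block_sizes is made up for illustration. On Mac OS X and the BSDs, f_iosize is the advertised optimal transfer size; the fallback branch for other systems, where f_bsize plays that role, is my assumption and is only there so the sketch compiles elsewhere. Note that neither field says anything about offset alignment, which is the gap described above.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__APPLE__) || defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__NetBSD__)
#include <sys/param.h>
#include <sys/mount.h>   /* statfs with f_bsize and f_iosize */
#define HAVE_F_IOSIZE 1
#else
#include <sys/vfs.h>     /* statfs with f_bsize only (assumed non-BSD fallback) */
#endif

/*
 * Hypothetical helper: fetch the file system block size and the advertised
 * optimal transfer size for the volume containing `path`.
 * Returns 0 on success, -1 if statfs() fails (errno is left set).
 */
int fs_block_sizes(const char *path, long *fs_bsize, long *io_size)
{
    struct statfs sfs;

    assert(path != NULL && fs_bsize != NULL && io_size != NULL);

    if (statfs(path, &sfs) != 0)
        return -1;

    *fs_bsize = (long)sfs.f_bsize;
#ifdef HAVE_F_IOSIZE
    *io_size = (long)sfs.f_iosize;   /* optimal transfer block size */
#else
    *io_size = (long)sfs.f_bsize;    /* no f_iosize here; f_bsize stands in */
#endif
    return 0;
}
```

Calling fs_block_sizes("/", &bsize, &iosize) and printing the two values is enough to compare them with the mnt_devblocksize reported by the DTrace script further down.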

> Yes, it's quite possible you'll see greater than 4k. 8k is fairly common on the mid to high end, and I think I've heard of 16k on the very high end. Basically as the number of drives in a RAID increases, it's common to increase the block size to encourage the OS to use read sizes that give good performance.

That's interesting. I wonder whether the cluster vfs layer ends up in cluster_copy_upl_data when mnt_devblocksize is a multiple of 4k. I'd have thought that once the page size barrier is crossed, pages could simply be remapped in upls as appropriate. I wonder if Joe reads filesystem-dev...


If anyone's curious and might have such a RAID, it's easy to find out. Here's how I've been doing my testing:

- - - - - - - -

Save the two files at the bottom of this email to disk.

Compile the C file:

    $ gcc -o devblocksize devblocksize.c

Create a symlink to a large file called "bigfile":

    $ ln -s /some/large/file bigfile
    $ ls -lLh bigfile
    -rw-r--r--  1 samv  staff   807M Sep 19  2005 bigfile

Run the test from starting offset zero:

    $ sudo su
    # ./devblocksize.d & time ./devblocksize bigfile 0; kill %1
    Read 0x3274eb76 (846523254) bytes

    real    0m2.031s
    user    0m0.001s
    sys     0m0.270s

    mnt_devblocksize is 0x1000 (4096)

                 Function          Calls
    cluster_copy_upl_data              1
         cluster_read_ext            775
           pread_nocancel            775

                 Function    Total Bytes
    cluster_copy_upl_data           2934
           pread_nocancel      812646400

                 Function  CPU Time (ns)
    cluster_copy_upl_data          15862
         cluster_read_ext      221218438
           pread_nocancel      225094068


Note that on this volume, mnt_devblocksize is 4k. The starting offset of zero is aligned, so cluster_copy_upl_data was only called once, to copy the final 2934 bytes of the file. Run the test again from starting offset 512:


    # ./devblocksize.d & time ./devblocksize bigfile 512; kill %1
    Read 0x3274eb76 (846523254) bytes

    real    0m3.881s
    user    0m0.002s
    sys     0m0.795s

    mnt_devblocksize is 0x1000 (4096)

                 Function          Calls
         cluster_read_ext            779
           pread_nocancel            779
    cluster_copy_upl_data            780

                 Function    Total Bytes
           pread_nocancel      816840704
    cluster_copy_upl_data      817162614

                 Function  CPU Time (ns)
    cluster_copy_upl_data      413313593
         cluster_read_ext      721742406
           pread_nocancel      726990224

This time cluster_copy_upl_data is used every time, and accounts for more than half the total CPU time of the operation.
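
One way a caller could stay on the fast path is to round the starting offset down to a block boundary and discard the leading bytes it didn't ask for. The following is only a sketch under that assumption: align_down, align_up and pread_aligned are hypothetical helper names, and the block size has to be supplied by the caller (4096 on the volume tested above), since getting it from user space is the very problem this thread is about.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Round `off` down to the nearest multiple of `blocksize`. */
off_t align_down(off_t off, off_t blocksize)
{
    assert(off >= 0 && blocksize > 0);
    return off - (off % blocksize);
}

/* Round `off` up to the nearest multiple of `blocksize`. */
off_t align_up(off_t off, off_t blocksize)
{
    off_t rem;

    assert(off >= 0 && blocksize > 0);
    rem = off % blocksize;
    return rem ? off + (blocksize - rem) : off;
}

/*
 * Hypothetical helper: issue the pread() from the block boundary at or
 * below `off`. On return, *skip holds the number of bytes at the start
 * of `buf` that precede the offset the caller actually wanted.
 * The return value is whatever pread() returned.
 */
ssize_t pread_aligned(int fd, void *buf, size_t len, off_t off,
                      off_t blocksize, size_t *skip)
{
    off_t start = align_down(off, blocksize);

    *skip = (size_t)(off - start);
    return pread(fd, buf, len, start);
}
```

With a block size of 4096, a request at offset 512 becomes a read at offset 0 with 512 bytes to skip. For every read in a loop to stay aligned, the read length should also be a multiple of the block size, as IO_SIZE is in the test program below.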

- - - - - - - - devblocksize.c - - - - - - - -

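#include <fcntl.h>
#include <libgen.h>     /* basename */
#include <stdio.h>
#include <stdlib.h>     /* strtoll, valloc, free */
#include <unistd.h>     /* pread, close */
#include <sys/param.h>

#define IO_SIZE (1 << 20)

#define BAIL_IF(cond, fmt, args...) \
    if (cond) {fprintf(stderr, fmt, ##args); err = 1; goto bail;}
#define PBAIL_IF(cond, func) \
    if (cond) {perror(#func " failed"); err = 1; goto bail;}

int main(int argc, char** argv)
{
    int err = 0;
    int fd = -1;
    char* buf = NULL;   /* initialised so the bail path never frees garbage */
    off_t off = 0;
    ssize_t bytes;

    BAIL_IF(argc != 3, "usage: %s <file> <start-offset>\n", basename(argv[0]));

    off = strtoll(argv[2], NULL, 10);

    /* valloc returns a page-aligned buffer */
    buf = (char*)valloc(IO_SIZE);
    PBAIL_IF(!buf, valloc);

    fd = open(argv[1], O_RDONLY, 0);
    PBAIL_IF(fd < 0, open);

    /* Bypass the buffer cache so every read goes to the device */
    err = fcntl(fd, F_NOCACHE, 1);
    PBAIL_IF(err < 0, fcntl);

    /* Read IO_SIZE chunks until a short read signals end of file */
    for (bytes = IO_SIZE; bytes == IO_SIZE; off += bytes)
    {
        bytes = pread(fd, buf, IO_SIZE, off);
        PBAIL_IF(bytes < 0, pread);
    }

    /* off is now the final file offset reached */
    fprintf(stderr, "Read 0x%llx (%lld) bytes\n", (long long)off, (long long)off);

bail:
    if (fd >= 0)
        close(fd);
    if (buf)
        free(buf);

    return err;
}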

- - - - - - - - devblocksize.d - - - - - - - -

#!/usr/sbin/dtrace -s

#pragma D option quiet

:mach_kernel:pread_nocancel:entry /execname=="devblocksize"/
{
    @agg[probefunc] = count();
    @sum[probefunc] = sum(((struct pread_nocancel_args*)arg1)->nbyte);

    self->start[probefunc] = vtimestamp;
}

::cluster_read_ext:entry /execname=="devblocksize"/
{
    blocksize = ((vnode_t)arg0)->v_mount->mnt_devblocksize;
    @agg[probefunc] = count();

    self->start[probefunc] = vtimestamp;
}

::cluster_copy_upl_data:entry /execname=="devblocksize"/
{
    @agg[probefunc] = count();
    @sum[probefunc] = sum(*(int*)arg3);

    self->start[probefunc] = vtimestamp;
}

:mach_kernel:pread_nocancel:return,
::cluster_read_ext:return,
::cluster_copy_upl_data:return /execname=="devblocksize"/
{
    this->time = vtimestamp - self->start[probefunc];
    @times[probefunc] = sum(this->time);
}

dtrace:::END
{
    printf("\nmnt_devblocksize is 0x%x (%d)\n", blocksize, blocksize);

    printf("\n%21s  %13s\n", "Function", "Calls");
    printa("%21s  %@13d\n", @agg);

    printf("\n%21s  %13s\n", "Function", "Total Bytes");
    printa("%21s  %@13d\n", @sum);

    printf("\n%21s  %13s\n", "Function", "CPU Time (ns)");
    printa("%21s  %@13d\n", @times);
}

- - - - - - - -

