Re: readdir vs. getdirentriesattr
Re: readdir vs. getdirentriesattr
- Subject: Re: readdir vs. getdirentriesattr
- From: Thomas Tempelmann <email@hidden>
- Date: Mon, 22 Apr 2019 18:59:25 +0200
Jim,
thanks for your comments.
If all you need is filenames and no other attributes, readdir is usually
> faster than getattrlistbulk because it doesn't have to do as much
> work. However, if you need additional attributes, getattrlistbulk is
> usually much faster. Some of that extra work done
> by getattrlistbulk involves checking to see what attributes were requested
> and packing the results into the result buffer.
>
What's interesting is that on HFS+, readdir is not faster in my tests, but
on a recent and fast Mac (i.e. not on my MacPro 2010), it can be twice as
fast as the others when scanning an APFS volume. I wonder why. Is the
implementation for getattrlistbulk in the APFS driver inefficient compared
to the one in HFS+? The source code for the APFS FS driver has still not be
published, or has it?
You'll find that lstat is slightly faster than getattrlist (when
> getattrlist is returning the same set of attributes) for the same reason.
> There's no extra code needed in lstat to see what attributes were requested
> and packing the results into the result buffer.
>
It's also significantly faster than using NSURL's getResourceValue, even if
the NSURL has already been created regardless. That's probably due to all
the objc overhead.
By the way, I haven't tested this but I would expect
> enumeratorAtURL:includingPropertiesForKeys:options:errorHandler: (followed
> by a "for (NSURL *fileURL in directoryEnumerator)" loop) to be slightly
> faster than
> contentsOfDirectoryAtURL:includingPropertiesForKeys:options:error: because
> the URLs aren't retained in a NSArray. Using CFURLEnumerator may also be
> slightly faster than NSFileManager's directory enumeration.
>
Now, that's something I had not considered, yet. Will try.
> Using POSIX/BSD APIs will be the fastest, but that means you have to deal
> with the different capabilities between file systems yourself (although
> getattrlistbulk helps with that a lot).
>
*Most interesting, though:*
Today someone pointed out *fts_read*. This does, so far always beat all
other methods, especially if I also need extra attributes (e.g. file size).
Can you give some more information about the fts implementation? Is this
user-library-level oder kernel code that's doing this? I had expected that
this would only be a convenience userland function that uses readdir or
similar BSD functions, but it appears to beat them all, suggesting this is
optimized at a lower level.
I have updated my test project accordingly (with the fts code) in case
anyone likes to run their own tests:
http://files.tempel.org/Various/DirScanner.zip
Also, I am wondering if using concurrent threads will speed up scanning a
dir tree on an SSD as well, by distributing each directory read to one
thread (or dispatch queue). Will eventually try, but probably not soon.
Gotta get my program out of the door soon, first.
Thomas
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Filesystem-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden