Re: readdir vs. getdirentriesattr

Subject: Re: readdir vs. getdirentriesattr
From: Jim Luther <email@hidden>
Date: Mon, 22 Apr 2019 09:11:39 -0700

If all you need is filenames and no other attributes, readdir is usually faster
than getattrlistbulk because it doesn't have to do as much work. However, if
you need additional attributes, getattrlistbulk is usually much faster. Some of
that extra work done by getattrlistbulk involves checking to see what
attributes were requested and packing the results into the result buffer.
You'll find that lstat is slightly faster than getattrlist (when getattrlist is
returning the same set of attributes) for the same reason: lstat has no extra
code to check which attributes were requested or to pack the results into the
result buffer.
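To make the names-only case concrete, here is a minimal portable sketch of a readdir() loop that looks at nothing but the names. The helper name count_names() is hypothetical. Because no per-entry attributes are requested, there is nothing to select or pack, which is the extra work Jim attributes to getattrlistbulk.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* glibc: expose mkdtemp() etc.; no effect on macOS */
#endif
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Counts the entries directly inside `path`, skipping "." and "..".
   Only readdir() is used, so nothing beyond what the file system puts
   in struct dirent is fetched. Returns -1 if the directory cannot be
   opened. */
long count_names(const char *path)
{
    assert(path != NULL);
    DIR *dir = opendir(path);
    if (dir == NULL) {
        perror(path);
        return -1;
    }

    long count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        count++; /* entry->d_name is all we look at */
    }
    closedir(dir);
    return count;
}
```

In this sketch a read error ends the loop the same way end-of-directory does; production code would set errno to 0 before each readdir() call and check it afterwards.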

The original implementation of CFURLEnumerator (which is the implementation
under NSFileManager's directory enumeration) was readdir followed by
getattrlist requests to get the additional attributes on each item. Before we
even shipped Snow Leopard, the implementation was changed to use
getdirentriesattr if the file system supported it (getattrlistbulk was not
available until several releases later) because of performance improvements.
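The readdir-then-per-item pattern described above can be sketched as follows. As an assumption made for portability, fstatat() stands in for getattrlist() here; the hypothetical sum_file_sizes() makes one extra system call per entry, which is the per-item cost that the getdirentriesattr and getattrlistbulk paths avoid.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* glibc: expose dirfd(), mkdtemp(); no effect on macOS */
#endif
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Sums the sizes of the regular files directly inside `path`.
   Step 1: readdir() yields each name.
   Step 2: one fstatat() call per name fetches that entry's attributes.
   Returns -1 if the directory cannot be opened. */
long long sum_file_sizes(const char *path)
{
    assert(path != NULL);
    DIR *dir = opendir(path);
    if (dir == NULL) {
        perror(path);
        return -1;
    }
    int dfd = dirfd(dir); /* resolve names relative to the open directory */

    long long total = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        struct stat st;
        /* AT_SYMLINK_NOFOLLOW gives lstat() semantics */
        if (fstatat(dfd, entry->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue; /* entry vanished or is unreadable; skip it */
        if (S_ISREG(st.st_mode))
            total += (long long)st.st_size;
    }
    closedir(dir);
    return total;
}
```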

By the way, I haven't tested this, but I would expect
enumeratorAtURL:includingPropertiesForKeys:options:errorHandler: (followed by a
"for (NSURL *fileURL in directoryEnumerator)" loop) to be slightly faster than
contentsOfDirectoryAtURL:includingPropertiesForKeys:options:error: because the
URLs aren't retained in an NSArray. Using CFURLEnumerator may also be slightly
faster than NSFileManager's directory enumeration. Using POSIX/BSD APIs will be
the fastest, but that means you have to deal with the different capabilities
between file systems yourself (although getattrlistbulk helps with that a lot).
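One concrete example of handling file-system differences yourself: readdir() may report a d_type of DT_UNKNOWN on some file systems, so a recursive names-only scan has to fall back to a stat call for those entries. The sketch below does that; count_tree() and its helpers are hypothetical names, and this is not how CFURLEnumerator or NSFileManager work internally.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* glibc: expose DT_*, O_DIRECTORY, mkdtemp(); no effect on macOS */
#endif
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Nonzero if `entry` (relative to dfd) is a directory. Uses d_type when
   the file system fills it in, otherwise falls back to fstatat().
   Symlinks are never treated as directories. */
static int is_directory(int dfd, const struct dirent *entry)
{
    if (entry->d_type != DT_UNKNOWN)
        return entry->d_type == DT_DIR;
    struct stat st;
    if (fstatat(dfd, entry->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
        return 0;
    return S_ISDIR(st.st_mode);
}

/* Recursively counts every entry below the directory open as `dfd`.
   Takes ownership of `dfd` (closedir() closes it). */
static long count_tree_fd(int dfd)
{
    DIR *dir = fdopendir(dfd);
    if (dir == NULL) {
        close(dfd);
        return 0;
    }
    long count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        count++;
        if (is_directory(dirfd(dir), entry)) {
            int child = openat(dirfd(dir), entry->d_name,
                               O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
            if (child >= 0)
                count += count_tree_fd(child);
        }
    }
    closedir(dir);
    return count;
}

/* Returns the number of entries below `path`, or -1 if it can't be opened. */
long count_tree(const char *path)
{
    assert(path != NULL);
    int dfd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (dfd < 0) {
        perror(path);
        return -1;
    }
    return count_tree_fd(dfd);
}
```

Each level of recursion keeps one directory descriptor open, so very deep trees can run into the per-process descriptor limit; fts(3) is a standard alternative for deep traversal.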

- Jim

> On Apr 21, 2019, at 7:35 PM, Thomas Tempelmann <email@hidden> wrote:
>
> I'd like to add some info to a thread from 2015:
>
> I recently worked on my file search tool (FAF) and wanted to make sure that I
> use the best method to deep-scan directory contents.
>
> I had expected that getattrlistbulk() would always be the best choice, but it
> turns out that opendir/readdir perform much better in some cases, oddly (this
> is about reading just the file names, no other attributes).
>
> See my blog post: https://blog.tempel.org/2019/04/dir-read-performance.html
>
> There's also a test project trying out the various methods.
>
> Any comments, insights, clarifications and bug reports are most welcome.
>
> Enjoy,
>  Thomas Tempelmann
>
>
>> On 12. Jan 2015, at 17:33, Jim Luther <email@hidden> wrote:
>>
>> getattrlistbulk() works on all file systems. If the file system supports
>> bulk enumeration natively, great! If it does not, then the kernel code takes
>> care of it. In addition, getattrlistbulk() supports all non-volume
>> attributes (getdirentriesattr() only supported a large subset).
>>
>> The API calling convention for getattrlistbulk() is slightly different from
>> getdirentriesattr(); read the man page carefully. In particular:
>>
>> • ATTR_CMN_NAME and ATTR_CMN_RETURNED_ATTRS are required (requiring
>> ATTR_CMN_NAME allowed us to get rid of the newState argument).
>> • A new attribute, ATTR_CMN_ERROR, can be requested to detect error
>> conditions for a specific directory entry.
>> • The method for determining when enumeration is complete is different. You
>> just keep calling getattrlistbulk() until 0 entries are returned.
>>
>> - Jim
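A minimal sketch of the calling convention Jim describes, written against the getattrlistbulk(2) and getattrlist(2) man pages as I read them: ATTR_CMN_NAME and ATTR_CMN_RETURNED_ATTRS are always requested, ATTR_CMN_ERROR flags per-entry failures, and the loop stops when a call returns 0. The function name count_entries_bulk() and the 32 KiB buffer size are illustrative choices; because the call exists only on Apple platforms, the non-Apple branch is a stub.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* glibc: expose mkdtemp(); no effect on macOS */
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#if defined(__APPLE__)
#include <fcntl.h>
#include <sys/attr.h>
#include <unistd.h>

/* Counts the entries in `path` using getattrlistbulk().
   Returns -1 on error. */
long count_entries_bulk(const char *path)
{
    assert(path != NULL);
    int dfd = open(path, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;

    struct attrlist al;
    memset(&al, 0, sizeof al);
    al.bitmapcount = ATTR_BIT_MAP_COUNT;
    /* NAME and RETURNED_ATTRS are mandatory for getattrlistbulk() */
    al.commonattr = ATTR_CMN_RETURNED_ATTRS | ATTR_CMN_NAME | ATTR_CMN_ERROR;

    uint64_t buf[32 * 1024 / sizeof(uint64_t)]; /* 32 KiB, 8-byte aligned */
    long count = 0;
    for (;;) {
        int n = getattrlistbulk(dfd, &al, buf, sizeof buf, 0);
        if (n < 0) { count = -1; break; }
        if (n == 0) break; /* 0 entries returned: enumeration is complete */

        char *entry = (char *)buf;
        for (int i = 0; i < n; i++) {
            char *field = entry;
            uint32_t length;              /* includes this length field */
            attribute_set_t returned;     /* always the first attribute */
            memcpy(&length, field, sizeof length);
            field += sizeof length;
            memcpy(&returned, field, sizeof returned);
            field += sizeof returned;

            uint32_t err = 0;             /* follows RETURNED_ATTRS when present */
            if (returned.commonattr & ATTR_CMN_ERROR) {
                memcpy(&err, field, sizeof err);
                field += sizeof err;
            }
            if (err == 0 && (returned.commonattr & ATTR_CMN_NAME)) {
                attrreference_t ref;      /* offset is relative to ref itself */
                memcpy(&ref, field, sizeof ref);
                const char *name = field + ref.attr_dataoffset;
                (void)name;               /* a real scanner would use the name */
                count++;
            }
            entry += length;              /* advance to the next packed entry */
        }
    }
    close(dfd);
    return count;
}
#else
/* Stub: getattrlistbulk() is not available on this platform. */
long count_entries_bulk(const char *path)
{
    (void)path;
    errno = ENOSYS;
    return -1;
}
#endif
```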
>>
>>> On Jan 11, 2015, at 9:31 PM, James Bucanek <email@hidden> wrote:
>>>
>>> Eric,
>>>
>>> I would just like to clarify: the new getattrlistbulk() function works on
>>> all file systems. We don't have to check the volume's VOL_CAP_INT_READDIRATTR
>>> capability before calling it, correct?
>>>
>>> James Bucanek
>>>
>>>> On December 10, 2014 at 5:57 PM, Eric Tamura wrote:
>>>> It should be much faster.
>>>>
>>>> Also note that as of Yosemite, we have added a new API:
>>>> getattrlistbulk(2), which is like getdirentriesattr(), but supported in
>>>> VFS for all filesystems. getdirentriesattr() is now deprecated.
>>>>
>>>> The main advantage of the bulk call is that we can return results in most
>>>> cases without having to create a vnode in-kernel, which saves on I/O: HFS+
>>>> on-disk layout is such that all of the directory entries in a given
>>>> directory are clustered together and we can get multiple directory entries
>>>> from the same cached on-disk blocks.
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Filesystem-dev mailing list      (email@hidden)
> Help/Unsubscribe/Update your Subscription:
>
> This email sent to email@hidden
