On 9/22/04 1:36 AM, Philippe Wicker didst favor us with:
> On Sep 22, 2004, at 4:50 AM, Mark Dawson wrote:
>> I need to load in information from 1000's of files at launch (I'm
>> looking inside my app's SharedSupport folder and within the user's and
>> possibly system Application Support folder). The information doesn't
>> need to be available at launch (the user can be told to "wait" until
>> it is), so it seems like a perfect candidate to hand off to tasks to
>> do (it would also allow my program to launch faster). I've never
>> used threads or tasks, though, and the sample code that I found
>> (MPFileCopy) that seemed to match most closely to what I was doing was
>> last modified in 1999, and extensively references OS 9 (I will be
>> running on 10.2 or later).
>> Does anyone know if MPFileCopy is still the best example of using
>> tasks and file system calls? Does anyone know of any more updated
>> information/example code for thread and/or task & file system usage
>> for OS X?
>> I would assume that preemptive tasks would be the way to go, vs
>> threads. I'd be looking for certain file types within certain
>> folders, so I think spawning tasks for each folder that exists would
>> work out (min of 2 tasks, as many as 6).
> Some time ago I wrote a little benchmark program to compare the
> throughput that could be reached when reading a bunch of files from one
> thread or a pool of threads. The results were surprising:
> On the internal drive: the multi-threaded version took twice as long
> (yes, longer!!) as the single-threaded version.
Why is this surprising? Using multiple threads doesn't really speed anything
up unless they're running on different processors (and even that doesn't help
much if the threads are I/O bound). In fact, the more threads you have
running, the more overhead there is for switching among the threads.
The advantage of using a thread to read or write files is that it allows the
rest of the application to be more responsive. But if you spawn multiple
threads doing disk I/O you risk causing the drive head to move around a lot,
which will kill performance. For example, you spawn two threads, A and B to
read files FileA and FileB. In the middle of reading FileA, the system gives
time to thread B, which causes the drive to move to FileB. Then it gives
time to thread A again, so the head has to move to FileA and so on.
Absolutely nothing will kill your performance like thrashing the disk.
> On an external FireWire drive, both methods gave the same numbers.
This could simply be a matter of how the files are laid out on the disk. The
closer they are together, the less of a hit you'll take when moving the
head. If the files on the internal disk were all over the disk and those on
the external disk were close together, you'd see a difference.
> It looks like disk commands are not reordered by the system driver
> and/or the device firmware. In your case, I would suggest starting one
> additional thread (using the POSIX API, for instance) that will do the
> file loading.
Agreed. Depending on what you need to do with the files, you may want one
thread reading and another thread processing what's been read (then ideally
they run on different processors). There are probably a number of variables
here we don't know, so it's not possible to recommend the best solution,
except to say don't spawn multiple threads to do the reading.
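
A rough sketch of that one-reader, one-processor split, written in Python
for brevity rather than with the POSIX or Multiprocessing Services calls
discussed in this thread. The names read_files, load_and_process, and the
process callback are hypothetical, and the queue depth of 8 is an arbitrary
assumption. The point is only the shape: a single thread issues all the
reads, in order, and hands each file's contents to whoever processes them.

```python
import queue
import threading

_DONE = object()  # sentinel telling the consumer that no more files are coming


def read_files(paths, out_queue):
    """Reader thread: read files one at a time, in order, so the disk
    only ever sees one stream of requests."""
    try:
        for path in paths:
            with open(path, "rb") as f:
                out_queue.put((path, f.read()))
    finally:
        # Always signal completion, even if a read failed part-way through.
        out_queue.put(_DONE)


def load_and_process(paths, process):
    """Start one reader thread and process its output on the calling thread.

    `process` is a hypothetical callback taking (path, data) and
    returning whatever the application wants to keep.
    """
    q = queue.Queue(maxsize=8)  # bounded, so the reader cannot run far ahead
    reader = threading.Thread(target=read_files, args=(paths, q), daemon=True)
    reader.start()

    results = {}
    while True:
        item = q.get()
        if item is _DONE:
            break
        path, data = item
        results[path] = process(path, data)

    reader.join()
    return results
```

Calling load_and_process(paths, parse_one_file) with your own parse function
returns a dictionary keyed by path. Whether the processing side gains
anything from a second processor depends on how much work it does per file,
which is one of the unknowns mentioned above.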
> Your main thread would do its job concurrently and
> possibly wait (or better periodically poll) at some point for the
> "disk" thread to finish.
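
The single "disk" thread arrangement described above might look roughly like
the following. This is a sketch in Python, not the POSIX API itself; the
BackgroundLoader class, its suffix filter, and the folder list are all
hypothetical names chosen for illustration. One worker walks the folders in
turn while the main thread either polls is_done() or blocks in wait().

```python
import os
import threading


class BackgroundLoader:
    """Loads matching files from a list of folders on one worker thread."""

    def __init__(self, folders, suffix=""):
        self.folders = folders
        self.suffix = suffix   # hypothetical filter; "" matches every file
        self.loaded = {}       # path -> bytes, filled in by the worker
        self._done = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        try:
            # Folders are visited one after another, never concurrently.
            for folder in self.folders:
                if not os.path.isdir(folder):
                    continue  # e.g. a support folder that does not exist
                for name in sorted(os.listdir(folder)):
                    path = os.path.join(folder, name)
                    if name.endswith(self.suffix) and os.path.isfile(path):
                        with open(path, "rb") as f:
                            self.loaded[path] = f.read()
        finally:
            self._done.set()  # signal completion even if a read failed

    def is_done(self):
        """Non-blocking check, suitable for polling from an event loop."""
        return self._done.is_set()

    def wait(self, timeout=None):
        """Block until loading finishes; returns False if the timeout expires."""
        return self._done.wait(timeout)
```

The main thread would create the loader at launch, call start(), carry on
with its own work, and check is_done() from its event loop (or call wait()
at the point where the data is finally needed) before touching loaded.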