In fact one of my "optimization scenarios" was to open the N file triplets (data, index, label) in N threads. I think I will try it. The biggest bottleneck, though, seems to be that the XDR implementation uses fread internally and does not seem to buffer the read data very well. All my reading and writing goes through XDR calls, hence I don't have control over the internal handling of the file stream. I can still see some room for improvement, as an XDR stream can be attached to a memory block, which I could read in beforehand in one go - this depends, though, on whether the stream "positions" returned by xdr_getpos are actually absolute byte positions in the memory block or something else.

Thanks again for all replies

Tomek

On 25 Sep 2009, at 22:21, Terry Lambert wrote:

> You could definitely be doing your three I/Os concurrently instead of
> serially, assuming the index data is not needed to drive the I/O. If it
> is, you could do your index I/Os concurrently and then do your data
> file I/O after that (semi-serially, in other words). But if you
> arranged your code to do the first index read up front, before going
> into the loop, you could do your *next* index reads concurrently with
> the data I/O for your current indices.
>
> You could further increase concurrency, assuming the data is index
> linked instead of data linked (i.e. you don't need each set of data
> prior to the next set of data), by implementing a producer/consumer
> model: issue your I/Os as fast as you can and queue the buffered data
> records for processing (you'd want to rate-limit the number of
> outstanding I/Os to bound how much data you have in hand at one time
> before you ask for more).
>
> If your data comes in more than 1.5 times faster than you can process
> it, and you have a sufficiently large pool of work-item queue elements,
> you could establish high and low water marks on the number of elements
> in the queue, and then only go back for more data when you hit the low
> water mark.
>
> -- Terry
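
A minimal sketch of the in-memory XDR idea above, assuming Sun RPC's <rpc/xdr.h>: slurp the whole file into a buffer once, then decode from an xdrmem stream instead of an xdrstdio one. The file name and the decoded type are placeholders; in the common implementations the positions xdr_getpos returns for an xdrmem stream are plain byte offsets into the buffer, but that is exactly the point worth verifying first.

/* Sketch: decode from a preloaded memory block via xdrmem_create
 * rather than through fread-backed xdrstdio.  Error handling trimmed. */
#include <rpc/types.h>
#include <rpc/xdr.h>
#include <stdio.h>
#include <stdlib.h>

static char *slurp(const char *path, long *len)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    fseek(f, 0, SEEK_END);
    *len = ftell(f);
    rewind(f);
    char *buf = malloc(*len);
    if (buf && fread(buf, 1, *len, f) != (size_t)*len) { free(buf); buf = NULL; }
    fclose(f);
    return buf;
}

int main(void)
{
    long len;
    char *buf = slurp("data.xdr", &len);   /* hypothetical file name */
    if (!buf) return 1;

    XDR xdrs;
    xdrmem_create(&xdrs, buf, (u_int)len, XDR_DECODE);

    /* For xdrmem streams the position should be a byte offset into buf,
     * so values saved from xdr_getpos() can be replayed via xdr_setpos(). */
    u_int pos = xdr_getpos(&xdrs);
    double value;                           /* placeholder record type */
    if (xdr_double(&xdrs, &value))
        printf("offset %u: %g\n", pos, value);

    xdr_destroy(&xdrs);
    free(buf);
    return 0;
}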
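
For Terry's first suggestion, a sketch under POSIX threads of overlapping the next index read with the current data I/O, with the first index read hoisted in front of the loop; read_index, read_data, compute and the buffer sizes are hypothetical stand-ins for the real triplet I/O.

/* Sketch: prefetch index block i+1 on a helper thread while the data
 * I/O driven by index block i proceeds on the main thread. */
#include <pthread.h>
#include <string.h>

extern int read_index(int i, void *buf);           /* hypothetical */
extern int read_data(const void *idx, void *buf);  /* driven by the index */
extern void compute(const void *data);

struct prefetch { int i; char idx[4096]; };

static void *fetch_index(void *arg)
{
    struct prefetch *p = arg;
    read_index(p->i, p->idx);        /* runs concurrently with read_data() */
    return NULL;
}

void run(int n)
{
    char idx[4096], data[65536];
    struct prefetch next;
    pthread_t tid;

    read_index(0, idx);              /* first index read up front */
    for (int i = 0; i < n; i++) {
        int have_next = (i + 1 < n);
        if (have_next) {             /* overlap the next index read... */
            next.i = i + 1;
            pthread_create(&tid, NULL, fetch_index, &next);
        }
        read_data(idx, data);        /* ...with the current data I/O */
        compute(data);
        if (have_next) {
            pthread_join(tid, NULL);
            memcpy(idx, next.idx, sizeof idx);
        }
    }
}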
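
And a sketch of the rate-limited producer/consumer queue with the high and low water marks Terry describes, again assuming POSIX threads with one reader and one processing thread; record_t, read_next_record and process are hypothetical. The consumer only wakes the reader once the queue has drained to the low water mark, so reads resume in bursts instead of one at a time.

/* Sketch: bounded queue with high/low water marks.  The producer stops
 * issuing reads at HIGH_WATER queued records and resumes at LOW_WATER. */
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define HIGH_WATER 64   /* stop reading when this many records are queued */
#define LOW_WATER  16   /* resume reading when the queue drains to this */

typedef struct record { struct record *next; /* payload ... */ } record_t;

static record_t *head, *tail;
static int count;
static bool done;
static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  space = PTHREAD_COND_INITIALIZER;  /* producer waits */
static pthread_cond_t  avail = PTHREAD_COND_INITIALIZER;  /* consumer waits */

extern record_t *read_next_record(void);  /* hypothetical; NULL at EOF */
extern void process(record_t *r);         /* hypothetical compute step */

static void *producer(void *arg)
{
    record_t *r;
    while ((r = read_next_record()) != NULL) {
        pthread_mutex_lock(&lock);
        /* Once the high water mark is hit, sleep until the consumer
         * drains the queue back down to the low water mark. */
        while (count >= HIGH_WATER)
            pthread_cond_wait(&space, &lock);
        r->next = NULL;
        if (tail) tail->next = r; else head = r;
        tail = r;
        count++;
        pthread_cond_signal(&avail);
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_lock(&lock);
    done = true;
    pthread_cond_signal(&avail);
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *consumer(void *arg)
{
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == 0 && !done)
            pthread_cond_wait(&avail, &lock);
        if (count == 0 && done) { pthread_mutex_unlock(&lock); break; }
        record_t *r = head;
        head = r->next;
        if (!head) tail = NULL;
        /* Wake the producer only at the low water mark, so reads happen
         * in bursts rather than one record at a time. */
        if (--count == LOW_WATER)
            pthread_cond_signal(&space);
        pthread_mutex_unlock(&lock);
        process(r);   /* heavy work happens outside the lock */
    }
    return NULL;
}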