Re: fread multiple files
- Subject: Re: fread multiple files
- From: Terry Lambert <email@hidden>
- Date: Fri, 25 Sep 2009 14:21:20 -0700
You could definitely be doing your three I/Os concurrently instead of
serially, assuming the index data is not needed to drive the data I/O;
if it is, you could still do your index I/Os concurrently and then do
your data file I/O after that (semi-serially, in other words).
But if you arranged your code to do the first index read up front
before going into the loop, you could do your *next* index reads
concurrently with your data I/O for your current indices.
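A minimal sketch of that overlap (not from the thread; the pthread worker, the fixed 512-byte index record, and read_next_index() are all assumptions) fetches the next index record on a worker thread while the main thread reads and processes the current data:

/* Sketch only: prefetch the next index record on a worker thread while
 * the main thread does the data-file I/O for the current index. */
#include <pthread.h>
#include <stdio.h>

struct prefetch {
    FILE  *idx;        /* index file */
    long   offset;     /* offset of the next index record */
    char   buf[512];   /* hypothetical fixed index record size */
    size_t got;
};

static void *read_next_index(void *arg)
{
    struct prefetch *p = arg;
    fseek(p->idx, p->offset, SEEK_SET);
    p->got = fread(p->buf, 1, sizeof p->buf, p->idx);
    return NULL;
}

/* Inside the time loop, roughly:
 *   pthread_create(&tid, NULL, read_next_index, &next);   start the prefetch
 *   ... fseek/fread the data file for the current index and process it ...
 *   pthread_join(tid, NULL);    the next index record is now in next.buf
 */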
You could further increase concurrency, assuming the data is index-linked
rather than data-linked (i.e. you don't need each set of data before the
next set of data), by implementing a producer/consumer model: issue your
I/Os as fast as you can and queue the buffered data records for processing
(you'd want to rate-limit the number of outstanding I/Os to bound the
amount of data you have in hand at any one time before you ask for more).
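Something like the following minimal sketch (not from the thread; the record type, the queue capacity, and the pthread-based queue are all assumptions) captures that rate-limited producer/consumer queue, where the reader thread blocks once QUEUE_CAP records are outstanding:

#include <pthread.h>

#define QUEUE_CAP 64

struct record { char data[4096]; };   /* hypothetical buffered data record */

static struct record   queue[QUEUE_CAP];
static int             head, tail, count;
static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

void enqueue(const struct record *r)          /* called by the I/O (producer) thread */
{
    pthread_mutex_lock(&lock);
    while (count == QUEUE_CAP)                /* rate-limits outstanding I/O */
        pthread_cond_wait(&not_full, &lock);
    queue[tail] = *r;
    tail = (tail + 1) % QUEUE_CAP;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

void dequeue(struct record *r)                /* called by the processing (consumer) thread */
{
    pthread_mutex_lock(&lock);
    while (count == 0)
        pthread_cond_wait(&not_empty, &lock);
    *r = queue[head];
    head = (head + 1) % QUEUE_CAP;
    count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
}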
If your data comes in more than about 1.5 times faster than you can
process it, and you have a sufficiently large pool of work-item queue
elements, you could establish high and low water marks on the number of
elements in the queue, and only go back for more data when you hit the
low water mark.
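Extending the hypothetical queue sketched above (same lock, queue, count, and condition variables), the water-mark variant might look like this; HIGH_WATER and LOW_WATER are made-up tuning values, the producer stops at the high mark, and the consumer only wakes it once the queue has drained to the low mark:

#define HIGH_WATER 64
#define LOW_WATER  16

void enqueue_wm(const struct record *r)       /* I/O (producer) thread */
{
    pthread_mutex_lock(&lock);
    while (count >= HIGH_WATER)               /* stop issuing I/O at the high mark */
        pthread_cond_wait(&not_full, &lock);
    queue[tail] = *r;
    tail = (tail + 1) % QUEUE_CAP;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

void dequeue_wm(struct record *r)             /* processing (consumer) thread */
{
    pthread_mutex_lock(&lock);
    while (count == 0)
        pthread_cond_wait(&not_empty, &lock);
    *r = queue[head];
    head = (head + 1) % QUEUE_CAP;
    count--;
    if (count == LOW_WATER)                   /* only go back for more data here */
        pthread_cond_broadcast(&not_full);
    pthread_mutex_unlock(&lock);
}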
-- Terry
On Sep 25, 2009, at 1:40 PM, Tomasz Koziara wrote:
David, Terry, Jens - thanks for your answers.
Here are the attached Shark and sampler outputs:
http://people.civil.gla.ac.uk/~koziara/solfec.shark.txt.gz
http://people.civil.gla.ac.uk/~koziara/solfec.trace.tar.gz
Indeed - I do not use MPI I/O. I use XDR-based binary output instead.
It would be desirable for the post-processor not to require MPI, hence
I am sticking to XDR for now. Of course that could change if I cannot
overcome the efficiency issues, but then I would probably go for HDF5.
The post-processing process (single-threaded) simply opens all 3 * N
files and keeps them open. Of those 3 * N files, only N store
substantial data (are large), while the rest store indexing and
labeling data. So basically what happens is:
For each time t
  For each output file f
    Seek (f, t)
    Read (f)
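In C that loop is roughly the following sketch (offset_for() is a hypothetical helper mapping a time step to the record offset in file i; error checking omitted):

#include <stdio.h>

long offset_for(int t, int i);   /* hypothetical: time step -> offset in file i */

void read_step(FILE **files, int nfiles, int t, char *buf, size_t recsize)
{
    for (int i = 0; i < nfiles; i++) {
        fseek(files[i], offset_for(t, i), SEEK_SET);   /* Seek (f, t) */
        fread(buf, 1, recsize, files[i]);              /* Read (f)    */
    }
}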
Tomek
On 25 Sep 2009, at 20:20, Jens Alfke wrote:
Could you post the actual output of running 'sample' on your single-
threaded tool? I think you may be misinterpreting the sampling.
—Jens