Re: High bandwidth disk management techniques

Subject: Re: High bandwidth disk management techniques
From: Herbie Robinson <email@hidden>
Date: Tue, 3 May 2005 04:32:07 -0400

At 1:24 AM -0700 5/2/05, Doug Wyatt wrote:

> Use the Carbon File Manager.

You used to recommend that people use Unix I/O and one thread per file (i.e., to avoid async disk I/O). Obviously, the first part of this has changed. What about async disk I/O vs. threads?

> When deciding how large a buffer per file to maintain, consider drive seek times. A drive with a 10 ms seek time can only read from 100 different files per second.

Under ideal conditions. It can degrade from that if there are bad blocks.
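
To put numbers on the seek-time argument, here is a rough sketch of the arithmetic. It assumes every read costs exactly one seek and ignores transfer time, so it is a best-case bound. The function names and the 144,000 bytes/sec track rate (48 kHz, 24-bit mono) are illustrative assumptions, not figures from this thread.

```python
def max_files_per_second(seek_ms):
    """Upper bound on distinct files serviced per second at one seek per read."""
    return 1000 / seek_ms


def min_buffer_bytes(n_files, seek_ms, bytes_per_sec):
    """Smallest per-file buffer that lasts one round-robin pass over n_files.

    Assumption: each file is revisited every n_files * seek_ms milliseconds
    and transfer time is ignored, so real buffers need headroom on top.
    """
    revisit_ms = n_files * seek_ms
    # Integer ceiling division keeps the result exact.
    return (revisit_ms * bytes_per_sec + 999) // 1000


if __name__ == "__main__":
    print(max_files_per_second(10))           # 10 ms seek time
    print(min_buffer_bytes(64, 10, 144_000))  # 64 tracks, assumed track rate
```

With 64 tracks and a 10 ms seek, each file is revisited every 0.64 sec, so the floor is 92,160 bytes per file before any safety margin for bad blocks or other degradation.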

> I've heard of people reading filesystem data structures to get a big picture of which tracks/sectors will be read, across all the reads during a given time interval, and reordering the reads to minimize seek times. But that's beyond my experience.

The disk drivers might do that for you if you queue up very long reads/writes. You just have to make sure you have enough stuff queued so they get a good picture. In reality, though, you don't want too much optimization, because one file could end up getting starved. There should also be a limit to keep the optimization from getting too expensive in terms of CPU time (the algorithms are not linear).
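
As a sketch of the trade-off between seek optimization and starvation, the following orders pending requests by disk offset but moves any request that has been passed over too often to the front. This is only an illustration of the idea, not what any particular driver does. `Request`, `passes_waited`, and `max_passes_waited` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Request:
    file_id: str
    offset: int             # hypothetical linear block address on the disk
    passes_waited: int = 0  # how many scheduling passes skipped this request


def next_batch(pending, max_passes_waited=3):
    """Return pending requests in service order.

    Requests that have waited too long go first so no file is starved;
    everything else is sorted by offset to keep head movement small.
    """
    starved = [r for r in pending if r.passes_waited >= max_passes_waited]
    fresh = [r for r in pending if r.passes_waited < max_passes_waited]
    starved.sort(key=lambda r: r.offset)
    fresh.sort(key=lambda r: r.offset)
    return starved + fresh


if __name__ == "__main__":
    queue = [Request("a", 900), Request("b", 100), Request("c", 500, 5)]
    print([r.file_id for r in next_batch(queue)])
```

The sort is O(n log n) in the queue length, which is one reason to cap how many requests get considered at once.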

At 11:33 AM +0100 5/2/05, Mark Gilbert wrote:

> So - would there be any benefit in starting ALL the async reads one after the other, then waiting for all the callbacks (one for each read) to complete, and then moving on to my deinterleaving code? Is there potential benefit to having 64 async reads scheduled? Will the file system take advantage of efficiency opportunities?

There should be a huge benefit if you are using more than one drive. Even on a single drive, this would let the driver optimize the I/O if it wants to. The fancy queueing would be more likely to be implemented in SCSI (and maybe FireWire) than in IDE. If nothing else, SCSI hardware will do it for writes without any driver help at all. The ATTO documentation for its latest SCSI drivers claims it does this kind of optimization.
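
The "start everything, then wait for everything" pattern can be sketched as below. Note the substitution: this uses a thread pool with one blocking `os.pread` per file rather than true async reads with completion callbacks, so it only illustrates the shape of the approach. `read_chunk` and `read_all` are hypothetical helper names.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_chunk(path, offset, length):
    """Blocking positional read of one chunk from one file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)


def read_all(paths, offset, length):
    """Queue one read per file, then wait for all of them to finish.

    All requests are outstanding at once, which gives the OS and driver
    the chance to reorder them. Results come back in the order of `paths`.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(paths))) as pool:
        futures = [pool.submit(read_chunk, p, offset, length) for p in paths]
        return [f.result() for f in futures]
```

Deinterleaving would start only after `read_all` returns, which corresponds to waiting for the last callback.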

At 2:31 PM +0100 5/2/05, Ben Dougall wrote:

> what's the smallest possible chunk of data that can be pulled from the hard drive in one read?

SCSI allows you to read any number of bytes starting on a sector boundary. For all practical purposes that means you must read whole sectors. Also, the disk controllers generally like to reference entire pages of physical memory; so, you really always want to read pages. If you don't, the OS reads pages into its own disk buffers and copies what you need from them (this is called de-blocking).
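
A small sketch of the page-rounding this implies: widen an arbitrary byte range so that it starts and ends on page boundaries. `aligned_span` is a hypothetical helper. It only computes the range and does not by itself guarantee that the OS skips its own buffering.

```python
import mmap


def aligned_span(offset, length, page=mmap.PAGESIZE):
    """Widen [offset, offset + length) to whole pages.

    Returns (start, size), where start is rounded down and the end is
    rounded up to a multiple of `page`.
    """
    start = (offset // page) * page
    end = ((offset + length + page - 1) // page) * page
    return start, end - start


if __name__ == "__main__":
    # 100 bytes at offset 5000 with 4096-byte pages -> the page at 4096
    print(aligned_span(5000, 100, 4096))
```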

> is it 4096 bytes?

Not very likely. The most common size is 512 bytes; larger powers of two are rarer. Seagate will allow SCSI drives to be formatted up to about 2052 (or maybe 2056) bytes per sector. The drive manufacturers don't like large sector sizes because they would require more error correction bits and that would make the hardware for computing them more expensive.

> will it always be the same as the memory page size? or does it depend on the drive so therefore is a variable value?

The value is set when the drive is formatted. The actual values allowed vary depending on the drive and what the OS will support. For all practical purposes, most drives are formatted with 512 byte sectors (because PC BIOS implementations only support 512 byte sectors and that forces everything else).

> how can you find out what that value is programmatically if the value is variable?

You can probably read it from the OS as volume info, but there isn't any reason you really want to know. The page size is what is important.
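
For what it's worth, both numbers are easy to query on a Unix-like system. The sketch below reports the memory page size and the block size the filesystem advertises for a path. One caveat: `f_bsize` is the filesystem's preferred I/O block size, which is not necessarily the drive's physical sector size. `io_sizes` is a hypothetical helper name.

```python
import mmap
import os


def io_sizes(path="."):
    """Report the sizes relevant to planning reads on the volume holding `path`."""
    st = os.statvfs(path)
    return {
        "page_size": mmap.PAGESIZE,   # memory page size: the number that matters
        "fs_block_size": st.f_bsize,  # filesystem block size, not the sector size
    }


if __name__ == "__main__":
    print(io_sizes("."))
```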

At 3:46 PM +0200 5/2/05, Wolfgang Schneider wrote:

> does the OS's disk scheduler have even a chance of optimizing disk access patterns? I'm speaking of minimizing the read/write heads' path in order to improve access time. Since modern hard disks show only a virtual disk geometry to the outside world, nobody can know where a particular sector actually is, or how far apart two physical sectors are.

That's not completely true. The OS can find out the geometry if it wants to. Also, the SCSI spec does require that the linear address space be ordered such that reading sequential addresses off the disk is the fastest way to do it -- even if you don't know exactly how many sectors are in a track or how many tracks are in a cylinder. In other words, the OS can still schedule at a slightly higher level and leave the drive to schedule the low level details.

> That's why NCQ was invented, no? The disk itself is the only one who knows about these things, or am I getting this wrong some way?

If you give the disk a 280-sector read or write in a single SCSI command (about 1 second's worth of audio), there is quite a bit of optimization that can be done (without even dealing with multiple command queueing).
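
The "280 sectors is about one second" figure can be checked with a few lines. The audio format is an assumption on my part: 512-byte sectors and one mono track at 48 kHz, 24-bit, which is the combination that makes the numbers line up. Other formats give different durations.

```python
def transfer_bytes(sectors, sector_bytes=512):
    """Size in bytes of a transfer covering the given number of sectors."""
    return sectors * sector_bytes


def audio_seconds(n_bytes, sample_rate=48_000, bytes_per_sample=3, channels=1):
    """Duration of n_bytes of uncompressed PCM audio in the assumed format."""
    return n_bytes / (sample_rate * bytes_per_sample * channels)


if __name__ == "__main__":
    size = transfer_bytes(280)
    print(size, audio_seconds(size))
```

Under those assumptions the transfer is 143,360 bytes, just under one second of audio for one track.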

At 3:52 PM +0200 5/2/05, philippe wicker wrote:

> Do you think that the "hand rescheduling" of sector reads using file system data structures - as mentioned by Doug in his last post - would do a better job? If yes, any pointers on these structure definitions? Wolfgang mentions the NCQ (native command queuing) approach; is this available on Mac OS X?

It strikes me as a compatibility nightmare, but if you are willing to go for it, it might work. It might well be easier to write one's own file system that was optimized for large writes, though. The payoff would be much much better, too.

------

The issue nobody has mentioned here -- and by far the nastiest one -- is allocation.

When do you do it during writes? Do you pre-allocate before starting recording or allocate new disk blocks on the fly? OS 9 pretty much requires pre-allocation. Is OS X better in this regard? I don't actually have much experience doing high track count recording on OS X, yet.

How do you get as much contiguous allocation as possible?
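
For the pre-allocate-before-recording option, here is one possible sketch. It uses `os.posix_fallocate`, which exists on Linux but not on Mac OS X, where the analogous mechanism is `fcntl` with `F_PREALLOCATE` (not shown). When the call is unavailable or refused, it falls back to `os.ftruncate`, which only sets the length and may leave a sparse file. Neither path guarantees contiguous blocks. `preallocate` is a hypothetical helper name.

```python
import os


def preallocate(path, size):
    """Reserve `size` bytes for a recording file before any audio is written.

    Returns the name of the method that succeeded.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            os.posix_fallocate(fd, 0, size)  # asks the FS to back the range
            return "posix_fallocate"
        except (AttributeError, OSError):
            os.ftruncate(fd, size)           # length only; blocks may be lazy
            return "ftruncate"
    finally:
        os.close(fd)
```

Reserving the whole take up front at least gives the filesystem a chance to find one large extent, instead of interleaving the blocks of 64 files as they all grow together.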

--
-*****************************************
**  http://www.curbside-recording.com/  **
******************************************
_______________________________________________
Coreaudio-api mailing list      (email@hidden)

