Re: Long text file manipulation--BIGGER QUESTION
- Subject: Re: Long text file manipulation--BIGGER QUESTION
- From: email@hidden
- Date: Wed, 11 Apr 2001 00:12:10 -0400
On Wed, 4 Apr 2001 15:51:33 -0400, bRaD Weston <email@hidden> asked,
> Perhaps the question I should be asking then is a bit more complex. The script
> that I am creating reads in lines of text from multiple files, interprets the
> data in a variety of ways (part of which determines a CODE). I have no problem
> doing this, but here is where it gets tricky.
>
> This CODE is used to determine the type of PIN number that is to be read in
> from one of nine different PIN files. For example, if the code is "Georgia",
> this means that I have to find a unique PIN from file "Domestic"; if the code
> is "Ontario", this means I have to find a unique PIN from file "Canadian", etc.
> But the PINs of each PIN file can only be used once, and the incoming text
> file is sorted and cannot be resorted by CODE -- which is to say that I am
> reading from nine different PIN number text files regularly.
>
> The PIN files are too large to read into memory because I would need to read
> in 9 different files, and the speed of FileMaker in the back end would slow me
> down too much (and I don't know how FileMaker will react to having several
> million or so records in it at a time). You're probably going to tell me to do
> it FileMaker anyway, but given the simplicity of the data (just PINs), with a
> type I can determine by the file name, etc., I didn't want to bother with a
> database.

Are you reading sequentially through each file just once? That is, are these
PINs like a one-time pad, where you issue a PIN and then cross it off? If so,
you have a much simpler problem. Just keep track of the byte index of the next
PIN, and seek to it using "read myFile from startingPosition", adding a "for"
or "before return" clause so the read stops at the end of that PIN instead of
running on to the end of the file.
If you need to access an arbitrary PIN, it may still be easy. If your PINs are
all the same size (or can be made the same size, for example by padding with
spaces), you can simply seek within the data file and treat it like a large
array. If your PINs are 9-digit numbers separated by carriage returns, each
record occupies 10 bytes, so just

    read myFile from (10*(N-1)+1) for 9

to get PIN number N.
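
For the fixed-size case, the same seek arithmetic can be sketched in Python
(again only as an illustration, not AppleScript). The names RECORD_SIZE,
PIN_SIZE and read_pin are made up for this example, and the demo file is
synthetic. Because Python offsets start at 0, the "+1" in the AppleScript
expression drops out.

```python
import os
import tempfile

RECORD_SIZE = 10   # 9 digits plus one carriage return
PIN_SIZE = 9

def read_pin(path, n):
    """Return PIN number n (1-based) from a file of fixed-size records."""
    with open(path, "rb") as f:
        f.seek(RECORD_SIZE * (n - 1))   # 0-based twin of 10*(N-1)+1
        data = f.read(PIN_SIZE)
    if len(data) < PIN_SIZE:
        raise IndexError(f"no PIN number {n} in {path}")
    return data.decode("ascii")

# Demo with a throwaway file standing in for the "Canadian" PIN file.
demo = os.path.join(tempfile.mkdtemp(), "Canadian")
with open(demo, "wb") as f:
    for i in range(1, 6):
        f.write(b"%09d\r" % i)          # five zero-padded 9-digit PINs

third = read_pin(demo, 3)
```

Only 9 bytes are ever read per lookup, so the size of the PIN file does not
matter.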
If your PINs are different sizes (within a file), you could preprocess a PIN
file and create an index file that says, "PINs starting with 5 begin at byte
18354" (if you are searching for a PIN by number), or "PIN number 10000 begins
at byte 52355" (if you are fetching a PIN by position).
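
A rough sketch of the by-position index, in Python for illustration only. The
function names build_index and read_indexed_pin, the chunk size, and the demo
data are all assumptions. The preprocessing pass streams through the file in
chunks, so the PIN file itself never has to fit in memory.

```python
import os
import tempfile

def build_index(path, chunk_size=65536):
    """Return a list whose entry k is the byte offset of PIN number k+1."""
    offsets = []
    pos = 0                    # absolute offset of the byte being examined
    at_record_start = True
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for byte in chunk:
                if byte == 0x0D:            # carriage return ends a record
                    at_record_start = True
                elif at_record_start:
                    offsets.append(pos)     # first byte of a new PIN
                    at_record_start = False
                pos += 1
    return offsets

def read_indexed_pin(path, offsets, n):
    """Return PIN number n (1-based) using a previously built index."""
    with open(path, "rb") as f:
        f.seek(offsets[n - 1])
        chunk = f.read(64)                  # assumed longer than any PIN
    return chunk.split(b"\r", 1)[0].decode("ascii")

# Demo with variable-length PINs in a throwaway file.
demo = os.path.join(tempfile.mkdtemp(), "pins")
with open(demo, "wb") as f:
    f.write(b"12\r34567\r890\r")

index = build_index(demo)
second = read_indexed_pin(demo, index, 2)
```

For several million PINs the offsets would more likely be written out to the
index file as fixed-size entries, so that the index can itself be read with a
seek rather than held in memory.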
But if you are doing more complicated things, you might as well use a database.
You can either use a database application, or write a database application. The
professionally-written, C-coded database will be faster than your one-off,
AppleScripted database.
--
Scott Norton Phone: +1-703-299-1656
DTI Associates, Inc. Fax: +1-703-706-0476
2920 South Glebe Road Internet: email@hidden
Arlington, VA 22206-2768 or email@hidden