Re: Long text file manipulation--BIGGER QUESTION
- Subject: Re: Long text file manipulation--BIGGER QUESTION
- From: bRaD Weston <email@hidden>
- Date: Thu, 5 Apr 2001 12:10:33 -0400
On Wednesday, April 4, 2001, at 05:54 AM, bRaD Weston wrote:
> I'm working on a project in which I need to manipulate large text files, one line at a time. I've been looking for some sort of OSAX that can do this because I cannot afford the memory and time requirements to read in lots of big (2.5 MB) text files and then save them out, but have been unsuccessful.
>
> Essentially, I just want to read the first line of the file and delete it from the file and add another line to the bottom of the file.
While adding something to the end of a file is fast and easy (just say 'write "stuff" to file "blah" starting at eof'), deleting something from the beginning or middle is an inherently expensive operation: someone has to read everything in the file after the deletion point and write it back out again. Given the tools currently available in AppleScript, "someone" means your script. You don't have to read the entire file at once, so you can avoid a 2.5 MB memory hit, but you're pretty much stuck as far as time goes.
Something like this will work fairly efficiently using only standard additions:

set source to open for access file "foo"
set destination to open for access file "bar" with write permission
set eof destination to 0 -- empty out the destination file
read source until return -- discard first line
try
	repeat
		read source for 16384 -- read 16K worth of what remains
		write the result to destination
	end repeat
end try
close access source
write "trailer" to destination
close access destination

You can twiddle the constant 16384 to alter the time vs. maximum memory profile. (Higher numbers should be faster, but will consume more memory.) If you want to mangle the file in place instead of generating an altered copy, the logic is significantly more difficult.
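For what it's worth, a rough sketch of that in-place approach might look like the following (untested; "theFile" and the 16K chunk size are placeholders). It copies each chunk backward over the deleted first line, then truncates the now-duplicated tail:

```applescript
set f to open for access file "theFile" with write permission
set firstLine to read f from 1 until return -- includes the trailing return
set readPos to (length of firstLine) + 1 -- first byte after the deleted line
set writePos to 1
try
	repeat
		set chunk to read f from readPos for 16384
		write chunk to f starting at writePos
		set readPos to readPos + (length of chunk)
		set writePos to writePos + (length of chunk)
	end repeat
end try
set eof f to writePos - 1 -- chop off the leftover tail
write "trailer" to f starting at eof -- append the new last line
close access f
```

Note that if anything goes wrong partway through, the file is left half-shifted, which is why the copy-to-a-second-file version is the safer default.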
Perhaps the question I should be asking then is a bit more complex. The script that I am creating reads in lines of text from multiple files, interprets the data in a variety of ways (part of which determines a CODE). I have no problem doing this, but here is where it gets tricky.
This CODE is used to determine the type of PIN that is to be read in from one of nine different PIN files. For example, if the code is "Georgia", this means that I have to find a unique PIN from file "Domestic"; if the code is "Ontario", this means I have to find a unique PIN from file "Canadian", etc. But each PIN in a PIN file can only be used once, and the incoming text file is sorted and cannot be re-sorted by CODE -- which is to say that I am reading from all nine PIN files regularly.
The PIN files are too large to read into memory because I would need to read in 9 different files, and the speed of FileMaker in the back end would slow me down too much (and I don't know how FileMaker will react to having several million or so records in it at a time). You're probably going to tell me to do it in FileMaker anyway, but given the simplicity of the data (just PINs, with a type I can determine by the file name, etc.), I didn't want to bother with a database.
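To make the intent concrete, the per-CODE lookup could reuse the read-and-copy idiom from the snippet earlier in this thread. This is only a sketch: the handler name and the "PIN temp" scratch file are made up for illustration, the swap-back step is left out, and it assumes one PIN per line:

```applescript
-- Hypothetical handler: consume the first PIN from the named PIN file.
on nextPIN(pinFileName)
	set source to open for access file pinFileName
	set thePIN to read source until return -- includes the trailing return
	set destination to open for access file "PIN temp" with write permission
	set eof destination to 0
	try
		repeat
			write (read source for 16384) to destination
		end repeat
	end try
	close access source
	close access destination
	-- ...swap "PIN temp" back over the original PIN file here...
	return text 1 thru -2 of thePIN -- strip the trailing return
end nextPIN

-- e.g. when the CODE works out to "Georgia":
set thePIN to nextPIN("Domestic")
```

A different approach that avoids rewriting the files entirely: keep a stored byte offset for each PIN file, do 'read pinFile from theOffset until return', and advance the saved offset after each use. Used PINs then just stay in place above the offset, at the cost of a little bookkeeping.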
--
bRaD
----
bRaD Weston
Director of Research and Development
Halo ISDG Inc.
85-C Mill Street
Roswell, GA 30075
(770) 643-2301
FAX: (770) 649-4734
email@hidden
http://www.haloisdg.com