Reading large texts
- Subject: Reading large texts
- From: Paul Berkowitz <email@hidden>
- Date: Wed, 03 Apr 2002 00:03:50 -0800
On 4/2/02 10:37 PM, "email@hidden" <email@hidden> wrote:
> I was under the impression that there was a size limit to a string variable
> (I'm reading a lot of data)... If there isn't, your suggestion would
> definitely be helpful. Any additional clarification on string variable size
> limitations?
>
> Paul wrote:
>
>> Hmm. I usually read the whole text in at once, then use AppleScript's text
>> item delimiters {return} and {tab} as needed to get the separate items. That
>> should be much faster than reading a line at a time since tids don't require
>> a scripting addition (read) call.
I haven't found any problem with that. I do recall scripts here reading text
in 'chunks' of 30 K to avoid some sort of problem along those lines: I'm
sure someone will write in explaining why. Perhaps it was for an older
version of AS (I'm using 1.8.2b3).
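For reference, a chunked read of that sort might look roughly like the sketch below. It is only an illustration, not taken from any of those scripts: the 30000-byte chunk size, the variable names and the use of 'choose file' to pick the file are all assumptions, and it assumes single-byte text, so that a chunk boundary cannot fall in the middle of a character.

```applescript
-- Sketch only: read a file in ~30 K chunks and join them into one string.
-- 'someFile', 'chunkSize' and the other names are illustrative assumptions.
set someFile to choose file
set chunkSize to 30000
set f to open for access someFile
try
	set fileSize to get eof f
	set r to ""
	set pos to 1
	repeat while pos <= fileSize
		-- last byte of this chunk, clamped to the end of the file
		set lastByte to pos + chunkSize - 1
		if lastByte > fileSize then set lastByte to fileSize
		set r to r & (read f from pos to lastByte)
		set pos to lastByte + 1
	end repeat
	close access f
on error errMsg number errNum
	-- make sure the file is closed before passing the error on
	close access f
	error errMsg number errNum
end try
r -- the whole text
```

The result is the same string a single 'read f' would give; whether chunking is needed at all probably depends on the AppleScript version.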
One thing to bear in mind, however, if you are ever tempted to use tab alone
as the delimiter without first splitting the text into lines (return
delimiter): lists formed by 'set ls to text items of largeText' have a
limit of about 4600 items, beyond which you get a stack overflow error.
Here's a snippet that works around that by building the list in groups of
4000 items:
set f to open for access someFile
set r to read f
close access f
set AppleScript's text item delimiters to {tab}
try
    set tabItems to text items of r
on error -- more than ~4600, too many for one list
    set s to {}
    set a to 1
    set z to 4000
    set done to false
    repeat until done
        try
            set AppleScript's text item delimiters to {tab}
            set tabItems to text items a thru z of r
        on error -- last group, fewer than 4000 left
            try
                set tabItems to text items a thru -1 of r
            on error -- nothing left: the count was an exact multiple of 4000
                set tabItems to {}
            end try
            set done to true
        end try
        set s to s & tabItems -- append this group to the result
        set {a, z} to {a + 4000, z + 4000}
    end repeat
    set tabItems to s
end try
set AppleScript's text item delimiters to {""}
tabItems -- the list
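By contrast, going through the lines first keeps each tab-split list small, since it only ever holds the fields of one line. A minimal sketch, assuming made-up sample text in place of the file contents (the names 'rows' and 'i' are illustrative); whether 'count paragraphs' on a very long text runs into the same limit in older AppleScript versions is untested here:

```applescript
-- Sketch only: split into lines first, then split each line on tabs.
-- The sample text stands in for the text read from the file.
set r to "a" & tab & "b" & return & "c" & tab & "d"
set rows to {}
set AppleScript's text item delimiters to {tab}
repeat with i from 1 to (count paragraphs of r)
	-- each line yields its own short list of fields
	set end of rows to text items of (paragraph i of r)
end repeat
set AppleScript's text item delimiters to {""}
rows -- a list of lists, one per line
```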
Another thing to bear in mind is that if you are getting the lines of r
first, it's quicker to operate on
paragraph i of r
in a repeat loop than on 'item i' of a list based on {return} as tid -
UNLESS you use the prefix 'my' before the list in a top-level script, or use
a script object in a handler along the lines spelled out by Arthur Knapp in
another thread a few hours ago. Those are even faster than 'paragraph', as I
found when I tested a few days ago (see MacScrpt list last week).
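The two faster variants can be sketched as follows. This is an illustration of the general idiom, not Arthur Knapp's actual code: the handler name 'totalLength', the script object name 'wrapper' and the sample text are all assumptions, and 'paragraphs of' is used to build the list for brevity.

```applescript
-- Variant 1 (sketch): a script object inside a handler, so the list is
-- reached through a reference (wrapper's ls) rather than a local variable.
on totalLength(theText)
	script wrapper
		property ls : {}
	end script
	set wrapper's ls to paragraphs of theText
	set total to 0
	repeat with i from 1 to (count wrapper's ls)
		set total to total + (count (item i of wrapper's ls))
	end repeat
	return total
end totalLength

-- Variant 2 (sketch): 'my' before the list in a top-level script.
set r to "alpha" & return & "beta" & return & "gamma" -- made-up sample text
set ls to paragraphs of r
set total to 0
repeat with i from 1 to (count my ls)
	set total to total + (count (item i of my ls))
end repeat
{total, totalLength(r)} -- both variants sum the line lengths
```

Both loops do the same work; the only difference from a plain 'item i of ls' is how the list is addressed, which is where the speed-up is said to come from.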
--
Paul Berkowitz