Re: How to parse a textfile ?
- Subject: Re: How to parse a textfile ?
- From: Joseph Weaks <email@hidden>
- Date: Wed, 29 Sep 2004 00:35:40 -0500
OK, would anyone like to take a stab at the format of a data file I
need to parse? I've changed the subject matter for this example to
make the structure easier to understand. Here is a sample excerpt
from a data file:
      California = 3
      Rhode Island  Really an island?
      Texas  Friendship state? = 8
            Dallas
                  Grassy noll
                  State Fair
            Houston
            Austin = 3
                  Memorial stadium
                  Barton Springs
            San Antonio = 2
There are potentially 5 different values to be parsed:
theState, which is preceded by 6 spaces. EVERY record has a state. If no
city or cities are listed for a state, the state becomes the "Place" as
well.
theComment, an optional comment, preceded by 2 spaces (reliably the
only double space in the paragraph) and only associated with states.
thePlace, to be used as the "term" or "key" for each record. When
available, it is the city, which is on a line preceded by 12 spaces.
Every city is a subheading of a "state paragraph", which becomes
part of its record.
theSites, an optional listing of one, two, or even three paragraphs of
sites associated with a place/city, each preceded by 18 spaces. Sites
paragraphs only follow cities, not state paragraphs.
theCount, an optional count that every state and city may have. It is
always the last word of a paragraph, preceded by " = ". Note that counts
associated with states are ignored if there is a city division.
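To make the rules above concrete, here is a minimal sketch of how a single paragraph could be split into its parts. It is written in Python purely to illustrate the logic, not as the AppleScript solution being asked for. The function name split_line and the LEVELS table are hypothetical, and the sketch assumes the indentation is made of plain spaces (no tabs).

```python
import re

# Assumed mapping from leading-space count to paragraph kind, taken from
# the description above: 6 = state, 12 = city, 18 = sites.
LEVELS = {6: "state", 12: "city", 18: "sites"}


def split_line(line):
    """Return (kind, name, comment, count) for one paragraph of the file.

    kind is None for anything that is not a state, city, or sites line
    (the header lines and the possible empty paragraph at the end).
    """
    text = line.rstrip("\r\n")
    stripped = text.lstrip(" ")
    indent = len(text) - len(stripped)
    kind = LEVELS.get(indent)
    if kind is None or not stripped.strip():
        return (None, "", "", "")

    count = ""
    if kind != "sites":
        # The count, when present, is the last word, preceded by " = ".
        match = re.search(r" = (\S+)\s*$", stripped)
        if match:
            count = match.group(1)
            stripped = stripped[:match.start()]

    comment = ""
    if kind == "state" and "  " in stripped:
        # Only state lines carry a comment, set off by a double space.
        stripped, comment = stripped.split("  ", 1)

    return (kind, stripped.strip(), comment.strip(), count)
```

For example, split_line("      Texas  Friendship state? = 8") returns ("state", "Texas", "Friendship state?", "8"). Whether the state's count is kept or ignored is decided later, once it is known whether any city lines follow.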
The file contains a few lines at the beginning that are to be ignored.
They don't start with the 6 spaces, but every line after that does, with
the exception of a possible empty paragraph at the end.
I've been placing the 5 values in associated lists, so that I can
reference them as item theIndex of placeList, item theIndex of
stateList, etc. The above file would look like this, parsed out:
placeList
-- {"California","Rhode Island","Dallas","Houston","Austin","San Antonio"}
stateList
-- {"California","Rhode Island","Texas","Texas","Texas","Texas"}
commentList
-- {"","Really an island?","Friendship state?","Friendship state?","Friendship state?","Friendship state?"}
theSites
-- {"","","Grassy noll, State Fair","","Memorial stadium, Barton Springs",""}
theCount
-- {"3","","","","3","2"}
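One way to build these five parallel lists in a single pass is sketched below. It is in Python only to show the control flow; it is not AppleScript and not necessarily how the final script should look. The name parse_places and the sample header line are made up for illustration, and the sketch assumes indentation of exactly 6, 12, or 18 plain spaces. The idea is to add a record for every state as soon as it is seen, then replace that state-only record when the state's first city turns up, which also discards the state's own count.

```python
import re


def parse_places(text):
    """Parse the data file text into five parallel lists."""
    places, states, comments, sites, counts = [], [], [], [], []
    state = comment = ""
    placeholder = False  # True while the last record is a state with no city yet

    for line in text.splitlines():
        body = line.lstrip(" ")
        indent = len(line) - len(body)
        body = body.rstrip()
        if indent not in (6, 12, 18) or not body:
            continue  # header lines and the possible empty last paragraph

        if indent == 18:
            # Sites only follow cities; join several paragraphs with ", ".
            if places and not placeholder:
                sites[-1] = f"{sites[-1]}, {body}" if sites[-1] else body
            continue

        # Optional count: the last word, preceded by " = ".
        count = ""
        match = re.search(r" = (\S+)$", body)
        if match:
            count, body = match.group(1), body[:match.start()].rstrip()

        if indent == 6:
            # State line: an optional comment follows a double space.
            name, _, note = body.partition("  ")
            state, comment = name.strip(), note.strip()
            place, is_state = state, True
        else:
            place, is_state = body, False
            if placeholder:
                # First city of this state: drop the state-only record,
                # which also throws away the state's own count.
                for lst in (places, states, comments, sites, counts):
                    lst.pop()

        places.append(place)
        states.append(state)
        comments.append(comment)
        sites.append("")
        counts.append(count)
        placeholder = is_state

    return places, states, comments, sites, counts


# The sample excerpt, with the indentation spelled out explicitly.
SAMPLE = "\n".join([
    "A header line to be ignored",
    " " * 6 + "California = 3",
    " " * 6 + "Rhode Island  Really an island?",
    " " * 6 + "Texas  Friendship state? = 8",
    " " * 12 + "Dallas",
    " " * 18 + "Grassy noll",
    " " * 18 + "State Fair",
    " " * 12 + "Houston",
    " " * 12 + "Austin = 3",
    " " * 18 + "Memorial stadium",
    " " * 18 + "Barton Springs",
    " " * 12 + "San Antonio = 2",
    "",
])

place_list, state_list, comment_list, sites_list, count_list = parse_places(SAMPLE)
```

Because each line is looked at once and only the end of each list is touched, the work grows linearly with the number of paragraphs. The same single-pass structure should carry over to an AppleScript repeat loop, though the speed there will depend on how the lists are accessed.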
So, for instance, setting theIndex to 6, the resulting record might be:
{thePlace: "San Antonio", theState: "Texas", theComment: "Friendship
state?", theSites: "", theCount: 2}
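As an illustration of how such a record falls out of the parallel lists, here is a small Python sketch (again, only to show the idea, not AppleScript). The function name record_at is hypothetical. It takes a 1-based index to mirror "item theIndex of placeList", and it assumes a count should become a number when present and stay empty (None here) otherwise.

```python
place_list = ["California", "Rhode Island", "Dallas", "Houston", "Austin", "San Antonio"]
state_list = ["California", "Rhode Island", "Texas", "Texas", "Texas", "Texas"]
comment_list = ["", "Really an island?", "Friendship state?",
                "Friendship state?", "Friendship state?", "Friendship state?"]
sites_list = ["", "", "Grassy noll, State Fair", "",
              "Memorial stadium, Barton Springs", ""]
count_list = ["3", "", "", "", "3", "2"]


def record_at(the_index):
    """Build one record from the parallel lists (the_index is 1-based)."""
    i = the_index - 1  # Python lists are 0-based, AppleScript items are 1-based
    count = count_list[i]
    return {
        "thePlace": place_list[i],
        "theState": state_list[i],
        "theComment": comment_list[i],
        "theSites": sites_list[i],
        # Assumption: turn a present count into a number, keep None if absent.
        "theCount": int(count) if count else None,
    }


san_antonio = record_at(6)
```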
Of course, I'm dealing with data files of 500 to 1000 entries, with as
many as 3 paragraphs per entry! My vanilla AppleScript repeat routine
takes a LONG time.
Joe Weaks