Re: Importing flat files into FMP
- Subject: Re: Importing flat files into FMP
- From: has <email@hidden>
- Date: Tue, 26 Nov 2002 18:08:59 +0000
Eric Schult wrote:
> I'm trying to import flat text files into FileMaker Pro, and so far it's
> VERY slow going.
[...]
> Still, I'm processing
> only a few hundred records per hour, and at that rate, this project is going
> to take months.
Your script is large and complex, with tons of osax calls. Extracting data
from the flat files could be done in a few lines using TIDs (AppleScript's
text item delimiters) and/or regular expressions. This would greatly simplify
your code, with a corresponding improvement in speed.
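To show what TIDs actually do, here's a minimal sketch of the basic split/join pattern (splitText and joinList are hypothetical names, not from your script or mine). TIDs are global to the script, so the handlers save and restore them; a leftover delimiter setting is a classic source of odd bugs elsewhere in a script.
======================================================================
-- hypothetical helpers: split a string on a delimiter, join a list with one
on splitText(theText, theDelim)
	set oldTIDs to AppleScript's text item delimiters -- save caller's TIDs
	set AppleScript's text item delimiters to theDelim
	set theItems to text items of theText
	set AppleScript's text item delimiters to oldTIDs -- restore them
	return theItems
end splitText

on joinList(theList, theDelim)
	set oldTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to theDelim
	set theText to theList as string -- list items are joined using the TIDs
	set AppleScript's text item delimiters to oldTIDs
	return theText
end joinList
-------
--TEST
splitText("AUTHOR: WIRE", ": ") --> {"AUTHOR", "WIRE"}
joinList({"JT", "SP", "WIRE"}, tab) --> "JT	SP	WIRE" (tab-separated)
======================================================================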
> (And I can't even be sure how well FMP Unlimited will
> perform with a database of this size. I'll have about two dozen users
> accessing it, fairly infrequently.)
Not a DB expert, so couldn't tell you myself. But you should find out
_before_ proceeding further, in case you have to switch to a heavier-duty
solution.
> Can anybody guide me about speeding up this process, and tell me whether I'm
> kidding myself about FMP handling this?
Based on the flat file sample you gave [1], and assuming the format is
consistent, it's possible to extract the info using a single regular
expression (this part needs the Satimage OSAX) plus a spot of TIDs work. (It
could all be done with TIDs, but the regexp is more convenient.)
======================================================================
property headerPattern : re_compile "DATE: (.*)
PUBLICATION: (.*)
CATEGORY: (.*)
AUTHOR: (.*)
LOCATION: (.*)
*"
property mainTextLabel : "
FULL TEXT:
"
on extractFieldsList(staufferStr)
	set oldTIDs to AppleScript's text item delimiters -- save caller's TIDs
	set AppleScript's text item delimiters to mainTextLabel
	set {headerText, mainText} to {text item 1, text (text item 2) thru -1} of staufferStr
	set AppleScript's text item delimiters to oldTIDs -- restore them
	-- regexp turns the five header lines into five return-separated values
	set headerItems to paragraphs of (change headerPattern into "\\1\\r\\2\\r\\3\\r\\4\\r\\5" in headerText with regexp)
	return headerItems & mainText --> {date, pub, cat, author, loc, fulltext}
end extractFieldsList
-------
--TEST
set str to "DATE: Thu 01-Oct-2002
PUBLICATION: JT
CATEGORY: SP
AUTHOR: WIRE
LOCATION: D3
FULL TEXT:
Giant step
By PAUL NEWBERRY
Associated Press
ATLANTA Barry Bonds didn't have to come up big for the San
Francisco Giants to get a jump on the Atlanta Braves.
The rest ..."
extractFieldsList(str)
--> {"Thu 01-Oct-2002", "JT", "SP", "WIRE", "D3", " Giant step ... "}
======================================================================
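For anyone without the Satimage OSAX, here's a sketch of the same extraction using TIDs only, as mentioned above. It assumes the header layout is exactly as in your sample (five "LABEL: value" lines in the order shown, then a line reading FULL TEXT:) and that lines end in a single return or linefeed character; extractFieldsListTIDs is a hypothetical name. It's untested against your real files, so treat it as a starting point.
======================================================================
-- hypothetical TIDs-only version of extractFieldsList (no osax needed)
-- ASSUMES: five "LABEL: value" header lines in fixed order, then "FULL TEXT:",
-- with single-character line endings
on extractFieldsListTIDs(staufferStr)
	set oldTIDs to AppleScript's text item delimiters
	try
		-- split header from body at the FULL TEXT label
		set AppleScript's text item delimiters to "FULL TEXT:"
		set headerText to text item 1 of staufferStr
		set restText to text from text item 2 to -1 of staufferStr
		-- drop the line break that follows the label
		if (count restText) > 1 then
			set mainText to text 2 thru -1 of restText
		else
			set mainText to ""
		end if
		-- keep everything after the first ": " on each header line
		set AppleScript's text item delimiters to ": "
		set fieldValues to {}
		repeat with i from 1 to 5
			set aLine to paragraph i of headerText
			set end of fieldValues to text from text item 2 to -1 of aLine
		end repeat
		set AppleScript's text item delimiters to oldTIDs
		return fieldValues & mainText --> {date, pub, cat, author, loc, fulltext}
	on error errMsg number errNum
		set AppleScript's text item delimiters to oldTIDs -- restore TIDs even on failure
		error errMsg number errNum
	end try
end extractFieldsListTIDs
======================================================================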
Alternatively, you could mash your Stauffer-exported files into one big
tab-delimited file for subsequent import into FMP.
======================================================================
(* can't recall what FMP replaces tabs and returns with in tab-delim
tables, so have used space and vtab here; modify as necessary *)
property newTabChar : space
property newReturnChar : ASCII character 11 -- vertical tab
property staufferPattern : re_compile "DATE: (.*)
PUBLICATION: (.*)
CATEGORY: (.*)
AUTHOR: (.*)
LOCATION: (.*)
*FULL TEXT:
*(.*)"
--
on filesInFolderAsAliases(theFolder)
	tell application "Finder"
		try -- kludge
			return files of theFolder as alias list
		on error -- 'alias list' breaks if only one file
			return {(files of theFolder) as alias}
		end try
	end tell
end filesInFolderAsAliases
--
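on convertStaufferStringToTabDelim(str)
	-- neutralise any tabs already in the text so they don't create extra columns
	set str to change tab into newTabChar in str
	-- header fields and body become one tab-separated line
	set str to change staufferPattern into "\\1\\t\\2\\t\\3\\t\\4\\t\\5\\t\\6" in str with regexp
	-- remaining returns (in the body text) become in-field line breaks
	set str to change return into newReturnChar in str
	return str
end convertStaufferStringToTabDelim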
--
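on processStaufferFolder(inputFolder, outputFile)
	script speedKludge -- list held in a script object property for faster item access
		property lst : filesInFolderAsAliases(inputFolder)
	end script
	open for access outputFile with write permission returning outputFileRef
	try
		set eof outputFileRef to 0 -- discard any old contents of the output file
		repeat with fileRef in speedKludge's lst
			set staufferStr to read (fileRef's contents)
			set tabStr to convertStaufferStringToTabDelim(staufferStr)
			write (tabStr & return) to outputFileRef
		end repeat
	on error errMsg number errNum
		close access outputFileRef -- don't leave the file open after a failure
		error errMsg number errNum
	end try
	close access outputFileRef
	return
end processStaufferFolder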
-------
--TEST
set inputFolder to alias "frank:staufferTest:"
set outputFile to "frank:staufferOut"
processStaufferFolder(inputFolder, outputFile)
======================================================================
Since you have such a huge number of files to get through, you may be
better off writing this script in Perl instead. Perl is much better suited to
heavy text processing than AS, and should be significantly faster. Someone
like JD may be able to help you with that.
has
[1] Your sample flat file seems to lack some fields mentioned in your code.
Since the code shown is likely to need modifying as a result, I'd recommend
showing a sample file containing _all_ relevant field labels next time. And
if some fields are optional, appearing in some flat files but not others,
say which ones.
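If some fields do turn out to be optional, one way to cope is to look each one up by its label instead of relying on its position. A minimal sketch, assuming one "LABEL: value" pair per header line; getHeaderField is a hypothetical name, and a missing label or empty value simply gives "".
======================================================================
-- hypothetical helper: value of a "LABEL: value" header line, or "" if absent
-- ASSUMES one label per line, label followed by ": "
on getHeaderField(headerText, fieldLabel)
	set thePrefix to fieldLabel & ": "
	repeat with i from 1 to count paragraphs of headerText
		set aLine to paragraph i of headerText
		if aLine starts with thePrefix then
			if (count aLine) = (count thePrefix) then return "" -- label present, value empty
			return text ((count thePrefix) + 1) thru -1 of aLine
		end if
	end repeat
	return "" -- label not found
end getHeaderField
======================================================================
Each record could then be built as {getHeaderField(h, "DATE"), getHeaderField(h, "PUBLICATION"), ...}, so a missing field just leaves an empty column instead of shifting the others.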
--
http://www.barple.pwp.blueyonder.co.uk -- The Little Page of AppleScripts