Re: Importing/parsing CSV files
- Subject: Re: Importing/parsing CSV files
- From: T&B <email@hidden>
- Date: Wed, 13 Sep 2006 00:30:18 +1000
Following up my mention of my first script:
I also wrote another script a few months back that handles the full CSV spec (including commas and linefeeds within quotes), but it steps through the text reading each character, so it is slow as molasses.
I tested a CSV file containing 324 records, 37 fields, with some quoted values containing linefeeds etc. My old script (character by character) took 10 seconds (on a 2GHz dual Intel) to convert it to a list of lists.
It turns out that this original script isn't as slow as I'd thought, compared to the alternatives. It actually processes my test CSV text in about 5 seconds, which is only about twice as long as my much more complicated newer script takes (that one is too long to post here).
This script steps through the text character by character. It builds up queueText one character at a time until it reaches a comma delimiter, then flushes queueText as a new field value onto the end of queueRow. When it reaches a return and/or linefeed, it also flushes queueRow as a new row in queueTable. It uses the inQuotes boolean to track whether the current character is within quotes. While inQuotes is true, it treats a comma or newline as more text to add to queueText, and "" as a literal ".
I welcome any bug reports or tweaks that speed it up. I've tried optimizing the if/then tests, etc., but that made no difference to speed and only reduced readability.
You're welcome to use the script; just please keep my URL comment in it.
Thanks,
Tom
T&B
property quot : "\"" -- Because earlier Mac OS doesn't know quote constant
property linefeed : ASCII character 10

on CsvToList(csvText)
    -- 2006.09.12 T&B http://www.tandb.com.au/applescript/
    set queueText to ""
    set queueRow to {}
    set queueTable to {}
    set inQuotes to false
    set previousChar to ""
    set csvTextLength to length of csvText
    repeat with charN from 1 to csvTextLength
        set thisChar to character charN in csvText
        if thisChar is quot then
            if not inQuotes and previousChar is quot then
                -- double quote within quotes so actually use a quote
                set queueText to queueText & thisChar
            end if
            set inQuotes to not inQuotes
        else if inQuotes then
            -- commas and line breaks inside quotes are just more text
            set queueText to queueText & thisChar
        else if thisChar is comma then
            -- end of field
            set end in queueRow to queueText
            set queueText to ""
        else if (thisChar is return or thisChar is linefeed) and not inQuotes then
            if previousChar is return and thisChar is linefeed then
                -- do nothing since new record already created
            else
                -- end of record
                set end in queueRow to queueText
                set queueText to ""
                set end in queueTable to {} & queueRow
                set queueRow to {}
            end if
        else
            set queueText to queueText & thisChar
        end if
        set previousChar to thisChar
    end repeat
    if csvTextLength > 0 and previousChar is not return and previousChar is not linefeed then
        -- text did not end with a line break (it may end with a closing
        -- quote or a comma), so flush the last field and row
        set end in queueRow to queueText
        set end in queueTable to {} & queueRow
    end if
    return queueTable
end CsvToList
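
A minimal usage sketch, assuming the CsvToList handler and the two properties above are pasted into the same script. The sample text and the variable names sampleCsv and resultTable are made up for illustration. The second record contains a comma and doubled quotes inside a quoted field.

```applescript
-- Hypothetical sample data: two records, each ending with a return
set sampleCsv to "name,comment" & return & "Tom,\"said \"\"hi\"\", then left\"" & return

set resultTable to CsvToList(sampleCsv)
-- resultTable is {{"name", "comment"}, {"Tom", "said \"hi\", then left"}}

-- Each row is a list of field values, so individual fields can be picked out:
set secondComment to item 2 of item 2 of resultTable
-- secondComment is "said \"hi\", then left"
```

The quoted comma stays inside the field instead of splitting it, and each "" pair collapses to a single quote character.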