Re: Importing/parsing CSV files
- Subject: Re: Importing/parsing CSV files
- From: T&B <email@hidden>
- Date: Tue, 12 Sep 2006 14:42:56 +1000
Following up my last post:
It's the allowance for quotes, and commas and newlines (linefeeds) within those quotes that makes the parsing difficult.
I also wrote another script a few months back that handles the full CSV spec (including commas and linefeeds within quotes), but it steps through the text reading each character, so it is slow as molasses.
I tested a CSV file containing 324 records, 37 fields, with some quoted values containing linefeeds etc. My old script (character by character) took 10 seconds (on a 2GHz dual Intel) to convert it to a list of lists.
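For illustration, here is a minimal sketch of what a character-by-character parser of this kind might look like. It is written in Python rather than AppleScript, it is not the script referred to above, and the name parse_csv_chars is hypothetical. It assumes well-formed input in which a doubled quote inside a quoted value stands for one literal quote.

```python
def parse_csv_chars(text):
    """Parse CSV text one character at a time into a list of rows."""
    rows, row, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')   # doubled quote means one literal quote
                    i += 1
                else:
                    in_quotes = False   # closing quote
            else:
                field.append(ch)        # commas and line breaks kept verbatim
        elif ch == '"':
            in_quotes = True            # opening quote
        elif ch == ',':
            row.append(''.join(field))  # unquoted comma ends the field
            field = []
        elif ch in '\r\n':
            if ch == '\r' and i + 1 < n and text[i + 1] == '\n':
                i += 1                  # treat CRLF as a single line break
            row.append(''.join(field))  # unquoted line break ends the record
            field = []
            rows.append(row)
            row = []
        else:
            field.append(ch)
        i += 1
    if field or row:                    # last record without a trailing line break
        row.append(''.join(field))
        rows.append(row)
    return rows
```

Every character passes through the loop individually, which is the likely reason this style of parser is slow in a scripting language.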
It seems to me that the power of AppleScript's text item delimiters is more than up to the task. Hasn't anyone created a CSV script? How have the rest of you dealt with the need to import CSV files? It seems to be a very common requirement.
In the absence of anyone else replying with a solution (though thanks for the many comments), I wrote my own new script that uses AppleScript's text item delimiters to do the work. Basically, it replaces all the linefeeds and commas that are not in quotes with temporary alternatives, then parses the converted text into a list of lists. I thought it was a neat and efficient solution, but the new script took 197 seconds on the same test file, much longer than I'd hoped. With some performance tweaking, I got this figure down to 93 seconds, but that's still roughly 9 times longer than my old character-by-character script. Most of the time seems to be taken getting and setting items in a large list, though the actual parsing of text is quite fast.
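The substitution idea can be sketched as follows. This is a Python illustration, not the AppleScript script described above: str.split stands in for text item delimiters, and the placeholder characters and the name parse_csv_placeholders are assumptions made for the example. It relies on the observation that after splitting on the quote character, odd-numbered chunks lie inside quotes, and an empty even-numbered chunk in the middle marks a doubled quote.

```python
FIELD_SEP = "\x1f"   # hypothetical temporary stand-in for an unquoted comma
RECORD_SEP = "\x1e"  # hypothetical temporary stand-in for an unquoted line break

def parse_csv_placeholders(text):
    """Swap unquoted commas and line breaks for placeholders, then split."""
    if not text:
        return []
    # Normalise CR and CRLF to LF (this also affects line breaks inside quotes).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    chunks = text.split('"')
    last = len(chunks) - 1
    out = []
    for i, chunk in enumerate(chunks):
        if i % 2 == 1:
            out.append(chunk)            # inside quotes: keep verbatim
        elif chunk == "" and 0 < i < last:
            out.append('"')              # closing quote followed at once by an
                                         # opening quote: a doubled (literal) quote
        else:
            out.append(chunk.replace(",", FIELD_SEP).replace("\n", RECORD_SEP))
    converted = "".join(out)
    if converted.endswith(RECORD_SEP):
        converted = converted[:-1]       # drop the final line break
    return [record.split(FIELD_SEP) for record in converted.split(RECORD_SEP)]
```

The sketch assumes the placeholder characters never occur in the data and that quotes are balanced. It says nothing about where the AppleScript version spent its time.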
So, I tried another approach, in a third script, which does this:
Parse the csvText into paragraphs. (AppleScript accepts CR, LF or CRLF).
Repeat with lineItem in each paragraph.
    Parse lineItem into commaDelimitedList (AppleScript's text item delimiter = comma).
    Repeat with commaDelimitedItem in commaDelimitedList.
        Parse commaDelimitedItem into quoteDelimitedList (AppleScript's text item delimiter = quote).
        Substitute any "" with ", where within quotes.
        If not already inQuotes, and commaDelimitedItem does not contain a quote, then simply append commaDelimitedItem to the end of queueRow.
        Else:
            If opening quote then start queueText.
            Else if closing quote then set queueText to queueText & comma or linefeed & closing text. Append queueText to queueRow.
            Toggle inQuotes TRUE/FALSE after each quote.
        End if
    End Repeat -- commaDelimitedItem
    If not inQuotes then append queueRow to queueList.
End Repeat -- lineItem
This approach takes only 2 seconds on the same test file. I am testing it now, but it seems great :-)
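The outline above can be sketched roughly as follows. This is a Python approximation rather than the actual AppleScript: str.split and re.split stand in for text item delimiters and paragraphs, the variable names follow the outline, and the function name parse_csv_lines is hypothetical. The doubled-quote substitution is folded into the quote loop instead of being a separate step, and line breaks inside quoted values are stored as LF.

```python
import re

def parse_csv_lines(csv_text):
    """Split by line, then comma, then quote, tracking whether we are in quotes."""
    queue_list = []   # finished rows
    queue_row = []    # fields of the row being built
    queue_text = ""   # text of a field that contains quotes, possibly still open
    in_quotes = False

    lines = re.split(r"\r\n|\r|\n", csv_text)   # CR, LF or CRLF
    if lines and lines[-1] == "":
        lines.pop()   # ignore the empty piece after a trailing line break

    for line_item in lines:
        if in_quotes:
            queue_text += "\n"    # the line break was inside a quoted value
        for n, comma_item in enumerate(line_item.split(",")):
            if in_quotes and n > 0:
                queue_text += ","  # the comma was inside a quoted value
            if not in_quotes and '"' not in comma_item:
                queue_row.append(comma_item)   # plain field, no quotes involved
                continue
            pieces = comma_item.split('"')
            for k, piece in enumerate(pieces):
                if k > 0:
                    in_quotes = not in_quotes  # toggle after each quote
                if not in_quotes and piece == "" and 0 < k < len(pieces) - 1:
                    queue_text += '"'          # doubled quote inside a value
                else:
                    queue_text += piece
            if not in_quotes:
                queue_row.append(queue_text)   # quoted field is complete
                queue_text = ""
        if not in_quotes:
            queue_list.append(queue_row)       # record is complete
            queue_row = []
    return queue_list
```

Rows are only ever appended to the end of queue_list and queue_row, so no item of a large list is read back or overwritten during parsing.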
My conclusion so far is that using AppleScript's text item delimiter for parsing bulk text works very quickly, as long as you don't try to get/set in a large list.
Thanks,
Tom
T&B