Re: Importing/parsing CSV files

  • Subject: Re: Importing/parsing CSV files
  • From: T&B <email@hidden>
  • Date: Tue, 12 Sep 2006 14:42:56 +1000

Following up my last post:

It's the allowance for quoted fields, with commas and newlines (linefeeds) inside those quotes, that makes the parsing difficult.
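To illustrate the difficulty, here is a minimal Python sketch. Python stands in for AppleScript here, and the sample data is invented. Splitting each line on commas, which is what a naive text-item-delimiter approach does, breaks quoted fields apart. Python's standard `csv` module, which follows the usual quoting rules, keeps them intact:

```python
import csv
import io

# Invented sample: one quoted field contains a comma, another contains a linefeed.
sample = 'id,comment\n1,"Hello, world"\n2,"line one\nline two"\n'

# Naive approach: split into lines, then split each line on commas.
naive = [line.split(",") for line in sample.splitlines()]

# The csv module respects quotes, so embedded commas and linefeeds survive.
proper = list(csv.reader(io.StringIO(sample)))

print(naive)
print(proper)
```

The naive version yields four rows, with `"Hello` and ` world"` as separate fields and the second comment split across two rows. The `csv` version yields the three intended rows.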

I also wrote another script a few months back that handles the full CSV spec (including commas and linefeeds within quotes), but it steps through the text one character at a time, so it is as slow as molasses.

I tested a CSV file containing 324 records, 37 fields, with some quoted values containing linefeeds etc. My old script (character by character) took 10 seconds (on a 2GHz dual Intel) to convert it to a list of lists.
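The original character-by-character AppleScript isn't shown in the post. As a rough illustration of the general technique only, here is a Python sketch of a character-at-a-time CSV state machine. The function name `parse_csv_char_by_char` is hypothetical, and Python timings say nothing about the AppleScript figures above:

```python
def parse_csv_char_by_char(csv_text):
    rows, row, field = [], [], []
    in_quotes = False
    i, n = 0, len(csv_text)
    while i < n:
        ch = csv_text[i]
        if in_quotes:
            if ch == '"' and i + 1 < n and csv_text[i + 1] == '"':
                field.append('"')  # doubled "" inside quotes is a literal quote
                i += 1
            elif ch == '"':
                in_quotes = False  # closing quote
            else:
                field.append(ch)  # commas and linefeeds are literal inside quotes
        elif ch == '"':
            in_quotes = True  # opening quote
        elif ch == ",":
            row.append("".join(field))  # end of field
            field = []
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and csv_text[i + 1] == "\n":
                i += 1  # treat CRLF as a single line break
            row.append("".join(field))  # end of field and of record
            rows.append(row)
            field, row = [], []
        else:
            field.append(ch)
        i += 1
    if field or row:
        row.append("".join(field))  # last record had no trailing line break
        rows.append(row)
    return rows
```

Every character passes through the loop and the branch tests, which is why this style of parser is simple and complete but tends to be slow in an interpreted scripting language.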

It seems to me that the power of AppleScript's text item delimiters is more than up to the task. Hasn't anyone created a CSV script? How have the rest of you dealt with the need to import CSV files? It seems to be a very common requirement.

In the absence of anyone else replying with a solution (though thanks for the many comments), I wrote a new script that uses AppleScript's text item delimiters to do the work. Basically, it replaces all the linefeeds and commas that are not within quotes with temporary substitutes, then parses the converted text into a list of lists. I thought it was a neat and efficient solution, but the new script took 197 seconds on the same test file, much longer than I'd hoped. With some performance tweaking I got this figure down to 93 seconds, but that's still more than nine times as long as my old character-by-character script. Most of the time seems to be spent getting and setting items in a large list; the actual parsing of the text is quite fast.
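A rough Python rendering of that placeholder idea follows. It is a sketch, not the script from the post. Splitting on the quote character, as AppleScript's text item delimiters would, leaves the text outside quotes at the even-numbered positions, so only those pieces get their commas and linefeeds swapped for placeholders. The placeholder characters `FIELD_MARK` and `ROW_MARK` are hypothetical choices and are assumed never to occur in the data:

```python
FIELD_MARK = "\x1f"  # hypothetical placeholder for an unquoted comma
ROW_MARK = "\x1e"    # hypothetical placeholder for an unquoted line break


def parse_csv_placeholders(csv_text):
    text = csv_text.replace("\r\n", "\n").replace("\r", "\n")
    # Split on quotes; pieces at even indexes lie outside any quoted field.
    pieces = text.split('"')
    for i in range(0, len(pieces), 2):
        pieces[i] = pieces[i].replace(",", FIELD_MARK).replace("\n", ROW_MARK)
    converted = '"'.join(pieces)

    rows = []
    for row_text in converted.split(ROW_MARK):
        if row_text == "":
            continue  # skip blank lines and the piece after a trailing line break
        fields = []
        for field in row_text.split(FIELD_MARK):
            if len(field) >= 2 and field.startswith('"') and field.endswith('"'):
                field = field[1:-1].replace('""', '"')  # unquote and unescape
            fields.append(field)
        rows.append(fields)
    return rows
```

An escaped `""` produces an empty piece between the two quotes, so the even/odd rule still tracks whether each piece is inside or outside quotes.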


So, I tried another approach, in a third script, which does this:

Parse csvText into paragraphs (AppleScript accepts CR, LF or CRLF).
Repeat with lineItem in the paragraphs:
  Parse lineItem into commaDelimitedList (AppleScript's text item delimiters = comma).
  Repeat with commaDelimitedItem in commaDelimitedList:
    Parse commaDelimitedItem into quoteDelimitedList (AppleScript's text item delimiters = quote).
    Within quotes, substitute any "" with ".
    If not already inQuotes, and commaDelimitedItem does not contain a quote, then simply append commaDelimitedItem to the end of queueRow.
    Else:
      If opening quote, then start queueText with commaDelimitedItem.
      Else if closing quote, then append a comma or linefeed (whichever the split removed) & the closing text to queueText, and append queueText to queueRow.
      Else (still inside quotes), append a comma or linefeed & commaDelimitedItem to queueText.
      Toggle inQuotes TRUE/FALSE after each quote.
    End if
  End repeat -- commaDelimitedItem
  If not inQuotes, then append queueRow to queueList and start a new, empty queueRow.
End repeat -- lineItem
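A Python translation of the steps above might look like the following. It is a sketch under stated assumptions, not Tom's AppleScript: `str.split` stands in for text item delimiters, the variable names mirror the pseudocode, and `unquote` is a hypothetical helper that performs the `""` substitution. Because each quote toggles `in_quotes`, only the number of quotes in an item matters; an odd count flips the state.

```python
def unquote(field):
    # Strip the surrounding quotes, then turn each doubled "" into a single ".
    if len(field) >= 2 and field.startswith('"') and field.endswith('"'):
        field = field[1:-1]
    return field.replace('""', '"')


def parse_csv_by_lines(csv_text):
    # Like AppleScript's paragraphs, accept CR, LF or CRLF line endings.
    lines = csv_text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty piece after a trailing line break

    queue_list, queue_row = [], []
    queue_text = ""
    in_quotes = False

    for line in lines:
        for index, item in enumerate(line.split(",")):
            quote_count = item.count('"')
            if not in_quotes and quote_count == 0:
                queue_row.append(item)  # plain field: the common fast path
                continue
            if in_quotes:
                # Restore the delimiter the split removed: a comma within the
                # line, or a linefeed if this item starts a new line.
                queue_text += ("," if index > 0 else "\n") + item
            else:
                queue_text = item  # opening quote starts a new field
            # An escaped "" pair toggles twice and leaves the state unchanged.
            if quote_count % 2 == 1:
                in_quotes = not in_quotes
            if not in_quotes:
                queue_row.append(unquote(queue_text))  # closing quote ends it
                queue_text = ""
        if not in_quotes:
            queue_list.append(queue_row)
            queue_row = []
    return queue_list
```

If the input ends inside an unterminated quoted field, this sketch silently drops the partial row; a real importer would probably want to report that instead.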

This approach takes only 2 seconds on the same test file. I am still testing it, but it seems great :-)

My conclusion so far is that using AppleScript's text item delimiters to parse bulk text works very quickly, as long as you don't try to get or set items in a large list.

Thanks,
Tom
T&B
_______________________________________________
Applescript-users mailing list      (email@hidden)


  • Follow-Ups:
    • Speed of long lists (was: Importing/parsing CSV files)
      • From: T&B <email@hidden>
    • Re: Importing/parsing CSV files
      • From: kai <email@hidden>
    • Re: Importing/parsing CSV files
      • From: T&B <email@hidden>
    • Re: Importing/parsing CSV files
      • From: "Mark J. Reed" <email@hidden>