• Open Menu Close Menu
  • Apple
  • Shopping Bag
  • Apple
  • Mac
  • iPad
  • iPhone
  • Watch
  • TV
  • Music
  • Support
  • Search apple.com
  • Shopping Bag

Lists

Open Menu Close Menu
  • Terms and Conditions
  • Lists hosted on this site
  • Email the Postmaster
  • Tips for posting to public mailing lists
Re: Importing/parsing CSV files
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Importing/parsing CSV files


  • Subject: Re: Importing/parsing CSV files
  • From: T&B <email@hidden>
  • Date: Wed, 6 Dec 2006 12:52:38 +1100

Hi Kai,

Last month, you posted an excellent, fast CSVToList solution,

Unfortunately, it has a bug. When it replaces "" with the temporary quoteSubstitute (q), which are later replaced with " (one quote), it also distorts empty items. For instance the third item in this example:

item1,"quoted item","","that was an empty item","but processes ""this"" correctly"

which becomes a list:

{"item1","quoted item","\"","that was an empty item","but processes \"this\" correctly"}

but should be this (third item changed):

{"item1","quoted item","","that was an empty item","but processes \"this\" correctly"}

I was wondering if we could perhaps come up with a sans-repeat fix for this, so I've inserted a suggested mod below that uses text item delimiters instead. Brief testing suggests that it works - and it should prove a tad faster than a loop. However, I'll leave you to test it in a real world situation... :)

on CsvToList(t)
   set quot to "\""
   set comma to ","
   set oldDelimiters to text item delimiters
   set quoteSubstitute to ASCII character 0
   set newLineSubstitute to ASCII character 1
   set commaSubstitute to ASCII character 2
   set text item delimiters to quot & quot
   set t to t's text items
   set text item delimiters to quoteSubstitute
   set t to t as string

(* suggested alternative fix *)

    set text item delimiters to comma & quoteSubstitute & comma
    set t to t's text items
    set text item delimiters to comma & comma
    set t to t as string	

   set text item delimiters to quot
   script o
      property l : t's text items
   end script
   repeat with i from 1 to count o's l by 2
      set text item delimiters to comma
      set t to text items of o's l's item i

(* repeat loop removed from here *) (* if t contains quoteSubstitute then repeat with columnN from 1 to length of t if item columnN in t is quoteSubstitute then set item columnN in t to "" end if end repeat end if *)

      set text item delimiters to commaSubstitute
      set t to t as string
      set text item delimiters to newLineSubstitute
      set o's l's item i to t's paragraphs as string
   end repeat
   set text item delimiters to ""
   set t to o's l as string
   set text item delimiters to quoteSubstitute
   set o's l to t's text items
   set text item delimiters to quot
   set t to o's l as string
   set text item delimiters to newLineSubstitute
   set o's l to t's text items
   set text item delimiters to commaSubstitute
   repeat with i from 1 to count o's l
      set o's l's item i to text items of o's l's item i
   end repeat
   set text item delimiters to oldDelimiters
   return o's l
end CsvToList

I tested your revised script on my previous example:

item1,"quoted item","","that was an empty item","but processes ""this"" correctly"

which now correctly outputs as:

{"item1","quoted item","","that was an empty item","but processes \"this\" correctly"}

However, it fails if one of the items contains a comma quote comma sequence. For instance, this single item in text:


You may quote speech with any of these characters: ',",{,}

would appear in a CSV text file as:

"You may quote speech with any of these characters: ',"",{,}"

but your new CsvToList handler incorrectly parses it as:

You may quote speech with any of these characters: ',,{,}

which removes the quote character.

So I think we're stuck with the extra repeat loop, unless you have any more rabbits in your hat ;-)

Then there's the separate question of some CSV files where the unquoted items are meant to be parsed as values (eg numbers, null, true, false) and only the quoted items are meant to be parsed as literals (ie text). I think handling such parsing is not possible without nested repeats. I'll post separately about that.

Thanks,
Tom

_______________________________________________
Do not post admin requests to the list. They will be ignored.
AppleScript-Users mailing list      (email@hidden)
Help/Unsubscribe/Update your Subscription:
Archives: http://lists.apple.com/mailman//archives/applescript-users

This email sent to email@hidden
  • Follow-Ups:
    • CsvToListOfText
      • From: T&B <email@hidden>
  • Prev by Date: Re: Mac or UNIX file
  • Next by Date: Re: Mac or UNIX file
  • Previous by thread: Re: Mac or UNIX file
  • Next by thread: CsvToListOfText
  • Index(es):
    • Date
    • Thread