Re: Importing/parsing CSV files
Re: Importing/parsing CSV files
- Subject: Re: Importing/parsing CSV files
- From: T&B <email@hidden>
- Date: Wed, 6 Dec 2006 12:52:38 +1100
Hi Kai,
Last month, you posted an excellent, fast CSVToList solution,
Unfortunately, it has a bug. When it replaces "" with the
temporary quoteSubstitute (q), which are later replaced with
" (one quote), it also distorts empty items. For instance the
third item in this example:
item1,"quoted item","","that was an empty item","but processes
""this"" correctly"
which becomes a list:
{"item1","quoted item","\"","that was an empty item","but
processes \"this\" correctly"}
but should be this (third item changed):
{"item1","quoted item","","that was an empty item","but processes
\"this\" correctly"}
I was wondering if we could perhaps come up with a sans-repeat fix
for this, so I've inserted a suggested mod below that uses text
item delimiters instead. Brief testing suggests that it works - and
it should prove a tad faster than a loop. However, I'll leave you
to test it in a real world situation... :)
on CsvToList(t)
set quot to "\""
set comma to ","
set oldDelimiters to text item delimiters
set quoteSubstitute to ASCII character 0
set newLineSubstitute to ASCII character 1
set commaSubstitute to ASCII character 2
set text item delimiters to quot & quot
set t to t's text items
set text item delimiters to quoteSubstitute
set t to t as string
(* suggested alternative fix *)
set text item delimiters to comma & quoteSubstitute & comma
set t to t's text items
set text item delimiters to comma & comma
set t to t as string
set text item delimiters to quot
script o
property l : t's text items
end script
repeat with i from 1 to count o's l by 2
set text item delimiters to comma
set t to text items of o's l's item i
(* repeat loop removed from here *)
(*
if t contains quoteSubstitute then
repeat with columnN from 1 to length of t
if item columnN in t is quoteSubstitute then
set item columnN in t to ""
end if
end repeat
end if
*)
set text item delimiters to commaSubstitute
set t to t as string
set text item delimiters to newLineSubstitute
set o's l's item i to t's paragraphs as string
end repeat
set text item delimiters to ""
set t to o's l as string
set text item delimiters to quoteSubstitute
set o's l to t's text items
set text item delimiters to quot
set t to o's l as string
set text item delimiters to newLineSubstitute
set o's l to t's text items
set text item delimiters to commaSubstitute
repeat with i from 1 to count o's l
set o's l's item i to text items of o's l's item i
end repeat
set text item delimiters to oldDelimiters
return o's l
end CsvToList
I tested your revised script on my previous example:
item1,"quoted item","","that was an empty item","but processes
""this"" correctly"
which now correctly outputs as:
{"item1","quoted item","","that was an empty item","but processes
\"this\" correctly"}
However, it fails if one of the items contains a comma quote comma
sequence. For instance, this single item in text:
You may quote speech with any of these characters: ',",{,}
would appear in a CSV text file as:
"You may quote speech with any of these characters: ',"",{,}"
but your new CsvToList handler incorrectly parses it as:
You may quote speech with any of these characters: ',,{,}
which removes the quote character.
So I think we're stuck with the extra repeat loop, unless you have
any more rabbits in your hat ;-)
Then there's the separate question of some CSV files where the
unquoted items are meant to be parsed as values (eg numbers, null,
true, false) and only the quoted items are meant to be parsed as
literals (ie text). I think handling such parsing is not possible
without nested repeats. I'll post separately about that.
Thanks,
Tom
_______________________________________________
Do not post admin requests to the list. They will be ignored.
AppleScript-Users mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
Archives: http://lists.apple.com/mailman//archives/applescript-users
This email sent to email@hidden