Re: Importing/parsing CSV files
- Subject: Re: Importing/parsing CSV files
- From: kai <email@hidden>
- Date: Wed, 13 Sep 2006 05:02:07 +0100
On 13 Sep 2006, at 04:09, T&B wrote:
Following up, here are some more accurate speed measurements:
37 column x 324 row CSV file, all values in quotes:
Time    Method/script
4.6s    Tom's character by character (posted earlier)
2.3s    Tom's delimiter parsing (linefeed, comma, then quotes)
0.75s   Kai's delimiter parsing and temp ASCII 0, 1, 2 substitution
19 column x 2323 row CSV file, about 10% of values in quotes:
Time    Method/script
60s     Tom's character by character (posted earlier)
22s     Tom's delimiter parsing (linefeed, comma, then quotes)
0.65s   Kai's delimiter parsing and temp ASCII 0, 1, 2 substitution
So the speed really is phenomenal, and negates the need for me to
call an external perl or python or C routine for the CSV parsing,
whoohoo!
It verifies my early theory:
It seems to me that the power of AppleScript's text item
delimiters is more than up to the task.
but executes a solution much faster than I had yet managed. Is the
speed due to the use of a list property in a script object within
the handler? Why is that so much faster?
There's little doubt that the use of a script object, as a way of
referencing a list within a handler, is a major factor here -
especially since the difference in performance becomes more
pronounced as the length of the list increases.
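As a rough sketch of the technique (the handler name sumList, the names o and l, and the 5000-item list are purely illustrative, and are not taken from the scripts timed above), a list can be wrapped in a script object inside a handler like this:

```applescript
-- Hypothetical handler: adds up the numbers in a (possibly long) list.
on sumList(theList)
	-- Hold the list in a property of a script object, so that its
	-- items can be addressed by reference as "item i of o's l".
	script o
		property l : theList
	end script
	set total to 0
	repeat with i from 1 to (count o's l)
		set total to total + (item i of o's l)
	end repeat
	return total
end sumList

-- Build a sample list of the integers 1 to 5000, then sum it.
set bigList to {}
repeat with n from 1 to 5000
	set end of bigList to n
end repeat
sumList(bigList)
-- result: 12502500
```

Replacing "item i of o's l" with "item i of theList" should give the same result, only more slowly as the list grows.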
Perhaps a good start to understanding the principle is the "A
Reference To Operator" section of the AppleScript Language Guide
(about halfway down the following page, under the heading
"NOTES"):
http://developer.apple.com/documentation/AppleScript/Conceptual/AppleScriptLangGuide/AppleScript.99.html
The article explains that the speed of access to items in a
particularly long list can be substantially improved by using a
reference to that list - rather than by referring directly to the
list itself. The precise reasons for the performance characteristics
of references in this context are not generally known. It may have
something to do with the short-circuiting of certain checks that
AppleScript normally makes for circular references - and possibly
with the way in which a list is accessed internally.
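For illustration, here is a minimal sketch along the lines of what that section describes. The variable names and the list size are arbitrary, and as far as I know "a reference to" should be pointed at a top-level variable or a property rather than at a handler's local variable:

```applescript
-- Build a sample list of the integers 1 to 5000.
set bigList to {}
repeat with n from 1 to 5000
	set end of bigList to n
end repeat

-- Take a reference to the variable that holds the list.
set bigListRef to a reference to bigList

set total to 0
repeat with i from 1 to (count bigList)
	-- Read each item through the reference, not the list itself.
	set total to total + (item i of bigListRef)
end repeat
total
-- result: 12502500
```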
It was evidently Serge Belleudy-d'Espinose who discovered that using
a script object's properties to reference a list was not only more
efficient than direct access, but also faster than using "a reference
to". (Global variables or script properties could also be used for
similar referencing, e.g. "item n of *my* scriptProperty".)
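A small sketch of the script property variant follows. The names bigList and sumBigList are made up for the example, and this is only one way of arranging it:

```applescript
-- A top-level script property that holds the list.
property bigList : {}

-- Hypothetical handler: sums the property's items, addressing the
-- list as "my bigList" so that it is accessed by reference.
on sumBigList()
	set total to 0
	repeat with i from 1 to (count my bigList)
		set total to total + (item i of my bigList)
	end repeat
	return total
end sumBigList

-- Fill the property with the integers 1 to 5000, then sum them.
set my bigList to {}
repeat with n from 1 to 5000
	set end of my bigList to n
end repeat
sumBigList()
-- result: 12502500
```

Dropping the word "my" inside the handler should still work, but then the list is no longer accessed by reference and the speed advantage is lost.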
---
kai