Re: Dynamic Records?? (or other alternatives)
- Subject: Re: Dynamic Records?? (or other alternatives)
- From: has <email@hidden>
- Date: Thu, 29 Nov 2001 20:38:07 +0000
Greg Back wrote:
> Hi all-
> I have set out on my first really helpful applescript, not like the
> pointless scripts i have been writing. Essentially, it will work on the text
> on the clipboard and provide a word-count feature, counting the total number
> of words and providing a separate count for each individual word (i.e. have
> the output be "2 occurence(s) of 'Applescript', 1 occurence of 'Script'", or
> the like). So far, I have the following script:
[snip script for space]
> 1) Is there already a utility out there that can do this? No use in
> re-inventing the wheel. Even if there is, i still want to do this using
> Applescript, if only for the exercise.
Perhaps a scriptable text editor/word processor? You'd have to look around.
IMO this is a fairly complex function you're trying to implement here... if
I were you I'd really want to use the easy option if there is one.
-------
> 2) Can you create a record during the running of a script, such as
>
> set wrdz to {"applescript":2 , "script":1}
>
> adding a new entry for each new word the script encounters. Or are records
> not the way to do this? Or do I need to go about the entire process a
> different way. From what I can tell, you have to define everything in the
> record before it is run. Am i wrong?
You tell correctly, alas.
What you're thinking of is something known as a hash array (aka associative
array); many other languages have them but AS does not. (This has been
discussed on the list recently if you want to poke around the archives for
it.)
All is not yet lost, however, as there are a number of homegrown solutions
currently available for use. First up are the 'roll your own structures'
squad:
AssociativeLib (Arthur Knapp)
TableSearchServices (Victor Yee)
Both are available from AppleMods and, iirc, use a loop-based "hunt the list"
method of retrieving values. Simple and robust, but search times will
increase as the list gets bigger. If you want to write your own version,
that should be pretty easy too (which might be more convenient, given your
output requirements).
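
For what it's worth, here's a minimal sketch of the "hunt the list" idea. It is not taken from AssociativeLib or TableSearchServices; the handler name incrementCount and the two properties are made up for illustration. It keeps two parallel lists and loops through the keys on every lookup:

```applescript
-- Two parallel lists stand in for the hash:
-- item i of wordKeys pairs with item i of wordCounts.
property wordKeys : {}
property wordCounts : {}

-- Hypothetical handler: add 1 to the count for aWord,
-- creating a new entry the first time the word is seen.
on incrementCount(aWord)
	repeat with i from 1 to count of wordKeys
		-- string comparison ignores case by default
		if item i of wordKeys is aWord then
			set item i of wordCounts to (item i of wordCounts) + 1
			return item i of wordCounts
		end if
	end repeat
	set end of wordKeys to aWord
	set end of wordCounts to 1
	return 1
end incrementCount

-- reset first, since properties can persist between runs
set wordKeys to {}
set wordCounts to {}

incrementCount("applescript")
incrementCount("script")
incrementCount("AppleScript")
{wordKeys, wordCounts} -- {{"applescript", "script"}, {2, 1}}
```

Because AS string comparisons ignore case unless told otherwise, "AppleScript" lands on the existing "applescript" entry. Every lookup walks the key list, which is why search times grow with the list.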
Arthur also posted a slicker implementation on this list: a faster,
TID-based method, though it's limited to using strings and the labels are
case sensitive. I caught a bug in it and had intended to develop it
further, only I came up with a New Idea instead so it didn't happen (never
even got around to posting it on AppleMods... typical). If you want a copy
I can post it to you, although what you're planning would require
case-insensitive keys so I'm not sure if it'd be the best one to use (you'd
need extra code to make labels case-insensitive, which'll slow it down
again).
Next up are the 'gonna do it with records anyhow' methods:
Olof Hellman posted code on this list a while back that allowed a single
{label:value} record to be created using a string for the label. Very
clever stuff. I think Olof(?) also posted code allowing you to retrieve a
record property using a string as label. You'd need to search the list
archives for details. The downside of this method is that it's dependent on
Standard Additions' "run script", which means there's a mean time penalty
every time you use it while the AS compiler gets loaded and unloaded. I
suspect the need for speed will rule this one out for you.
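
To give a rough idea of the general technique (this is my own sketch, not Olof's code, and the handler names makeRecord and valueForLabel are invented for the example): the record is built by compiling a bit of source text with "run script", and a property is read back the same way by passing the record in as a parameter. It assumes the label string is something that can sit between vertical bars as an identifier.

```applescript
-- Hypothetical handler: build a one-property record whose
-- label comes from a string. The vertical bars let the label
-- be a word AS would otherwise reserve (e.g. "script").
on makeRecord(labelText, theValue)
	return run script "{|" & labelText & "|:" & theValue & "}"
end makeRecord

-- Hypothetical handler: read a property back using a string
-- for the label, by compiling a tiny run handler on the fly.
on valueForLabel(theRecord, labelText)
	set src to "on run {r}" & return & "return |" & labelText & "| of r" & return & "end run"
	return run script src with parameters {theRecord}
end valueForLabel

-- records can be joined with &, so the 'hash' grows one entry at a time
set wrdz to makeRecord("applescript", 2) & makeRecord("script", 1)
valueForLabel(wrdz, "script")
```

As written, makeRecord only suits simple values such as numbers, since the value is pasted into the source text. Each call pays the "run script" compile cost, which is the time penalty mentioned above.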
I've also done a 'pseudo-hash' implementation - hashLib - which isn't
dependent on "run script" so is much faster (if you don't count time to
upload <g>; honest, it will appear on macscripter Real Soon;). The downside
is that it requires Smile to run, so if you're looking for a vanilla
solution then you'll need to stick with one of the 'roll your owns'. Also,
the "record to list" conversion is lamentably slow (about a tenth of the
speed of going from list to record).
-------
> Other issues i have to deal with in writing this script:
>
> 3) A word at the end of a sentence, or right before a comma, will be treated
> differently by the script.
> Is there any way to do this without stepping through each character using
> TIDs?
Use the "word" keyword to get words, rather than your current TID-based
approach. The only problem with the "word" keyword is that it doesn't work
on strings over 32KB (an AS limitation) so if it's an issue you'd need to
break strings longer than 32KB into smaller chunks and work on each in
turn. (I've got a library that can chunk long strings into a list of
smaller ones, 'intelligently' breaking it between words/paragraphs, which
would do you nicely for this. Should be on AppleMods soon.)
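
A quick illustration of the "word" approach, using a literal string rather than the clipboard so it can be run as-is (sampleText and wordList are just names picked for the example):

```applescript
-- "every word of" splits on word boundaries and leaves
-- the surrounding punctuation out of the results.
set sampleText to "AppleScript is fun. Is this a script, or a Script?"
set wordList to every word of sampleText
wordList
```

The full stop, comma and question mark don't appear in wordList, so "fun." and "fun" end up as the same word without any TID juggling.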
-------
Thoughts:
If you intend to return your entire list/record of words-versus-count as a
nicely formatted string (for human consumption), I'd suggest using a
list-based 'hash' method. Also, Serge's qSort (v2) should be just up your
street if you wish to sort the results alphabetically/by frequency for
extra pleasant reading.
For getting the most common word, there's a couple ways you could do it,
depending on what sort of 'hash' method you use:
1. Concurrent Method: Use a couple variables, "commonestWordName" and
"commonestWordCount". Each time you 'get' a word from your 'hash array'
(which you'll need to do so you can increment its value), check if the
count is greater than commonestWordCount. If it is, update
commonestWordName and commonestWordCount. Once you've done tallying words,
return these values.
2. Consecutive Method 1 (only suitable if you're using the 'hunt the list'
method for storing your words): Once you're done tallying words, grab the
"hash's" internal lists and run through their contents using a loop. Update
"commonestWordName" and "commonestWordCount" each time a word appears
whose count is greater than commonestWordCount.
3. Consecutive Method 2 (if you're using qSort): well... but if you're
using qSort to sort your results before returning then it's probably too
obvious already.:)
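
A rough sketch of the Concurrent Method, assuming the simple list-based 'hash' (two parallel lists) and a short literal string in place of the clipboard. Apart from commonestWordName and commonestWordCount, the variable names are just ones I've picked for the example:

```applescript
set sampleText to "the cat and the dog and the bird"
set wordKeys to {}
set wordCounts to {}
set commonestWordName to ""
set commonestWordCount to 0

repeat with aWordRef in (words of sampleText)
	set thisWord to contents of aWordRef
	
	-- hunt the list for an existing entry
	set foundAt to 0
	repeat with i from 1 to count of wordKeys
		if item i of wordKeys is thisWord then
			set foundAt to i
			exit repeat
		end if
	end repeat
	
	-- add a new entry or bump the existing count
	if foundAt is 0 then
		set end of wordKeys to thisWord
		set end of wordCounts to 1
		set newCount to 1
	else
		set newCount to (item foundAt of wordCounts) + 1
		set item foundAt of wordCounts to newCount
	end if
	
	-- concurrent check: update the running 'commonest' as we go
	if newCount > commonestWordCount then
		set commonestWordName to thisWord
		set commonestWordCount to newCount
	end if
end repeat

{commonestWordName, commonestWordCount} -- {"the", 3}
```

Since the check uses "greater than", ties go to whichever word reached the count first.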
-------
I think the real challenge though will be coming up with something that's
both robust and fast. Anyway, good luck.
has