Re: Count word's occurences
  • Subject: Re: Count word's occurences
  • From: Christopher Nebel <email@hidden>
  • Date: Wed, 17 Sep 2014 14:27:09 -0700

So, you realize that getting a unique-word count table is a classic shell script problem?  For the full story, read <http://www.leancrew.com/all-this/2011/12/more-shell-less-egg/>, but the short version is that the script below could be effectively replaced with just this:

sort | uniq -c

It’s a lot faster, too, because it only goes through the text twice (once for sort(1), once for uniq(1)), instead of once for every unique word.  uniq(1) prints the count first and then the word; if you want it the other way around, do this:

sort | uniq -c | awk '{ print $2 " " $1 }' | column -t
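As a rough sketch of that pipeline on running text rather than a one-word-per-line list: the word_counts name and the sample input are made up for illustration, and the tr step is an added assumption, there only because sort and uniq need each word on its own line.

```shell
# Hypothetical helper: print "word count" for every word read from stdin.
word_counts() {
    tr -cs '[:alpha:]' '\n' |       # turn each run of non-letters into one newline
        sed '/^$/d' |               # drop the empty line a leading separator leaves
        sort |                      # group identical words together
        uniq -c |                   # collapse each group to "count word"
        awk '{ print $2 " " $1 }'   # swap to "word count"
}

printf 'apple pear apple\nplum apple pear\n' | word_counts
# apple 3
# pear 2
# plum 1
```

Piping the result through "column -t" lines up the columns, as in the command above.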

(I actually didn't know about using “column -t” to clean up the columns; that’s a useful trick.  Another handy trick is to put the script output on the clipboard by piping to pbcopy(1), like this: “echo hello world | pbcopy”.)  For those determined to use grep(1) as part of the solution, you should know about three options:

-w  Only match entire words.  This cures the “apple as a substring” problem.
-i  Match case-insensitively.  Depending on your application, you may or may not want this.
-o  Only print the matching part of the line.

grep(1) normally prints the entire line that the match occurs in.  For the originator, this didn’t matter, because every line had only one word, but when searching arbitrary text, it makes a big difference: this is probably why Omar got 75 for his “Hedges” example — 75 is the number of words in the line where “Hedges” occurs.
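To make the difference concrete, here is a small sketch with invented sample text. The count_word helper is hypothetical, and it assumes a grep that supports -o and -w, which the GNU and BSD versions both do.

```shell
# Hypothetical helper: count whole-word occurrences of $1 in stdin.
count_word() {
    # -o prints each match on its own line, so wc -l counts matches, not lines.
    grep -o -w -- "$1" | wc -l | tr -d ' '
}

text='Apple pie and pineapple
an apple a day, apple after apple'

# Whole words, case-sensitive: skips "Apple" and "pineapple".
printf '%s\n' "$text" | count_word apple                            # 3

# Adding -i also picks up "Apple".
printf '%s\n' "$text" | grep -o -w -i apple | wc -l | tr -d ' '     # 4

# Without -o and -w, wc -w counts every word on every matching line.
printf '%s\n' "$text" | grep apple | wc -w | tr -d ' '              # 11
```

The last command is the pattern used in the quoted script: both lines match, so it reports all 11 words on them rather than the number of times the word occurs.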

Finally, if you aren’t familiar with shell scripting, and you want to use it from AppleScript, read <https://developer.apple.com/library/Mac/technotes/tn2065>.


—Chris Nebel
AppleScript Engineering

On Sep 8, 2014, at 10:43 AM, Christopher Stone <email@hidden> wrote:

On Sep 08, 2014, at 10:22, Christopher Stone <email@hidden> wrote:
When run from a 10,000 word file or BBEdit window using your example words the run-time is less than 2/10 of a second on my system.
______________________________________________________________________

Hey Guido,

Oh, yeah.  If you want to run a TextWrangler text filter you can do this.  It's just about instantaneous on the same 10K word test file.

Remember that text filters are destructive, so if you want to keep the original be sure to run on a copy.

--
Best Regards,
Chris

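#!/usr/bin/env bash

# Read the text from stdin, turning carriage returns into newlines.
T=$(tr '\r' '\n')
# Unique words, assuming the input has one word per line.
S=$(sort -u <<< "$T")
A=""

# For each unique word, count the words on every line that matches it.
# grep matches substrings here, so "apple" also matches "pineapple".
for i in $S
do
    X=$(grep "$i" <<< "$T" | wc -w)
    A="$A$i $X\n"
done

# Print the table with aligned columns.
echo -e "$A" | column -t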