Re: Word count

  • Subject: Re: Word count
  • From: "Louis C. Sacha" <email@hidden>
  • Date: Wed, 9 Jun 2004 23:45:56 -0700

Hello...

Or, more simply

(Typed in Mail...)

- (unsigned)wordCountForString:(NSString *)textString
{
    NSScanner *wordScanner = [NSScanner scannerWithString:textString];
    // Include newlines so that words on separate lines are counted separately.
    NSCharacterSet *whiteSpace = [NSCharacterSet whitespaceAndNewlineCharacterSet];

    unsigned wordCount = 0;
    // Each successful scan consumes one run of non-whitespace characters.
    while ([wordScanner scanUpToCharactersFromSet:whiteSpace intoString:nil]) {wordCount++;}

    return wordCount;
}

Since NSScanner skips whitespace and newline characters by default at the start of each scan operation (you can change this with the setCharactersToBeSkipped: method), you don't need to manually scan over the whitespace between words.
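
To see the default skipping in action, here is a minimal sketch. The helper name FirstTokenInString and its useDefaultSkipping parameter are made up for illustration; only the NSScanner and NSCharacterSet calls are real API.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first whitespace-delimited token in `text`,
// or nil if the scanner could not scan anything.
static NSString *FirstTokenInString(NSString *text, BOOL useDefaultSkipping)
{
    NSScanner *scanner = [NSScanner scannerWithString:text];
    NSCharacterSet *stopSet = [NSCharacterSet whitespaceAndNewlineCharacterSet];

    if (!useDefaultSkipping) {
        // Turn skipping off entirely; the scan now starts exactly at index 0.
        [scanner setCharactersToBeSkipped:nil];
    }

    NSString *token = nil;
    if ([scanner scanUpToCharactersFromSet:stopSet intoString:&token]) {
        return token;
    }
    return nil;
}
```

With the default skip set, FirstTokenInString(@"  \n hello world", YES) should return @"hello". With skipping turned off, the same string should return nil, because the very first character is already in the stop set and nothing gets scanned.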

Also, you can speed things up by only looking up the character set once, outside the loop. I prefer to use scanUpToCharactersFromSet:intoString: to do the actual scanning, but there probably is very little if any performance difference compared to using scanCharactersFromSet:intoString:.


There are a variety of things that will trip up this way of counting words, for example the string @"Hello World ! ! ! ! ! !" would come out as 8 words (there are spaces between the !'s).
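
When a count looks wrong, it can help to see which tokens were actually counted. The following is only a sketch of that idea: WhitespaceDelimitedTokens is a hypothetical helper that runs the same scanning loop but keeps each token instead of just incrementing a counter.

```objc
#import <Foundation/Foundation.h>

// Hypothetical debugging helper: same loop as the word counter, but it
// collects every token so you can inspect what was counted as a "word".
static NSArray *WhitespaceDelimitedTokens(NSString *text)
{
    NSScanner *scanner = [NSScanner scannerWithString:text];
    NSCharacterSet *stopSet = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableArray *tokens = [NSMutableArray array];

    NSString *token = nil;
    // The scanner skips leading whitespace itself, so each pass yields one token.
    while ([scanner scanUpToCharactersFromSet:stopSet intoString:&token]) {
        [tokens addObject:token];
    }
    return tokens;
}
```

For @"Hello World ! ! ! ! ! !" the returned array should hold 8 entries, six of them a lone @"!", which is exactly the over-count described above.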


A Cocoa equivalent to Alan's method -- I think ;)-- would be:

- (unsigned)wordCountForString:(NSString *)textString
{
    NSScanner *wordScanner = [NSScanner scannerWithString:textString];
    NSCharacterSet *nonLetters = [[NSCharacterSet letterCharacterSet] invertedSet];
    [wordScanner setCharactersToBeSkipped:nonLetters];

    unsigned wordCount = 0;
    while ([wordScanner scanUpToCharactersFromSet:nonLetters intoString:nil]) {wordCount++;}

    return wordCount;
}


In one application where I wanted the count to be basically accurate to within +/- 5% for a variety of types of text, I did something similar to the following:

static NSCharacterSet *cachedSet = nil;

@implementation ThatClass

+ (NSCharacterSet *)whitespaceAndPunctuationSet
{
    if (!cachedSet)
    {
        // Must be typed as mutable: formUnionWithCharacterSet: is an NSMutableCharacterSet method.
        NSMutableCharacterSet *tempSet = [NSMutableCharacterSet whitespaceAndNewlineCharacterSet];
        [tempSet formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];

        // Keep an immutable copy so the set is only built once.
        cachedSet = [tempSet copy];
    }

    return cachedSet;
}

- (unsigned)wordCountForString:(NSString *)textString
{
    NSScanner *wordScanner = [NSScanner scannerWithString:textString];
    // Words end at whitespace or a newline; punctuation inside a word is kept.
    NSCharacterSet *whiteSpace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    // Before each word, skip whitespace and any free-standing punctuation.
    NSCharacterSet *skipSet = [ThatClass whitespaceAndPunctuationSet];
    [wordScanner setCharactersToBeSkipped:skipSet];

    unsigned wordCount = 0;
    while ([wordScanner scanUpToCharactersFromSet:whiteSpace intoString:nil]) {wordCount++;}

    return wordCount;
}

@end


That implementation had the advantage that it would skip free-standing punctuation, but still count things like "don't", "Micro$oft" and "10,000" as single words. Of course, there are still things that would throw it off.

The most accurate way of counting words will depend on the exact type of text that you will be checking (and what you consider to be a word). The best way to find out is to write several different versions of the word counting code and throw as many different examples of text at them as you expect to occur in the application's use.
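
One way to run that kind of comparison is sketched below. Everything here is an assumption for illustration: the function names (CountTokens, CountByWhitespace, CountByLetters, CountSkippingPunctuation, CompareWordCounters) are invented, and the three variants are only rough stand-ins for the approaches discussed in this message.

```objc
#import <Foundation/Foundation.h>

// Shared loop: skip `skipSet` before each token, then scan up to `stopSet`.
static NSUInteger CountTokens(NSString *text, NSCharacterSet *skipSet, NSCharacterSet *stopSet)
{
    NSScanner *scanner = [NSScanner scannerWithString:text];
    [scanner setCharactersToBeSkipped:skipSet];

    NSUInteger count = 0;
    while ([scanner scanUpToCharactersFromSet:stopSet intoString:nil]) {
        count++;
    }
    return count;
}

// Variant 1: anything between whitespace counts as a word.
static NSUInteger CountByWhitespace(NSString *text)
{
    NSCharacterSet *ws = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    return CountTokens(text, ws, ws);
}

// Variant 2: only runs of letters count as words.
static NSUInteger CountByLetters(NSString *text)
{
    NSCharacterSet *nonLetters = [[NSCharacterSet letterCharacterSet] invertedSet];
    return CountTokens(text, nonLetters, nonLetters);
}

// Variant 3: skip whitespace and punctuation before a word, end it at whitespace.
static NSUInteger CountSkippingPunctuation(NSString *text)
{
    NSCharacterSet *ws = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableCharacterSet *skipSet = [NSMutableCharacterSet whitespaceAndNewlineCharacterSet];
    [skipSet formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];
    return CountTokens(text, skipSet, ws);
}

// Logs all three counts for each sample string so they can be compared by eye.
static void CompareWordCounters(NSArray *samples)
{
    for (NSString *sample in samples) {
        NSLog(@"\"%@\" -> whitespace: %lu, letters: %lu, skip punctuation: %lu",
              sample,
              (unsigned long)CountByWhitespace(sample),
              (unsigned long)CountByLetters(sample),
              (unsigned long)CountSkippingPunctuation(sample));
    }
}
```

Calling CompareWordCounters with an array of representative strings (for example @"Hello World ! ! ! ! ! !" and @"don't") makes the disagreements visible: the first string should come out as 8, 2 and 2, and the second as 1, 2 and 1. Which of those is "right" is still your call.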

Hope that helps,

Louis


(Typed in Mail...)

int words = 0;
NSScanner *scanner = [NSScanner scannerWithString:string];
while (![scanner isAtEnd])
{
    [scanner scanCharactersFromSet:[NSCharacterSet whitespaceAndNewlineCharacterSet] intoString:nil];
    if ([scanner scanCharactersFromSet:[[NSCharacterSet whitespaceAndNewlineCharacterSet] invertedSet] intoString:nil])
        words++;
}

zach


References: 
 >Word count (From: Matt Baker <email@hidden>)
 >Re: Word count (From: j o a r <email@hidden>)
 >Re: Word count (From: Matt Baker <email@hidden>)
 >Re: Word count (From: Zach Wily <email@hidden>)
