Re: Abstract Text Example and Question
- Subject: Re: Abstract Text Example and Question
- From: Graham Cox <email@hidden>
- Date: Thu, 10 Feb 2011 14:13:42 +1100
Using LZW or a similar compression scheme is likely to give you a substantially smaller file, if that's what you're after. Of course, you'd have to expand it again before you can use it.
The killer here, I would guess, is the use of -[NSArray indexOfObject:]: it has to perform a string-by-string linear search until it finds a match, and it does that again for every word in the text. Instead, if you keep each word in an NSMutableSet (which uses hashing internally), you can test for membership in roughly constant time.
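As a rough illustration of that point, here is a minimal sketch that swaps the linear search for a hashed lookup. The helper name DistinctWords is hypothetical (it is not from the original code), and the extra array is only there on the assumption that you want to keep the words in first-seen order; if order doesn't matter, the set alone is enough.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the distinct strings in `words`, in first-seen order.
// The set is used only for the membership test; the array preserves the order.
static NSArray *DistinctWords(NSArray *words) {
    NSMutableSet *seen = [NSMutableSet set];
    NSMutableArray *result = [NSMutableArray array];
    for (NSString *word in words) {
        // Hashed lookup instead of -indexOfObject:'s string-by-string scan.
        if (![seen containsObject:word]) {
            [seen addObject:word];
            [result addObject:word];
        }
    }
    return result;
}
```

For example, passing an array containing "a", "b", "a" should give back an array containing just "a", "b".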
Also, using -componentsSeparatedByString: to get an array of words is simple, but it is going to be a killer on time and space. Instead, you could parse and index the text as you go using NSScanner, so that an array of words is never made: as you scan each word, add it to the set (a set only ever keeps a single instance of each object). At the end of the scan, the set contains all the unique words. I would suggest returning that set rather than putting it back together as a string: as a set it will be more useful for membership testing, and you can easily convert it to a string or array as you need.
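Something along these lines is what I have in mind. Treat it as a sketch, not a drop-in replacement: the function name UniqueWordsInText is made up for the example, and it assumes that a "word" is any run of alphanumeric characters, so "don't" would be indexed as "don" and "t".

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects the unique, lowercased words of `text` into a set.
// Assumption: a word is a run of alphanumeric characters; everything else separates words.
static NSSet *UniqueWordsInText(NSString *text) {
    NSCharacterSet *wordChars = [NSCharacterSet alphanumericCharacterSet];
    NSMutableSet *words = [NSMutableSet set];

    NSScanner *scanner = [NSScanner scannerWithString:text];
    // Have the scanner skip over anything that is not part of a word.
    [scanner setCharactersToBeSkipped:[wordChars invertedSet]];

    NSString *word = nil;
    // Each pass skips separators, then reads one word; returns NO at end of text.
    while ([scanner scanCharactersFromSet:wordChars intoString:&word]) {
        [words addObject:[word lowercaseString]];   // duplicates are ignored by the set
    }
    return words;
}
```

If you still need a string for the index file, [[words allObjects] componentsJoinedByString:@" "] will produce one, though the order of the words is then unspecified.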
--Graham
On 10/02/2011, at 2:04 PM, Brad Stone wrote:
> I made this code to remove any duplicate words from a large group of text.  The result is stored in an index file so the text doesn't need to make sense.  I'm removing the duplicates to save space in the index file.  I was wondering if anyone had a suggestion for a more efficient way to accomplish this.  I'm guessing the separations and joins are taking up memory and slowing things down (even though I'm not positive about that).  Using this code reduced the index file size from 4.7MB to 2.7MB.
>
> Thanks
>
> - (NSString *)abstractText:(NSString *)srcString {
> 	NSMutableArray *resultArray = [[NSMutableArray alloc] init];
> 	NSArray *textArray = [srcString componentsSeparatedByString:@" "];
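> 	for (NSString *s in textArray) {
> 		// Strip punctuation from both ends: trim everything that is NOT alphanumeric.
> 		NSCharacterSet *nonAlnum = [[NSCharacterSet alphanumericCharacterSet] invertedSet];
> 		NSString *word = [[s stringByTrimmingCharactersInSet:nonAlnum] lowercaseString];
>
> 		if ([resultArray indexOfObject:word] == NSNotFound) {
> 			[resultArray addObject:word];
> 		}
> 	}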
>
> 	NSString *resultString = nil;
> 	if ([resultArray count] > 0) {
> 		resultString = [resultArray componentsJoinedByString:@" "];
> 	} else {
> 		resultString = srcString;
> 	}
> 	return resultString;
> }
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden