Re: NSScanner Sanity Check
- Subject: Re: NSScanner Sanity Check
- From: "Adam R. Maxwell" <email@hidden>
- Date: Tue, 8 May 2007 19:53:17 -0700
On May 8, 2007, at 10:47, James Hillhouse wrote:
I was working today on reading keplerian element data files (text,
maybe utf-8) using NSString and NSScanner. Here's what I came up
with. To see a more readable version, you can go to my posting at http://cocoacoder.org/cocoacoderblog/?p=20#more-20
Here's the problem. I'm reading in a potentially large data file of
keplerian elements, which define an orbit, and I was trying to read
from the file the following:
Planet Name
Mean Radius
GravitationalParameter
Eccentricity
This is only a small bit of the info generally included in a
keplerian data file. But I figured that if I could read the above
info, more info was icing on the cake. I'll go ahead and answer the
question of why didn't I use:
[scanner scanCharactersFromSet:[NSCharacterSet
decimalDigitCharacterSet]
intoString:&stringOfElements];
My main reason was that it didn't work.
It does work when I've used it. I glanced at your blog post, and it
appears that you expected "." to be part of the decimal digit
character set; the documentation says that it's 0-9, so that may
explain your problem.
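To make the point about "." concrete, here is a minimal sketch of widening the digit set so that a token such as 1738.0 is scanned whole. The function names and the extra characters (".+-eE") are assumptions for illustration, not something from the original post, and the code assumes manual retain/release rather than ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: digits plus the characters a decimal number may contain.
// decimalDigitCharacterSet on its own stops at the ".".
static NSCharacterSet *NumberCharacterSet(void)
{
    NSMutableCharacterSet *set =
        [[[NSCharacterSet decimalDigitCharacterSet] mutableCopy] autorelease];
    [set addCharactersInString:@".+-eE"];   // assumed: sign, point, exponent
    return set;
}

// Hypothetical helper: returns the next numeric token as a string, or nil.
static NSString *ScanNumberString(NSScanner *scanner, NSCharacterSet *numberSet)
{
    NSString *token = nil;
    if ([scanner scanCharactersFromSet:numberSet intoString:&token])
        return token;
    return nil;
}
```

Build the set once and pass it in, rather than recreating it for every line. If only the numeric value is needed, -scanDouble: avoids the custom set entirely.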
If what I've done is not elegant or optimal, could someone chime
in and tell me where?
As far as optimal, the only way you'll be able to tell for sure is by
profiling it with Shark, preferably on a really large file. In spite
of that, I'll make a few suggestions, having been bitten by NSScanner
performance problems too many times.
/*
 Small sample of Keplerian data containing

 Body     Mean Radius    mu          eccentricity
 ================================================
 Moon     1738.0         4902.799    0.05490
 Earth    6378.1363      398600.4    0.016708617

 that is included in a file PlanetaryData.txt in my bundle :-)
*/
- (void)readDataFileAndStore
{
    NSString *bodyName;
    NSString *meanRadius;
    NSString *gravParameter;
    NSString *eccentricity;
    double mR;
    double gP;
    double ecc;
    NSString *stringOfElements = nil;
    NSString *tmp;
    NSString *dataFilePath;
    dataFilePath = [[NSBundle mainBundle]
        pathForResource:@"PlanetaryData"
        ofType:@"txt"];
    // This separates out the lines of data
    NSArray *lines = [[NSString stringWithContentsOfFile:dataFilePath]
        componentsSeparatedByString:@"\n"];
If you have to deal with other newline characters (\r, \r\n, or the
Unicode line breaks), you should take additional steps. See
-[NSString sourceLinesBySplittingString] in http://bibdesk.svn.sourceforge.net/viewvc/bibdesk/trunk/bibdesk/NSString_BDSKExtensions.m?view=markup
for an example.
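As a rough sketch of one such additional step (not the BibDesk method itself), the lines can be walked with -[NSString getLineStart:end:contentsEnd:forRange:], which recognizes the standard line terminators including \r, \r\n, and the Unicode line and paragraph separators. The function name LinesFromString is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits a string into lines, whatever the terminator.
static NSArray *LinesFromString(NSString *contents)
{
    NSMutableArray *lines = [NSMutableArray array];
    NSUInteger length = [contents length];
    NSUInteger start = 0, end = 0, contentsEnd = 0;

    while (end < length) {
        // Find the line containing the character at index `end`.
        [contents getLineStart:&start
                           end:&end
                   contentsEnd:&contentsEnd
                      forRange:NSMakeRange(end, 0)];
        // contentsEnd excludes the terminator; end includes it.
        [lines addObject:[contents substringWithRange:
                             NSMakeRange(start, contentsEnd - start)]];
    }
    return lines;
}
```

A trailing terminator does not produce an extra empty string at the end, which componentsSeparatedByString:@"\n" would.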
    NSEnumerator *nse = [lines objectEnumerator];
    NSMutableArray *values = [NSMutableArray new];
    while(tmp = [nse nextObject])
    {
        NSScanner *scanner = [NSScanner scannerWithString:tmp];
If you have a large number of lines, you're going to fill an
autorelease pool rather quickly by creating an autoreleased NSScanner
for every line. Better to create it for each line with alloc/init and
release it at the end of each loop iteration.
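A minimal sketch of that pattern, assuming manual retain/release as in the code above (under ARC the release call is simply dropped). The function name ScanLines is hypothetical and the field scanning is left as a placeholder.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: one explicitly owned scanner per line, so nothing
// scanner-related accumulates in the enclosing autorelease pool.
static void ScanLines(NSArray *lines)
{
    NSEnumerator *lineEnumerator = [lines objectEnumerator];
    NSString *line;

    while ((line = [lineEnumerator nextObject])) {
        NSScanner *scanner = [[NSScanner alloc] initWithString:line];

        // ... scan the fields of this line here ...

        [scanner release];   // freed immediately, not at pool drain
    }
}
```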
        // bodyName
        [scanner scanUpToCharactersFromSet:[NSCharacterSet
            alphanumericCharacterSet]
            intoString:nil];
        [scanner scanCharactersFromSet:[NSCharacterSet
            alphanumericCharacterSet]
            intoString:&stringOfElements];
Be careful here: under some conditions, scanCharactersFromSet is
grossly inefficient since it creates an autoreleased, inverted
character set. This can quickly blow up your autorelease pool as well
(rdar://problem/4652388). You're better off inverting the character
set yourself outside the loop and using scanUpToCharactersFromSet:.
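One way this might look, as a sketch only: the inverted set is computed once, and scanning up to a non-alphanumeric character stands in for scanCharactersFromSet: with the alphanumeric set. The function and variable names are hypothetical, and manual retain/release is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects the first alphanumeric token of each line.
static NSArray *BodyNamesFromLines(NSArray *lines)
{
    // Both sets are created once, outside the loop.
    NSCharacterSet *alphanumerics = [NSCharacterSet alphanumericCharacterSet];
    NSCharacterSet *nonAlphanumerics = [alphanumerics invertedSet];

    NSMutableArray *names = [NSMutableArray array];
    NSEnumerator *lineEnumerator = [lines objectEnumerator];
    NSString *line;

    while ((line = [lineEnumerator nextObject])) {
        NSScanner *scanner = [[NSScanner alloc] initWithString:line];
        NSString *name = nil;

        // Skip anything before the first alphanumeric character.
        [scanner scanUpToCharactersFromSet:alphanumerics intoString:NULL];
        // Read the alphanumeric run by scanning up to the first character
        // that is NOT alphanumeric.
        if ([scanner scanUpToCharactersFromSet:nonAlphanumerics intoString:&name])
            [names addObject:name];

        [scanner release];
    }
    return names;
}
```

Checking the return value before adding also keeps a line with no alphanumeric content (such as the "====" separator) from contributing a stale string.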
        [values addObject:stringOfElements];
        // meanRadius
        [scanner scanUpToCharactersFromSet:[NSCharacterSet
            alphanumericCharacterSet]
            intoString:nil];
        [scanner scanDouble:&mR];
        stringOfElements = [[NSNumber numberWithDouble:mR]
            stringValue];
        [values addObject:stringOfElements];
Here you've just created an NSNumber and gotten its stringValue. If
you're interested in the string as an end result, treat it as a string
and avoid all the conversions.
My recommendation would be to use NSScanner to scan each double as a
string, or use -[NSString componentsSeparatedByString:] as someone
else suggested, then trim whitespace from each element.
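As an illustration of the first option, here is a small sketch that pulls each whitespace-delimited field out of a line as a string, with no NSNumber round trip. The function name FieldsFromLine is hypothetical; it relies on NSScanner's default behavior of skipping whitespace before each scan, and assumes manual retain/release.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the whitespace-separated fields of one line
// as strings, e.g. a body name followed by its numeric columns.
static NSArray *FieldsFromLine(NSString *line)
{
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet];
    NSMutableArray *fields = [NSMutableArray array];
    NSScanner *scanner = [[NSScanner alloc] initWithString:line];
    NSString *field = nil;

    // Each pass skips leading whitespace, then reads up to the next
    // space or tab; the scan returns NO once the line is exhausted.
    while ([scanner scanUpToCharactersFromSet:whitespace intoString:&field])
        [fields addObject:field];

    [scanner release];
    return fields;
}
```

The numeric fields stay exactly as written in the file; -doubleValue can be called on an individual field later if a number is needed.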
For better performance, you could also use something like
BDStringCreateComponentsSeparatedByCharacterSetTrimWhitespace() from http://bibdesk.svn.sourceforge.net/viewvc/bibdesk/trunk/bibdesk/CFString_BDSKExtensions.m?view=markup
to create an array from each line, passing [NSCharacterSet
whitespaceCharacterSet]. It may have issues with surrogate pairs, but
that's not likely to matter in your case.
regards,
Adam