Re: NSScanner Sanity Check

  • Subject: Re: NSScanner Sanity Check
  • From: "Adam R. Maxwell" <email@hidden>
  • Date: Tue, 8 May 2007 19:53:17 -0700


On May 8, 2007, at 10:47, James Hillhouse wrote:

I was working today on reading keplerian element data files (text, maybe utf-8) using NSString and NSScanner. Here's what I came up with. To see a more readable version, you can go to my posting at http://cocoacoder.org/cocoacoderblog/?p=20#more-20

Here's the problem. I'm reading in a potentially large data file of Keplerian elements, which define an orbit, and I was trying to read the following from the file:

Planet Name
Mean Radius
GravitationalParameter
Eccentricity

This is only a small bit of the info generally included in a keplerian data file. But I figured that if I could read the above info, more info was icing on the cake. I'll go ahead and answer the question of why didn't I use:

[scanner scanCharactersFromSet:[NSCharacterSet decimalDigitCharacterSet]
                    intoString:&stringOfElements];


My main reason was that it didn't work.

It does work when I've used it. I glanced at your blog post, and it appears that you expected "." to be part of the decimal digit character set; the documentation says that it's 0-9, so that may explain your problem.
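To illustrate the point, here is a minimal sketch of scanning a token such as "398600.4" as a string. It assumes the numbers in the file use only ASCII digits and a period (no signs or exponents), and the helper name ScanNumericToken is made up for this example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the next run of digits and periods as a string,
// or nil if the scanner finds none before the end of its string.
static NSString *ScanNumericToken(NSScanner *scanner)
{
    // decimalDigitCharacterSet does not contain ".", so spell out the set.
    // Letters such as "e" are left out on purpose: they would match names like "Earth".
    NSCharacterSet *numericSet =
        [NSCharacterSet characterSetWithCharactersInString:@"0123456789."];
    NSString *token = nil;

    // Skip anything that cannot be part of a number (for example, the body name).
    [scanner scanUpToCharactersFromSet:numericSet intoString:NULL];

    if ([scanner scanCharactersFromSet:numericSet intoString:&token])
        return token;
    return nil;
}
```

Calling it repeatedly on a scanner for one data line should return the numeric columns in order. If you need the value as a number rather than a string, -scanDouble: is the more direct route.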


If what I've done is not elegant or optimal, could someone chime in and tell me where?

As far as optimal, the only way you'll be able to tell for sure is by profiling it with Shark, preferably on a really large file. In spite of that, I'll make a few suggestions, having been bitten by NSScanner performance problems too many times.


/*
Small sample of Keplerian data containing

Body    Mean Radius mu          eccentricity
============================================
Moon    1738.0      4902.799    0.05490
Earth   6378.1363   398600.4    0.016708617

that is included in a file PlanetaryData.txt in my bundle :-)
*/

- (void)readDataFileAndStore
{

   NSString *bodyName;
   NSString *meanRadius;
   NSString *gravParameter;
   NSString *eccentricity;


   double mR;
   double gP;
   double ecc;

   NSString *stringOfElements = nil;
   NSString *tmp;

   NSString *dataFilePath;
   dataFilePath = [[NSBundle mainBundle] pathForResource:@"PlanetaryData"
                                                  ofType:@"txt"];


   // This separates out the lines of data
   NSArray *lines = [[NSString stringWithContentsOfFile:dataFilePath]
                            componentsSeparatedByString:@"\n"];

If you have to deal with other newline characters (\r, \r\n, or the Unicode line breaks), you should take additional steps. See -[NSString sourceLinesBySplittingString] in http://bibdesk.svn.sourceforge.net/viewvc/bibdesk/trunk/bibdesk/NSString_BDSKExtensions.m?view=markup for an example.
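One way to do this with Foundation alone is sketched below. It is not the BibDesk method referenced above, and the function name LinesOfString is made up. It relies on -getLineStart:end:contentsEnd:forRange:, which recognizes \n, \r, \r\n and the Unicode line and paragraph separators.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits a string into lines without their terminators,
// whatever newline convention the file uses.
static NSArray *LinesOfString(NSString *string)
{
    NSMutableArray *lines = [NSMutableArray array];
    NSUInteger length = [string length];
    NSUInteger lineStart = 0, lineEnd = 0, contentsEnd = 0;

    while (lineEnd < length) {
        // Ask for the line containing the character at lineEnd:
        // contentsEnd excludes the terminator, lineEnd includes it.
        [string getLineStart:&lineStart
                         end:&lineEnd
                 contentsEnd:&contentsEnd
                    forRange:NSMakeRange(lineEnd, 0)];
        [lines addObject:[string substringWithRange:
                             NSMakeRange(lineStart, contentsEnd - lineStart)]];
    }
    return lines;
}
```

Unlike componentsSeparatedByString:@"\n", this does not produce a trailing empty string when the file ends with a newline.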

   NSEnumerator *nse = [lines objectEnumerator];

   NSMutableArray *values = [NSMutableArray new];

   while(tmp = [nse nextObject])
   {
       NSScanner *scanner = [NSScanner scannerWithString:tmp];

If you have a large number of lines, you're going to fill an autorelease pool rather quickly by creating an autoreleased NSScanner for every line. Better to create it per-line with alloc/init and release it at the end of the loop.
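A rough sketch of that pattern follows. The function name FirstTokensOfLines is made up, and it only pulls the first whitespace-delimited token from each line to keep the example short. The explicit release applies to manual reference counting; the guard lets the same code compile under ARC, where the compiler handles it.

```objc
#import <Foundation/Foundation.h>

#ifndef __has_feature
#define __has_feature(x) 0   // compilers without __has_feature: assume no ARC
#endif

// Hypothetical helper: returns the first token of every non-blank line.
static NSArray *FirstTokensOfLines(NSArray *lines)
{
    NSMutableArray *tokens = [NSMutableArray array];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet];

    for (NSString *line in lines) {
        // alloc/init instead of +scannerWithString: keeps the scanner
        // out of the autorelease pool.
        NSScanner *scanner = [[NSScanner alloc] initWithString:line];
        NSString *token = nil;

        if ([scanner scanUpToCharactersFromSet:whitespace intoString:&token])
            [tokens addObject:token];

#if !__has_feature(objc_arc)
        [scanner release];   // done with this line's scanner
#endif
    }
    return tokens;
}
```

The scanned strings themselves are still autoreleased, so for very large files an inner autorelease pool drained every few thousand lines may also be worth trying.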



       // bodyName
       [scanner scanUpToCharactersFromSet:[NSCharacterSet alphanumericCharacterSet]
                               intoString:nil];
       [scanner scanCharactersFromSet:[NSCharacterSet alphanumericCharacterSet]
                           intoString:&stringOfElements];

Be careful here: under some conditions, scanCharactersFromSet is grossly inefficient since it creates an autoreleased, inverted character set. This can quickly blow up your autorelease pool as well (rdar://problem/4652388). You're better off inverting the character set yourself outside the loop and using scanUpToCharactersFromSet:.
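Something along these lines is what I have in mind. It is only a sketch: the function name BodyNamesFromLines is made up, and it uses the autoreleased +scannerWithString: purely for brevity.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first alphanumeric run of each line,
// using a character set that is inverted once, outside the loop.
static NSArray *BodyNamesFromLines(NSArray *lines)
{
    NSCharacterSet *alnum = [NSCharacterSet alphanumericCharacterSet];
    NSCharacterSet *nonAlnum = [alnum invertedSet];   // computed a single time
    NSMutableArray *names = [NSMutableArray array];

    for (NSString *line in lines) {
        NSScanner *scanner = [NSScanner scannerWithString:line];
        NSString *name = nil;

        // Skip ahead to the first alphanumeric character, if any.
        [scanner scanUpToCharactersFromSet:alnum intoString:NULL];

        // Scanning "up to a non-alphanumeric" collects the same characters
        // that scanCharactersFromSet:alnum would have collected.
        if ([scanner scanUpToCharactersFromSet:nonAlnum intoString:&name])
            [names addObject:name];
    }
    return names;
}
```

Lines with no alphanumeric characters at all, such as the "====" separator, contribute nothing to the result.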


       [values addObject:stringOfElements];

       // meanRadius
       [scanner scanUpToCharactersFromSet:[NSCharacterSet alphanumericCharacterSet]
                               intoString:nil];
       [scanner scanDouble:&mR];
       stringOfElements = [[NSNumber numberWithDouble:mR] stringValue];
       [values addObject:stringOfElements];

Here you've just created an NSNumber and gotten its stringValue. If you're interested in the string as an end result, treat it as a string and avoid all the conversions.


My recommendation would be to use NSScanner to scan each double as a string, or use -[NSString componentsSeparatedByString:] as someone else suggested, then trim whitespace from each element.
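The componentsSeparatedByString: route might look like the sketch below. It assumes the columns are separated by runs of spaces, as in the sample data, and the function name FieldsOfLine is made up.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits one data line into its non-empty fields.
// Assumes columns are separated by spaces; a tab between two values
// would leave them joined in one field.
static NSArray *FieldsOfLine(NSString *line)
{
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet];
    NSMutableArray *fields = [NSMutableArray array];

    for (NSString *piece in [line componentsSeparatedByString:@" "]) {
        NSString *trimmed = [piece stringByTrimmingCharactersInSet:whitespace];
        // Runs of spaces produce empty components; drop them.
        if ([trimmed length] > 0)
            [fields addObject:trimmed];
    }
    return fields;
}
```

Each field stays a string, so values like "0.05490" keep their original formatting with no round trip through NSNumber.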

For better performance, you could also use something like BDStringCreateComponentsSeparatedByCharacterSetTrimWhitespace() from http://bibdesk.svn.sourceforge.net/viewvc/bibdesk/trunk/bibdesk/CFString_BDSKExtensions.m?view=markup to create an array from each line, passing [NSCharacterSet whitespaceCharacterSet]. It may have issues with surrogate pairs, but that's not likely to matter in your case.

regards,
Adam