Re: Reading in dictionary from txt file: options for speed
- Subject: Re: Reading in dictionary from txt file: options for speed
- From: Michael Ash <email@hidden>
- Date: Tue, 14 Apr 2009 19:06:46 -0400

On Tue, Apr 14, 2009 at 2:12 PM, Miles <email@hidden> wrote:
> [This is sort of in continuation of the thread "Build Settings for Release:
> App/Library is bloated", which gradually changed topics.]
> I'm trying to find the best way to load in a 2MB text file of dictionary
> words and be able to do quick searches.
>
> Simply loading the uncompressed txt file takes about 0.5 seconds which I can
> handle. But when I used the following to create an array of the words from
> the file:
> NSArray *lines = [stringFromFileAtPath componentsSeparatedByString:@"\n"];
>
> ... it took about 13 seconds, which is way too long.
>
>
> I'm not super concerned about the 2MB of disk space the txt file takes up,
> although I wouldn't be mad about decreasing it somehow. And once I get the
> whole dictionary in an array, the searches are basically fast enough for my
> purposes. I've still been reading up on Huffman encoding if I decide to try
> to compress this. However, my main issue now is loading time, and it seems
> like this won't help me there.

For best loading time, you want a file format which can be loaded
on-demand, instead of all at once up front. Designing your own such
format is highly non-trivial, and I don't recommend doing that at
least until you're at the point where you're ready to ignore
recommendations from the likes of me. SQLite has this property and
would be a good choice of storage format if that's what you're after.
The tradeoff, of course, is that the per-query time goes up. If you
can stand queries taking longer (but still being individually
instantaneous from the user's point of view) in exchange for nearly
zero load time, this is a good way to go.

If you really do want to load everything up front, my recommendation
would be to do as much parsing as possible on the C side of things
before you move over to Cocoa. Rather than load the entire file into
an NSString and then split it up from there, read the raw bytes,
search for \n directly, and then load the individual lines into
NSStrings. NSString has a lot of fancy capabilities like Unicode
awareness that you simply don't need for this, and which will cost you
a lot.
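
To make the "read the raw bytes, search for \n directly" suggestion concrete, here is a minimal sketch in plain C. It is only an illustration under some assumptions: the names WordSpan, read_whole_file and split_lines are made up for this example, the file is assumed to hold one word per line, and empty lines are skipped. Nothing is copied per word; each entry just points into the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One word: a pointer into the file buffer plus a length.
   The bytes are NOT NUL-terminated. (Hypothetical type for this sketch.) */
typedef struct {
    const char *start;
    size_t      length;
} WordSpan;

/* Read an entire file into a malloc'd buffer. Returns NULL on failure;
   on success stores the byte count in *out_size. Caller frees the buffer. */
char *read_whole_file(const char *path, size_t *out_size)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return NULL;

    char *buffer = NULL;
    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0 &&
        (size = ftell(fp)) >= 0 &&
        fseek(fp, 0, SEEK_SET) == 0) {
        buffer = malloc((size_t)size + 1);
        if (buffer != NULL &&
            fread(buffer, 1, (size_t)size, fp) != (size_t)size) {
            free(buffer);
            buffer = NULL;
        }
    }
    fclose(fp);

    if (buffer != NULL) {
        buffer[size] = '\0';
        *out_size = (size_t)size;
    }
    return buffer;
}

/* Split a buffer on '\n' without copying any text. Returns a malloc'd array
   of spans (caller frees) and stores the number of words in *out_count.
   A trailing '\r' is dropped and empty lines are skipped. */
WordSpan *split_lines(const char *buffer, size_t size, size_t *out_count)
{
    assert(buffer != NULL && out_count != NULL);

    size_t capacity = 1024, count = 0;
    WordSpan *spans = malloc(capacity * sizeof *spans);
    if (spans == NULL) return NULL;

    const char *cursor = buffer;
    const char *end = buffer + size;
    while (cursor < end) {
        /* memchr scans raw bytes; no Unicode processing is involved. */
        const char *newline = memchr(cursor, '\n', (size_t)(end - cursor));
        const char *line_end = (newline != NULL) ? newline : end;
        size_t length = (size_t)(line_end - cursor);

        if (length > 0 && cursor[length - 1] == '\r') length--;

        if (length > 0) {
            if (count == capacity) {
                capacity *= 2;
                WordSpan *grown = realloc(spans, capacity * sizeof *spans);
                if (grown == NULL) { free(spans); return NULL; }
                spans = grown;
            }
            spans[count].start = cursor;
            spans[count].length = length;
            count++;
        }
        cursor = (newline != NULL) ? newline + 1 : end;
    }

    *out_count = count;
    return spans;
}
```

On the Cocoa side, each span could then be handed to NSString's -initWithBytes:length:encoding: when (and only when) you actually need an NSString for that word. Whether this beats componentsSeparatedByString: by enough for your case is something to measure rather than assume.
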
Using an existing format with existing optimizations, like Apple's
binary plist format, could also be a good way to go here.

Mike