Re: Reading in dictionary from txt file: options for speed
Re: Reading in dictionary from txt file: options for speed
- Subject: Re: Reading in dictionary from txt file: options for speed
- From: WT <email@hidden>
- Date: Wed, 15 Apr 2009 05:12:12 +0200
You know, Miles, I've been thinking about something else you asked
earlier, about storing the trie on disk and loading it that way,
rather than load the data first and build the trie afterwards.
A trie is a tree structure, and so is a plist, so I think you could
combine both and save time in the loading, in the processing after
loading, and in the searching. Build the trie off-line and save it as
a (binary? xml?) plist, then load it directly with NSArray's
arrayWithContentsOfFile.
More specifically, using the example from <http://en.wikipedia.org/wiki/Trie
>, you'd create a plist off-line with the contents:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd
">
<plist version="1.0">
<array>
<dict>
<key>t</key>
<array>
<dict>
<key>o</key>
<string>to</string> <-- you could drop this (see text)
</dict>
<dict>
<key>e</key>
<array>
<dict>
<key>a</key>
<string>tea</string> <-- you could drop this
(see text)
</dict>
<dict>
<key>d</key>
<string>ted</string> <-- you could drop this
(see text)
</dict>
<dict>
<key>n</key>
<string>ten</string> <-- you could drop this
(see text)
</dict>
</array>
</dict>
</array>
</dict>
<dict>
<key>A</key>
<string>A</string> <-- you could drop this (see text)
</dict>
<dict>
<key>i</key>
<array>
<dict>
<key>n</key>
<array>
<dict>
<key>n</key>
<string>inn</string> <-- you could drop this
(see text)
</dict>
</array>
</dict>
</array>
</dict>
</array>
</plist>
The rationale is that each node is an array of dictionaries. Each
dictionary's key is a letter and the associated value is either a
string or another array (representing another node). Thus, in the
example from the Wikipedia page, the root node is an array of 3
dictionaries. The first has the key "t" and its value is an array, the
second has the key "A" and its value is the string "A", and the third
dictionary has the key "i" and its value is another array. And so on.
(You might even not have the string values at all, in which case each
key is either associated with an array, or with a boolean value that
tells whether the key in question terminates a valid word). The actual
strings stored in the dictionary are then the concatenation of keys.
This is more uniform and saves space, which means it will save on
loading time too. Also, since the 26 letters will be cached by the OS,
it won't matter how many keys you have. The cost of storing them is
just the cost of storing their pointers.)
Once you create this plist off-line for your entire dictionary, you
can load it with NSArray's arrayWithContentsOfFile and you have your
trie already in memory. All you need then is a search function, which
essentially traverses the tree, concatenating keys and comparing them
with the target word.
I am not completely sure that this will save you time in the end, and
holding the entire trie in memory might be a problem, but it might be
worth a try. Only you can decide.
You might also want to use OMNI's trie implementation <http://www.omnigroup.com/developer/
>. That will save you both coding and debugging time.
Wagner
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden