RE: objective-c / cocoa efficiency
RE: objective-c / cocoa efficiency
- Subject: RE: objective-c / cocoa efficiency
- From: Jeff Laing <email@hidden>
- Date: Wed, 9 Mar 2005 12:59:59 +1100
> If you really have that many strings and want to pick out one of them
> quickly, you should put them in a dictionary (i.e. NSDictionary), or
> hash table (i.e. unordered_set). A balanced tree (i.e.
> std::set) would
> also be an improvement.
Well, firstly, I did say this two paragraphs later on:
(quote)
(Yes, this is way suboptimal, use a hash-table or something similiar or
better, a flex parser - in both cases, these are harder to code and keep
straight during development (when addition of new keywords require more than
just a .c recompile)
(/quote)
> If you want efficiency, switching on the first character and then a
> block of strcmp's is really a pretty lame solution. You've just moved
> the problem one character deeper into the string—in other words, you're
> taking an O(n) problem and just dividing by a constant for an O(n/k)
> solution. Why settle for less when there are constant-time (hashing) or
> logarithmic-time (tree-based) solutions?
The thing is, when you are pissing about at small values of N, O(n) becomes
computationally confusing where cost vs benefit becomes non-intuitive.
qsort() typically cripples itself back to a bubble sort for low N - no-one
ever points out that qsort() has O(N*N) behaviour for small N, because of
the overhead it has to set up.
Lets assume 2 keywords for each of 'a' to 'z'. Clearly, its better to do 2
compares than 52. Yes, k=26 isn't big in this case, but so what? Its a
reasonable gain for very little coding effort. And if going to a hash table
takes me from 2 to 1, then its harder to get excited about it. Thats, if it
does.
Its quite conceivable that I only have one token per first letter in my
recognition table. In those cases, computing the hash of the input string
is going to be pretty close to the cost of comparing two strings, especially
on architectures that have a "smart" strcmp(). The hash solution still has
to do a strcmp() anyway to ensure that the value it hashed to is actually
the one being sought, since multiple strings can hash to the same value
(unlikely, but it has to be coded that way). So, in the case where I have
one keyword per letter, I can expect the hash solution to be SLOWER than the
strcmp()
Similiarly, there are about 230 first characters that will *never* have a
token in the table, and the switch discards them immediately without
calculating their hashes at all. Thus, guaranteed faster in those cases.
Note that I'm not saying the switch/strcmp() approach is optimal in all
cases, just that its more efficient that 100 consecutive strcmp()'s, and its
not guaranteed that its worse than a hash table (in my experience).
Personally, I think the sun shines out of flex parsers in this domain, but
the tough part is the overhead of precanning an input stream for it to work
through, and dealing with its tendency to "look ahead".
Having said that, I've written compilers that don't waste their time using
flex to recognise individual keywords because its just as easy to binary
search through our limited keyword table afterward, and its a lot easier to
write *re-usable* lexical scanners that way - we have one scanner we use in
ALL our compilers, each of which have different keywords.
The switch/strcmp() approach is a lot easier to read/debug than a flex
parser. Of course, its also easier to goof up (by forgetting the strcmp()
returns FALSE for a match).
In conclusion, let me sum up by saying that the original contention that I
was responding to (that the person recommending switch/strcmp() was a twit),
I'd disagree strongly without knowing all the facts because in *my*
experience, its a valid and very efficient construct for specific problems.
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Cocoa-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden