[OT] Time complexity (was: Where is NSList?)
- Subject: [OT] Time complexity (was: Where is NSList?)
- From: Allan Odgaard <email@hidden>
- Date: Thu, 29 Jul 2004 23:59:27 +0200
On 29. Jul 2004, at 21:55, Stephen Checkoway wrote:
> Sorry if this seems pedantic, but...
>
> What many people fail to realize (including some in this thread) is that this is only supposed to be a rough first estimate, since constants are totally ignored in this analysis, but of course they really matter. Let me give an example:
>
> Quicksort has a worst-case bound of Theta(n^2) running time, while merge or heap sort have a worst-case bound of Theta(n log n), which is clearly better. However, both merge and heap sort are slower in practice than quicksort.
While your point about time complexity not revealing constants is valid (although if an algorithm has high constants, that should be added as a footnote), it is not constants that are at play here.

Quicksort is a randomized algorithm and its *expected* running time certainly is as good as or better than that of merge sort, so coupled with the better cache utilization we *expect* it to outperform merge sort -- but only for a (large) subset of inputs. For this reason the STL's introsort is a hybrid which internally switches to heap sort, IIRC when the recursion depth exceeds some threshold, to guarantee a better running time than worst-case quicksort, which, given the (im)proper input, will slow down to a crawl ;)

Introsort furthermore switches to insertion sort when there are fewer than 10-20 elements, because for such low values of n the constants dominate, and since quicksort is recursive, there will be lots of recursive calls with n < 20 or so.
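Here is a rough sketch of such a hybrid (my own illustration, not the actual STL implementation): quicksort that falls back to heap sort once a recursion depth budget is used up, and to insertion sort for small ranges. The depth limit of 2*lg n and the small-range threshold of 16 are just commonly used values, not anything from the post.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

const std::ptrdiff_t kSmallRange = 16;   // below this, constants favour insertion sort

template <typename It>
void insertionSort(It first, It last)
{
    for (It i = first; i != last; ++i)
        for (It j = i; j != first && *j < *(j - 1); --j)
            std::iter_swap(j, j - 1);
}

template <typename It>
It lomutoPartition(It first, It last)
{
    // Take the last element as pivot (median-of-three would be better);
    // afterwards the pivot sits in its final sorted position.
    auto pivot = *(last - 1);
    It store = first;
    for (It it = first; it != last - 1; ++it)
        if (*it < pivot)
            std::iter_swap(it, store++);
    std::iter_swap(store, last - 1);
    return store;
}

template <typename It>
void introsortImpl(It first, It last, int depthLimit)
{
    while (last - first > kSmallRange)
    {
        if (depthLimit-- == 0)
        {
            // Too many recursion levels: fall back to heap sort, which
            // guarantees O(n lg n) even on quicksort's worst-case input.
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }
        It p = lomutoPartition(first, last);
        introsortImpl(p + 1, last, depthLimit);   // recurse on the right part ...
        last = p;                                 // ... and loop on the left part
    }
    insertionSort(first, last);                   // small range: insertion sort
}

template <typename It>
void introsort(It first, It last)
{
    std::ptrdiff_t n = last - first;
    int depthLimit = n > 1 ? 2 * static_cast<int>(std::log2(static_cast<double>(n))) : 0;
    introsortImpl(first, last, depthLimit);
}

int main()
{
    std::vector<int> v = { 5, 3, 9, 1, 4, 8, 2, 7, 6, 0 };
    introsort(v.begin(), v.end());
    for (int x : v) std::cout << x << ' ';        // prints: 0 1 2 3 4 5 6 7 8 9
    std::cout << '\n';
    return 0;
}

The point is only the structure: quicksort does the bulk of the work, heap sort caps the worst case, insertion sort handles the small tails.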
> Here's an example related to data structures rather than algorithms that operate on them. An implementation of a stack (or a queue) using a linked list has a best, average, and worst case running time of O(1) for push/pop (enqueue/dequeue). An array-based implementation on the other hand only has an amortized running time of O(1), since pushing too many elements causes the array to be resized, which takes linear time.
But that is sort of what the amortization is about. Adding n elements to the array will be O(n), so for adding n elements to an array, the time complexity is exactly the same as for the list. But adding only *one* element to the array may, for very few elements (every 2^n'th with the doubling strategy), be O(n), whereas for the rest it is O(1).

So again, it is easy to make an example where the array is in fact slower than the linked list: e.g. create an array with 16,383 elements (adding the next should then resize it to 64 KB) and time that add, then time an add for a list with the same number of elements. The list add will be much faster.
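A minimal sketch of that experiment, using std::vector and std::list (the reserve() call just forces the capacity to run out at exactly 16,383 elements, so the result does not depend on the library's particular growth policy):

#include <chrono>
#include <cstddef>
#include <iostream>
#include <list>
#include <vector>

int main()
{
    const std::size_t n = 16383;

    std::vector<int> vec;
    vec.reserve(n);                        // capacity is exactly n ...
    for (std::size_t i = 0; i < n; ++i)
        vec.push_back(static_cast<int>(i));

    std::list<int> lst(vec.begin(), vec.end());

    using clock = std::chrono::steady_clock;
    auto t0 = clock::now();
    vec.push_back(0);   // ... so this push reallocates and copies all n elements: O(n)
    auto t1 = clock::now();
    lst.push_back(0);   // one node allocation, no copying: O(1)
    auto t2 = clock::now();

    std::cout << "vector push (with resize): "
              << std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count()
              << " ns\n"
              << "list push:                 "
              << std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count()
              << " ns\n";
    return 0;
}

The vector push that triggers the reallocation has to copy all 16,383 existing elements, while the list push allocates and links a single node -- but that is the only push out of thousands where the list wins.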
> However, in practice, it is much faster to use an array and maintain an index (or two indices in the case of a queue) rather than a pointer to the top (or head and tail).
That is because you allocate memory for each list node, and memory allocation is often not O(1), so effectively adding to the list is not O(1) either (but secondly there are the constants, which are higher for the list because it requires setting up 4 pointers, and finally there are the cache advantages of the array).
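To make the "array plus indices" idea concrete, here is a rough sketch of a queue kept in one contiguous buffer with a head index and a size (the class name and the doubling growth policy are my own choices for the illustration): once the buffer is big enough, enqueue and dequeue never touch the allocator at all, and the elements stay contiguous for the cache.

#include <cstddef>
#include <iostream>
#include <vector>

template <typename T>
class ArrayQueue
{
public:
    ArrayQueue() : buffer_(8), head_(0), size_(0) {}

    void enqueue(const T &value)
    {
        if (size_ == buffer_.size())
            grow();                                   // rare; amortized O(1) overall
        buffer_[(head_ + size_) % buffer_.size()] = value;
        ++size_;
    }

    T dequeue()                                       // caller must check empty() first
    {
        T value = buffer_[head_];
        head_ = (head_ + 1) % buffer_.size();
        --size_;
        return value;
    }

    bool empty() const { return size_ == 0; }

private:
    void grow()
    {
        std::vector<T> bigger(buffer_.size() * 2);
        for (std::size_t i = 0; i < size_; ++i)       // unwrap into the new buffer
            bigger[i] = buffer_[(head_ + i) % buffer_.size()];
        buffer_.swap(bigger);
        head_ = 0;
    }

    std::vector<T> buffer_;   // contiguous storage, no per-element allocation
    std::size_t head_;        // index of the front element
    std::size_t size_;        // number of elements currently stored
};

int main()
{
    ArrayQueue<int> q;
    for (int i = 0; i < 100; ++i) q.enqueue(i);
    while (!q.empty()) std::cout << q.dequeue() << ' ';   // 0 1 2 ... 99, FIFO order
    std::cout << '\n';
    return 0;
}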
But here we are comparing algorithms with the same time complexity, and here it might be useful to look at the constants. When algorithms have different time complexity, however, constants rarely matter.
The time complexity is really only a measure of how the algorithm scales, which is what one should care about when choosing/designing data structures/algorithms. Which type of balanced tree we use for our data affects performance much less than, say, choosing the balanced tree over an array with linear searches.
Furthermore, the time complexity tells us whether we can expect problems. E.g. having something which is O(n^2) or higher is most likely going to be a problem for some user (if n is unbounded); changing this to, say, O(n lg n) will in 99% of the cases not be a problem for any user, no matter how high n gets.

For readers following this: try to plot the two graphs if you are not intuitively familiar with them. As can be seen, the first one explodes as n increases, whereas the second behaves quite nicely and stays "bounded".
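If you would rather print a table than plot, this little program shows the two growth rates side by side (the range of n is arbitrary):

#include <cmath>
#include <cstdio>

int main()
{
    std::printf("%10s %16s %14s\n", "n", "n^2", "n lg n");
    for (long long n = 10; n <= 10000000LL; n *= 10)
        std::printf("%10lld %16lld %14.0f\n",
                    n,
                    n * n,                                              // quadratic growth
                    static_cast<double>(n) * std::log2(static_cast<double>(n)));  // near-linear growth
    return 0;
}

Already at n = 10,000 the two columns differ by a factor of roughly 750 (n / lg n), and the gap keeps widening as n grows.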
O(n^2) can often occur in simple applications, like e.g. iTunes: assume we have all songs in an array and another array with the selected artists, and we now need to construct the array of all songs by the selected artists. A naive implementation would iterate over the array of selected artists and, for each, iterate over all the songs and copy those which match.

Imagine I have 10,000 songs, all by different artists, and select all the artists: that is 10,000 x 10,000 iterations = 100,000,000 -- even on my G4, that is an operation I can feel, and I am quite sure that a lot of programs work like this ;)
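For concreteness, here is a rough sketch of that naive filter next to the obvious fix -- look each song's artist up in a hash set instead of rescanning all songs per artist (representing songs and artists as plain strings is just a simplification for the example):

#include <cstdio>
#include <string>
#include <unordered_set>
#include <vector>

struct Song
{
    std::string title;
    std::string artist;
};

// O(artists * songs): for 10,000 songs by 10,000 selected artists this is
// the 100,000,000 iterations mentioned above.
std::vector<Song> naiveFilter(const std::vector<Song> &songs,
                              const std::vector<std::string> &selectedArtists)
{
    std::vector<Song> result;
    for (const std::string &artist : selectedArtists)
        for (const Song &song : songs)
            if (song.artist == artist)
                result.push_back(song);
    return result;
}

// O(artists + songs): build a set of the selected artists once, then scan
// the songs a single time.
std::vector<Song> setFilter(const std::vector<Song> &songs,
                            const std::vector<std::string> &selectedArtists)
{
    std::unordered_set<std::string> selected(selectedArtists.begin(),
                                             selectedArtists.end());
    std::vector<Song> result;
    for (const Song &song : songs)
        if (selected.count(song.artist))
            result.push_back(song);
    return result;
}

int main()
{
    std::vector<Song> songs = { { "Song A", "Artist 1" }, { "Song B", "Artist 2" } };
    std::vector<std::string> selected = { "Artist 2" };
    std::vector<Song> hits = setFilter(songs, selected);
    std::printf("%zu matching song(s)\n", hits.size());   // prints: 1 matching song(s)
    return 0;
}

The naive version does selectedArtists.size() * songs.size() comparisons; the set-based one does a single pass over each array, so it stays comfortable no matter how large the library gets.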