re: sorting large tables
- Subject: re: sorting large tables
- From: Ben Trumbull <email@hidden>
- Date: Thu, 13 Mar 2008 14:00:28 -0700
> Unfortunately, the table is not KVO-compliant so I cannot use
> NSSortDescriptor. And I need to pass the primary and secondary sort
> fields at runtime, so I cannot use -sortUsingSelector: since it
> takes no parameters.
NSSortDescriptor requires an object be *KVC* compliant. It does not
use KVO. Since KVC does a pretty good job of inferring keys from
method names, typically one has to actually put some effort into
breaking KVC compliance.
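For instance, something like this sketch would work (hypothetical
Record class with ordinary -lastName/-firstName accessors; since the
keys are plain strings, the primary and secondary fields could be
picked at runtime):

    // Ordinary accessor names are already KVC-compliant, so
    // NSSortDescriptor can use them directly.  The key strings
    // here are hypothetical and could come from user input.
    NSSortDescriptor *primary =
        [[[NSSortDescriptor alloc] initWithKey:@"lastName"
                                     ascending:YES] autorelease];
    NSSortDescriptor *secondary =
        [[[NSSortDescriptor alloc] initWithKey:@"firstName"
                                     ascending:YES] autorelease];
    NSArray *sorted = [records sortedArrayUsingDescriptors:
        [NSArray arrayWithObjects:primary, secondary, nil]];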
Now, NSSortDescriptor does cache some of the results from KVC, so I
suppose if you're doing something pretty whacky like having fake keys
that return the result of a computation every time (like "random" or
a series) you might not get the results you expect. Of course,
trying to sort on such a key is probably semantically meaningless, so
I hereby assert NSSortDescriptor's behavior is still correct ;)
> I am using -sortedArrayUsingFunction:context: to sort a fairly
> large table (100k records). Judging from the readout of comparisons
> being made, the sort appears to be about 80% done when everything
> comes to a halt. No message, no nothing. It just stops.
Run it in gdb, then hit control-C. What does it look like in top?
Is your heap huge? Is the machine thrashing (swapping to disk)?
Also, if it's a stray pointer problem, you could try guard malloc
(man libgmalloc).
> I'd really appreciate suggestions on how to 1) get through this
> file using the function I set up and 2) ideas on alternative
> approaches that will be able to handle much larger files for the
> future.
Sorting much larger files will require some effort on your part.
First, you'll want to tune your current performance. Creating
temporary objects in the comparison function is bad. If you
absolutely have to, make sure you use -alloc/-init and explicitly
-release them, and avoid autoreleased objects and objects from
convenience constructors. -autorelease just won't fly in a comparison
method run over 100,000 objects. This may mean your accessor methods
need to be changed so they don't impair the comparison method.
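For illustration, here's a rough sketch of such a comparison function
for -sortedArrayUsingFunction:context: (the key strings are
hypothetical, and this assumes the keys name object-typed accessors
that hand back an existing ivar; -valueForKey: would box scalar
accessors into temporary objects, which is exactly what you want to
avoid here):

    static NSInteger compareByFields(id a, id b, void *context)
    {
        // context carries two runtime-chosen key strings:
        // { primary, secondary }.
        NSString **keys = (NSString **)context;
        NSComparisonResult r = [[a valueForKey:keys[0]]
                                   compare:[b valueForKey:keys[0]]];
        if (r != NSOrderedSame)
            return r;
        return [[a valueForKey:keys[1]]
                   compare:[b valueForKey:keys[1]]];
    }

    // Usage:
    //   NSString *keys[2] = { @"lastName", @"firstName" };
    //   NSArray *sorted =
    //       [records sortedArrayUsingFunction:compareByFields
    //                                 context:keys];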
You definitely don't want to work with all the data at once. You
want to stream through it in a nice batch size, which is large, but
nothing that will put memory pressure on the overall system.
Depending on your system and deployment target, that could be 10 MB
or 100 MB. Typically there are diminishing returns as the batch size
grows, so some empirical testing (binary search style) should quickly
give you an idea.
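As a rough sketch of that streaming phase, assuming flat C structs
and hypothetical readBatch()/writeRun() I/O helpers:

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { long key; char payload[64]; } Record;

    static int compareRecords(const void *a, const void *b)
    {
        long ka = ((const Record *)a)->key;
        long kb = ((const Record *)b)->key;
        return (ka > kb) - (ka < kb);
    }

    size_t readBatch(FILE *f, Record *buf, size_t max); /* hypothetical */
    void writeRun(const Record *buf, size_t n, unsigned run); /* ditto */

    /* Sort one bounded batch at a time and spill each sorted run
       to disk; peak memory stays near batchSize * sizeof(Record). */
    void sortInRuns(FILE *input, size_t batchSize)
    {
        Record *batch = malloc(batchSize * sizeof(Record));
        size_t n;
        unsigned run = 0;
        while ((n = readBatch(input, batch, batchSize)) > 0) {
            qsort(batch, n, sizeof(Record), compareRecords);
            writeRun(batch, n, run++);
        }
        free(batch);
    }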
One nice thing about sorting is that this is easy to do: merge sort.
You can break up the problem as much as you want, and use any handy
sort function, and then merge the pieces together.
Algorithms for merging are readily available on the net. Ask Google.
It's about one page of code to use qsort_r() and a hand-written merge
function.
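Roughly, reusing the Record layout from the sketch above (careful
with qsort_r() on Mac OS X: the BSD version takes the thunk before
the comparator, and the comparator receives the thunk as its first
argument):

    /* A real thunk could carry runtime-chosen sort fields;
       it's unused here. */
    static int compareRecordsR(void *thunk, const void *a,
                               const void *b)
    {
        long ka = ((const Record *)a)->key;
        long kb = ((const Record *)b)->key;
        return (ka > kb) - (ka < kb);
    }

    /* Merge two sorted runs into dst, which must hold na + nb
       records. */
    static void mergeRuns(const Record *a, size_t na,
                          const Record *b, size_t nb, Record *dst)
    {
        size_t i = 0, j = 0, k = 0;
        while (i < na && j < nb)
            dst[k++] = (a[i].key <= b[j].key) ? a[i++] : b[j++];
        while (i < na) dst[k++] = a[i++];
        while (j < nb) dst[k++] = b[j++];
    }

    /* Usage:
         qsort_r(runA, na, sizeof(Record), NULL, compareRecordsR);
         qsort_r(runB, nb, sizeof(Record), NULL, compareRecordsR);
         mergeRuns(runA, na, runB, nb, merged);                   */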
Merge sort is trivially parallelizable with NSOperationQueue, so you
can speed things up further when you're ready to tackle that. It's
about one page of code to allocate the operations, build their
dependencies, and run a fully concurrent merge sort this way.
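A minimal sketch of that wiring (NSBlockOperation assumes Mac OS X
10.6 or later; on 10.5 you'd subclass NSOperation or use
NSInvocationOperation instead; runA/runB/merged, the counts, and the
helper functions are as in the sketches above):

    NSOperationQueue *queue = [[NSOperationQueue alloc] init];

    NSOperation *sortA = [NSBlockOperation blockOperationWithBlock:^{
        qsort_r(runA, na, sizeof(Record), NULL, compareRecordsR);
    }];
    NSOperation *sortB = [NSBlockOperation blockOperationWithBlock:^{
        qsort_r(runB, nb, sizeof(Record), NULL, compareRecordsR);
    }];
    NSOperation *merge = [NSBlockOperation blockOperationWithBlock:^{
        mergeRuns(runA, na, runB, nb, merged);
    }];

    // The merge may not start until both halves are sorted; the
    // two sorts themselves run concurrently.
    [merge addDependency:sortA];
    [merge addDependency:sortB];

    [queue addOperation:sortA];
    [queue addOperation:sortB];
    [queue addOperation:merge];
    [queue waitUntilAllOperationsAreFinished];
    [queue release];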
--
-Ben