Mailing Lists: Apple Mailing Lists


Re: Sorting Problem



Dave Taflin wrote:
> For grins, have you tried shell commands?
> 
> cat unsorted.txt | sort > sorted.txt

Unneeded "cat".  That should probably be:

    sort +0.0 +0.10 +0.20 unsorted.txt > sorted.txt

to sort on the keys that start at character columns 0, 10, and 20.
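
For what it's worth, the "+POS" key syntax above is the old, obsolescent form, and newer sort implementations may refuse it. A rough equivalent in the "-k" form might look like the sketch below. The three sample lines are made up for illustration, the 10-character column widths are an assumption, and the tab handed to "-t" only works if the data itself contains no tabs (it makes each whole line count as a single field, so the character offsets are measured from the start of the line).

```sh
# Made-up sample: three fixed-width columns of 10 characters each.
printf '%s\n' \
  'bbbbbbbbbb0000000002cccccccccc' \
  'aaaaaaaaaa0000000002zzzzzzzzzz' \
  'aaaaaaaaaa0000000001yyyyyyyyyy' > unsorted.txt

# Assumption: no tabs in the data, so using a tab as the field separator
# turns every line into one field and -k1.N counts characters from column 1.
TAB="$(printf '\t')"

# -k is 1-based, so the old zero-based +0.10 becomes -k1.11, and so on.
# LC_ALL=C forces plain byte-order comparison.
LC_ALL=C sort -t "$TAB" -k1.1,1.10 -k1.11,1.20 -k1.21,1.30 \
  unsorted.txt > sorted.txt
```

Each "-k" here has an explicit end position, so the first key stops at character 10 instead of running to the end of the line.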

However, I don't think you'd want to pull all 2 GB of text data into memory
for the sort.  I'm not sure how gracefully UNIX sort handles out-of-core
processing at that size, which is what you really want for this sort (heh)
of thing.  If it doesn't, you'd kill your computer with swapping.

However, you might be able to use normal UNIX tools like "cut" to get rid of
the unneeded fields and reduce the problem tremendously.  Once you get the
number of lines down, UNIX sort might be a good bet.  If you replace the
unneeded fields with a simple line number, then you have a simple way of
correlating the sorted results with your original file.

    % cut -c1-29 unsorted.txt | cat -n > trimmed.txt
    % sort +0.7 +0.17 +0.27 trimmed.txt > sorted.txt

I just tried this with a ginned-up file of ~354,000 lines, and it took
less than 8 seconds to sort on my 500 MHz G4.  Each line of the final
"sorted.txt" starts with a number that gives that line's position in
the original file.

I leave it as an exercise for the reader to figure out how to merge the
sorted data back with the "unneeded" fields.
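
One possible answer to that exercise, as a minimal sketch rather than the definitive method: let awk read the original file into an array indexed by line number, then walk "sorted.txt" and print the original line that each leading number points to. The sample data, the "merged.txt" name, and the "-t"/"-k2" form of the sort are all assumptions made for the example, and holding the whole original file in awk's memory is only reasonable when that file fits in RAM.

```sh
# Made-up sample: a 29-character key followed by "unneeded" payload.
printf '%s\n' \
  'cccccccccc0000000002aaaaaaaaa:first' \
  'aaaaaaaaaa0000000002bbbbbbbbb:second' \
  'aaaaaaaaaa0000000001ccccccccc:third' > unsorted.txt

# Trim to the key columns and prefix each line with its line number.
# cat -n writes the number, a tab, then the line.
cut -c1-29 unsorted.txt | cat -n > trimmed.txt

# Sort on everything after the tab (field 2 when tab is the separator).
TAB="$(printf '\t')"
LC_ALL=C sort -t "$TAB" -k2 trimmed.txt > sorted.txt

# First file: remember every original line under its line number.
# Second file: $1 is the original line number, so print that line.
# Caveat: this keeps the whole original file in memory.
awk 'NR == FNR { line[FNR] = $0; next }
     { print line[$1 + 0] }' unsorted.txt sorted.txt > merged.txt
```

With this sample, "merged.txt" ends up holding the full original lines in key order. For a file too large to hold in memory, some other lookup (an index of byte offsets, say) would be needed instead.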

-Sean

__
email@hidden
925-422-1648
_______________________________________________
scitech mailing list | email@hidden





Copyright © 2007 Apple Inc. All rights reserved.