Re: Best way of identifying duplicate files in Cocoa


  • Subject: Re: Best way of identifying duplicate files in Cocoa
  • From: Michael Watson <email@hidden>
  • Date: Tue, 20 Nov 2007 17:48:46 -0500

I implemented MD5 hashing and comparison in a file diff utility I wrote for internal use, and I have to say, it was *fast* with tens of thousands of files of varying size (anywhere from 4 KB to dozens of megabytes).


-- m-s

On 20 Nov, 2007, at 16:42, Frank Reiff wrote:

Hi Jean-Daniel,

Thanks for your response.

On 16 Nov 2007, at 14:46, Jean-Daniel Dupas wrote:


On 16 Nov 2007, at 14:25, Frank Reiff wrote:
Another issue is of course performance. Comparing byte-by-byte is certainly the simplest and most reliable way of doing this, but it's SLOW. On the other hand, I don't really know what the performance characteristics of an MD5, CRC32, or SHA hash are, or whether you need to read in the whole file contents to apply them.
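
On the question of reading the whole file: a digest such as MD5 or SHA depends on every byte, so the full contents do have to be read once, but they can be streamed in fixed-size chunks instead of being loaded into memory in one go. A minimal sketch in Python, used here only for brevity; `file_digest`, its parameters, and the 64 KB chunk size are illustrative assumptions, not anything posted in this thread:

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=64 * 1024):
    """Return the hex digest of a file, reading it in fixed-size chunks.

    `algorithm` is any name accepted by hashlib.new(), e.g. "md5" or "sha1".
    """
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # f.read() returns b"" at end of file, which stops the iteration.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Memory use stays bounded by `chunk_size` regardless of file size; the cost is one full sequential read per file.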

It would thus be great if somebody, somewhere had published a ready-to-use - (BOOL) file: (NSString*) path isIdenticalTo: (NSString*) path2; method :-)

I've spent the last two hours searching the web, but I haven't found anything that comes close.

You don't have to check byte-by-byte if the two files have different sizes.
Besides, comparing byte-by-byte is not so slow, as you can abort the comparison as soon as two bytes differ.


Using a hash has no benefit when comparing two files on disk. It's only useful if you want to compare a remote file (with a precomputed hash) against a local file.

I'll probably be going with:

* check length
* check last few bytes (files often begin with the same bytes but do not end with them)
* check byte-by-byte
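
The three steps above could be sketched roughly as follows. This is Python rather than Cocoa, purely to keep it short; `files_identical`, `tail_size`, and `chunk_size` are hypothetical names and values chosen for illustration, and the "last few bytes" are arbitrarily taken to be 64:

```python
import os


def files_identical(path1, path2, tail_size=64, chunk_size=64 * 1024):
    """Return True if the two files have exactly the same contents."""
    # Step 1: files of different length cannot be identical.
    size = os.path.getsize(path1)
    if size != os.path.getsize(path2):
        return False
    if size == 0:
        return True  # two empty files

    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        # Step 2: compare the last few bytes, a cheap early rejection
        # for files that share a common header.
        tail = min(tail_size, size)
        f1.seek(-tail, os.SEEK_END)
        f2.seek(-tail, os.SEEK_END)
        if f1.read(tail) != f2.read(tail):
            return False

        # Step 3: full comparison from the start, one chunk at a time,
        # aborting as soon as a chunk differs.
        f1.seek(0)
        f2.seek(0)
        while True:
            block1 = f1.read(chunk_size)
            block2 = f2.read(chunk_size)
            if block1 != block2:
                return False
            if not block1:  # both files reached end of file together
                return True
```

Comparing whole chunks rather than single bytes keeps the early abort while avoiding a function call per byte.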


Computing a hash could be interesting in situations where there are lots and lots of files with the same length. Instead of having to compare each file with all other files of the same length, one could simply compute each file's hash by traversing it once and then compare the hashes instead. Of course, in order to be 100% certain one would then need to do a byte-by-byte check as well. Alternatively one could cache the relationships between all files, e.g. A != B and B == C means A != C and C != A.
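
A rough sketch of how that could fit together, in Python for brevity: group by length, hash only the files that share a length, then confirm byte-by-byte. The cached relationships are modelled as equivalence classes, so each file is compared against one representative per class instead of every other file. All names here (`find_duplicates`, `_digest`, `_same_bytes`) are hypothetical, and MD5 is just one possible choice of hash:

```python
import hashlib
import os
from collections import defaultdict

CHUNK = 64 * 1024  # arbitrary read size


def _digest(path):
    """MD5 of a file, computed in a single chunked pass."""
    hasher = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def _same_bytes(path1, path2):
    """Chunked comparison of two files already known to be the same length."""
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        while True:
            block1, block2 = f1.read(CHUNK), f2.read(CHUNK)
            if block1 != block2:
                return False
            if not block1:
                return True


def find_duplicates(paths):
    """Return a list of groups (lists of paths) with identical contents."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for size, group in by_size.items():
        if len(group) < 2:
            continue  # a unique length needs no hashing at all
        for path in group:
            by_hash[(size, _digest(path))].append(path)

    duplicates = []
    for group in by_hash.values():
        if len(group) < 2:
            continue
        # Confirm byte-by-byte. Equality is transitive, so one comparison
        # against a class representative settles the whole class.
        classes = []
        for path in group:
            for cls in classes:
                if _same_bytes(cls[0], path):
                    cls.append(path)
                    break
            else:
                classes.append([path])
        duplicates.extend(cls for cls in classes if len(cls) > 1)
    return duplicates
```

With this layout each candidate file is read once for hashing and once more for confirmation, rather than once per pairwise comparison.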

I can see this could be fun :-)

Best regards,

Frank

_______________________________________________

Cocoa-dev mailing list (email@hidden)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:


This email sent to email@hidden



  • Follow-Ups:
    • Re: Best way of identifying duplicate files in Cocoa
      • From: Bill Bumgarner <email@hidden>
References: 
 >Best way of identifying duplicate files in Cocoa (From: Frank Reiff <email@hidden>)
 >Re: Best way of identifying duplicate files in Cocoa (From: Jean-Daniel Dupas <email@hidden>)
 >Re: Best way of identifying duplicate files in Cocoa (From: Frank Reiff <email@hidden>)
