Re: Best way of identifying duplicate files in Cocoa


  • Subject: Re: Best way of identifying duplicate files in Cocoa
  • From: Frank Reiff <email@hidden>
  • Date: Wed, 21 Nov 2007 15:32:44 +0100

Hi Bill,

Thanks for the code example. It always seems that this type of thing is better done in a scripting language; it's something about the economy of the language. I've been using Ruby for around a year now and it really rocks for this type of stuff.

MD5 is clearly the right solution for comparing large numbers of files against each other. I'm not sure I'll need such a heavy-duty solution, but you never know: once you add a feature, people find ways of testing it to the limit.

What's more, I don't want to keep much of anything in memory, so the tree would need to be built inside a Core Data persistent store, which might negate any performance gains.

I haven't started implementing the feature yet, but I'll be sure to have a look at your code before embarking on it, and I'll see how much of a performance bottleneck it turns out to be.

Best regards,

Frank

On 21 Nov 2007, at 10:55, Bill Bumgarner wrote:

On Nov 21, 2007, at 1:33 AM, Jean-Daniel Dupas wrote:
To get an MD5 you have to read the file AND compute the digest. To compare two files, you just have to read them. What is the benefit of the MD5 in this case?

I can MD5 the first N bytes and build up a shallow tree of all files that are (a) of the same size and (b) identical for the first 1024 bytes (as hashed by a checksum of said bytes).


From there, yes, calculating an MD5 of the full file contents is a complete waste of time when comparing whole files. Given the relative infrequency of files that are identical within the first 1K, the silliness of those particular lines of code was never identified as a performance bottleneck. ;)

b.bum
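
For readers following the thread, the two-stage approach described above might be sketched roughly as follows. This is not Bill's actual code, which is not reproduced in this message; it is a minimal Python illustration under stated assumptions. The names `prefix_digest`, `find_duplicates`, and `PREFIX_BYTES` are hypothetical, and the grouping lives in plain in-memory dictionaries rather than a Core Data store. Files are bucketed by size and by an MD5 of their first 1024 bytes, and only files sharing a bucket are compared byte for byte.

```python
import filecmp
import hashlib
import os
from collections import defaultdict

PREFIX_BYTES = 1024  # assumption: the "first 1024 bytes" mentioned in the message


def prefix_digest(path, length=PREFIX_BYTES):
    """Return the MD5 hex digest of the first `length` bytes of a file."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read(length)).hexdigest()


def find_duplicates(paths):
    """Return lists of paths whose contents are byte-for-byte identical."""
    # Stage 1a: files of different sizes can never be duplicates.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    # Stage 1b: within a size group, bucket by digest of the first bytes.
    candidates = defaultdict(list)
    for size, group in by_size.items():
        if len(group) < 2:
            continue  # a unique size needs no hashing at all
        for path in group:
            candidates[(size, prefix_digest(path))].append(path)

    # Stage 2: compare whole files directly; no full-file MD5 is needed.
    duplicates = []
    for group in candidates.values():
        remaining = group
        while len(remaining) > 1:
            first, rest = remaining[0], remaining[1:]
            same = [first] + [
                p for p in rest if filecmp.cmp(first, p, shallow=False)
            ]
            if len(same) > 1:
                duplicates.append(same)
            remaining = [p for p in rest if p not in same]
    return duplicates
```

Calling `find_duplicates` with a list of file paths returns a list of groups, each holding two or more identical files. The prefix digest only narrows the candidates; the final decision comes from the direct comparison, which matches the point made above that hashing whole files adds nothing once you are comparing them anyway.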

_______________________________________________

Cocoa-dev mailing list (email@hidden)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden

