Re: Fast hash of NSData?

  • Subject: Re: Fast hash of NSData?
  • From: Marcel Weiher <email@hidden>
  • Date: Mon, 02 Dec 2013 14:57:37 +0000

On Dec 1, 2013, at 15:36 , Graham Cox <email@hidden> wrote:

> Scanning my entire hard drive (excluding hidden files), which took several hours, sure I had plenty of collisions - but absolutely no false ones - they all turned out to be genuine duplicates of existing files. This is using the FNV-1a 64-bit hash + length approach.
>
> I’m thinking this is good enough, really. The odds of a particular user having two different image files that collide, and happening to add those exact images at once to our app must be astronomically low. Talk me out of it :)

IIRC, you were worried about the cost of a full compare. According to these data, every collision was a genuine duplicate, so the amortized cost of a full compare is effectively zero if you only do it when you get a collision. So do the full compare when you get a collision, in order not to lose data. Then you can tune the hash for a good compromise between speed and collisions. Mike Abdullah’s suggestion of file size as a first check seems ideal to me (I’ve been using that technique with string lookups to very good effect, and it should work much better for files). I wouldn’t use a straight hash table, but a slightly more sophisticated data structure with multiple comparison levels.
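
To make the idea concrete, here is a minimal sketch of such a multi-level lookup: length first, then a cheap hash, then a full byte compare only among candidates that match on both. It is written in Python for brevity rather than against NSData, and the `BlobStore` class, its method names, and the `full_compares` counter are hypothetical names for illustration, not anything from the app under discussion. The hash is FNV-1a 64-bit as mentioned above, but any cheap hash could be passed in.

```python
from collections import defaultdict

# Standard FNV-1a 64-bit parameters.
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Return the FNV-1a 64-bit hash of a byte string."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64  # keep the result within 64 bits
    return h


class BlobStore:
    """Hypothetical duplicate detector: length, then hash, then full compare."""

    def __init__(self, hash_func=fnv1a_64):
        self._hash = hash_func
        # length -> hash -> list of distinct blobs sharing both values
        self._by_length = defaultdict(lambda: defaultdict(list))
        self.full_compares = 0  # how often the final check actually ran

    def add(self, blob: bytes) -> bool:
        """Store blob. Return True if it was new, False if a true duplicate."""
        candidates = self._by_length[len(blob)][self._hash(blob)]
        for existing in candidates:
            # Only reached when length AND hash already match.
            self.full_compares += 1
            if existing == blob:
                return False
        # Either no collision, or a false one: keep the data.
        candidates.append(blob)
        return True
```

With a deliberately useless hash, the final compare still prevents data loss:

```python
store = BlobStore(hash_func=lambda b: 0)  # every blob collides
store.add(b"abc")   # True: new
store.add(b"abd")   # True: same length and hash, but different bytes
store.add(b"abc")   # False: genuine duplicate
```

One possible refinement, not shown here, is to compute the hash lazily so that blobs with a unique length are never hashed at all.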

On Dec 1, 2013, at 18:52 , Kyle Sluder <email@hidden> wrote:

> But as a matter of principle, it’s negligent to knowingly design a system that will silently drop user data in normal operation. There are plenty of times you can make a reasonable argument for “that’s good enough,” but as far as I’m concerned, preserving user data is never one of them.

Seconded, thirded, …  Especially for a performance optimization when the effective performance cost of doing the final check is zero.

Marcel

_______________________________________________

Cocoa-dev mailing list (email@hidden)
