• Open Menu Close Menu
  • Apple
  • Shopping Bag
  • Apple
  • Mac
  • iPad
  • iPhone
  • Watch
  • TV
  • Music
  • Support
  • Search apple.com
  • Shopping Bag

Lists

Open Menu Close Menu
  • Terms and Conditions
  • Lists hosted on this site
  • Email the Postmaster
  • Tips for posting to public mailing lists
Re: Fast hash of NSData?
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Fast hash of NSData?


  • Subject: Re: Fast hash of NSData?
  • From: Kyle Sluder <email@hidden>
  • Date: Fri, 29 Nov 2013 13:49:53 -0800

> On Nov 29, 2013, at 12:56 PM, Graham Cox <email@hidden> wrote:
>
> The situation is, my app allows users to include images, which I’d like to preserve in their original formats as far as possible. i.e. if they drag in a JPEG, then I track the JPEG data and ultimately that gets embedded in my file. If they drag the same image in more than once, I want to ensure that the same data is used for as many copies as they include. So I compare the image data with what I know about already, and if there’s a match, according to the hash, then I substitute the original data I know about already.

In this scheme, if there is a hash collision, you lose user data. That should be a non-starter. You *must* do a full bytewise comparison in case of collision.


> So the hash only needs to be computed whenever an object is added, so at that time the time to compute the hash is negligible. But it’s also computed for each image data read when opening the file, because even though each instance is unique, it still needs to be hashed and stored in case the user subsequently adds an image that matches one of these. It’s this that’s taking noticeable time, for files that have lots of different images in them.

You also have another, damn-quick "hash key" that takes zero disk access to compute: -[NSData length].

Use the size as a primary key, then a reasonably low-collision/low-complexity hash algorithm as a secondary key. That seems like a reasonable compromise between runtime efficiency and implementation simplicity.

You could make the keys of your dictionary a custom object:

@interface ImageDataHashKey : NSObject <NSCopying>
+ (instancetype)keyForData:(NSData *)data;
@property (readonly) NSUInteger size;
@end

@implementation ImageDataHashKey
{
  NSUInteger _hash;
}

+ (instancetype)keyForData:(NSData *)data;
{
  ImageDataHashKey *key = [[self alloc] init];
  key->_hash = hashfunc(data);
}

- (NSUInteger)hash;
{
  return (NSUInteger)_hash;
}

- (BOOL)isEqual:(id)other;
{
  return [other isKindOfClass:[ImageDataHashKey class]] && self->_size == other->_size && self->_hash == other->_hash;
}
@end

//--

A further optimization might be to build your own cuckoo hash to avoid collisions entirely.

--Kyle Sluder
_______________________________________________

Cocoa-dev mailing list (email@hidden)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:

This email sent to email@hidden


  • Follow-Ups:
    • Re: Fast hash of NSData?
      • From: Scott Ribe <email@hidden>
    • Re: Fast hash of NSData?
      • From: Graham Cox <email@hidden>
References: 
 >Fast hash of NSData? (From: Graham Cox <email@hidden>)
 >Re: Fast hash of NSData? (From: Kyle Sluder <email@hidden>)
 >Re: Fast hash of NSData? (From: Graham Cox <email@hidden>)

  • Prev by Date: Re: Fast hash of NSData?
  • Next by Date: Re: Fast hash of NSData?
  • Previous by thread: Re: Fast hash of NSData?
  • Next by thread: Re: Fast hash of NSData?
  • Index(es):
    • Date
    • Thread