Re: Fast hash of NSData?
Re: Fast hash of NSData?
- Subject: Re: Fast hash of NSData?
- From: Kyle Sluder <email@hidden>
- Date: Fri, 29 Nov 2013 13:49:53 -0800
> On Nov 29, 2013, at 12:56 PM, Graham Cox <email@hidden> wrote:
>
> The situation is, my app allows users to include images, which I’d like to preserve in their original formats as far as possible. i.e. if they drag in a JPEG, then I track the JPEG data and ultimately that gets embedded in my file. If they drag the same image in more than once, I want to ensure that the same data is used for as many copies as they include. So I compare the image data with what I know about already, and if there’s a match, according to the hash, then I substitute the original data I know about already.
In this scheme, if there is a hash collision, you lose user data. That should be a non-starter. You *must* do a full bytewise comparison in case of collision.
> So the hash only needs to be computed whenever an object is added, so at that time the time to compute the hash is negligible. But it’s also computed for each image data read when opening the file, because even though each instance is unique, it still needs to be hashed and stored in case the user subsequently adds an image that matches one of these. It’s this that’s taking noticeable time, for files that have lots of different images in them.
You also have another, damn-quick "hash key" that takes zero disk access to compute: -[NSData length].
Use the size as a primary key, then a reasonably low-collision/low-complexity hash algorithm as a secondary key. That seems like a reasonable compromise between runtime efficiency and implementation simplicity.
You could make the keys of your dictionary a custom object:
@interface ImageDataHashKey : NSObject <NSCopying>
+ (instancetype)keyForData:(NSData *)data;
@property (readonly) NSUInteger size;
@end
@implementation ImageDataHashKey
{
NSUInteger _hash;
}
+ (instancetype)keyForData:(NSData *)data;
{
ImageDataHashKey *key = [[self alloc] init];
key->_hash = hashfunc(data);
}
- (NSUInteger)hash;
{
return (NSUInteger)_hash;
}
- (BOOL)isEqual:(id)other;
{
return [other isKindOfClass:[ImageDataHashKey class]] && self->_size == other->_size && self->_hash == other->_hash;
}
@end
//--
A further optimization might be to build your own cuckoo hash to avoid collisions entirely.
--Kyle Sluder
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden