Re: Fast hash of NSData?
Re: Fast hash of NSData?
- Subject: Re: Fast hash of NSData?
- From: Graham Cox <email@hidden>
- Date: Fri, 29 Nov 2013 21:56:21 +0100
On 29 Nov 2013, at 9:36 pm, Kyle Sluder <email@hidden> wrote:
> Often, it's better to improve the data structure before trying a different hash function. It's much easier to reason about how to improve a solution in terms of algorithm analysis of your data structure rather than it is to perform an empirical analysis of a specific hash function's collision-to-speed tradeoffs.
>
> Long way of saying: what's the actual problem you're trying to solve? Maybe a hash table is a poor fit.
OK, there could well be a better way to do this.
Before I get into it though, thanks to Eric and Robert for the pointers - substituting first fnv1a then djb2 for my hash, I cut the time down to 50% then 36% of the original respectively, so very worthwhile since this was the real time hog in a profile of opening a large file.
The situation is, my app allows users to include images, which I’d like to preserve in their original formats as far as possible. i.e. if they drag in a JPEG, then I track the JPEG data and ultimately that gets embedded in my file. If they drag the same image in more than once, I want to ensure that the same data is used for as many copies as they include. So I compare the image data with what I know about already, and if there’s a match, according to the hash, then I substitute the original data I know about already. The object that manages this keeps a dictionary of known image data, keyed by the hash (as a string). In that way, I just compute the hash of the image data, and if it’s in the dictionary already, I return that copy, otherwise add it to the dictionary.
So the hash only needs to be computed whenever an object is added, so at that time the time to compute the hash is negligible. But it’s also computed for each image data read when opening the file, because even though each instance is unique, it still needs to be hashed and stored in case the user subsequently adds an image that matches one of these. It’s this that’s taking noticeable time, for files that have lots of different images in them.
An obvious different approach is not to embed the image data in my app’s file data, but to copy the image on disk to my file’s package and track it by name or URL. I do plan to do that at some point, but that’s a change that’s quite complex compared to just speeding up the existing design.
—Graham
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden