Re: problem with applying md5 to data
Re: problem with applying md5 to data
- Subject: Re: problem with applying md5 to data
- From: Quincey Morris <email@hidden>
- Date: Wed, 29 Jun 2011 13:31:18 -0700
On Jun 29, 2011, at 12:26, Wilker wrote:
> +(NSString *)generateHashFromPath:(NSString *)path {
> const NSUInteger CHUNK_SIZE = 65536;
>
> NSError *error = nil;
> NSData *fileData = [NSData dataWithContentsOfFile:path options:NSDataReadingMapped | NSDataReadingUncached error:&error];
>
> if (error) {
> return nil;
> }
>
> const uint8_t* buffer = [fileData bytes];
>
> NSUInteger byteLength = [fileData length];
> NSUInteger byteOffset = 0;
>
> if (byteLength > CHUNK_SIZE) {
> byteOffset = byteLength - CHUNK_SIZE;
> byteLength = CHUNK_SIZE;
> }
>
> CC_MD5_CTX md5;
> CC_MD5_Init(&md5);
>
> CC_MD5_Update(&md5, buffer, (unsigned int) byteLength);
> CC_MD5_Update(&md5, buffer + byteOffset, (unsigned int) byteLength);
>
> unsigned char digest[CC_MD5_DIGEST_LENGTH];
>
> CC_MD5_Final(digest, &md5);
>
> NSMutableString *ret = [NSMutableString stringWithCapacity:CC_MD5_DIGEST_LENGTH * 2];
>
> for (int i = 0; i < CC_MD5_DIGEST_LENGTH; i++) {
> [ret appendFormat:@"x", digest[i]];
> }
>
> return [ret lowercaseString];
> }
This looks good to me, but since we're noodling around various topics here, let me nitpick at some details:
1. Format specifier '%x' produces lower case letters, so you don't need to lowercase the returned string. If you'd happened to want uppercase, there's '%X'.
If you're just trying to return an immutable string, there are three possible answers:
a. Don't bother. It's almost always fine to return a NSMutableString when your method's API's return type says NSString.
b. Return '[ret copy]'. That allows NSString to do the fastest copy of the character data that it's capable of. The result is an immutable string.
c. Go back to the way you first did it, where you spelled out the entire format string, so you don't ever create a NSMutableString.
2. There still a flashing danger signal in your code. Pretty much any time you have to cast a scalar type, your code gets smelly (http://en.wikipedia.org/wiki/Code_smell).
If you consult the CC_MD5 man page, you'll see that the length parameter type is officially CC_LONG. What happens if byteLength is bigger than the largest possible CC_LONG? (In other words, what happens if your file is bigger than 4GB?)
WIth this code, the consequences would not be catastrophic, but the wrong digest would likely be calculated.
So, declare 'byteLength' as type CC_LONG and get rid of those ad hoc casts. You might still need a cast, but pair it with a safety check:
CC_LONG byteLength = (CC_LONG) ([file length] > CHUNK_SIZE ? CHUNK_SIZE : [file length]);
That messes up the setting of byteOffset, so you need to change that to something like:
NSUInteger byteOffset = [file length] > CHUNK_SIZE ? [file length] - CHUNK_SIZE : 0;
Depending on your aesthetic tastes, you might actually want to do:
NSUInteger fileLength = [fileData length];
CC_LONG byteLength = (CC_LONG) (fileLength > CHUNK_SIZE ? CHUNK_SIZE : fileLength);
NSUInteger byteOffset = fileLength > CHUNK_SIZE ? fileLength - CHUNK_SIZE : 0;
Anyone reading your code can now see that the cast is harmless, because you've explicitly checked for the error condition.
> I tried it against a video of 8.5GB, stored on external drive (at wifi with Airport Extreme), and it's blazing fast :)
This is a remarkable statement, if you think about it. My pitch for 'NSDataReadingMapped' relied on the assumption that the file was local. If NSData is really providing the effect of memory mapping with a remote file, it's really doing a *lot* of heavy lifting for you.
That's really quite a surprising result.
_______________________________________________
Cocoa-dev mailing list (email@hidden)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden