Re: Best way of identifying duplicate files in Cocoa
- Subject: Re: Best way of identifying duplicate files in Cocoa
- From: Jean-Daniel Dupas <email@hidden>
- Date: Wed, 21 Nov 2007 10:33:12 +0100
> On Nov 20, 2007, at 2:48 PM, Michael Watson wrote:
>> I implemented MD5 hashing and comparison in a file diff utility I
>> wrote for internal use, and I gotta say . . . it was *fast* with
>> tens of thousands of files of varying size. (Say, anywhere from
>> 4KB to dozens of megs.)
>
> So did I! Here is source:
> http://svn.red-bean.com/bbum/trunk/hacques/dupinator.py
> It checks the file sizes and then hashes the first 4k. Finally,
> it'll hash the full file if the sizes and first 4k match.
> b.bum
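A minimal sketch of the staged check bbum describes (size first, then the first 4 KB, then the whole file), written in C rather than Python. Assumptions: to stay dependency-free it uses 64-bit FNV-1a as a stand-in for MD5, so a match means "probably equal" rather than proof, and FileSig, sig_init, sig_match and hash_file are hypothetical names, not taken from dupinator.py. Hashes are computed lazily and cached per file, so a file checked against several candidates is read for hashing only once per stage.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Stand-in digest (assumption): 64-bit FNV-1a over at most `limit` bytes,
   or the whole file when limit == 0. Not collision-resistant like MD5.
   Returns 0 on success, -1 if the file cannot be opened or read. */
static int hash_file(const char *path, size_t limit, uint64_t *out) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;
    unsigned char buf[4096];
    uint64_t h = 0xcbf29ce484222325ULL;       /* FNV-1a offset basis */
    size_t total = 0, n;
    while ((limit == 0 || total < limit) &&
           (n = fread(buf, 1, sizeof buf, f)) > 0) {
        if (limit != 0 && n > limit - total) n = limit - total;
        for (size_t i = 0; i < n; i++) {
            h ^= buf[i];
            h *= 0x100000001b3ULL;            /* FNV-1a 64-bit prime */
        }
        total += n;
    }
    int failed = ferror(f);
    fclose(f);
    if (failed) return -1;
    *out = h;
    return 0;
}

/* Per-file record; the head and full hashes are filled in on demand. */
typedef struct {
    const char *path;
    off_t size;
    int has_head, has_full;
    uint64_t head, full;
} FileSig;

/* Records the path and size; returns 0 on success, -1 if stat fails. */
static int sig_init(FileSig *s, const char *path) {
    struct stat st;
    if (stat(path, &st) != 0) return -1;
    s->path = path;
    s->size = st.st_size;
    s->has_head = s->has_full = 0;
    return 0;
}

/* 1 = probably duplicates, 0 = different, -1 = read error.
   Cheapest test first: size, then first 4 KB, then the whole file. */
static int sig_match(FileSig *x, FileSig *y) {
    assert(x != NULL && y != NULL);
    if (x->size != y->size) return 0;
    FileSig *both[2] = { x, y };
    for (int i = 0; i < 2; i++)
        if (!both[i]->has_head) {
            if (hash_file(both[i]->path, 4096, &both[i]->head) != 0) return -1;
            both[i]->has_head = 1;
        }
    if (x->head != y->head) return 0;
    for (int i = 0; i < 2; i++)
        if (!both[i]->has_full) {
            if (hash_file(both[i]->path, 0, &both[i]->full) != 0) return -1;
            both[i]->has_full = 1;
        }
    return x->full == y->full;
}
```

Because the size test needs only stat(), most non-duplicates are rejected without reading either file's contents.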
To get an MD5, you have to read the file AND compute the digest. To
compare two files, you just have to read them, and you can stop at the
first difference. What is the benefit of the MD5 in this case?
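For comparison, the direct approach can be sketched in portable C with stdio: read both files in 32 KB chunks (the same size as BUFFER_SIZE_KB in the Carbon code in this message) and stop at the first chunk that differs. This is an illustrative sketch, not the author's code: files_identical is a hypothetical name, and it sees only the regular file contents (the data fork), not resource forks.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if both files have identical contents, 0 if they differ,
   -1 if either file cannot be opened or read. */
static int files_identical(const char *path1, const char *path2) {
    assert(path1 != NULL && path2 != NULL);
    FILE *f1 = fopen(path1, "rb");
    FILE *f2 = fopen(path2, "rb");
    int result = -1;
    if (f1 != NULL && f2 != NULL) {
        unsigned char b1[32 * 1024], b2[32 * 1024];
        result = 1;
        for (;;) {
            /* fread returns a short count only at EOF or on error. */
            size_t n1 = fread(b1, 1, sizeof b1, f1);
            size_t n2 = fread(b2, 1, sizeof b2, f2);
            if (ferror(f1) || ferror(f2)) { result = -1; break; }
            /* Different lengths or bytes: stop reading immediately. */
            if (n1 != n2 || memcmp(b1, b2, n1) != 0) { result = 0; break; }
            if (n1 == 0) break;   /* both at EOF, no difference found */
        }
    }
    if (f1 != NULL) fclose(f1);
    if (f2 != NULL) fclose(f2);
    return result;
}
```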
I use plain C so that I can compare forks, but you can easily replace
CFMutableDataRef with NSMutableData.
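#include <CoreServices/CoreServices.h>  /* FSReadFork, CFData */

/* Reads up to `length` bytes from the current fork position into
   `buffer`, looping until the chunk is full or the fork hits EOF.
   On return, the buffer's length is the number of bytes actually read. */
static __inline__
OSStatus SOFileReadChunk(SInt16 aFork, CFIndex length,
                         CFMutableDataRef buffer) {
    ByteCount size = 0;
    OSStatus err = noErr;
    ByteCount remaining = length;
    CFDataSetLength(buffer, length);
    UInt8 *buf = CFDataGetMutableBytePtr(buffer);
    do {
        /* Read from the current mark, bypassing the cache, and write
           after the bytes already read so a short read does not
           overwrite earlier data. */
        err = FSReadFork(aFork, fsAtMark | kFSNoCacheMask, 0, remaining,
                         buf + (length - remaining), &size);
        if (noErr == err || eofErr == err)
            remaining -= size;
    } while (remaining > 0 && noErr == err);
    /* Shrink to what was actually read (less than `length` at EOF). */
    CFDataSetLength(buffer, length - remaining);
    return err;
}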
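The loop in SOFileReadChunk follows a general pattern: keep reading until the chunk is full or the read reports end of file, placing each partial read after the bytes already received. A sketch of the same pattern with POSIX read(2), assuming a plain file descriptor instead of a fork reference; read_chunk is a hypothetical name, and retrying on EINTR is an addition that the Carbon version does not need.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads up to `length` bytes into `buf`, retrying on short reads.
   Returns the byte count (less than `length` only at end of file),
   or -1 on a read error. */
static ssize_t read_chunk(int fd, void *buf, size_t length) {
    assert(buf != NULL || length == 0);
    unsigned char *p = buf;
    size_t remaining = length;
    while (remaining > 0) {
        ssize_t n = read(fd, p + (length - remaining), remaining);
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted: retry */
            return -1;
        }
        if (n == 0) break;                  /* end of file */
        remaining -= (size_t)n;
    }
    return (ssize_t)(length - remaining);
}
```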
#include <AssertMacros.h>  /* verify_noerr */
#include <stdbool.h>

#define BUFFER_SIZE_KB 32

/* Compares the fork named `forkName` (for example the data fork, see
   FSGetDataForkName) of two files chunk by chunk, stopping at the first
   difference. *equals is meaningful only when noErr is returned. */
static
OSStatus SOCompareFork(FSRef *f1, FSRef *f2, HFSUniStr255 *forkName,
                       bool *equals) {
    OSStatus err = noErr;
    SInt16 fnum1 = 0, fnum2 = 0;
    err = FSOpenFork(f1, forkName->length, forkName->unicode,
                     fsRdPerm, &fnum1);
    if (noErr == err)
        err = FSOpenFork(f2, forkName->length, forkName->unicode,
                         fsRdPerm, &fnum2);
    if (noErr == err) {
        *equals = true;
        CFMutableDataRef d1 = CFDataCreateMutable(kCFAllocatorDefault,
                                                  1024 * BUFFER_SIZE_KB);
        CFMutableDataRef d2 = CFDataCreateMutable(kCFAllocatorDefault,
                                                  1024 * BUFFER_SIZE_KB);
        do {
            err = SOFileReadChunk(fnum1, 1024 * BUFFER_SIZE_KB, d1);
            if (noErr == err || eofErr == err)
                err = SOFileReadChunk(fnum2, 1024 * BUFFER_SIZE_KB, d2);
            /* CFEqual compares length and bytes, so a shorter fork
               shows up as a mismatch here. */
            if (noErr == err || eofErr == err)
                *equals = CFEqual(d1, d2);
        } while (noErr == err && *equals);
        CFRelease(d2);
        CFRelease(d1);
    }
    /* Reaching EOF is the normal way for the comparison to finish. */
    if (eofErr == err) err = noErr;
    if (fnum2) verify_noerr(FSCloseFork(fnum2));
    if (fnum1) verify_noerr(FSCloseFork(fnum1));
    return err;
}
_______________________________________________
Cocoa-dev mailing list (email@hidden)