Re: Retrieving the EXIF date/time from 250k images
- Subject: Re: Retrieving the EXIF date/time from 250k images
- From: Steve Christensen via Cocoa-dev <email@hidden>
- Date: Tue, 16 Aug 2022 08:03:23 -0700
One way to speed it up is to do as much work as possible in parallel. One
approach (and this is just off the top of my head) is:
1. Create an NSOperationQueue, and add a single operation on that queue to
manage the entire process. (This is because some parts of the process are
synchronous and might take a while and you don’t want to block the UI thread.)
2. The manager operation would create a second, worker NSOperationQueue whose
operations each process a single image file (the contents of your `for`
loop).
3. The manager operation adds operations to the worker queue to process a
reasonable chunk of the files (10? 50?) and then waits for those operations to
complete. (NSOperationQueue provides -waitUntilAllOperationsAreFinished, and
-addOperations:waitUntilFinished: adds a batch and waits in one call.) It
then repeats until all the image files have been processed.
4. As each chunk completes, the manager operation can report status to the UI
thread via a notification or some other means.
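
Steps 1 to 4 could be sketched roughly as follows. This is only a minimal
sketch under some assumptions: ProcessFilesInChunks, FileHandler, and
ProgressHandler are hypothetical names introduced for illustration, the
handler stands in for the body of the `for` loop, and ARC is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical per-file work; stands in for the body of the original `for` loop.
typedef void (^FileHandler)(NSString *filename, NSUInteger index);
// Hypothetical progress callback, delivered on the main queue after each chunk.
typedef void (^ProgressHandler)(NSUInteger filesDone);

// Returns the manager queue so the caller can keep it around, wait on it,
// or cancel its operations.
static NSOperationQueue *ProcessFilesInChunks(NSArray<NSString *> *imagefiles,
                                              NSUInteger chunkSize,
                                              FileHandler handler,
                                              ProgressHandler progress)
{
    if (chunkSize == 0) {
        chunkSize = 1; // avoid an endless loop below
    }

    // Step 1: one manager operation, so nothing here blocks the UI thread.
    NSOperationQueue *managerQueue = [[NSOperationQueue alloc] init];
    [managerQueue addOperationWithBlock:^{
        // Step 2: the worker queue that runs the per-file operations.
        NSOperationQueue *workerQueue = [[NSOperationQueue alloc] init];
        NSUInteger total = imagefiles.count;

        // Step 3: add one chunk at a time and wait for it to finish.
        for (NSUInteger start = 0; start < total; start += chunkSize) {
            NSUInteger end = MIN(start + chunkSize, total);
            NSMutableArray<NSOperation *> *chunk =
                [NSMutableArray arrayWithCapacity:end - start];
            for (NSUInteger i = start; i < end; i++) {
                NSString *filename = imagefiles[i];
                [chunk addObject:[NSBlockOperation blockOperationWithBlock:^{
                    handler(filename, i);
                }]];
            }
            // Blocks the manager operation (not the UI thread) until done.
            [workerQueue addOperations:chunk waitUntilFinished:YES];

            // Step 4: report status on the main queue.
            if (progress) {
                [[NSOperationQueue mainQueue] addOperationWithBlock:^{
                    progress(end);
                }];
            }
        }
    }];
    return managerQueue;
}
```

The index is passed to the handler along with the filename so that each
result can later be stored at a known position. The progress block only runs
if the main queue is being serviced, as it is in a normal app.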
Unlike your synchronous implementation, below, the order in which results are
added to dates_and_times is indeterminate. One way to fix this is to
pre-populate the array with as many placeholder items (NSDate.distantPast?) as
there are entries in imagefiles, and then store each iso_date at the same index
as its corresponding filename. Another benefit is that the array is allocated
once at the beginning rather than being resized periodically (with the existing
contents copied) as items are added.
And since all these operations run on different threads, you need to protect
access to your dates_and_times array, because modifying it is not thread-safe.
One quick way is to create an NSLock and lock it around the array update:
[theLock lock];
dates_and_times[index] = iso_date;
[theLock unlock];
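
Putting the placeholder idea and the lock together might look something like
the sketch below. MakePlaceholderDates and StoreDate are hypothetical helper
names introduced for illustration, and ARC is assumed; the variable names
follow the ones used in this thread.

```objc
#import <Foundation/Foundation.h>

// Returns a mutable array holding one placeholder per file, so that every
// index is valid before any worker operation runs.
static NSMutableArray<NSDate *> *MakePlaceholderDates(NSUInteger count)
{
    NSMutableArray<NSDate *> *dates = [NSMutableArray arrayWithCapacity:count];
    for (NSUInteger i = 0; i < count; i++) {
        [dates addObject:[NSDate distantPast]];
    }
    return dates;
}

// Stores one result at its file's index. Intended to be called from the
// worker operations, all sharing the same lock.
static void StoreDate(NSMutableArray<NSDate *> *dates_and_times,
                      NSLock *theLock,
                      NSUInteger index,
                      NSDate *iso_date)
{
    if (iso_date == nil) {
        return; // no usable EXIF date: leave the placeholder in place
    }
    [theLock lock];
    dates_and_times[index] = iso_date; // replaces the placeholder
    [theLock unlock];
}
```

Once all the operations have finished, any entry still equal to
NSDate.distantPast belongs to a file for which no date could be read.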
Anyway, another way to look at the process.
Steve
> On Aug 14, 2022, at 2:22 PM, Gabriel Zachmann via Cocoa-dev
> <email@hidden> wrote:
>
> I would like to collect the date/time stored in an EXIF tag in a bunch of
> images.
>
> I thought I could do so with the following procedure
> (some details and error checking omitted for sake of clarity):
>
>
> NSMutableArray * dates_and_times = [NSMutableArray arrayWithCapacity:
> [imagefiles count]];
> CFDictionaryRef exif_dict;
> CFStringRef dateref = NULL;
> for ( NSString* filename in imagefiles )
> {
> NSURL * imgurl = [NSURL fileURLWithPath: filename isDirectory: NO];
> // escapes any chars that are not allowed in URLs (space, &, etc.)
> CGImageSourceRef image = CGImageSourceCreateWithURL( (__bridge
> CFURLRef) imgurl, NULL );
> CFDictionaryRef fileProps = CGImageSourceCopyPropertiesAtIndex( image,
> 0, NULL );
> bool success = CFDictionaryGetValueIfPresent( fileProps,
> kCGImagePropertyExifDictionary, (const void **) & exif_dict );
> success = CFDictionaryGetValueIfPresent( exif_dict,
> kCGImagePropertyExifDateTimeDigitized, (const void **) & dateref );
> NSString * date_str = [[NSString alloc] initWithString: (__bridge
> NSString * _Nonnull)( dateref ) ];
> NSDate * iso_date = [isoDateFormatter_ dateFromString: date_str];
> if ( iso_date )
> [dates_and_times addObject: iso_date ];
> CFRelease( fileProps );
> }
>
>
> But, I get the impression, this code actually loads each and every image.
> On my Macbook, it takes 3m30s for 250k images (130GB).
>
> So, the big question is: can it be done faster?
>
> I know the EXIF tags are part of the image file, but I was hoping it might be
> possible to load only those EXIF dictionaries.
> Or are the CGImage functions above already clever enough to implement this
> idea?
>
>
> Best regards, Gab.