Re: Retrieving the EXIF date/time from 250k images
- Subject: Re: Retrieving the EXIF date/time from 250k images
- From: Alex Zavatone via Cocoa-dev <email@hidden>
- Date: Fri, 7 Jul 2023 12:18:49 -0500
I’ll support Jim Crate’s suggestion of ExifTool. I just came across this the
other day while trying to remove the GPS location from a QuickTime video once I
had copied it to my Mac. Honestly, the quickest solution for me was to open it
on another Mac in QuickTime 7 Pro, press command J, delete the EXIF tracks and
save a new copy.
Ohh, how I wish I had my hands on the olden source for the QuickTime 7 Player.
IIRC at one time, there was some Apple source code on the developer site.
Should have snagged it while I had a chance.
Regarding Jim’s thought of NSTask being too slow, if you do take this route, I
wonder how much better spawning up to 4 NSTasks would be to grab this info.
Time one fetch, add tasks until you hit diminishing returns, then match that
to the # of CPU cores and drive throughput. I love these kinds of things. It’s
a case of, “well, this sucks. Can we make it quicker and if we can, can we
keep doing it until it stops sucking and actually make it good enough?”
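A minimal sketch of the splitting step, assuming the file list is already in
an array and each task gets one contiguous slice of it. The names BatchRange
and batch_for_worker are made up for illustration; this is plain C, so it
should drop into an Objective-C file as is, but it is only one way to divide
the work and I have not timed it.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [start, start + count) of file indices for one worker. */
typedef struct {
    size_t start;
    size_t count;
} BatchRange;

/*
 * Split `total` files into `workers` contiguous batches whose sizes differ
 * by at most one. The first (total % workers) batches get one extra file.
 * `index` selects which batch to return (0-based).
 */
BatchRange batch_for_worker(size_t total, size_t workers, size_t index)
{
    assert(workers > 0 && index < workers);

    size_t base = total / workers;   /* files every batch gets */
    size_t extra = total % workers;  /* leftover files to hand out */

    BatchRange r;
    r.start = index * base + (index < extra ? index : extra);
    r.count = base + (index < extra ? 1 : 0);
    return r;
}
```

With 250,000 files and 4 tasks, each call returns a slice of 62,500 files. To
try 2, 3 or 4 tasks and compare timings, only the workers argument changes.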
Cheers and happy Friday,
Alex Zavatone
> On Jan 15, 2023, at 5:59 PM, James Crate <email@hidden> wrote:
>
> There is a Perl program called exiftool that can load and set EXIF data
> without loading the image data (or at least it doesn’t decode the image
> data). I don’t know whether it would be faster than loading image
> data/properties with ImageIO. You could write a perl script that used your
> bundled exiftool to load the exif data and output the results for many files
> in a format your program could handle, because instantiating perl/exiftool
> repeatedly for each image in a separate NSTask would probably be pretty slow.
>
> Jim Crate
>
>
>> On Jan 7, 2023, at 2:07 PM, Alex Zavatone via Cocoa-dev
>> <email@hidden> wrote:
>>
>> Hi Gabe. I’d add basic logging before you start each image and after you
>> complete each image to see how long each one takes in each of your problem
>> tests, so you can see just how slow it is on your problem platforms.
>>
>> Then you can add more logging to expose the problems and start to address
>> them once you see where the bottlenecks are.
>>
>> I wonder if there is a method to load the EXIF data out of the files without
>> opening them completely. That would seem like the ideal approach.
>>
>> Cheers,
>> Alex Zavatone
>>
>>> On Jan 7, 2023, at 12:36 PM, Gabriel Zachmann <email@hidden> wrote:
>>>
>>> Hi Alex, hi everyone,
>>>
>>> thanks a lot for the many suggestions!
>>> And sorry for following up on this so late!
>>> I hope you are still willing to engage in this discussion.
>>>
>>> Yes, Alex, I agree that the main question is:
>>> how can I get the metadata of a large number of images (say, 100k-300k)
>>> *without* actually loading the whole image files?
>>> (For your reference: I am interested in the date tags embedded in the EXIF
>>> dictionary, and those dates will be read just once per image, then cached
>>> in a dictionary containing filename & dates, and that dictionary will get
>>> stored on disk for future use by the app.)
>>>
>>>> CGImageSourceRef imageSourceRef =
>>>> CGImageSourceCreateWithURL((CFURLRef)imageUrl, NULL);
>>>
>>> I have tried this:
>>>
>>> for ( NSString* filename in imagefiles )
>>> {
>>>     NSURL * imgurl = [NSURL fileURLWithPath: filename isDirectory: NO];
>>>     CGImageSourceRef sourceref = CGImageSourceCreateWithURL(
>>>         (__bridge CFURLRef) imgurl, NULL );
>>> }
>>>
>>> This takes 1 minute for around 300k images stored on my internal SSD.
>>> That would be OK.
>>>
>>> However! .. if performed on a folder stored on an external hard disk, I get
>>> the following timings:
>>>
>>> - 20 min for 150k images (45 GB)
>>> - 12 min for 150k images (45 GB), second time
>>> - 150 sec for 25k images (18 GB)
>>> - 170 sec for 25k images (18 GB), with the lines below (*)
>>> - 80 sec for 22k (3 GB) images
>>> - 80 sec for 22k (3 GB) images, with the lines below (*)
>>>
>>> All experiments were done on different folders on the same hard disk, WD
>>> MyPassport Ultra, 1 TB, USB-A connector to Macbook Air M2.
>>> Timings listed with the same file count and size were measured on the
>>> same folder.
>>>
>>> (*): these were timings where I added the following lines to the loop:
>>>
>>> CFDictionaryRef fileProps = CGImageSourceCopyPropertiesAtIndex(
>>>     sourceref, 0, NULL );
>>> bool success = CFDictionaryGetValueIfPresent( fileProps,
>>>     kCGImagePropertyExifDictionary, (const void **) & exif_dict );
>>> CFDictionaryGetValueIfPresent( exif_dict,
>>>     kCGImagePropertyExifDateTimeDigitized, (const void **) & dateref );
>>> iso_date = [isoDateFormatter_ dateFromString:
>>>     (__bridge NSString * _Nonnull)(dateref) ];
>>> [datesAndTimes_ addObject: iso_date ];
>>>
>>> (Plus some error checking, which I omit here.)
>>>
>>> First of all, we can see that the vast majority of time is spent on
>>> CGImageSourceCreateWithURL().
>>> Second, there seem to be some caching effects, although I have a hard time
>>> understanding that, but that is not the point.
>>> Third, the durations are not linear; I guess it might have something to do
>>> with the sizes of the files, too, but again, didn't investigate further.
>>>
>>> So, it looks to me like CGImageSourceCreateWithURL() really loads the
>>> complete image file.
>>>
>>> I don't see why Ole Begemann (ref'ed in Alex's post) can claim his approach
>>> does not load the whole image.
>>>
>>>
>>> Some people suggested parallelizing the whole task, using
>>> dispatch_queue_create or NSOperationQueue.
>>> (Thanks Steve, Gary, Jack!)
>>> Before restructuring my code for that, I would like to better understand
>>> why you think that will speed up things.
>>> The code above pretty much does no computations, so most of the time is, I
>>> guess, spent on waiting for the data to arrive from hard disk.
>>> So, why would several threads loading those images in parallel help
>>> here? In my thinking, they will just compete for the same resource, i.e.,
>>> hard disk.
>>>
>>>
>>> I also googled quite a bit, to no avail.
>>>
>>> Any and all hints, suggestions, and insights will be highly appreciated!
>>> Best, Gab
>>>
>>>
>>>>
>>>
>>>
>>>> if (!imageSourceRef)
>>>>     return;
>>>>
>>>> CFDictionaryRef props = CGImageSourceCopyPropertiesAtIndex(imageSourceRef,
>>>>     0, NULL);
>>>> CFRelease(imageSourceRef);  // balance the Create call above
>>>>
>>>> NSDictionary *properties = (NSDictionary*)CFBridgingRelease(props);
>>>>
>>>> if (!properties) {
>>>>     return;
>>>> }
>>>>
>>>> // Distinct names for the NSNumber values and the int results.
>>>> NSNumber *heightNumber = [properties objectForKey:@"PixelHeight"];
>>>> NSNumber *widthNumber = [properties objectForKey:@"PixelWidth"];
>>>> int height = 0;
>>>> int width = 0;
>>>>
>>>> if (heightNumber) {
>>>>     height = [heightNumber intValue];
>>>> }
>>>> if (widthNumber) {
>>>>     width = [widthNumber intValue];
>>>> }
>>>>
>>>>
>>>> Or this link by Ole Begemann?
>>>>
>>>> https://oleb.net/blog/2011/09/accessing-image-properties-without-loading-the-image-into-memory/
>>>>
>>>> I love these questions. I find out more about iOS programming by
>>>> researching other people’s problems than the ones that I’m currently faced
>>>> with.
>>>>
>>>> Hopefully some of these will help.
>>>>
>>>> Cheers,
>>>> Alex Zavatone
>>>
>>
>> _______________________________________________
>>
>> Cocoa-dev mailing list (email@hidden)
>>
>> Please do not post admin requests or moderator comments to the list.
>> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
>>
>> Help/Unsubscribe/Update your Subscription:
>>
>> This email sent to email@hidden
>