Re: dispatch_async Performance Issues
- Subject: Re: dispatch_async Performance Issues
- From: Jonathan Taylor <email@hidden>
- Date: Tue, 28 Jun 2011 20:37:05 +0100
> In the meantime, however, I found one (surprising) cause of the performance issue. After making the versions *more* equivalent, the issue became apparent. I restructured the second version (using the C++ iterators) and will discuss this in more detail. The culprit is in the consumer part, as follows:
>
> New restructured code:
>
> [...]
>
> The difference compared to the former code provided in the previous mail is now
>
> 1) The C++ instances, that is the iterators, are defined locally within the block.
>
> 2) The "Future" (that is the result of the operation) is conditionally compiled in or out, in order to test its impact.
> Here, the __block modifier is used for the "Future" variables "sum" and "total".
> When using pointers within the block accessing the outside variables, the performance does not differ, but using __block may be more correct.
Ah - now then! I will take a very strong guess as to what is happening there (I've done it myself, and seen it done by plenty of others! [*]). In the case where you do NOT define USE_FUTURE, your consumer thread as written in your email never makes any use of the values of "sum_" and "total_". Hence the compiler is entirely justified in optimizing those variables out altogether! It will still have to check the iterator against eof, and may have to dereference the iterator[**], but it does not need to update the "sum_" or "total_" variables. The loop then does less work than in the USE_FUTURE build, so the two timings are not measuring the same thing.
It may well be that there is still a deeper performance issue with your original code, and I'm happy to have another look at that when I have a chance. I suggest you deal with this issue first, though, as it appears to be introducing misleading discrepancies in the execution times you're using for comparison.
As I say, it's quite a common issue when you start stripping down code with the aim of doing minimal performance comparisons. The easiest solution is either to printf the results at the end (which forces the compiler to actually evaluate them!), or alternatively do the sort of thing you're doing when USE_FUTURE is defined - writing to a shadow variable at the end. If you declare your shadow variable as "volatile" then the compiler is forced to write the result and is not permitted to optimize everything out.
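To make those two options concrete, here is a minimal sketch in C++. It is not your actual code: the names consume, Result, report, run_benchmark, g_sink_sum and g_sink_total are all made up for illustration, and a plain std::vector stands in for whatever your iterators really walk over.

```cpp
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the consumer loop: walks a buffer with
// iterators and accumulates a sum and an element count.
struct Result {
    long long sum;
    std::size_t total;
};

Result consume(const std::vector<int>& data) {
    long long sum_ = 0;
    std::size_t total_ = 0;
    for (auto it = data.begin(); it != data.end(); ++it) {
        sum_ += *it;   // dead code if nobody ever reads sum_
        ++total_;      // likewise for total_
    }
    return {sum_, total_};
}

// Option 1: print the results. The output is observable, so the
// values have to be produced.
void report(const Result& r) {
    std::printf("sum=%lld total=%zu\n", r.sum, r.total);
}

// Option 2: volatile shadow variables. A store to a volatile object
// counts as observable behaviour, so the compiler may not drop it.
volatile long long g_sink_sum = 0;
volatile std::size_t g_sink_total = 0;

Result run_benchmark(std::size_t n) {
    std::vector<int> data(n);
    std::iota(data.begin(), data.end(), 1);  // fill with 1, 2, ..., n
    Result r = consume(data);
    g_sink_sum = r.sum;      // these two stores must actually happen
    g_sink_total = r.total;
    return r;
}
```

Calling run_benchmark(n) inside the timed region, or report() on its result after the timing ends, keeps the accumulation alive in both builds. Bear in mind that this only obliges the compiler to produce the right values; it is still free to find a cheaper way of computing them if the input is known at compile time, so the input is best taken from somewhere the optimizer cannot see through.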
Hope that helps, even if it may not deal with your original problem yet. Apologies that my first round of guesses were wrong - I'm pretty sure about this one though :)
Jonny
[*] Completely unrelated to this thread, but see this rather extreme example where the claimed performance had to be reduced by a factor of twelve due to this problem! http://www.ibm.com/developerworks/forums/thread.jspa?threadID=226415
[**] I ~think~ ... because this involves a memory access, which is strictly speaking a side effect in itself.