Re: dispatch_apply strange results
- Subject: Re: dispatch_apply strange results
- From: "Gerriet M. Denkmann" <email@hidden>
- Date: Tue, 11 Feb 2014 14:35:14 +0700
On 9 Feb 2014, at 15:53, Greg Parker <email@hidden> wrote:
> On Feb 9, 2014, at 12:19 AM, Gerriet M. Denkmann <email@hidden> wrote:
>> The real app (which I am trying to optimise) has actually two loops: one is counting, the other one is modifying. Which seems to be good news.
>>
>> But I would really like to understand what I should do. Trial and error (or blindly groping in the mist) is not really my preferred way of working.
>
> Optimizing small loops like this is a black art. Very small effects become critically important, such as the alignment of your loop instructions or the associativity of that CPU's L1 cache.
[...]
> Cache associativity can mean that there are some array split sizes that are much worse than others. If you choose the wrong size then each thread's working memory is on different cache lines, but those cache lines collide with each other in memory caches. Changing the work size to avoid collisions can help.
sysctl hw.cachelinesize returns: hw.cachelinesize: 64
I divided my huge array (malloc'ed; its address is a multiple of 0x1000) into at most [[NSProcessInfo processInfo] processorCount] chunks, where each chunk starts at a multiple of 2^n (using fewer chunks if this rule requires it).
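A minimal sketch of the chunking rule just described, not the code from the real app. It assumes the alignment is counted in array elements from the start of the array; `split_aligned` and `chunk_t` are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: one chunk as a half-open element range [start, end). */
typedef struct {
    size_t start;
    size_t end;
} chunk_t;

/*
 * Split `count` elements into at most `max_chunks` chunks so that every
 * chunk starts at an index that is a multiple of 2^n.
 * `out` must have room for `max_chunks` entries.
 * Returns the number of chunks actually used, which can be smaller than
 * `max_chunks` when the alignment forces longer chunks.
 */
static size_t split_aligned(size_t count, size_t max_chunks, unsigned n,
                            chunk_t *out)
{
    assert(max_chunks > 0 && n < sizeof(size_t) * 8);

    size_t align = (size_t)1 << n;

    /* Even split, rounded up so that max_chunks chunks cover everything. */
    size_t len = (count + max_chunks - 1) / max_chunks;

    /* Round the chunk length up to the next multiple of 2^n. */
    len = (len + align - 1) & ~(align - 1);

    size_t used = 0;
    for (size_t start = 0; start < count; start += len) {
        size_t end = start + len;
        if (end > count) {
            end = count;            /* the last chunk may be shorter */
        }
        out[used].start = start;
        out[used].end = end;
        used++;
    }
    return used;
}
```

The returned count would be the iteration count passed to dispatch_apply, with each iteration working on out[i].start up to (but not including) out[i].end.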
The result of using dispatch_apply:
 n   time
 0   10
 1   5.5
 2   4
 3   3
 4   2
 5   1.7
 6   1.6
 7   1.5
16   1.4
That is, your statement "that there are some array split sizes that are much worse than others" is strongly backed up by my tests.
Kind regards,
Gerriet.