
Re: Dual-core G5 sysctl



On 29 Nov 2005, at 22:13, Ian Ollmann wrote:

> > So why does such a modern OS still not have processor affinity?
> > It should be an easy and obvious win for the kernel team to implement it.
>
> That is an excellent question to put in a Feature Request bug report directed towards the Kernel/Features component.

From a search, "strip mining" seems to mean just working on a cache-sized amount of data at a time. My code already works with L2-sized chunks, so I guess I'll have to second the processor affinity request. A bit of searching reveals that Linux seems to have the sort of thing I'm looking for in sched_setaffinity / sched_getaffinity:
http://www.die.net/doc/linux/man/man2/sched_setaffinity.2.html
It looks as though the interface takes a bit mask with one bit per processor, where the set bits select the processors the process is allowed to run on.
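As a rough illustration of that bit-mask interface, here is a minimal sketch for Linux with glibc only (it does not apply to Darwin). The helper names pin_to_cpu and first_allowed_cpu are hypothetical, made up for this example; sched_setaffinity, sched_getaffinity and the CPU_* macros are the ones described in the man page above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc only exposes the CPU_* macros with this defined */
#endif
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to one CPU.
   Returns 0 on success, -1 on failure. Linux only. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t mask;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;

    CPU_ZERO(&mask);        /* clear every bit in the mask */
    CPU_SET(cpu, &mask);    /* set the single bit for the wanted CPU */

    /* A pid of 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(mask), &mask);
}

/* Hypothetical helper: lowest-numbered CPU in the current affinity mask,
   or -1 if the mask could not be read. */
static int first_allowed_cpu(void)
{
    cpu_set_t mask;

    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &mask))
            return cpu;
    }
    return -1;
}
```

A worker thread would call pin_to_cpu() once before starting on its cache-sized chunk, so that the chunk stays in that processor's cache.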


The only BSD / Darwin references I could find were to a kernel scheduler called ULE, but I couldn't work out whether it has been implemented, is still a work in progress, or is dead.

I've also found a few references to "utilBindThreadToCPU" from the CHUD framework but this seems to be purely an experimental interface for testing purposes. I take it this is not The Right Way.



On a related note, what is a good size for the amount of data to work on with Apple's vDSP Fourier transform routines? Currently, my code processes data in chunks of groupSize elements:
    spectraPerGroup = l2CacheSize / (4 * spectraLen * sizeof(float));
    groupSize = spectraLen * spectraPerGroup;
    => groupSize = l2CacheSize / (4 * sizeof(float))
and then does Fourier transforms with:
    if (canOverwriteData) {
        if (spectraPerGroup == 1)
            vDSP_fft_zip(fftSetup, &input, 1, fftLog2, FFT_INVERSE);
        else
            vDSP_fftm_zip(fftSetup, &input, 1, spectraLen, fftLog2, spectraPerGroup, FFT_INVERSE);
    } else {
        if (spectraPerGroup == 1)
            vDSP_fft_zop(fftSetup, &input, 1, &freqData, 1, fftLog2, FFT_INVERSE);
        else
            vDSP_fftm_zop(fftSetup, &input, 1, spectraLen, &freqData, 1, spectraLen, fftLog2, spectraPerGroup, FFT_INVERSE);
    }
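One thing worth checking in the sizing arithmetic above: the division is an integer division, so for a long spectrum spectraPerGroup comes out as 0 rather than 1, and the simplified groupSize formula only holds when the division is exact. The sketch below applies the same formula with a clamp to at least one spectrum. The helper names and the 512 KiB cache size used in the usage note are assumptions for illustration, not values from my real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: spectra per group using the formula above,
   clamped so that a long spectrum still yields one spectrum per group. */
static size_t spectra_per_group(size_t l2CacheSize, size_t spectraLen)
{
    assert(spectraLen > 0);   /* avoid dividing by zero */

    /* Integer division: truncates towards zero. */
    size_t n = l2CacheSize / (4 * spectraLen * sizeof(float));

    return n > 0 ? n : 1;
}

/* Hypothetical helper: number of elements processed per group. */
static size_t group_size(size_t l2CacheSize, size_t spectraLen)
{
    return spectraLen * spectra_per_group(l2CacheSize, spectraLen);
}
```

With an assumed 512 KiB cache and 4-byte floats, a spectrum length of 1,024 gives 32 spectra per group (32,768 elements), while a length of 131,072 would give 0 without the clamp and gives 1 with it.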


Should I be using a different value for groupSize with the "zip" versus "zop" routines? Presumably "zip" uses less memory, since it works in place.

Additionally, the most common FFT length is 131,072 (i.e. 1 MiB of data). Is there a cunning way to divide this between the caches of multiple processors, e.g. by doing the twiddle-factor multiplication myself?
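For reference, the kind of split I have in mind is the standard radix-2 decimation-in-time identity, sketched below for the forward transform with the usual sign convention (the inverse flips the sign of the exponent). The symbols E_k and O_k are just names chosen here for the half-length transforms of the even-indexed and odd-indexed samples. Whether vDSP offers a convenient way to do this is exactly what I don't know.

```latex
% Length-N DFT and its two half-length sub-transforms
X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i\, nk/N}, \qquad
E_k = \sum_{m=0}^{N/2-1} x_{2m} \, e^{-2\pi i\, mk/(N/2)}, \qquad
O_k = \sum_{m=0}^{N/2-1} x_{2m+1} \, e^{-2\pi i\, mk/(N/2)}

% Recombination with the twiddle factors, for 0 \le k < N/2
X_k = E_k + e^{-2\pi i\, k/N} \, O_k, \qquad
X_{k+N/2} = E_k - e^{-2\pi i\, k/N} \, O_k
```

The two half-length transforms are independent, so each could run on its own processor. For N = 131,072 each half is 65,536 complex points, i.e. 512 KiB at 8 bytes per point.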



r i c k


