Re: realtime altivec fft -- checking & performance
- Subject: Re: realtime altivec fft -- checking & performance
- From: Urs Heckmann <email@hidden>
- Date: Mon, 13 Jan 2003 18:14:21 +0100
Check out this one:
"Supercomputer-style FFT library for Apple G4"
http://developer.apple.com/hardware/ve/acgresearch.html
sample code:
ftp://ftp.apple.com/developer/Sample_Code/Devices_and_Hardware/Velocity_Engine/VelEng_FFT.sit
It's AltiVec only, and it's rocking fast. The speed improvement over
scalar computation should be noticeably more than 20% unless you have
small block sizes.
You can check alignment with something like this (32bit processor :-):
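#include <stdbool.h>
#include <stdint.h>

/* True if the pointer sits on a 16-byte boundary, as AltiVec loads/stores need.
   Parentheses matter: != binds tighter than &, so the mask must be grouped. */
bool is_16byte_aligned ( const void* theStuff )
{
    return ( (uintptr_t) theStuff & 15 ) == 0;
}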
If it is not, it is _maybe_ possible to do something like this (depends
on your needs):
1.) make your array at least 4 floats (16 bytes) longer
2.) copy the leading, unaligned floats to the unused back of the array
3.) perform the fft on the aligned - and now wrapped - buffer
The fft treats its input as periodic anyway, so it often doesn't matter much
where you start: a circular shift of the input changes only the phase of each
bin, not its magnitude...
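The three-step wrap-around recipe above can be sketched roughly as follows. This is a minimal sketch, not code from the post: `wrap_to_aligned` is a hypothetical name, and it assumes the buffer is at least float (4-byte) aligned, holds n >= 4 samples, and has room for n + 4 floats as step 1 requires.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper sketching the wrap-around recipe.
 * `buf` holds n samples in buf[0..n-1] and has room for n + 4 floats
 * (step 1). Assumes n >= 4 and that buf is at least float (4-byte) aligned.
 * Returns a 16-byte-aligned pointer to n floats holding the same samples,
 * circularly rotated left by the number of leading unaligned floats. */
float *wrap_to_aligned(float *buf, size_t n)
{
    uintptr_t misalign = (uintptr_t) buf & 15;            /* 0, 4, 8 or 12 bytes */
    size_t lead = ((16 - misalign) & 15) / sizeof(float); /* 0..3 floats */

    /* Step 2: copy the leading, unaligned samples behind the data. */
    memcpy(buf + n, buf, lead * sizeof(float));

    /* Step 3: the fft runs on buf + lead, which starts on a 16-byte boundary. */
    return buf + lead;
}
```

If the original phases matter, the rotation can be undone in the frequency domain: feeding x[m + L] instead of x[m] multiplies bin k by exp(+2*pi*i*k*L/N), so multiplying bin k of the result by exp(-2*pi*i*k*L/N), with L = lead, recovers the unshifted spectrum. Magnitude-only processing can ignore it.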
Oh, I just wanted to write up some more recipes, and now this:
> I just realized I hadn't enabled -O3 and now get about 10%, apologies.
> Does that sound better?
You mean 10% instead of 80%? - Yup, that's fine :-)
Forget what I wrote above...
Cheers,
;) Urs
On Monday, 13.01.03, at 17:39 (Europe/Berlin), Brian Whitman wrote:
Thanks to the v2 hint yesterday, I'm off and running with our VST
conversions. I'd be interested in hearing from anyone with an fft/ifft
loop in their AudioUnit about CPU usage in the realtime case. I was
getting 33% with the old C code and now 20% with what I think is
AltiVec-enhanced code on a 1 GHz PowerBook with 1 GB of RAM. I say 'what I
think is' because I am not sure the fft is being vectorized: I suspect our
input data structure may not be 16-byte aligned. (We have to interface with
LAPACK/BLAS through our own vector class, but I've added the
__attribute__ ((aligned (16))) attribute to try to fix it.)
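One thing worth checking in this setup: __attribute__ ((aligned (16))) aligns the object it is attached to, so on a struct or class it aligns the struct itself, while a heap buffer reached through a pointer member is only as aligned as the allocator makes it (and before C++17, plain new was not required to honor over-alignment either). A minimal sketch, assuming a C-style stand-in for the vector class (`fvec`, `fvec_init`, `fvec_free` and `fft_scratch` are hypothetical names), allocates the samples explicitly on a 16-byte boundary with C11 aligned_alloc; posix_memalign would be the POSIX alternative.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vector class. Putting
 * __attribute__ ((aligned (16))) on this struct would align the struct
 * itself, not the heap block that `data` points to. */
typedef struct {
    size_t n;
    float *data;
} fvec;

/* Allocate the sample buffer on a 16-byte boundary. C11 aligned_alloc
 * expects the size to be a multiple of the alignment, so round it up. */
int fvec_init(fvec *v, size_t n)
{
    size_t bytes = (n * sizeof(float) + 15) & ~(size_t) 15;
    if (bytes == 0)
        bytes = 16;
    v->data = aligned_alloc(16, bytes);
    v->n = v->data ? n : 0;
    return v->data ? 0 : -1;
}

void fvec_free(fvec *v)
{
    free(v->data);
    v->data = NULL;
    v->n = 0;
}

/* Fixed-size arrays with static or automatic storage, by contrast, do
 * honor the attribute directly (GCC/Clang syntax). */
float fft_scratch[256] __attribute__ ((aligned (16)));
```

The design choice is simply to make alignment a property of the allocation rather than of the wrapper type, so it holds no matter how the wrapper object itself is created.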
1) Is 20% "OK" for this sort of machine in realtime? My PIII 800 running VST
gets about the same, so I'm wondering, especially since the G4 AltiVec
fft is so highly touted.
2) Is there a way to make sure that the fft code is being vectorized?
The vDSP docs say that it will fall back to scalar code if certain
parameters aren't met, but they don't suggest a way to check.
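A partial answer to question 2 is to assert, in debug builds, that every buffer handed to the fft is 16-byte aligned. This is a hedged sketch: it does not prove vDSP took the vector path (alignment may be only one of its conditions), but it does rule out the suspected cause. `split_complex`, `aligned16`, `fft_inputs_aligned` and `CHECK_FFT_INPUTS` are hypothetical names; the struct only mirrors the realp/imagp layout of vDSP's DSPSplitComplex so the sketch compiles without the vDSP headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for vDSP's split-complex layout (separate real and imaginary
 * arrays); defined here only so the sketch compiles on its own. */
typedef struct {
    float *realp;
    float *imagp;
} split_complex;

static bool aligned16(const void *p)
{
    return ((uintptr_t) p & 15) == 0;
}

/* Hypothetical guard: checks alignment only, which is a precondition for
 * the vector path, not proof that it was taken. */
static bool fft_inputs_aligned(const split_complex *z)
{
    return aligned16(z->realp) && aligned16(z->imagp);
}

/* In debug builds, trap misaligned buffers before they reach the fft. */
#define CHECK_FFT_INPUTS(z) assert(fft_inputs_aligned(z))
```

Placing CHECK_FFT_INPUTS(&z) just before the fft call in the render loop costs nothing in release builds, since assert compiles away when NDEBUG is defined.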
_______________________________________________
coreaudio-api mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/coreaudio-api
Do not post admin requests to the list. They will be ignored.