Re: realtime altivec fft -- checking & performance
  • Subject: Re: realtime altivec fft -- checking & performance
  • From: Urs Heckmann <email@hidden>
  • Date: Mon, 13 Jan 2003 18:14:21 +0100

Check out this one:

"Supercomputer-style FFT library for Apple G4"

http://developer.apple.com/hardware/ve/acgresearch.html

sample code:

ftp://ftp.apple.com/developer/Sample_Code/Devices_and_Hardware/Velocity_Engine/VelEng_FFT.sit

It's AltiVec only, and it's rocking fast. The speed improvement over scalar computation should be noticeably more than 20% unless you have small block sizes.

You can check alignment with something like this (32bit processor :-):

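#include <stdbool.h>
#include <stdint.h>

bool is_16byte_aligned ( void* theStuff )
{
    // The parentheses matter: != binds tighter than &, so without them
    // the test would be theStuff & (15 != 0), i.e. theStuff & 1.
    return ( (uintptr_t) theStuff & 15 ) == 0;
}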

If it is not, it is _maybe_ possible to do something like this (depends on your needs):

1.) make your array at least 4 floats longer
2.) copy leading, unaligned bytes to unused back of array
3.) perform fft on aligned - and now wrapped - buffer

The FFT treats its input as periodic anyway, so it often doesn't much matter where you start...
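The three steps above could be sketched roughly like this. The function name wrap_to_aligned is made up for illustration, and the sketch assumes the buffer is at least float-aligned (4 bytes) and has room for n + 4 floats. Note that the aligned window holds the signal rotated by up to 3 samples, so magnitudes are unchanged but phases shift accordingly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of any Apple library.
 * buf must hold n valid floats followed by at least 4 unused floats,
 * and must itself be 4-byte aligned (normal for a float array).
 * Returns a 16-byte-aligned pointer to n floats containing the
 * original signal, circularly rotated by the number of leading
 * unaligned samples. */
float *wrap_to_aligned(float *buf, size_t n)
{
    uintptr_t addr = (uintptr_t) buf;

    /* Number of floats before the next 16-byte boundary (0..3). */
    size_t lead = ((16 - (addr & 15)) & 15) / sizeof(float);

    /* Step 2: copy the leading, unaligned samples to the unused tail. */
    memcpy(buf + n, buf, lead * sizeof(float));

    /* Step 3: the FFT would run on n floats starting here. */
    return buf + lead;
}
```

If the buffer is already aligned, lead is 0 and the function returns buf unchanged.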


Oh, I was just about to write some more recipes, and now this:

> I just realized I hadn't enabled -O3 and now get about 10%, apologies. Does that sound better?

You mean 10% instead of 80%? - Yupp, that's fine :-)

Forget what I wrote above...

Cheers,

;) Urs

On Monday, 13 Jan 2003, at 17:39 (Europe/Berlin), Brian Whitman wrote:

Thanks to the v2 hint yesterday, I'm off and running with our vst
conversions. I'd be interested in hearing from anyone with a fft/ifft
loop in their AudioUnit about cpu usage in the realtime case. I was
getting 33% with old c code and now 20% with what I think is
AltiVec-enhanced code on a PowerBook 1 GHz / 1 GB. I say 'what I think
is' because I am not positive that the fft is being vectorized thanks
to doubts that our input data structure is 16-byte aligned. (We have to
interface with lapack/blas with our own vector class, but I've added
the __attribute__ ((aligned (16))) parameter to try to fix it.)

1) Is 20% "OK" for this sort of machine in realtime? My pIII 800 on VST
would get about this, so I'm wondering, especially since the g4 altivec
fft is highly touted.

2) Is there a way to make sure that the fft code is being vectorized?
The vDSP docs say that it will fall back to scalar if certain
parameters aren't met but don't suggest a way to check.
_______________________________________________
coreaudio-api mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/coreaudio-api
Do not post admin requests to the list. They will be ignored.