Mailing Lists: Apple Mailing Lists

Image of Mac OS face in stamp
 
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Optimizing 2D FFTs



Happily, one sees that people have promptly found both single- and double-precision
library references for 2D FFTs. If you ask me, this is how the scitech list achieves
its greatest utility---by offering solid advice, and quickly!


I do have some abstract (i.e., not software code per se) observations in regard to 2D FFTs, or even 3D.
Forgive my elementary points if you already know all about this.
(Everything I am saying here applies to the use of the libraries, at any precision.)


1) As is well known, the 2D FFT can be effected via a 1-dim. FFT per row, in-place, and
then a 1-dim. FFT per column, in-place. This means it is straightforward to
build whatever you need from a really great 1-dim. FFT (like Apple's various choices).
(See point 3.) The same easy-construction principle applies to 3D FFTs.


2) If the input data is pure-real, then the time (and transform storage) can be cut in half by
exploiting the natural Hermitian symmetry. The FFT paper at
www.apple.com/acg
gives at least one algorithm for avoiding complex-number storage
in higher-dimensional FFTs. Note that said ACG paper does not show the latest library code;
the point being that any of the fastest library implementations can still be
employed consistently with the paper's ideas.


3) You may be thinking that the row-FFTs will be fast, but the column-FFTs
will have to hop memory. But that's OK, there are several speedups for columns.
such as:


a) "Column-ferrying", by which one bites the bullet and extracts a column into
a (single) auxiliary row, then does all the 1-dim FFT work there, then puts the
aux. row back into the column.


b) Clever tricks abound for mutiple-column ferrying, such as: Do four columns at once,
by reading four-elements-across at each row index of the column, or
in some cases when there is byte data, one can read a 4-byte int across 4 columns,
and so on.


c) For 3D FFTs, one would ferry any column, or any "pile" (the 3rd data axis!) into a single
auxiliary row. This is to say, an NxNxN 3D FFT, just like the NxN 2D FFT, only needs an
aux. row of size N.


4) Not that the aforementioned paper also describes the execution of *huge* 1-dim FFTs, by factoring
the problem into a cache-friendly matrix format, with known phase factors. In this sense
a 2D FFT of size NxN is actually *easier* than a 1-dim FFT of size N^2, because the latter requires
phase factors.


R. E. Crandall
Advanced Computation Group
_______________________________________________
scitech mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/scitech
Do not post admin requests to the list. They will be ignored.




Visit the Apple Store online or at retail locations.
1-800-MY-APPLE

Contact Apple | Terms of Use | Privacy Policy

Copyright © 2007 Apple Inc. All rights reserved.