|Date||April 1, 2005|
|Speaker||Steven G. Johnson (MIT)|
|Topic:||FFTW: "The Fastest Fourier Transform in the West"|
Fast Fourier transforms (FFTs) lie at the heart of many problems in scientific computing and engineering, from spectral methods for partial differential equations, to digital signal processing, to large-number arithmetic for cryptography. However, despite nearly 40 years of scrutiny, they have proved surprisingly resistant to efficient implementation -- FFTs never achieve the peak CPU speed except for very small problems, and their historical performance has doubled every ~30 months over 40 years, considerably slower than Moore's 18-month doubling time followed by peak CPU throughput.
In this talk, we outline the historical progression of the understanding of fast Fourier transforms and their implementation, from Gauss in 1805 to Danielson and Lanczos in 1942 to the digital age that dawned with Cooley and Tukey in 1965, to our recent developments of self-optimizing, auto-generated FFTs at MIT. For much of the FFT's history, the focus was on minimizing the number of arithmetic operations, from Cooley and Tukey's erroneous assertion that radix 3 was optimal to Winograd's demonstration that only O(N) multiplications were required for an FFT of length N. Unfortunately, modern architectures have divorced performance from simple heuristics like operation counts, and the past strategies no longer succeed; the question of the fastest algorithm, therefore, has had an ever-changing answer that is increasingly difficult to predict. Instead, we believe that a more stable solution may be possible by changing the question: instead of asking what is the best algorithm, one should ask what is the smallest collection of simple algorithmic fragments whose compositions span the optimal algorithm on as many computer architectures as possible. We describe an implementation of this strategy---combining self-optimization, code generation, and recursive cache-oblivious algorithms---in the context of FFTW, our free and widely used fast Fourier transform (FFT) library (www.fftw.org).