ARSC HPC Users' Newsletter 238, February 4, 2002

FFTs on Chilkoot

[ Thanks to Tom Logan of ARSC for this article. ]

I have been testing the various FFT routines available at ARSC, with an eye to performance. This article describes results for Chilkoot, the Cray SV1ex.

For comparison purposes, I started with a fairly quick FFT code taken from the INFO-MAC hyper archive whose original author was John Green (I shall refer to this as Green's FFT). Green's FFT routines were reported to be 2-3 times faster than the standard Numerical Recipes FFT algorithms.

I compared Green's complex 1-D FFT with the equivalent CRAY LibSci routine CCFFT and with the IMSL Math Library routines FFTCF/FFTCB. The tests were run on randomly generated complex arrays of varying lengths by calling both the forward and the reverse FFT 10000 times each. The run times in seconds for vectors of lengths from 64 to 16384 are summarized in the following table:
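The shape of this test can be sketched in a few lines. The Python below is only an illustration of the timing loop, not the code used for the measurements: a textbook recursive radix-2 FFT stands in for the routines compared, and the names fft and time_round_trips are made up for this sketch.

```python
import cmath
import random
import time


def fft(x, inverse=False):
    """Textbook recursive radix-2 FFT; len(x) must be a power of two.

    The inverse transform is left unscaled, so a forward/inverse round
    trip multiplies the data by len(x).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def time_round_trips(n, repeats):
    """Time `repeats` forward and `repeats` inverse transforms of length n."""
    data = [complex(random.random(), random.random()) for _ in range(n)]
    start = time.perf_counter()
    for _ in range(repeats):
        spectrum = fft(data)
        fft(spectrum, inverse=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # The article used 10000 repeats; a pure-Python FFT needs far fewer.
    for n in (64, 128, 256):
        print(f"{n:6d}  {time_round_trips(n, repeats=10):8.3f} s")
```

Timing many repetitions, rather than a single call, keeps the measurement well above the resolution of the timer for the shorter vectors.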

 FFT                    Run time (seconds)
 Length         Green           IMSL            LibSci
 ======         =====           ====            ======
  64            0.49            0.52            0.24
  128           1.08            0.88            0.48
  256           2.28            1.64            0.94
  512           4.75            3.35            2.04
  1024          10.92           6.86            1.76
  2048          24.97           14.27           3.37
  4096          51.95           29.17           8.18
  8192          132.03          65.07           21.85
  16384         296.99          177.55          62.75

The results overwhelmingly favor the CRAY LibSci routine, which was the fastest at every length tested.

Figure 1 - FFT Comparison on Chilkoot:

Graphically, we see a speedup of about 2 to 7.4 times over Green's implementation when using LibSci.

Figure 2 - Speedup on SV1ex: LibSci versus Green's:
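For readers who want the numbers behind the speedup curve, the ratios can be recomputed directly from the 1-D timing table. This is a small sketch in Python; the variable names are made up for illustration, and the run times are copied from the table.

```python
# Run times in seconds from the 1-D table: length -> (Green, IMSL, LibSci).
times = {
    64: (0.49, 0.52, 0.24),
    128: (1.08, 0.88, 0.48),
    256: (2.28, 1.64, 0.94),
    512: (4.75, 3.35, 2.04),
    1024: (10.92, 6.86, 1.76),
    2048: (24.97, 14.27, 3.37),
    4096: (51.95, 29.17, 8.18),
    8192: (132.03, 65.07, 21.85),
    16384: (296.99, 177.55, 62.75),
}

# Speedup of LibSci over each alternative: other run time / LibSci run time.
speedups = {
    n: (green / libsci, imsl / libsci)
    for n, (green, imsl, libsci) in times.items()
}

print(" Length   vs Green   vs IMSL")
for n, (vs_green, vs_imsl) in speedups.items():
    print(f"{n:7d}   {vs_green:8.2f}   {vs_imsl:7.2f}")
```

The smallest ratio over Green's code is at length 64 and the largest is at length 2048.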

As a further comparison, I performed a similar test using the 2-D complex FFTs from LibSci (CCFFT2D) and from IMSL (F2T2D/F2T2B), running each 2000 times forward and reverse on matrices of varying sizes. Not surprisingly, the results were quite similar to the 1-D cases, with the LibSci routine running 2.5 to 5 times faster than the IMSL version.

Figure 3 - 2-D FFT Comparison on Chilkoot:

Upon reading the CCFFT2D man page more closely, I noticed under the performance tips the comment "it is very important to make the leading dimensions of the arrays odd numbers to avoid memory bank conflicts." Doing this (padding the leading dimension from N to N+1) provided an additional speedup of roughly 1.2 to 1.45 times over the already speedy LibSci FFT routine.

 FFT               Leading Dimension
 Size              N              N+1
                ======          ======
  64              2.07            1.43
  128            11.47            7.91
  256            52.54           38.81
  512           263.31          214.83

Here's a graph showing the speedup obtained:

Figure 4 - Speedup Changing Leading Dimension on SV1ex:
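Why an odd leading dimension helps can be seen with a toy model of interleaved memory. The Python sketch below is a deliberate simplification, not a description of the SV1ex memory system: it assumes consecutive words are spread round-robin across banks, and the bank count of 64 is a hypothetical value chosen only for illustration.

```python
from collections import Counter

NUM_BANKS = 64  # hypothetical bank count, chosen only for illustration


def banks_touched(leading_dim, row, ncols, num_banks=NUM_BANKS):
    """Count accesses per bank when walking along one row of the array.

    Element (row, col) of a column-major (Fortran-style) array sits at
    word offset row + col * leading_dim, and this model assigns word
    offset w to bank w % num_banks.
    """
    return Counter(
        (row + col * leading_dim) % num_banks for col in range(ncols)
    )


for leading_dim in (64, 65):
    hits = banks_touched(leading_dim, row=0, ncols=64)
    print(f"leading dimension {leading_dim}: "
          f"{len(hits)} bank(s) used, busiest bank hit {max(hits.values())} times")
```

In this model a leading dimension of 64 sends all 64 accesses along a row to the same bank, while 65 spreads them one per bank. That is the kind of conflict the man page tip is meant to avoid.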

I think the conclusion to this exercise is obvious - if your application on the SV1ex uses FFTs, then you will be richly rewarded by faster run times when you use the CRAY LibSci routines and read the man pages for usage tips.

T3E Programming Environment and OS Upgrades

As announced in news/motd, yukon's default programming environment was upgraded to PE last Wednesday, and the default message passing toolkit will be upgraded to MPT this Wednesday.

As always, we'd like to hear of any problems, performance improvements, or other changes you notice. You'll have to recompile your code for the upgrade to have any effect.

UNICOS/mk will be upgraded soon. Watch news/motd.

"slayall" on Cluster

Clusters need to be watched a bit more than HPC systems like the T3E. One common problem is that job processes are inadvertently left running after a job completes.

Users of ARSC's Linux cluster, quest, can ensure that all processes have been terminated by issuing the "slayall" command before leaving the system:


slayall $USER

(Where USER is the environment variable with your username in it. You can use the command just as it's given above or substitute your actual username. For example, if you were user "farquat," you could use the command, "slayall farquat".)

Note that if you have jobs running in PBS, slayall will kill them. If you do not, please run "slayall" prior to logging off.
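If you would like to see what is still running under your username before you log off, a listing from the standard ps command is enough. The Python sketch below is only an illustration and does not reproduce what slayall does internally; the function names parse_ps and my_processes are made up for this example.

```python
import getpass
import subprocess


def parse_ps(output):
    """Turn `ps -o pid= -o comm=` output into (pid, command) pairs."""
    processes = []
    for line in output.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2:
            processes.append((int(fields[0]), fields[1].strip()))
    return processes


def my_processes(user=None):
    """List processes owned by `user` (default: the current user)."""
    user = user or getpass.getuser()
    # "pid=" and "comm=" request those columns with the header suppressed.
    result = subprocess.run(
        ["ps", "-u", user, "-o", "pid=", "-o", "comm="],
        capture_output=True,
        text=True,
    )
    return parse_ps(result.stdout)
```

Calling my_processes() returns a list of (pid, command) pairs for the current user. Anything left in that list once your jobs have finished is a candidate for cleanup.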

Quick-Tip Q & A

  (The quick tip is still in hibernation... If you have any tips to
  share, we'd love to see them.)

[[ Answers, Questions, and Tips Graciously Accepted ]]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.