ARSC HPC Users' Newsletter 274, August 8, 2003
Inlining in Cray FTN
[ Thanks to Carl Morgenstern of Cray Inc. for ideas for this article. ]
"Inlining" replaces subroutine or function calls with their actual source code, prior to compilation. Although the compilers for all the major HPC systems can perform inlining, this article focuses on Cray ftn 5.0.
Why inline code?
- On Cray vector machines it can drastically improve performance because it eliminates subroutine calls, which prevent loops from vectorizing,
- Procedure call overhead is reduced,
- The optimizer has a larger block of code to optimize.
Under Cray ftn, the source code for the functions or subroutines to be inlined must:
- Appear in the same file as the caller, or
- Exist in files named explicitly with "-Oinlinefrom= ...", or
- Exist in files in a directory named in "-Oinlinefrom= ...".
This was mentioned in the Quick-Tip of issue #207 and (of course) in "man ftn" (Cray PrgEnv 5.0):
    -O inlinefrom=source[:source] ...
        The inlinefrom option specifies one or more files and/or
        directories that contain procedures for inline code expansion.
Listing every directory that contains source files is easy to do in a makefile, but it's probably unacceptable: it makes every function eligible for inlining, which slows down the compile and possibly the run. Some care in refining the list of functions to inline is a good idea.
On the other hand, selective lists of inline source files can become unruly for a large application with scores of files and multiple subdirectories. What follows are two ideas for managing these lists.
It seems every application is organized according to its own gods, but for example, here's one structure I've seen:
    programdir
        makefile
        sourcedir_1
            src1.1.f
            src1.2.f
            ...
            src1.n.f
            sourcedir_1.1
                src1.1.1.f
                src1.1.2.f
                ...
                src1.1.n.f
        sourcedir_2
            src2.1.f
            src2.2.f
            ...
            src2.n.f
        ...
        sourcedir_N
            srcN.1.f
            srcN.2.f
            ...
            srcN.n.f
One idea is to create a new variable in the makefile, and set it to the list of files to inline. E.g.,
    # ":=" (immediate assignment, as in GNU make) lets each line
    # append to the value built so far.
    PROGHOME   := $(HOME)/Progs/programdir
    INLINEFROM := $(PROGHOME)/sourcedir_1/src1.22.f
    INLINEFROM := $(INLINEFROM):$(PROGHOME)/sourcedir_1/src1.23.f
    INLINEFROM := $(INLINEFROM):$(PROGHOME)/sourcedir_1/sourcedir_1.1/src1.1.5.f
    INLINEFROM := $(INLINEFROM):$(PROGHOME)/sourcedir_1/sourcedir_1.1/src1.1.6.f
    INLINEFROM := $(INLINEFROM):$(PROGHOME)/sourcedir_3/src3.2.f
    INLINEFROM := $(INLINEFROM):$(PROGHOME)/sourcedir_4/src4.7.f
    INLINEFROM := $(INLINEFROM):$(PROGHOME)/sourcedir_4/src4.8.f
    INLINEFROM := $(INLINEFROM):$(PROGHOME)/sourcedir_4/src4.9.f

Then reference this variable in the appropriate compile commands, e.g.:

    FFLAGS += -Oinlinefrom=$(INLINEFROM)

If desired, you could create multiple INLINEFROM variables. In particular, you can't mix free and fixed format Fortran code, so you may be forced to have at least two separate lists, INLINEFROM_FIXED and INLINEFROM_FREE.
Another idea is to create a new subdirectory and populate it with symbolic links to the source files you wish to inline. Then, in the makefile, give the name of the new subdirectory on the compile line. For example, you'd add to the directory structure for the application source:
    inlinefromdir
        src1.22.f  -> ../sourcedir_1/src1.22.f
        src1.23.f  -> ../sourcedir_1/src1.23.f
        src1.1.5.f -> ../sourcedir_1/sourcedir_1.1/src1.1.5.f
        src1.1.6.f -> ../sourcedir_1/sourcedir_1.1/src1.1.6.f
        src3.2.f   -> ../sourcedir_3/src3.2.f
        src4.7.f   -> ../sourcedir_4/src4.7.f
        src4.8.f   -> ../sourcedir_4/src4.8.f
        src4.9.f   -> ../sourcedir_4/src4.9.f

And add to the makefile:
    FFLAGS += -Oinlinefrom=inlinefromdir

If necessary, you could create multiple "inlinefromdir" directories.
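Creating the links by hand gets tedious as the list grows. As a minimal sketch (the script is not from the original article; it assumes a POSIX shell, that it is run from programdir, and it reuses the example file names above), the directory could be populated like this:

```shell
#!/bin/sh
# Sketch: populate inlinefromdir with symbolic links to the files to inline.
# Assumptions: run from programdir; file names are the examples used above.
INLINEDIR=inlinefromdir
mkdir -p "$INLINEDIR"

for src in \
    sourcedir_1/src1.22.f \
    sourcedir_1/src1.23.f \
    sourcedir_1/sourcedir_1.1/src1.1.5.f \
    sourcedir_1/sourcedir_1.1/src1.1.6.f \
    sourcedir_3/src3.2.f \
    sourcedir_4/src4.7.f \
    sourcedir_4/src4.8.f \
    sourcedir_4/src4.9.f
do
    # Warn about typos in the list, but still create the link.
    [ -f "$src" ] || echo "warning: $src not found" >&2
    # A relative link target is resolved from inside inlinefromdir, hence "../".
    ln -sf "../$src" "$INLINEDIR/$(basename "$src")"
done
```

Rerunning the script is harmless, since "ln -sf" replaces links that already exist. Note that two source files with the same base name in different directories would collide in a flat link directory like this one.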
Normally, inlining is restricted to leaf subroutines. Sometimes you might want to inline a non-leaf subroutine. This can be accomplished by compiling with -Oinline5. However, adding this option can increase compilation time drastically. With the above methods, you may combine "-Oinlinefrom=inlinefromdir -Oinline5" and compile times don't grow significantly, as -Oinline5 applies only to the specified files.
Please let us know if you manage inlining differently. It's always good to see what other people have figured out...
Conditional Compilation: Part I
[[ Kate Hedstrom of ARSC contributes another great series of two articles. Thanks, Kate! ]]
I have recently been playing with a new ocean model, called FVCOM. It arrived as two files: fvcom.f and comblk00.h. I ran it through fsplit and built a Makefile, but I resisted adding C preprocessor macros. We have been using cpp for doing conditional compilation for quite a while now in other codes. However, after switching between three different test problems needing three different values for g, the gravitational acceleration, I had to do something. I figure there are at least two choices: cpp and the new Fortran 2000 coco. I will describe using cpp for conditional compilation this time and try out coco next time.
In using cpp, it seems that there are at least two styles, although the Fortran community is far more familiar with the first. Let's start with an example:
    Style 1:
    -------
    #define XX
    #ifdef XX
          call xx
    #endif

    Style 2:
    -------
    #define XX 1
    #if XX
          call xx
    #endif
I don't know all the pros and cons between these two, although I do know that the gnu coding standards call for the second style. One difference is that with the second style, a literal "XX" elsewhere in your code will be replaced by a "1". Another difference is when checking for one of several options:
    Style 1:
    -------
    #if defined XY || defined YZ || defined ZQ
          call xyzq
    #else
          call ppz
    #endif

    Style 2:
    -------
    #if XY || YZ || ZQ
          call xyzq
    #else
          call ppz
    #endif
Note that in this context, "||" means "or" and "&&" means "and".
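To see how the style 2 test behaves, the fragment can be fed to cpp by hand. This is only a sketch: it assumes a cpp that reads standard input and accepts -P and -D (GNU cpp does), and the CPP, out_none and out_yz variables are names introduced here for illustration.

```shell
#!/bin/sh
# Sketch: run the style 2 "one of several options" test through cpp.
# Assumption: $CPP names a preprocessor that reads stdin and accepts -P and -D.
CPP=${CPP:-cpp}

fragment='#if XY || YZ || ZQ
      call xyzq
#else
      call ppz
#endif'

# No option set: undefined names count as 0 in #if, so the #else branch is kept.
out_none=$(printf '%s\n' "$fragment" | "$CPP" -P)

# One option set to 1: the "or" is true, so the first branch is kept.
out_yz=$(printf '%s\n' "$fragment" | "$CPP" -P -DYZ=1)

echo "none set: $out_none"
echo "YZ set:   $out_yz"
```

With style 1 the second run would use -DYZ alone, since "defined YZ" only asks whether the name exists, not what value it has.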
OK, so now you have these cpp lines in your code. Some Fortran compilers will magically call cpp for you if the file extension is .F or .F90. Should you go ahead and let it? I tried that for years, cursing every compiler that didn't do quite the right thing. In the end, I gave up and now my Makefile invokes cpp for me on all systems:
    .SUFFIXES: .o .f90 .F .F90

    .F.o:
    	$(CPP) -P $(CPPFLAGS) $*.F > $*.f90
    	$(CLEAN) $*.f90
    	$(CFT) -c $(FFLAGS) $*.f90

    .f90.o:
    	$(CFT) -c $(FFLAGS) $<

    .F.f90:
    	$(CPP) -P $(CPPFLAGS) $*.F > $*.f90
    	$(CLEAN) $*.f90
This gains you several things:
- A uniform set of Makefile dependencies (see "Unix Tools for Portable Applications," issue 245).
- Keeping the .f90 file can help with debugging.
- There really are compilers out there that don't do what I want - I can invoke the cpp I like.
One example of point 3 is the Cray, where the Fortran fpp does any #include last, after the macro substitutions. Since we like to keep the #define statements in an include file, this doesn't work at all. Years ago I reported it as a bug and was told that it's a feature. Instead, we use the C language cpp with the -N flag (/opt/ctl/bin/cpp on chilkoot and yukon, /opt/PE/bin/cpp on klondike).
Another thing to watch for is the #elif construct. We have a huge chain of these #elifs in our parameters file and I have seen both the desired and the (incorrect) #else lines end up in the output. On the SGIs, for instance:
    pike% more xx.F
    #define YY 1
    #ifdef XX
          g = xx
    #elif YY
          g = yy
    #elif ZZ
          g = zz
    #else
          g = 9.81
    #endif
          end
    pike% /lib/cpp -P xx.F > xx.f
    pike% more xx.f
          g = yy
          g = 9.81
          end
This happens for style 1 and 2. For another example, the SX-6 frontend system at ARSC is an SGI, and when cpp is invoked via the Fortran compiler, we see the problem:
    rimegate% sxf90 -EP xx.F
    rimegate% more i.xx.F
          g = yy
          g = 9.81
          end
However, sxcpp on the frontend works correctly and when compiling native on the SX-6 itself, /lib/cpp and "f90 -EP" both work correctly.
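Before trusting a given preprocessor with a long #elif chain, it may be worth running a quick check along these lines. It is a sketch rather than a tool tested on the systems named above: it assumes the preprocessor named in CPP (a variable introduced here) reads standard input and accepts -P, and it simply counts how many assignments to g survive preprocessing of the xx.F example.

```shell
#!/bin/sh
# Sketch: does this cpp emit more than one branch of an #if/#elif/#else chain?
# Assumption: $CPP reads stdin and accepts -P; override with e.g. CPP=/lib/cpp.
CPP=${CPP:-cpp}

# Same contents as the xx.F example above.
input='#define YY 1
#ifdef XX
      g = xx
#elif YY
      g = yy
#elif ZZ
      g = zz
#else
      g = 9.81
#endif
      end'

# Count the assignments that survive; a correct cpp keeps exactly one.
count=$(printf '%s\n' "$input" | "$CPP" -P | grep -c 'g =')

if [ "$count" -eq 1 ]; then
    echo "$CPP: #elif chain handled correctly"
else
    echo "$CPP: $count assignments to g emitted, check #elif handling"
fi
```

A count of 2 corresponds to the faulty output shown above, where both the selected branch and the #else branch come through. A count of 0 most likely means the preprocessor could not be run at all.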
The solution when all else fails? I have the source code for cpp -- a version that does the "right" thing. If necessary, I'll compile it and invoke it from my Makefile.
All of this hassle and delving into dark corners of C is because the Fortran standard had no conditional compilation - until now. The Fortran 200x standard has a coco feature and implementations are available now on the web. Next time I will talk about these.
Open Events at ARSC
ARSC NSF Summer Internship Program, Final Presentations by our Interns:
Thursday 14th August, 2pm Butrovich 109

Faculty Camp Seminars:
With the first week of Faculty Camp 2003 completed, we are moving to more specific topics. Week 2 will focus on a series of visualization classes to be presented by Roger Edberg, and an IBM P4 programming and optimization class being given by several members of ACTC/IBM. These are outlined below. Anybody interested in attending, please contact Guy Robinson to confirm time and place.

Monday 11th August
- ACTC/IBM: Present overview of Power4 systems.
- Visualization: Introduction.
- ACTC/IBM: Performance tools, compilers, debuggers.
- Visualization: Fundamentals, openGL, MAYA and Renderman.
- ACTC/IBM: Shared memory programming, openMP, and distributed memory programming, MPI.
- Visualization: AVS/Express.
- ACTC/IBM: Mixing MPI and openMP.
Week 3 will feature more presentations from ARSC Specialists and ARSC/UAF Joint Faculty members on the SX-6, the GRID, HPC in general and we'll wind up camp with final presentations by the attendees.
UAF faculty members considering Faculty Camp 2004 might make a special point to drop by for some of this.
Quick-Tip Q & A
A:[[ Any "vi" experts out there? I'm editing a text file, each line starts
  [[ with a version number followed by a space and then a word. E.g.,
  [[
  [[     ...
  [[     33 jade
  [[     33.8.2 jasper
  [[     10 javelin
  [[     7.1 javelina
  [[     22 juniper
  [[     ...
  [[
  [[ Can I move the version numbers to the ends of the lines? Like this:
  [[
  [[     ...
  [[     jade 33
  [[     jasper 33.8.2
  [[     javelin 10
  [[     javelina 7.1
  [[     juniper 22
  [[     ...
  [[
  [[ Thought I was getting good at vi regexp's, but this is a stumper! If
  [[ it's impossible in vi, maybe there's another way.

  #
  # Thanks to Ed Kornkven and Rich Griswold who gave the vi solution.
  # From Ed's email:
  #

  Here is the command to move the numbers to the end of the line:

      :%s/\([^ ]*\) \([a-z]*\)/\2 \1/

  Commentary:

      :%s      Substitute over all lines in the file
      \( \)    Names the pattern. Names are \1, \2, etc. and are
               assigned left-to-right
      [^ ]     Matches anything but a space
      [^ ]*    Matches 0 or more non-blank characters (the version numbers)
               There is a blank that follows the version number here
      [a-z]*   Matches 0 or more characters in the range a-z (the "words")

  So, everything between the first pair of slashes is to match the
  version number followed by a space followed by the "word". The version
  number is named \1 and the word is named \2. The remainder of the
  command simply reverses \1 and \2 with a blank in between.

  #
  # Rich threw in a perl and NEdit solution:
  #

  In perl, this can be done with a simple regex:

      s/^([^ ]+) (.+)$/\2 \1/

  On the command line this becomes:

      perl -pi -e 's/^([^ ]+) (.+)$/\2 \1/' myfile

  This will group together one or more non-space characters at the
  beginning of the line and everything after the first space, then swap
  them. With NEdit, the same regex is used in the Replace dialog.

Q: I changed the optimization level for one compiler optimization option
   in my makefile, remade everything, and now my program is getting
   different results. There are over 75 source files. Any suggestions
   how I might find where this compiler option is causing a difference?
[[ Answers, Questions, and Tips Graciously Accepted ]]
Ed Kornkven
ARSC HPC Specialist
ph: 907-450-8669

Kate Hedstrom
ARSC Oceanographic Specialist
ph: 907-450-8678

Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.