As any motorist in central Alaska can surely attest, it has never been particularly difficult to happen upon a moose in Fairbanks. But according to University of Alaska Fairbanks police, it may have recently become much easier:
"The University Police Department has had a dramatic increase in the number of nuisance moose calls on the Fairbanks campus this year and the Alaska Department of Fish and Game is reporting an increase in the Fairbanks area moose population."
Be mindful, use common sense, and do not feed the local wildlife. If you are able to tame one of these beasts, however, hitching a moose ride to work may be the greenest of all alternative energies. [Editor's Note: Hybrids are tamed by design.]
The 2009 ARSC "Crayons" took third place out of 26 teams in the annual Biz-Bee, a spelling-bee fund raiser for the Literacy Council of Alaska. Three familiar faces composed this year's team: HPC Systems Analyst Dale Clark, Oceanographic Specialist Kate Hedstrom, and Chief Scientist Greg Newby.
Choosing to spell "steganopodous" over "breviloquence", the Crayons fell to this formidable foe. The winning word for the evening was "foggara", a French translation of the Arabic word "qanat", which is a type of water management system that can be used for irrigation in arid climates.
Other words that made appearances throughout the night were:
didactic hartebeest cuchifrito nidificate
Congratulations, Crayons, for another great performance!
[ By Craig Stephenson ]
Previously, in newsletter 398, I covered the usage of Valgrind's Memcheck tool to catch unintended memory operations and memory leaks:
http://www.arsc.edu/support/news/HPCnews/HPCnews398.shtml#article3
Memcheck is only one of the tools provided by Valgrind. In this article, I will discuss the basics of Valgrind's Cachegrind profiler tool.
Simply put, Cachegrind analyzes an executable's low-level instruction and data operations. It reports the total number of instructions executed and data reads and writes performed, along with the number of cache misses incurred along the way. This information is critical to the performance-minded, as reducing the number of cache misses can yield substantial improvements in speed. Cachegrind can generate reports as a summary of the entire program, per function, or line-by-line.
Running Cachegrind with its default options will produce a summary of the entire program. For example:
> /u2/wes/PET_HOME/bin/valgrind --tool=cachegrind ./example
...
==2950== I   refs:      1,999,515
==2950== I1  misses:        1,110
==2950== L2i misses:        1,090
==2950== I1  miss rate:      0.05%
==2950== L2i miss rate:      0.05%
==2950==
==2950== D   refs:        594,114  (462,989 rd + 131,125 wr)
==2950== D1  misses:       16,918  ( 15,298 rd +   1,620 wr)
==2950== L2d misses:        7,868  (  6,641 rd +   1,227 wr)
==2950== D1  miss rate:       2.8% (    3.3%   +     1.2%  )
==2950== L2d miss rate:       1.3% (    1.4%   +     0.9%  )
==2950==
==2950== L2 refs:          18,028  ( 16,408 rd +   1,620 wr)
==2950== L2 misses:         8,958  (  7,731 rd +   1,227 wr)
==2950== L2 miss rate:        0.3% (    0.3%   +     0.9%  )
Profiling timer expired
(Recall from the previous article in this series that each line of Valgrind output is prefixed with the process ID.)
A quick key to interpret this output is as follows:
I/i = instructions
D/d = data
I1  = level 1 instruction cache
D1  = level 1 data cache
L2  = level 2 shared instruction/data cache
rd  = data read
wr  = data write
This example had a level 1 data cache miss rate of 2.8% and a level 2 cache data miss rate of 1.3%. Not too bad.
A textbook example of the benefits of cache optimization comes from the distinction between row-major and column-major programming languages. To traverse the memory of a multi-dimensional array sequentially in a row-major language such as C or C++, the program should access each row in order. Conversely, the memory of a multi-dimensional array in column-major languages, such as Fortran, is sequenced by columns. Let's use Cachegrind to put this to the test with the following two equivalent programs:
Row-Major / Column-Major Comparison in C
----------------------------------------
#include <stdio.h>

void rowMajor()
{
    int A[1000][100][10];
    int i, j, k;

    for(i=0; i < 1000; i++)
    {
        for(j=0; j < 100; j++)
        {
            for(k=0; k < 10; k++)
            {
                A[i][j][k] = i + j + k;
            }
        }
    }
}

void columnMajor()
{
    int A[1000][100][10];
    int i, j, k;

    for(k=0; k < 10; k++)
    {
        for(j=0; j < 100; j++)
        {
            for(i=0; i < 1000; i++)
            {
                A[i][j][k] = i + j + k;
            }
        }
    }
}

int main()
{
    rowMajor();
    columnMajor();
    return 0;
}
----------------------------------------
Row-Major / Column-Major Comparison in Fortran 90
-------------------------------------------------
PROGRAM major
  IMPLICIT NONE

  CALL rowMajor()
  CALL columnMajor()
END PROGRAM major

SUBROUTINE rowMajor()
  IMPLICIT NONE
  INTEGER, DIMENSION(1000,100,10) :: A
  INTEGER :: i, j, k

  DO i = 1,1000
    DO j = 1,100
      DO k = 1,10
        A(i,j,k) = i + j + k
      END DO
    END DO
  END DO

  RETURN
END

SUBROUTINE columnMajor()
  IMPLICIT NONE
  INTEGER, DIMENSION(1000,100,10) :: A
  INTEGER :: i, j, k

  DO k = 1, 10
    DO j = 1, 100
      DO i = 1, 1000
        A(i,j,k) = i + j + k
      END DO
    END DO
  END DO

  RETURN
END
-------------------------------------------------
Each program performs the same operation using both orders, so viewing a
summary of the entire program will not be terribly helpful. To see
which of the two functions performs faster, we would be well advised to
generate a per-function report using the cg_annotate command, via the
following series of commands:
First, compile the program with the -O0 flag to disable optimization, and the -g flag to enable debugging information:
> pgcc -O0 -g -o major major.c
Then, run the program using Valgrind's Cachegrind tool:
> /u2/wes/PET_HOME/bin/valgrind --tool=cachegrind ./major
This will display the program summary in addition to writing a file named cachegrind.out.####, where #### is the process ID. Use this file as a parameter to the cg_annotate program:
> /u2/wes/PET_HOME/bin/cg_annotate cachegrind.out.7902
...
--------------------------------------------------------------------------------
        Ir I1mr I2mr        Dr D1mr D2mr        Dw    D1mw    D2mw  file:function
--------------------------------------------------------------------------------
18,807,008    2    2 7,303,003    0    0 1,101,002  62,501  62,501  major.c:rowMajor
18,008,088    2    2 7,003,033    0    0 1,001,012 840,985 623,193  major.c:columnMajor
According to this function profile, there is no doubt that C is a row-major language. Its row-ordered function produces a mere 62,501 level 1 data cache write misses compared to its column-ordered function's 840,985.
How does the Fortran 90 equivalent of this program fare?
> pgf90 -O0 -g -o major major.f90
> /u2/wes/PET_HOME/bin/valgrind --tool=cachegrind ./major
> /u2/wes/PET_HOME/bin/cg_annotate cachegrind.out.11321
...
--------------------------------------------------------------------------------
        Ir I1mr I2mr        Dr D1mr D2mr        Dw    D1mw    D2mw  file:function
--------------------------------------------------------------------------------
16,706,007    2    2 8,303,002    0    0 1,202,003  92,258  62,999  major.f90:rowmajor_
15,007,067    3    3 8,003,032    0    0 1,002,023  62,500  62,500  major.f90:columnmajor_
...
Notice how closely Fortran 90's column-major D1 write misses (62,500) match C's row-major D1 write misses (62,501). Also worth noting is that traversing the array in the wrong order is far more costly in the C program (840,985 D1 write misses) than in the Fortran 90 program (92,258).
In reality, your code's functions are likely to be a tad more complex than a tight triple-nested loop. In this case, it might be worthwhile to look at an annotated line-by-line profile of one (or all) of your source files. The following is an example of how this is done.
First, I separated the Fortran 90 program used above into three separate files: one containing the main program, one containing the rowMajor() subroutine, and one containing the columnMajor() subroutine. As in the previous examples, the program needs to be compiled with optimization disabled and debugging enabled:
> pgf90 -O0 -g -o major rowmajor.f90 columnmajor.f90 main.f90
Then, the program needs to be run through Cachegrind using the same syntax as previous examples:
> /u2/wes/PET_HOME/bin/valgrind --tool=cachegrind ./major
Finally, cg_annotate is run with the full path of the source code file(s) you would like to analyze. For example:
> /u2/wes/PET_HOME/bin/cg_annotate cachegrind.out.23196 rowmajor.f90
...
--------------------------------------------------------------------------------
-- User-annotated source: /import/home/u1/uaf/user/major/rowmajor.f90
--------------------------------------------------------------------------------
        Ir I1mr I2mr        Dr D1mr D2mr        Dw   D1mw   D2mw
         .    .    .         .    .    .         .      .      .  MODULE rowmajor
         .    .    .         .    .    .         .      .      .  CONTAINS
         .    .    .         .    .    .         .      .      .
         .    .    .         .    .    .         .      .      .  SUBROUTINE rowMajor()
         .    .    .         .    .    .         .      .      .  IMPLICIT NONE
         .    .    .         .    .    .         .      .      .  INTEGER, DIMENSION(1000,100,10) :: A
         .    .    .         .    .    .         .      .      .  INTEGER :: i, j, k
         .    .    .         .    .    .         .      .      .
         2    0    0         0    0    0         2      0      0  DO i = 1,1000
     2,000    0    0         0    0    0     2,000      0      0    DO j = 1,100
   300,000    0    0         0    0    0   200,000      0      0      DO k = 1,10
12,000,000    1    1 5,000,000    0    0 1,000,000 92,258 63,000        A(i,j,k) = i + j + k
 4,000,000    0    0 3,000,000    0    0         0      0      0      END DO
   400,000    0    0   300,000    0    0         0      0      0    END DO
     4,000    0    0     3,000    0    0         0      0      0  END DO
         .    .    .         .    .    .         .      .      .
         1    1    1         0    0    0         0      0      0  RETURN
         2    0    0         2    0    0         0      0      0  END
         .    .    .         .    .    .         .      .      .
         .    .    .         .    .    .         .      .      .  END MODULE rowmajor
...
If you would rather see the annotated source code for all source code
files at once, use cg_annotate's --auto=yes option. E.g.,
> /u2/wes/PET_HOME/bin/cg_annotate --auto=yes cachegrind.out.23196
Cachegrind can very easily reveal instruction and data bottlenecks in your code's performance, as seen in these examples. I find myself wanting to run Cachegrind on every code I have ever written to put my programming efficiency to the test.
For more information, refer to Valgrind's Cachegrind manual at the following URL:
http://valgrind.org/docs/manual/cg-manual.html
A:[[ I am writing a script, call it A, that calls another script B.
[[ I want to maintain both scripts in the same directory but I need
[[ to be able to call A from anywhere. The problem I'm having is that
[[ A needs to be able to refer to the directory it is stored at in
[[ order to find B. I tried `pwd` as the path to B but that gives me
[[ the directory I was in when I called A, not the directory that A
[[ (and B) are stored in. Is it possible to do this with a shell
[[ script? I guess I could use Perl if Perl has a way of doing it.
[[ Python? Help!
#
# Reader Ryan Czerwiec made this straightforward suggestion:
#
If script B is in a constant location, then why not just have its
location hardwired in script A? Instead of using something like:
$directory_variable/B.csh
Use:
/really/long/pathname/B.csh
This is the way I always do it, since I keep all of my scripts in a
common directory, so I always know where they are.
Alternatively, if you add the directory containing A and B to your PATH
variable in your .cshrc (or equivalent) file, A and B will run without
any pathname necessary, provided that there aren't scripts with the same
names in a higher-priority part of $PATH. This would also require that
you NOT use the -f option at the top of your scripts in the line
#!/bin/csh (or other shell equivalent). If you need the -f for other
reasons, the hardwired pathname method should work fine.
#
# Chris Petrich, Dale Clark, and Greg Newby pointed out that referencing
# a shell script's $0 variable will disclose the path used to invoke the
# script, whether the script was invoked using a full or relative path.
# Here is Chris Petrich's response:
#
The variable $0 contains the file name of the script complete with the
relative path from your current directory to the directory of that
script. For example, using bash expansion to remove the script's name
you could change to the directory of the script with
cd ${0%/*}
#
# This command will work in ksh and sh, too.
#
#
# Greg Newby and Brad Havel also suggested the following alternative, in
# Greg's words:
#
I'm inferring from the question that "A" is in your $PATH.
If so, all you really need is `which A` to insert the full
location into B; use dirname to strip out the filename part.
From the command line:
# ls -l ${HOME}/.bin/A ${HOME}/.bin/B
-rwxr-xr-x 1 newby staff 0 Feb 12 16:26 /Users/newby/.bin/A
-rwxr-xr-x 1 newby staff 0 Feb 12 16:26 /Users/newby/.bin/B
# which A
/Users/newby/.bin/A
# dirname `which A`
/Users/newby/.bin
So, within your A script, something like:
# Find A's location:
bbaseloc=`dirname \`which A\``
# Run B from that location:
${bbaseloc}/B
#
# Scott Kajihara offered a sed approach:
#
csh:
set DIRECTORY = `which A | sed 's!^\(/.*\)/[^/]*$!\1!'`
sh:
DIRECTORY=`which A | sed 's!^\(/.*\)/[^/]*$!\1!'`
Making these shell variables environment variables is left as
an exercise to the original submitter.
#
# Andrew Roberts combined $0 and dirname for this solution which doesn't
# depend on A being on the $PATH:
#
`dirname $0`/B
#
# Brad Havel discovered that this functionality is present in a Perl
# module as well:
#
If nobody has brought it up yet, Perl has a module that performs the
same functions, probably with better performance than shelling out from
whatever script is being used.
use File::Basename;
($name,$path,$suffix) = fileparse($fullname,@suffixlist);
$name = fileparse($fullname,@suffixlist);
$basename = basename($fullname,@suffixlist);
$dirname = dirname($fullname);
The module looks to be standard distribution as of Perl 5.8.7 for sure,
but mileage may vary if it is present or not...
More information:
http://search.cpan.org/~nwclark/perl-5.8.9/lib/File/Basename.pm
#
# And finally, one more Perl call from the editors:
#
use FindBin;
use lib "$FindBin::Bin/../lib";
$bindir = "$FindBin::Bin";
Q: I saved an important file in the local /scratch directory on one of
20+ Linux workstations, but I don't remember which one. The file
name is "coastline.inp", and it may or may not be in a subdirectory.
Since the /scratch directory is not shared between the workstations,
I need to find the specific machine that has this file. With so many
workstations available, what's the most efficient way to determine
which workstation has the file I need?
[[ Answers, Questions, and Tips Graciously Accepted ]]
Contact:
Ed Kornkven
ARSC HPC Specialist
ph: 907-450-8669

Craig Stephenson
ARSC User Consultant
ph: 907-450-8653

Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020

Send comments and questions to the current editors using this Contact Form.
Arctic Region Supercomputing Center
PO Box 756020, Fairbanks, AK 99775
voice: 907-474-6935