Modern processors usually have hardware counters that let an end user collect performance statistics from the processor with minimal overhead. The Performance Application Programming Interface (PAPI) library attempts to provide a common interface to hardware counters across different processor architectures. PAPI offers a number of low-level and high-level routines, including high-level routines to calculate MFlop rates.
The PAPI_flops routine queries the floating point instruction counters and outputs a MFlop/s value. It bases this value on the operations performed and the elapsed time since the previous call to PAPI_flops. Here's the C declaration for PAPI_flops (there is also a Fortran interface to this routine).
int PAPI_flops (float *rtime, float *ptime, long_long *flpops,
float *mflops);
where the arguments, all of which are filled in by the routine, are:
rtime total realtime since first PAPI_flops() call
ptime total process time since the first PAPI_flops() call
flpops total floating point operations since the first call
mflops Mflop/s achieved since the previous call
Here's a simple example that uses the PAPI_flops routine to get floating point statistics for a program. This also does some basic sanity checking to ensure there isn't a PAPI version mismatch and to ensure that the PAPI library is functional.
mg56 % cat measure_flops.c
#include <papi.h>
#include <stdio.h>
#include <stdlib.h>
#define MSIZE (1024 * 100)
int main(void)
{
int retval; /* return value for PAPI calls */
float rtime; /* total realtime since first PAPI_flops()
call */
float ptime; /* total process time since the first
PAPI_flops() call */
long long flpops; /* total floating point instructions or
operations since the first call */
float mflops; /* Mflop/s achieved since the previous call */
int ii;
int jj;
float * array1;
float * array2;
/* Initialize the PAPI library */
retval = PAPI_library_init(PAPI_VER_CURRENT);
/* Verify there isn't a version mismatch */
if (retval != PAPI_VER_CURRENT && retval > 0)
{
fprintf(stderr,"PAPI library version mismatch!\n");
exit(1);
}
/* verify that there wasn't a different PAPI error */
if (retval < 0)
{
fprintf(stderr, "PAPI Initialization error!\n");
exit(1);
}
array1=(float *) malloc( (size_t)MSIZE * sizeof(float) );
if ( array1 == NULL ) { printf("ERROR: array1==NULL!\n"); exit(1); }
array2=(float *) malloc( (size_t)MSIZE * sizeof(float) );
if ( array2 == NULL ) { printf("ERROR: array2==NULL!\n"); exit(1); }
/* Initialize the arrays */
for(ii=0;ii<MSIZE;++ii)
{
array1[ii]=.1234 * (float)ii;
array2[ii]=.5678 * (float)ii;
}
/* Initialize the counters */
retval=PAPI_flops(&rtime, &ptime, &flpops, &mflops);
if ( retval != PAPI_OK )
{
fprintf(stderr,"Error running PAPI_flops!\n");
}
/* Do some floating point work */
for(jj=0;jj<MSIZE/2;++jj)
{
for(ii=0;ii<MSIZE;++ii)
{
array1[ii]+=array2[ii];
}
}
printf("array1[0]=%f\n", array1[0]);
/* Reread the counters */
retval=PAPI_flops(&rtime, &ptime, &flpops, &mflops);
if ( retval != PAPI_OK )
{
fprintf(stderr,"Error running PAPI_flops!\n");
}
/* Print statistics */
fprintf(stdout, "real time = %f\n", rtime);
fprintf(stdout, "processing time = %f\n", ptime);
fprintf(stdout, "floating point ops = %lld\n", flpops);
fprintf(stdout, "MFlop/s = %f\n", mflops);
return 0;
}
PAPI is available in the $PET_HOME directory on iceberg and midnight. Below is an example of how the above code can be compiled on midnight.
mg56 % PAPI=${PET_HOME}/pkgs/papi-3.5.0
mg56 % pathcc -I${PAPI}/include -Ofast \
-L${PAPI}/lib64 \
-Wl,-R ${PAPI}/lib64 \
-lpapi -lperfctr \
measure_flops.c -o measure_flops
When run on a compute node, the "measure_flops" code will produce the following output:
mt257 % ./measure_flops
array1[0]=0.000000
real time = 1.177192
processing time = 1.176266
floating point ops = 2705502720
MFlop/s = 2300.077393
NOTE: PAPI is available only on compute nodes on midnight.
Similarly on iceberg:
iceberg2 % PAPI=${PET_HOME}/pkgs/papi-3.5.0-64bit
iceberg2 % xlc -q64 -I${PAPI}/include -O5 \
-L${PAPI}/lib \
-lpapi64 -lpmapi \
measure_flops.c -o measure_flops
iceberg2 % ./measure_flops
array1[0]=0.000000
real time = 4.140038
processing time = 4.133397
floating point ops = 5242883584
MFlop/s = 1268.420166
For more details on PAPI check out the PAPI documentation online:
http://icl.cs.utk.edu/projects/papi/files/html_man3/papi.html
There are tools built on top of PAPI that provide access to PAPI
counters without code modification. One of these tools, TAU, is
also installed in $PET_HOME and will be featured in a future issue
of this newsletter. For more information on TAU, see:
http://www.cs.uoregon.edu/research/tau
The default compiler suite on midnight is PathScale. This compiler has a few handy options which can aid in the debugging process.
When the -trapuv flag is used, uninitialized floating point variables are initialized to NaN and the CPU is set to detect floating point exceptions. When an uninitialized variable is used, a core dump will be produced. This option only applies to local scalar and array variables and memory allocated via the "alloca" call. This specifically does not apply to memory allocated via "malloc" (C), "new" (C++) or "allocate" (Fortran 90), nor will this option detect uninitialized integer data.
Works with: pathcc, pathCC, and pathf90 (as well as MPI compilers using PathScale)
Example Use: Here's a simple example using -trapuv to catch an uninitialized REAL*4 variable (b). It is a good idea to include -g when using -trapuv to ensure the debugger can make sense of the core file.
mg56 % pathf90 test.f90 -trapuv -g -o test
mg56 % ./test
Floating point exception (core dumped)
mg56 % gdb ./test core
...
...
#0 0x0000000000400be4 in MAIN__ () at test.f90:7
7 b=b*a
(gdb)
The -zerouv option will set uninitialized variables to zero at runtime rather than NaN. This is an easy option to try when a code misbehaves. This option works with local scalar and array variables and memory allocated via the "alloca" call. There is a slight performance overhead associated with this option as variables are set to zero at run-time.
The -C option enables array bounds checking for Fortran 90 codes. This can be a quick way to track down out-of-bounds array accesses.
Works with: pathf90
Example Use:
mg56 % pathf90 test.f90 -C -g -o test
mg56 % ./test
lib-4964 : WARNING
Subscript is out of range for dimension 1 for array
'C' at line 11 in file '/lustre/wrkdir/bahls/debug/test.f90',
diagnosed in routine '__f90_bounds_check'.
c(10)= 1.
You can set the environment variable F90_BOUNDS_CHECK_ABORT to YES to make the first out-of-bounds array access abort the application.
mg56 % export F90_BOUNDS_CHECK_ABORT=YES
mg56 % ./test
lib-4964 : UNRECOVERABLE library error
Subscript is out of range for dimension 1 for array
'C' at line 11 in file '/lustre/wrkdir/bahls/debug/test.f90',
diagnosed in routine '__f90_bounds_check'.
Aborted (core dumped)
As with most debugging techniques, it is a good idea to include the -g option when compiling. More importantly, it is a good idea not to use these techniques when running production work as there may be a significant performance hit.
The MVAPICH MPI stack used on midnight has a few nuances that can cause confusion.
# for bash users
mg56 % grep ulimit ~/.bashrc
ulimit -Sc unlimited
# for csh/tcsh users
mg56 % grep limit ~/.cshrc
limit coredumpsize unlimited
limit stacksize unlimited
mpirun -np 4 F90_BOUNDS_CHECK_ABORT=YES ./a.out
A:[[ I have a code written in C++ and would like to append a floating
[[ point value to the end of a string. It's pretty easy to do this
[[ in C using a character array, e.g.:
[[
[[ #include <stdio.h>
[[ #include <stdlib.h>
[[
[[ int main()
[[ {
[[ char buf[1024];
[[ float v=1.23;
[[ sprintf(buf,"Val= %.2f\n", v);
[[ printf("%s", buf);
[[ }
[[
[[
[[ But, as I said, I need to do this in C++. Is there some slick C++
[[ way to do this? The following doesn't work!
[[
[[ #include <string>
[[ #include <iostream>
[[
[[ int main()
[[ {
[[ std::string buf;
[[ float val=1.23;
[[ buf="Val= " + val;
[[ std::cout << buf << std::endl;
[[ }
[[
#
# Thanks to Greg Newby, Lorin Hochstein, Rich Griswold and Sean
# Ziegeler for sharing solutions using the stringstream class.
#
# Here's Rich's response:
#
The stringstream class is an input/output stream with an associated
string object. You can use the standard insertion and extraction
operators, and you can get and set the associated string with the
str() method.
#include <sstream>
#include <iostream>
int main()
{
std::stringstream buf;
float val=1.23;
buf << "Val= " << val;
std::cout << buf.str() << std::endl;
}
To clear the contents of a stringstream, don't use clear(),
since it only clears the status flags. Instead, call str()
with an empty string: buf.str(""). For more information, see
http://www.cplusplus.com/reference/iostream/stringstream/.
#
# Thanks to Greg Newby for sharing this style recommendation.
#
Note that good style is to capture the output to the ostringstream in
some sort of test for success. Otherwise, you might get unexpected
results if you try to use the class with a non-intrinsic data type.
Substitute something like this, or throw an exception:
if (! (o << buf << val << endl) )
cerr << "Oops!" << endl;
#
# Last but not least thanks to Sean Ziegeler for sharing the
# following
#
C is a valid subset of C++. Just use the C approach (though you
might consider using the more secure snprintf() function). If you
really need a C++ string after that, you can create one from the
character array:
string str(buf);
C++ fanatics might not like you, but if it ain't broke...
Q: My application creates a number of PostScript files that I need
to convert to the PNG image format so I can put them on my webpage.
Currently I open each file in an image editor and save the file
to the new name. This seems like a waste of my time! Is there
a way to automate this process?
[[ Answers, Questions, and Tips Graciously Accepted ]]
Contact:
Thomas J. Baring
ARSC Web Specialist
ph: 907-450-8619
Donald Bahls
ARSC User Consultant
ph: 907-450-8674
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020