ARSC T3E Users' Newsletter 147, July 24, 1998

VAMPIR Tutorial -- Part II

This article expands last week's VAMPIR tutorial with these topics:

  • User-defined activities
  • Controlling data collection
  • Parallelism
  • Scrutinizing individual messages
  • Linked displays

To review the tutorial's first 8 steps, go to:

/arsc/support/news/t3enews/t3enews146/index.xml

In this issue, we get started again with step number 9.

  9. Adding user-defined activities.

    Basic VAMPIR usage correlates MPI activity with time. What we often
    need, however, is correlation with sections of code or subroutines.
    To do this, you must instrument your T3E code to define, start, and
    stop application-specific activities.

    9.1) Instrumenting T3E code with user-defined activities.

    The following VAMPIR API routines deal with user-defined activities
    (given in C, then Fortran):

    Defining activities:

      int VT_symdef (int code, char *symbol, char *class)
      SUBROUTINE VTSYMDEF (ICODE, SYMBOL, CLASS, IERR)

    These define a numeric code for a given symbol/class activity. In
    your program source, all such integer codes must be unique, and you
    must define them before using them.

    Starting an activity:

      int VT_begin (int code)
      SUBROUTINE VTBEGIN (ICODE, IERR)

    Start the activity specified by code or ICODE.

    Stopping an activity:

      int VT_end (int code)
      SUBROUTINE VTEND (ICODE, IERR)

    Stop the activity specified by code or ICODE.

    For example, we want to know what MPI messages were exchanged during
    the "ping-pong", and other, sections of our test program from last
    week. We added these definitions:

      #ifdef VT_setup      /* my own macro */
      #define PINGPONG 0
      #define BROADCAST 1
      #define MESSAGES 2

          printf ("VT_setup commences...\n");

          VT_symdef (PINGPONG, "PPONG", "MY_SUBTASK");
          VT_symdef (BROADCAST, "BCAST", "MY_SUBTASK");
          VT_symdef (MESSAGES, "MSGS", "MY_SUBTASK");
      #endif

    and in each of the segments of the program, started and stopped the
    corresponding "activity." (The re-worked program appears at the end
    of this article. Note, we also rearranged the "ping-pong" and
    "round-robin broadcast" segments.)

    9.2) Color a user-defined activity and display the global-timeline.

    We recompiled the code on the T3E (step 2 from last week's article),
    ran it (step 3), rcp'ed the resulting .bpv file to the ARSC SGIs
    (step 5), reran vampir with the new .bpv file (step 7), and produced
    the global-timeline (step 8.1).

    The new activity class, MY_SUBTASK, appears white by default, but it
    automatically appears in all activity menus and is easy to re-color.
    From the main menu, select:

      Settings:Colors

    Scroll down in the "Activity" list to "MY_SUBTASK", choose a color,
    and click "Apply." This process is shown in the following figure:

    [figure]

    (Note: all VAMPIR pull-down menus can be "torn off" by clicking on
    the dashed line. The "Settings" window was torn off for the above
    image.)

    The new global-timeline is shown here:

    [figure]

    We colored MY_SUBTASK activities orange, and they are labeled, as
    well. Thus, this graph shows us the messages exchanged in the
    ping-pong segment, and that PE 0 completed its ping-pong segment
    first. (A minimal stand-alone sketch of these user-defined activity
    calls appears below.)
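    For reference, here is a minimal stand-alone sketch (not part of the
    tutorial's test code) that defines one activity and wraps a work
    loop with it, using the C bindings listed in step 9.1. The
    compute_step() routine is a hypothetical stand-in for real work, and
    the prototypes are simply copied from the signatures given above.

      /*--------------------------------------------------------------
        Sketch only: one user-defined VAMPIR activity around a loop.
      --------------------------------------------------------------*/
      #include <stdio.h>
      #include "mpi.h"

      /* prototypes copied from the C signatures given in step 9.1 */
      int VT_symdef (int code, char *symbol, char *class);
      int VT_begin (int code);
      int VT_end (int code);

      #define SOLVER 10                 /* unique integer activity code */

      static void compute_step (int i)  /* hypothetical work routine */
      {
          printf ("computing step %d\n", i);
      }

      int main (int argc, char *argv[])
      {
          int i;

          MPI_Init (&argc, &argv);

          /* define the activity once, before its first use */
          VT_symdef (SOLVER, "SOLVER", "MY_SUBTASK");

          VT_begin (SOLVER);            /* start logging SOLVER */
          for (i = 0; i < 10; i++)
              compute_step (i);
          VT_end (SOLVER);              /* stop logging SOLVER */

          MPI_Finalize ();
          return 0;
      }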

  10. Controlling data collection.

    Given a lengthy, multiple-PE run, with every MPI message logged, the
    .bpv file can become huge, if not overwhelming. There are two ways
    to limit data collection:
    1. VAMPIR reads a user-defined configuration file which can be used
       to turn off the logging of particular activities. For instance,
       you might log no MPI routines except Send/Recv. (See the manual
       for more on this.)
    2. You may instrument your code to switch all tracing off for boring PEs and/or code segments. For instance, you might completely trace only one iteration or (as shown below) two PEs.

    10.1) Disabling/re-enabling tracing.

    When your T3E code runs, VAMPIR starts generating profiling data
    immediately after MPI_Init has been executed. This data collection
    can be arbitrarily disabled and re-enabled during the application
    runtime.

    10.2) Instrumenting T3E code to disable/enable tracing.

    Disabling data collection:

      void VT_traceoff (void)
      SUBROUTINE VTTRACEOFF()

    Re-enabling data collection:

      void VT_traceon (void)
      SUBROUTINE VTTRACEON()

    (Note: these functions affect the calling process, or local PE,
    only. You can't turn profiling on/off globally: each PE must call
    the respective API routine.)

    As an example, we used these calls in the test code to turn logging
    off for a while on all PEs except 0 and 1:

      #ifdef VT_setup
      #ifdef VT_partial      /* my own macro */
          if (my_rank > 1)
            VT_traceoff ();
      #endif
      #endif

        [ ... ping-pong and round-robin bcast code segments ... ]

      #ifdef VT_setup
          if (my_rank > 3)   /* never turn 2&3 back on */
            VT_traceon ();
      #endif

    The uncluttered global-timeline:

    [figure]

    (The other approach mentioned above, tracing only one iteration of a
    loop, is sketched below.)
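    Here is a minimal, hypothetical sketch (not from the tutorial's test
    code) of tracing only one iteration of a time-stepping loop, using
    the VT_traceoff/VT_traceon C bindings from step 10.2. The
    do_timestep() routine and the step counts are stand-ins.

      /*--------------------------------------------------------------
        Sketch only: keep tracing off except for one chosen iteration.
      --------------------------------------------------------------*/
      #include "mpi.h"

      /* prototypes copied from the C signatures given in step 10.2 */
      void VT_traceoff (void);
      void VT_traceon (void);

      static void do_timestep (int step)   /* hypothetical work routine */
      {
          /* ... computation and MPI calls for one time step ... */
      }

      int main (int argc, char *argv[])
      {
          int step;
          int nsteps = 100;
          int trace_step = 50;   /* the one iteration we want traced */

          MPI_Init (&argc, &argv);

          VT_traceoff ();        /* each PE turns its own logging off */

          for (step = 0; step < nsteps; step++) {
              if (step == trace_step)
                  VT_traceon ();     /* log just this iteration ... */

              do_timestep (step);

              if (step == trace_step)
                  VT_traceoff ();    /* ... then go quiet again */
          }

          MPI_Finalize ();
          return 0;
      }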

  11. Graphing parallelism.

    VAMPIR helps us visualize the amount of "work" getting done (as
    opposed to the amount of waiting or message passing) with its graphs
    of "parallelism." You may append a parallelism graph to the bottom
    of the global-timeline display. Click the right mouse button inside
    the global-timeline window to get the menu, and select:

      Components:Parallelism display

    Now choose the activity to graph. In our test code, everything in
    "MY_SUBTASK" counts as work, so we'll use that. In the main menu,
    choose:

      Settings:Properties

    and in the "user act." field, type "MY_SUBTASK".

    You may resize the parallelism portion of the window by clicking in
    it with the left mouse button and dragging up- or downwards. Our
    global-timeline with the appended parallelism graph is shown here:

    [figure]

    What does it mean? In general, only 4 of the 8 PEs are doing useful
    work in parallel. A similar, and more powerful, parallelism display
    is available under the main menu by selecting:

      Global Displays:Show Parallelism

  12. Identifying individual messages.

    VAMPIR can provide excruciating detail in addition to broad images
    of parallelism and performance. For example, stats on individual MPI
    messages are available. Click the right mouse button inside the
    global-timeline window to get the menu, and select:

      Identify Message

    Then, click with the left mouse button on any message. A window of
    information will appear, and the selected message will be designated
    by an arrow. (The shortcut is to simply hit the keyboard letter "m",
    then click on the message.) Here is an example of the resulting
    display:

    [figure]
  13. Linked displays.

    Statistics may be computed over any time segment. In previous
    examples of available stats, we've used the entire run:
    • Step 8.3 displayed a pie-chart of the percentage of time that each PE spent running application code versus MPI code.
    • Step 8.5 displayed a pie-chart of the percentage of time spent in different types of MPI calls, for each PE.

    13.1) Linking the chart-view time span to the global-timeline.

    Display both the global-timeline and a chart-view (stats) window
    simultaneously (steps 8.1 and 8.3 or 8.5). In the chart-view window,
    click the right mouse button to get the chart-view menu, and select:

      Options:Use Timeline Portion

    That's it! Now, when you zoom in on the global-timeline, or use the
    scroll bar to move forward/backward through it, the chart-view stats
    will be automatically updated according to the new time range.

    As an example, we zoomed in on a segment of the "trace-off" run. We
    configured the linked stats to show the time spent in the MY_SUBTASK
    activity.

    [figure]
  14. VAMPIRize your own code!

    For grins, we extracted Don Morton's Fortran code from his article
    (last week) on MPI intra- and inter-communicators and coupled codes.
    Here's the view from VAMPIR. It includes the complete
    global-timeline, the parallelism display, and a linked statistics
    histogram showing the time spent in all MPI calls:

    [figure]
  This week's test code:

    
    /*============================================================ 
      Test code for VAMPIR demonstration. 
    ============================================================*/
    
    #include <stdio.h>
    #include "mpi.h"
    
    #define VT_setup      /* define my own macro to enable vampir tracing */
    #define VT_partial    /* define my own macro to limit extent of tracing */
    
    #define MAX_ORDER 100000
    #define NPASSES 4
    
    
    main(int argc, char* argv[]) {
        int    npes;
        int    my_rank;
        int    test;
        int    flag;
        int    min_size = 0;
        int    max_size = MAX_ORDER;
        int    incr;
        float  x[MAX_ORDER];
        int    size;
        int    pass;
        int    dest, source;
        MPI_Status  status;
        int         i;
        MPI_Comm    comm;
    
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &npes);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        MPI_Comm_dup(MPI_COMM_WORLD, &comm);
    
    #ifdef VT_setup
    #define PINGPONG 0
    #define BROADCAST 1
    #define MESSAGES 2
    
        printf ("VT_setup commences...\n");
    
        VT_symdef (PINGPONG, "PPONG", "MY_SUBTASK");
        VT_symdef (BROADCAST, "BCAST", "MY_SUBTASK");
        VT_symdef (MESSAGES, "MSGS", "MY_SUBTASK");
    #endif
    
    #ifdef VT_setup 
    #ifdef VT_partial
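        /* turn tracing off for a while on all PEs except 0 and 1 */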
        if (my_rank > 1)
          VT_traceoff ();
    #endif
    #endif
    
        if (npes % 2 != 0) {
          printf ("ERROR: Even number of PEs required\n");
          goto BAILOUT;
        } 
        
        MPI_Barrier(MPI_COMM_WORLD);
        
    
        /* Round-robin broadcasts */
    #ifdef VT_setup
        VT_begin (BROADCAST);
    #endif
        for (source = 0; source < npes; source++) {
          if (source == my_rank)
            printf ("PE %d initiates Bcast\n", my_rank);
    
          MPI_Bcast (x, max_size, MPI_FLOAT, source, comm);
        }
    #ifdef VT_setup
        VT_end (BROADCAST);
    #endif
         
    
        /* Ping-pong in pairs */
    #ifdef VT_setup
        VT_begin (PINGPONG);
    #endif
        if (my_rank % 2 == 0) {
            incr = MAX_ORDER/(my_rank+1);
            for (test = 0, size = min_size;
                    size <= max_size; size = size + incr, test++) {
      
                printf ("Even PE %d starting ping-pong test %d\n", my_rank, test);
    
                for (pass = 0; pass < NPASSES; pass++) {
                    MPI_Send(x, size, MPI_FLOAT, my_rank + 1, 0, comm);
                    MPI_Recv(x, size, MPI_FLOAT, my_rank + 1, 0, comm,
                        &status);
                }
    
            }
    
        } 
        else {
            incr = MAX_ORDER/(my_rank);
            for (test = 0, size = min_size; 
                    size <= max_size; size = size + incr, test++) {
    
                for (pass = 0; pass < NPASSES; pass++) {
                    MPI_Recv(x, size, MPI_FLOAT, my_rank - 1, 0, comm,
                       &status); 
                    MPI_Send(x, size, MPI_FLOAT, my_rank - 1, 0, comm);
               } 
            } 
    
    
        }
    
    #ifdef VT_setup
        VT_end (PINGPONG);
    #endif
    
    
    #ifdef VT_setup
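        /* re-enable tracing on PEs 4 and up; never turn 2 and 3 back on */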
        if (my_rank > 3)
          VT_traceon ();
    #endif
    
    
    
        /* Master sends individual messages to all */
    #ifdef VT_setup
        VT_begin (MESSAGES);
    #endif
        if (my_rank == 0) {
            printf("starting batch send from master \n");
    
            for (dest = 1; dest < npes; dest++) {
              MPI_Send(x, max_size, MPI_FLOAT, dest, 0, comm);
            }
        }
        else {
           MPI_Recv(x, max_size, MPI_FLOAT, 0, 0, comm, &status); 
           printf ("PE %d received from 0\n", my_rank);
        } 
    
    
        /* Master receives from all in any order */
        if (my_rank == 0) {
            printf("starting batch receive in any order \n");
    
            for (source = 1; source < npes; source++) {
              MPI_Recv(x, max_size, MPI_FLOAT, MPI_ANY_SOURCE, 0, comm, &status); 
            }
            printf ("Master received from all\n");
        }
        else {
            MPI_Send(x, max_size, MPI_FLOAT, 0, 0, comm);
        } 
    #ifdef VT_setup
        VT_end (MESSAGES);
    #endif
    
    BAILOUT:
        printf ("PE %d done\n", my_rank);
        MPI_Finalize();
    }  /* main */
    

    Published Work of T3E Newsletter Readers

    [ This is a new, occasional series, designed to increase collaboration between readers of this newsletter.

    Whenever you publish research done using the T3E or concerning the T3E, let us know. Send us the citation, relevant web sites, abstract, and your e-mail address (if you like), and we'll pass it on.

    Thanks to Igor Polyakov and Steve de Bruyn Kops for helping us get started. ]

    Igor Polyakov polyakov@yukon.arsc.edu

    COUPLED SEA ICE - OCEAN MODEL OF THE ARCTIC OCEAN

    Journal of Offshore Mechanics and Arctic Engineering, 1998, Vol. 120, pp. 77-84.

    I. V. Polyakov, I. Yu. Kulakov, S. A. Kolesov, N. Eu. Dmitriev, R. S. Pritchard, D. Driver, and A. K. Naumov

    Abstract:

    A fully prognostic coupled ice-ocean model is described. The ice model is based on the elastic-plastic constitutive law with ice mass and compactness described by distribution functions. The ice thermodynamics model is applied individually to each ice thickness category. Advection of the ice partial mass and concentrations is parameterized by a fourth-order algorithm that conserves monotonicity of the solution. The ocean is described as a three-dimensional, time-dependent baroclinic model with a free surface. The coupled model is applied to establish the Arctic Ocean seasonal climatology using fully prognostic models for ice and ocean. Results reflect the importance of ice melting/freezing in the formation of the thermohaline structure of the upper ocean layer.

    Zygmunt Kowalik and Igor Polyakov

    TIDES IN THE SEA OF OKHOTSK Journal of Physical Oceanography, 28, No 7, 1389-1409.

    Abstract:

    Eight major tidal constituents in the Sea of Okhotsk have been investigated using a numerical solution of tidal equations on a 5' space grid. The tides are dominated by the diurnal constituents. Diurnal tidal currents are enhanced in Shelikhov Bay and Penzhinskaya Guba, at Kashevarov Bank, in proximity to the Kuril Islands and at a few smaller locations. The major energy sink for diurnal tides (over 60% of the total energy) is Shelikhov Bay and Penzhinskaya Guba. The major portion of semidiurnal tide energy is dissipated in the northwestern region of the Sea of Okhotsk and in Shelikhov Bay and Penzhinskaya Guba. Nonlinear interactions of diurnal currents are investigated through K1 and O1 constituent behavior over Kashevarov Bank. These interactions generate residual circulation of the order of 10 cm/s, major oscillations at semidiurnal and fortnightly periods (13.66 days) and higher harmonics of basic tidal periods. The M2 tidal current, caused by the nonlinear interaction of the diurnal constituents over Kashevarov Bank, constitutes approximately half of the total M2 tide current there. The fortnightly current, through nonlinear interactions, also influences basic diurnal tidal currents by inducing fortnightly variations in the amplitude of these currents.

    Steve de Bruyn Kops debk@u.washington.edu

    ( Readers interested in related articles should follow the "Turbulent reacting flows" link from http://www.me.washington.edu/research )

    Direct numerical simulation of laboratory experiments in isotropic turbulence

    S. M. de Bruyn Kops and J. J. Riley. Phys. Fluids, 10(9), 1998. (to appear).

    Massively parallel computers are now large enough to support accurate direct numerical simulations (DNS) of laboratory experiments on isotropic turbulence, providing researchers with a full description of the flow field as a function of space and time. The high accuracy of the simulations is demonstrated by their agreement with the underlying laboratory experiment and on checks of numerical accuracy. In order to simulate the experiments, requirements for the largest and smallest length scales computed must be met. Furthermore, an iterative technique is developed in order to initialize the larger length scales in the flow. Using these methods, DNS is shown to accurately simulate isotropic turbulence decay experiments such as those of Comte-Bellot and Corrsin [J. Fluid Mech., 48:273-337, 1971].

    ARSC Users: "Challenges" Newsletter Wants You!

    EARN FAME, FORTUNE, RECOGNITION BY YOUR PEERS!

    Well, maybe not fame and fortune, but if an article about your research is included in the Fall edition of CHALLENGES, ARSC's "print" newsletter, information about your work will indeed be spread far and wide. The next issue of CHALLENGES is just getting underway, and if you would like to be featured in a high-quality, full color publication about research at ARSC, please contact ARSC's public information officer L.J. Evans at info@arsc.edu .

    CHALLENGES is distributed to all current ARSC users, as well as hundreds of other people in Alaska and the lower 48, in academic, DoD and other settings. It is intended for a general audience, and it showcases work underway using ARSC computing and visualization resources.

    Challenges Summer Theme: "Ocean Research"

    Everyone is invited to receive "Challenges."

    ARSC users receive it automatically. Anyone else may contact L.J. at info@arsc.edu to get on the mailing list. Time is running out to receive the current issue!

    The summer issue features ocean research.

    It includes articles about NRL Gulf Stream simulations by Patrick Hogan, ONR work on acoustic tomography by Michael Wolfson at Penn State, NOAA PMEL work on the TMAP project to simulate El Nino by Ed Harrison, and an Arctic Ocean ice/ocean model developed at UAF by Andrey Proshutinsky and Igor Polyakov. It also includes an update on "what's happenin'" at ARSC.

    A Parallel Search for Extra-Terrestrial Intelligence

    We thought this was mighty interesting.

    http://setiathome.ssl.berkeley.edu/

    Here's an excerpt:

    
    > 
    > SETI@home is a grand experiment that will harness the spare power of
    > hundreds of thousands of Internet-connected computers in the Search for
    > Extra-Terrestrial Intelligence (SETI).
    > 
    > When SETI@home launches in late 1998, computer users from around the
    > world will participate in a major scientific experiment...
    > 
    
    > ... while you are getting coffee, or having lunch or sleeping, your
    > computer will be helping the Search for Extra-Terrestrial Intelligence
    > by analyzing data specially captured by the world's largest radio
    > telescope...
    

    Quick-Tip Q & A

    
    A: {{ Is there a way to add the execute permissions to only those files
          that were originally executable?  }}
    
      # Thanks to the readers who responded.  
      #
      # Here's the easy solution (note the capital "X"; read "man chmod" for
      # details):
      
        chmod -R go+rX ./
      
      
      # The current popular solution, apparently, uses the "find" command. 
      # Here are two different responses that use "find":
      
        find . \! -type l -perm -0100 -exec chmod go+x {} \;
      
        Adds "group" and "other" execute to files (except symbolic
        links) that already have owner execute, starting in the current
        directory and working on down.
      # 
      # 
      #
        chmod -R go+r ./
        find ./ -perm -100 -exec chmod go+x {} \;
      
        The find command isn't quite the same as  chmod -R go+X ./  because it
        is looking for user execute permission only, but this is the normal
        situation (expand user execute to group and other also).
      
    Q: I am wasting too much time in re-compilation. Is parallel make
       available on the T3E?
    

    [ Answers, questions, and tips graciously accepted. ]


    Current Editors:
    Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
    Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
    Arctic Region Supercomputing Center
    University of Alaska Fairbanks
    PO Box 756020
    Fairbanks AK 99775-6020
    Archives:
      Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.