A nice trip to Barcelona


This page is now outdated. All this work has been moved to http://simgrid.gforge.inria.fr/contrib/smpi-paraver.html. Please consider using the new, up-to-date version.

Achievements

Links to generated files

cp /usr/local/stow/wxparaver-4.5.4-linux-x86_64/bin/dimemas-wrapper.sh ./
for i in *.pl *.sh ; do echo "- [[file:./$i][$i]]" ; done

Presentation of current work from both sides

  • Simulation of MPI programs (Arnaud Legrand)
  • Spatial and Temporal Aggregation of Traces of Parallel Systems (Damien Dosimont)
  • Evolution of the BigDFT code (Luigi Genovese)
  • Presentation of the Paraver Format to improve interoperability (Juan Gonzalez)
  • Clustering techniques applied to BigDFT (Harald Servat)
  • Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures (Luka Stanisic)
  • Raising the Level of Abstraction: Simulation of Large Chip Multiprocessors Running Multithreaded Applications (Alejandro Rico)

TODO BigDFT simulation [2/3]

  • [X] Simulate order(n) BigDFT with SMPI with no modification.
  • [X] Obtained an unbalanced (paje) trace in which we could observe the same kind of (paraver) trace as what Luigi, Brice and our BSC colleagues obtained on a real run. The timings obviously do not make any sense, as the platform model was completely different from the real platform, but the general unbalanced shape was the same and the same process was slowing down the whole application.
  • [ ] Instrument order(n) BigDFT to speed up the simulation?

TODO Interaction between Paraver and SMPI [5/8]

  • [-] Paraver conversion
    • [X] Wrote a paraver to csv/pjdump/smpi converter (in perl) that worked on an old, small 8-node BigDFT paraver trace.
    • [ ] A few ugly things had to be done here (reduce, alltoallV, no handling of p2p operations, second/nanosecond issue, …) and need to be cleaned up.
    • [ ] Maybe it would be interesting to have an option that allows extrae to trace all the parameters?
  • [X] Wrote a simple shell script to replay this trace with SMPI and generated an SMPI paje trace.
  • [X] Improved the shell script so that it takes its arguments on the command line.
  • [X] Wrote a perl script that converts an SMPI paje trace to the paraver file format.
  • [-] Improve this perl script
    • [X] improve the conversion to export events so that collective operation names are the same and things are easily comparable. (Edit: this was done in Chicago with Harald)
    • [ ] Currently there are two scripts (pjdump2prv.pl and pjsmpi2prv.pl). The 1st one is for ocelotl/pjdump output while the second one is intended for the SMPI -> PRV final step. I'm currently merging them together.
    • [ ] add links (arrows) so that bandwidth can be computed in paraver
  • [X] Managed to open the resulting paraver trace in paraver.
  • [X] Have a prototype integration of SMPI within Paraver. (Edit: this was done in Chicago. If you use the dimemas-wrapper.sh instead of the original one, it will launch SMPI. Better integration that allows specifying the platform and deployment would be nice.)
  • [ ] Make a model of Mare Nostrum, the Mont-Blanc prototype, so that BSC staff can really play with SMPI. (Edit: this was discussed in Chicago with Judit. I explained the SimGrid XML platform representation and she will try to play with SMPI and come back to me with questions.)
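For reference, a minimal SimGrid XML platform sketch of the kind I showed Judit could look like the following. All the values below (node count, power, bandwidth, latency, host names) are made-up placeholders, not actual Mare Nostrum characteristics, and would have to be calibrated on the real machine:

```xml
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
<platform version="3">
  <AS id="AS0" routing="Full">
    <!-- A homogeneous cluster: power is in flop/s, bw in B/s, lat in s.
         All values here are placeholders to be calibrated. -->
    <cluster id="cluster" prefix="node-" suffix=".example.org"
             radical="0-511" power="1E9" bw="1.25E8" lat="1E-4"
             bb_bw="1.25E9" bb_lat="1E-4"/>
  </AS>
</platform>
```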

TODO Trace Aggregation [4/5]

All this is better summarized in the blog entry Damien wrote about this.

  • [X] The paraver to pjdump converter was integrated in framesoc.
  • [X] Damien managed to load several paraver traces in ocelotl and to play with aggregation.
  • [X] Managed to load an SMPI-replayed trace of order(n) BigDFT and could aggregate it and easily spot the disturbing process and the application phases.
  • [X] Converted the real O(n) BigDFT paraver trace and aggregated it.
  • [ ] Convert the 12 GB Nancy LU trace (700 processes on 3 clusters) to paraver to see whether the behavior exhibited by ocelotl can be observed in Paraver. This involves slightly modifying the paje to paraver converter, which was designed for SMPI paje traces.

    This trace was on flutin and I got it here: file:///exports/nancy_700_lu.C.700.pjdump.bz2

    • [ ] Fix the state name conversion and the event conversion
    • [ ] The ',9' at the end of the header is the number of communicators…
    • [ ] The resulting prv starts from the pjdump, which I forgot to sort. Could we give pjdump an option to sort it according to time?
    • [ ] Do not use state 0 as it's reserved for computation
    • [ ] Create a state and event for MPI application (derived from being outside MPI calls)
    • [ ] clock resolution issue
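In the meantime, the sorting can be done outside pjdump. Assuming the pjdump layout shown elsewhere on this page (begin time as 4th comma-separated field), something like this does the trick:

```shell
# Sort the State records of a pjdump file by begin time (4th field),
# keeping the Container definitions first so the hierarchy is
# declared before any state refers to it.
sort_pjdump() {
    grep '^Container' "$1"
    grep '^State' "$1" | sort -t',' -k4 -n
}

# Tiny synthetic example:
cat > /tmp/demo.pjdump <<'EOF'
Container, 0, 0, 0.0, 100, 100, 0
State, Thread_1_1, STATE, 30, 40, 10, 0, Running
State, Thread_2_1, STATE, 10, 20, 10, 0, Running
EOF
sort_pjdump /tmp/demo.pjdump
```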

Interaction between Paraver and SMPI

A year and a half ago, I needed to write a paraver converter because, in a particular setup, I could not trace BigDFT with either TAU or Scalasca. My goal was simply to compute statistics on the trace using R. Today, we're in Barcelona discussing whether SMPI could be used as an alternative to Dimemas within the paraver framework. To this end, we need to make sure that SMPI can simulate paraver traces and output paraver traces. Ideally, we would modify SMPI so that it can parse and generate such traces, but that's probably more work than what we can achieve in two days, so we'll go for simple trace conversions, i.e., a paraver to SMPI time-independent trace format conversion and a Paje to paraver conversion.

Let's start from the traces I used at that time.

cp -r ../../../2013/04/03/paraver_trace ./
ls paraver_trace/
EXTRAE_Paraver_trace_mpich.pcf
EXTRAE_Paraver_trace_mpich.prv
EXTRAE_Paraver_trace_mpich.row

Paraver to CSV and SMPI format Conversion

Juan Gonzalez provided us with a description of the Paraver and Dimemas formats. The Paraver description is available here, i.e., in the Paraver documentation. Remember: the pcf file describes events, the row file defines the cpu/node/thread mapping, and the prv file is the trace with all the events. I reworked my old script during the night to convert from paraver to csv, pjdump and the SMPI time-independent trace format. Unfortunately, in the morning, Juan explained to me that I should not trust the state records but only the event and communication records. Ideally, I should have worked from the dimemas trace instead of the paraver trace to obtain the SMPI trace, but at least this allowed me to get a converter to csv/pjdump, which is very useful to Damien for framesoc/ocelotl.
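For instance, the state-number-to-name table that parse_pcf below relies on can be eyeballed directly from the shell: pcf sections (STATES, EVENT_TYPE, …) are simply terminated by a blank line. This is demonstrated on a small synthetic pcf fragment, since the real file is much larger:

```shell
# Build a small synthetic pcf fragment and extract its STATES section.
cat > /tmp/demo.pcf <<'EOF'
DEFAULT_OPTIONS

STATES
0    Idle
1    Running
2    Not created

EVENT_TYPE
9   50000002    MPI Collective Comm
EOF

# Print everything from the STATES header to the next blank line,
# then keep only the "number  name" lines.
sed -n '/^STATES$/,/^$/p' /tmp/demo.pcf | grep '^[0-9]'
```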

So I really struggled to make it work and had to make several assumptions and "ugly hacks" (indicated in the code). In particular, something that is really ugly at the moment is that the V collective operations, where send and receive sizes are process specific, appear as many times as there are processes; since I translate on the fly, I do not produce a correct input for SMPI. The easiest solution to handle this is probably to make two passes, but never mind for a first proof of concept.
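As a reminder of what the converter below has to digest: the first colon-separated field of each prv record gives its type (1 = state, 2 = event, 3 = communication, c = communicator definition). A quick sanity check before converting a trace is therefore to count the record types; here on a few synthetic record lines:

```shell
# Count prv record types, skipping the #Paraver header line.
cat > /tmp/demo.prv <<'EOF'
#Paraver (01/01/14 at 00:00):1000:1(2):1:2(1:1,1:2),2
c:1:1:2:1:2
1:1:1:1:1:0:500:1
2:1:1:1:1:500:50000002:7
3:1:1:1:1:500:510:2:1:2:1:520:530:1024:0
EOF

awk -F: 'NR > 1 { count[$1]++ } END { for (t in count) print t, count[t] }' \
    /tmp/demo.prv | LC_ALL=C sort
```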

  use strict;
  use Data::Dumper;

  my $power_reference=286.087E-3; # in flop/mus

  sub main {
      # Default values for $input, $output and $format may have been
      # defined when tangling from babel, but command-line arguments
      # should always override them.
      my($input,$output,$format);
      my($arg);

      while(defined($arg=shift(@ARGV))) {
          for ($arg) {
              if (/^-i$/) { $input = shift(@ARGV); last; }
              if (/^-o$/) { $output = shift(@ARGV); last; }
              if (/^-f$/) { $format = shift(@ARGV); last; }
              print "unrecognized argument '$arg'";
          }
      }

      if(!defined($input) || $input eq "") { die "No valid input file provided.\n"; }
      if(!defined($output) || $output eq "") { die "No valid output file provided.\n"; }
      
      print "Input: '$input'\n";
      print "Output: '$output'\n";
      print "Format: '$format'\n";

      my($state_name,$event_name) = parse_pcf($input.".pcf");
      my($resource_name) = parse_row($input.".row");
      convert_prv($input.".prv",$state_name,$event_name,$resource_name,$output,$format);
  }

  sub parse_row {
      my($row) = shift;
      my $line;
      my(%resource_name);

      open(INPUT,$row) or die "Cannot open $row. $!";
      while(defined($line=<INPUT>)) {
          chomp $line;
          if($line =~ /^LEVEL (.*) SIZE/) {
              my $type = $1;
              $resource_name{$type}= [];
              while((defined($line=<INPUT>)) &&
                    !($line =~ /^\s*$/g)) {
                  chomp $line;
                  push @{$resource_name{$type}}, $line;
              }
          }
      }

      return (\%resource_name);
  }

  sub parse_pcf {
      my($pcf) = shift;
      my $line;
      my(%state_name, %event_name) ;
      open(INPUT,$pcf) or die "Cannot open $pcf. $!";
      while(defined($line=<INPUT>)) {
          chomp $line;
          if($line =~ /^STATES$/) {
              while((defined($line=<INPUT>)) &&
                    ($line =~ /^(\d+)\s+(.*)/g)) {
                  $state_name{$1} = $2;
              }
          }
          if($line =~ /^EVENT_TYPE$/) {
              while($line=<INPUT>) {
                  if($line =~ /VALUES/g) {last;}
                $line =~ /[69]\s+(\d+)\s+(.*)/g or next; # keep only event types 6 and 9, e.g. "9   50000002    MPI Collective Comm"
                  my($id)=$1;
                  $event_name{$id}{type} = $2;
              }
              while((defined($line=<INPUT>)) &&
                    ($line =~ /^(\d+)\s+(.*)/g)) {
                  my($id);
                  foreach $id (keys %event_name) {
                      $event_name{$id}{value}{$1} = $2;
                  }
              }
          }
      }
      # print Dumper(\%state_name);
      # print Dumper(\%event_name);
      return (\%state_name,\%event_name);
  }

  my(%pcf_coll_arg) = (
      "send" => "50100001",
      "recv" => "50100002",
      "root" => "50100003",
      "communicator" => "50100003",
      "compute" => "my_reduce_compute_amount",
  );

  my(%tit_translate) = (
      "Running" => "compute",
      "Not created" => "", # skip me
      "I/O" => "",         # skip me
      "Synchronization" => "", # skip me
      "MPI_Comm_size" => "",   # skip me
      "MPI_Comm_rank" => "",   # skip me
      "Outside MPI" => "",     # skip me
      "End" => "",             # skip me
      "MPI_Init" => "init",
      "MPI_Bcast" => "bcast",
      "MPI_Allreduce" => "allReduce",
      "MPI_Alltoallv" => "allToAllV",
      "MPI_Alltoall" => "allToAll",
      "MPI_Reduce" => "reduce",
      "MPI_Allgatherv" => "", # allGatherV Uggly hack 
      "MPI_Gather" => "gather",
      "MPI_Gatherv" => "gatherV",
      "MPI_Reduce_scatter" => "reduceScatter",
      "MPI_Finalize" => "finalize",
      "MPI_Barrier" => "barrier",
   );

  sub convert_prv {
      my($prv,$state_name,$event_name,$resource_name,$output,$format) = @_;
      my $line;
      my (%event);
      my(@fh)=();

      open(INPUT,$prv) or die "Failed to open $prv:$!\n";


      # Start parsing the header to get the trace hierarchy. 
      # We should get something like
      # #Paraver (dd/mm/yy at hh:m):ftime:0:nAppl:applicationList[:applicationList]

      $line=<INPUT>; chomp $line;
      $line=~/^\#Paraver / or die "Invalid header '$line'\n";
      my $header=$line;
      $header =~ s/^[^:\(]*\([^\)]*\):// or die "Invalid header '$line'\n";
      $header =~ s/(\d+):(\d+)([^\(\d])/$1\_$2$3/g;
      $header =~ s/,\d+$//g;
      my ($max_duration,$resource,$nb_app,@appl) = split(/:/,$header);
      $max_duration =~ s/_.*$//g;
      $resource =~ /^(.*)\((.*)\)$/ or die "Invalid resource description '$resource'\n";
      my($nb_nodes,$cpu_list)= ($1,$2);

      $nb_app==1 or die "I can handle only one application type at the moment\n";

      my @cpu_list=split(/,/,$cpu_list);

      # print("$max_duration --> '$nb_nodes' '@cpu_list'    $nb_app  @appl \n");
      my(%Appl);
      my($nb_task);
      foreach my $app (1..$nb_app) {
          my($task_list);
          $appl[$app-1] =~ /^(.*)\((.*)\)$/ or die "Invalid resource description '$resource'\n";
          ($nb_task,$task_list) = ($1,$2);

          my(@task_list) = split(/,/,$task_list);


          my(%mapping);
          my($task);
          foreach $task (1..$nb_task) {
              my($nb_thread,$node_id) = split(/_/,$task_list[$task-1]);
              if(!defined($mapping{$node_id})) { $mapping{$node_id}=[]; }
              push @{$mapping{$node_id}},[$task,$nb_thread];
          }
          $Appl{$app}{nb_task}=$nb_task;
          $Appl{$app}{mapping}=\%mapping;
      }

      for ($format) {
          if (/^csv$/) { 
              $output .= ".csv";
              open(OUTPUT,"> $output") or die "Cannot open $output. $!"; 
              last; 
          } 
          if (/^pjdump$/) { 
              $output .= ".pjdump";
            open(OUTPUT,"> $output") or die "Cannot open $output. $!"; 
              my @tab = split(/:/,`tail -n 1 $prv`);
              print OUTPUT "Container, 0, 0, 0.0, $max_duration, $max_duration, 0\n";
              foreach my $node (1..$nb_nodes) {
                  print OUTPUT "Container, 0, N, 0.0, $max_duration, $max_duration, node_$node\n";
              }
              foreach my $app (values(%Appl)) {
                  foreach my $node (keys%{$$app{mapping}}) {
                      foreach my $t (@{$$app{mapping}{$node}}) {
                          print OUTPUT "Container, node_$node, P, 0.0, $max_duration, $max_duration, MPI_Rank_$$t[0]\n";
                          foreach my $thread (1..$$t[1]) {
                              print OUTPUT "Container, MPI_Rank_$$t[0], T, 0.0, $max_duration, $max_duration, Thread_$$t[0]_$thread\n";
                          }
                      }
                  }
              }
              last;
          }
          if(/^tit$/) {
              my $nb_proc = 0;
              foreach my $node (@{$$resource_name{NODE}}) { 
                  my $filename = $output."_$nb_proc.tit";
                  open($fh[$nb_proc], "> $filename") or die "Cannot open > $filename: $!";
                  $nb_proc++;
              }
              last;
          }
          die "Invalid format '$format'\n";
      }
      
      # Now, let's process the records 
      sub process_event {
          my(%event_list)=@_;
          my($sname);
          my($sname_param);
          
          if(defined($event_list{50000003})) {
              $sname = $$event_name{50000003}{value}{$event_list{50000003}};
              $sname_param = "";
          } elsif(defined($event_list{50000002})) {
              $sname = $$event_name{50000002}{value}{$event_list{50000002}};
              my $t;
              if($tit_translate{$sname} =~ /V$/) { # Really Uggly hack because of "poor" tracing of V operations
                  if($event_list{$pcf_coll_arg{"send"}}==251 ||
                     $event_list{$pcf_coll_arg{"recv"}}==251 ) {
                  }

                  $event_list{$pcf_coll_arg{"send"}} = 100000;
                  $event_list{$pcf_coll_arg{"recv"}} = 100000;
                  $sname =~ s/v$//i;
              }

              if($tit_translate{$sname} eq "reduce") { # Uggly hack because the amount of computation is not given
                  $event_list{$pcf_coll_arg{"compute"}} = 1;
              }
              if($tit_translate{$sname} eq "gather") { # Uggly hack because the amount of receive does not make sense here
                  $event_list{$pcf_coll_arg{"recv"}} = $event_list{$pcf_coll_arg{"send"}};
                  $event_list{$pcf_coll_arg{"root"}} = 1; # Uggly hack. AAAAARGH
              }
              if($tit_translate{$sname} eq "reduceScatter") { # Uggly hack because of "poor" tracing
                  $event_list{$pcf_coll_arg{"recv"}} = $event_list{$pcf_coll_arg{"send"}}; 
                  my $foo=$event_list{$pcf_coll_arg{"recv"}};
                  $event_list{$pcf_coll_arg{"recv"}}="";
                  for (1..$nb_task) { $event_list{$pcf_coll_arg{"recv"}} .= $foo." "; }
                  $event_list{$pcf_coll_arg{"compute"}} = 1;
              }

              foreach $t ("send","recv", "compute", "root") {
                  if(defined($event_list{$pcf_coll_arg{$t}}) &&
                     $event_list{$pcf_coll_arg{$t}} ne "0") {
                      if($t eq "root") { $event_list{$pcf_coll_arg{$t}}--; }
                      $sname_param.= "$event_list{$pcf_coll_arg{$t}} ";
                  }
              }
        } else { # These may be application or trace-flushing events,
                 # hardware counters, user functions, ...
              my($warn)=1;
              for (40000018,40000003,40000001,
                   42009999,42001003,42001010,42001015,300,
                   70000001,70000002,70000003,80000001,80000002,80000003, 
                   45000000) {
                  if(defined($event_list{$_})) {$warn=0; last;}
              }
              if($warn) { print "Skipping event:\n"; 
                          print Dumper(%event_list);}
              next;
          }
          return($sname,$sname_param);
      }

      while(defined($line=<INPUT>)) {
          chomp($line);
          # State records 1:cpu:appl:task:thread : begin_time:end_time : state
          if($line =~ /^1/) {
              my($sname);
              my($sname_param);
              my($record,$cpu,$appli,$task,$thread,$begin_time,$end_time,$state) =
                  split(/:/,$line);
              if($$state_name{$state} =~ /Group/ || $$state_name{$state} =~ /Others/ ) {
                  $line=<INPUT>;
                  chomp $line;
                  my($event,$ecpu,$eappli,$etask,$ethread,$etime,%event_list) =
                      split(/:/,$line);
                  (($event==2) && ($ecpu eq $cpu) && ($eappli eq $appli) && 
                   ($etask eq $task) && ($ethread eq $thread) &&
                   ($etime >= $begin_time) && ($etime <= $end_time)) or
                   die "Invalid event!";

                  ($sname,$sname_param)=process_event(%event_list);
              } else {
                  $sname = $$state_name{$state};
              }

              if($sname eq "Running") { $sname_param.= (($end_time-$begin_time)*$power_reference); }

              if($format eq "csv") {
                  print OUTPUT "State, $task, MPI_STATE, $begin_time, $end_time, ".
                      ($end_time-$begin_time).", 0, ".
                      $sname."\n";
              } 
              if($format eq "pjdump") {
                  print OUTPUT "State, Thread_${task}_$thread, STATE, $begin_time, $end_time, ".
                      ($end_time-$begin_time).", 0, ".
                      $sname."\n";
              }
              if($format eq "tit") {
                  $task=$task-1;                  
                  defined($tit_translate{$sname}) or die "Unknown state '$sname' for tit\n";
                  if($tit_translate{$sname} ne "") {
                    print { $fh[$task] } "$task $tit_translate{$sname} $sname_param\n";
                  }
              }
          } elsif ($line =~ /^2/) {
            # Event records 2:cpu:appl:task:thread : time : event_type:event_value
            my($event,$cpu,$appli,$task,$thread,$time,%event_list) =
                    split(/:/,$line);
            my($sname,$sname_param)=process_event(%event_list);

            if($format eq "tit") {
                $task=$task-1;                  
                defined($tit_translate{$sname}) or die "Unknown state '$sname' for tit:\n\t$line\n";
                if($tit_translate{$sname} ne "") {
                    print { $fh[$task] } "$task $tit_translate{$sname} $sname_param\n";
                }
            }
          } elsif($line =~ /^3/) { 
              # Communication records 3: cpu_send:ptask_send:task_send:thread_send : logical_time_send: actual_time_send: cpu_recv:ptask_recv:task_recv:thread_recv : logical_time_recv: actual_time_recv: size: tag
              print STDERR "Skipping this communication event\n";
          }
          if($line =~ /^c/) {
              # Communicator record c: app_id: communicator_id: number_of_process : thread_list (e.g., 1:2:3:4:5:6:7:8)
              print STDERR "Skipping communicator definition\n";
          }
      }

      for ($format) {
          if (/^csv$/) { 
              close(OUTPUT); print "Generated [[file:$output]]\n";
              last; 
          }
          if (/^pjdump$/) { 
              close(OUTPUT); print "Generated [[file:$output]]\n";
              last; 
          }
          if(/^tit$/) {
              foreach my $f (@fh) {
                  close($f) or die "Failed closing file descriptor. $!\n";
              }
              print "Generated [[file:${output}_0.tit]] among other ones\n";
              last;
          }
          die "Invalid format '$format'\n";
      }
  }

  main();
Input: './paraver_trace/EXTRAE_Paraver_trace_mpich'
Output: './paraver_trace/bigdft_8_rl'
Format: 'tit'
Generated [[file:./paraver_trace/bigdft_8_rl_0.tit]] among other ones
head paraver_trace/bigdft_8_rl.csv
State, 1, MPI_STATE, 0, 10668, 10668, 0, Not created
State, 2, MPI_STATE, 0, 5118733, 5118733, 0, Not created
State, 3, MPI_STATE, 0, 9374527, 9374527, 0, Not created
State, 4, MPI_STATE, 0, 17510142, 17510142, 0, Not created
State, 5, MPI_STATE, 0, 5989994, 5989994, 0, Not created
State, 6, MPI_STATE, 0, 5737601, 5737601, 0, Not created
State, 7, MPI_STATE, 0, 5866978, 5866978, 0, Not created
State, 8, MPI_STATE, 0, 5891099, 5891099, 0, Not created
State, 1, MPI_STATE, 10668, 25576057, 25565389, 0, Running
State, 2, MPI_STATE, 5118733, 18655258, 13536525, 0, Running
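Such a csv is already enough to get quick per-state statistics without firing up R; for instance, the total time per state with awk, illustrated on a few of the lines above:

```shell
# Sum the duration (6th field) per state name (8th field) of the csv.
cat > /tmp/demo.csv <<'EOF'
State, 1, MPI_STATE, 0, 10668, 10668, 0, Not created
State, 1, MPI_STATE, 10668, 25576057, 25565389, 0, Running
State, 2, MPI_STATE, 5118733, 18655258, 13536525, 0, Running
EOF

awk -F', ' '{ total[$8] += $6 } END { for (s in total) print s, total[s] }' \
    /tmp/demo.csv | sort
```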

Let's try to replay on SMPI

cp /home/alegrand/Work/SimGrid/infra-songs/WP4/SC13/graphene.xml ./graphene.xml
  print_usage()
  {
      echo "Usage: $0 [OPTIONS]"
      cat <<'End-of-message'
    -i|--input        Paraver input file
    -o|--output       output file (in the paje format)
    -p|--platform     XML platform file
    -m|--machine_file machine file (hostfile)
    -h|--help         print help information
  End-of-message
      exit 1
  }

  TEMP=`getopt -o i:o:p:m:h --long input:,output:,platform:,machine_file:,help -n 'smpi2pj.sh' -- "$@"`
  eval set -- "$TEMP"
  while true;do 
   case "$1" in 
      -i|--input)
          case "$2" in 
            "") shift 2;;
             *) INPUT=$2;shift 2;;
          esac;;
      -o|--output)
          case "$2" in 
            "") shift 2;;
             *) OUTPUT=$2;shift 2;;
          esac;;
      -p|--platform)
          case "$2" in 
            "") shift 2;;
             *) PLATFORM=$2;shift 2;;
          esac;;
      -m|--machine_file)
          case "$2" in 
            "") shift 2;;
             *) MACHINE_FILE=$2;shift 2;;
          esac;;
      -h|--help)
          print_usage;shift;;
       --) shift; break;;
       *) echo "Unknown option '$1'"; print_usage;;
   esac
  done


  TMP_WORKING_PATH=`mktemp -d`

  # Creating input for smpi_replay
  REPLAY_INPUT=$TMP_WORKING_PATH/smpi_replay.txt
  ls $INPUT*.tit > $REPLAY_INPUT

  # Get the number of MPI ranks
  export NP=`cat $REPLAY_INPUT | wc -l`

  # Generating a dumb deployment (machine_file) if needed
  if [ -z "$MACHINE_FILE" ]; then
      MACHINE_FILE=$TMP_WORKING_PATH/machine_file.txt;
      if [ -e "$MACHINE_FILE" ]; then
          echo "Oops, $MACHINE_FILE already exists. I don't want to overwrite it." ;
          exit 1 ;
      fi;
      rm -f $MACHINE_FILE;
      touch $MACHINE_FILE;
      for i in `seq 1 144`; do
          echo graphene-${i}.nancy.grid5000.fr >> $MACHINE_FILE ;
      done
      cp $MACHINE_FILE $MACHINE_FILE.sav
      cat $MACHINE_FILE.sav $MACHINE_FILE.sav $MACHINE_FILE.sav $MACHINE_FILE.sav > $MACHINE_FILE
  fi

  ## To debug
  # $SMPIRUN -ext smpi_replay --log=replay.thresh:critical --log=smpi_replay.thresh:verbose \
  #          --cfg=smpi/cpu_threshold:-1  -hostfile machine_file -platform $PLATFORM \
  #          -np $NP gdb\ --args\ $REPLAY /tmp/smpi_replay.txt  --log=smpi_kernel.thres:warning \
  #          --cfg=contexts/factory:thread

  $SMPIRUN -ext smpi_replay \
           --cfg=smpi/cpu_threshold:-1 -trace --cfg=tracing/filename:$OUTPUT \
           -hostfile $MACHINE_FILE -platform $PLATFORM -np $NP \
           $REPLAY $REPLAY_INPUT --log=smpi_kernel.thres:warning  \
           --cfg=contexts/factory:thread 2>&1 
  # --log=replay.thresh:critical  --log=smpi_replay.thresh:verbose

SMPI Paje to Paraver Conversion

This was quick and dirty and reuses the original pcf file, but in the end it kinda works… Yippee! :)
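One detail to keep in mind (this was the "second/nanosecond issue" mentioned earlier): pj_dump emits timestamps in seconds, whereas the prv header generated below declares nanoseconds, hence the multiplications by 1E9 in the script. The scaling itself is trivial, e.g. on a pjdump-style line:

```shell
# Scale the begin/end timestamps (4th and 5th fields) of a pjdump-style
# line from seconds to nanoseconds, as the perl converter below does.
echo "State,rank-0,STATE,0.000125,0.000250,0.000125,0,compute" |
awk -F',' -v OFS=',' '{ $4 *= 1E9; $5 *= 1E9; print }'
```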

  use strict;
  use Env;

  my($arg);
  my($input,$output);
  my($strict_option) = "";

  while(defined($arg=shift(@ARGV))) {
      for ($arg) {
          print "$arg \n";
          if (/^-i$/) { $input = shift(@ARGV); last; }
          if (/^-o$/) { $output = shift(@ARGV); last; }
          if (/^-ns$/){ $strict_option = "-n -z"; last; }
          print "unrecognized argument '$arg'";
      }
  }

  my $pjfile=$input;
  $pjfile=~ s/\.trace$/.pjdump/;
  $pjfile ne $input or die;

  $ENV{LANG}="C";

  system("pj_dump $strict_option $input | grep State | sed 's/ //g' | sort -n -t ',' -k 4n > $pjfile");
  my $duration = `tail -n 1 $pjfile`;
  my @duration = split(/,/,$duration);
  $duration = $duration[4];
  $duration *= 1E9;
  my $nb_nodes = `sed -e 's/.*rank-//' -e 's/,.*//' $pjfile | sort | uniq | wc -l`;
  chomp($nb_nodes);
  my(%smpi_to_pcf) = (
      "action_allReduce" => "10",
      "action_allToAll"  => "11",
      "action_barrier"   => "8",
      "action_bcast"     => "7",
      "action_gather"    => "13",
      "action_reduce"    => "9",
      "action_reducescatter" => "80",
  #        "smpi_replay_finalize" => "32",
  #        "smpi_replay_init" => "31"
      );

  my($pcf_file_content)="DEFAULT_OPTIONS

  LEVEL               THREAD
  UNITS               NANOSEC
  LOOK_BACK           100
  SPEED               1
  FLAG_ICONS          ENABLED
  NUM_OF_STATE_COLORS 1000
  YMAX_SCALE          37


  DEFAULT_SEMANTIC

  THREAD_FUNC          State As Is


  STATES
  0    Idle
  1    Running
  2    Not created
  3    Waiting a message
  4    Blocking Send
  5    Synchronization
  6    Test/Probe
  7    Scheduling and Fork/Join
  8    Wait/WaitAll
  9    Blocked
  10    Immediate Send
  11    Immediate Receive
  12    I/O
  13    Group Communication
  14    Tracing Disabled
  15    Others
  16    Send Receive
  17    Memory transfer


  STATES_COLOR
  0    {117,195,255}
  1    {0,0,255}
  2    {255,255,255}
  3    {255,0,0}
  4    {255,0,174}
  5    {179,0,0}
  6    {0,255,0}
  7    {255,255,0}
  8    {235,0,0}
  9    {0,162,0}
  10    {255,0,255}
  11    {100,100,177}
  12    {172,174,41}
  13    {255,144,26}
  14    {2,255,177}
  15    {192,224,0}
  16    {66,66,66}
  17    {255,0,96}

  EVENT_TYPE
  9   50000001    MPI Point-to-point
  VALUES
  2   MPI_Recv
  1   MPI_Send
  0   Outside MPI

  EVENT_TYPE
  9   50000002    MPI Collective Comm
  VALUES
  18   MPI_Allgatherv
  10   MPI_Allreduce
  11   MPI_Alltoall
  12   MPI_Alltoallv
  8   MPI_Barrier
  7   MPI_Bcast
  13   MPI_Gather
  14   MPI_Gatherv
  80   MPI_Reduce_scatter
  9   MPI_Reduce
  0   Outside MPI


  EVENT_TYPE
  9   50000003    MPI Other
  VALUES
  21   MPI_Comm_create
  19   MPI_Comm_rank
  20   MPI_Comm_size
  32   MPI_Finalize
  31   MPI_Init
  0   Outside MPI


  EVENT_TYPE
  1    50100001    Send Size in MPI Global OP
  1    50100002    Recv Size in MPI Global OP
  1    50100003    Root in MPI Global OP
  1    50100004    Communicator in MPI Global OP


  EVENT_TYPE
  6    40000001    Application
  VALUES
  0      End
  1      Begin


  EVENT_TYPE
  6    40000003    Flushing Traces
  VALUES
  0      End
  1      Begin


  GRADIENT_COLOR
  0    {0,255,2}
  1    {0,244,13}
  2    {0,232,25}
  3    {0,220,37}
  4    {0,209,48}
  5    {0,197,60}
  6    {0,185,72}
  7    {0,173,84}
  8    {0,162,95}
  9    {0,150,107}
  10    {0,138,119}
  11    {0,127,130}
  12    {0,115,142}
  13    {0,103,154}
  14    {0,91,166}


  GRADIENT_NAMES
  0    Gradient 0
  1    Grad. 1/MPI Events
  2    Grad. 2/OMP Events
  3    Grad. 3/OMP locks
  4    Grad. 4/User func
  5    Grad. 5/User Events
  6    Grad. 6/General Events
  7    Grad. 7/Hardware Counters
  8    Gradient 8
  9    Gradient 9
  10    Gradient 10
  11    Gradient 11
  12    Gradient 12
  13    Gradient 13
  14    Gradient 14


  EVENT_TYPE
  9    40000018    Tracing mode:
  VALUES
  1      Detailed
  2      CPU Bursts
  ";

  my($pcf_output)=$output;
  $pcf_output =~ s/\.prv$/.pcf/;
  open OUTPUT, "> $pcf_output";
  print OUTPUT $pcf_file_content;
  close OUTPUT;

  my($line);
  open(INPUT,$pjfile) or die;
  open(OUTPUT,"> $output") or die;
  my(@tab);

  @tab=();
  foreach (1..$nb_nodes) {
      push @tab,1;
  }
  my $node_list = join(',',@tab);
  @tab=();
  foreach (1..$nb_nodes) {
      push @tab,"1:$_";
  }
  my $thread_list = join(',',@tab);

  my $nb_comm = $nb_nodes + 1; # one global communicator plus one per rank
  print OUTPUT "#Paraver (generated with perl from SMPI):${duration}_ns:$nb_nodes($node_list):1:$nb_nodes($thread_list),$nb_comm\n";

  my $comm_list = join(':',(1..$nb_nodes));
  my $comm=1;
  print OUTPUT "c:1:$comm:$nb_nodes:$comm_list\n";  $comm++;
  foreach (1..$nb_nodes) {
      print OUTPUT "c:1:$comm:1:$_\n";  $comm++;
  }

  while(defined($line=<INPUT>)) {
      chomp($line);
      my($Foo1,$rank,$Foo2,$start,$end,$duration,$Foo3,$type) = split(/,/,$line);
      $rank=~ s/\D*//g;
      $rank++;
      $start *= 1E9;
      $end *= 1E9;
      if(defined($smpi_to_pcf{$type})) {
          print OUTPUT "1:$rank:1:$rank:1:$start:$end:13\n"; # group communication
          print OUTPUT "2:$rank:1:$rank:1:$start:50000002:$smpi_to_pcf{$type}\n";
          print OUTPUT "2:$rank:1:$rank:1:$end:50000002:0\n"; # Output MPI
  #          print OUTPUT "1:$rank:1:$rank:1:$start:$end:$smpi_to_pcf{$type}\n";
      } else {
          warn("Unknown type $type: Skipping $line\n");
      }
  }

Gluing everything together to allow calling SMPI

The Dimemas wrapper called by paraver is file:///usr/local/stow/wxparaver-4.5.4-linux-x86_64/bin/dimemas-wrapper.sh

Let's first move the original out of the way (keeping it as a backup).

  mv /usr/local/stow/wxparaver-4.5.4-linux-x86_64/bin/dimemas-wrapper.sh /usr/local/stow/wxparaver-4.5.4-linux-x86_64/bin/dimemas-wrapper.sh.backup

Basically, what I want to do is something like

perl prv2pj.pl
sh smpi2pj.sh >/dev/null
perl pjsmpi2prv.pl

Here is an equivalent version inspired by the Dimemas wrapper.

  #
  # Simple wrapper for SMPI based on the Dimemas one
  #

  set -e

  function usage
  {
    echo "Usage: $0  source_trace  dimemas_cfg  output_trace  reuse_dimemas_trace [extra_parameters] [-n]"
    echo "  source_trace:        Paraver trace"
    echo "  dimemas_cfg:         Simulation parameters"
    echo "  output_trace:        Output trace of Dimemas; must end with '.prv'"
    echo "  reuse_dimemas_trace: 0 -> don't reuse, rerun prv2dim"
    echo "                       1 -> reuse, don't rerun prv2dim"
    echo "  extra_parameters:    See complete list of Dimemas help with 'Dimemas -h'"
    echo "  -n:                  prv2dim -n parameter => no generate initial idle states"
  }


  # Read and check parameters
  if [ $# -lt 4 ]; then
    usage
    exit 1
  fi

  #PARAVER_TRACE=${1}
  PARAVER_TRACE=`readlink -eqs "${1}"`
  DIMEMAS_CFG=${2}
  OUTPUT_PARAVER_TRACE=${3}
  DIMEMAS_REUSE_TRACE=${4}


  if [[ ${DIMEMAS_REUSE_TRACE} != "0"  && ${DIMEMAS_REUSE_TRACE} != "1" ]]; then
    usage
    exit 1
  fi

  echo "===============================================================================" 

  # Check SMPI availability
  ### Oh right, we should do that...

  # Get tracename, without extensions
  TRACENAME=$(echo "$PARAVER_TRACE" | sed "s/\.[^\.]*$//")
  EXTENSION=$(echo "$PARAVER_TRACE" | sed "s/^.*\.//")

  #Is gzipped?
  if [[ ${EXTENSION} = "gz" ]]; then
    echo
    echo -n "[MSG] Decompressing $PARAVER_TRACE trace..."
    gunzip ${PARAVER_TRACE}
    TRACENAME=$(echo "${TRACENAME}" | sed "s/\.[^\.]*$//")
    PARAVER_TRACE=${TRACENAME}.prv
    echo "...Done!"
  fi

  DIMEMAS_TRACE=${TRACENAME}.dim

  # Adapt Dimemas CFG with new trace name
  DIMEMAS_CFG_NAME=$(echo "$DIMEMAS_CFG" | sed "s/\.[^\.]*$//")

  DIMEMAS_COPY_CFG_NAME=`basename ${DIMEMAS_CFG_NAME}`
  OLD_DIMEMAS_TRACENAME=`grep "mapping information" ${DIMEMAS_CFG} | grep ".dim" | awk -F'"' {'print $4'}`
  NEW_DIMEMAS_TRACENAME=`basename ${DIMEMAS_TRACE}`
  DIMEMAS_CFG_PATH=`dirname ${DIMEMAS_TRACE}`

  # Append extra parameters if they exist
  shift 4
  EXTRA_PARAMETERS=""
  PRV2DIM_N=""
  while [ -n "$1" ]; do
    if [[ ${1} == "-n" ]]; then # caution! this only works because no -n parameter exists in Dimemas
      PRV2DIM_N="-n"
    else
      EXTRA_PARAMETERS="$EXTRA_PARAMETERS $1"
    fi
    shift
  done

  # Change directory to see .dim
  DIMEMAS_TRACE_DIR=`dirname ${DIMEMAS_TRACE}`/
  pushd . > /dev/null
  cd ${DIMEMAS_TRACE_DIR}


  # Translate from .prv to an SMPI time-independent trace

  if [[ ${DIMEMAS_REUSE_TRACE} = "0" || \
        ${DIMEMAS_REUSE_TRACE} = "1" && ! -f ${DIMEMAS_TRACE} ]]; then

    if [[ ${DIMEMAS_REUSE_TRACE} = "1" ]]; then
      echo
      echo "[WARN] Unable to find ${DIMEMAS_TRACE}"
      echo "[WARN] Generating it."
    fi

    PARAVER_TRACE_TRIMMED=`echo ${PARAVER_TRACE} | sed 's/.prv$//'`
    echo
    echo "[COM] prv2pj.pl -i ${PARAVER_TRACE_TRIMMED} -o ${DIMEMAS_TRACE} -f tit"
    echo
    prv2pj.pl -i ${PARAVER_TRACE_TRIMMED} -o ${DIMEMAS_TRACE} -f tit
    echo
  fi


  # Simulate
  OUTPUT_PAJE_TRACE=`echo ${OUTPUT_PARAVER_TRACE} | sed 's/.prv$/.trace/'`
  echo
  echo "*** Running SMPI :) ***"
  echo
  echo "[COM]   smpi2pj.sh -i ${DIMEMAS_TRACE} -o ${OUTPUT_PAJE_TRACE}"
  echo
  smpi2pj.sh -i ${DIMEMAS_TRACE} -o ${OUTPUT_PAJE_TRACE}

  # Convert back to paraver
  echo
  echo "[COM]   pjsmpi2prv.pl -i ${OUTPUT_PAJE_TRACE} -o ${OUTPUT_PARAVER_TRACE}"
  echo
  pjsmpi2prv.pl -i ${OUTPUT_PAJE_TRACE} -o ${OUTPUT_PARAVER_TRACE}
  echo "===============================================================================" 

  popd > /dev/null
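As a quick sanity check, the sed expressions used in the wrapper to split a trace name from its extension behave as follows (a minimal sketch; the file name is made up):

```shell
# Made-up trace name, to exercise the sed expressions from the wrapper
PARAVER_TRACE="bigdft_run.prv.gz"

TRACENAME=$(echo "$PARAVER_TRACE" | sed "s/\.[^\.]*$//")  # strip the last extension
EXTENSION=$(echo "$PARAVER_TRACE" | sed "s/^.*\.//")      # keep only the last extension

echo "$TRACENAME"   # bigdft_run.prv
echo "$EXTENSION"   # gz

# After gunzip, the wrapper strips one more extension level:
TRACENAME=$(echo "$TRACENAME" | sed "s/\.[^\.]*$//")
echo "$TRACENAME"   # bigdft_run
```

Note that the second expression is greedy, so it always isolates the *last* extension, which is what the gzip test relies on.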

For this to work "system wide", I need to put the previous Perl and shell scripts in the PATH. Eventually, they will be shipped with SMPI.

  TMP_FILENAME=`mktemp`

  for i in *.pl ; do
      mv $i $TMP_FILENAME;
      echo "#!/usr/bin/perl" > $i;
      cat $TMP_FILENAME >> $i;
      rm $TMP_FILENAME;
      chmod +x $i;
      cp $i ~/bin/
  done
  for i in  smpi2pj.sh ; do
      mv $i $TMP_FILENAME;
      echo "#!/bin/sh" > $i;
      cat $TMP_FILENAME >> $i;
      rm $TMP_FILENAME;
      chmod +x $i;
      cp $i ~/bin/
  done
  for i in /usr/local/stow/wxparaver-4.5.4-linux-x86_64/bin/dimemas-wrapper.sh ; do
      mv $i $TMP_FILENAME;
      echo "#!/bin/bash" > $i;
      cat $TMP_FILENAME >> $i;
      rm $TMP_FILENAME;
      chmod +x $i;
      cp $i ~/bin/
  done
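A possible alternative for prepending the shebang (just a sketch, not what the loops above do) is GNU sed's in-place insert command, which avoids the temporary file; `demo.pl` is a hypothetical file name:

```shell
# Create a hypothetical script lacking a shebang
echo 'print "hello\n";' > demo.pl

# GNU sed: insert the shebang as the new first line, in place
sed -i '1i #!/usr/bin/perl' demo.pl
chmod +x demo.pl

head -n 1 demo.pl   # #!/usr/bin/perl
```

This relies on GNU sed's `-i` and one-line `i` syntax, so it is less portable than the mv/cat dance above, but it is shorter and leaves no temporary file behind.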

Specific Pjdump to Paraver Conversion for Damien

  #!/usr/bin/perl

  my $output=q(/exports/nancy_700_lu.C.700.prv);

  my $input=q(/exports/nancy_700_lu.C.700.pjdump.bz2);

  use strict;
  use Env;

  my($duration,$nb_nodes);
  my($strict_option) = "";

  my($arg);
  while(defined($arg=shift(@ARGV))) {
      for ($arg) {
          if (/^-i$/) { $input = shift(@ARGV); last; }
          if (/^-o$/) { $output = shift(@ARGV); last; }
          if (/^-d$/) { $duration = shift(@ARGV); last; }
          if (/^-n$/) { $nb_nodes = shift(@ARGV); last; }
          if (/^-ns$/){ $strict_option = "-n -z"; last; }
        print "unrecognized argument '$arg'\n";
      }
  }

  print " ---> $input \n";

  my($pjfile);
  if($input =~/\.trace$/) {
      $ENV{LANG}="C";
      $pjfile = $input;
      $pjfile =~ s/\.trace$/.pjdump/;
      my $command = "pj_dump $strict_option $input | grep State | sed 's/ //g' | sort -n -t ',' -k 4n > $pjfile";
      print "---> $command\n";
      system($command);
  } elsif($input =~/\.pjdump/) {
      $pjfile = $input;
  } else {
      die "Unknown input format '$input'\n";
  }

  print " ---> $pjfile \n";

  if(!defined($duration)) {
      $duration = `tail -n 1 $pjfile`;
      my @duration = split(/,/,$duration);
      $duration = $duration[4];
      $duration *= 1E9;
  }

  if(!defined($nb_nodes)) {
      $nb_nodes = `sed -e 's/.*rank-//' -e 's/,.*//' $pjfile | sort | uniq | wc -l`;
      chomp($nb_nodes);
  }

  my($pcf_file_content)="DEFAULT_OPTIONS

  LEVEL               THREAD
  UNITS               NANOSEC
  LOOK_BACK           100
  SPEED               1
  FLAG_ICONS          ENABLED
  NUM_OF_STATE_COLORS 1000
  YMAX_SCALE          37


  DEFAULT_SEMANTIC

  THREAD_FUNC          State As Is


  STATES
  0    Idle
  1    Running
  2    Not created
  3    Waiting a message
  4    Blocking Send
  5    Synchronization
  6    Test/Probe
  7    Scheduling and Fork/Join
  8    Wait/WaitAll
  9    Blocked
  10    Immediate Send
  11    Immediate Receive
  12    I/O
  13    Group Communication
  14    Tracing Disabled
  15    Others
  16    Send Receive
  17    Memory transfer


  STATES_COLOR
  0    {117,195,255}
  1    {0,0,255}
  2    {255,255,255}
  3    {255,0,0}
  4    {255,0,174}
  5    {179,0,0}
  6    {0,255,0}
  7    {255,255,0}
  8    {235,0,0}
  9    {0,162,0}
  10    {255,0,255}
  11    {100,100,177}
  12    {172,174,41}
  13    {255,144,26}
  14    {2,255,177}
  15    {192,224,0}
  16    {66,66,66}
  17    {255,0,96}

  EVENT_TYPE
  9   50000001    MPI Point-to-point
  VALUES
  2   MPI_Recv
  1   MPI_Send
  0   Outside MPI

  EVENT_TYPE
  9   50000002    MPI Collective Comm
  VALUES
  18   MPI_Allgatherv
  10   MPI_Allreduce
  11   MPI_Alltoall
  12   MPI_Alltoallv
  8   MPI_Barrier
  7   MPI_Bcast
  13   MPI_Gather
  14   MPI_Gatherv
  80   MPI_Reduce_scatter
  9   MPI_Reduce
  0   Outside MPI


  EVENT_TYPE
  9   50000003    MPI Other
  VALUES
  21   MPI_Comm_create
  19   MPI_Comm_rank
  20   MPI_Comm_size
  32   MPI_Finalize
  31   MPI_Init
  0   Outside MPI


  EVENT_TYPE
  1    50100001    Send Size in MPI Global OP
  1    50100002    Recv Size in MPI Global OP
  1    50100003    Root in MPI Global OP
  1    50100004    Communicator in MPI Global OP


  EVENT_TYPE
  6    40000001    Application
  VALUES
  0      End
  1      Begin


  EVENT_TYPE
  6    40000003    Flushing Traces
  VALUES
  0      End
  1      Begin


  GRADIENT_COLOR
  0    {0,255,2}
  1    {0,244,13}
  2    {0,232,25}
  3    {0,220,37}
  4    {0,209,48}
  5    {0,197,60}
  6    {0,185,72}
  7    {0,173,84}
  8    {0,162,95}
  9    {0,150,107}
  10    {0,138,119}
  11    {0,127,130}
  12    {0,115,142}
  13    {0,103,154}
  14    {0,91,166}


  GRADIENT_NAMES
  0    Gradient 0
  1    Grad. 1/MPI Events
  2    Grad. 2/OMP Events
  3    Grad. 3/OMP locks
  4    Grad. 4/User func
  5    Grad. 5/User Events
  6    Grad. 6/General Events
  7    Grad. 7/Hardware Counters
  8    Gradient 8
  9    Gradient 9
  10    Gradient 10
  11    Gradient 11
  12    Gradient 12
  13    Gradient 13
  14    Gradient 14


  EVENT_TYPE
  9    40000018    Tracing mode:
  VALUES
  1      Detailed
  2      CPU Bursts
  ";

  my($pcf_output)=$output;
  $pcf_output =~ s/\.prv$/.pcf/;
  open OUTPUT, "> $pcf_output";
  print OUTPUT $pcf_file_content;
  close OUTPUT;

  my(%mpi_to_pcf) = (
      "MPI_Running" => "1",
      "MPI_Send"   => "10",
      "MPI_Recv"     => "11",
      "Collective"  => "13",
      "Others"    => "15",
      );

  my(%mpi_coll_to_pcf) = (
      "MPI_Allgatherv" => "18",
      "MPI_Allreduce"  => "10",
      "MPI_Alltoall"   => "11",
      "MPI_Alltoallv"  => "12",
      "MPI_Barrier"    => "8",
      "MPI_Bcast"      => "7",
      "MPI_Gather"     => "13",
      "MPI_Gatherv"    => "14",
      "MPI_Reduce_Scatter" => "80",
      "MPI_Reduce"     => "9",
      );

  my(%mpi_others_to_pcf) = (
      "MPI_Comm_create" => "21",
      "MPI_Comm_rank" =>   "19",
      "MPI_Comm_size" =>   "20",  
      "MPI_Finalize"  =>   "32",
      "MPI_Init"      =>   "31",
      );

  my(%smpi_to_mpi) = (
      "action_allReduce" => "MPI_Allreduce",
      "action_allToAll"  => "MPI_Alltoall",
      "action_barrier"   => "MPI_Barrier",
      "action_bcast"     => "MPI_Bcast",
      "action_gather"    => "MPI_Gather",
      "action_reduce"    => "MPI_Reduce",
      "action_reducescatter" => "MPI_Reduce_Scatter",
      "smpi_replay_finalize" => "MPI_Finalize",
      "smpi_replay_init" => "MPI_Init",
      "PMPI_Init"        => "MPI_Init",
      "PMPI_Send"        => "MPI_Send",
      "PMPI_Recv"        => "MPI_Recv",
      "PMPI_Finalize"    => "MPI_Finalize"
      );


  my($line);
  open(INPUT,$pjfile) or die;
  open(OUTPUT,"> $output") or die;
  my(@tab);

  @tab=();
  foreach (1..$nb_nodes) {
      push @tab,1;
  }
  my $node_list = join(',',@tab);
  @tab=();
  foreach (1..$nb_nodes) {
      push @tab,"1:$_";
  }
  my $thread_list = join(',',@tab);

  print OUTPUT "#Paraver (generated with perl from SMPI):${duration}_ns:$nb_nodes($node_list):1:$nb_nodes($thread_list),3\n";

  my $comm_list = join(':',(1..$nb_nodes));
  my $comm=1;
  print OUTPUT "c:1:$comm:$nb_nodes:$comm_list\n";  $comm++;
  foreach (1..$nb_nodes) {
      print OUTPUT "c:1:$comm:1:$_\n";  $comm++;
  }

  while(defined($line=<INPUT>)) {
      chomp($line);
      my($Foo1,$rank,$Foo2,$start,$end,$duration,$Foo3,$type) = split(/,/,$line);
      $rank=~ s/\D*//g;
      $rank++;
      $start *= 1E9;
      $end *= 1E9;

      # Map SMPI/replay action names to their MPI counterparts
      $type = $smpi_to_mpi{$type} if defined($smpi_to_mpi{$type});
      
      if(defined($mpi_to_pcf{$type})) {
          print "$type $mpi_to_pcf{$type}\n";
          print OUTPUT "1:$rank:1:$rank:1:$start:$end:$mpi_to_pcf{$type}\n";
      } elsif(defined($mpi_coll_to_pcf{$type})) {
          print OUTPUT "1:$rank:1:$rank:1:$start:$end:$mpi_to_pcf{Collective}\n"; # group communication
          print OUTPUT "2:$rank:1:$rank:1:$start:50000002:$mpi_coll_to_pcf{$type}\n";
          print OUTPUT "2:$rank:1:$rank:1:$end:50000002:0\n"; # Outside MPI
      } elsif(defined($mpi_others_to_pcf{$type})) {
          print OUTPUT "1:$rank:1:$rank:1:$start:$end:$mpi_to_pcf{Others}\n";
          print OUTPUT "2:$rank:1:$rank:1:$start:50000003:$mpi_others_to_pcf{$type}\n";
          print OUTPUT "2:$rank:1:$rank:1:$end:50000003:0\n"; # Outside MPI
      } else {
          warn("Unknown type $type: Skipping $line\n");
      }
  }
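To make the record formats concrete, here is the mapping of the main loop traced by hand in shell on one invented pjdump State line (an `action_bcast` burst on rank-3 between 0.5s and 0.75s): one state record (`1:`), plus the paired collective event records (`2:`) that mark entry and exit of the operation.

```shell
# Invented pjdump State line, fields in the order the Perl script splits them:
#   State, container, type, start, end, duration, imbrication, value
line="State, rank-3, STATE, 0.5, 0.75, 0.25, 0, action_bcast"

# Same transformations as the Perl loop: keep the digits of the rank,
# add 1 (Paraver ranks start at 1), and convert seconds to nanoseconds
rank=$(echo "$line" | cut -d, -f2 | sed 's/[^0-9]//g')
rank=$((rank + 1))
start_ns=$(echo "$line" | awk -F, '{printf "%d", $4 * 1E9}')
end_ns=$(echo "$line" | awk -F, '{printf "%d", $5 * 1E9}')

# action_bcast maps to MPI_Bcast: collective state 13, event value 7
echo "1:$rank:1:$rank:1:$start_ns:$end_ns:13"       # 1:4:1:4:1:500000000:750000000:13
echo "2:$rank:1:$rank:1:$start_ns:50000002:7"       # 2:4:1:4:1:500000000:50000002:7
echo "2:$rank:1:$rank:1:$end_ns:50000002:0"         # 2:4:1:4:1:750000000:50000002:0
```

The value `0` in the closing event record corresponds to "Outside MPI" in the PCF above, which is how Paraver knows the collective has ended.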