2. Tutorial (Global-view)

2.1. Introduction

In the global-view model of XcalableMP (XMP), the user adds directives to serial code to specify parallelism. XMP directives can describe the following actions:

  • Data mapping (divides data and distributes it among nodes)
  • Work mapping (divides workload and distributes it among nodes)
  • Inter-node communication (exchanges data between nodes)

This tutorial introduces the basics of XMP. A simple sequential code will be parallelized by adding XMP directives.

  • C program
#include <stdio.h>

int main(){
  int a[10];

  for(int i=0;i<10;i++){
    a[i] = i+1;
    printf("%d\n", a[i]);
  }

  return 0;
}
  • Fortran program
program main
  integer a(10)

  do i=1, 10
    a(i) = i
    write(*,*) a(i)
  enddo
end program main

A compiler such as gcc or gfortran translates the sequential code into an executable, which produces the following output.

1
2
3
4
5
6
7
8
9
10

XMP provides several directives to parallelize the sequential code.

2.2. Data mapping

The user specifies data mapping among nodes with the nodes, template, distribute, and align directives. XMP directives start with “#pragma xmp” in XMP/C and “!$xmp” in XMP/Fortran.

  • XMP/C program (incomplete)
#include <stdio.h>

int main(){
#pragma xmp nodes p[2]
#pragma xmp template t[10]
#pragma xmp distribute t[block] onto p
  int a[10];
#pragma xmp align a[i] with t[i]

  for(int i=0;i<10;i++){
    a[i] = i+1;
    printf("%d\n", a[i]);
  }

  return 0;
}
  • XMP/Fortran program (incomplete)
program main
!$xmp nodes p(2)
!$xmp template t(10)
!$xmp distribute t(block) onto p
  integer a(10)
!$xmp align a(i) with t(i)

  do i=1, 10
    a(i) = i
    write(*,*) a(i)
  enddo
end program main

In the above example, the user maps array a (10 elements) onto 2 nodes, with 5 elements per node.

The nodes directive declares node set p of size 2. The template directive declares template t of size 10. In XMP, indices written with [] start from 0, and indices written with () start from 1. In the XMP/C style, node set p has elements p[0] and p[1], and template t has elements t[0] to t[9]. In XMP/Fortran, node set p has elements p(1) and p(2), and template t has elements t(1) to t(10).

Note

For historical reasons, both [] and () can be used in XMP/C, whereas [] is not available in XMP/Fortran. However, we recommend using the syntax that matches the base language.

The distribute directive distributes template elements among nodes. In XMP/C, elements from t[0] to t[4] are assigned to p[0] and the remaining elements are assigned to p[1]. In XMP/Fortran, elements from t(1) to t(5) are assigned to p(1) and the remaining elements are assigned to p(2).
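
The block distribution can be pictured as plain index arithmetic. The following is a minimal C sketch, not XMP runtime code: it assumes the HPF-style rule that each node receives a chunk of ceil(N/P) consecutive template elements (so the last node may receive fewer, or none). The names block_chunk, block_owner, block_range, and print_block_mapping are hypothetical.

```c
#include <stdio.h>

/* Chunk size for a block distribution of n template elements over
   p nodes: ceil(n / p). (Assumed HPF-style rule, see the lead-in.) */
int block_chunk(int n, int p)
{
    return (n + p - 1) / p;
}

/* Index (0-based, XMP/C style) of the node that owns template element i. */
int block_owner(int i, int n, int p)
{
    return i / block_chunk(n, p);
}

/* First and last template indices owned by node `node`.
   Returns 0 if the node owns no elements, 1 otherwise. */
int block_range(int node, int n, int p, int *lower, int *upper)
{
    int chunk = block_chunk(n, p);
    int lo = node * chunk;
    int hi = lo + chunk - 1;
    if (hi > n - 1)
        hi = n - 1;             /* the last block may be shorter */
    if (lo > hi)
        return 0;               /* node receives no elements */
    *lower = lo;
    *upper = hi;
    return 1;
}

/* Print the mapping of template t[n] onto node set p[np]. */
void print_block_mapping(int n, int np)
{
    for (int node = 0; node < np; node++) {
        int lo, hi;
        if (block_range(node, n, np, &lo, &hi))
            printf("p[%d]: t[%d] .. t[%d]\n", node, lo, hi);
        else
            printf("p[%d]: (no elements)\n", node);
    }
}
```

For the example above, print_block_mapping(10, 2) prints "p[0]: t[0] .. t[4]" and "p[1]: t[5] .. t[9]", which matches the mapping described in the text.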

The align directive maps the elements of the target array onto nodes according to the specified template. Each align directive must appear after the declaration of its target array. In XMP/C, elements a[0] to a[4] are assigned to p[0] and the remaining elements are assigned to p[1]. In XMP/Fortran, elements a(1) to a(5) are assigned to p(1) and the remaining elements are assigned to p(2).
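
With "align a[i] with t[i]", element a[i] lives on the node that owns t[i]. One way to picture this is that each node allocates only its own block and translates global indices into local offsets. The C sketch below assumes that picture and the ceil(N/P) block rule; the helpers global_to_local, local_to_global, and local_size are hypothetical, and the actual memory layout is chosen by the XMP implementation.

```c
/* Index translation for a[n] aligned with t[n], block-distributed over
   np nodes. All helper names are hypothetical illustrations. */

/* Block size ceil(n / np) (assumed block rule). */
static int chunk_of(int n, int np)
{
    return (n + np - 1) / np;
}

/* Offset of global element a[i] inside the local block of its owner. */
int global_to_local(int i, int n, int np)
{
    return i % chunk_of(n, np);
}

/* Global index of local element k stored on node `node`. */
int local_to_global(int k, int node, int n, int np)
{
    return node * chunk_of(n, np) + k;
}

/* Number of elements of a[n] stored on node `node`. */
int local_size(int node, int n, int np)
{
    int lo = node * chunk_of(n, np);
    int hi = lo + chunk_of(n, np);   /* exclusive bound */
    if (hi > n)
        hi = n;
    return hi > lo ? hi - lo : 0;
}
```

In the two-node example, global_to_local(7, 10, 2) returns 2: a[7] is stored on p[1] at local offset 2, and each node stores local_size(node, 10, 2) == 5 elements.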

The following figure illustrates the behavior of XMP directives for data mapping.

[Figure: data mapping of template t and array a onto node set p]

An array specified in an align directive is called a “distributed array”. Arrays that are not specified in any data mapping directive are called “replicated arrays”.

Data mapping is now complete. Next, you will perform work mapping using the template used for data mapping.

2.3. Work mapping

2.3.1. loop directive

The user uses the loop directive to specify work mapping of the following loop statement. The loop directive is inserted before the target loop statement.

  • XMP/C program
#include <stdio.h>

int main(){
#pragma xmp nodes p[2]
#pragma xmp template t[10]
#pragma xmp distribute t[block] onto p
  int a[10];
#pragma xmp align a[i] with t[i]

#pragma xmp loop on t[i]
  for(int i=0;i<10;i++){
    a[i] = i+1;
    printf("%d\n", a[i]);
  }

  return 0;
}
  • XMP/Fortran program
program main
!$xmp nodes p(2)
!$xmp template t(10)
!$xmp distribute t(block) onto p
  integer a(10)
!$xmp align a(i) with t(i)

!$xmp loop on t(i)
  do i=1, 10
    a(i) = i
    write(*,*) a(i)
  enddo
end program main

In the above example, in XMP/C, iterations from 0 to 4 are mapped onto p[0] and iterations 5 to 9 are mapped onto p[1]. In XMP/Fortran, iterations from 1 to 5 are mapped onto p(1) and iterations 6 to 10 are mapped onto p(2).
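
One way to picture "loop on t[i]" is that each node restricts the iteration space to the template indices it owns. The plain-C sketch below simulates this by running the loop body for a single node with its block bounds. The function run_on_node and the ceil(N/P) block rule are assumptions for illustration; this is not the code the XMP compiler actually generates.

```c
#include <stdio.h>

#define N 10 /* size of template t and array a */

/* Execute the body of "#pragma xmp loop on t[i]" as node `me` of `np`
   would: only the iterations whose t[i] it owns. For simplicity, a[]
   is indexed globally here. Returns the number of iterations run. */
int run_on_node(int me, int np, int a[N])
{
    int chunk = (N + np - 1) / np;   /* assumed block size ceil(N/P) */
    int lower = me * chunk;
    int upper = lower + chunk;       /* exclusive bound */
    if (upper > N)
        upper = N;

    int count = 0;
    for (int i = lower; i < upper; i++) {
        a[i] = i + 1;                /* same body as the XMP/C example */
        printf("%d\n", a[i]);
        count++;
    }
    return count;
}
```

Calling run_on_node(0, 2, a) and then run_on_node(1, 2, a) prints 1 to 5 and then 6 to 10. In a real run the two nodes execute concurrently, which is why the order of their output can vary.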

The following output shows the execution result of the sample program with 2 nodes. Each node prints the values of the array elements assigned to it.

1
2
3
4
5
6
7
8
9
10

Note that in parallel execution, the outputs of the nodes may appear in a different order or be interleaved. For example, the following output is also possible.

6
7
8
9
10
1
2
3
4
5

2.3.2. task directive

The task directive restricts execution to a subset of nodes and changes the execution context to that subset. In XMP/C, the task directive applies to the compound statement that follows it. In XMP/Fortran, an end task directive is required to mark the end of the region.

  • XMP/C program
#include <stdio.h>

int main(){
#pragma xmp nodes p[2]

#pragma xmp task on p[0]
  {
    printf("Hello\n");
  }
  return 0;
}
  • XMP/Fortran program
program main
!$xmp nodes p(2)

!$xmp task on p(1)
  write(*,*) "Hello"
!$xmp end task
end program main

In the above example, in XMP/C, p[0] prints out “Hello” on the screen. In XMP/Fortran, p(1) prints out the result.

The user can use an integer triplet to specify multiple nodes.

  • XMP/C program
[start:size:step]
  • XMP/Fortran program
(start:end:step)

XMP/Fortran follows the syntax of the array section in Fortran.

XMP/C uses a different form: a triplet is written as [start:size:step]. start is the index of the first node in the set; when it is omitted, the set begins with the first element. size is the number of nodes in the set; when it is omitted, the set extends from start to the end of the declared node set (with the specified step). step selects every step-th node, which allows a discontinuous node set; when it is omitted, 1 is used.

For example, p[0:5] specifies 5 nodes starting from p[0] (from p[0] to p[4]). p[0:5:2] has p[0], p[2], p[4], p[6], p[8].
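
The [start:size:step] rules above can be checked with a small helper that lists the nodes a triplet selects. This is a hypothetical plain-C sketch of those rules, not part of any XMP API. As a convention of this sketch only, an omitted field is passed as OMITTED.

```c
#define OMITTED (-1) /* marks an omitted triplet field (sketch convention) */

/* Expand an XMP/C triplet [start:size:step] applied to a node set of
   n nodes into out[] (which must hold at least n entries).
   Returns the number of selected nodes, or -1 if the triplet is
   invalid or would select a node past the end of the node set. */
int expand_triplet(int start, int size, int step, int n, int out[])
{
    if (start == OMITTED)
        start = 0;                              /* begin at the first node */
    if (step == OMITTED)
        step = 1;                               /* contiguous by default */
    if (start < 0 || start >= n || step < 1)
        return -1;
    if (size == OMITTED)
        size = (n - start + step - 1) / step;   /* run to the end of the set */
    if (size < 0 || (size > 0 && start + (size - 1) * step >= n))
        return -1;

    for (int k = 0; k < size; k++)
        out[k] = start + k * step;
    return size;
}
```

For a 20-node set p, expand_triplet(0, 5, 2, 20, out) yields 0, 2, 4, 6, 8, that is p[0], p[2], p[4], p[6], p[8], and expand_triplet(10, OMITTED, OMITTED, 20, out) yields p[10] to p[19].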

The following shows some examples of triplets, where node set p has 20 nodes (p[0] to p[19]).

Note

In XMP/Fortran, a triplet is written as (start:end:step), where end specifies the last element of the node set.
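
XMP/C triplets count from 0 and give a size, while XMP/Fortran triplets count from 1 and give the last index, so the same node set has two spellings. The hypothetical C helper below converts from the XMP/C form to the XMP/Fortran form. For example, p[0:2] in the XMP/C task program corresponds to p(1:2) in the XMP/Fortran version.

```c
/* A Fortran-style triplet: 1-based start, inclusive end, step.
   FortranTriplet is a hypothetical type for this sketch. */
typedef struct {
    int start;
    int end;
    int step;
} FortranTriplet;

/* Convert an XMP/C triplet [start:size:step] into the equivalent
   XMP/Fortran triplet (start:end:step). Assumes size >= 1, step >= 1. */
FortranTriplet c_to_fortran_triplet(int start, int size, int step)
{
    FortranTriplet f;
    f.start = start + 1;                  /* XMP/C counts from 0, Fortran from 1 */
    f.end = f.start + (size - 1) * step;  /* index of the last selected node */
    f.step = step;
    return f;
}
```

For instance, c_to_fortran_triplet(0, 5, 2) gives (1:9:2): the XMP/C set p[0], p[2], ..., p[8] is written p(1:9:2) in XMP/Fortran.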

The following program uses the task directive to specify the first two nodes in the original node set.

  • XMP/C program
#include <stdio.h>

int main(){
#pragma xmp nodes p[4]

#pragma xmp task on p[0:2]
  {
    printf("Hello\n");
  }
  return 0;
}
  • XMP/Fortran program
program main
!$xmp nodes p(4)

!$xmp task on p(1:2)
  write(*,*) "Hello"
!$xmp end task
end program main

2.4. Inter-node communication

XMP provides some directives specifying typical inter-node communication patterns.

  • XMP/C program
#include <stdio.h>

int main(){
#pragma xmp nodes p[2]
#pragma xmp template t[10]
#pragma xmp distribute t[block] onto p
  int a[10], b[10];
#pragma xmp align a[i] with t[i]

#pragma xmp loop on t[i]
  for(int i=0;i<10;i++){
    a[i] = i+1;
  }

#pragma xmp gmove
  b[:] = a[:];

#pragma xmp task on p[0]
  {
    for(int i=0;i<10;i++)
      printf("%d\n", b[i]);
  }

  return 0;
}
  • XMP/Fortran program
program main
!$xmp nodes p(2)
!$xmp template t(10)
!$xmp distribute t(block) onto p
  integer a(10), b(10)
!$xmp align a(i) with t(i)

!$xmp loop on t(i)
  do i=1, 10
    a(i) = i
  enddo

!$xmp gmove
  b(:) = a(:)

!$xmp task on p(1)
  do i=1, 10
    write(*,*) b(i)
  enddo
!$xmp end task
end program main

Array b is a replicated array with the same shape as the distributed array a. The program uses the gmove directive to collect all elements of array a into the local array b. The task directive is then used so that a single node prints the elements of array b.

The gmove directive specifies collective communication between distributed and replicated arrays. The compiler generates the collective communication required by the assignment statement that follows the directive. Triplet notation can be used in the assignment statement to specify multiple elements.

In the program, all distributed data elements are collected from the owner nodes to the local array. The following figure illustrates the inter-node communication generated by the compiler.

[Figure: inter-node communication generated for the gmove directive]

If a source element is stored on the local node, it is copied within local memory; if it is stored on a remote node, inter-node communication is required.
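
The local-versus-remote distinction can be made concrete by counting, for one node, how many elements of "b[:] = a[:]" it already holds and how many it must receive. This plain-C sketch models only the data movement pattern of the example, assuming the ceil(N/P) block rule. It is not how the XMP runtime implements gmove, and simulate_gmove and a_all are hypothetical names.

```c
#define N 10 /* size of arrays a and b in the example */

/* Simulate "#pragma xmp gmove" followed by "b[:] = a[:];" on node `me`
   of `np`. a_all holds the values of distributed array a across all
   nodes (a simulation shortcut: a real node sees only its own block).
   Returns how many elements node `me` must receive from other nodes;
   the remaining elements are copied within its own memory. */
int simulate_gmove(int me, int np, const int a_all[N], int b[N])
{
    int chunk = (N + np - 1) / np;   /* assumed block size ceil(N/P) */
    int remote = 0;

    for (int i = 0; i < N; i++) {
        int owner = i / chunk;       /* node that holds a[i] */
        if (owner != me)
            remote++;                /* inter-node communication needed */
        b[i] = a_all[i];             /* every node ends up with all of a */
    }
    return remote;
}
```

In the two-node example, each node already holds 5 elements of a and receives the other 5 from its peer.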

The following output shows the result of the program. p[0] in XMP/C or p(1) in XMP/Fortran prints the result. The output order is always the same because only a single node prints.

1
2
3
4
5
6
7
8
9
10

Note

All communication in XMP must be specified explicitly, because the language does not perform automatic inter-node communication (a major difference from its ancestor, HPF). This design choice makes the performance model clear to the user and makes performance easier to optimize.