A Domain-Specific Language (DSL) for specifying parallel computations, called High-Level Parallelization Language (Hi-PaL), has been developed in this research. Because the specifications for parallel computations vary from application to application, different application domains (e.g., image processing and stencil-based computations) were evaluated to derive the key abstractions that form the DSL.

General Structure

The general structure of Hi-PaL code is shown below:

The keywords are highlighted in boldface and are constant across all Hi-PaL programs. Hi-PaL code will not compile if any of these mandatory keywords is missing, and appropriate error messages are generated. The syntax of Hi-PaL resembles that of aspect languages: the end-user specifies hooks in the sequential application where the parallel operation needs to take effect. A complete hook definition includes the hook type along with a search pattern (a statement in the sequential application).

There are three types of hooks (before, after, and around), and every syntactically correct statement in a sequential application can qualify as a search pattern in Hi-PaL. In contrast, various language extensions of AOP (e.g., AspectC++ and AspectC) only allow function calls, function executions, object construction, and object destruction to be specified for search purposes. The program statement specified as a hook serves as an anchor before or after which the code for parallelization is woven. The around hook type gives the end-user the flexibility to delete or modify a particular statement in the sequential application. A set of Hi-PaL APIs has been developed to obtain concise descriptions of the parallel tasks or operations from the end-users.
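The before/after/around hook mechanism parallels advice weaving in aspect-oriented programming. The following Python sketch is illustrative only (the names `weave`, `advice`, and `matched` are hypothetical, and Hi-PaL itself generates MPI code rather than Python): it shows how the three hook types position injected code relative to a matched statement.

```python
# Illustrative sketch of before/after/around weaving; these names are
# hypothetical and do not come from the Hi-PaL implementation.

def weave(hook_type, advice):
    """Return a decorator that weaves `advice` around the matched function."""
    def decorator(matched):
        def woven(*args, **kwargs):
            if hook_type == "before":
                advice()                       # injected code runs first
                return matched(*args, **kwargs)
            if hook_type == "after":
                result = matched(*args, **kwargs)
                advice()                       # injected code runs second
                return result
            if hook_type == "around":
                # `around` advice may modify or replace the matched statement.
                return advice(matched, *args, **kwargs)
        return woven
    return decorator

trace = []

@weave("before", lambda: trace.append("init parallel environment"))
def compute(x):
    trace.append("sequential statement")
    return x * 2

compute(21)  # the advice runs before the original statement
```

The `around` branch is what gives a tool the power to delete a statement outright: the advice can simply decline to call `matched`.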
The broad categories of the parallel tasks available through Hi-PaL are as follows:

  • MPI Environment and Communicator Management
  • Data Distribution – Scatter/Scatterv
  • Data Collection – Reduce, Allreduce, Gather/Gatherv
  • Send and Receive Messages or Data – Broadcast, Exchange
  • For-loop parallelization – loop indices can be constants or variables (especially useful for irregular loops)
  • Parallel Read
  • Parallel Write
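For-loop parallelization of the kind listed above typically block-distributes the iteration space across processes. The sketch below is a generic illustration of that decomposition, not Hi-PaL's actual generated code: it computes the sub-range of loop indices each rank would own, including the case where the iteration count does not divide evenly.

```python
def local_range(start, end, rank, nprocs):
    """Split the half-open iteration range [start, end) into nprocs
    contiguous blocks; lower-numbered ranks absorb any remainder."""
    total = end - start
    base, rem = divmod(total, nprocs)
    lo = start + rank * base + min(rank, rem)
    hi = lo + base + (1 if rank < rem else 0)
    return lo, hi

# 10 iterations over 4 ranks -> blocks of sizes 3, 3, 2, 2
print([local_range(0, 10, r, 4) for r in range(4)])
```

Because `start` and `end` are ordinary arguments, the same computation works when the loop bounds are variables rather than constants, which is what makes irregular loops tractable.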

The list of MPI functions that are used for implementing the aforementioned parallel tasks is as follows:

  • Environment Management Routines: e.g., MPI_Init, MPI_Finalize, MPI_Wtime
  • Point-to-Point Communication Routines: e.g., MPI_Irecv, MPI_Isend, MPI_Wait, MPI_Waitall
  • Collective Communication Routines: e.g., MPI_Allreduce, MPI_Reduce, MPI_Scatter, MPI_Scatterv, MPI_Gather, MPI_Gatherv, MPI_Bcast, MPI_Barrier
  • Communicators Routines: e.g., MPI_Comm_rank, MPI_Comm_size, MPI_Comm_split
  • Derived Types Routines: e.g., MPI_Type_commit, MPI_Type_vector, MPI_Type_extent, MPI_Type_struct
  • Virtual Topology Routines: e.g., MPI_Dims_create, MPI_Cart_create, MPI_Cart_sub, MPI_Cart_coords, MPI_Cart_shift
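Of these routines, MPI_Scatterv and MPI_Gatherv are the ones that need per-rank bookkeeping: they take an array of send counts and an array of displacements when the data does not divide evenly. The following sketch (plain Python standing in for the argument-construction code a generator would emit; the function name is hypothetical) builds those two arrays for a block distribution.

```python
def scatterv_args(n, nprocs):
    """Compute the (counts, displs) arrays that MPI_Scatterv expects
    when block-distributing n elements over nprocs ranks."""
    base, rem = divmod(n, nprocs)
    # lower-numbered ranks receive one extra element each until the
    # remainder is used up
    counts = [base + (1 if r < rem else 0) for r in range(nprocs)]
    # each rank's block starts where the previous ranks' blocks end
    displs = [sum(counts[:r]) for r in range(nprocs)]
    return counts, displs

print(scatterv_args(10, 4))  # counts [3, 3, 2, 2], displs [0, 3, 6, 8]
```

The same `counts`/`displs` pair is reused verbatim for the matching MPI_Gatherv call that collects the results.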

The key set of Hi-PaL APIs is presented below. The names of the APIs are descriptive enough to explain their purpose. For example, ReduceMaxValInt() means that the variable specified as its argument is of type integer and needs to be reduced on one node (by default, the node with rank zero) such that, while reducing, the maximum value of the variable computed by the individual processes is selected (the MPI_MAX operation). More details, including the complete set of APIs, will be provided in an upcoming publication:

  • Parallelize_For_Loop where (<for_init_stmt>;;)
  • ReduceSumInt()
  • ReduceProductInt()
  • ReduceMaxValInt()
  • ReduceMinValInt()
  • AllReduceSumInt()
  • AllReduceProductInt()
  • AllReduceMaxValInt()
  • AllReduceMinValInt()
  • ParBroadCast1DArrayInt(, )
  • ParBroadCast2DArrayInt(, , )
  • ParExchange1DArrayInt(, )
  • ParExchange2DArrayInt(, , )
  • ParDistribute1DArrayInt(, )
  • ParDistribute2DArrayInt(, , )
  • ParGather1DArrayInt(, )
  • ParGather2DArrayInt(, , )
  • WriteIntVar()
  • WriteIntArray1D(, )
  • WriteIntArray2D(, , )
  • ReadIntVar()
  • ReadIntArray1D(, )
  • ReadIntArray2D(, , )
  • ReduceSumDouble()
  • ReduceProductDouble()
  • ReduceMaxValDouble()
  • ReduceMinValDouble()
  • AllReduceSumDouble()
  • AllReduceProductDouble()
  • AllReduceMaxValDouble()
  • AllReduceMinValDouble()
  • ParBroadCast1DArrayDouble(, )
  • ParBroadCast2DArrayDouble(, , )
  • ParExchange1DArrayDouble(, )
  • ParExchange2DArrayDouble(, , )
  • DistributeVectorDouble(, )
  • ParDistribute1DArrayDouble(, )
  • ParDistribute2DArrayDouble(, , )
  • ParGather1DArrayDouble(, )
  • ParGather2DArrayDouble(, , )
  • WriteDoubleVar()
  • WriteDoubleArray1D(, )
  • WriteDoubleArray2D(, , )
  • ReadDoubleVar()
  • ReadDoubleArray1D(, )
  • ReadDoubleArray2D(, , )
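The Reduce* APIs above follow MPI_Reduce semantics. As a plain illustration of the ReduceMaxValInt() behavior described earlier (pure Python simulating the per-rank values rather than calling MPI; `reduce_max` is a hypothetical stand-in, not a Hi-PaL function), reducing with MPI_MAX selects the largest value across all processes and deposits it only on the root rank.

```python
def reduce_max(per_rank_values, root=0):
    """Simulate MPI_Reduce with op=MPI_MAX: every rank contributes its
    local value; only the root rank receives the global maximum."""
    global_max = max(per_rank_values)
    # result[r] is what rank r holds after the reduction completes
    return [global_max if r == root else None
            for r in range(len(per_rank_values))]

# Four simulated ranks, each with a locally computed integer
print(reduce_max([7, 42, 19, 3]))  # only rank 0 receives the maximum, 42
```

The AllReduce* variants differ only in that every rank, not just the root, ends up holding the result.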

Sample Hi-PaL code

A sample of Hi-PaL code showing the broadcast operation specification is shown below:
Parallel section begins after


Team

  • Ritu Arora
  • Puri Bangalore
  • Marjan Mernik


Publications

  • Ritu Arora, Purushotham Bangalore, and Marjan Mernik. Raising the Level of Abstraction for Developing Message Passing Applications. Accepted in The Journal of Supercomputing.
  • Ritu Arora (Advisor: Purushotham Bangalore). A Framework for Raising the Level of Abstraction of Explicit Parallelization. ICSE 2009.
  • Ritu Arora and Purushotham Bangalore. FraSPA: A Framework for Synthesizing Parallel Applications. GHC 2008.