Parallel Processing using PVM
PVM is free software which provides the capability for using a number of networked (TCP/IP) machines as a parallel virtual machine to perform tasks for which parallelism is advantageous. We use PVM in one of our computer laboratories at Eastern Washington University (EWU), mixing Linux and Sun machines together, to comprise various virtual machines. PVM comes with a rich selection of examples and tutorial material allowing a user to reach a reasonable level of proficiency in a relatively short time. There is no new programming language to learn, presuming one already knows C, C++ or FORTRAN. With two or more Linux boxes networked one can run PVM and investigate parallel programming. With only one Linux box, one can still run applications and emulate the parallelism.
PVM was developed by Oak Ridge National Laboratory in conjunction with several universities, principal among them being the University of Tennessee at Knoxville and Emory University. The original intent was to facilitate high performance scientific computing by exploiting parallelism whenever possible. By utilizing existing heterogeneous networks (Unix at first) and existing software languages (FORTRAN, C and C++), there was no cost for new hardware and the costs for design and implementation were minimized.
A typical PVM consists of a (possibly heterogeneous) mix of machines on the network, one being the “master” host and the rest being “worker” or “slave” hosts. These various hosts communicate by message passing. The PVM is started at the command line of the master which in turn can spawn workers to achieve the desired configuration of hosts for the PVM. This configuration can be established initially via a configuration file. Alternatively, the virtual machine can be configured from the PVM command line (master's console) or during run time from within the application program.
A solution to a large task, suitable for parallelization, is divided into modules to be spawned by the master and distributed as appropriate among the workers. PVM consists of two software components, a resident daemon (pvmd) and the PVM library (libpvm). These must be available on each machine that is a part of the virtual machine. The first component, pvmd, is the message-passing interface between the application program on each local machine and the network connecting it to the rest of the PVM. The second component, libpvm, provides the local application program with the necessary message-passing functionality, so that it can communicate with the other hosts. These library calls trigger corresponding activity by the local pvmd which deals with the details of transmitting the message. The message is intercepted by the local pvmd of the target node and made available to that machine's application module via the related library call from within that program.
The PVM home page is at http://www.epm.ornl.gov/pvm/pvm_home.html. From there one can download the PVM software, obtain the tutorial, get current PVM news, etc. The software, tarred and zipped, only comprises about 600KB. The README file found therein is sufficient to get up and running, but the value of the tutorial must be stressed. The tutorial can be downloaded from the PVM home page and is also available in book form.
With the tutorial as guide, one can work through the selection of examples packaged with the software. The example source files are well documented internally so that one can become comfortable with the usual PVM library calls. The tutorial has a very nice chapter explaining each of the library calls clearly and in detail. This process was essentially how we started using PVM at EWU.
The network we used at EWU has a variety of machines, architectures and Unix flavors. The first challenge was learning how to configure such a mix of machines to form a PVM. It becomes less straightforward as the heterogeneity increases with the complicating factor being the diverse Unix flavors. One of the methods for configuring a PVM uses the Unix rsh (remote shell) command. This relies on the existence of a .rhosts file on the target machine. From the master one starts the PVM daemon (pvmd) on a slave in order to incorporate the slave into the virtual machine—the master uses rsh to do this. However, the target (slave) machine only allows the master access if the master is listed in the target's .rhosts file. The various flavors of Unix do not agree in the finer details for the syntax of the .rhosts file, nor could we always find pertinent documentation. There are several other methods for configuring PVM and, in most cases, we found it a black art. Nevertheless, we persevered and documented our experience on an embryonic web site for a PVM WebCourse (http://knuth.sirti.org/cscdx/) for those who discover similar problems.
It is instructive to look at an example (see Listings 1 and 2) to get the flavor of the programming. The executable for the source in Listing 1 is started at the command line of the master host which, in turn, spawns the second executable on a slave. For simplicity, no error checking is included. However, PVM provides easy ways to check for failure on such library calls as pvm_spawn. The example, adapted from one in the tutorial, merely measures the time it takes for a simple interchange of passed messages, master to slave and slave to master. In particular, the master sends an array to the slave, the slave doubles each of the elements and sends it back. The master prints out the initial array, the final array and the array's round trip time.
PVM has found a regular place in our curriculum. It provides a way to investigate parallel programming in an inexpensive yet realistic way. Thinking through an involved parallel-programming exercise is a new and sometimes difficult evolution for those accustomed only to sequential programming. Once the students have worked through the tutorial material, they move on to problems in which they take a design role and, finally, to substantive projects.
Although PVM rather quickly became a de facto standard, critics pointed out that in actuality there was no formal, enforced standard. It was also clear that the message passing was slower than optimized protocols native to an architecture. Very soon after the spread of PVM, a standard was put forward called MPI, the Message-Passing Interface. MPI has advantages (e.g., faster message passing, a standard) and disadvantages (e.g., not interoperable among heterogeneous architectures, not dynamically reconfigurable) with respect to PVM. PVMPI is a newer project which will combine the virtues of both approaches. The success of this project together with any standardization efforts of PVMPI may determine the future viability of PVM.
Richard Sevenich is a Professor of Computer Science at Eastern Washington University with special enthusiasm for Debian GNU/Linux. His research interests include Application Specific Languages, State Languages for Industrial Control, Fuzzy Logic and Parallel Distributed Processing. He can be reached by e-mail at rsevenich@ewu.edu.