VPN Implementation in Cluster Computing
Special considerations are involved when completing the implementation of a cluster. Even with the queue system and parallel environment in place, extra services are required for a cluster to function as a multi-user computational platform. These services include the well-known network services NFS, NIS and rsh. NFS allows cluster nodes to share user home directories as well as installation files for the queue system and parallel environment. NIS provides consistent file and process ownership across all the cluster nodes from a single source on the master machine. Although these services are significant components of a cluster, they also create numerous vulnerabilities, so it would be insecure to have cluster nodes operate on an open network. For these reasons, computational cluster nodes usually reside on private networks, often accessible to users only through a firewall gateway. In most cases, the firewall is configured on the master node using ipchains or iptables.
Having all cluster machines on the same private network requires them to be connected to the same switch (or linked switches) and, therefore, to be located in close physical proximity. This situation severely limits cluster scalability. It is impossible to combine private network machines in different geographic locations into one joint cluster, because private network addresses are not routable across the public Internet.
Combining cluster resources in different locations, so that users from various departments can take advantage of available computational nodes, is nevertheless possible in principle. Merging clusters is not only desirable but also advantageous, in the sense that the clusters need not be located in one place yet can be used as a single, centralized resource. This setup provides higher availability and efficiency, and such a proposition is highly attractive. But in order to merge clusters, all the machines would have to be on a public network instead of a private one, because every single node on every cluster needs to be directly accessible from the others. Doing so, however, might create insurmountable problems because of the potential, indeed inevitable, security breaches. We can see then that to serve scalability, we severely compromise security, but where we satisfy security concerns, scalability becomes significantly limited. Faced with such a problem, how can we make clusters scalable and, at the same time, establish rock-solid security on the cluster networks? Enter the Virtual Private Network (VPN).
VPNs often are heralded as one of the most cutting-edge, cost-saving solutions to various applications, and they are widely deployed in the areas of security, infrastructure expansion and inter-networking. A VPN adds more dimension to networking and infrastructure because it enables private networks to be connected in secure and robust ways. Private networks generally are not accessible from the Internet and are networked only within confined locations.
The technology behind VPNs, however, changes what we have previously known about private networks. Through effective use of a VPN, we are able to connect previously unrelated private networks or individual hosts both securely and transparently. Being able to connect private networks opens a whole slew of new possibilities. With a VPN, we are not limited to resources in only one location (a single private network). We can finally take advantage of resources and information from all other private networks connected via VPN gateways, without having to largely change what we already have in our networks. In many cases, a VPN is an invaluable solution to integrate and better utilize fragmented resources.
In our environment, the VPN plays a significant role in combining high performance Linux computational clusters located on separate private networks into one large cluster. The VPN, with its power to transparently combine two private networks through an existing open network, enabled us to connect seamlessly two unrelated clusters in different physical locations. The VPN connection creates a tunnel between gateways that allows hosts on two different subnets (e.g., 192.168.1.0/24 and 192.168.5.0/24) to see each other as if they are on the same network. Thus, we were able to operate critical network services such as NFS, NIS, rsh and the queue system over two different private networks, without compromising security over the open network. Furthermore, the VPN encrypts all the data being passed through the established tunnel and makes the network more secure and less prone to malicious exploits.
The VPN not only solved the previously discussed security problems but also opened a new door for scalability. Since all the cluster nodes can reside in private networks and operate through the VPN, the entire infrastructure can be better organized and the IP addresses can be managed efficiently, resulting in a more scalable and much cleaner network. Before VPNs, assigning a public IP address to every single node on the cluster was an unresolved problem, which limited the maximum number of nodes that could be added to the cluster. Now, with a VPN, our cluster can grow much larger and scale in an organized manner. We have thus integrated VPN technology into our networks and addressed important issues of scalability, accessibility and security in cluster computing.
In order to implement a VPN, we used free Linux VPN software called FreeS/WAN. FreeS/WAN is readily downloadable from www.freeswan.org, where comprehensive guidelines on installation and configuration are documented on-line. Other useful references for information about VPNs can be found in recent publications (see Resources).
The basic procedure for setting up a VPN gateway is as follows:
Obtain FreeS/WAN source code.
Install FreeS/WAN by compiling it into the kernel.
Configure the gateways with almost identical copies of two system files: ipsec.conf and ipsec.secrets.
Below is the sample ipsec.conf to be used as a reference when setting up two VPN gateways. The details of each field on this configuration file can be found in the FreeS/WAN on-line documentation or in the manpages for ipsec.
Sample ipsec.conf
# basic configuration
config setup
        interfaces="ipsec0=eth0 ipsec1=eth1"
        klipsdebug=none
        plutodebug=none
        plutoload=%search
        plutostart=%search
        uniqueids=yes

# defaults for subsequent connection descriptions
conn %default
        keyingtries=0
        authby=rsasig

# VPN connection from Lab to the machine room
conn Net-Net
        leftid=@cluster1-vpn.domainname.com
        left=128.9.232.78
        leftsubnet=192.168.5.0/24
        leftnexthop=128.9.232.1
        leftrsasigkey=0sAQ...
        rightid=@cluster2-vpn.domainname.com
        right=123.9.234.21
        rightsubnet=192.168.1.0/24
        rightnexthop=123.9.234.1
        rightrsasigkey=0sAQ...
        auto=add
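With auto=add, as in the sample configuration above, the connection description is loaded when IPsec starts, but the tunnel itself is not initiated automatically. The following is a minimal sketch of bringing the tunnel up by hand, assuming the FreeS/WAN ipsec command is installed on the gateway and the connection is named Net-Net as in the sample. The run helper and the DRY_RUN variable are illustrative only and are not part of FreeS/WAN; by default the script just prints the commands it would execute.

```shell
#!/bin/sh
# Sketch only: DRY_RUN and run() are illustrative helpers, not FreeS/WAN features.
# With DRY_RUN=1 (the default here) each command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Initiate the tunnel described by the named "conn" section
# (to be run as root on one of the two VPN gateways).
bring_up_tunnel() {
    run ipsec auto --up "$1"
}

# Display the current IPsec state so the new tunnel can be inspected.
show_tunnel_state() {
    run ipsec look
}

bring_up_tunnel Net-Net
show_tunnel_state
```

Setting DRY_RUN=0 makes the script execute the commands instead of printing them. If the tunnel should come up automatically at boot, changing auto=add to auto=start in the conn section is the usual alternative.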
After setting up VPN gateways for the two private networks that contain the two clusters, we had to take the necessary steps to make sure they worked properly. These steps required modifying the routing tables and the ipchains/iptables rules on the firewall gateways for both private networks. It is necessary to configure these firewall gateways, since they also act as default routers running Network Address Translation (NAT) for their respective subnets. In Figure 1, we can see that all the machines in private networks pass through the firewall gateway machine in order to reach the Internet.
When these privately networked nodes attempt to send packets to the outside, the packets first go to the default gateway to take on the source address of this gateway (masquerading) and are then sent to the Internet. However, sending packets from one private network to another private network is impossible with the default gateway alone. Therefore, we need to use the VPN gateway to communicate directly between the private networks. On each of the default gateways, we set up the routing table as follows:
default-gateway-1# /sbin/route add -net 192.168.1.0/24 gw 192.168.5.250
default-gateway-2# /sbin/route add -net 192.168.5.0/24 gw 192.168.1.250
where 192.168.5.250 and 192.168.1.250 are the private-subnet IP addresses of the two VPN gateways, respectively. The routing table modification above instructs the default gateways to redirect packets to the VPN gateways when hosts on the two private networks try to communicate with each other. Figure 2 shows how packets traverse the internal network to use the VPN gateways before going out to the Internet. Notice that with these routing table entries, we can refer directly to remote machines located on the other private network by their private IP addresses.
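On systems that provide the ip command from iproute2 instead of the net-tools route command, the same static routes can be expressed in ip route syntax. The sketch below only prints the command for each default gateway; route_cmd is a hypothetical helper name, and the subnets and gateway addresses are the ones used above.

```shell
#!/bin/sh
# Sketch: print the iproute2 form of the static routes shown above.
# route_cmd is an illustrative helper; it prints a command and changes nothing.
route_cmd() {
    remote_subnet="$1"   # private subnet behind the other VPN gateway
    vpn_gateway="$2"     # private address of the local VPN gateway
    echo "ip route add $remote_subnet via $vpn_gateway"
}

# On default-gateway-1 (the 192.168.5.0/24 side):
route_cmd 192.168.1.0/24 192.168.5.250

# On default-gateway-2 (the 192.168.1.0/24 side):
route_cmd 192.168.5.0/24 192.168.1.250
```

The printed commands need to be run as root on the corresponding default gateway. Afterward, ip route get followed by an address on the remote subnet shows which next hop the kernel would choose for it.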
However, we need to make sure that packets routed through the default gateway to the other private subnet are not masqueraded. In the ipchains rules, we specify the NAT rule with an option that excludes the other private network from masquerading. Since both private networks share the same first two bytes (16 bits) of their IP addresses, we use CIDR (Classless Inter-Domain Routing) notation to restrict masquerading to destination addresses that do not begin with 192.168.
/sbin/ipchains -A forward -j MASQ -s 192.168.0.0/16 -d ! 192.168.0.0/16
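To make the effect of the negated destination concrete, the following sketch reproduces the rule's matching logic in plain shell arithmetic: a packet is masqueraded only when its source is inside 192.168.0.0/16 and its destination is not. The helper names ip_to_int, in_cidr and should_masquerade are illustrative, and 192.168.5.10 and 192.168.1.20 are hypothetical node addresses chosen for the example.

```shell
#!/bin/sh
# Sketch of the matching logic behind the MASQ rule above.
# ip_to_int, in_cidr and should_masquerade are illustrative helper names.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_cidr ADDRESS NETWORK/PREFIX: succeeds when ADDRESS lies inside the network.
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
}

# should_masquerade SRC DST: mirrors "-s 192.168.0.0/16 -d ! 192.168.0.0/16".
should_masquerade() {
    in_cidr "$1" 192.168.0.0/16 && ! in_cidr "$2" 192.168.0.0/16
}

# One destination on the other private subnet, one on the public Internet.
for dst in 192.168.1.20 128.9.232.1; do
    if should_masquerade 192.168.5.10 "$dst"; then
        echo "$dst: masqueraded"
    else
        echo "$dst: not masqueraded"
    fi
done
```

On a gateway that uses iptables rather than ipchains, the corresponding rule is normally placed in the POSTROUTING chain of the nat table, for example: iptables -t nat -A POSTROUTING -s 192.168.0.0/16 ! -d 192.168.0.0/16 -j MASQUERADE. Older iptables releases expect the negation after the option (-d ! 192.168.0.0/16), as ipchains does.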
Alternatively, we can set up routing on each node individually but it requires extra management. Therefore, it is recommended that necessary routing be set up on only the default gateway, not on all the nodes in the private network. After making these changes to our network, we were able to establish connections between two private networks using rsh, NFS and NIS. We then mounted an installation directory of the queue system remotely across the VPN tunnel, and we successfully added machines to the queue system over the VPN connection. Figure 3 shows the topology of two separate, privately networked clusters after implementation of the VPN.
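The connectivity checks described above can be gathered into a short script. This is only a sketch: the address 192.168.1.10 stands for a hypothetical node on the remote private subnet, the run helper and the DRY_RUN and REMOTE variables are illustrative, and the showmount check is meaningful only if that node exports file systems over NFS. By default the script prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: list the checks for one node on the other private subnet.
# REMOTE, DRY_RUN and run() are illustrative; 192.168.1.10 is a hypothetical host.
DRY_RUN="${DRY_RUN:-1}"
REMOTE="${REMOTE:-192.168.1.10}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

check_remote() {
    run ping -c 1 "$1"        # basic reachability through the tunnel
    run rsh "$1" hostname     # remote shell across the VPN
    run showmount -e "$1"     # NFS exports offered by the remote host
}

check_remote "$REMOTE"
```

Running the script with DRY_RUN=0 from a node on the 192.168.5.0/24 side executes the checks against the remote private address directly, which exercises both the static route on the default gateway and the tunnel between the VPN gateways.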
After we connected the two clusters through the VPN, users were able to log in to the master machine on the first cluster and submit jobs on both of the clusters through the queue system. However, for high performance parallel computations, we suggest running parallel jobs within either one of the clusters, because the VPN and the Internet link between the clusters might degrade the performance of message-passing communications. All in all, VPN technology helped us solve the scalability issues of clustering without compromising network security.
Resources
Building Linux Virtual Private Networks (VPNs), Oleg Kolesnikov and Brian Hatch, New Riders, February 2002.
FreeS/WAN On-line Documentation
"Setting up a VPN Gateway", Duncan Napier, Linux Journal, January 2002, Issue 93.
"Administering Linux IPSec Virtual Private Networks", Duncan Napier, Sys Admin, March 2002, Volume 11, Number 3.
"IP VPN Services", Doug Allen, Network Magazine, April 2002, Volume 17, Number 4.
J.B. Kim, A.D. Kotelnikov and D.D. Knight are part of the School of Engineering at Rutgers University.