High Availability Linux Web Servers

by Aaron Gowatch

Imagine yourself as the System Administrator for a fairly large web site. It's 5:00 AM on a Monday morning. You're awakened by a page from Big Brother. One of three web servers has just dropped off the network. Suddenly, a third of your traffic is going unanswered. What can you do? The commute to the office isn't a short one, and by the time you get there, you'll already have dropped thousands of hits, which could mean lost revenue, decreased productivity or a missed deadline. Whatever the case may be, someone is going to be affected. As you begin your journey into work, you wonder how this problem could have been prevented.

In fact, a number of solutions are available, many of which require expensive hardware or software. This article outlines a simple and effective method of achieving the same functionality in a cost-effective manner. This method uses a router and the loopback interfaces of your Linux web servers. We achieve high availability by configuring multiple hosts to be capable of serving traffic for the same IP addresses at any given time. Conventionally, we think of virtual IP addresses as being assigned to Ethernet interfaces. However, no two Ethernet interfaces can share the same IP address. We're able to assign the same IP addresses to multiple hosts by binding them to loopback interfaces instead. For instance, a SYN packet, destined for one of these loopback interfaces, travels across the wire to a router that decides the next packet hop based on its routing table. The packet is then forwarded to the next hop—the Ethernet interface on one of many redundant web servers. Then, the packet is forwarded from the Ethernet interface to one of the configured loopbacks on the system. An ACK (acknowledgement) will travel along the same path in reverse. The packet originates on the loopback interface, is forwarded to the Ethernet interface, then back to the router to be sent on its journey back to the original host that sent the SYN packet. Again, the beauty of this scheme is the ability to configure multiple hosts with the same IP address bound to loopback interfaces. By doing so, we've enabled ourselves to redirect traffic for a particular IP address or even an entire subnet by simply changing a route in that last hop router. This saves time and minimizes traffic loss. The process can even be automated using simple shell scripts.

Configuring the Linux Kernel

The kernel must be configured to support IP aliasing. IP aliasing is the process of binding multiple IP addresses to a given network interface, thus creating “virtual” interfaces. Under Linux, interface names are assigned linearly. For example, the first loopback interface is called lo, the second lo:1, the third lo:2 and so on. You can see which interfaces are configured on your system by typing:

/sbin/ifconfig

Configure the kernel with support for TCP/IP, network aliasing and IP aliasing. Under Linux 2.0.x, this is accomplished by answering “yes” to the following kernel configuration options:

Network aliasing (CONFIG_NET_ALIAS) [Y/n/?] y
TCP/IP networking (CONFIG_INET) [Y/n/?] y
IP: aliasing support (CONFIG_IP_ALIAS) [Y/m/n/?] y
Our Network

Our fictitious network will consist of four machines, although you could support the same functionality with as few as two boxes or as many as you anticipate needing. Four boxes will allow us to serve a hefty amount of traffic and still allow plenty of room for growth. Having all four machines handling traffic for a single web site will provide some load balancing as well, using “round robin” DNS. If you ever exceed the capacity of your web servers, adding additional machines is a simple task.

We'll take the class C address 192.168.1.0 and apply a 27-bit subnet mask which will yield 8 subnets and 240 usable hosts.

Note that according to the RFC, the upper and lower subnets will not be usable. Some operating systems will not allow you to configure an interface using an address that falls into one of these subnets. Some routers require you to enable this feature implicitly. For example, Cisco requires that the router be configured with the command ip subnet-zero. This is implementation-dependent, although I have yet to see a UNIX or Microsoft-based host that had a problem utilizing all subnets. If you are unable to use all eight subnets or you are an RFC compliancy fanatic, this configuration will yield 6 subnets and 180 unique hosts.

Traffic can be spread across our 4 hosts for up to 30 different web servers quite easily. It also leaves us with four free subnets for future expansion. Using subnets allows traffic to be redirected from one machine to another with a few simple commands. However, your requirements may not call for an implementation as large as the one in our example. The same functionality can be achieved using host routes, so instead of the routing table having an entry for an entire subnet, the entry is for a single IP address using a 32-bit subnet mask. I'll try to explain the differences where applicable.

While here, we can use a subnet for the Ethernet interfaces of our web servers from our class C; namely, 192.168.1.1 for our router and 192.168.1.2, 192.168.1.3 and 192.168.1.4 for our web servers. Under Red Hat, this is done by editing the /etc/sysconfig/network-scripts/eth0 file to look something like this:

DEVICE=eth0
IPADDR=192.168.1.2
NETMASK=255.255.255.224
NETWORK=192.168.1.0
BROADCAST=192.168.1.31
ONBOOT=yes

You'll also want to edit the /etc/sysconfig/network file to configure the appropriate default route. Mine looks like this:

NETWORKING=yes
HOSTNAME=foohost.foo.com
DOMAINNAME=foo.com
GATEWAY=192.168.1.1
GATEWAYDEV=eth0
Interface configuration varies from distribution to distribution, so your mileage may vary.
Setting Up Our Router

The router should be set up so that it has an interface on the same subnet as the web servers. In our example, we'll assign one interface on the route to IP address 192.168.1.1. This will be the default route for our web servers.

Our First Web Site

Now, suppose you've secured your first contract with Widgetco, Inc. and they'd like you to set up their web site at http://www.widgetco.com/. Registration for this domain, which is outside the scope of this article, should already be completed. The first thing to do is configure the addresses on the loopbacks of the web servers. On our Red Hat machines, we configure them using /etc/sysconfig/network-scripts/ifcfg-lo:[1-655536]. We want each of our hosts to be capable of serving traffic for any of the other hosts at any given time, so each web server will have the other web servers' loopback IP addresses bound to its loopback. Remember that we used our first subnet for the Ethernet interfaces of our web servers, so starting with the second subnet, we'll pick one address out of each of four subnets. We'll take 192.168.1.33, 192.168.1.65, 192.168.1.97, and 192.168.1.129 and bind them to loopbacks on all of the web servers. This is where redundancy comes in. As an example, /etc/sysconfig/network-scripts/ifcfg-lo:1 should look something like this:

DEVICE=lo:1
IPADDR=192.168.1.33
NETMASK=255.255.255.224
NETWORK=192.168.1.32
BROADCAST=192.168.1.63
ONBOOT=yes

Our /etc/sysconfig/network-scripts/ifcfg-lo:2 should look like this:

DEVICE=lo:2
IPADDR=192.168.1.65
NETMASK=255.255.255.224
NETWORK=192.168.1.64
BROADCAST=192.168.1.95
ONBOOT=yes
and so on for all four loopbacks. Again, do this on each web server using the same configuration files. Having multiple hosts which use the same IP addresses won't be an issue, since they are on loopbacks.

You should be able to run the following command to bring up your newly created interfaces on each host:

/etc/rc.d/init.d/network start

To make sure your loopback interfaces have been configured, run:

/sbin/ifconfig
If everything has been done correctly, your output will look something like Listing 1.

At this point, you should be able to ping the four addresses bound to the loopbacks from the host they were configured on. The next step is to set up the routing table on the router so that it knows how to get to these loopback interfaces. We'll set up a route for each of the four subnets, pointing to each of the four hosts.

An example for a Cisco router might look like this:

ip route 192.168.1.32 255.255.255.224 192.168.1.2
ip route 192.168.1.64 255.255.255.224 192.168.1.3
ip route 192.168.1.96 255.255.255.224 192.168.1.4
ip route 192.168.1.128 255.255.255.224 192.168.1.5

If you're using Linux as your router, it will look like this:

/sbin/route add -net 192.168.1.32 netmask\
255.255.255.224 192.168.1.2
/sbin/route add -net 192.168.1.64 netmask\
255.255.255.224 192.168.1.3
/sbin/route add -net 192.168.1.96 netmask\
255.255.255.224 192.168.1.4
/sbin/route add -net 192.168.1.128 netmask\
255.255.255.224 192.168.1.5
Basically, this information tells the router that the next hop for a packet bound for any host on the 192.168.1.32 subnet is the Ethernet interface of our first host, 192.168.1.2. The next hop for a packet bound for any host on the 192.168.1.64 subnet is the Ethernet interface of our second host, 192.168.1.3. Routing table entries are also set up for our third and fourth subnets, which point to the third and fourth hosts, respectively. Setting up these entries will differ depending on which hardware you've chosen to act as your router. It's a good idea to become familiar with the process of adding and removing routes on your hardware. At this point, you should be able to ping the loopback interfaces on your web servers from the router. Other machines utilizing this router should be able to access the loopback interfaces as well. Using TELNET to get to 192.168.1.33 should get you a login prompt on the first host, while 192.168.1.65 should get you to the second and so on.

Now, we'll set up DNS so that www.widgetco.com is served by our web server rotation. For Red Hat Linux, we place the following in /var/named/widgetco.com:

@ IN SOA ns1.widgetco.com. hostmaster.widgetco.com.
(
 1998020100 ; serial (yyyymmddnn)
 86400 ; refresh (every day)
 3600 ; retry (every hour)
 1209600 ; expire (2 weeks)
 86400 ) ; minimum TTL (half day)
 IN NS ns.foo.com.
www IN A 192.168.1.33
         IN     A       192.168.1.65
         IN     A       192.168.1.97
         IN     A       192.168.1.129
Configuring Apache

Apache should be configured in /etc/conf/httpd.conf for Red Hat Linux to listen on each of the loopbacks that we've configured. Let's look at a simple example:

<VirtualHost 192.168.1.33 192.168.1.65 192.168.1.97
192.168.1.129>
ServerName www.widgetco.com
DocumentRoot /www/www.widgetco.com
</VirtualHost>

This tells Apache to listen on all four of our loopback interfaces, set the ServerName and set up the DocumentRoot from which Apache will provide content for www.widgetco.com. Though Apache will not see traffic on all four interfaces during normal operation, configuring Apache to listen beforehand will allow us to redirect traffic from one web server to another on the fly.

Redirecting Traffic

Redirecting traffic from one machine to another is fairly simple. It's just a matter of changing what your router thinks is the next hop for a given subnet or host. For example, if we need to reboot our first web server, we could redirect traffic to the second with the following Cisco router commands:

no ip route 192.168.1.32 255.255.255.224
ip route 192.168.1.32 255.255.255.224 192.168.1.3

All traffic that was going to 192.168.1.2 is now rerouted to 192.168.1.3, the second web server, and we've dropped only the packets that were sent between the first and second router configuration commands. If your router is running Linux, you can write a simple shell script that changes these routes for you automatically. A few lines of Expect can change routes in a dedicated hardware router.

Other Applications?

This method of traffic redirection is not limited to web servers. Other applications using IP, and which could benefit from high availability, can utilize methods similar to the ones we've covered. A few examples include DNS, FTP and mail servers.

High Availability Linux Web Servers
Aaron Gowatch is a Senior Systems Engineer living in San Francisco, California. These days he spends almost as much time wrenching on his Vespa and Lambretta motor scooters as he does sitting at the console. He can be reached at aarong@divinia.com.
Load Disqus comments