Automate System Administration Tasks with Puppet
If you have more than one UNIX box in your care, you know how duplication happens. Every machine needs a common set of settings. Package upgrades need to be deployed. Certain packages need to be on every server.
You also want to make sure that any changes to your systems happen in a controlled manner. It's one thing to start off with two servers that are similarly configured; it's another thing to know they're the same a year later, especially if other people are involved.
Puppet is a system for automating system administration tasks (in the author's own words). In the Puppet world, you define a policy (called a manifest) that describes the end state of your systems, and the Puppet software takes care of making sure the system meets that end state. If a file changes, it is replaced with a pristine copy. If a required package is removed, it is re-installed.
It is important to draw a distinction between shell scripts that copy files between systems and a tool like Puppet. The latter abstracts the policy from the steps required to make a system conform. Puppet is smart enough to use apt-get to install a package on a Debian system and yum on a Fedora system. Puppet is smart enough to do nothing if the system already is conformant to the policy.
The Puppet system is split into two parts: a central server and the clients. The server runs a dæmon called puppetmaster. The clients run puppetd, which both connects to, and receives connections from, the puppetmaster. The manifest is written on the puppetmaster. If Puppet is used to manage the central server, it also runs the puppetd client.
The best way to begin with a configuration management system like Puppet is to start with a single client and a simple policy, and then roll it out to more clients and a more complex policy. To that end, start off by installing the Puppet software. Puppet is written in the Ruby scripting language, so you need to install that before you begin (Ruby is available as a package for most distributions).
Packages Are Good
Some people scoff at the idea of using a prebuilt binary package and prefer to build everything from source. That'll work, but it just doesn't scale. When you get further along with Puppet, you'll see how your manifest can manage packages with a single line. It's certainly possible to specify all the files you built, but then you're putting in a lot of needless effort.
You can (and should) build your own packages where needed. Packaging your own applications means you will build the software consistently, version after version, so that files will be in the same place and you won't accidentally drop features. Building your own packages also handles dependencies against other packages and keeps track of software versions.
In all likelihood, you will end up with your own package repository that holds your locally developed packages and any vendor packages that you've modified. You also will use Puppet to ensure that your clients are pointed at your repository.
Installing Puppet from a package also lets you manage the client's Puppet software through Puppet itself. Need to upgrade in order to get more features? Simply update your manifest.
If you choose to install from source, you need the facter and puppet tarballs from the author's site:
http://reductivelabs.com/downloads/facter/facter-latest.tgz
http://reductivelabs.com/downloads/puppet/puppet-latest.tgz
The facter tarball contains the Facter utility, which generates facts about the host system. Facts can be anything from the Linux distribution to whether the host is a virtual machine. The puppet tarball contains both puppetd and puppetmaster.
Untar the files (tar -xzf facter-latest.tgz and tar -xzf puppet-latest.tgz). Change to the newly created facter directory, and run ruby install.rb as root. You will do the same for the puppet directory, which installs both the client and server packages.
Then, run:
puppetmasterd --mkusers; chown puppet /var/puppet
on the puppetmaster to create the puppet user (which also creates the initial directory structure and then fixes a permissions problem). You can skip this step if you are installing from packages.
On the client, run:
puppetd --mkusers; puppetd --server puppet.example.com --test
substituting the name of your puppetmaster for puppet.example.com, which creates the user and directory structure on the client, and then begin the SSL key exchange between the client and the server. You will get an error about certificate validation, because the certificates are not trusted yet.
Back on the puppetmaster, run puppetca --list to show the outstanding certificate requests. You then can use puppetca --sign to accept the certificate, as shown below:
[root@test1 etc]# puppetca --list test2.ertw.com [root@test1 etc]# puppetca --sign test2.ertw.com Signed test2.ertw.com
At this point, the client and server have a mutually trusted connection. The next step is to define the manifest. For this article, I'm using the network time protocol (NTP) dæmon as an example. The goal is to define a manifest that ensures the dæmon is installed, configured and in the boot sequence.
In Puppet terms, a resource is something being managed and the attributes that define it. A resource might be a file that has permission attributes or a package with a name and a version. Puppet comes bundled with many resource types; you also can create your own or download those that others have made.
The central manifest is defined in /etc/puppet/manifests/site.pp. Start with a simple resource defining the NTP package:
package { ntp: ensure => installed }
The above defines a package resource called ntp with one attribute called ensure. The ensure attribute defines the state of the package, with values such as installed, absent, latest or even a version number.
Puppetmaster will notice the change in site.pp and reload the manifest. The client will check in only every half-hour, so you can restart puppetd or send the process the SIGUSR1 signal to force the client to check back with the server immediately. If all goes well, your client will read the manifest and install the ntp package. Try removing the package, and it will be replaced within 30 minutes. If not, check your logs (usually /var/log/messages) for any errors, and make sure your site.pp is correct.
NTP also requires a configuration file called /etc/ntp.conf. Puppet has a resource type called file that handles files. The puppetmaster will hold the master ntp.conf and copy it to the clients should they change their copies.
Create a directory in /var/puppet called files. Then, create /etc/puppet/fileserver.conf as shown below, and restart puppetmasterd:
[files] path /var/puppet/files allow *
fileserver.conf defines file shares for the internal Puppet file server. The above example implements a share called files, which corresponds to a directory on the puppetmaster called /var/puppet/files. Use a URL like puppet://puppet.example.com/files/etc/ntp.conf to access a file located at /var/puppet/files/etc/ntp.conf on the puppetmaster. The allow * grants access to all puppet clients.
Put a working ntp.conf in /var/puppet/files/etc/, and then add the following to your existing site.pp:
file { "ntp.conf": mode => 644, owner => root, group => root, path => "/etc/ntp.conf", source => "puppet://puppet.example.com/files/etc/ntp.conf" }
The format of this file resource is much like the package you previously set up. The resource has a tag of ntp.conf (which is quoted because of the period). The mode, owner and group attributes specify the file's permissions. The path attribute is the local path, which, if omitted, defaults to the value of the tag (the tag does not have a full path in this case, however). Finally, the file's source is a puppet URI that will be pulled from the puppetmaster.
Restart the puppet dæmon on the client (or wait 30 minutes), and you will see ntp.conf has been updated. If you try to change it, you will see that it is replaced in the next cycle.
The final resource needed is the service resource, whose job is to make sure a dæmon is running and that the dæmon is in the startup scripts (or not, if that's your desire). Add the following fragment to your site.pp:
service { ntpd: ensure => true, enable => true, subscribe => [ File["ntp.conf"], Package[ntp] ] }
The service resource handles the ntpd service. The ensure attribute makes sure the dæmon is running, and the enable attribute makes sure it is part of the startup script. The mechanics of this are handled by a provider, and each OS and distribution can have a different provider for each type of service. On Red Hat and Fedora systems, the service provider uses the chkconfig and service utilities.
The subscribe attribute brings the three resources together. The service resource is subscribed to the ntp.conf file resource and the ntp package resource. If any one of them change, the service resource is notified, which is an indication that the service should be restarted. This means you can push out changes by editing the master file on the puppetmaster, and on the next cycle, the client will download the new configuration and restart the dæmon without your intervention.
The subscribe attribute can take either a single element, such as Package[ntp], or multiple elements written in array format, such as [ element1, element2]. Also be careful to capitalize the reference, as the lowercase version has been deprecated and will not work at some point in the future.
Although powerful, these resource definitions can become unwieldy. Puppet has ways around this too. Create a directory under manifests called services, and create a file in this directory called ntpclient.pp with the following contents:
class ntpclient { package { ntp: ensure => installed } file { "ntp.conf": mode => 644, owner => root, group => root, path => "/etc/ntp.conf", source => "puppet://puppet.example.com/files/etc/ntp.conf", } service { ntpd: ensure => true, enable => true, subscribe => [ File["ntp.conf"], Package [ntp] ], } }
This new file contains the three resources you created earlier, surrounded by a class definition. A class groups several resources, which simplifies your configuration and promotes manifest sharing.
Now, replace your site.pp with this simplified manifest:
import "services/*" include ntpclient
The import line reads in all the files inside the services directory. The include line evaluates the class, which means that the class will be applied to the node. This configuration has the same effect as the one before, except the NTP client functionality now has been bundled into the class.
So far, the manifest has assumed that all clients get the same configuration. The easiest way to give different configurations to different clients is with a node definition. A node definition applies a series of configuration directives to a given set of nodes. Replace your site.pp as follows:
import "services/*" node test2, test3 { include ntpclient } node default { }
With this policy in place, only test2 and test3 will have the ntp client class applied. Any other client will be caught by the default statement, which has no resources defined.
Facter is another way to differentiate hosts. Facter generates facts about a machine, such as the operating system, hostname and processor. Simply type facter to see a list of the currently known facts. Here is a subset of the facts generated on one of my test machines:
architecture => i386 domain => ertw.com facterversion => 1.3.8 fqdn => test2.ertw.com hardwareisa => i686 hardwaremodel => i686 hostname => test2 id => root ipaddress => 192.168.1.143 ipaddress_eth0 => 192.168.1.143 kernel => Linux kernelrelease => 2.6.18-8.el5xen lsbdistcodename => Final lsbdistdescription => CentOS release 5 (Final) lsbdistid => CentOS lsbdistrelease => 5 macaddress => 00:16:3E:5D:22:17 macaddress_eth0 => 00:16:3E:5D:22:17 memoryfree => 159.17 MB memorysize => 256.17 MB operatingsystem => CentOS operatingsystemrelease => 2.6.18-8.el5xen processor0 => Intel(R) Pentium(R) 4 CPU 1.80GHz processorcount => 1 ps => ps -ef puppetversion => 0.24.2
Facts are exposed in the manifest as variables. The operatingsystem fact is seen as $operatingsystem. A common use of this is to make the same resource behave differently, depending on the operating system:
file { "foo" name => $operatingsystem ? { solaris => "/usr/local/etc/foo.conf", default => "/etc/foo.conf" } }
The above example uses a Puppet selector to set the name attribute instead of a static string. A selector is much like a case statement in that it can return different values depending on the input. This file resource refers to /usr/local/etc/foo.conf on Solaris systems and /etc/foo.conf on other systems. The system type is determined from the input to the selector, which is the $operatingsystem Facter variable.
You can add your own facts by writing a Ruby script. See Resources for links to documentation for adding custom facts.
My first experience with configuration management was with a product called cfengine. With cfengine, I was able to manage a Web cluster of 14 servers easily and reduce the time to install a new node from several hours to a matter of minutes. Puppet's author has a great deal of cfengine experience and built Puppet to address many shortcomings of cfengine.
Given that cfengine has a much wider install base than Puppet, why would one choose Puppet? After comparing the two, I've discovered several reasons. First, Puppet has a much cleaner configuration than cfengine. In the cfengine world, you are concerned with the ordering of certain operations, whereas Puppet handles ordering with the subscribe attribute (and some others).
Cfengine has many commands for adding and removing lines from files, which don't exist natively in Puppet. Puppet addresses this by providing native resource types for many of the systems that I found myself editing by hand, such as mountpoints. Using a dedicated resource type means the manifest is clear and simple.
Cfengine is open source, but it has a more closed community than Puppet. You can extend cfengine through modules, much akin to Puppet's recipes and facts, but it is nowhere near as integrated. Puppet seems designed from the start to be extensible, where cfengine feels like an afterthought. Puppet also promotes recipe sharing by making them modular, where sharing cfengine code is more difficult because the resources are in different parts of the cfengine policy.
Puppet is written in Ruby, and cfengine is written in C. Initially, I thought this was an advantage for cfengine, but after getting into Puppet, I realized it's not a big deal. Puppet's author takes great pains to abstract Puppet's configuration from the Ruby language, so no knowledge of Ruby is needed.
I found the learning curve for cfengine to be the steepest. Granted, I had no understanding of configuration management when starting with cfengine, and I had some cfengine experience by the time I started with Puppet, but many of my stumbling blocks have been fixed in Puppet.
Both projects offer support over their IRC channels. Cfengine has an extensive on-line manual and a fair bit of third-party documentation on other Web sites. Puppet has an excellent wiki and a comparable amount of third-party documentation.
Although Puppet is younger compared to cfengine, its openness and extensibility are what make it a better choice than cfengine.
Special thanks to James Turnbull, author of Pulling Strings with Puppet, for reviewing this before publication.
Resources
Puppet's Home Page: reductivelabs.com/trac/puppet/wiki
Annotated Links on Using Puppet: del.icio.us/SeanW/puppetlj
Sean Walberg is a network engineer living in Winnipeg, Canada. He is the former system administer of b5media, a global blog network, where he used system management tools to automate routine work.