Managing Linux Using Puppet
At some point, you probably have installed or configured a piece of software on a server or desktop PC. Since you read Linux Journal, you've probably done a lot of this, as well as developed a range of glue shell scripts, Perl snippets and cron jobs.
Unless you are more disciplined than I was, every server has a unique, hand-crafted version of those config files and scripts. It might be as simple as a backup monitor script, but each still needs to be managed and installed.
Installing a new server usually involves copying over config files and glue scripts from another server until things "work". Subtle problems may persist if a particular condition appears infrequently. Any improvement is usually made on an ad hoc basis to a specific machine, and there is no way to apply improvements to all servers or desktops easily.
Finally, in typical scenarios, all the learning and knowledge invested in these scripts and configuration files are scattered throughout the filesystem on each Linux system. This means there is no easy way to know how any piece of software has been customized.
If you have installed a server and come back to it three years later wondering what you did, or manage a group of desktops or a private cloud of virtual machines, configuration management and Puppet can help simplify your life.
Enter Configuration Management
Configuration management is a solution to this problem. A complete solution provides a centralized repository that defines and documents how things are done that can be applied to any system easily and reproducibly. Improvements simply can be rolled out to systems as required. The result is that a large number of servers can be managed by one administrator with ease.
Puppet
Many different configuration management tools for Linux (and other platforms) exist. Puppet is one of the most popular and the one I cover in this article. Similar tools include Chef, Ansible and Salt as well as many others. Although they differ in the specifics, the general objectives are the same.
Puppet's underlying philosophy is that you tell it what you want as an end result (required state), not how you want it done (the procedure), using Puppet's programming language. For example, you might say "I want ssh key XYZ to be able to log in to user account foo." You wouldn't say "cat this string to /home/foo/.ssh/authorized_keys." In fact, the simple procedure I defined isn't even close to being reliable or correct, as the .ssh directory may not exist, the permissions could be wrong and many other things.
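As a rough sketch of what the declarative form of that request might look like, Puppet's built-in ssh_authorized_key resource type can state the end result directly. The key name and key body below are placeholders for illustration, not values from a real setup, and the sketch assumes the foo account already exists:

```puppet
# Desired state: this public key may log in to the account "foo".
# 'xyz@example' and the key string are illustrative placeholders.
ssh_authorized_key { 'xyz@example':
  ensure => present,
  user   => 'foo',
  type   => 'ssh-rsa',
  key    => 'AAAAB3NzaC1yc2E...placeholder...',
}
```

Nothing here says how the entry gets into authorized_keys; working that out is left to Puppet.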
You declare your requirements using Puppet's language in files called manifests with the suffix .pp. Your manifest states the requirements for a machine (virtual or real) using Puppet's built-in modules or your own custom modules, which also are stored in manifest files. Puppet is driven from this collection of manifests much like a program is built from code. When the puppet apply command is run, Puppet will compile the program, determine the difference in the machine's state from the desired state, and then make any changes necessary to bring the machine in line with the requirements.
This approach means that if you run puppet apply on a machine that is up to date with the current manifests, nothing should happen, as there are no changes to make.
Puppet is a tool (actually a whole suite of tools) that includes the Puppet execution program, the Puppet master, the Puppet database and the Puppet system information utility. There are many different ways to use it that suit different environments.
In this article, I explain the basics of Puppet and the way we use it to manage our servers and desktops, in a simplified form. I use the term "machine" to refer to desktops, virtual machines and hypervisor hosts.
The approach I outline here works well for 1–100 machines that are fairly similar but differ in various ways. If you are managing a cloud of 1,000 virtual servers that are identical or differ in very predictable ways, this approach is not optimized for that case (and you should write an article for the next issue of Linux Journal).
This approach is based around the ideas outlined in the excellent book Puppet 3 Beginner's Guide by John Arundel. The basic idea is this:
- Store your Puppet manifests in git. This provides a great way to manage, track and distribute changes. We also use it as the way servers get their manifests (we don't use a Puppet master). You easily could use Subversion, Mercurial or any other SCM.
- Use a separate git branch for each machine so that machines are stable.
- Each machine then periodically polls the git repository and runs puppet apply if there are any changes.
- There is a manifest file for each machine that defines the desired state.
For the purposes of this article, I'm using the example of configuring developers' desktops. The example desktop machine is a clean Ubuntu 12.04 with the hostname puppet-test; however, any version of Linux should work with almost no differences. I will be working using an empty git repository on a private git server. If you are going to use GitHub for this, do not put any sensitive information in there, in particular keys or passwords.
Puppet is installed on the target machine using the commands shown in Listing 1.
The install simply sets up the Puppet Labs repository and installs git and Puppet. Notice that I have used specific versions of puppet-common and the puppetlabs/apt module. Unfortunately, I have found Puppet tends to break previously valid code and its own modules even with minor upgrades. For this reason, all my machines are locked to specific versions, and upgrades are done in a controlled way.
wget https://apt.puppetlabs.com/puppetlabs-release-precise.deb
dpkg -i puppetlabs-release-precise.deb
apt-get update
apt-get install -y man git puppet-common=3.7.3-1puppetlabs1
puppet module install puppetlabs/apt --version 1.8.0
Now Puppet is installed, so let's do something with it.
Getting Started
I usually edit the manifests on my desktop and then commit them to git and push to the origin repository. I have uploaded my repository to GitHub as an easy reference, which you may wish to copy, fork and so on.
In your git repository, create the file manifests/puppet-test.pp, as shown in Listing 2. This file illustrates a few points:
- The name of the file matches the hostname. This is not a requirement; it just helps to organize your manifests.
- It includes the apt module (installed in Listing 1), which allows you to manage the APT packaging system.
- The top-level item is "node", which means it defines the state of one or more servers.
- The node name is "puppet-test", which matches the server name. This is how Puppet determines which node definition to apply to this machine.
- The manifest declares that it wants the vim package installed and the emacs package absent. Let the flame wars commence!
include apt
node 'puppet-test' {
  package { 'vim':
    ensure => 'present'
  }
  package { 'emacs':
    ensure => 'absent'
  }
}
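Because nodes are matched by name, a machine whose hostname has no node definition will not be configured by the manifest above. One possible way to handle that, sketched here as an assumption rather than as part of the setup described in this article, is Puppet's special default node, which is used when no other node definition matches:

```puppet
# Used only when no other node definition matches the machine's name.
# The resource title 'no-node-definition' is an arbitrary label.
node default {
  notify { 'no-node-definition':
    message => "No node definition found for ${::hostname}",
  }
}
```

Only one default node may exist across all the manifests, so it would normally live in a single shared file.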
Now you can use this Puppet configuration on the machine itself. If you ssh in to the machine (you may need ssh -A agent forwarding so you can authenticate to git), you can run the commands from Listing 3, replacing gitserver with your own.
git clone git@gitserver:Puppet-LinuxJournal.git /etc/puppet/linuxjournal
puppet apply /etc/puppet/linuxjournal/manifests \
    --modulepath=/etc/puppet/linuxjournal/modules/:/etc/puppet/modules/
This code clones the git repository into /etc/puppet/linuxjournal and then runs puppet apply using the custom manifests directory. The puppet apply command looks for a node with a matching name and then attempts to make the machine's state match what has been specified in that node. In this case, that means installing vim, if it isn't already, and removing emacs.
It would be nice to create the developer user, so you can set up that configuration. Listing 4 shows an updated puppet-test.pp that creates a user as per the developer variable (this is not a good way to do it, but it's done like this for the sake of this example). Note how the variable is preceded by $. Also, the variable is substituted into strings quoted with double quotes (") but not into strings quoted with single quotes ('), in the same way as bash.
Listing 4. /manifests/puppet-test.pp
include apt
node 'puppet-test' {
  $developer = 'david'
  package { 'vim':
    ensure => 'present'
  }
  package { 'emacs':
    ensure => 'absent'
  }
  user { "$developer":
    ensure => present,
    comment => "Developer $developer",
    shell => '/bin/bash',
    managehome => true,
  }
}
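The quoting rule used in Listing 4 can be seen in isolation with two notify resources. This is only an illustrative sketch; the resource titles are arbitrary, and the ${developer} form is the braced equivalent of $developer:

```puppet
$developer = 'david'

# Double quotes interpolate the variable: the message is "Developer david".
notify { 'double-quoted':
  message => "Developer ${developer}",
}

# Single quotes are literal: the message is "Developer $developer".
notify { 'single-quoted':
  message => 'Developer $developer',
}
```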
Let's apply the new change on the desktop by pulling the changes and re-running puppet apply as per Listing 5. You now should have a new user created.
cd /etc/puppet/linuxjournal
git pull
puppet apply /etc/puppet/linuxjournal/manifests \
    --modulepath=/etc/puppet/linuxjournal/modules/:/etc/puppet/modules/
Creating Modules
Putting all this code inside the node isn't very reusable. Let's move the user into a developer_pc module and call that from your node. To do this, create the file modules/developer_pc/manifests/init.pp in the git repository as per Listing 6. This creates a new module called developer_pc that accepts a parameter called developer and uses it to define the user.
class developer_pc ($developer) {
  user { "$developer":
    ensure => present,
    comment => "Developer $developer",
    shell => '/bin/bash',
    managehome => true,
  }
}
You then can use the module in your node as demonstrated in Listing 7. Note how you pass the developer parameter, which is then accessible inside the module.
node 'puppet-test' {
  package { 'vim':
    ensure => 'present'
  }
  package { 'emacs':
    ensure => 'absent'
  }
  class { 'developer_pc': developer => 'david' }
}
Apply the changes again, and there shouldn't be any change. All you have done is refactor the code.
Creating Static Files
Say you would like to standardize your vim config for all the developers and stop word wrapping by setting up their .vimrc file. To do this in Puppet, you create the file you want to use in /modules/developer_pc/files/vimrc as per Listing 8, and then add a file resource in /modules/developer_pc/manifests/init.pp as per Listing 9. The file resource can be placed immediately below the user resource.
Listing 8. /modules/developer_pc/files/vimrc
" Managed by puppet in developer_pc
set nowrap
Listing 9. /modules/developer_pc/manifests/init.pp
file { "/home/$developer/.vimrc":
  source => "puppet:///modules/developer_pc/vimrc",
  owner => "$developer",
  group => "$developer",
  require => [ User["$developer"] ]
}
The file resource defines a file /home/$developer/.vimrc, which will be set from the vimrc file you created just before. You also set the owner and group on the file, since Puppet typically is run as root.
The require clause on the file takes an array of resources and states that those resources must be processed before this file is processed (note the uppercase first letter; this is how Puppet refers to resources rather than declaring them). This dependency allows you to stop Puppet from trying to create the .vimrc file before the user has been created. When resources are adjacent, like the user and the file, they also can be "chained" using the -> operator.
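As a sketch of that alternative, the user and file resources from the developer_pc class could be written as a chain instead of using require. This fragment assumes it sits inside the class, where $developer is defined, and that it replaces the separate user and file declarations rather than being added alongside them (declaring the same resource twice is an error):

```puppet
# Same ordering as the require clause in Listing 9, written as a chain:
# the user is processed first, then the file.
user { $developer:
  ensure     => present,
  comment    => "Developer ${developer}",
  shell      => '/bin/bash',
  managehome => true,
}
->
file { "/home/${developer}/.vimrc":
  source => 'puppet:///modules/developer_pc/vimrc',
  owner  => $developer,
  group  => $developer,
}
```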
Apply the changes again, and you now can expect to see your custom .vimrc set up. If you run puppet apply later, if the source vimrc file hasn't changed, the .vimrc file won't change either, including the modification date. If one of the developers changes .vimrc, the next time puppet apply is run, it will be reverted to the version in Puppet.
A little later, say one of the developers asks if they can ignore case as well in vim when searching. You easily can roll this out to all the desktops. Simply change the vimrc file to include set ignorecase, commit and run puppet apply on each machine.
Often you will want to create files where the content is dynamic. Puppet has support for .erb templates, which are templates containing snippets of Ruby code, similar to JSP or PHP files. The code has access to all of the variables in Puppet, with a slightly different syntax.
As an example, our build process uses a file called $HOME/Projects/override.properties that contains the name of the build root. This is typically just the user's home directory. You can set this up in Puppet using an .erb template as shown in Listing 10. The .erb template is very similar to the static file, except it needs to be in the module's templates folder, and it uses <%= %> for expressions, <% %> for code, and variables are referred to with the @ prefix.
# Managed by Puppet
dir.home=/home/<%= @developer %>/
You use the .erb template by adding the rules shown in Listing 11. First, you have to ensure that there is a Projects directory, and then you require the override.properties file itself. The -> operator is used to ensure that you create the directory first and then the file.
file { "/home/$developer/Projects":
  ensure => 'directory',
  owner => "$developer",
  group => "$developer",
  require => [ User["$developer"] ]
}
->
file { "/home/$developer/Projects/override.properties":
  content => template('developer_pc/override.properties.erb'),
  owner => "$developer",
  group => "$developer",
}
Running Puppet Automatically
Running Puppet each time you want to make a change doesn't work well beyond a handful of machines. To solve this, you can have each machine automatically check git for changes and then run puppet apply (you can do this only if git has changed, but that is optional).
Next, you will define a file called puppetApply.sh that does what you want and then set up a cron job to call it every ten minutes. This is done in a new module called puppet_apply in three steps:
- Create your puppetApply.sh script in modules/puppet_apply/files/puppetApply.sh as per Listing 12.
- Create the puppetApply.sh file and set up the crontab entry as shown in Listing 13.
- Use your puppet_apply module from your node in puppet-test.pp as per Listing 14.
#!/bin/sh
# Managed by Puppet
cd /etc/puppet/linuxjournal
git pull
puppet apply /etc/puppet/linuxjournal/manifests \
    --modulepath=/etc/puppet/linuxjournal/modules/:/etc/puppet/modules/
Listing 13. /modules/puppet_apply/manifests/init.pp
class puppet_apply () {
  file { "/usr/local/bin/puppetApply.sh":
    source => "puppet:///modules/puppet_apply/puppetApply.sh",
    mode => 'u=wrx,g=r,o=r'
  }
  ->
  cron { "run-puppetApply":
    ensure => 'present',
    command => "/usr/local/bin/puppetApply.sh > /tmp/puppetApply.log 2>&1",
    minute => '*/10',
  }
}
Listing 14. /manifests/puppet-test.pp
class { 'puppet_apply': ; }
You will need to ensure that the server has read access to the git repository. You can do this using an SSH key distributed via Puppet and an IdentityFile entry in /root/.ssh/config.
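One way this might be expressed is sketched below. The module name git_access, the key file name puppet_deploy_key and the gitserver host alias are all assumptions made for illustration; they are not taken from the repository described in this article:

```puppet
# Hypothetical module: installs a read-only deploy key for root and
# points SSH at it when connecting to the git server.
class git_access {
  file { '/root/.ssh':
    ensure => directory,
    owner  => 'root',
    group  => 'root',
    mode   => '0700',
  }
  ->
  file { '/root/.ssh/puppet_deploy_key':
    source => 'puppet:///modules/git_access/puppet_deploy_key',
    owner  => 'root',
    group  => 'root',
    mode   => '0600',
  }
  ->
  file { '/root/.ssh/config':
    content => "Host gitserver\n  IdentityFile /root/.ssh/puppet_deploy_key\n",
    owner   => 'root',
    group   => 'root',
    mode    => '0600',
  }
}
```

Note that this sketch takes over the whole of /root/.ssh/config, and that a private key stored in the repository is readable by anyone who can read the repository, so it should be limited to read-only access.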
If you apply changes now, you should see that there is an entry in root's crontab, and every ten minutes puppetApply.sh should run. Now you simply can commit your changes to git, and within ten minutes, they will be rolled out.
Modifying Config Files
Many times you don't want to replace a config file, but rather ensure that certain options are set to certain values. For example, I may want to change the SSH port from the default of 22 to 2022 and disallow password logins. Rather than manage the entire config file with Puppet, I can use the augeas resource to set multiple configuration options.
Refer to Listing 15 for some code that can be added to the developer_pc class you created earlier. The code does four things:
- Installs openssh-server (not really required, but there for completeness).
- Ensures that SSH is running as a service.
- Sets Port 2022 and PasswordAuthentication no in /etc/ssh/sshd_config.
- If the file changes, the notify clause causes the SSH service to be restarted so that it picks up the new configuration.
package { 'openssh-server':
  ensure => 'present'
}
service { 'ssh':
  ensure => running,
  require => [ Package["openssh-server"] ]
}
augeas { 'change-sshd':
  context => '/files/etc/ssh/sshd_config',
  changes => ['set Port 2022', 'set PasswordAuthentication no'],
  notify => Service['ssh'],
  require => [ Package["openssh-server"] ]
}
Once puppetApply.sh automatically runs, any subsequent SSH sessions will need to connect on port 2022, and you no longer will be able to use a password.
Removing Rules
When defining rules in Puppet, it is important to keep in mind that removing a rule for a resource is not the same as a rule that removes that resource. For example, suppose you have a rule that creates an authorized SSH key for "developerA". Later, "developerA" leaves, so you remove the rule defining the key. Unfortunately, this does not remove the entry from authorized_keys. In most cases, the state defined in Puppet resources is not considered definitive; changes outside Puppet are allowed. So once the rule for developerA's key has been removed, there is no way to know if it simply was added manually or if Puppet should remove it.
In this case, you can use the ensure => 'absent' rule to ensure packages, files, directories, users and so on are deleted. Listing 2 showed an example of this to remove the emacs package. There is a definite difference between ensuring that emacs is absent and having no rule declared for it at all.
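Applied to the SSH key example, a minimal sketch would keep the rule for the key but flip it to absent. The key name and the shared deploy account are placeholders chosen for illustration:

```puppet
# Keep the rule, but declare the key absent so that Puppet removes the
# matching entry from the account's authorized_keys file.
# 'developerA@laptop' and the 'deploy' account are placeholders.
ssh_authorized_key { 'developerA@laptop':
  ensure => absent,
  user   => 'deploy',
}
```

Once every machine has applied this, the rule itself can be deleted safely.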
At our office, when a developer or administrator leaves, we replace their SSH key with an invalid key, which then immediately updates every entry for that developer.
Existing Modules
Many modules are listed on Puppet Forge covering almost every imaginable problem. Some are really good, and others are less so. It's always worth searching to see if there is something good and then making a decision as to whether it's better to define your own module or reuse an existing one.
Managing Git
We don't keep all of our machines sitting on the master branch. We use a modified gitflow approach to manage our repository. Each server has its own branch, and most of them point at master. A few are on the bleeding edge of the develop branch. Periodically, we roll a new release from develop into master and then move each machine's branch forward from the old release to the new one. Keeping separate branches for each server gives flexibility to hold specific servers back and ensures that changes aren't rolled out to servers in an ad hoc fashion.
We use scripts to manage all our branches and fast-forward them to new releases. With roughly 100 machines, it works for us. On a larger scale, separate branches for each server probably are impractical.
Using a single repository shared with all servers isn't ideal. Storing sensitive information encrypted in Hiera is a good idea. There was an excellent Linux Journal article covering this: "Using Hiera with Puppet" by Scott Lackey in the March 2015 issue.
As your number of machines grows, using a single git repository could become a problem. The main problem for us is there is a lot of "commit noise" between reusable modules versus machine-specific configurations. Second, you may not want all your admins to be able to edit all the modules or machine manifests, or you may not want all manifests rolled out to each machine. Our solution is to use multiple repositories, one for generic modules, one for machine-/customer-specific configuration and one for global information. This keeps our core modules separated and under proper release management while also allowing us to release critical global changes easily.
Scaling Up/Trade-offs
The approach outlined in this article works well for us. I hope it works for you as well; however, you may want to consider some additional points.
As our servers differ in ways that are not consistent, using Facter or metadata to drive configuration isn't suitable for us. However, if you have 100 Web servers, using the hostname of nginx-prod-099 to determine the install requirements would save a lot of time.
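For that kind of fleet, one option is a node definition that matches hostnames by regular expression. The following is only a sketch under an assumed naming scheme (nginx-prod-001 and so on); the package and service names are likewise assumptions about what such a machine would need:

```puppet
# Hypothetical naming scheme: every host named nginx-prod-<number>
# gets the same definition.
node /^nginx-prod-\d+$/ {
  package { 'nginx':
    ensure => present,
  }
  service { 'nginx':
    ensure  => running,
    require => Package['nginx'],
  }
}
```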
A lot of people use the Puppet master to roll out and push changes, and this is the general approach referred to in a lot of tutorials online. You can combine this with PuppetDB to share information from one machine to another machine; for example, the public key of one server can be shared to another server.
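Sharing a host key in that way is typically done with exported resources. The sketch below assumes a Puppet master with PuppetDB (stored configurations) enabled, which is not the masterless setup used in the rest of this article, and it relies on the fqdn and sshrsakey facts reported by Facter:

```puppet
# On every machine: export this machine's SSH host key.
@@sshkey { $::fqdn:
  ensure => present,
  type   => 'ssh-rsa',
  key    => $::sshrsakey,
}

# On machines that need them: collect the host keys exported by all
# machines into the system-wide known hosts file.
Sshkey <<| |>>
```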
Conclusion
This article has barely scratched the surface of what can be done using Puppet. Virtually everything about your machines can be managed using the various Puppet built-in resources or modules. After using it for a short while, you'll experience the ease of building a second server with a few commands or of rolling out a change to many servers in minutes.
Once you can make changes across servers so easily, it becomes much more rewarding to build things as well as possible. For example, monitoring your cron jobs and backups can take a lot more work than the actual task itself, but with configuration management, you can build a reusable module and then use it for everything.
For me, Puppet has transformed system administration from a chore into a rewarding activity because of the huge leverage you get. Give it a go; once you do, you'll never go back!