Using Hiera with Puppet
With Hiera, you can externalize your systems' configuration data and easily understand how those values are assigned to your servers. With that data separated from your Puppet code, you then can encrypt sensitive values, such as passwords and keys.
Separating code and data can be tricky. In the case of configuration management, there is significant value in being able to design a hierarchy of data—especially one with the ability to cascade through classifications of servers and assign one or several options. This is the primary value that Hiera provides—the ability to separate the code for "how to configure the /etc/ntp.conf" from the values that define "what ntp servers should each node use". In the most concise sense, Hiera lets you separate the "how" from the "what".
The idea behind separating code and data is more than just having a cleaner Puppet environment; it allows engineers to create more re-usable Puppet modules. It also puts your variables in one place so that they too can be re-used, without importing manifests across modules. Hiera's use cases include managing packages and versions or using it as a Node Classifier. One of the most compelling use cases for Hiera is for encrypting credentials and other sensitive data, which I talk about later in this article.
Puppet node data originally was managed through node inheritance, which is no longer supported, and subsequently through using a params.pp module subclass. Before Hiera, it was necessary to modify the params.pp module class locally within the module, which frequently damaged the re-usability of the module. params.pp still is used in modules today, but as of Puppet version 3, Hiera is not only the default, but also the first place checked for variable values. When a variable is defined in both Hiera and a module, Hiera takes precedence by default. As you'll see, it's easy to use a module with params.pp and store some or all of the variable data in Hiera, making it easy to migrate incrementally.
To get started using Hiera with your existing Puppet 3 implementation, you won't have to make any significant changes or code migrations. You need only a hierarchy file for Hiera and a yaml file with a key/value pair. Here is an example of a Hiera hierarchy:
hiera.yaml:
:backends:
  - yaml
:yaml:
  :datadir: /etc/puppet/hieradata
:hierarchy:
  - "node/%{::fqdn}"
  - "environment/%{::env}/main"
  - "environment/%{::env}/%{calling_module}"
  - defaults
And a yaml file:
/etc/puppet/hieradata/environment/prod/main.yaml:
---
nginx::credentials::basic_auth: 'password'
Hiera can have multiple back ends, but for now, let's start with yaml, which is the default and requires no additional software. The :datadir: is just the path to where the hierarchy search path should begin, and is usually a place within your Puppet configuration. The :hierarchy: section is where the core algorithm of how Hiera does its key/value lookups is defined. The :hierarchy: is something that will grow and change over time, and it may become much more complex than this example. Within each of the paths defined in the :hierarchy:, you can reference any Puppet variable, even $operatingsystem and $ipaddress, if set. Using the %{variable} syntax will pull the value.
This example is actually a special hierarchical design that I use and recommend, which employs a fact called env that is assigned to all nodes from within facter. This env value can be set on the hosts either based on FQDN or tags in EC2 or elsewhere, but the important thing is that it splits one large main.yaml file into directories named prod, dev and so on, and, therefore, provides the initial separation of Hiera values into categories.
The second component of this specific example is a special Hiera variable called %{calling_module}. This variable is unique and reserved for Hiera to indicate that the yaml filename to search will be the same as the Puppet module that is performing the Hiera lookup. Therefore, the way this hierarchy will behave when looking for a variable in Puppet is like:
$nginx::credentials::basic_auth
First, Hiera knows that it's looking in /etc/puppet/hieradata/node for a file named <hostname.domain.tld>.yaml and for a value for nginx::credentials::basic_auth. If either the file or the variable isn't there, the next step is to look in /etc/puppet/hieradata/environment/<prod|stage|dev>/main.yaml, which is a great way to have one yaml file with most of your Hiera values. If you have a lot of values for the nginx example and you want to separate them for manageability, you simply can move them to the /etc/puppet/hieradata/environment/<prod|stage|dev>/nginx.yaml file. Finally, as a default, Hiera will check for the value in defaults.yaml at the top of the hieradata directory.
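As a rough illustration of the search order just described, here is a minimal Python sketch of a priority lookup. It is not Hiera's actual implementation (Hiera is written in Ruby and reads yaml files from disk). The DATA dictionary, the host name web01.example.com and the helper names interpolate and priority_lookup are all invented for this example.

```python
import re

# The hierarchy from the example hiera.yaml, highest priority first.
HIERARCHY = [
    "node/%{::fqdn}",
    "environment/%{::env}/main",
    "environment/%{::env}/%{calling_module}",
    "defaults",
]

# Hypothetical stand-in for the yaml files under :datadir:, keyed by their
# path relative to the data directory (without the .yaml extension).
DATA = {
    "environment/prod/main": {"nginx::credentials::basic_auth": "password"},
    "defaults": {
        "nginx::credentials::basic_auth": "changeme",
        "ntp::servers": ["pool.ntp.org"],
    },
}


def interpolate(path, scope):
    """Replace each %{name} or %{::name} token with its value from scope."""
    return re.sub(
        r"%\{(?:::)?(\w+)\}",
        lambda match: str(scope.get(match.group(1), "")),
        path,
    )


def priority_lookup(key, scope, hierarchy=HIERARCHY, data=DATA):
    """Return (value, source) from the first hierarchy level that defines key."""
    for level in hierarchy:
        source = interpolate(level, scope)
        values = data.get(source, {})  # a missing "file" is simply skipped
        if key in values:
            return values[key], source
    return None, None


# Assumed facts for one node, plus the module performing the lookup.
scope = {"fqdn": "web01.example.com", "env": "prod", "calling_module": "nginx"}
value, source = priority_lookup("nginx::credentials::basic_auth", scope)
print(value, "from", source)  # password from environment/prod/main
```

Because there is no node-level entry for this host, the search falls through to the environment's main data and stops there; defaults is consulted only for keys that nothing above it defines.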
Your Puppet manifest for this lookup should look something like this:
modules/nginx/manifests/credentials.pp
class nginx::credentials (
  $basic_auth = 'some_default',
){}
This class, when included, will pull the value from Hiera and can be used whenever included in your manifests. The value set here of some_default is just a placeholder; Hiera will override any default set in a parameterized class. In fact, if you have a class you are thinking about converting to pull data from Hiera, just start by moving one variable from the class body in {} to the parameter list in (), and Puppet will perform a Hiera lookup on that variable. You even can leave the existing default value intact, because Hiera will override it. This kind of Hiera lookup is called Automatic Parameter Lookup and is one of several ways to pull data from Hiera, but it's by far the most common in practice. You also can specify a Hiera lookup with:
modules/nginx/manifests/credentials.pp
class nginx::credentials (
  $basic_auth = hiera('nginx::credentials::basic_auth'),
){}
These will both default to a priority lookup method in the Hiera data files. This means that Hiera will return the value of the first match and stop looking further. This is usually the behavior you want, and it's a reasonable default. There are two other lookup methods worth mentioning: hiera_array and hiera_hash. hiera_array will find all of the matching values in the files of the hierarchy and combine them in an array. In the example hierarchy, this would enable you to look up all values for a single key for both the node and the environment (for example, adding an additional DNS search path for one host's /etc/resolv.conf). To use a hiera_array lookup, you must define the lookup type explicitly (instead of relying on Automatic Parameter Lookup):
modules/nginx/manifests/credentials.pp
class nginx::credentials (
  $basic_auth = hiera_array('nginx::credentials::basic_auth'),
){}
A hiera_hash lookup works in the same way, only it gathers all matching values into a single hash and returns that hash. This is often useful for an advanced create_resources variable import as well as many other uses in an advanced Puppet environment.
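To make the difference between the two merge lookups concrete, here is a small Python sketch of what they do conceptually. It is an approximation, not Hiera's own code: the keys resolv::search and users, the sample values and the function names array_lookup and hash_lookup are invented, and the hash merge shown is a shallow, top-level merge in which the higher-priority level wins.

```python
def array_lookup(key, levels):
    """Collect every match for key across all levels, flattening lists
    and dropping duplicates while preserving hierarchy order."""
    found = []
    for values in levels:
        if key not in values:
            continue
        match = values[key]
        for item in (match if isinstance(match, list) else [match]):
            if item not in found:
                found.append(item)
    return found


def hash_lookup(key, levels):
    """Merge every matching hash; on a key conflict the higher-priority
    (earlier) level wins, so apply the levels from lowest to highest."""
    merged = {}
    for values in reversed(levels):
        merged.update(values.get(key, {}))
    return merged


# Hypothetical data, highest priority first: one node file, then the
# environment's main file.
levels = [
    {"resolv::search": ["lab.example.com"],
     "users": {"alice": {"uid": 2001}}},
    {"resolv::search": ["example.com"],
     "users": {"alice": {"uid": 1001}, "bob": {"uid": 1002}}},
]

print(array_lookup("resolv::search", levels))
# ['lab.example.com', 'example.com']
print(hash_lookup("users", levels))
# {'alice': {'uid': 2001}, 'bob': {'uid': 1002}}
```

A priority lookup on the same data would have returned only the node's value and ignored the environment entirely.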
Perhaps Hiera's most powerful feature is the ability to pull data from a variety of back-end storage technologies. Hiera back ends are too numerous to list, but they include JSON, Redis, MongoDB and even HTTP to create a URL-driven Puppet value API. Let's take a look at two useful back ends: Postgres and hiera-eyaml.
To start with the psql back end, you need to install the hiera-psql gem on your Puppet master (or on each node if you're using masterless Puppet runs with puppet apply), with a simple hiera.yaml file of:
:hierarchy:
  - 'environment/%{env}'
  - default
:backends:
  - psql
:psql:
  :connection:
    :dbname: hiera
    :host: localhost
    :user: root
    :password: password
You can do lookups on a local Postgres installation with a single database called hiera with a single table called config with three columns: Path, Key and Value.
path                key                               value
'environment/prod'  'nginx::credentials::basic_auth'  'password'
This is extremely useful if you want to expose your Hiera data to custom in-house applications outside Puppet, or if you want to create a DevOps Web console or reports.
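As a sketch of how an in-house tool might read the same table, the following Python example walks a list of hierarchy paths and returns the first matching value. To stay self-contained, it uses an in-memory SQLite database as a stand-in for the Postgres hiera database. The table layout follows the description above, values are stored as plain text as in the sample row, and the lookup function is hypothetical rather than part of hiera-psql.

```python
import sqlite3

# In-memory stand-in for the Postgres "hiera" database described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config (path TEXT, key TEXT, value TEXT)")
conn.execute(
    "INSERT INTO config (path, key, value) VALUES (?, ?, ?)",
    ("environment/prod", "nginx::credentials::basic_auth", "password"),
)


def lookup(conn, key, paths):
    """Return the value from the first hierarchy path that has a row for key."""
    for path in paths:
        row = conn.execute(
            "SELECT value FROM config WHERE path = ? AND key = ?",
            (path, key),
        ).fetchone()
        if row is not None:
            return row[0]
    return None


# Paths as they would look after interpolating env=prod into the hierarchy.
print(lookup(conn, "nginx::credentials::basic_auth",
             ["environment/prod", "default"]))  # password
```

Against a real Postgres server, the same query would run through a Postgres client library instead, with that library's placeholder syntax.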
Storing credentials in Puppet modules is a bad idea. If you store credentials in Puppet and your manifests on an external code repository, you're not only unable to share those manifests with developers with less-secure access, but you're obviously exposing vital security data outside the organization, and possibly in violation of various types of compliance. So how do you encrypt sensitive data in Puppet while keeping your manifests relevant and sharable? The answer is with hiera-eyaml.
Tom Poulton created hiera-eyaml to allow engineers to do just that: encrypt only the sensitive strings of data inside a file rather than encrypting the entire file. Whole-file encryption also can be done with hiera-gpg (a very useful encryption gem but not covered in this article).
To get started, install the hiera-eyaml gem, and generate a keypair on the Puppet master:
$ eyaml createkeys
Then move the keys to a secure location, like /etc/puppet/secure/keys. Your hiera.yaml configuration should look something like this:
hiera.yaml:
---
:backends:
  - eyaml
  - yaml
:yaml:
  :datadir: /etc/puppet/hieradata
:eyaml:
  :datadir: /etc/puppet/hieradata
  :extension: 'yaml' # <- so all files can be named .yaml
  :pkcs7_private_key: /path/to/private_key.pkcs7.pem
  :pkcs7_public_key: /path/to/public_key.pkcs7.pem
:hierarchy:
  - "node/%{::fqdn}"
  - "environment/%{::env}/main"
  - "environment/%{::env}/%{calling_module}"
  - defaults
To encrypt values, you need only the public key, so distribute it to anyone who needs to create encrypted values:
$ eyaml encrypt -s 'password'
This will generate an encrypted block that you can add as the value in any yaml file:
main.yaml:
nginx::credentials::user: slackey #cleartext example value
nginx::credentials::basic_auth: > #encrypted example value
    ENC[PKCS7,Y22exl+OvjDe+drmik2XEeD3VQtl1uZJXFFF2Nn
    /HjZFXwcXRtTlzewJLc+/gox2IfByQRhsI/AgogRfYQKocZg
    IZGeunzwhqfmEtGiqpvJJQ5wVRdzJVpTnANBA5qxeA==]
Editing encrypted values in place is one of the coolest features of the hiera-eyaml back end. eyaml edit opens a copy of the eyaml file in your editor of choice and automatically decrypts all of the values in the file. Here you can modify the values just as though they were plain text. When you exit the editor by saving the file, it automatically encrypts all of the modified values and saves the new file in place. You can see that the unencrypted plain text is marked to allow the eyaml tool to identify each encrypted block, along with the encryption method that originally was used. This is used to make sure that the block is encrypted again only if the clear text value has changed and is encrypted using the original encryption mechanism:
nginx::credentials::user: user1
nginx::credentials::basic_auth: DEC(1)::PKCS7[very secret password]!
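To show what the marker carries, here is a short Python sketch that picks such a marked block apart with a regular expression. It is only an illustration of the notation shown above (a block number, the encryption method and the clear text); the pattern and the function name find_decrypted_blocks are assumptions made for this example and are not taken from the eyaml tool itself.

```python
import re

# Matches markers of the form DEC(<n>)::<METHOD>[<clear text>]!
# as they appear in the example above; the (<n>) part is treated as optional.
DEC_BLOCK = re.compile(
    r"DEC(?:\((?P<id>\d+)\))?::(?P<method>\w+)\[(?P<text>.*?)\]!",
    re.DOTALL,
)


def find_decrypted_blocks(text):
    """Return (block id, encryption method, clear text) for each marked block."""
    return [
        (m.group("id"), m.group("method"), m.group("text"))
        for m in DEC_BLOCK.finditer(text)
    ]


line = "nginx::credentials::basic_auth: DEC(1)::PKCS7[very secret password]!"
print(find_decrypted_blocks(line))
# [('1', 'PKCS7', 'very secret password')]
```

A line without a marker, such as the cleartext user entry, yields an empty list, which is how a tool could tell that there is nothing to re-encrypt on that line.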
Blocks and strings of encrypted text can get rather unwieldy once you have more than a hundred entries or so. Because these yaml files are meant to be modified by humans directly, you want them to be easy to navigate. In my experience, it makes sense to keep your encrypted values in a separate file, such as a secure.yaml, with a hierarchy path of:
:hierarchy:
  - "node/%{::fqdn}"
  - "environment/%{::env}/secure"
  - "environment/%{::env}/main"
  - "environment/%{::env}/%{calling_module}"
This isn't necessary, as each value is encrypted individually and can be distributed safely to other teams. It may work well for your environment, however, because you can store the encrypted files separately, perhaps in a different Git repository. Only the private keys need to be protected on the Puppet master. I also recommend having separate keys for each environment, as this can give more granular control over who can decrypt different data files in Hiera, as well as even greater security separation. One way to do this is to name the keys with the possible values for the env fact, and include that fact in the key paths. You'll need to encrypt values with the correct key, and this naming convention makes it easy to tell which one is correct:
:pkcs7_private_key: /path/to/private_key.pkcs7.pem-%{::env}
:pkcs7_public_key: /path/to/public_key.pkcs7.pem-%{::env}
When using Hiera values within Puppet templates, either encrypted or not, you must be careful to pull them into the class that contains the templates instead of calling the values from within the template across classes—for example, in the template mytest.erb in a module called mymodule:
mytest.erb:
...
username: user1
passwd: <%= scope.lookupvar('nginx::credentials::basic_auth') %> #don't do this
...
Puppet may not have loaded a value into nginx::credentials::basic_auth yet because of the order of operations. Also, if you are using the %{calling_module} Hiera variable, the calling module in this case would be mymodule, and not nginx, so it would not find the value in the nginx.yaml file, as one might expect.
To avoid these and other issues, it's best to import the values into the mymodule class and assign local values:
mymodule.pp:
class mymodule {
  include nginx::credentials
  # Copy the value into a local variable so the template can use @basic_auth
  $basic_auth = $nginx::credentials::basic_auth
  file { '/etc/credentials/boto_cloudwatch.cfg':
    content => template('mymodule/mytest.erb'),
  }
}
And then reference the local value from the template:
mytest.erb:
...
username: user1
passwd: <%= @basic_auth %>
You're now ready to start introducing encrypted Hiera values gradually into your Puppet environment. Maybe after you separate data from your Puppet code, you can contribute some of your modules to the Puppet Forge for others to use!
Resources
Hiera 1 Overview (Docs): https://docs.puppetlabs.com/hiera/1
"First Look: Installing and Using Hiera": http://puppetlabs.com/blog/first-look-installing-and-using-hiera
TomPoulton/hiera-eyaml: https://github.com/TomPoulton/hiera-eyaml
dalen/hiera-psql: https://github.com/dalen/hiera-psql
"Encrypting sensitive data in Puppet": http://www.theguardian.com/info/developer-blog/2014/feb/14/encrypting-sensitive-data-in-puppet