At the Forge - Redis
The past few months, I've been covering non-relational databases, sometimes known as NoSQL databases. To hard-core NoSQL proponents, relational databases are no longer the be-all and end-all of data storage. Rather, NoSQL systems, which offer flexibility, easy replication and storage using modern data structures, are the way of the future—and perhaps even of the present.
Most NoSQL adherents aren't quite this extreme, but instead point to NoSQL as a useful solution to relatively new problems, such as those faced by Web sites with massive loads. To them (and me), NoSQL databases offer the storage equivalent of a new data structure. You could build programs with nothing more than integers, strings and arrays, but with the addition of hash tables to your arsenal, your code becomes easier to write and maintain. In the same way, having an additional storage mechanism can improve the quality, efficiency and maintainability of your software.
NoSQL is a catchphrase that has caught on like wildfire in the past year or two, but it's a problematic phrase in that it describes what these databases are not, rather than what they are. And indeed, many different types of NoSQL databases exist. Two that I have explored in this column during the past few months are MongoDB and CouchDB. Both of these are “document” databases—they store collections of name-value pairs, much like a Ruby hash or a Python dictionary.
A different type of NoSQL database is the key-value store. Whereas you can think of a document database as containing multiple hash tables, a key-value store is the equivalent of a single hash table. As you can tell by its name, a key-value store allows for the storage of a single value (which might be an aggregate data structure, such as an array or hash table), identified by a single key.
Whether a document database or a key-value store is more appropriate for your application depends greatly on your needs. I recently rewrote part of my PhD dissertation software, which previously had used PostgreSQL for all back-end storage, to use a combination of PostgreSQL and MongoDB. I chose MongoDB because I will need to retrieve documents using a variety of fields and combinations of fields. A single key for each document would have been insufficient.
In another case, a financial application on which I have been working, I needed fast access to the latest exchange rates for a number of currency pairs. Because I was going to be retrieving data based only on a single, unique key (that is, the six-letter representation of a currency pair), using a document database would result in unnecessary overhead. All I was interested in doing was storing the current exchange rate for a currency pair or retrieving the current rate for that pair, a perfect match for a key-value store.
So, I spent some time investigating key-value stores and decided to use Redis, an open-source key-value store originally developed by Salvatore Sanfilippo, an Italian programmer who was hired by VMware to work on Redis full-time. Redis was released in February 2009, but it quickly has attracted a large following, in no small part because of its amazing speed.
In many ways, Redis resembles memcached, another key-value store that is popular for scaling Web applications. Like memcached, Redis stores keys and values in RAM. Like memcached, Redis is extremely fast. Like memcached, Redis has bindings and clients written in a large number of languages.
However, there are significant differences. Redis can store and manipulate a large number of data structures (such as lists, sets and hashes). Redis stores values in RAM but writes them out to disk, asynchronously, on a regular basis. This means if someone pulls the plug on your computer, you will lose only the items you added since the last time Redis saved. Everything else will be read into RAM and made available in the usual way when you next bring up Redis.
And, have I mentioned that Redis is fast? It's not uncommon to hear people talk about getting tens of thousands of reads and writes per second with Redis.
Now that I have described Redis, let's try to install it. On most modern Linux distributions, you should be able to install Redis (often as the package redis-server) via apt-get or yum. However, pay attention to the version number. My Linux server running Ubuntu 9.10 happily installed a very old version of Redis for me. I uninstalled it and downloaded it from the Redis home page (see Resources).
If you download the source code, you might be surprised to discover that there is no configure script. Rather, you just run make to compile Redis. Once it's done, you can install the programs (especially redis-server) manually into an appropriate directory, such as /usr/local/bin. Don't forget to install redis.conf, the Redis configuration file, in an appropriate place, such as /etc. To get things started, say:
/usr/local/bin/redis-server /etc/redis.conf
This tells Redis to start up and read its configuration from /etc/redis.conf. The configuration file is easy to read and modify, and you should take a look at it when you have a chance. If you're interested in just starting to work with Redis and don't care about fiddling with the controls, you can do that. The default configuration works just fine for most basic purposes.
The configuration setting that probably is of greatest interest is “daemonize”, which indicates whether Redis should put itself into the background. I kept Redis in the foreground (and with debug-level logging active) when I first started to use it, but when I finally put it into production, I turned on daemonize, so I wouldn't receive a large number of log and update messages while the system was in use.
The other configuration setting indicates how often Redis should save its state to disk. The default configuration parameters that came with my installation look like this:
save 900 1 save 300 10 save 60 10000
This means Redis should save its state every 900 seconds if there has been one change, every 300 seconds if there have been ten changes, and every 60 seconds if there have been 10,000 changes. Redis saves to disk asynchronously, so there's no danger of it slowing down substantially when it performs the save operation.
You can change these settings according to your particular application's needs, striking an appropriate balance between how much data you're willing to lose if the server goes down and the need for high performance. A separate program, redis-benchmark, comes with Redis, and it allows you to get a sense of how many reads and writes you can expect to execute per second on your specific hardware, with the configuration options you have put in place.
By default, Redis listens on port 6379. You can connect to it locally via telnet or by using the redis-cli program that comes along with it, which lets you interact with the Redis server.
Now that you have a Redis server, how do you work with it? One simple way is to use the command-line interface, which comes as a program called redis-cli. If you prefer, you can use a programming language instead, which hides the protocol behind a set of objects and methods, but most of the libraries I have seen use the same method names as the underlying Redis protocol.
The two most basic commands in Redis are GET and SET, which retrieve and set values. SET takes two parameters, a key and a value, while GET takes a single parameter:
redis> GET name (nil) redis> SET name reuven OK redis> GET name reuven redis> GET Name (nil)
From this example, you can see several things. First, Redis will return a nil value if you retrieve a key that has not been set. Second, keys are case-sensitive, so “name” is different from “Name”. This might be important if you use names or e-mail addresses as the keys in your Redis database, so be careful! Finally, you can see that Redis stores everything as a string, at least when you're storing things in this way, so you don't need to put quotes around your values, unless they contain quotes.
The nature of the protocol means that your keys may not contain space characters. I read somewhere that this restriction may be lifted at some point. Nevertheless, for compatibility with older versions of Redis, you might want to remain conservative in your key-naming conventions. Other than that, you are free to use any character you want in your keys and values.
If this were all Redis could do, you might think of it as a super-memcached that saves its state to disk on a regular basis. After all, memcached also is a key-value store that keeps data in RAM and is extremely fast.
However, Redis offers a number of features on the server that go beyond what memcached offers. For example, there is the setnx command, which sets a new value for a particular key, but only if the value does not yet exist. In other words, this is a test-and-set feature, allowing you to be confident you are not overwriting existing, and important, data. For example:
redis> setnx name Kermit (integer) 0 redis> get name reuven
You also can ask Redis to increment and decrement counters for you. For example:
redis> set counter 10 OK redis> incr counter (integer) 11 redis> incr counter (integer) 12 redis> decr counter (integer) 11 redis> decr counter (integer) 10
Redis also provides a rich set to begin with; it allows you to store and manipulate lists. Lists are stored and retrieved using a separate set of commands from GET and SET, which confused me when I first began to use it, but it has become somewhat more natural over time. You can create, add members to and remove members from a list with a few simple commands:
redis> lpush atflist first OK redis> lpush atflist next OK redis> rpush atflist last OK redis> lrange atflist 0 -1 1. next 2. first 3. last redis> lindex atflist 2 last
The lpush and rpush commands add an element to a list (or create the list, if it doesn't yet exist) on the left or right side, respectively. They are similar to the lpop and rpop commands, which remove an element from the stated side of the list. The lrange command allows you to list all the elements of the list from a particular index and until another index. If you give -1 as the ending index, you get the entire list returned to you. Finally, you can retrieve the element at a particular index with lindex.
Although it might not be obvious at this point, Redis is strictly typed. This means trying to retrieve a list as a scalar value, or vice versa, will result in an error:
redis> get atflist (error) ERR Operation against a key holding the wrong kind of value redis> lpush name lerner (error) ERR Operation against a key holding the wrong kind of value
Thus, it's important, when working with Redis, to remember what the type is of each key-value pair.
Related to lists, but with a distinct purpose, are sets. You add items to a set with sadd, get a list of members with smembers and find the length (“cardinality”) of the set with scard:
redis> sadd children atara (integer) 1 redis> smembers children 1. atara redis> sadd children shikma (integer) 1 redis> sadd children amotz (integer) 1 redis> smembers children 1. amotz 2. shikma 3. atara redis> sadd children amotz (integer) 0 redis> scard children (integer) 3
As you can see from the above example, adding an element to a set normally results in a response of 1, indicating that the element was added. However, each element of a set must be unique within the set; no duplication is allowed. If you try to re-add an element that already exists in the set, Redis responses with 0, indicating that the element did not need to be added. As with all other parts of Redis, sets are case-sensitive, so if you try to add the same name, but with a different capitalization, the operation will succeed.
Redis provides facilities for working with sets, such as union and intersection. One possible use for this would be in social tags on a Web site. Each URL could be the name of a set, and the set could contain all the social tags applied to that URL. You then could find which tags have been applied to two different URLs without having to retrieve and compute this on your own, at the application level.
Redis also provides sorted sets, which are identical to the sets you have seen until now, but they keep the items in a specific order (or “rank”) that can be modified.
The most recent versions of Redis now support hash tables. (By the time you read this, Redis 2.0 likely will have been released, with complete support for such functionality.) This might seem a bit strange, given that you can think of Redis as a large hash table, but it means you can store multiple hash tables within Redis. The hash-table functions all begin with an h and provide the same sorts of setting, getting and testing functionality that you have seen for the main Redis storage mechanism.
Finally, the latest version also provides “multi-exec” functionality, allowing you to execute multiple commands within a single atomic operation. This is not quite the same as transactions as you know them from relational databases, but it goes a long way toward such functionality, making Redis attractive not only for basic key-value operations, but also for more complex ones.
I looked at Redis after having read numerous rave reviews, and I was expecting to find serious problems with it. To date, I haven't found any. Indeed, I find myself among its excited proponents. That said, I have grown to enjoy working with Redis because I'm using it in places where it is appropriate. I can handle the loss of data stored since the most recent checkpoint. The data I am storing fits into Redis' data structures quite easily, and the data I am storing fits within my server's available RAM. In addition, there is an excellent Ruby library for working with Redis, which allows me to integrate it into my work seamlessly and easily.
That said, Redis isn't a good match for everyone. If you are storing multilevel hash tables, or if you cannot afford to lose even a moment's data when the server goes down, or if you want to have the data replicated across master servers (as opposed to master-slave, which Redis handles easily), you might want to look at a different solution, such as Cassandra. But I have been impressed and delighted with Redis in my work so far, and from what I can tell, I'm not the only one who feels this way.
If you need a high-speed storage or caching system that provides everything memcached does and then some, you probably should take a look at Redis. It is easy to install, high performance, and it has client libraries in every major programming language. Redis has been in production use with numerous applications, including many Web sites, for more than a year, and its users continue to rave about its functionality and performance. Even if you don't need a key-value store right now, it might be worth installing and playing with Redis. I wouldn't be surprised if after a few minutes of experimentation, you will think of some uses for it you hadn't considered previously.
Resources
The home page for Redis, as well as the Web site from which you can download the latest source code, is code.google.com/p/redis. This page contains a large number of links to tutorials and libraries for Redis users, most of which are worth at least a quick look.
A good introduction to Redis by Kirk Hanes of Engine Yard (a Ruby hosting company) and how you can use it from within your Ruby programs is at www.engineyard.com/blog/2009/key-value-stores-for-ruby-part-4-to-redis-or-not-to-redis.
Finally, a helpful cheat sheet for the Redis protocol, including the latest additions, such as hash tables and multi-exec, is available from Mason Jones at his GitHub page: github.com/masonoise/redis-cheatsheet. (Appropriate, given that GitHub is a heavy user of Redis.)
Reuven M. Lerner is a longtime Web developer, architect and trainer. He is a PhD candidate in learning sciences at Northwestern University, researching the design and analysis of collaborative on-line communities. Reuven lives with his wife and three children in Modi'in, Israel.