CVS homedir

by Joey Hess

I keep my life in a CVS repository. For the past two years, every file I've created and worked on, every e-mail I've sent or received and every config file I've tweaked have all been checked into my CVS archive. When I tell people about this, they invariably respond, “You're crazy!”

After all, CVS is meant for managing discrete bodies of code, such as free software programs that are worked on and available to a lot of people or in-house projects that are collaboratively developed by several employees. CVS has a reputation of being a pain to deal with, and it has a lot of crufty bits that regularly drive users up the wall, like its mistreatment of directories. Why inflict the pain of CVS on yourself if you don't have to? Why do it on such a scale that it affects nearly everything you do with your computer?

I get three major benefits from keeping my whole home directory in CVS: home directory replication, history and distributed backups. The first of these is what originally drove me to CVS for my whole home directory. At the time, I had a home desktop machine, two laptops and a desktop machine at work. Rounding this out were perhaps 20 remote accounts on various systems around the world and many systems around the workplace that I might randomly find myself logging in to. I used all of these accounts for working on the same projects and already was using CVS for those projects.

I'm a conservative guy when it comes to my computing environment (I've used the same wallpaper image for the past five years), and at the same time I'm always making a lot of little tweaks to improve things. Whenever I go to work and something wasn't just like I had tweaked it the night before, I'd feel a jarring disconnect, and annoyingly copy over whatever the change was. When I sat down at some other system at work, to burn a CD perhaps, and found a bare Bash shell instead of the heavily customized environment I've built up over the past ten years, it was even worse. The plethora of environments, each imperfectly customized to my needs by varying degrees, was really getting on my nerves. So one day I cracked and sat down and began to feed my whole home directory into CVS.

It worked astonishingly well. After a few weeks of tweaking and importing I had everything working and began developing some new habits. Every morning (er, afternoon) when I came into work, I'd cvs up while I read the morning mail. In the evening, I'd cvs commit and then update my laptop for the trip home. When I got home, I'd sync up again, dive right back into whatever I'd been doing at work and keep on rolling until late at night—when I committed, went to bed and began the cycle all over again. As for the systems I used less frequently, like the CD burner machine, I'd just update when I got annoyed at them for being a trifle out of date.

It only took a few more weeks before the advantage of having a history of everything I'd done began to show up. It wasn't a real surprise because having a history of past versions of a project is one of the reasons to use CVS in the first place, but it's very cool to have it suddenly apply to every file you own. When I broke my .zshrc or .procmailrc, I could roll back to the previous day's or look back and see when I made the change and why. It's very handy to be able to run cvs diff on your kernel config file and see how make xconfig changed it. It's great to be able to recover files you deleted or delete files because they're not relevant and still know you've not really lost them. For those amateur historians among us, it's very cool to be able to check out one's system as it looked one full year ago and poke around and discover how everything has evolved over time.

The final major benefit took some time to become clear. Linus Torvalds once said, “Only wimps use tape backup: real men just upload their important stuff on FTP and let the rest of the world mirror it.” I'm not a real enough man to upload my confidential documents to ftp.kernel.org though, so I've been wimping along with backups to tape and CD and so on. But then it hit me: take, for example, one crucial file, like my .zshrc or sent-mail archive: I had a copy of that file on my work machine, and on my home machine, and on my laptop and several other copies on other accounts. There was another copy encoded in my CVS repository too.

I'm told that the best backups are done without effort—so you actually do them—and are widely scattered among many machines and a lot of area so that a local disaster doesn't knock them out. They are tested on a regular basis to make sure the backup works. I was doing all of these things as a mere side effect of keeping it all in CVS. Then I sobered up and remembered that a dead CVS repository would be a really, really bad thing and kept those wimpy backups to CD going. But the automatic distributed backups are what keep me sleeping quietly at night. Later, when I left that job, the last thing I did on my work desktop machine was: cvs commit ; sudo rm -rf /. And I didn't worry a bit; my life was still there, secure in CVS.

A full checkout of my home directory with all the trimmings often runs about 4GB in size. A lot of that will be temporary trees in tmp/ and rsynced Ogg Vorbis files (so far, I have not found the disk space to check all of them into CVS). My CVS repository currently uses less than 1GB of space, though it is steadily growing in size. I keep some 13,000 files in CVS, and so a full CVS update of my home directory is a sight to see and takes a while.

These days I'm often stuck behind a dial-up connection, and I mostly just use one laptop, so I might go days between CVS updates. Other better-connected systems have automatic CVS updates done via cron each day. I cvs commit whenever I want to make a backup of where I am in a file or when I am at the point of releasing something. I still also do a full commit of my home directory every day or so. I confess that some of my CVS commit messages are less than informative—“foo” has been used far too many times on some classes of files. I even do some automatic CVS commits; for example, my mailbox archives are committed by a daily cron job.

There are other benefits of course. I attend many tradeshows and other events that require me to sit down at some computer out of the box, use it for an hour or a day and never see it again. I can check out the core of my CVS home directory in about five minutes, and after that it is just as comfortable as if I'd SSH'd home and was doing everything there. I even get my whole desktop set up in that five minutes. In a chaotic tradeshow environment, there is nothing more reassuring than having your familiar computer setup at your fingertips as you demo things to the hordes of visitors.

Keeping your home directory in CVS is not all fun though. Anyone who's used CVS in a large project probably has had to resolve conflicts engendered by two people modifying the same file. At least you can curse the other guy who committed the changes first while you deal with this annoying task. Most of you have probably not had to resolve conflicts between the file you modified at home and at work, then cursing at yourself.

Then there are CVS's famous problems: poor handling of directories and binary files. The nearly nonexistent handling of permissions, which is not a big deal in most projects but becomes important when you have a home directory with some public and some private files and directories in it. A slow, bloated protocol, hindered even more by the necessity of piping it all over SSH; the pain of trying to move a file that is already in CVS, or much worse, a whole directory tree, again hits you especially hard when you're using CVS for the whole home directory. And those damn CVS directories are always cluttering up everything. I've developed means of coping with all of these to varying degrees, but like many of us, I'm hoping for a better replacement one day (and dreading the transition).

Perhaps it's time that I get down to the details of how I organize my home directory in CVS. I've always managed my home directory with an iron hand, and CVS has just exacerbated this tendency. Let's look at the top level:

joey@silk:~>ls
CVS/  GNUstep/  bin/  debian/  doc/  html/  lib/
mail/  src/  tmp/

Yes, that's it. Well, except for the 100-plus dot files. Most people use their home directory as a scratch space for files they're working on, but instead I have a dedicated scratch directory, the tmp directory, which I clean out irregularly. In general, when I start a new file or project, I will be checking it into CVS soon, so I begin working on it in the appropriate directory. This document, for example, is starting its life in the html directory and will be checked into CVS soon to live there forever. Of course, sometimes I goof up and then I have to resort to the usual tricks to move files in CVS. And so the first rule of CVS home directories is it pays to think before starting and get the right filename and location the first time. Don't be too impatient to check in the file.

CVS is a great way to ensure that you have a nice, clean, well-managed home directory. Every time I cvs update it will helpfully complain to me about any files it doesn't know about. Of course, I make heavy use of .cvsignore files in some directories (like tmp/).

If I go to another machine, the home directory looks pretty much the same, though various things might be missing:

joeyh@auric:~>ls
CVS/  GNUstep/  bin/  tmp/

I use this machine for occasional specific shell purposes. I don't administer the system, so I don't want to put private files there. The result is a much truncated version of my home directory. It's perfectly usable for everything I normally do on that machine, and if I want to, say, work on this document there at some point, I can just type cvs co html and a password and be on my way.

The way I make this partial-checkouts system work is by using CVS modules and aliases. I have modules defined for each of the top-level directories and for the home directory (dot files) itself. For example, the entry in my CVSROOT/modules file for the stripped-down version of my home directory looks like this:

joeyh -u cvsfix -o cvsfix joey-cvs/home &bin

For more complete home directories, I use this instead:

joey -u cvsfix -o cvsfix joey-cvs/home &src &doc
&debian &html &lib &.hide &bin &mail
Notice the .hide module. It results in a ~/.hide directory when I check it out. This directory is where I put the occasional private file that I don't want to appear in home directories—like the one on auric—that are on systems not administered by me. The files in .hide get hard-linked to their proper locations if .hide is checked out, so I can put confidential dot files in there and only check those dot files out on trusted systems. I also have, for example, my Mozilla cookies file in .hide.

It's important to distinguish between such files that I need to put in .hide and the entire set of private directories, like my mail directory. Yes, I keep my mail in CVS (except for just-arrived spooled mail, which I keep synced up with a neat little program called isync that is smarter about mail than CVS is). But it's all in its own mail/ directory, so I can omit checking that directory out to systems that I don't trust with my mail or that I don't want to burden with hundreds of megabytes of mail archives.

While I'm discussing privacy issues, I should mention that I make some bits of my home directory completely open to the public. This includes a lot of free software in debian/ and src/, and some handy little programs in bin/. This is accomplished by permissions. I have to make sure that most directories in the repository (or at least the top-level directories like mail/) are mode 700, so only I can access them. Other top-level directories, like bin/, are opened up to mode 755. This allows anonymous CVS access and browsing at cvs.kitenet.net/joey-cvs/bin/.

This leads to the second rule of CVS home directories: don't import $HOME in one big chunk; break it up into multiple modules. The structure of your repository need not mirror the structure of your actual home directory. Modules can be checked out in different locations to move things around and control access on a per-module level. There's a layer of indirection there, and such layers always make things more flexible and more complex.

Some of the projects I work on have their own CVS repositories that are unconnected to my big home directory repository. That's fine too; I simply check them out into logical places in my home directory tree as needed. CVS can even be tweaked to recurse into those directories when updating or committing.

Another thing to notice in those lines from my modules file is the use of -u cvsfix to make the cvsfix program run after CVS updates. That program does a lot of little things, including ensuring that permissions are correct, setting up the hard links to files in .hide and so on.

One last thing to mention is the issue of heterogeneous environments and CVS. Most of my accounts are on systems running varying versions of Debian Linux on a host of different architectures, but there are accounts on other distributions, on Solaris and so forth. Trying to make the same dot files work on everything can be interesting. My .zshrc file, for example, goes to great pains to detect things like GNU ls, deals with varying zsh versions, sets up aliases to the best available editor and other commands and so on. Other programs, like .xinitrc, check the host they're running on and behave slightly (or completely) differently. I've even at one point had a .procmailrc that filtered mail differently depending on hostname, though the trick to doing that is lost somewhere in one of the innumerable versions stored in my repository. I've even resorted in a few places to files with names of the form filename.hostname—cvsfix finds one matching the current host and links it to the filename. Branches are also a possibility, of course, but despite my heavy use of CVS, I still find some corners of it a black art.

Well I guess that's it. I'd be happy to hear from anyone else who keeps their home directory in CVS, especially if you have some tricks to share. In the future I'd like to try checking /etc into CVS too, and if you've successfully done this, I'd love to talk with you. Now I'm off to commit this file.

Joey Hess (joey@kitenet.net) is a longtime Debian developer who lives on a farm in Virginia. He enjoys finding new and unlikely places from which to commit code wirelessly to CVS.

Load Disqus comments