File Synchronization with Unison
Unison is a file-synchronization tool that runs on Linux, UNIX and Microsoft Windows. Those of you who've used IBM Lotus Notes or Intellisync Mobile Suite probably have an idea of what synchronization is good for, as compared to one-way mirroring options such as rsync. You might have mirrored a company document directory to your laptop, for example, and then modified a document or two. Other people might have modified other documents in the same directory by the time you get back. With rsync, you'd need to reconcile the differences between the two directories manually or risk overwriting someone's changes. Unison can sort out what has changed where, propagate the changed files and even merge different changes to the same file if you tell it how.
Think of Unison as two-way rsync with a bit of revision control mixed in. The most common use is keeping your local and remote home directory, or some data directory you often use in different contexts, in sync. It uses the rsync algorithm to keep network traffic down and should be tunneled through SSH over untrusted networks. No extra work is needed—simply specify ssh:// when adding a directory location. Quite a bit of extra disk space often is needed for Unison, though, because the synchronizer needs to keep track of what the files looked like on the last run.
Unison's home page is maintained at the University of Pennsylvania; the project leader, Benjamin C. Pierce, is a professor in the Department of Computer and Information Science. See the on-line Resources for the URL.
Unison isn't as widely deployed as rsync, so you might not be able to find a precompiled package for your distribution. But the binaries downloadable from the Unison home page should work for most people.
If you'd like to compile from source, you can. A few extra hoops must be jumped through, however, because Unison is programmed in OCaml, not the most common language. See Resources if there is no handy package for your distribution.
Compiling and installing Unison is simple; type make UISTYLE=xxx. The GTK user interface needs additional OCaml bindings for GTK, so I use the text interface in this article. Typing make UISTYLE=text or make UISTYLE=gtk should give you a Unison executable. Simply copy the executable to somewhere in the path on both machines you want to synchronize.
In this article, I'm using the current stable version of Unison, 2.9.1, unless otherwise noted. You need to use the latest betas if you're going to synchronize files larger than 2GB.
The developer versions tend to work well. They are what the developers run themselves on their own precious data. Sign up for the unison-hackers mailing list if you feel a bit adventurous. Jerome Vouillon, Benjamin C. Pierce and Trevor Jim tend to hang out there discussing improvements. Commit logs also float by, so you can track what is going on.
Unison keeps its config and working files in a .unison directory in your home directory or wherever you want to put it. Set the UNISON environment variable to specify an alternate location.
The default configuration is stored in .unison/default.prf. Listing 1 shows a plain config file suitable for testing. Synchronizing two directories is now as simple as:
$ unison /nfsmount/dir1 /home/me/dir1
Listing 1. .unison/default.prf
# Unison preferences file merge = diff3 -m CURRENT1 OLD CURRENT2 > NEW backup = Name * maxbackups = 10 log = true logfile = /home/knan/.unison/unison.log rshargs = -C
Unison then asks the user about any differences between the directories and offers reasonable defaults. It does take a bit of time to get used to Unison's way of thinking, however. And, Unison is no substitute for backups. Unison happily propagates back the deletion of all the files in one replica, for example, which can be a rude awakening for programmers used to CVS. For example:
rm dir1/* ; unison ssh://server/dir1 dir1
Deleting a file is an action that is replicated on the other side upon synchronization. So, this example command removes all files in dir1 on both sides.
Once you feel comfortable, consider adding auto = true to the Unison profile. This skips questions about any non-conflicting changes but gives you a chance to back out at the end.
The Unison manual is recommended reading. It is clear and well written and explains what happens at most corner cases.
Once users become familiar with Unison, a common thought is to use it for keeping one's home directory in sync between machines, say, your laptop and desktop. This can be realized pretty easily. Listing 2 has a simple profile that does the job, but you probably want to extend it. Listing 2, for example, ignores MP3 files and Unison's own files and demonstrates the use of include for having common settings applied to all profiles.
Listing 2. .unison/home.prf
# Unison preferences file root = /home/erik root = ssh://remotehost/home/erik # exactly two or none "root" lines ignore = Name *.mp3 # ignore all .mp3 files anywhere ignore = Path .unison # ignore all files with .unison somewhere in their full path include default # imports settings from default.prf
Test our new profile like this:
$ unison home -testserver
And invoke it like this:
$ unison home -batch $ unison home
The -batch run takes care of the easy cases without asking, backing up and logging as needed, and the second run asks you about any tricky business—like merging.
The root = lines can be omitted if you want to specify the files to be synchronized on the command line instead. The lines are equivalent to this invocation:
$ unison home /home/erik ssh://remotehost/home/erik
In order to do a three-way merge, backups must be enabled. By default, with backups disabled, Unison keeps only a checksum and metadata, such as permissions, so it has no unmodified file to reference.
In version 2.9.1 of Unison, if you choose merge for a conflict and the merge is successful without manual intervention, the changes are propagated immediately, which doesn't give you a chance to back out. So, if you have the space, I suggest leaving maxbackups at 5 or so, instead of the default 2, to leave yourself the chance of recovering from automatic mismerges. Contents of the backup directory after a merge look like this:
$ ls -1 .unison/backup/ shared.txt merged version ("NEW") shared.txt.1.unibck changed remotely ("CURRENT2") shared.txt.2.unibck changed locally ("CURRENT1") shared.txt.3.unibck old version ("OLD")
As of the newest beta, 2.10.3 at the time of this writing, Unison can invoke different merge programs for different files. You might want to use 3DM to merge XML files, for example, or a database merge tool for your Berkeley databases. This functionality still is new and subject to change. It has been noted by the project leader that the merge functionality was in need of a rewrite and didn't really work too well in 2.9.1 and 2.9.20. Thus, if you intend to do much merging, you will be better off tracking the bleeding edge.
Resources for this article: /article/8059.
Erik Inge Bolsø is a UNIX consultant and épée fencer who lives in Molde, Norway, and has been running Linux since 1996. Another of his hobbies can be found by doing a Google search for “balrog genealogy”, and he can be reached at ljcomment@tvilsom.org.