A Git Origin Story
A look at Linux kernel developers' various revision control solutions through the years, Linus Torvalds' decision to use BitKeeper and the controversy that followed, and how Git came to be created.
Originally, Linus Torvalds used no revision control at all. Kernel contributors would post their patches to the Usenet group, and later to the mailing list, and Linus would apply them to his own source tree. Eventually, Linus would put out a new release of the whole tree, with no division between any of the patches. The only way to examine the history of his process was as a giant diff between two full releases.
This was not because there were no open-source revision control systems available. CVS had been around since the 1980s, and it was still the most popular system around. At its core, it would allow contributors to submit patches to a central repository and examine the history of patches going into that repository.
There were many complaints about CVS though. One was that it tracked changes on a per-file basis and didn't recognize a larger patch as a single revision, which made it hard to interpret the past contributions of other developers. There also were some hard-to-fix bugs, like race conditions when two conflicting patches were submitted at the same time.
Linus didn't like CVS, partly for the same reasons voiced by others and partly for his own reasons that would become clear only later. He also didn't like Subversion, an open-source project that emerged around the start of the 2000s and had the specific goal of addressing the bugs and misfeatures in CVS.
Many Linux kernel developers were unhappy with the lack of proper revision control, so there always was a certain amount of community pressure for Linus to choose something from one of the available options. Then, in 2002, he did. To everyone's shock and dismay, Linus chose BitKeeper, a closed-source commercial system developed by the BitMover company, run by Larry McVoy.
The Linux kernel was the most important open-source project in history, and Linus himself was the person who first discovered the techniques of open-source development that had eluded the GNU project, and that would be imitated by open-source projects for decades to come, right up to the present day. What was Linus thinking? How could he betray his community and the Open Source world like this? Those were the thoughts of many when Linus first started using BitKeeper for kernel development.
Additionally, BitMover put significant restrictions on the Linux community in exchange for the no-cost license. First, Linux developers would not be allowed to work on competing revision control projects while using BitKeeper. Second, BitMover would control certain metadata related to the kernel project, in order to notice any abuse of the license. Without access to that metadata, kernel developers would be unable to compare past kernel versions, an important standard feature of other revision control systems.
The controversy did not die down, although Linus continued to rely on BitKeeper for years. His basic argument was that he was not a free software zealot. He would use open-source tools if they were better than their commercial counterparts. But if a commercial tool was better, he wouldn't turn his nose up.
Many kernel developers, however, were indeed free software zealots. The outrage and tension between Linus and the rest of the developers were intense, though not sufficient to fracture the community and cause an actual fork of the Linux kernel project. Certainly people like Alan Cox, Al Viro, David Miller, Andrea Arcangeli, Andrew Morton and a respectable number of others had the technical skills to lead a competing project, and perhaps some even had enough street cred to pull a significant number of kernel developers with them. But none did. The tension and hostility persisted.
What Was So Great about BitKeeper?
BitKeeper's main claim to fame was that it offered a distributed system, whereby whole repositories could be forked and merged easily. This was the key. With it, sub-groups of kernel developers could collaborate independently with the benefit of revision control and then feed their changes up to Linus when they were ready. This way, a large portion of the work that previously had been piled entirely onto Linus' shoulders could be distributed among his trusted lieutenants, or really among any group that chose to work together in that way. Architectures, drivers and subsystems all could be developed somewhat independently, and then each could be merged with the main kernel tree in one big gulp.
This all may be starting to sound very familiar, but in 2002, it was a new idea. Existing projects like CVS and Subversion could do forks and merges only as major, time-consuming operations that made you yearn for death. With BitKeeper, they became trivial operations.
Linus' willingness to use a proprietary piece of software at the very heart of the kernel development toolchain inspired a lot of people to try even harder to create an alternative. The CVS and Subversion projects were too far behind and had made too many fundamental design errors. The same was true of other existing projects. But now that everyone knew—or thought they knew—what Linus really wanted, they could start coding in earnest. The result was a number of revision control systems offering distributed development.
Among these were arch, darcs and monotone. Each took BitKeeper as the model to beat, and each represented an effort to present Linus with an open-source alternative to it.
Many tried, but none succeeded. This was partly because Linus would not fully articulate what he needed from any of those projects, just as he had not fully articulated what had been missing from CVS and Subversion. There was also the sense that Linus wasn't bothered by using a closed-source tool, and that for any alternative to be acceptable to him, it would have to be a significant technical improvement over BitKeeper. Thus, even though no open-source tool had been good enough before BitKeeper, the arrival of BitKeeper raised the bar even further for any open-source tool that might come along.
After three years of intense effort, none of the open-source alternatives were any closer than CVS or Subversion to meeting Linus' needs. And the situation might have continued far longer, if not for Andrew Tridgell, the creator of Samba, co-creator of rsync and all-around good-hearted guy. In 2005, Andrew tried to reverse-engineer the BitKeeper networking protocols in order to create a free software alternative. If it hadn't been him, it would've been someone else—it was only a matter of time. Larry McVoy had warned the Linux developers that he would pull the plug if anyone tried this, and that's exactly what he did. Suddenly, BitKeeper no longer could be used for kernel development. The entire development toolchain, and all the developer culture that had sprung up around distributed version control, was thrown into uncertainty.
What did this mean? Would Linus return to his old style of development, vetting all patches himself? If not, would he choose one of the open-source alternatives to BitKeeper? And if he did, which one would it be?
At this point, something remarkable occurred. For the first time since its inception in 1991, Linus stopped work entirely on the Linux kernel. Since none of the existing tools could do what he needed, he decided to write his own.
One of Linus' primary concerns, in fact, was speed. This was something he had never fully articulated before, or at least not in a way that existing projects could grasp. With thousands of kernel developers across the world submitting patches full-tilt, he needed something that could operate at speeds never before imagined. He couldn't afford to wait longer than a few seconds for even the largest and most complex operation to finish. Neither arch, nor darcs, nor monotone, nor any other project, had ever come close to meeting that requirement.
Linus coded in seclusion for a brief time, then shared his new conception with the world. Within days of beginning the project in April of 2005, Linus' git revision control system had become fully self-hosting. Within weeks, it was ready to host Linux kernel development. Within a couple of months, it reached full functionality. At this point, Linus turned the project's maintainership over to its most enthusiastic contributor, Junio C. Hamano, and returned full-time to Linux development once again.
A stunned community of free software developers struggled to understand this bizarre creation. It did not resemble any other attempt at revision control software. In fact, it seemed more like a bunch of low-level filesystem operations than a revision control system. And instead of storing patches as other systems did, it stored whole versions of each changed file. How could this possibly be good? On the other hand, it could handle forks and merges with lightning speed and could generate patches rapidly on demand.
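To make the "whole versions, patches on demand" idea concrete, here is a minimal Python sketch. It is not git's actual code. The one detail taken from git itself is how a blob's ID is computed: the SHA-1 of the header "blob <size>\0" followed by the file content. The ObjectStore class and the patch_between helper are hypothetical names invented for this illustration, and the in-memory dictionary is a simplification of git's on-disk object database.

```python
import difflib
import hashlib
import zlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Hypothetical in-memory stand-in for a content-addressed object store."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        # Store the whole file version, keyed by a hash of its content.
        # Identical content always maps to the same ID, so it is stored once.
        oid = blob_id(content)
        self._objects[oid] = zlib.compress(content)
        return oid

    def get(self, oid: str) -> bytes:
        return zlib.decompress(self._objects[oid])


def patch_between(store: ObjectStore, old_id: str, new_id: str, path: str) -> str:
    """Compute a unified diff on demand from two stored whole versions."""
    old = store.get(old_id).decode().splitlines(keepends=True)
    new = store.get(new_id).decode().splitlines(keepends=True)
    return "".join(difflib.unified_diff(old, new, "a/" + path, "b/" + path))


# Usage: store two full versions of a file, then derive the patch afterward.
store = ObjectStore()
v1 = store.put(b"int main(void) { return 0; }\n")
v2 = store.put(b"int main(void) { return 1; }\n")
print(patch_between(store, v1, v2, "main.c"))
```

Nothing in the store records a patch. The diff is derived only when someone asks for it, which is the inversion that made the design look so strange next to patch-oriented systems.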
Gradually, Junio drew together a set of higher-level commands that more closely resembled those of tools like CVS and Subversion. If the original set of git commands was the "plumbing", this new set of commands was the "porcelain". And so they came to be called.
As much as there had been controversy and resentment over BitKeeper, there was enthusiasm and participation in the further development of git. Ports, extensions and websites popped up all over the place. Within a few years, pretty much everyone used git. Like Linux, it had taken over the world.