Documenting Proper Git Usage
Jonathan Corbet wrote a document for inclusion in the kernel tree, describing best practices for merging and rebasing git-based kernel repositories. As he put it, it represented workflows that were actually in current use, and it was a living document that hopefully would be added to and corrected over time.
The inspiration for the document came from noticing how frequently Linus Torvalds was unhappy with how other people—typically subsystem maintainers—handled their git trees.
It's interesting to note that before Linus wrote the git tool, branching and merging was virtually unheard of in the Open Source world. In CVS, it was a nightmare horror of leechcraft and broken magic. Other tools were not much better. One of the primary motivations behind git—aside from blazing speed—was, in fact, to make branching and merging trivial operations—and so they have become.
One of the offshoots of branching and merging, Jonathan wrote, was rebasing—altering the patch history of a local repository. The benefits of rebasing are fantastic. They can make a repository history cleaner and clearer, which in turn can make it easier to track down the patches that introduced a given bug. So rebasing has a direct value to the development process.
On the other hand, used poorly, rebasing can make a big mess. For example, suppose you rebase a repository that has already been merged with another, and then merge them again—insane soul death.
So Jonathan explained some good rules of thumb. Never rebase a repository that's already been shared. Never rebase patches that come from someone else's repository. And in general, simply never rebase—unless there's a genuine reason.
Since rebasing changes the history of patches, it relies on a new "base" version, from which the later patches diverge. Jonathan recommended choosing a base version that was generally thought to be more stable rather than less—a new version or a release candidate, for example, rather than just an arbitrary patch during regular development.
Jonathan also recommended, for any rebase, treating all the rebased patches as new code, and testing them thoroughly, even if they had been tested already prior to the rebase.
"If", he said, "rebasing is limited to private trees, commits are based on a well-known starting point, and they are well tested, the potential for trouble is low."
Moving on to merging, Jonathan pointed out that nearly 9% of all kernel commits were merges. There were more than 1,000 merge requests in the 5.1 development cycle alone.
He pointed out in the doc that, although "many projects require that branches in pull requests be based on the current trunk so that no merge commits appear in the history", the kernel had no such requirement. Merges were considered perfectly orderly ways of doing business, and developers should not try to rebase their branches to avoid merges.
An interesting thing about kernel development is that the hierarchy of maintainership tends to favor a hierarchy of git repository maintainers. It's not uncommon for one or a few people to manage a branched kernel repository, and to have developers managing branches of that tree, with other developers in turn managing branches of those.
For mid-level maintainers, Jonathan pointed out, there are two relevant situations: merging a tree from lower in the hierarchy into your own and merging your own tree higher up toward Linus' top-level tree.
Jonathan recommended that for mid-level maintainers accepting merges from lower trees, maintainers not seek to hide the merge request and, in fact, should add a commit message or changelog entry, explaining the patches that went into the merge.
Jonathan also pointed out that the "Signed-Off-By" tags were crucial elements of commit messages that helped track responsibility as well as important debugging information. He suggested that all maintainers should continue to use them and to verify them when merging from other trees. Jonathan said, "Failure to do so threatens the security of the development process as a whole."
That advice referred to downstream trees, but Jonathan had some very interesting points to make about merging from upstream trees. This is when you're working on your tree, and you want to make sure you're up to date with the latest-and-greatest tree from Linus or someone close to him. Of course, doing so would make your own life slightly easier, because you'd be up to date, you could test your code against the tip of the tree, and so on. Still, Jonathan counseled against it.
For one thing, you could be bringing other people's bugs into your own tree, destabilizing your test code, and then you'd have the uncertainty of knowing that your code was actually solid and ready to submit further upstream.
Another temptation is to do a merge from the upstream source right before submitting your own merge request to ensure that your request won't encounter any conflicts. However, as Jonathan said, "Linus is adamant that he would much rather see merge conflicts than unnecessary back merges. Seeing the conflicts lets him know where potential problem areas are. He does a lot of merges (382 in the 5.1 development cycle) and has gotten quite good at conflict resolution—often better than the developers involved."
Instead, if you do notice a conflict that will show up when Linus does the merge, you should say something about it in the pull request, so Linus sees that you see the situation.
As a last resort, for particularly nutty cases, Jonathan said, you could create another branch, with your own conflict resolutions, and point Linus to that so he can see how you'd resolve the problems yourself. The pull request, however, should be for the unresolved branch.
Doing a test merge in that way is fine, he said. It helps you know if there will be any conflicts, so you can communicate better to the upstream maintainers.
He offered some more good advice and closed by saying:
The guidelines laid out above are just that: guidelines. There will always be situations that call out for a different solution, and these guidelines should not prevent developers from doing the right thing when the need arises. But one should always think about whether the need has truly arisen and be prepared to explain why something abnormal needs to be done.
And...Linus replied that he liked the whole doc.
David Rientjes from Google reported that he actually had been in the process of writing an internal doc for use by Google engineers, discussing this very topic. He was thrilled that Jonathan had done a better job explaining it than his own effort.
Geert Uytterhoeven also liked the new doc, and he offered some spelling and grammar corrections.
Only Theodore Ts'o had any significant criticism to offer. He felt a clear distinction should be made between reordering patches (which he felt was what most people thought of when they talked about rebasing), versus actually changing or removing commits that have already gone into the tree. Both were technically rebasing, yet both were really quite different operations.
Jonathan replied to this, suggesting that maybe the doc could refer separately to "rebasing" and "history modification". And, Ted agreed this would be better.
End of thread. I love seeing this sort of documentation go into the kernel. It's not the same as general-purpose git advice, because it's specific to kernel development processes and policies that are already in place. At the same time, it's potentially very useful to other large-scale projects that might want to mimic the Linux kernel development process. All open-source projects essentially mimic the kernel development process anyway—Linus is the one who first discovered and popularized the methods of how to run an open-source project—and there tends to be a lot of wisdom in his development policy decisions even now.
Note: if you're mentioned above and want to post a response above the comment section, send a message with your response text to ljeditor@linuxjournal.com.