April 4th, 2010


Rebasing is a controversial feature, with some people declaring it the greatest thing since sliced bread and others declaring it the spawn of the devil (okay I exaggerate, but you get the idea). I will now set the record straight by explaining The Truth.

Rebase is a good feature.

But it's hard to implement properly on a version control system using the standard Hg/Git architecture.

There are a number of problems with the standard architecture, but the one important for rebasing is that it treats data like a roach motel - changes can get in, but they can't get out. Rebasing is fundamentally about removing some changes in a controlled way, and the closest you can get is to rewrite some history which never made it off the local machine and pretend it never happened.

I'll now describe an architecture which supports rebasing very well. Whether similar techniques can be used to add rebasing to the standard architecture will have to be left for another day.

First, we need a concept of a branch. A branch is a snapshot of the codebase which changes over time. For rebasing to work well, there needs to be a very strong notion of branches, and the simplest way to get that is to have a single centralized repository with a list of branches whose values change over time and whose old values are remembered. To have relationships between branches, we dispense completely with the complex notion of history which hash-based architectures have, and introduce a concept of 'parent'. Each version of each branch has a parent specified, although it could be null. A version of a branch represents two things simultaneously:

(1) A snapshot of the codebase

(2) a patch off of its parent

Individual branches are modified with CVS/svn style update/commit.

Rebasing is now straightforward. You take a diff from the parent to the current snapshot, and apply that to the new parent. This can be done for any potential parent, including the latest version of the branch which the current parent is from, the latest version of another branch, and even older versions of other branches or the current parent. Any reparenting will propagate to all child branches, and changing the parent back will re-propagate nicely as well. This approach allows you to nicely specify what goes where, without having that roach motel feel of code committing.

There would be a number of practical benefits to this architecture beyond allowing nice rebasing, although writing a new version control system from scratch today seems like an extreme approach. In the future I'll comment about the possibility of using parents in Hg/Git, after I've given some thought to it.