You are viewing bramcohen

Sun, Apr. 4th, 2010, 09:06 pm
Rebasing

Rebasing is a controversial feature, with some people declaring it the greatest thing since sliced bread and others declaring it the spawn of the devil (okay, I exaggerate, but you get the idea). I will now set the record straight by explaining The Truth.

Rebase is a good feature.

But it's hard to implement properly on a version control system using the standard Hg/Git architecture.

There are a number of problems with the standard architecture, but the one important for rebasing is that it treats data like a roach motel - changes can get in, but they can't get out. Rebasing is fundamentally about removing some changes in a controlled way, and the closest you can get is to rewrite some history which never made it off the local machine and pretend it never happened.

I'll now describe an architecture which supports rebasing very well. Whether similar techniques can be used to add rebasing to the standard architecture will have to be left for another day.

First, we need a concept of a branch. A branch is a snapshot of the codebase which changes over time. For rebasing to work well, there needs to be a very strong notion of branches, and the simplest way to get that is to have a single centralized repository with a list of branches whose values change over time and whose old values are remembered. To have relationships between branches, we dispense completely with the complex notion of history which hash-based architectures have, and introduce a concept of 'parent'. Each version of each branch has a parent specified, although it could be null. A version of a branch represents two things simultaneously:

(1) A snapshot of the codebase

(2) A patch off of its parent

Individual branches are modified with CVS/svn style update/commit.
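To make the data model concrete, here is a minimal sketch in Python. All of the names (`Repository`, `Version`, `VersionRef`, `commit`, `latest`, `get`) are made up for illustration, and storing a snapshot as a dict of path to file contents is an assumption, not a description of any existing system. The point is only that every branch keeps its old values and every version records a parent, which may be null.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical reference to one version of one branch: (branch name, index).
VersionRef = Tuple[str, int]


@dataclass
class Version:
    snapshot: Dict[str, str]       # assumed representation: path -> file contents
    parent: Optional[VersionRef]   # None plays the role of the "null" parent


@dataclass
class Repository:
    # One centralized list of branches; each branch remembers all its old values.
    branches: Dict[str, List[Version]] = field(default_factory=dict)

    def commit(self, branch: str, snapshot: Dict[str, str],
               parent: Optional[VersionRef] = None) -> VersionRef:
        """Append a new version to a branch, CVS/svn style, and return its ref."""
        history = self.branches.setdefault(branch, [])
        history.append(Version(dict(snapshot), parent))
        return (branch, len(history) - 1)

    def latest(self, branch: str) -> VersionRef:
        """Ref of the newest version of a branch."""
        return (branch, len(self.branches[branch]) - 1)

    def get(self, ref: VersionRef) -> Version:
        """Look up a specific (possibly old) version of a branch."""
        branch, index = ref
        return self.branches[branch][index]


repo = Repository()
trunk0 = repo.commit("trunk", {"a.txt": "one\n"})
feature0 = repo.commit("feature", {"a.txt": "one\n", "b.txt": "new\n"}, parent=trunk0)
trunk1 = repo.commit("trunk", {"a.txt": "two\n"})
```

After this, `feature0` is still parented on the old value of trunk, `("trunk", 0)`, even though trunk itself has moved on to `("trunk", 1)`.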

Rebasing is now straightforward. You take a diff from the parent to the current snapshot, and apply that to the new parent. This can be done for any potential parent, including the latest version of the branch which the current parent is from, the latest version of another branch, and even older versions of other branches or the current parent. Any reparenting will propagate to all child branches, and changing the parent back will re-propagate nicely as well. This approach allows you to nicely specify what goes where, without having that roach motel feel of code committing.
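The diff-and-apply step can be sketched as follows. This is a deliberately crude illustration: the diff works on whole files rather than lines, a conflict simply raises an error, and the names `diff`, `apply_patch` and `rebase` are hypothetical. A real implementation would do a proper line-level merge.

```python
from typing import Dict, Optional, Tuple

Snapshot = Dict[str, str]                                 # path -> file contents
Patch = Dict[str, Tuple[Optional[str], Optional[str]]]    # path -> (before, after); None = absent


def diff(parent: Snapshot, current: Snapshot) -> Patch:
    """File-level diff: record every path whose contents differ."""
    patch: Patch = {}
    for path in set(parent) | set(current):
        before, after = parent.get(path), current.get(path)
        if before != after:
            patch[path] = (before, after)
    return patch


def apply_patch(base: Snapshot, patch: Patch) -> Snapshot:
    """Apply a file-level patch, refusing if the base changed the same file differently."""
    result = dict(base)
    for path, (before, after) in patch.items():
        existing = base.get(path)
        if existing != before and existing != after:
            raise ValueError(f"conflict in {path}")
        if after is None:
            result.pop(path, None)
        else:
            result[path] = after
    return result


def rebase(old_parent: Snapshot, current: Snapshot, new_parent: Snapshot) -> Snapshot:
    """Take the diff from the old parent to the current snapshot and apply it to the new parent."""
    return apply_patch(new_parent, diff(old_parent, current))


old_parent = {"a.txt": "one\n"}
current = {"a.txt": "one\n", "b.txt": "feature\n"}
new_parent = {"a.txt": "two\n"}

rebased = rebase(old_parent, current, new_parent)
restored = rebase(new_parent, rebased, old_parent)
```

Here `rebased` picks up the new parent's `a.txt` while keeping the branch's own `b.txt`, and rebasing back onto the old parent yields the original snapshot again, which is the "changing the parent back" property in miniature.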

There would be a number of practical benefits to this architecture beyond allowing nice rebasing, although writing a new version control system from scratch today seems like an extreme approach. In the future I'll comment about the possibility of using parents in Hg/Git, after I've given some thought to it.

Mon, Apr. 5th, 2010 03:32 pm (UTC)
agthorr

I used darcs for a long time (before switching to git around a year ago), and it works vaguely like you describe. In darcs, the repository is simply a collection of patches, and darcs is smart enough to calculate which patches depend on which other patches. It was very easy to get rid of certain patches, or create a new branch that included patches X and Z but not Y. Cherry-picking on steroids, basically. :-)

I switched to git because darcs became unbearably slow as my project (Poker Sleuth) became large.

Mon, Apr. 5th, 2010 06:18 pm (UTC)
elsmi

FYI, I can't quite work out what you're proposing exactly (is this the classic CVS model? or are branches basically a mutable list of parents so you can "erase" old parents? or is the idea that you can swap out one branch for another with the same name?). And I probably have more than the median understanding of this stuff so this might mean you should clarify some :-). (Maybe a worked example?)

Mon, Apr. 5th, 2010 09:05 pm (UTC)
bramcohen

Each branch acts almost like a classic CVS model. You can even implement it by having a '.parent' file with a single line in it. The .parent file can be changed just like everything else, but there are very specific guidelines for modifying and merging when it's different, having to do with actually looking at the snapshot which it points to.

Did that clarify things any? It really is a very different way of thinking about the world.