Thu, May. 22nd, 2008, 02:55 pm
Version Control Recommended Practices

It's been a while since I last posted my thoughts on version control. My thoughts have changed a lot over time, so I'm going to cover everything from the very highest level.

Here are my recommendations:

1. Don't use branches.

I'm serious. Creating lots of branches takes a lot of time and energy, and is usually a complete waste. If you have a good reason to use branches, use the minimum possible. In particular, creating a new branch for every new feature is ludicrous.

2. Don't bother with a pretty history.

The history of a branch is hardly ever looked at. Making it look pretty for the historians is just a waste of time. The beauty of 3-way merge is that you can always clean stuff up later and never worry about the past mess ever again. In particular, don't go to great lengths to make sure that there's a coherent local image of the entire repository exactly as it appeared on your local machine after every new feature. There are very rare projects which maintain a level of reliability and testing which warrants such behavior, and yours isn't one of them. Stop wanking.

3. Use 3-way merge.

This is mostly a dig at myself. I've spent a long time thinking about possible semantics of merging, and the upshot is that there is a way of supporting cherry-picking well in the underlying engine. Nobody's ever proposed a good UI for it, though, and I suspect that's because the feature is inherently confusing, and not such a hot idea for process reasons either.
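For anyone who hasn't seen the rule written down, here is a rough sketch of what 3-way merge decides. It is a simplification: it merges dictionaries key by key rather than text hunk by hunk, and the function name `three_way_merge` is made up for illustration, not taken from any tool.

```python
def three_way_merge(base, ours, theirs):
    """Merge two dicts that both descend from `base`, key by key.

    Returns (merged, conflicts). A key lands in `conflicts` when both
    sides changed it relative to `base` in different ways.
    """
    missing = object()  # stands for "key not present on this side"
    merged = {}
    conflicts = set()
    for key in set(base) | set(ours) | set(theirs):
        b = base.get(key, missing)
        o = ours.get(key, missing)
        t = theirs.get(key, missing)
        if o == t:
            result = o      # both sides agree (including both deleting)
        elif o == b:
            result = t      # only theirs changed it
        elif t == b:
            result = o      # only ours changed it
        else:
            conflicts.add(key)
            continue
        if result is not missing:
            merged[key] = result
    return merged, conflicts


base = {"a": 1, "b": 2, "c": 3}
ours = {"a": 1, "b": 20, "c": 3}
theirs = {"a": 1, "b": 2, "c": 30, "d": 4}
merged, conflicts = three_way_merge(base, ours, theirs)
```

The point of the base version is that it tells the merge which side actually changed something, so only changes made on both sides need a human.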

4. Use Bram's diff algorithm.

This is for tool developers, not for end users. It's fairly self-explanatory.
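The post doesn't spell the algorithm out. Assuming it refers to what is commonly called patience diff, the sketch below shows only its anchoring step: match the lines that occur exactly once in both files, then keep the longest run of matches that appears in the same order on both sides. A full implementation would go on to diff the gaps between anchors; the function name `unique_common_anchors` is hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def unique_common_anchors(a, b):
    """Return (index_in_a, index_in_b) pairs for lines unique in both
    sequences, restricted to a longest run that is in order on both sides."""
    count_a, count_b = Counter(a), Counter(b)
    pos_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    # Candidate matches, already sorted by position in a.
    pairs = [(i, pos_b[line]) for i, line in enumerate(a)
             if count_a[line] == 1 and line in pos_b]

    # Longest increasing subsequence over the b positions.
    tails = []                  # smallest b position ending a run of each length
    tail_idx = []               # index into pairs for each entry of tails
    prev = [None] * len(pairs)  # back-pointers for reconstructing the run
    for n, (_, j) in enumerate(pairs):
        k = bisect_left(tails, j)
        if k == len(tails):
            tails.append(j)
            tail_idx.append(n)
        else:
            tails[k] = j
            tail_idx[k] = n
        prev[n] = tail_idx[k - 1] if k > 0 else None

    # Walk the back-pointers from the end of the longest run.
    result = []
    n = tail_idx[-1] if tail_idx else None
    while n is not None:
        result.append(pairs[n])
        n = prev[n]
    return result[::-1]


anchors = unique_common_anchors(["x", "a", "b", "c"], ["a", "b", "x", "c"])
```

Anchoring on unique lines is what keeps repeated lines such as blank lines or lone closing braces from being matched up misleadingly.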

5. Treat stable branches as forks, don't auto-merge from them.

Once a stable branch has been separate for a while, the development branch has inevitably deviated too much for auto-merging to be accurate and useful. Most projects have a convention for marking bug fixes in commit comments on the development branch, then grepping for those comments in the history and manually trying to apply those patches. Tool support for this is sucky, and the world will be a more productive place if someone improves that, but the practice is clearly a good one.
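As a rough illustration of that convention, the sketch below filters a list of commit comments for a marker. The `BUGFIX` marker, the `(id, message)` tuple layout, and the function name `find_backport_candidates` are all assumptions for the example; every project picks its own convention, and the commit list would come from whatever your tool's log command produces.

```python
import re


def find_backport_candidates(commits, marker=r"\bBUGFIX\b"):
    """Return the (id, message) commits whose message matches `marker`.

    `commits` is assumed to be ordered oldest first, so the result is in
    the order the fixes should be tried against the stable branch.
    """
    pattern = re.compile(marker, re.IGNORECASE)
    return [(cid, msg) for cid, msg in commits if pattern.search(msg)]


history = [
    ("r101", "Add new export dialog"),
    ("r102", "BUGFIX: crash when the config file is empty"),
    ("r103", "Refactor parser"),
    ("r104", "bugfix: off-by-one in pagination"),
]
candidates = find_backport_candidates(history)
```

Each candidate still has to be applied to stable by hand, which is the part the tools don't help with.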

6. Don't spend time achieving a greater than usual level of stability on the main branch before branching off stable.

Forcing everyone to wait until stable is branched off to get any work done just wastes their time. It's far better to do the stable branch first, then work on extra stability on the stable branch while people continue to get new work done on the main branch.

7. If you do use branches, have them 'shadow' the main branch.

Branches which don't get merged regularly inevitably become forked forever. For a branch to continue to be useful it's necessary for it to pull from the main branch on a regular basis and have any resulting conflicts be resolved.

Good reasons to have a branch are if there's a feature which might turn out to be a bad idea, there's some code work which would cause too much instability if it were checked into the main branch before being complete, or there's a direct need to have a version of the codebase missing certain features. In all of these cases the branch must be maintained by pulling from the branch it's shadowing frequently in order to not die, and the branch should have its contents committed into main and stop being maintained as soon as is feasible.

Experimental features or ones written by a person of dubious coding skills in particular should only be a single branch. Some projects treat them as a zillion piecemeal little patches, and that's just painful and awful. Far better to give feedback on a branch, and commit the whole branch when it's at an acceptable level of stability. Note that experimental branches can have their own stable and unstable versions as well, with work going on on unstable while code improvement happens in stable. That's a good practice, although it's hardly ever done today.

8. Show all relevant history in annotate/blame view.

If a line of code was written by one person, then pulled into the main branch by another person, then the history should say when and by whom both of those events happened. If it was pulled into another branch by another person, that should be included as well. The semantics of a single line's history is that it was written once and then had a series of pull (or push, depending on how you think about it) operations performed on it as it was moved into other branches. None of the tools currently get this right.
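Those semantics can be sketched as a small data structure: one "written" event followed by any number of "pulled" events. This is only an illustration of the model described above, not how any existing tool stores history, and the class and method names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class LineEvent:
    kind: str    # "written" or "pulled"
    who: str
    when: str    # kept as a plain string to keep the sketch short
    branch: str


@dataclass
class LineHistory:
    text: str
    events: list = field(default_factory=list)

    def record_written(self, who, when, branch):
        if self.events:
            raise ValueError("a line is only written once")
        self.events.append(LineEvent("written", who, when, branch))

    def record_pulled(self, who, when, branch):
        if not self.events:
            raise ValueError("cannot pull a line that was never written")
        self.events.append(LineEvent("pulled", who, when, branch))

    def blame(self):
        """One entry per event, so the view shows the author and every pull."""
        return [f"{e.kind} by {e.who} on {e.when} ({e.branch})"
                for e in self.events]


line = LineHistory("x = compute()")
line.record_written("alice", "2008-05-01", "experimental")
line.record_pulled("bob", "2008-05-10", "main")
report = line.blame()
```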

9. If you have a good reason to branch off branches, do it right.

There are a few sophisticated techniques for branch management which are fairly safe and coherent from a process standpoint. These should only be used sparingly, but tool support for them is currently lousy and they're cool, so I'll explain them now.

First, it's possible to have a branch shadow another branch, like this:
   main
     |
 branch A
     |
 branch B

In this case, A is shadowing main, and B is shadowing A.

It's also possible under some circumstances to safely change the way the branch relationships work. For example, the previous case can be changed into this one, and vice versa:
        main
        /   \
 branch A   branch B

For those of you knowledgeable about the technical issues, the times when this can be done are when the LCA of the branch being moved and the one it's shadowing is already an ancestor of the current version of the new branch to shadow. If the reparenting isn't currently allowed, then some committing or updating can be done to make it allowed (although that might not be immediately advisable for process reasons).
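A minimal sketch of that check, assuming history is a DAG stored as a dict mapping each version to its list of parents. The function names and the example versions are hypothetical. Since a DAG can have more than one lowest common ancestor, the sketch requires all of them to be in the new parent's history, which is an assumption on top of the rule as stated.

```python
def ancestors(parents, version):
    """All versions reachable from `version` via parent links, itself included."""
    seen = set()
    stack = [version]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def lowest_common_ancestors(parents, x, y):
    """Common ancestors of x and y that are not ancestors of another common one."""
    common = ancestors(parents, x) & ancestors(parents, y)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


def can_reparent(parents, branch_head, old_parent_head, new_parent_head):
    """True if the branch can safely switch to shadowing the new parent."""
    lcas = lowest_common_ancestors(parents, branch_head, old_parent_head)
    return bool(lcas) and lcas <= ancestors(parents, new_parent_head)


# main is m1 -> m2 -> m3; A forked at m1 and later pulled m2 (a2);
# B forked from main at m2 and currently shadows main.
parents = {
    "m1": [], "m2": ["m1"], "m3": ["m2"],
    "a1": ["m1"], "a2": ["a1", "m2"],
    "b1": ["m2"],
}
after_update = can_reparent(parents, "b1", "m3", "a2")
before_update = can_reparent(parents, "b1", "m3", "a1")
```

In the example, B cannot be moved under A while A is still at a1, but once A has pulled m2 from main the move becomes allowed, which is the "some committing or updating" case.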

Version control system developers, please make your systems have the shadowing concept be built in from the ground up. And allow the safe forms of reparenting. And make sure that the branch relationships and their changes are kept in the history along with everything else.

Valid words used in this post which the spell checker didn't like: wanking, grepping, sucky, codebase, reparenting.

Fri, May. 23rd, 2008 04:33 am (UTC)
wisedonkey: #10

I propose a tenth item: Intermediate checkpoints.

It's rather annoying to not have intermediate checkpoints. The whole point of version control is to make sure your goofs on a particular edit can be backed out. Even then, there are certain situations where you don't want to check in a Major Revision when all you really want to do is see how a rewrite^W refactoring of some code affects things. On the fourth iteration you realize the second attempt was the best. There's no reason each of those should be a major checked-in version but you should be able to jump between any of those, at least before you do a major checkin.

/uses vesta

Fri, May. 23rd, 2008 04:50 am (UTC)
bramcohen: Re: #10

If you mean a feature where you locally checkpoint and can revert to the checkpoint, but all memory of the checkpoint is wiped out when you do a commit or update, then I can believe that some people would find that helpful, although it isn't a core feature. My own approach is much more straightforward. I just plan out what I'm going to do and then start typing, with very little experimentation and patch management. Most of the time such activities are so time consuming and wasteful as to be not worth it.