Thu, May. 22nd, 2008, 02:55 pm
Version Control Recommended Practices
It's been a while since I last posted my thoughts on version control. My thoughts have changed a lot over time, so I'm going to cover everything from the very highest level.
Here are my recommendations:
1. Don't use branches.
I'm serious. Creating lots of branches takes a lot of time and energy, and is usually a complete waste. If you have a good reason to use branches, use the minimum possible. In particular, creating a new branch for every new feature is ludicrous.
2. Don't bother with a pretty history.
The history of a branch is hardly ever looked at. Making it look pretty for the historians is just a waste of time. The beauty of 3-way merge is that you can always clean stuff up later and never worry about the past mess ever again. In particular, don't go to great lengths to make sure that there's a coherent local image of the entire repository exactly as it appeared on your local machine after every new feature. There are very rare projects which maintain a level of reliability and testing which warrant such behavior, and yours isn't one of them. Stop wanking.
3. Use 3-way merge
This is mostly a dig at myself - I've spent a long time thinking about possible semantics of merging, and the upshot is that there's a way of supporting cherry-picking well in the underlying engine, but nobody's ever proposed a good UI for it, and I suspect that's because the feature is inherently confusing and not such a hot idea for process reasons as well.
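The basic rule of 3-way merge fits in a few lines. This is a simplified sketch, not any tool's actual implementation. It assumes the common ancestor and both descendants have the same number of lines, aligned one-to-one with no insertions or deletions. Real merge tools get that alignment by first diffing each side against the ancestor. The name `merge3` is made up for illustration.

```python
from typing import List, Optional, Tuple

def merge3(base: List[str], ours: List[str], theirs: List[str]) -> Tuple[List[Optional[str]], List[int]]:
    """Merge two descendants of `base` line by line.

    Simplification: all three lists have the same length and lines are
    aligned by position. Returns the merged lines (None where there is a
    conflict) and the indices of the conflicting lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles aligned, equal-length inputs")
    merged: List[Optional[str]] = []
    conflicts: List[int] = []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree: changed identically, or not at all
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both changed it, differently: a human has to decide
            merged.append(None)
            conflicts.append(i)
    return merged, conflicts

base   = ["a", "b", "c"]
ours   = ["a", "B", "c"]
theirs = ["a", "b", "C"]
print(merge3(base, ours, theirs))  # (['a', 'B', 'C'], [])
```

Only the common ancestor enters the computation. The path each side took to get where it is plays no part, which is why old mess in the history doesn't affect later merges.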
4. Use Bram's diff algorithm.
This is for tool developers, not for end users. It's fairly self-explanatory.
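Bram's diff algorithm is the one usually called patience diff. The core idea works like this. First, match lines that occur exactly once in both files. Next, keep the longest run of those matches that appears in the same order in both files, found with patience sorting. Finally, recurse into the gaps between them, with identical leading and trailing lines matched directly. The following is a simplified sketch for illustration, with made-up function names. It assumes lines are hashable strings and leaves out the fallback that production implementations use when a gap contains no unique lines.

```python
from bisect import bisect_left
from collections import Counter
from typing import List, Optional, Tuple

Pair = Tuple[int, int]  # (index in a, index in b)

def unique_common(a, b, alo, ahi, blo, bhi) -> List[Pair]:
    """Pairs for lines occurring exactly once in a[alo:ahi] and exactly once in b[blo:bhi]."""
    ca, cb = Counter(a[alo:ahi]), Counter(b[blo:bhi])
    pos_b = {b[j]: j for j in range(blo, bhi)}
    return [(i, pos_b[a[i]]) for i in range(alo, ahi)
            if ca[a[i]] == 1 and cb[a[i]] == 1]

def longest_increasing(pairs: List[Pair]) -> List[Pair]:
    """Longest subsequence of `pairs` (sorted by a-index) with increasing b-index, via patience sorting."""
    tails: List[int] = []     # tails[k]: index of the pair ending the best run of length k+1
    tail_js: List[int] = []   # b-index of that pair, kept sorted for binary search
    prev: List[Optional[int]] = [None] * len(pairs)
    for idx, (_, j) in enumerate(pairs):
        k = bisect_left(tail_js, j)
        if k > 0:
            prev[idx] = tails[k - 1]
        if k == len(tails):
            tails.append(idx)
            tail_js.append(j)
        else:
            tails[k] = idx
            tail_js[k] = j
    run: List[Pair] = []
    idx = tails[-1] if tails else None
    while idx is not None:
        run.append(pairs[idx])
        idx = prev[idx]
    return run[::-1]

def patience_matches(a, b, alo=0, ahi=None, blo=0, bhi=None) -> List[Pair]:
    """Matched line pairs between a and b, in increasing order."""
    ahi = len(a) if ahi is None else ahi
    bhi = len(b) if bhi is None else bhi
    head: List[Pair] = []
    while alo < ahi and blo < bhi and a[alo] == b[blo]:          # identical leading lines
        head.append((alo, blo))
        alo, blo = alo + 1, blo + 1
    tail: List[Pair] = []
    while alo < ahi and blo < bhi and a[ahi - 1] == b[bhi - 1]:  # identical trailing lines
        ahi, bhi = ahi - 1, bhi - 1
        tail.append((ahi, bhi))
    middle: List[Pair] = []
    i0, j0 = alo, blo
    for i, j in longest_increasing(unique_common(a, b, alo, ahi, blo, bhi)):
        middle += patience_matches(a, b, i0, i, j0, j)  # recurse into the gap before this anchor
        middle.append((i, j))
        i0, j0 = i + 1, j + 1
    if middle:
        middle += patience_matches(a, b, i0, ahi, j0, bhi)  # gap after the last anchor
    return head + middle + tail[::-1]

print(patience_matches(["a", "b", "c"], ["a", "x", "c"]))  # [(0, 0), (2, 2)]
```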
5. Treat stable branches as forks, don't auto-merge from them.
Once a stable branch has been separate for a while, the development branch has inevitably deviated too much for auto-merging to be accurate and useful. Most projects have a convention for marking bug fixes in commit comments on the development branch, then grepping for those comments in the history and manually trying to apply those patches. The tool support for this is sucky, and the world will be a more productive place if someone improves that, but the practice is clearly a good one.
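You can sketch the grepping step as a filter over commit messages. The `BUGFIX:` tag and the `Commit` record are hypothetical stand-ins for whatever convention a project actually uses. The output is only a list of candidates for a person to port by hand, which matches the point above that these patches get applied manually rather than auto-merged.

```python
import re
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass
class Commit:
    id: str
    message: str

# Hypothetical convention: fixes on the development branch carry a "BUGFIX: <bug id>" line.
BUGFIX_TAG = re.compile(r"^BUGFIX:\s*(\S+)", re.MULTILINE)

def fixes_to_port(log: List[Commit], already_ported: Set[str]) -> List[Tuple[str, str]]:
    """Return (bug id, commit id) for tagged fixes not yet applied to the stable fork."""
    candidates = []
    for commit in log:
        for bug_id in BUGFIX_TAG.findall(commit.message):
            if bug_id not in already_ported:
                candidates.append((bug_id, commit.id))
    return candidates

log = [
    Commit("c1", "Add new export format"),
    Commit("c2", "Fix crash on empty file\nBUGFIX: 1234"),
    Commit("c3", "BUGFIX: 1240 off-by-one in pager"),
]
print(fixes_to_port(log, already_ported={"1240"}))  # [('1234', 'c2')]
```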
6. Don't spend time achieving a greater than usual level of stability on the main branch before branching off stable
Forcing everyone to wait until stable is branched off to get any work done just wastes their time. It's far better to do the stable branch first, then work on extra stability on the stable branch while people continue to get new work done on the main branch.
7. If you do use branches, have them 'shadow' the main branch.
Branches which don't get merged regularly inevitably become forked forever. For a branch to continue to be useful it's necessary for it to pull from the main branch on a regular basis and have any resulting conflicts be resolved.
Good reasons to have a branch are if there's a feature which might turn out to be a bad idea, there's some code work which would cause too much instability if it were checked into the main branch before being complete, or there's a direct need to have a version of the codebase missing certain features. In all of these cases the branch must be maintained by frequently pulling from the branch it's shadowing in order to not die, and the branch should have its contents committed into main and stop being maintained as soon as is feasible.
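A shadowing branch has pulled everything from the branch it shadows exactly when that branch's head is an ancestor of (or equal to) its own head. Below is a sketch of that check, a tool could run it to nag about branches that are drifting toward becoming forks. It assumes the history is available as a hypothetical `parents` map from commit id to parent ids.

```python
from typing import Dict, List, Set

def ancestors(parents: Dict[str, List[str]], commit: str) -> Set[str]:
    """All commits reachable from `commit`, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

def needs_pull(parents: Dict[str, List[str]], branch_head: str, shadowed_head: str) -> bool:
    """True if the shadowed branch has commits this branch hasn't pulled yet."""
    return shadowed_head not in ancestors(parents, branch_head)

# main: m1 <- m2 <- m3; the feature branch started at m2 and hasn't pulled since
parents = {"m2": ["m1"], "m3": ["m2"], "f1": ["m2"], "f2": ["f1"]}
print(needs_pull(parents, "f2", "m3"))  # True: m3 hasn't been pulled yet
parents["f3"] = ["f2", "m3"]            # merge commit pulling main into the branch
print(needs_pull(parents, "f3", "m3"))  # False
```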
Experimental features, or ones written by a person of dubious coding skills, in particular should each be developed on a single branch. Some projects treat them as a zillion piecemeal little patches, and that's just painful and awful. Far better to give feedback on a branch, and commit the whole branch when it's at an acceptable level of stability. Note that experimental branches can have their own stable and unstable versions as well, with new work going on in unstable while code improvement happens in stable. That's a good practice, although it's hardly ever done today.
8. Show all relevant history in annotate/blame view
If a line of code was written by one person, then pulled into the main branch by another person, then the history should say when and by whom both of those events happened. If it was pulled into another branch by another person, that should be included as well. The semantics of a single line's history is that it was written once and then had a series of pull (or push, depending on how you think about it) operations performed on it as it was moved into other branches. None of the tools currently get this right.
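Here is a sketch of the data an annotate view would need under this model: one "written" event per line, plus the list of pulls that carried it between branches. The record types and field names are hypothetical, chosen only to show that the view is a write followed by a sequence of pulls.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Pull:
    who: str
    when: str          # a date string, kept simple for the sketch
    into_branch: str

@dataclass
class LineHistory:
    text: str
    author: str
    written: str
    branch: str
    pulls: List[Pull] = field(default_factory=list)

    def annotate(self) -> str:
        """Who wrote the line and where, then every pull that moved it."""
        parts = [f"written by {self.author} on {self.branch} ({self.written})"]
        parts += [f"pulled into {p.into_branch} by {p.who} ({p.when})" for p in self.pulls]
        return "; ".join(parts)

line = LineHistory("return x + 1", "alice", "2008-05-01", "feature-x",
                   [Pull("bob", "2008-05-10", "main"), Pull("carol", "2008-05-12", "stable")])
print(line.annotate())
```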
9. If you have a good reason to branch off branches, do it right.
There are a few sophisticated techniques for branch management which are fairly safe and coherent from a process standpoint. These should only be used sparingly, but tool support for them is currently lousy and they're cool so I'll explain them now.
First, it's possible to have a branch shadow another branch, like this:
In this case, A is shadowing main, and B is shadowing A.
It's also possible under some circumstances to safely change the way the branch relationships work. For example, the previous case can be changed into this one, and vice versa:
[Diagram: branch A and branch B]
For those of you knowledgeable about the technical issues, this can be done when the LCA (lowest common ancestor) of the branch being moved and the branch it's currently shadowing is already an ancestor of the current version of the new branch to shadow. If the reparenting isn't currently allowed, then some committing or updating can be done to make it allowed (although that might not be immediately advisable for process reasons).
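That condition can be checked directly on the commit graph. This is a sketch, not any tool's implementation, and it assumes a hypothetical `parents` map from commit id to parent ids. It takes "LCA" to mean the set of lowest common ancestors, since criss-cross merges can produce more than one, and it requires every one of them to be an ancestor of the new branch's head. The example moves B from shadowing A to shadowing main.

```python
from typing import Dict, List, Set

def ancestors(parents: Dict[str, List[str]], commit: str) -> Set[str]:
    """All commits reachable from `commit`, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

def lowest_common_ancestors(parents: Dict[str, List[str]], x: str, y: str) -> Set[str]:
    """Common ancestors of x and y that aren't strict ancestors of another common ancestor."""
    common = ancestors(parents, x) & ancestors(parents, y)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}

def can_reparent(parents, moving_head: str, old_parent_head: str, new_parent_head: str) -> bool:
    """Allowed when the LCA(s) of the moving branch and its current parent are
    already ancestors of the new parent's head."""
    lcas = lowest_common_ancestors(parents, moving_head, old_parent_head)
    return bool(lcas) and lcas <= ancestors(parents, new_parent_head)

# main: m1 <- m2 <- m3; A forked from main at m2 (a1, a2); B forked from A at a1 (b1)
parents = {"m2": ["m1"], "m3": ["m2"], "a1": ["m2"], "a2": ["a1"], "b1": ["a1"]}
print(can_reparent(parents, "b1", "a2", "m3"))  # False: a1 isn't in main yet
parents["m4"] = ["m3", "a1"]                    # main pulls A up to a1
print(can_reparent(parents, "b1", "a2", "m4"))  # True
```

If the check fails, the second half of the example shows the "some committing or updating" fix: once main pulls from A far enough, the move becomes allowed.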
Version control system developers, please make your systems have the shadowing concept be built in from the ground up. And allow the safe forms of reparenting. And make sure that the branch relationships and their changes are kept in the history along with everything else.
Valid words used in this post which the spell checker didn't like: wanking, grepping, sucky, codebase, reparenting.
Fri, May. 23rd, 2008 02:33 am (UTC)
10. Use darcs? :)
Fri, May. 23rd, 2008 03:58 am (UTC)
No. What I'm advocating here is that rather than the tools having very powerful features, they should have features which support the simple things well. There's still plenty of stuff I'm asking for which doesn't have proper tools support.
Fri, May. 23rd, 2008 04:56 am (UTC)
Ah, I see. Fair enough!
Fri, May. 23rd, 2008 04:33 am (UTC)
wisedonkey : #10
I propose a tenth item: Intermediate checkpoints.
It's rather annoying to not have intermediate checkpoints. The whole point of version control is to make sure your goofs on a particular edit can be backed out. Even then, there are certain situations where you don't want to check in a Major Revision when all you really want to do is see how a rewrite^W refactoring of some code affects things. On the fourth iteration you realize the second attempt was the best. There's no reason each of those should be a major checked-in version, but you should be able to jump between any of them, at least before you do a major checkin.
Fri, May. 23rd, 2008 04:50 am (UTC)
bramcohen : Re: #10
If you mean a feature where you locally checkpoint and can revert to the checkpoint, but when you do a commit or update all memory of the checkpoints is wiped out, then I can believe that some people would find that helpful, although it isn't a core feature. My own approach is much more straightforward. I just plan out what I'm going to do and then start typing, with very little experimentation and patch management. Most of the time such activities are so time-consuming and wasteful as not to be worth it.
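The feature as described could be sketched like this. Checkpoints exist only locally, you can jump to any of them, and a commit or update wipes them out. `Workspace` and its methods are hypothetical, file contents are modeled as a plain dict, and `update` naively overwrites with incoming changes instead of merging.

```python
from typing import Dict, List

class Workspace:
    """Working copy with throwaway local checkpoints (hypothetical API)."""

    def __init__(self, files: Dict[str, str]):
        self.files = dict(files)
        self._checkpoints: List[Dict[str, str]] = []

    def checkpoint(self) -> int:
        """Snapshot the current files; returns the checkpoint's number."""
        self._checkpoints.append(dict(self.files))
        return len(self._checkpoints) - 1

    def revert(self, n: int) -> None:
        """Jump to any saved checkpoint, earlier or later than the current state."""
        self.files = dict(self._checkpoints[n])

    def commit(self) -> Dict[str, str]:
        """Commit the current state; all memory of the checkpoints is wiped out."""
        self._checkpoints.clear()
        return dict(self.files)

    def update(self, incoming: Dict[str, str]) -> None:
        """Take others' changes (naive overwrite in this sketch); also wipes checkpoints."""
        self.files.update(incoming)
        self._checkpoints.clear()

ws = Workspace({"util.py": "v0"})
for attempt in ["v1", "v2", "v3", "v4"]:
    ws.files["util.py"] = attempt
    ws.checkpoint()
ws.revert(1)        # the second attempt turned out best
print(ws.commit())  # {'util.py': 'v2'}
```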
Fri, May. 23rd, 2008 06:07 am (UTC)
ibsulon : For projects of how many people?
I don't much have an argument about the rest, but no branches? This makes peer review and code commits by multiple people very difficult.
Let me suggest an alternative view. Branch for every feature. Commit to that branch every night; it's the developer's equivalent of saving in Word. Then, when the feature is complete, the development process mandates that another developer look at your code for potential issues. After that, the change is committed.
Happiness is being able to check out the build at any time and being able to compile it and it just work.
I agree that it shouldn't be branched off for more than two weeks, though. Any more and you're trying to bite off more than you can chew. Split up the problem.
Fri, May. 23rd, 2008 11:46 pm (UTC)
bramcohen : Re: For projects of how many people?
Any project where the main build is routinely broken has very severe process problems, and using a version control system to get around that is a bit of a crutch. Before committing, developers should always update, build, and run whatever tests there are, then commit. That's the very most basic level of version control process, which I assumed everybody knows already.
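The update, build, test, commit sequence amounts to a gate that stops at the first failing step. In this sketch the steps are hypothetical callables standing in for whatever commands a project really runs (for example svn update followed by its build and test scripts). Passing them in keeps the sketch from calling any real tool.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]  # (name, action returning True on success)

def safe_commit(steps: List[Step], commit: Callable[[], None]) -> str:
    """Run update/build/test in order; commit only if every step succeeds."""
    for name, run in steps:
        if not run():
            return f"not committed: {name} failed"
    commit()
    return "committed"

committed = []
steps = [("update", lambda: True), ("build", lambda: True), ("tests", lambda: False)]
print(safe_commit(steps, lambda: committed.append("rev")))  # not committed: tests failed
```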
Code reviews can be implemented using branches, although they're very short-lived branches. I've never used code reviews extensively, although my own inclination would be to have a process where one person commits their changes to the trunk, then another person reviews those changes after they're already in. If code review frequently results in the code getting rejected instead of just tweaked or bug fixed, there are much bigger issues.
Fri, May. 23rd, 2008 08:27 am (UTC)
We use branch-per-bug development at work and it's been wonderful, both for source review and for delaying the decision on which features to include in a release based on what is actually ready to go.
Fri, May. 23rd, 2008 11:48 pm (UTC)
Why do you have so many half-chewed features lying around? My general recommendation is to get some features done completely and have those all be in the main branch. That way possible interdependencies or conflicts between features don't cause any issues, and there's no cognitive load associated with keeping track of what's in what branch.
Mon, Jun. 9th, 2008 12:04 pm (UTC)
You probably have more flexibility in your live dates than we do. We have to be able to roll a working version out by the end of each timebox. This way we don't have to correctly estimate in advance what features we can do by then; we just try and get as much done as we can in the time available, and include only those features which are ready to go by that time.
But we've also found branch-per-bug invaluable for source review - you can really look at what a change will do before you merge it in. Our previous source review procedures were more haphazard and I had less confidence in them. What have you used?
Thu, Jun. 12th, 2008 10:10 pm (UTC)
I've used CVS, Subversion, and Codeville. All of them work okay if you maintain a single trunk and nothing else. Subversion has completely reasonable support for forking off release branches. I found Codeville's advanced functionality painful to use because of all the manual bookkeeping of branches, which is why I have all these thoughts about how the version control system should remember the relationships between branches for you.
I haven't done much code review, but would be more inclined to do it by assigning particular commits to particular people for review/signoff. Doing that with tool support would be vastly simpler than doing it without, and I don't know the current tools very well, so perhaps none of them do what I'd like, but it seems like all the mucking around with branches is a lot more cognitive overhead than necessary for maintaining what is essentially one bit of information for every commit.
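The state that post-commit review needs is small: who is assigned to each commit, plus one signed-off bit. `ReviewLog` below is a hypothetical sketch of that bookkeeping, not a feature of any existing tool.

```python
from typing import Dict, List, Set

class ReviewLog:
    """Post-commit review: an assigned reviewer and a signed-off bit per commit."""

    def __init__(self) -> None:
        self.reviewer: Dict[str, str] = {}  # commit id -> assigned reviewer
        self.signed_off: Set[str] = set()   # the "one bit" per commit

    def assign(self, commit_id: str, reviewer: str) -> None:
        self.reviewer[commit_id] = reviewer

    def sign_off(self, commit_id: str, reviewer: str) -> None:
        if self.reviewer.get(commit_id) != reviewer:
            raise PermissionError(f"{commit_id} is not assigned to {reviewer}")
        self.signed_off.add(commit_id)

    def pending(self, reviewer: str) -> List[str]:
        """Commits already in trunk that still await this reviewer's signoff."""
        return [c for c, who in self.reviewer.items()
                if who == reviewer and c not in self.signed_off]

log = ReviewLog()
log.assign("r101", "alice")
log.assign("r102", "alice")
log.assign("r103", "bob")
log.sign_off("r101", "alice")
print(log.pending("alice"))  # ['r102']
```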
Fri, Jul. 18th, 2008 11:16 pm (UTC)
tarrant3000 : Git works fine
We've been using the branch-per-bugfix approach for a couple of weeks now, and it works quite well.
The developer creates a branch with the name fix/[bugid], and those fixes can be cherry-picked into an integration branch (one for qa, perhaps one for uat, another for launch prep, etc.).
Eventually the fix branch makes its way out to master, which essentially represents what has gone live.
Periodically we prune the branches by deleting all of them. Git's safe delete (git branch -d, run from master) will refuse to delete any branch that has changes that are not in master, so we safely avoid accidental code deletions while also keeping the branch list very clean.
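Git's own check has more cases, but the pruning rule relied on here can be modeled simply: a branch is safe to delete once its head is reachable from master. This sketch uses a hypothetical `parents` map and branch table. A fix that reached master only through cherry-picking becomes a new commit with a different id, so its branch still counts as unmerged under this rule.

```python
from typing import Dict, List, Set

def reachable(parents: Dict[str, List[str]], head: str) -> Set[str]:
    """All commits reachable from `head`, including itself."""
    seen: Set[str] = set()
    stack = [head]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

def prune(branches: Dict[str, str], parents: Dict[str, List[str]], master: str = "master") -> List[str]:
    """Delete every branch whose head is already reachable from master; return deleted names."""
    merged = reachable(parents, branches[master])
    deleted = [name for name, head in branches.items()
               if name != master and head in merged]
    for name in deleted:
        del branches[name]
    return deleted

# fix/101 has been merged into master (m3); fix/102 has not
parents = {"m2": ["m1"], "f1": ["m1"], "g1": ["m1"], "m3": ["m2", "f1"]}
branches = {"master": "m3", "fix/101": "f1", "fix/102": "g1"}
print(prune(branches, parents))  # ['fix/101']
print(sorted(branches))          # ['fix/102', 'master']
```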
As a result of this, we can have individual changes launched in any order, with an arbitrary number of test/integration branches, run by an arbitrary number of integrators.
Fri, May. 23rd, 2008 05:37 pm (UTC)
This is a prime example of why they are (or should be) payin' you the big bucks.
Fri, May. 23rd, 2008 06:52 pm (UTC)
if branches aren't cheap, that's a problem with your vcs. cheap branching is why i'm stuck on git, problems and all.
Fri, May. 23rd, 2008 10:22 pm (UTC)
I can't agree more. I used to think of branches as a bug, for lack of a better word. After switching to git, I see branches as a feature and use them regularly, since branching is as trivial as committing.
Fri, May. 23rd, 2008 11:50 pm (UTC)
I'm arguing against their usage for process reasons. Yes, the version control system should support branches cheaply, but even if the tool supports making lots of branches, actually doing so usually doesn't speed up development. In fact it can actively slow it down.
Sat, May. 24th, 2008 10:30 am (UTC)
I think some of these things depend on what situation you are in, and how big the code-base is.
I mainly disagree on "Don't stabilize before branching for release", from practical experience.
The 'stabilizing stuff' you put in should also be merged into the main branch anyway. But then you end up with the problem you discussed before, where the stable branch and mainline drift so far apart that you can't merge the fixes back in.
Programmers don't like being forced to do 'boring' book-work cleanup and stabilisations. Everyone wants to write exciting new features. Therefore I've found saying "Nothing new till we've cleaned up the bug database" every 6 months or so is the only way of forcing them to clean up the code base a little :)
Sat, May. 24th, 2008 04:55 pm (UTC)
The stabilize before branching thing is trivial if you have a single well-disciplined programmer. As you increase the number of programmers and decrease their level of discipline it becomes increasingly necessary to branch first. Even with good discipline, I wouldn't ever try to stabilize before branching with more than ten programmers on a project, just because some of them will inevitably sit around doing nothing during the stabilization process.
It does make sense for the main branch to shadow the stable branch for a while until the divergence gets too big. That requires tool support though, and it's a detail I skipped over, perhaps inadvisably, in the interests of keeping my suggestions reasonably short and coherent.
Mon, May. 26th, 2008 05:25 am (UTC)
colonel_nikolai : Amen, Brutha
Wow, it's like we share the same mind. My only nit is that as I reach #6 and beyond, I start to get nervous. But then you seem to say those are more edge cases and should be used carefully if at all, and I totally agree.
Source code control interaction should be like breathing: I never want to see a developer on my team "think" when they have to deal with it. When I hear things like "features" on branches and using branches for "code review", I kinda giggle, like, what are we doing here? Playing with source code control systems or writing software?
Every commit should constitute a fully running system, minus x number of features prior to a release. This means all tests pass (You are doing test driven development, right?) and I can deploy the code for the customer right there, every time. Once you adopt this, you test a little, code a little, commit, repeat: Breathe in, breathe out, repeat. THAT'S ALL.
I want the developer's brainpower reserved for writing tests to drive out great code, not fixing bugs that are a result of "Junk on the Trunk Syndrome" or "My branch of this branch of trunk is not merge-able anymore" syndrome.
Thu, Jan. 29th, 2009 05:50 pm (UTC)
manersoni : Breathing easy with Accurev
I see you are not familiar with Accurev version control, and probably aren't familiar with how streams vs. branches make code review so much more natural a part of the process. I would try to get my hands on a free copy to try it out so you don't have to play around with branches any more. It's friggin awesome!
Sat, Feb. 14th, 2009 11:18 pm (UTC)
Looking forward to more reports about it. I have a gut feeling there's something missing in accurev.