
Sun, Apr. 17th, 2011, 06:56 pm
Git Can't Be Made Consistent

This post complains about Git lacking eventual consistency. I have a little secret for you: Git can't be made to have eventual consistency. Everybody seems to think the problem is a technical one, of complexity vs. simplicity of implementation. They're wrong. The problem is semantics. Git follows the semantics which you want 99% of the time, at the cost of having some edge cases which it's inherently just plain broken on.

When you make a change in Git (and Mercurial) you're essentially making the following statement:

This is the way things are now. Forget whatever happened in the past, this is what matters.


Which is subtly and importantly different from what a lot of people assume it should be:

Add this patch to the corpus of all changes which have ever been made, and are what defines the most recent version.


The example linked above has a lot of extraneous confusing stuff in it. Here's an example which cuts through all the crap:

  A
 / \
B   B
|
A


In this example, one person changed a file's contents from A to B, then back to A, while someone else changed A to B and left it that way. The question is: What to do when the two heads are merged together? The answer deeply depends on the assumed semantics of what the person meant when they reverted back to A. Either they meant 'oops I shouldn't have committed this code to this branch' or they meant 'this was a bad change, delete it forever'. In practice people mean the former the vast majority of the time, and its later effects are much more intuitive and predictable. In fact it's generally a good idea to make the separate branch with the change to B at the same time as the reversion to A is done, so further development can be done on that branch before being merged back in later. So the preferred answer is that it should clean merge to B, the way 3 way merge does it.
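To make the 3 way merge rule concrete, here's a minimal Python sketch that treats the whole file as a single value. This is only an illustration of the semantics, not how Git is implemented (real merges work on lines and hunks), and the function name three_way_merge is made up for this example.

```python
def three_way_merge(base, ours, theirs):
    """Whole-file 3 way merge: a side that matches the base made no net change."""
    if ours == theirs:
        return ours      # both sides agree, nothing to decide
    if ours == base:
        return theirs    # only their side changed relative to the base
    if theirs == base:
        return ours      # only our side changed relative to the base
    raise ValueError("conflict: both sides changed in different ways")


# One side went A -> B -> A, the other went A -> B; the common ancestor is A.
result = three_way_merge(base="A", ours="A", theirs="B")
print(result)  # B
```

The merge only looks at the three endpoints, so the round trip through B on the reverting side is invisible to it: that side looks unchanged, and the other side's B wins.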

Unfortunately, this decision comes at significant cost. The biggest problem is that it inherently gives up on implicit cherry-picking. I came up with some magic merge code which allowed you to cut and paste small sections of code between branches, and the underlying version control system would simply figure out what you were up to and make it all work, but nobody seemed much interested in that functionality, and it unambiguously forced the merge result in this case to be A.

A smaller problem, but one which seems to perturb people more, is that there are some massively busted edge cases. The worst one is this:

  A
 / \
B   B
|   |
A   A


Obviously in this case both sides should clean merge to A, but what if people merge like this?

  A
 / \
B   B
|\ /|
A X A
|/ \|


Because of the cases we just went over, they should clean merge to B. What if they are then merged with each other? Since both sides are the same, there's only one thing they can merge to: B

  A
 / \
B   B
|\ /|
A X A
|/ \|
B   B
 \ /
  B


Hey, where'd the A go? Everybody reverted their changes from B back to A, and then via the dark magic of merging the B came back out of the ether, and no amount of further merging will get rid of it again!
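If you want to watch the B come back, here's a rough Python simulation of that history. It's a toy model under loud assumptions: each commit holds one whole-file value, merge bases are the common ancestors that aren't ancestors of another common ancestor, and the names Commit, ancestors, merge_bases and merge are all invented for this sketch rather than taken from Git.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str
    content: str
    parents: tuple = ()


def ancestors(commit):
    """Map name -> commit for everything reachable from commit, itself included."""
    seen = {}
    stack = [commit]
    while stack:
        cur = stack.pop()
        if cur.name not in seen:
            seen[cur.name] = cur
            stack.extend(cur.parents)
    return seen


def merge_bases(a, b):
    """Common ancestors that are not strict ancestors of another common ancestor."""
    theirs = ancestors(b)
    common = {n: c for n, c in ancestors(a).items() if n in theirs}
    return [c for n, c in common.items()
            if not any(n != m and n in ancestors(o) for m, o in common.items())]


def merge(name, ours, theirs):
    """Whole-file 3 way merge of two heads, producing a new merge commit."""
    if ours.content == theirs.content:
        result = ours.content
    else:
        bases = {c.content for c in merge_bases(ours, theirs)}
        if len(bases) != 1:
            raise ValueError("ambiguous merge base")
        (base,) = bases
        if ours.content == base:
            result = theirs.content
        elif theirs.content == base:
            result = ours.content
        else:
            raise ValueError("conflict")
    return Commit(name, result, (ours, theirs))


root = Commit("root", "A")
l1 = Commit("l1", "B", (root,))   # left:  A -> B
l2 = Commit("l2", "A", (l1,))     # left:  B -> A (revert)
r1 = Commit("r1", "B", (root,))   # right: A -> B
r2 = Commit("r2", "A", (r1,))     # right: B -> A (revert)

direct = merge("direct", l2, r2)          # merging the two reverted heads
m_left = merge("m_left", l2, r1)          # left head pulls right's older B
m_right = merge("m_right", r2, l1)        # right head pulls left's older B
final = merge("final", m_left, m_right)   # then the two results are merged

print(direct.content)                                   # A
print(m_left.content, m_right.content, final.content)   # B B B
```

Merging the two reverted heads directly gives A, but each criss-cross merge sees the original A as its base, treats the reverted side as unchanged, and takes the other side's B. Once both heads hold B there's nothing left for a merge to disagree about.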

The solution to this problem in practice is Don't Do That. Having multiple branches which are constantly pulling in each other's changes at a slight lag is bad development practice anyway, so people treat their version control system nicely and cross their fingers that the semantic tradeoff they made doesn't ever cause problems.

Mon, Apr. 18th, 2011 08:42 am (UTC)
MrJoy: Re: You are missing the point.

Whenever I conduct a merge, I DO use three-way merge tools, but I have learned to ALWAYS ALWAYS ALWAYS sanity-check the conclusions it reaches, hunk-by-hunk.

Except of course in the rare case I want to close out a branch outright and denote that in the history, and then I do a "git merge -s ours" to denote it as such.

You may also find "git rerere" relevant when dealing with repetitious merges that might otherwise get history confused.

However, ultimately, when I find such situations arise I usually conclude it is time to start reminding people that Git is about "patches as communication", and encourage them to act accordingly. I invariably see a decrease in ill-conceived and ill-executed merges that may or may not be doing what people *expect*, a reduction in spurious commits that only communicate minutiae of an individual's particular workflow (and not meaningful information about what the individual set out to *achieve*), and so forth.

Problems like the one you describe are a symptom of people abdicating their responsibility as engineers to The Machine. Now, I am perfectly happy to let machines do for me what they can do better than I (garbage collection + references > pointers, for example), but when an engineer accepts changes into the codebase he or she is working on without even being AWARE of what those changes are -- a situation that happens by definition when one leaves merging to an automated tool -- then the engineer is inviting a world of hurt upon him or herself.

Know what's in your codebase, or expect it to be broken often and unexpectedly. Communicate upstream in the form of cogent, terse patches that express the whole of a change, and nothing but that change.

To do otherwise necessarily requires one of two options: A) Coordinate early and often to minimize the risk of deviation (I.E. don't branch), or B) Fall into irreconcilable chaos. SVN users are familiar with both. Git users are familiar with A only by virtue of bad habits learned in SVN-land, and familiar with B only by virtue of the worst form of laziness. (I.E. not the kind Larry Wall spoke of...)

Mon, Apr. 18th, 2011 08:56 am (UTC)
bramcohen: Re: You are missing the point.

You seem to enjoy spending time messing around with your version control system.

Mon, Apr. 18th, 2011 09:05 am (UTC)
MrJoy: Re: You are missing the point.

Nope, I enjoy having code that goes from working state to working state, and not getting blindsided accidentally.

Wed, Apr. 20th, 2011 11:59 am (UTC)
ciw88: Re: You are missing the point.

i'm confused.

i think the original point was that git makes random decisions when merging, which forces the git user to look at the result and make sure git's making those random choices to everyone's liking.

the preferred situation would be that a version control system is deterministic and consistent, i.e. that given the same situation it will always do the same thing. if that thing is not what you want, then you will know beforehand, and can fix it. if it is, you don't have to check, you just know it's going to be right.

the drawback of consistency is that it's hard to achieve, and it's harder to understand what happens if you don't follow it with your own eyes hunk by hunk (which you can still do btw). the benefit is that if you're smart enough to use your tools correctly, being consistent they give you a lot more power and efficiency. because you don't have to keep making sure they do the right thing every step of the way.

now what has that to do with social contracts and multiple points of truth? if git were consistent, how would that change the balance of control and authority in a team using it? doesn't consistency, rather than limiting your choices of what hunk goes where, only make it easier for you to implement those choices?

i like git a lot, btw. (-: