Mon, Apr. 18th, 2011, 06:38 pm

> In this example, one person changed a file's contents from A to B, then back to A, while someone else changed A to B and left it that way. The question is: What to do when the two heads are merged together?

Report a merge conflict.

Going back to the basics: what are the semantics of a 3-way merge? Well, we have three snapshots, the base and two branches. We make diffs between the base and the branches. What it actually means is that we reconstruct the programmers' actions from the snapshots: they added these lines, modified these lines, deleted these lines.

Then we either merge the diffs and produce the merged snapshot, or discover a conflict: both programmers modified the same line, or one deleted the line another modified, or both added some lines in the same place.
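To make that concrete for the quoted A/B example, here is a rough sketch that treats the whole file as a single value. The names `snapshot_merge`, `history_merge` and `MergeConflict` are made up for illustration and are not anything git provides; the second function is only one possible reading of "reconstruct the actions from every commit", not a definitive rule.

```python
class MergeConflict(Exception):
    """Raised when both sides changed the same thing in different ways."""


def snapshot_merge(base, ours, theirs):
    """Classic 3-way merge that looks only at the three snapshots."""
    if ours == theirs:
        return ours
    if ours == base:      # we appear to have done nothing, so take their change
        return theirs
    if theirs == base:    # they appear to have done nothing, so take our change
        return ours
    raise MergeConflict("both sides changed the value")


def history_merge(base, ours_history, theirs_history):
    """Same merge, but a side counts as 'changed' if ANY of its commits differs from base."""
    ours, theirs = ours_history[-1], theirs_history[-1]
    ours_touched = any(value != base for value in ours_history)
    theirs_touched = any(value != base for value in theirs_history)
    if ours_touched and theirs_touched and ours != theirs:
        raise MergeConflict("both sides edited the value and ended up differently")
    return ours if ours_touched else theirs


# One person went A -> B -> A, the other went A -> B.
print(snapshot_merge("A", "A", "B"))           # silently picks B
try:
    history_merge("A", ["B", "A"], ["B"])
except MergeConflict as exc:
    print("conflict:", exc)
```

The snapshot-only version cannot tell "changed it and deliberately changed it back" from "never touched it", which is why it picks B without complaint.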

Looked at this way, the root of the problem is that git (as well as Mercurial, SVN, etc.) takes a shortcut when computing the diffs to be fed into the 3-way merge, and that sometimes produces incorrect or inconsistent results.

An example of git doing it wrong: http://pastebin.com/SxmwpFkY

If I run the script, switch to master and do "git diff master~2", git produces an incorrect diff:
@@ -3,3 +3,11 @@ B
I mean, it's a correct diff between the two snapshots, but this is not what I did.
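The same effect can be reproduced outside git. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in (it is not git's diff algorithm), and the two snapshots are my reconstruction from the blame output: the initial file is assumed to be A..E and the final file A..E, G, G, G, A..E, with the original five lines sitting at the bottom.

```python
from difflib import SequenceMatcher

# Assumed snapshots, reconstructed from the blame output (not the actual pastebin script).
initial = list("ABCDE")
final = list("ABCDE") + list("GGG") + list("ABCDE")


def direct_diff(old, new):
    """Return the non-equal opcodes of a direct snapshot-to-snapshot diff."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


changes = direct_diff(initial, final)
for tag, i1, i2, j1, j2 in changes:
    print(tag, "old[%d:%d]" % (i1, i2), "new[%d:%d]" % (j1, j2), "".join(final[j1:j2]))
# prints: insert old[5:5] new[5:13] GGGABCDE
```

The direct diff matches the old A..E against the first five lines of the new file and reports eight lines appended at the end, even though the original lines are the last five.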

But when I run "git blame", it produces the correct "diff":
6584854c 1) A
6584854c 2) B
6584854c 3) C
6584854c 4) D
6584854c 5) E
c36e1ff8 6) G
c36e1ff8 7) G
c36e1ff8 8) G
^b43cba4 9) A
^b43cba4 10) B
^b43cba4 11) C
^b43cba4 12) D
^b43cba4 13) E

Here it examines all commits leading to the current one, and deduces the position of the lines from the initial commit (^b43cba4) correctly.
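A toy version of that idea, as a sketch only: walk the commits in order, diff each adjacent pair, and carry the "which commit introduced this line" labels forward. This is not how `git blame` is actually implemented. `blame` here is a hypothetical helper, `difflib` stands in for the diff engine, and the three-commit history is assumed from the blame output above (A..E first, then G G G prepended, then another A..E prepended).

```python
from difflib import SequenceMatcher


def blame(snapshots):
    """For each line of the last snapshot, return the index of the snapshot that introduced it."""
    origin = [0] * len(snapshots[0])
    for rev in range(1, len(snapshots)):
        old, new = snapshots[rev - 1], snapshots[rev]
        new_origin = []
        matcher = SequenceMatcher(None, old, new, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep the label they already had.
                new_origin.extend(origin[i1:i2])
            else:
                # 'insert' and 'replace' introduce new[j1:j2]; 'delete' introduces nothing.
                new_origin.extend([rev] * (j2 - j1))
        origin = new_origin
    return origin


# Assumed history, oldest first.
snapshots = [
    list("ABCDE"),
    list("GGG") + list("ABCDE"),
    list("ABCDE") + list("GGG") + list("ABCDE"),
]

for number, (rev, line) in enumerate(zip(blame(snapshots), snapshots[-1]), start=1):
    print("rev%d %2d) %s" % (rev, number, line))
```

Because every step only diffs adjacent snapshots, the labels for the last five lines stay attached to the first commit, matching the ^b43cba4 lines in the real output.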

And that's it. Normal diffs are not associative: given successive snapshots a, b, c, diff(diff(a, b), c) != diff(a, diff(b, c)) != diff(a, c). The output of `blame`, on the other hand, is associative (except maybe for deleted lines).

So it seems that if git (and hg, and svn) used the output of a blame-like algorithm for merging, then the order of automatic merges wouldn't matter. And the problem becomes a purely technical one: how to make this blame-like algorithm fast enough.

(By the way, the basic step, reconstructing the programmer's actions from the difference between two snapshots, is of course not infallible. When you use it on two adjacent snapshots it's good enough, and also pretty transparent. But as the distance between the snapshots being diffed (measured in intermediate commits) increases, the probability of getting it wrong grows.)
