Fri, May. 6th, 2005, 07:24 pm
More version control stuff

Linus Torvalds thinks that Git is already better at merging than everything else. In the sense that Linus personally has a magic merge algorithm known as 'Andrew Morton', that's correct, but for everybody else this claim is completely false. (I asked around about what he could mean, and some people suggested that 'better' might have meant 'faster', which is the only explanation that doesn't involve Linus being delusional or lying.)

Unfortunately, Linus's stated plans for Git are that it should only ever have whole-tree three-way merging (because that's the simple, practical, expedient way) and that it will have implicit renames and moves of lines between files (because that's the 'right' way). The resulting approach isn't workable even on paper. Whether the plans will be changed in advance of a disaster remains to be seen.

Meanwhile, we recently made a big advance in merge algorithms. You can read the latest about the upcoming new Codeville merge algorithm in this post.

Sat, May. 7th, 2005 05:07 pm (UTC)
bramcohen: Re: diff

The problem with trying a single ancestor first is that, if there was a conflict, different hunks might have won in the result. The wrong side of a conflict might then match on a single blank line or some similar nonsense, and thus invalidate the entire other side.

It might make sense to match against the 'mash-up' of all the ancestors - that is, the result of simply smushing their weaves together, without checking for conflicts. This seems like an alarmingly odd approach, but we recently figured out a new weave ordering algorithm which makes the mash-up look considerably less funky, and I'm slowly warming up to the idea. Simple two-way diffs certainly have aesthetic appeal.

All that stuff about unique lines is just to improve the behavior and run-time of two-way diffs, by the way. We don't want to get erroneous matches on '}' lines.
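To make the point about unique lines concrete, here is a minimal sketch of one way to anchor a two-way diff on lines that occur exactly once on each side. It is an illustration under assumptions, not necessarily what Codeville does: the function name `unique_line_anchors` and the choice of a longest-increasing-subsequence pass to keep the anchors in order are both hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def unique_line_anchors(a, b):
    """Return (i, j) pairs with a[i] == b[j] where the line is unique in
    both a and b, keeping a longest set of pairs that is in order on both
    sides. Hypothetical helper, for illustration only."""
    count_a, count_b = Counter(a), Counter(b)
    # Position in b of every line that occurs exactly once in b.
    pos_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    # Candidate pairs, already sorted by position in a.
    pairs = [(i, pos_b[line]) for i, line in enumerate(a)
             if count_a[line] == 1 and line in pos_b]

    # Longest increasing subsequence over the b positions.
    tails = []      # tails[k]: smallest final b index of a run of length k + 1
    tails_idx = []  # index into pairs of the pair that ends that run
    prev = [None] * len(pairs)
    for n, (_, j) in enumerate(pairs):
        k = bisect_left(tails, j)
        if k == len(tails):
            tails.append(j)
            tails_idx.append(n)
        else:
            tails[k] = j
            tails_idx[k] = n
        prev[n] = tails_idx[k - 1] if k > 0 else None

    # Walk back from the end of the longest run.
    anchors = []
    n = tails_idx[-1] if tails_idx else None
    while n is not None:
        anchors.append(pairs[n])
        n = prev[n]
    return anchors[::-1]
```

Repeated lines such as '}' never become anchors, so they can only be matched afterwards, inside the gaps between anchors, where a spurious match does far less damage.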

Mon, May. 9th, 2005 01:38 pm (UTC)
ciphergoth: Re: diff

Doing it against the "mash-up" makes a lot of sense. Especially if you can tweak the "diff" algorithm to prefer matches using more-closely-related lines over less-closely-related if the length of the LCS is the same. That seems perfectly practical for the naive O(nm)-time dynamic programming "diff" algorithm I got taught at Uni; I don't know if it's just as easy for the more sophisticated algorithms used in practice.
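As a rough sketch of that tweak for the naive O(nm) dynamic program: each table cell can hold a pair (LCS length, total relatedness) and compare pairs lexicographically, so relatedness only breaks ties between matchings of equal length. The `relatedness` list, one integer weight per line of the second sequence, is a hypothetical stand-in for however "more closely related" would actually be measured, and the function name is made up for this example.

```python
def lcs_prefer_related(a, b, relatedness):
    """Return matched (i, j) index pairs of a longest common subsequence of
    a and b. Among matchings of equal length, prefer the one whose matched
    b lines have the highest total relatedness (hypothetical weights)."""
    n, m = len(a), len(b)
    # best[i][j] = (lcs_length, total_relatedness) for a[:i] against b[:j].
    # Tuples compare lexicographically, so length always wins over score.
    best = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cand = max(best[i - 1][j], best[i][j - 1])
            if a[i - 1] == b[j - 1]:
                length, score = best[i - 1][j - 1]
                cand = max(cand, (length + 1, score + relatedness[j - 1]))
            best[i][j] = cand

    # Trace back through the table to recover the chosen matches.
    matches = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            length, score = best[i - 1][j - 1]
            if best[i][j] == (length + 1, score + relatedness[j - 1]):
                matches.append((i - 1, j - 1))
                i -= 1
                j -= 1
                continue
        if best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return matches[::-1]
```

For example, `lcs_prefer_related(["x"], ["x", "x"], [1, 5])` matches the single line against the second, higher-weighted copy. Integer weights keep the equality checks in the trace-back exact. Whether the same trick carries over to the faster diff algorithms is left open here, as in the comment above.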