Log in

No account? Create an account

Fri, May. 6th, 2005, 07:24 pm
More version control stuff

Linus Torvalds thinks that Git is already better at merging than everything else. In the sense that Linus personally has a magic merge algorithm knows as 'Andrew Morton' that's correct, but for everybody else this claim is completely false. (I asked around about what he could mean, and some people suggested that 'better' might have meant 'faster', which is the only explanation which doesn't involve Linus being delusional or lying.)

Unfortunately Linus's stated plans for Git are that it should only ever have whole tree three way merging (because that's the simple, practical, expedient way) and that it will have implicit renames and moves of lines between files (because that's the 'right' way). The resulting approach isn't workable even on paper. Whether the plans will be changed in advance of a disaster remains to be seen.

Meanwhile, we recently made a big advance in merge algorithms, you can read the latest about the upcoming new Codeville merge algorithm in this post.

Mon, May. 9th, 2005 01:12 pm (UTC)

This algorithm seems to merge versions rather than merging changes. Supposing I have A -> B -> C and I want to apply the bugfix in B -> C to version A without including the change A -> B - how can I do that using this merge algorithm? Is that what Linus means by "cherrypicking"?

If all he means is choosing the parts of B -> C he wants to merge, then that's trivial - create a synthetic version B' and consider the change to be B -> B' -> C, then merge B -> B'.

Mon, May. 9th, 2005 02:44 pm (UTC)

Yes, that's what's meant by cherry-picking.

There are two approaches to cherry-picking here - either pull in everything and add supressions to lines which shouldn't appear, or only pull in certain lines. I'm not sure which is the best approach on a technical level. I'm also not sure what the best approach to cherry picking is. See this post for my latest thoughts on the matter. Certainly supporting cherry-picking based on this algorithm will be entirely doable, but I'm still going over basic functionality and haven't worked it all out yet.

Mon, May. 9th, 2005 04:51 pm (UTC)

I think you can do cherrypicking as follows - (u)nborn, (l)ive, (d)ead. Including backward transactions for rollback...

uuu u
uul l
uud d
ulu u
ull u
uld d
udu u
udl u (?)
udd u
luu l
lul l
lud d
llu u
lll l
lld d
ldu u (?)
ldl l
ldd l
duu d
dul d (?)
dud d
dlu d (?)
dll d
dld d
ddu u
ddl l
ddd d

Mon, May. 9th, 2005 07:34 pm (UTC)

Your chart is a little inconsistent. There are two things we can go for here -

Pull in everything in C which isn't already in B

Remove everything in B which isn't in C

If we're doing the pull in approach, then anything which is d in A should remain d. If we're doing the removal approach, then ddu should result in u.

I think the first on is 'cherry-pick' and the second one is 'rollback'. The semantics of rollback are such that we'd really like rollbacks to stay rolled back on later merges, so this simplistic approach fails for what we'd like that functionality to do.

If we go with straight cherry-picking, we get the following table, for which I also included what side won for the purposes of detecting conflicts:

uuu u
uul l >
uud d >
ulu u
ull u <
uld d >
udu u
udl u <
udd u <
luu l <
lul l
lud d >
llu l <
lll l
lld d >
ldu l <
ldl l
ldd l <
duu d <
dul d <
dud d
dlu d <
dll d <
dld d
ddu d <
ddl d <
ddd d

The line which gives me pause is uld. The question is - if a section is overwitted in B, then re-overwritten in C, should the result of cherry-picking C-B then merging with B be to drop B's change or conflict? My current feeling is that it should just drop B's change, which means that d is the right answer, but I'm not 100% certain that u isn't the correct behavior.

Thanks for pointing out this approach, it hadn't occured to me. My brain still hasn't made the full paradigm shift to the weave way of thinking.

Mon, May. 9th, 2005 08:02 pm (UTC)

Note that this entire chart can be summarized assuming u=0, l=1 and d=2, as max(A, C > B ? C : u). The arrow goes to whichever side, between A and C, is the same as the descendant, and if they're all the same there's no arrow.

Mon, May. 9th, 2005 11:33 pm (UTC)

We disagree on four cases:

llu	u	l
ldu	u	l
ddu	u	d
ddl	l	d

These are all "roll-back" cases - they can only arise if the revision history is something like "C -> B -> A" and we want a version of A without the C -> B change. I think you're right about "ldu l" (it's one of the ones I marked with a query) but I'm surprised on all the other three - surely where A = B, a "roll-back" has to do what's in C?

Thinking about it further, for rollback to work, the rule has to be more like ((C == B) ? A : C), but this isn't right for "ldu l" or "dlu d" - I'm not sure if "udl u" is correct or not. The cases with discrepancies are

ddl l l d
ddu u u d
dlu u d d
dul l d d
ldu u l l
llu u u l
udl l u u

Columns are my rule, my hand-picked, your rule.

Tue, May. 10th, 2005 12:38 am (UTC)

The problem with doing rollback in this sort of way is that A inevitably gets merged with a near-relative of A very quickly, so if something was d in A then it will rapidly become d again. Rollback needs to use a different mechanism, probably involving line suppressions which can be undone later.

Mon, May. 9th, 2005 07:38 pm (UTC)

Note that this approach is for the 'pull from C what is not in B' interface to cherry-picking, not the 'I'm going to cherry-pick some stuff from C and you figure out what I did' interface to cherry-picking.

Mon, May. 9th, 2005 11:34 pm (UTC)

The two are complementary. Especially if you can insert revisions between other revisions after the fact...