
Fri, May. 6th, 2005, 07:24 pm
More version control stuff

Linus Torvalds thinks that Git is already better at merging than everything else. In the sense that Linus personally has a magic merge algorithm known as 'Andrew Morton', that's correct, but for everybody else this claim is completely false. (I asked around about what he could mean, and some people suggested that 'better' might have meant 'faster', which is the only explanation that doesn't involve Linus being delusional or lying.)

Unfortunately, Linus's stated plans for Git are that it should only ever have whole-tree three-way merging (because that's the simple, practical, expedient way) and that it will have implicit renames and moves of lines between files (because that's the 'right' way). The resulting approach isn't workable even on paper. Whether the plans will be changed in advance of a disaster remains to be seen.
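For readers unfamiliar with the mechanics, here is a minimal sketch of line-based three-way merging of a single file, in the style of diff3, built on Python's standard `difflib`. It is an illustration under assumptions, not Git's actual merge code: `merge3`, `_unchanged_lines` and the simplified conflict markers are made up for this example. Whole-tree merging amounts to running something like this per file against one common-ancestor tree; note that the function only ever sees one base, one path, and lines that stay inside that file.

```python
import difflib
from typing import Dict, List, Tuple


def _unchanged_lines(base: List[str], other: List[str]) -> Dict[int, int]:
    """Map base line index -> index in `other` for lines difflib keeps unchanged."""
    sm = difflib.SequenceMatcher(None, base, other, autojunk=False)
    mapping: Dict[int, int] = {}
    for i, j, n in sm.get_matching_blocks():
        for k in range(n):
            mapping[i + k] = j + k
    return mapping


def merge3(base: List[str], a: List[str], b: List[str]) -> Tuple[List[str], bool]:
    """diff3-style three-way merge of one file; returns (lines, had_conflict)."""
    in_a = _unchanged_lines(base, a)
    in_b = _unchanged_lines(base, b)
    # Sync points: base lines left untouched by BOTH sides, plus an end sentinel.
    syncs = [(i, in_a[i], in_b[i]) for i in range(len(base)) if i in in_a and i in in_b]
    syncs.append((len(base), len(a), len(b)))

    merged: List[str] = []
    conflict = False
    po = pa = pb = 0  # start of the current unstable chunk in base, a, b
    for io, ia, ib in syncs:
        chunk_o, chunk_a, chunk_b = base[po:io], a[pa:ia], b[pb:ib]
        if chunk_a == chunk_o:  # only b changed this region (or nobody did)
            merged.extend(chunk_b)
        elif chunk_b == chunk_o or chunk_a == chunk_b:  # only a changed, or both agree
            merged.extend(chunk_a)
        else:  # both sides changed the same region differently
            conflict = True
            merged += ["<<<<<<<\n", *chunk_a, "=======\n", *chunk_b, ">>>>>>>\n"]
        if io < len(base):
            merged.append(base[io])  # the sync line is identical in all three
        po, pa, pb = io + 1, ia + 1, ib + 1
    return merged, conflict
```

In this sketch, changes from both sides to adjacent lines fall into the same unstable chunk and are reported as a conflict; that is a deliberately conservative choice.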

Meanwhile, we recently made a big advance in merge algorithms; you can read the latest about the upcoming new Codeville merge algorithm in this post.

Sun, May. 8th, 2005 10:27 am (UTC)
ciphergoth: Re: VCS...

One attractive thing about three-way merge is that it means that the network protocol need know nothing of the file semantics used to implement merging, which in turn should mean that you can have arbitrarily sophisticated merges, using all sorts of clever semantic information, without making the whole VCS too complicated to live.

This is why I'm unhappy to learn that 3WM has such serious problems - a Codeville-like approach means that servers are exchanging semantic information about the history of every line in a file, which means that the network protocol is assuming "all the world's a line-oriented text format" in a very deep way. With something like Monotone, a clever client can merge C files one way, Python another, XML another and PDF another still without forcing the network protocol and storage formats to know about all these different file formats.
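A hypothetical sketch of the client-side arrangement described above: storage and transfer only ever handle opaque blobs, and the client picks a merge function per file type. The registry `MERGERS`, the toy `.cfg` key=value format and every name here are assumptions for illustration, not Monotone's actual API.

```python
import os
from typing import Callable, Dict

MergeFn = Callable[[bytes, bytes, bytes], bytes]  # (base, ours, theirs) -> merged


class MergeConflict(Exception):
    pass


def merge_opaque(base: bytes, a: bytes, b: bytes) -> bytes:
    """Fallback for formats the client doesn't understand (e.g. PDF)."""
    if a == base or a == b:
        return b
    if b == base:
        return a
    raise MergeConflict("both sides changed an opaque file")


def _parse_kv(data: bytes) -> Dict[str, str]:
    pairs: Dict[str, str] = {}
    for line in data.decode("utf-8").splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            pairs[key.strip()] = value.strip()
    return pairs


def merge_keyvalue(base: bytes, a: bytes, b: bytes) -> bytes:
    """Semantic merge for a toy key=value format: per key, not per line."""
    o, x, y = _parse_kv(base), _parse_kv(a), _parse_kv(b)
    merged: Dict[str, str] = {}
    for key in sorted(set(o) | set(x) | set(y)):  # output is re-sorted by key
        ov, xv, yv = o.get(key), x.get(key), y.get(key)
        if xv == ov:
            value = yv
        elif yv == ov or xv == yv:
            value = xv
        else:
            raise MergeConflict(f"both sides changed key {key!r}")
        if value is not None:  # None means the key was deleted
            merged[key] = value
    return "".join(f"{k}={v}\n" for k, v in merged.items()).encode("utf-8")


# The only place file semantics live: a client-side table keyed by extension.
MERGERS: Dict[str, MergeFn] = {".cfg": merge_keyvalue}


def merge_file(path: str, base: bytes, a: bytes, b: bytes) -> bytes:
    ext = os.path.splitext(path)[1].lower()
    return MERGERS.get(ext, merge_opaque)(base, a, b)
```

With the per-key merger, edits to `x` and `y` on different sides merge cleanly even though they sit on adjacent lines, and swapping in a smarter merger changes nothing about what goes over the wire.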

There seems to be talk of using Codeville's merge in Monotone. I'm curious to know how this would work: for each file, you'd have to crawl over many revisions, infer the annotated version anew, and from there do the merge. Could that be done efficiently enough?
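A minimal sketch of "crawling over revisions to infer the annotated version", assuming a purely linear history (real histories are DAGs with merges, which this ignores). Each line of the newest version ends up tagged with the revision that introduced it. The function name and the use of `difflib` for line matching are illustrative assumptions, not Codeville's or Monotone's code.

```python
import difflib
from typing import List, Tuple


def annotate(history: List[Tuple[str, List[str]]]) -> List[Tuple[str, str]]:
    """history: (rev_id, lines) pairs, oldest first, linear.

    Returns (rev_id that introduced the line, line) for the newest version.
    """
    first_rev, first_lines = history[0]
    annotated = [(first_rev, line) for line in first_lines]
    for rev, lines in history[1:]:
        prev_lines = [line for _, line in annotated]
        sm = difflib.SequenceMatcher(None, prev_lines, lines, autojunk=False)
        # Start by blaming every line on this revision...
        new = [(rev, line) for line in lines]
        # ...then carry the older attribution over for lines that survived.
        for i, j, n in sm.get_matching_blocks():
            for k in range(n):
                new[j + k] = annotated[i + k]
        annotated = new
    return annotated
```

The cost is one diff per revision in the file's history, which is why caching the result matters.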

Sun, May. 8th, 2005 08:28 pm (UTC)
bramcohen: Re: VCS...

We've been discussing the next rev of the Codeville protocol, which is necessitated by the upcoming weave work (well, not necessitated, but it's a lot more work not to change it than to change it). It looks like we're going to view file history as a sequence of versions identified by secure hash, rather than including delta information in the history. The immediate motivation is that we'd like to be able, in the future, to decide the weave ordering dynamically in the local client based on what resolves better, and we couldn't do that if the weave ordering were global. So basically, yes, I agree about allowing more sophisticated merge algorithms in the future, and we're preserving that property in that same way.
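A sketch of "file history as a sequence of versions identified by secure hash", with no delta information recorded in the history itself. The choice of SHA-1, the exact bytes fed to the hash, and the class names are all assumptions for illustration; this is not the Codeville format.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class FileVersion:
    parents: Tuple[str, ...]  # ids of the version(s) this one was derived from
    content: bytes            # full text; no delta against any parent is recorded


def version_id(v: FileVersion) -> str:
    """Hypothetical id scheme: SHA-1 over parent ids, a blank line, then content."""
    h = hashlib.sha1()
    for p in v.parents:
        h.update(p.encode("ascii") + b"\n")
    h.update(b"\n")
    h.update(v.content)
    return h.hexdigest()


class FileHistory:
    def __init__(self) -> None:
        self.versions: Dict[str, FileVersion] = {}

    def add(self, content: bytes, parents: Sequence[str] = ()) -> str:
        for p in parents:
            if p not in self.versions:
                raise KeyError(f"unknown parent {p}")
        v = FileVersion(tuple(parents), content)
        vid = version_id(v)
        self.versions[vid] = v  # identical content + parents -> identical id
        return vid

    def lineage(self, vid: str) -> List[str]:
        """Follow first parents back to the root, newest first."""
        chain: List[str] = []
        cur: Optional[str] = vid
        while cur is not None:
            chain.append(cur)
            parents = self.versions[cur].parents
            cur = parents[0] if parents else None
        return chain
```

Because the history only names whole versions, any local client is free to compute whatever deltas, weaves or line inferences it likes between two ids.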

Yes, you crawl over past revisions for each file to merge it. Yes, that can be done plenty efficiently enough. That information also caches quite well if it's kept in a weave, and doesn't even need to be generated until the first time you do a merge.
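To show why a weave caches this information well, here is a toy weave for a linear history, assuming revision numbers double as ancestry (revision r sees everything inserted at or before r). Each line is stored once, tagged with the revision that inserted it and the set of revisions that deleted it, so extracting or annotating any version is a single pass. The `Weave` class and its methods are hypothetical; Codeville's real weave has to handle DAG ancestry and ordering, which this sketch does not.

```python
import difflib
from typing import List, Set, Tuple


class Weave:
    """Toy weave for a LINEAR file history; revision r's ancestors are 0..r-1."""

    def __init__(self) -> None:
        # Each entry: (revision that inserted the line, revisions that deleted it, text).
        self.entries: List[Tuple[int, Set[int], str]] = []
        self.num_revs = 0

    def _visible(self, rev: int) -> List[int]:
        """Weave indices of the lines present in revision `rev`."""
        return [idx for idx, (ins, dels, _) in enumerate(self.entries)
                if ins <= rev and not any(d <= rev for d in dels)]

    def get(self, rev: int) -> List[str]:
        return [self.entries[i][2] for i in self._visible(rev)]

    def annotate(self, rev: int) -> List[Tuple[int, str]]:
        """(inserting revision, line) for each line of `rev`: blame in one pass."""
        return [(self.entries[i][0], self.entries[i][2]) for i in self._visible(rev)]

    def add(self, lines: List[str]) -> int:
        rev = self.num_revs
        visible = self._visible(rev - 1) if rev > 0 else []
        old = [self.entries[i][2] for i in visible]
        sm = difflib.SequenceMatcher(None, old, lines, autojunk=False)
        inserts = []  # (weave position, new lines), in ascending position order
        for tag, i1, i2, j1, j2 in sm.get_opcodes():
            if tag in ("delete", "replace"):
                for i in range(i1, i2):
                    self.entries[visible[i]][1].add(rev)  # mark deleted, never remove
            if tag in ("insert", "replace"):
                pos = visible[i1] if i1 < len(visible) else len(self.entries)
                inserts.append((pos, lines[j1:j2]))
        # Apply back to front so earlier positions stay valid.
        for pos, new_lines in reversed(inserts):
            self.entries[pos:pos] = [(rev, set(), text) for text in new_lines]
        self.num_revs += 1
        return rev
```

The weave is built incrementally, one diff per new revision, so it can indeed be generated lazily the first time a merge needs it and reused afterwards.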

Mon, May. 9th, 2005 01:08 pm (UTC)
ciphergoth: Re: VCS...

But the Codeville network protocol and storage formats are still line-oriented?

Mon, May. 9th, 2005 02:40 pm (UTC)
bramcohen: Re: VCS...

The network protocol will probably be based on xdelta. The storage format will be strictly local, and can be changed locally, but we're not really sure what it's going to look like. Certainly inferencing about lines will be done somewhere in there, to do merging if nothing else.
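For a feel of what an xdelta-style protocol ships, here is a toy copy/add delta encoder and decoder built on `difflib`. This is an illustrative assumption, not xdelta's algorithm or wire format: xdelta finds matches with hashing, and later releases emit the VCDIFF format (RFC 3284), whose instruction set includes ADD and COPY. The `make_delta`/`apply_delta` names are hypothetical.

```python
import difflib
from typing import List, Tuple, Union

# ("copy", offset_in_source, length) or ("add", literal_bytes)
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def make_delta(source: bytes, target: bytes) -> List[Op]:
    """Describe `target` as copies from `source` plus literal additions."""
    sm = difflib.SequenceMatcher(None, source, target, autojunk=False)
    ops: List[Op] = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif j2 > j1:  # insert or replace: ship the new bytes literally
            ops.append(("add", target[j1:j2]))
        # pure deletes need no instruction: those source bytes are simply not copied
    return ops


def apply_delta(source: bytes, ops: List[Op]) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += source[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)
```

The delta is purely byte-oriented and knows nothing about lines, which keeps line inference out of the protocol and inside the local merge code.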