
Fri, May. 6th, 2005, 07:24 pm
More version control stuff

Linus Torvalds thinks that Git is already better at merging than everything else. In the sense that Linus personally has a magic merge algorithm known as 'Andrew Morton' that's correct, but for everybody else this claim is completely false. (I asked around about what he could mean, and some people suggested that 'better' might have meant 'faster', which is the only explanation that doesn't involve Linus being delusional or lying.)

Unfortunately Linus's stated plans for Git are that it should only ever have whole-tree three-way merging (because that's the simple, practical, expedient way) and that it will have implicit renames and moves of lines between files (because that's the 'right' way). The resulting approach isn't workable even on paper. Whether the plans will be changed in advance of a disaster remains to be seen.

Meanwhile, we recently made a big advance in merge algorithms; you can read the latest about the upcoming Codeville merge algorithm in this post.

Sat, May. 7th, 2005 11:13 pm (UTC)
bramcohen: Re: VCS...

You're talking about using semantic information about the files being versioned, which, while a reasonable idea in principle, introduces a huge amount of complexity in practice. Instead we just view everything as lines, and err on the side of extra conflicts when it comes time to decide what's broken. Yes, semantic mismatches can still happen on a clean merge, but those are surprisingly rare, and fairly easy to detect and fix (usually just by doing a build).
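The "everything is lines, err on the side of conflicts" approach can be sketched as a toy three-way merge (this is only an illustration of the general idea, not Codeville's actual algorithm): wherever both sides change the same region of the base differently, emit a conflict rather than guess.

```python
import difflib

def _matches(base, other):
    # Map each matched base line index to its index in `other`.
    m = {}
    for bi, oi, size in difflib.SequenceMatcher(None, base, other).get_matching_blocks():
        for k in range(size):
            m[bi + k] = oi + k
    return m

def merge3(base, a, b):
    """Naive line-based three-way merge of lists of lines.

    Base lines matched in both descendants anchor the merge; between
    anchors, take whichever side changed, and if both sides changed
    the chunk differently, report a conflict instead of guessing.
    """
    ma, mb = _matches(base, a), _matches(base, b)
    stable = [i for i in range(len(base)) if i in ma and i in mb]
    stable.append(len(base))  # sentinel to flush the trailing chunk
    out, bi, ai, bj = [], 0, 0, 0
    for i in stable:
        ea = ma.get(i, len(a))
        eb = mb.get(i, len(b))
        ca, cb, cbase = a[ai:ea], b[bj:eb], base[bi:i]
        if ca == cbase:            # only b changed this chunk
            out.extend(cb)
        elif cb == cbase or ca == cb:  # only a changed, or both agree
            out.extend(ca)
        else:                      # both changed differently: conflict
            out.append("<<<<<<<")
            out.extend(ca)
            out.append("=======")
            out.extend(cb)
            out.append(">>>>>>>")
        if i < len(base):
            out.append(base[i])
            ai, bj, bi = ea + 1, eb + 1, i + 1
    return out
```

For example, merging a one-line edit on one side with an appended line on the other resolves cleanly, while two different edits to the same line produce conflict markers.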

There's a fair amount of academic work on semantic merging, but none of us working on real systems really trust it at this point. It could potentially be useful, though.

Sun, May. 8th, 2005 04:02 am (UTC)
invrnrv: Re: VCS...

what about the real-time aspect that i mentioned? it wouldn't be hard to figure out whether X given Y or Y given X creates more compile errors than the sum of X and Y alone, in which case there is a conflict. This should be relatively straightforward to implement. By alerting the coders while they're programming, it resolves possible conflicts so there will be no trouble merging.

Also, it would be ideal to hook up the IDE to your VCS because that gives you a lot of information about what line went where.

i'm just rambling my 2 cents. take it for what it's worth =]

Sun, May. 8th, 2005 04:59 am (UTC)
bramcohen: Re: VCS...

I'm not sure what you mean about that compilation stuff. Certainly one can have scripts which attempt a build before every commit and after every update.

Trying to infer line movement based on what goes on in the editor is probably a bad idea. Users have a tendency to do things like copy over a function, edit the new copy, then delete the old one, with the intention of this being a modification of the old function, not a creation of a new one. Microsoft Word's version control tells people when they do this, which helps them do the right thing, but in the programming world there is no equivalent concept.

Sun, May. 8th, 2005 10:27 am (UTC)
ciphergoth: Re: VCS...

One attractive thing about three-way merge is that it means that the network protocol need know nothing of the file semantics used to implement merging - which in turn should mean that you can have arbitrarily sophisticated merges, using all sorts of clever semantic information, without making the whole VCS too complicated to live.

This is why I'm unhappy to learn that 3WM has such serious problems - a Codeville-like approach means that servers are exchanging semantic information about the history of every line in a file, which means that the network protocol is assuming "all the world's a line-oriented text format" in a very deep way. With something like Monotone, a clever client can merge C files one way, Python another, XML another and PDF another still without forcing the network protocol and storage formats to know about all these different file formats.

There seems to be talk of using Codeville's merge in Monotone. I'm curious to know how this would work - for each file, you'd have to crawl over many revisions, infer the annotated version anew, and from there do the merge. Could that be done efficiently enough?

Sun, May. 8th, 2005 08:28 pm (UTC)
bramcohen: Re: VCS...

We've been discussing the next rev of the Codeville protocol, which is necessitated by the upcoming weave stuff (well, not necessitated, but it's a lot more work not to change it than to change it). It looks like we're going to be viewing file history as a sequence of versions identified by secure hash, rather than including delta information in the history. The immediate motivation is that we'd like to be able, in the future, to dynamically decide on the weave ordering in the local client based on what resolves better, and we couldn't do that if the weave ordering were global. So basically, yeah, I agree about allowing more sophisticated merge algorithms in the future, and we're preserving that property in the same way.
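As a rough illustration of "history as a sequence of hash-identified versions" (the names here are made up for the sketch; this is not Codeville's actual data model), each version is addressed purely by a secure hash of its content, with parent links but no deltas in the history itself:

```python
import hashlib

def version_id(content: bytes) -> str:
    # A version is identified purely by a secure hash of its content.
    return hashlib.sha1(content).hexdigest()

class FileHistory:
    """Toy file history: a DAG of hash-identified versions.

    Only parent links live in the shared history; how content is
    actually stored (full text, deltas, a weave) is a strictly local
    decision that each client can make differently.
    """

    def __init__(self):
        self.parents = {}  # version id -> tuple of parent version ids
        self.blobs = {}    # version id -> full content (local storage)

    def commit(self, content: bytes, *parent_ids: str) -> str:
        vid = version_id(content)
        self.parents[vid] = parent_ids
        self.blobs[vid] = content
        return vid
```

Because identity depends only on content, two clients that arrive at the same bytes agree on the version id without ever exchanging delta information.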

Yes, you crawl over past revisions for each file to merge it. Yes, that can be done plenty efficiently enough. That information also caches quite well if it's kept in a weave, and doesn't even need to be generated until the first time you do a merge.
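The caching idea can be pictured with a toy annotation "weave" built lazily with difflib (again just a sketch, not Codeville's actual weave format - a real weave also retains deleted lines so any revision can be reconstructed from it):

```python
import difflib

def weave_add(weave, version_lines, rev):
    """Extend a cached annotation weave with a new revision.

    `weave` is a list of (line, introduced_in_rev) pairs for the
    previous version. Lines that survive keep their original
    annotation; lines that are new are tagged with `rev`. The result
    can be cached and extended incrementally, so annotations only need
    to be computed the first time a merge touches the file.
    """
    old = [line for line, _ in weave]
    matcher = difflib.SequenceMatcher(None, old, version_lines)
    new_weave = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            new_weave.extend(weave[i1:i2])  # surviving lines keep their rev
        else:
            new_weave.extend((l, rev) for l in version_lines[j1:j2])
    return new_weave
```

Crawling a file's past revisions in order and feeding each one through `weave_add` yields the annotated version the merge needs, and the finished weave caches the work for next time.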

Mon, May. 9th, 2005 01:08 pm (UTC)
ciphergoth: Re: VCS...

But the Codeville network protocol and storage formats are still line-oriented?

Mon, May. 9th, 2005 02:40 pm (UTC)
bramcohen: Re: VCS...

The network protocol will probably be based on xdelta. The storage format will be strictly local, and can be changed locally, but we're not really sure what it's going to look like. Certainly inference about lines will be done somewhere in there, to do merging if nothing else.