You are viewing bramcohen

Sun, Dec. 19th, 2004, 06:33 pm
Great Programmers

I've seen a lot of discussion of great programmers, usually centering on how to find them, but usually what people really want to know is how to become one. Since I'm widely considered to be a great programmer, I'll give some advice.

First of all there's raw coding ability. For this, practice makes perfect. Implementing lots of algorithms from, say, Introduction to Algorithms can help sharpen your technical abilities, but really the important thing is to have some experience. Anyone with enough natural talent will get good at basic raw coding.

There are only two coding skills which people who are completely self-taught as programmers mostly miss out on: proper encapsulation and unit tests. For proper encapsulation, you should organize your code so that changes which require modifying code in more than one module are as rare as possible, and for unit tests you should write them to be pass/fail so that all unit tests can be run as a comprehensive suite. And now you know everything you need to about those two things. Anyone who is taught the above guidelines, and decides they really want to learn those skills, will with sufficient practice become good at them.
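As a minimal sketch of those two guidelines in Python: the `Inventory` class and its tests below are hypothetical, made up purely for illustration. Callers only touch `add`, `remove` and `count`, so swapping the private dict for something else should only require changes inside the class, and the tests report a plain pass or fail so they can all be run as one suite.

```python
import unittest


class Inventory:
    """Hypothetical module: callers use add/remove/count only.

    The storage (a dict here) is private, so replacing it later
    should require changes in this class alone.
    """

    def __init__(self):
        self._counts = {}  # private detail, not part of the interface

    def add(self, item, quantity=1):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._counts[item] = self._counts.get(item, 0) + quantity

    def remove(self, item, quantity=1):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.count(item) < quantity:
            raise ValueError("not enough stock")
        self._counts[item] -= quantity

    def count(self, item):
        return self._counts.get(item, 0)


class InventoryTest(unittest.TestCase):
    """Each test either passes or fails; nobody has to inspect output."""

    def test_add_then_count(self):
        inv = Inventory()
        inv.add("widget", 3)
        self.assertEqual(inv.count("widget"), 3)

    def test_remove_too_many_is_rejected(self):
        inv = Inventory()
        inv.add("widget")
        with self.assertRaises(ValueError):
            inv.remove("widget", 2)


def run_suite():
    """Run every test in one go and return a single pass/fail result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(InventoryTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Calling `run_suite()` returns `True` only if every test passed, which is the property that lets a whole collection of tests be run unattended.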

Coding skill is all well and good, and you can't become a great programmer without it, but it's far from everything. I'm decent at raw coding, but I know many people who are better, and some of them are abysmal programmers. I in particular can't deal with being tasked with fixing up spaghetti code. My brain simply locks down and refuses to make any modifications which it isn't convinced will work, which is of course impossible when the source material is an incurably bug-ridden mess.

What truly separates the great programmers from the journeyman programmers is architecture. What's puzzling is that architecture appears to be one of the simplest parts of the whole process, requiring in most cases little more than some pencil and paper calculations and a willingness to change.

The simplest architectural problems to solve are the ones which for lack of a better theory most people ascribe to emotional or psychological problems. These are decisions for which there's no rational justification whatsoever. For example, writing a non-speed-critical program (which is most of them) in C or C++. A few years ago you could justify that because the other languages didn't have such extensive libraries, but today it's ludicrous. Another one is building one's protocol as a layer on top of webdav. And another one is building a transactional system for retrieving any subsection of any point in the history of an arbitrarily large file in constant time when that isn't part of project requirements. Yes, I'm making fun of subversion here. It's a great example of a project permanently crippled by dumb architectural decisions.

Half of these 'emotional' architectural decisions are dogmatically using a past practice in situations where it's inapplicable. The other half are working on interesting problems which have little or no utility in the finished product. Once decisions like these have been made, questioning them can become a political impossibility. If someone new comes into a project with many man-years on it, and in their first week learns that there's a networking call which includes a parameter as to whether it should be blocking or non-blocking, and immediately declares that the entire codebase is a mess and difficult if not impossible to maintain, they'll almost certainly be correct and justified, but their opinion will likely be disregarded as brash and ill-informed. After all, they haven't spent the kind of time on the codebase that everybody else has. I've actually had this happen to me, and while others have claimed that there are more political ways of approaching such problems, my experience has been that once the truth becomes unthinkable a couple of people need to get fired before any improvement can be made.

My advice about technically unjustifiable architectural decisions is to not do them. If you find yourself doing them, you probably need to get laid or see a shrink or have a beer.

But what if you're emotionally well-adjusted, and want to get better at software architecture? Logging more hours at work will get you nowhere. When I wrote BitTorrent multiple other people were working on the exact same problem, most of them with a big head start and a lot more resources, and yet I still won easily. The problem was that most of them simply could not have come up with BitTorrent's architecture. Not with 20 code monkeys working under them. Not with a decade to work on it. Not after reading every available book on networking protocols. Not ever.

Clearly this isn't because BitTorrent's architecture is terribly difficult to understand. The entire approach can be understood without any really hard thinking in about an hour, with the possible exception of the state machine for the wire protocol, and even that is extremely simple as state machines go. The real difficulty in coming up with something like BitTorrent is that it involves fundamentally rethinking all of your basic approaches. This is very difficult for humans to do. We attack any new problem we encounter with techniques we already know, and try small modifications if difficulties turn up.

My suggestion for learning software architecture is to practice. Obviously you can't practice it by doing hundreds of projects, because each one of them takes too long, but you can easily design a hundred architectures for problems which only exist on paper, and where you strive to just get the solution to work on paper. Start by modifying the requirements of a problem you're working on. What if the amount of bandwidth or CPU were a hundredth of what it currently is? What if it were a thousand times? A million? What if you had a thousand times as much data? A million? A billion? What if the users were untrusted and you had to either prevent them from damaging the system or have a means of fixing things when they did? It doesn't matter if these scenarios are totally unrealistic; what matters is that they're different and that when you try to find architectures for handling them you take the inputs just as seriously as if you were about to start writing a system with those requirements for work. Try to find as many different approaches as you can, and come up with scenarios in which the stranger ones would be better.
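The pencil-and-paper part of this exercise can be as simple as the sketch below. The baseline numbers (10 GB of data, a 100 Mbit/s link) are hypothetical placeholders, and the only quantity estimated is how long it takes to push the data through the link once; a real design would need its own quantities.

```python
from itertools import product

# Hypothetical baseline requirements; substitute your own project's numbers.
BASE_DATA_BYTES = 10 * 10**9        # 10 GB of data today
BASE_BANDWIDTH_BPS = 100 * 10**6    # 100 Mbit/s of bandwidth today


def transfer_seconds(data_bytes, bandwidth_bps):
    """Time to move the data once over the link, ignoring all overhead."""
    return data_bytes * 8 / bandwidth_bps


def format_duration(seconds):
    """Express a duration in the largest unit that fits, to one decimal."""
    units = (("years", 365 * 86400), ("days", 86400),
             ("hours", 3600), ("minutes", 60))
    for unit, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


def scenarios(data_factors, bandwidth_factors):
    """Return (data factor, bandwidth factor, seconds) for every combination."""
    rows = []
    for d, b in product(data_factors, bandwidth_factors):
        seconds = transfer_seconds(BASE_DATA_BYTES * d, BASE_BANDWIDTH_BPS * b)
        rows.append((d, b, seconds))
    return rows


def print_table():
    """Print one line per scenario, using scale factors like those in the text."""
    for d, b, seconds in scenarios([1, 1000, 10**6], [0.01, 1, 1000]):
        print(f"data x{d:<8} bandwidth x{b:<6} -> {format_duration(seconds)}")
```

Running `print_table()` makes it obvious which combinations turn a transfer from minutes into years, and those are the scenarios where a different architecture is worth sketching.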

Learning these skills takes time, but is definitely worth it. I couldn't have come up with Codeville's architecture without first having spent a lot of time working on voting algorithms. Not that voting algorithms have anything to do with version control, but the process of coming up with example scenarios and defining the behavior which should happen in each of them carries over very well.

Sat, Dec. 25th, 2004 06:35 am (UTC)
bramcohen: Re: Subversion

Well, they've now gone halfway to admitting that using webdav has been a failure. The other half would be to stop supporting the webdav version. Unfortunately for them, the webdav view of the world was the basis for how files are handled, as a result of which subversion doesn't support file renames, and never will. Don't believe the feature list. Subversion does 'renames' as a copy and a delete of the old version, as a result of which if one person moves a file and another one modifies it the change will be dropped, which is even worse behavior than cvs has.
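A toy model of the difference, in Python. This is only an illustration of path-keyed versus identity-keyed bookkeeping; it is not Subversion's actual data structure or merge logic, and every class and method name here is hypothetical.

```python
class PathKeyedRepo:
    """Toy model: a file is identified by its path and nothing else."""

    def __init__(self, files):
        self.files = dict(files)  # path -> content

    def rename(self, old, new):
        # The "rename" is recorded as two unrelated operations.
        self.files[new] = self.files[old]  # copy
        del self.files[old]                # delete

    def apply_edit(self, path, new_content):
        """Apply an edit made against `path`; report whether it landed."""
        if path not in self.files:
            return False  # the edit has nowhere to go
        self.files[path] = new_content
        return True


class IdentityKeyedRepo:
    """Toy model: each file has a stable id; its path is just an attribute."""

    def __init__(self, files):
        self.content = {}  # file id -> content
        self.path_of = {}  # file id -> current path
        for file_id, (path, content) in enumerate(files.items()):
            self.content[file_id] = content
            self.path_of[file_id] = path

    def id_for(self, path):
        for file_id, current in self.path_of.items():
            if current == path:
                return file_id
        raise KeyError(path)

    def rename(self, file_id, new):
        self.path_of[file_id] = new  # same file, new name

    def apply_edit(self, file_id, new_content):
        self.content[file_id] = new_content
        return True


def demo():
    """One person renames foo.c while another edits the old foo.c."""
    start = {"foo.c": "v1"}

    by_path = PathKeyedRepo(start)
    by_path.rename("foo.c", "bar.c")
    landed_by_path = by_path.apply_edit("foo.c", "v2")

    by_id = IdentityKeyedRepo(start)
    foo = by_id.id_for("foo.c")  # the edit was made against this file id
    by_id.rename(foo, "bar.c")
    landed_by_id = by_id.apply_edit(foo, "v2")

    return landed_by_path, by_path.files, landed_by_id, by_id.content[foo]
```

In the path-keyed model the edit has no target once `foo.c` is gone, so `bar.c` keeps the old content; in the identity-keyed model the same edit follows the file to its new name.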

The whole transactional file store thing is covered in Tom Lord's post diagnosing subversion. One thing not covered in that post is that the way that data structure is built on top of berkeleydb is also comically stupid, even if you assume that it's a worthwhile thing to build, which it isn't.

Sat, Dec. 25th, 2004 06:45 pm (UTC)
ghudson: Re: Subversion

If you're basing your opinion of Subversion on Tom Lord's "Diagnosing" post, I hope you've also read my response, http://web.mit.edu/ghudson/thoughts/undiagnosing. (That response predates FSFS, which you should also be familiar with if you're going to be harshing on svn.)

What's comically stupid about the way we built the data structure on top of Berkeley DB? And why are you convinced that Subversion will never support true renames?

Sun, Dec. 26th, 2004 05:36 am (UTC)
bramcohen: Re: Subversion

I've read your response, and it simply concedes. Your claim that distributed operation isn't important to 'next-generation version control' is simply wrong, and to claim that it was never a goal of subversion is revisionist history.

The performance problems which have caused the switch to FSFS are caused by using berkeleydb improperly. The switch is a strange sidestep. Such performance problems and bungling attempts to fix them aren't fundamental to the architecture, or particularly important in terms of the long term viability of the project, but they are another thing to add to the list of failures.

I don't think subversion will ever support true renames first and foremost because there has been no admission of the problem. The claims that subversion supports renames have been very disingenuous, and to fix it would first require a widespread admission of the problem's scale and importance. On a technical level, supporting renames isn't extraordinarily hard, but it wasn't that hard the first time, and subversion got it completely wrong then, so presumably something would have to be different the second time to not lead to another failure.

Sun, Dec. 26th, 2004 05:55 am (UTC)
ghudson: Re: Subversion

If distributed operation is such a key feature, why is Subversion enjoying the amount of marketplace success it has? What evidence do you have that it is of such paramount importance, or that it was an initial goal of Subversion?

How do you think Subversion is using BDB improperly? Performance problems were not the reason I designed FSFS. BDB's intrinsic brittleness and limitations are the reason. BDB 4.4.x will finally address some of the brittleness issues, but not all of them, and it will still be impossible to use BDB on a remote filesystem.

Google for 'site:svn.haxx.se subversion "true rename"' and say again that you think there has been no admission of the problem. Also, justify how it is "disingenuous" to say that Subversion supports file renames when you can "svn mv foo bar" and subsequently get log information on bar crossing the rename, diff between versions of bar crossing the rename, and so forth. There's a difference between "doesn't get renames quite right" and "doesn't support renames at all."

Sun, Dec. 26th, 2004 07:16 am (UTC)
bramcohen: Re: Subversion

For what was supposedly the heir apparent to CVS, Subversion's market penetration has been pathetic. What market penetration it does have can be reasonably attributed to the enormous amount of political backing and press it's had.

Not having distributed operation is completely missing the boat. If you want to argue that subversion was missing the boat from the beginning, I'm not going to bother arguing against that.

Running off of a mounted file system is not an important feature. And my experience with berkeleydb has been that it's quite stable so long as only a single process accesses the repository at once, which is the way things work when you use a real server, the only reasonable approach.

I did that search and found a post from you specifically talking about the depth of the problem and importance of fixing it. The other developers don't seem to be as sure, and the status page indicates that renaming is to be done, but not in the immediate future, and that work hasn't been started on it. For a project which has had multiple man-decades of work on it, that's an awful lot of ignoring the problem. Still, you're right that there's more acknowledgement than there was in the past, which is an improvement, but I'm still not holding my breath.

Any reasonable definition of 'supports renames' is not covered by what subversion does, and to claim that the marginally-more-than-CVS functionality it has constitutes renames is simply dishonest.

Tue, Dec. 28th, 2004 01:58 pm (UTC)
(Anonymous): Re: Subversion

I use Subversion for my projects (one user, one computer). I've never used version control before (not even CVS) so I don't know other systems or what problems Subversion may have when you have multiple users working on the same project. I have a few issues with Subversion (disk space waste, 1.1 commit is MUCH slower than 1.0 on my WinXP), but I think it's a wonderful project. I never lost data or had corruption problems, although it is sometimes hard to rename some folders or to change the case of filenames. But that could be because of the client I use (TortoiseSVN).

By the way, it was through Subversion that I found the amazing Python language. I never heard of it before this March and now I work only in it (with the occasional C++ module for critical stuff).

So you may be right that Subversion has problems, but it's excellent for little people who just start using version control.

Mon, Mar. 28th, 2005 02:01 am (UTC)
qu1j0t3: Re: Subversion

Re: changing case,
This is tricky to do with working copies on case-insensitive filesystems (Windows, Apple HFS+). But it's trivial to do this directly on the repository:

svn rename <repoURL>/FILE.c <repoURL>/file.c

This can be scripted of course, for instance you can extract the list of files from a working copy. I've used this to automate lowercasing of directories with many wrong-cased files.
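A rough sketch of such a script in Python, which only builds the command lines and does not run anything. The function name, the example URL and the log message are made up for illustration; the list of paths is assumed to have been collected from a working copy beforehand. Operations on repository URLs need a log message, hence the `-m`.

```python
import posixpath


def lowercase_rename_commands(repo_url, relative_paths,
                              message="Lowercase file name"):
    """Build one `svn rename` command per file whose name is not lowercase.

    Only the final path component is lowercased; directory names are
    left alone. Nothing is executed here.
    """
    base = repo_url.rstrip("/")
    commands = []
    for path in relative_paths:
        directory, name = posixpath.split(path)
        lowered = posixpath.join(directory, name.lower())
        if lowered == path:
            continue  # already lowercase, nothing to do
        commands.append([
            "svn", "rename", "-m", message,
            f"{base}/{path}", f"{base}/{lowered}",
        ])
    return commands
```

Each resulting list can be handed to `subprocess.run(command, check=True)` once it has been reviewed. Note that every rename done directly on the repository is committed immediately as its own revision, so a long list produces a long run of revisions.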

Mon, Mar. 28th, 2005 02:03 am (UTC)
qu1j0t3: Re: Subversion

Now after that rant, who needs "a beer" or a nice walk in the fresh air!?

Mon, Dec. 27th, 2004 05:29 am (UTC)
omnifarious: Re: Subversion

I actually think that authentication and permissions are the biggest failings of Subversion that are a direct result of it being initially implemented on top of WebDAV. I don't mind the centralized model. I think it's the right model for most projects. And the rename problem is pretty annoying as well. But, my big beef is authentication and permissions.

Sun, Mar. 6th, 2005 05:23 am (UTC)
(Anonymous): Subversion rename problems not due to WebDAV.

The reason for Subversion's current (and regrettable) lack of true rename support has nothing to do with WebDAV. It has to do with the decision to use the same style working copies as CVS, followed by a decision to allow that working copy architecture more influence than it should have had.

If a system that does directory versioning supports renames, the obvious question is, what happens when someone commits one side of a rename but not the other? E.g.:

$ svn mv foo bar
$ svn commit bar # note that foo is not mentioned

Our answer was to break the rename down into a copy and a delete, so that if you commit just one side of it, you get either the copy or the delete -- which are already supported operations.

Now, I think this decision was a mistake. First of all, the repository should support true renames anyway, so that doing a rename without a working copy (which is totally supported) Does The Right Thing. And secondly, even in the working copy, I think it would have been better to simply disallow committing one side of a rename without the other.

My intention here is not to defend the lack of true renames, but only to point out that WebDAV had absolutely nothing to do with it.