Sun, Dec. 19th, 2004, 06:33 pm
Great Programmers

I've seen a lot of discussion of great programmers, usually centering on how to find them, but usually what people really want to know is how to become one. Since I'm widely considered to be a great programmer, I'll give some advice.

First of all there's raw coding ability. For this, practice makes perfect. Implementing lots of algorithms from, say, Introduction to Algorithms can help sharpen your technical abilities, but really the important thing is to have some experience. Anyone with enough natural talent will get good at basic raw coding.

There are only two coding skills which mostly people who are completely self-taught as a programmer miss out on: proper encapsulation, and unit tests. For proper encapsulation, you should organize your code so that changes which require modifying code in more than one module are as rare as possible, and for unit tests you should write them to be pass/fail so that all unit tests can be run as a comprehensive suite. And now you know everything you need to about those two things. Anyone who is taught the above guidelines, and decides they really want to learn those skills, will with sufficient practice become good at them.
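As a rough illustration of both guidelines, here is a minimal Python sketch. The `Inventory` class and its methods are hypothetical names made up for this example: callers only go through its methods, so the storage behind them can change without touching other modules, and the tests are strictly pass/fail so the whole suite can be run in one go.

```python
import unittest


class Inventory:
    """Hypothetical module: callers use add/remove/count, never _items."""

    def __init__(self):
        self._items = {}  # storage detail hidden behind the methods below

    def add(self, name, quantity=1):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._items[name] = self._items.get(name, 0) + quantity

    def remove(self, name, quantity=1):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.count(name) < quantity:
            raise ValueError("not enough stock")
        self._items[name] -= quantity

    def count(self, name):
        return self._items.get(name, 0)


class InventoryTest(unittest.TestCase):
    """Each test either passes or fails; nobody has to read output by eye."""

    def test_add_then_count(self):
        inventory = Inventory()
        inventory.add("widget", 3)
        self.assertEqual(inventory.count("widget"), 3)

    def test_remove_reduces_count(self):
        inventory = Inventory()
        inventory.add("widget", 3)
        inventory.remove("widget", 2)
        self.assertEqual(inventory.count("widget"), 1)

    def test_remove_more_than_available_fails(self):
        inventory = Inventory()
        inventory.add("widget", 1)
        with self.assertRaises(ValueError):
            inventory.remove("widget", 2)


def run_suite():
    """Run every test above as one suite and return a single True/False."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(InventoryTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

If `_items` were later swapped for a database table, only `Inventory` would need to change, and `run_suite()` would report whether the swap broke anything.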

Coding skill is all well and good, and you can't become a great programmer without it, but it's far from everything. I'm decent at raw coding, but I know many people who are better, and some of them are abysmal programmers. I in particular can't deal with being tasked with fixing up spaghetti code. My brain simply locks down and refuses to make any modifications which it isn't convinced will work, which is of course impossible when the source material is an incurably bug-ridden mess.

What truly separates the great programmers from the journeyman programmers is architecture. What's puzzling is that architecture appears to be one of the simplest parts of the whole process, requiring in most cases little more than some pencil and paper calculations and a willingness to change.

The simplest architectural problems to solve are the ones which for lack of a better theory most people ascribe to emotional or psychological problems. These are decisions for which there's no rational justification whatsoever. For example, writing a non-speed-critical program (which is most of them) in C or C++. A few years ago you could justify that because the other languages didn't have such extensive libraries, but today it's ludicrous. Another one is building one's protocol as a layer on top of webdav. And another one is building a transactional system for retrieving any subsection of any point in the history of an arbitrarily large file in constant time when that isn't part of project requirements. Yes, I'm making fun of subversion here. It's a great example of a project permanently crippled by dumb architectural decisions.

Half of these 'emotional' architectural decisions are dogmatically using a past practice in situations where it's inapplicable. The other half are working on interesting problems which have little or no utility in the finished product. Once decisions like these have been made, questioning them can become a political impossibility. If someone new comes into a project with many man-years on it, and in their first week learns that there's a networking call which includes a parameter as to whether it should be blocking or non-blocking, and immediately declares that the entire codebase is a mess and difficult if not impossible to maintain, they'll almost certainly be correct and justified, but their opinion will likely be disregarded as brash and ill-informed. After all, they haven't spent the kind of time on the codebase that everybody else has. I've actually had this happen to me, and while others have claimed that there are more political ways of approaching such problems, my experience has been that once the truth becomes unthinkable a couple of people need to get fired before any improvement can be made.

My advice about technically unjustifiable architectural decisions is to not do them. If you find yourself doing them, you probably need to get laid or see a shrink or have a beer.

But what if you're emotionally well-adjusted, and want to get better at software architecture? Logging more hours at work will get you nowhere. When I wrote BitTorrent multiple other people were working on the exact same problem, most of them with a big head start and a lot more resources, and yet I still won easily. The problem was that most of them simply could not have come up with BitTorrent's architecture. Not with 20 code monkeys working under them. Not with a decade to work on it. Not after reading every available book on networking protocols. Not ever.

Clearly this isn't because BitTorrent's architecture is terribly difficult to understand. The entire approach can be understood without any really hard thinking in about an hour, with the possible exception of the state machine for the wire protocol, and even that is extremely simple as state machines go. The real difficulty in coming up with something like BitTorrent is that it involves fundamentally rethinking all of your basic approaches. This is very difficult for humans to do. We attack any new problem we encounter with techniques we already know, and try small modifications if difficulties turn up.

My suggestion for learning software architecture is to practice. Obviously you can't practice it by doing hundreds of projects, because each one of them takes too long, but you can easily design a hundred architectures for problems which only exist on paper, and where you strive to just get the solution to work on paper. Start by modifying the requirements of a problem you're working on. What if the amount of bandwidth or CPU was a hundredth of what it currently is? What if it were a thousand times? A million? What if you had a thousand times as much data? A million? A billion? What if the users were untrusted and you had to either prevent them from damaging the system or have a means of fixing things when they did? It doesn't matter if these scenarios are totally unrealistic; what matters is that they're different and that when you try to find architectures for handling them you take the inputs just as seriously as if you were about to start writing a system with those requirements for work. Try to find as many different approaches as you can, and come up with scenarios in which the stranger ones would be better.
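The pencil and paper part of this exercise can also be done in a few lines of Python. The sketch below is only an illustration: the function names and the baseline figures (a 1 GB payload over a 1 MB/s link) are assumptions invented for the example, not numbers from any real project. It tabulates how long one full transfer takes as the data size and the bandwidth are scaled, which is the kind of figure that tells you when an architecture stops being viable.

```python
def transfer_seconds(data_bytes, bandwidth_bytes_per_sec):
    """Time to push data_bytes through a link of the given speed."""
    return data_bytes / bandwidth_bytes_per_sec


def scenarios(base_data, base_bandwidth, factors):
    """Return (data_factor, bandwidth_factor, seconds) for every combination."""
    rows = []
    for data_factor in factors:
        for bandwidth_factor in factors:
            seconds = transfer_seconds(
                base_data * data_factor,
                base_bandwidth * bandwidth_factor,
            )
            rows.append((data_factor, bandwidth_factor, seconds))
    return rows


if __name__ == "__main__":
    # Hypothetical baseline: 1 GB of data, 1 MB/s of bandwidth.
    for data_factor, bandwidth_factor, seconds in scenarios(
        10**9, 10**6, [1, 1000, 10**6]
    ):
        print(f"data x{data_factor}, bandwidth x{bandwidth_factor}: "
              f"{seconds:,.0f} s")
```

A row where the time jumps from minutes to years is a prompt to ask what a design built for that scenario would have to do differently.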

Learning these skills takes time, but is definitely worth it. I couldn't have come up with Codeville's architecture without first having spent a lot of time working on voting algorithms. Not that voting algorithms have anything to do with version control, but the process of coming up with example scenarios and defining the behavior which should happen in each of them carries over very well.
(Screened comment)

Tue, Dec. 21st, 2004 08:03 am (UTC)
bramcohen: Re: Hey Bram

That depends on what sort of programmer you wish to become. If you want to dabble in computers because computers seem important to you, I suggest learning Python. If you want to learn computers because you're fascinated by their inner workings, I suggest learning C (not C++, that has a lot of extraneous cruft). Most people fall into the first category.

BitTorrent took two years of full time work to get to the not sucking stage, and another year to become reasonably mature.

Tue, Dec. 21st, 2004 05:59 am (UTC)
(Anonymous): Codeville architecture

So how about a post about Codeville architecture, then, eh? ;-)

Your competito^Wcolleagues are curious!

(I guess the answer might be "then go to CodeCon", but while Graydon and I talked about it some, he's busy then and I was busy now (and don't really have $80 to throw at it anyway), so no Monotone submission. Maybe some other time.)

-- Nathaniel Smith <njs@pobox.com>

Tue, Dec. 21st, 2004 08:11 am (UTC)
bramcohen: Re: Codeville architecture

Codeville's documentation, especially its architectural documentation, has lagged far behind its implementation, mostly because implementation is a higher priority than documentation, especially when we haven't even hit 1.0 yet, and there's still the occasional significant change.

What I'd really like to see is a paper comparing the architectures of Darcs, Monotone, and Codeville, although I'm not sure that there's a single person who's grokked two out of those three systems.

Are you local to the San Francisco area?

Tue, Dec. 21st, 2004 10:19 am (UTC)
darkcode: Age and development

I can't help wondering- I'm 16, and in high school in San Diego, CA. I've been programming for perhaps 2 years, and have gotten fairly good at it for a kid, but nothing compared to most of the programmers I meet and talk to. After studying code in college, for example, does writing applications (especially with GUIs) get easier? Or should I give up for "not having the gift"? I know you're not a counselor, but answering this question would mean quite a lot to me.

- David

Tue, Dec. 21st, 2004 10:49 am (UTC)
(Anonymous): Re: Age and development


Tue, Dec. 21st, 2004 03:52 pm (UTC)
(Anonymous): Re: Age and development

I learned how to program GUIs while I was still in high school. In fact, I had only known C++ for 5 months at the time. I think the key is the API you choose. I learned the BeOS API which was (is?) quite easy to wrap one's head around. I would recommend Qt nowadays.
As for the college question, nothing I learned in college really applies to GUI programming. College exposed me to many other languages and programming paradigms. While those are good experience, they aren't quite the same as learning an API.
So really, what I'm trying to say is that you are as ready to learn GUI programming now as you ever will be. Studying programming in college is just 4 years of practice with fundamental theory thrown in for good measure.

Sat, Dec. 25th, 2004 05:38 am (UTC)
(Anonymous): Subversion

Another one is building one's protocol as a layer on top of webdav. And another one is building a transactional system for retrieving any subsection of any point in the history of an arbitrarily large file in constant time when that isn't part of project requirements. Yes, I'm making fun of subversion here. It's a great example of a project permanently crippled by dumb architectural decisions.

Subversion also has a non-webdav server "svnserve" which uses the TCP based svnserve protocol. It's much faster. Many open source project repositories use plain svnserve. Not sure what you meant with your comments on transactions.

Sat, Dec. 25th, 2004 06:35 am (UTC)
bramcohen: Re: Subversion

Well, they've now gone halfway to admitting that using webdav has been a failure. The other half would be to stop supporting the webdav version. Unfortunately for them, the webdav view of the world was the basis for how files are handled, as a result of which subversion doesn't support file renames, and never will. Don't believe the feature list. Subversion does 'renames' as a copy and a delete of the old version, as a result of which if one person moves a file and another one modifies it the change will be dropped, which is even worse behavior than cvs has.

The whole transactional file store thing is covered in Tom Lord's post diagnosing subversion. One thing not covered in that post is that the way that data structure is built on top of berkeleydb is also comically stupid, even if you assume that it's a worthwhile thing to build, which it isn't.
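The rename problem described above can be shown with a toy model. This is a sketch of the failure mode as described in this comment, not Subversion's actual data structures or merge code; the function names and the dict-based "tree" are invented for illustration. One user renames a file while another edits it, and the two merges differ only in whether edits are addressed by path or by a stable file identity.

```python
def merge_by_path(base, rename, edit):
    """Rename recorded as copy + delete; edits are addressed by path."""
    old_path, new_path = rename
    edit_path, new_content = edit
    tree = dict(base)
    # Copy + delete: the new path gets the content as of the rename.
    tree[new_path] = tree.pop(old_path)
    # The edit targets a path that no longer exists, so it has nowhere to land.
    if edit_path in tree:
        tree[edit_path] = new_content
    return tree


def merge_by_identity(base, rename, edit):
    """Files carry a stable id; renames and edits both address the id."""
    old_path, new_path = rename
    edit_path, new_content = edit
    # Give each base file a hypothetical stable id (here, its sorted index).
    ids = {path: file_id for file_id, path in enumerate(sorted(base))}
    paths = {file_id: path for path, file_id in ids.items()}
    contents = {ids[path]: content for path, content in base.items()}
    paths[ids[old_path]] = new_path          # rename changes only the path
    contents[ids[edit_path]] = new_content   # edit changes only the content
    return {paths[file_id]: contents[file_id] for file_id in paths}


base = {"a.txt": "v1"}
rename = ("a.txt", "b.txt")   # one user moves the file
edit = ("a.txt", "v2")        # another user modifies it

print(merge_by_path(base, rename, edit))
print(merge_by_identity(base, rename, edit))
```

In the path-based merge the moved file keeps its old content and the edit is silently lost; in the identity-based merge the file ends up both moved and modified.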

Sun, Dec. 26th, 2004 07:29 am (UTC)

People regularly ask for the ability to run a repository on a remote-mounted filesystem (or worse, they try, using BDB, and get bizarre failures). Baldly asserting that it's not an important feature doesn't make it so.

Anyway, go ahead and finish writing Codeville. I think you'll find that a version control system that scales beyond basement projects is a lot tougher than you think, and that the vast majority of the user base doesn't have the same priorities as you do. If you're right and I'm wrong, I'm sure you'll be able to revisit this thread in a few years and be proud.

(In your review of the google results, you seem to have missed, among many other posts, http://svn.haxx.se/dev/archive-2003-01/1199.shtml where Karl says "I think everyone agrees that true rename would be preferable than copy+delete." It's not a high priority, but it's definitely on the slate for the future.)

Sat, Jan. 1st, 2005 06:03 pm (UTC)

I'm glad to finally see that someone of your caliber shares my opinions about software development. It basically comes down to choosing the right approach (from practice and experience) and a suitable environment (e.g. the proper programming language for the job). I also think that 'evolutionary' development with rapid prototyping is sometimes more likely to bring you closer to the solution than a full-fledged top-down design.

Sat, Mar. 26th, 2005 10:15 am (UTC)

really good words

Fri, Nov. 4th, 2005 01:50 pm (UTC)
mr_mediocracy: How are architectural decisions made?

Hello there,

I am trying to grasp the meaning of your statement about "emotional or psychological problems" and being "emotionally well-adjusted" when making architectural decisions.

Could you please elaborate on what you mean by that? How do emotions lead to technically unjustifiable architectural decisions?


PS: Maybe I just did not get your point because English is not my native tongue. But I tried hard :-)

Fri, Nov. 4th, 2005 07:05 pm (UTC)
bramcohen: Re: How are architectural decisions made?

The most common problem is that people can't admit when they're wrong.

Sat, Jan. 19th, 2008 05:57 pm (UTC)
sotomax: Two skills?

There are only two coding skills which mostly people who are completely self-taught as a programmer miss out on: proper encapsulation, and unit tests.

While this is true, I believe that the top two skills that self-taught programmers are missing are documentation and naming, and decomposition. Good naming and good documentation can only be learned from working with others. Just yesterday I was reviewing someone's class for our new team (call it team Foo).

He called it FooEngine. I pointed out that we're already in a "Foo" directory -- and this is general code not specific to our team -- so he changed it to Engine. If you were a stranger and you saw a class called Engine, would you have the faintest idea what it was doing? It actually exports individual lines of a CSV file, with retries... if we weren't going to break it up into three classes (see below) then I'd call it ExportCsvLinesWithRetry.

Decomposition is another thing you rarely learn on your own. If only one person is working on the code at a time, you have to pay a moderate cost for decomposing your design into disconnected parts, and you don't see the benefit. But a key to cooperation with others and to maintainable software in general is to decompose the design into the smallest reasonable pieces.

In the case above, when you split up the Engine into three parts, it turned out that there were good solutions to each of the three parts already in the codebase. So in fact nothing might need to be written, but more likely, we'll add some useful new functionality into a couple of places where other people can use it.
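To make the decomposition concrete, here is one way the CSV exporter described above might split into three small pieces. This is a guess at the shape, not the actual code from that review: the three-way split (parsing, retrying, sending), the function names, and the `send` callback are all assumptions, and plain functions stand in for the classes.

```python
import csv
import io


def read_csv_rows(text):
    """Part 1: parse CSV text into rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text)))


def with_retry(action, attempts=3):
    """Part 2: call action() until it succeeds or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except Exception as error:  # broad on purpose for this sketch
            last_error = error
    raise last_error


def export_rows(rows, send, attempts=3):
    """Part 3: pass each row to the caller-supplied send(row), with retries."""
    for row in rows:
        # Bind row as a default argument so each lambda keeps its own row.
        with_retry(lambda row=row: send(row), attempts)


if __name__ == "__main__":
    export_rows(read_csv_rows("name,qty\nwidget,3\n"), print)
```

Each piece has a name that says what it does, and `with_retry` knows nothing about CSV, which is what makes it likely that something equivalent already exists elsewhere in a codebase.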