December 19th, 2004


A few people have asked me about adding streaming ability to BitTorrent.

Streaming could be supported in the current protocol (which makes me a little puzzled by the recent swarmcast announcement stating that a new protocol is being pushed out specifically to support streaming). You 'just' request the next few pieces coming up first, and if you have the next few pieces request randomly.

I haven't worked on streaming for several reasons. First, there have been plenty of performance optimizations to do even without working on streaming, and enhancements to core functionality take priority. Second, the sweet spot for streaming is fairly small. Lots of formats can't be displayed in real time at current speeds of net connections, and among the ones that can it's frequently not much longer to wait for the entire thing to download than to wait for a reasonable size of buffer to accumulate. The one example which clearly works is low quality sound files of an hour or so in length, but that's a fairly niche market. Third, there's the whole issue of how to hook into players to have them use the streamed content. My experiences with third party software have been universally nightmarish, so I'm not very keen on dealing with those problems.

I'd also like to point out the difference between radio-style streaming and play on demand. A lot of people think because of the history that being stuck on a TV schedule is what people want, but in fact people vastly prefer a tivo/netflix interface to a real-time streaming interface, and frequently completely dump the real time when the other one becomes available. Play on demand is when there's a specific file which someone might want to watch and it starts playing immediately when they click on it. That's the vast bulk of what users want. Tapping into whatever's going on in a live stream is a niche market.

Great Programmers

I've seen a lot of discussion of great programmers, usually centering on how to find them, but usually what people really want to know is how to become one. Since I'm widely considered to be a great programmer, I'll give some advice.

First of all there's raw coding ability. For this, practice makes perfect. Implementing lots of algorithms from, say Introduction to Algorithms can help sharpen your technical abilities, but really the important thing is to have some experience. Anyone with enough natural talent will get good at basic raw coding.

There are only two coding skills which mostly people who are completely self-taught as a programmer miss out on: proper encapsulation, and unit tests. For proper encapsulation, you should organize your code so that changes which require modifying code in more than one module are as rare as possible, and for unit tests you should write them to be pass/fail so that all unit tests can be run as a comprehensive suite. And now you know everything you need to about those two things. Anyone who is taught the above guidelines, and decides they really want to learn those skills, will with sufficient practice become good at them.

Coding skill is all well and good, and you can't become a great programmer without it, but it's far from everything. I'm decent at raw coding, but I know many people who are better, and some of them are abysmal programmers. I in particular can't deal with being tasked with fixing up spaghetti code. My brain simply locks down and refuses to make any modifications which it isn't convinced will work, which is of course impossible when the source material is an incurably bug-ridden mess.

What truly separates the great programmers from the journeyman programmers is architecture. What's puzzling is that architecture appears to be one of the simplest parts of the whole process, requiring in most cases little more than some pencil and paper calculations and a willingness to change.

The simplest architectural problems to solve are the ones which for lack of a better theory most people ascribe to emotional or psychological problems. These are decisions for which there's no rational justification whatsoever. For example, writing a non-speed-critical program (which is most of them) in C or C++. A few years ago you could justify that because the other languages didn't have such extensive libraries, but today it's ludicrous. Another one is building one's protocol as a layer on top of webdav. And another one is building a transactional system for retrieving any subsection of any point in the history of an arbitrarily large file in constant time when that isn't part of project requirements. Yes, I'm making fun of subversion here. It's a great example of a project permanently crippled by dumb architectural decisions.

Half of these 'emotional' architectural decisions are dogmatically using a past practice in situations where it's inapplicable. The other half are working on interesting problems which have little or no utility in the finished product. Once decisions like these have been made, questioning them can become a political impossibility. If someone new comes in to a project with many man-years on it, and in their first week learns that there's a networking call which includes a parameter as to whether it should be blocking or non-blocking, and immediately declares that the entire codebase is a mess and difficult if not impossible to maintain, they'll almost certainly be correct and justified, but their opinion will likely be disregarded as as brash and ill-informed. After all, they haven't spent the kind of time on the codebase than everybody else. I've actually had this happen to me, and while others have claimed that there are more political ways of approaching such problems, my experience has been that once the truth becomes unthinkable a couple people need to get fired before any improvement can be made.

My advice about technically unjustifiable architectural decisions is to not do them. If you find yourself doing them, you probably need to get laid or see a shrink or have a beer.

But what if you're emotionally well-adjusted, and want to get better at software architecture? Logging more hours at work will get you nowhere. When I wrote BitTorrent multiple other people were working on the exact same problem, most of them with a big head start and a lot more resources, and yet I still won easily. The problem was that most of them simply could not have come up with BitTorrent's architecture. Not with 20 code monkeys working under them. Not with a decade to work on it. Not after reading every available book on networking protocols. Not ever.

Clearly this isn't because BitTorrent's architecture is terribly difficult to understand. The entire approach can be understood without any really hard thinking in about an hour, with the possible exception of the state machine for the wire protocol, and even that is extremely simple as state machines go. The realy difficulty in coming up with something like BitTorrent is that it involves fundamentally rethinking all of your basic approaches. This is very difficult for humans to do. We attack any new problem we encounter with techniques we already know, and try small modifications if difficulties turn up.

My suggestion for learning software architecture is to practice. Obviously you can't practice it by doing hundreds of projects, because each one of them takes too long, but you can easily design a hundred architectures for problems which only exist on paper, and where you strive to just get the solution to work on paper. Start by modifying the requirements of a problem you're working on. What if the amount of bandwidth or CPU was a hundredth what it currently is? What if it were a thousand times? A million? What if you had a thousand times as much data? A million? A billion? What if the users were untrusted and you had to either prevent them from damaging the system or have a means of fixing things when they did? It doesn't matter if these scenarios are totally unrealistic, what matters is that they're different and that when you try to find architectures for handling them you take the inputs just as seriously as if you were about to start writing a system with those requirements for work. Try to find as many different approaches as you can, and come up with scenarios in which the stranger ones would be better.

Learning these skills takes time, but is definitely worth it. I couldn't have come up with Codeville's architecture without first having spent a lot of time working on voting algorithms. Not that voting algorithms have anything to do with version control, but the process of coming up with example scenarios and defining the behavior which should happen in each of them carries over very well.