Awful Programming Advice

I just came across a blog post going over that old saw of Object Oriented Design, a square being a subclass of a rectangle.

This advice is worse than useless. It's either wrongheaded or meaningless, depending on how literally you take it.

Taken literally, it would never make sense to make a full-blown class for such a trivial piece of functionality. More lines of code would be taken up by declarations than could possibly be saved in convenience. Taken less literally, it's just gibberish - a completely nonsensical way of thinking about the problem, like teaching a drawing class where you cover pentagrams.

So what would be a sane way of building this functionality? Well, first you have to decide what the functionality actually is, since the original example didn't contain any. A reasonable unit of functionality is polygons. Polygons can be rotated, translated, and scaled, you can take their union and intersection with other polygons, and you can calculate their area. That's a nontrivial set of functionality which it makes sense to encapsulate. How then to make rectangles and squares? The simplest way is to have two convenience functions - one which builds rectangles, and one which builds squares. Or just make the convenience function accept a variable number of parameters and have it return a square when it only gets one.
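To make that concrete, here's a minimal sketch in Python. The Polygon class and the function names are mine, invented for illustration - the point is just that rectangles and squares are construction conveniences, not subclasses:

class Polygon:
    # Encapsulates the nontrivial functionality; union and intersection
    # are omitted here for brevity.
    def __init__(self, vertices):
        self.vertices = list(vertices)  # list of (x, y) pairs

    def translate(self, dx, dy):
        return Polygon((x + dx, y + dy) for x, y in self.vertices)

    def scale(self, factor):
        return Polygon((x * factor, y * factor) for x, y in self.vertices)

    def area(self):
        # shoelace formula over consecutive vertex pairs
        s = 0.0
        for (x1, y1), (x2, y2) in zip(self.vertices,
                                      self.vertices[1:] + self.vertices[:1]):
            s += x1 * y2 - x2 * y1
        return abs(s) / 2

def rectangle(width, height=None):
    # The convenience function: give it one parameter and it returns a square.
    if height is None:
        height = width
    return Polygon([(0, 0), (width, 0), (width, height), (0, height)])

rectangle(3) is a square, rectangle(3, 5) is not, and neither needed a class of its own.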

'But this example doesn't use any subclassing!' I can hear the OOP gurus exclaiming now. 'How are people supposed to learn subclassing if you don't give them any examples of it?' This is a deranged attitude. Subclassing is not an end in and of itself, it's a technique which is occasionally handy. And I'll let you in on a little secret - I personally almost never use subclassing. It's not that I one day decided subclassing is bad and should be avoided, it's that as I got better at coming up with simple designs I wound up using it less and less, until eventually I almost stopped using it entirely. Subclassing is, quite simply, awkward. Any design which uses subclassing should be treated with skepticism. Any design which requires subclassing across encapsulation boundaries should be assumed to be a disaster.

Unfortunately this is hardly atypical of introductory object oriented design examples. Even the ones which are more real-world tend to be awful. Take, for example, networking APIs. A typical one allows the creation of sockets, with blocking read calls and maybe-blocking write calls. The first few versions of Java had this sort of API exclusively. This approach is simple, seems logical to people unfamiliar with it, and is an unmitigated disaster in practice. It leads to a ton of threads floating around, with ridiculous numbers of race conditions, and awful performance because of all the threads swapping in and out. Such awfulness unfortunately hasn't stopped it from being the default way people are shown how to do things.

So what would be a better API? This is something I have a lot of experience with, so I'll give a brief summary. I'm glossing over some details here; some of the glossed-over functionality, like half-open sockets, is perhaps best not implemented at all.

The networking object's constructor takes a callback object. Each socket is referred to using an identifier (yes, that's right, an identifier - sockets don't warrant having objects of their own).

The methods of the networking object are as follows:

Start listening on a port

Stop listening on a port

Make a new outgoing connection (returns the connection id)

Write data to a socket (returns the number of bytes written)

Check whether a socket has buffered data to write

Read data from a socket

Close a socket

Here are the methods of the callback:

Incoming socket created

Socket closed

Socket has data to be read

Socket has flushed all write data from its buffer
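
To make the shape of this concrete, here's a minimal sketch of the interface in Python. The names and signatures are mine, invented for illustration, not taken from any existing library:

class NetworkCallback:
    # Implemented by the application; invoked by the networking object.
    def incoming_connection(self, conn_id): ...
    def connection_closed(self, conn_id): ...
    def data_ready(self, conn_id): ...
    def write_buffer_flushed(self, conn_id): ...

class Networking:
    def __init__(self, callback):
        self.callback = callback  # a NetworkCallback

    def start_listening(self, port): ...
    def stop_listening(self, port): ...
    def connect(self, host, port): ...           # returns the connection id
    def write(self, conn_id, data): ...          # returns bytes accepted
    def has_buffered_writes(self, conn_id): ...  # data still waiting to go out?
    def read(self, conn_id): ...
    def close(self, conn_id): ...

Note that sockets never appear as objects - the application holds plain connection ids and one callback object, which is the whole point.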

You've probably noticed that this API, while simple, isn't completely trivial, and there is considerable subtlety to what exactly all the methods mean. That is an important point. For functionality of less than this level of complexity, object orientation is generally speaking a bad idea, and trying to force simpler examples in the name of instruction results in poor and misleading instruction.

Unfortunately not all asynchronous networking APIs are in agreement with my example, which really says something about the state of the art in software design. I daresay that if Twisted had followed this example straightforwardly (it does something similar, but awkwardly and obfuscatedly) then Tornado would never have been written.

Logging decorators in Python

I've been doing a lot of debugging lately, and have been using this bit of code to great effect:

from traceback import extract_stack, format_list

ti = 0.0  # global virtual time, updated externally by the simulation loop

class rec:
    def __init__(self):
        self._his = []  # history of (key, value, virtual time, stack) tuples

    def __setattr__(self, key, value):
        # Record every assignment except to the history list itself; the
        # [:-1] drops the __setattr__ frame from the recorded stack.
        if key != '_his':
            self._his.append((key, value, ti, extract_stack()[:-1]))
        self.__dict__[key] = value

    def dump(self):
        # Replay every recorded assignment: what was set, when (in virtual
        # time), and where in the code it happened.
        for key, value, t, stack in self._his:
            print('%s = %s' % (key, value))
            print('time %f' % t)
            print(''.join(format_list(stack)))
 
To use this, I make a struct-like class subclass rec, have its __init__ call rec's __init__ first, and then call dump() when I want to know what's happened to a particular object. It displays every assignment made to the object, the virtual time when it happened (I'm running a lot of simulations), and where in the code the assignment took place.
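For example (the Peer struct here is hypothetical, just to show the pattern):

class Peer(rec):
    def __init__(self, addr):
        rec.__init__(self)  # must run first so the history list exists
        self.addr = addr
        self.connected = False

p = Peer('10.0.0.1')
p.connected = True
p.dump()  # shows all three assignments, their virtual times, and stack traces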

Extremely useful, but missing in some key areas:

It should really be a decorator, rather than a base class

If someone retrieves a primitive (a list or dict), it should return a wrapped version of it, so that if someone writes, for example, myobj.stuff[3] = 4, the logging records exactly that. Right now such assignments are simply not logged, which sucks. (A rough sketch of the idea follows this list.)

The virtual time should be less of a hack than a global. Maybe looking up a virtual time associated with the current thread would work. This is less general-purpose than the rest of the functionality here, but I find it quite useful.
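
Here's a rough, incomplete sketch of the wrapping idea, just to show the direction I mean - a real version would need to cover dicts, nesting, and the rest of the mutating methods:

class LoggedList:
    # Wraps a list belonging to a rec so that item assignments are
    # recorded into the owner's history too.
    def __init__(self, owner, attr, data):
        self.owner, self.attr, self.data = owner, attr, data

    def __getitem__(self, index):
        return self.data[index]

    def __setitem__(self, index, value):
        # records 'stuff[3] = 4' style assignments, reusing the global
        # virtual time and stack extraction from the code above
        self.owner._his.append(('%s[%r]' % (self.attr, index), value,
                                ti, extract_stack()[:-1]))
        self.data[index] = value

rec would then need to hand one of these out whenever an attribute lookup returns a list - that's the part I haven't written.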

Can anybody please flesh out this functionality, or point me to something which does it already? I'd like for this to be part of my standard debugging toolbox, the first thing I go to when a simulation I'm running hits a complex and hard to track down bug.

Tracker Ball Puzzle


People seemed not to understand that these puzzles are collaborations between me and Oskar, even though he clearly explains it in the videos, so I did the presentation in this one.

Someone at Mozilla Foundation needs to be fired

Somebody at Mozilla decided they need lots of 'true' random numbers.

My patience for this subject completely ran out about five years ago, so this post is going to show a complete lack of diplomacy. I would like to emphasize, in advance, that this is my honest, reasoned opinion, not said in anger, and that if you ask my opinion again in the future I'll say the exact same thing.

Once a computer has collected a small number of 'true' random bits (maybe it's 128, maybe it's 256, but regardless it's small) there's no need whatsoever for it to block on collecting more 'random' numbers. A pseudorandom number generator based on AES will be able to generate random numbers from that seed until the end of the universe, and no one constrained by the laws of physics and math will ever be able to tell the difference between that and 'true' random numbers. This is extremely well established cryptography. To require 'true' random numbers is, to use an apt analogy, wankery. It does not, and cannot, do anything to improve security, and it mostly just causes huge amounts of pain. It is (and I repeat myself, because I have a hunch people will think I'm glossing over some nuance here) of no benefit whatsoever.
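To show how little machinery is involved, here's a toy sketch. It uses SHA-256 in counter mode standing in for AES, because that keeps it to the Python standard library - it illustrates the principle and is not anyone's production code:

import os
import hashlib

class CSPRNG:
    def __init__(self, seed=None):
        # One small collection of 'true' random bits, gathered exactly once.
        self.seed = seed if seed is not None else os.urandom(32)  # 256 bits
        self.counter = 0

    def random_bytes(self, n):
        # Hash (seed || counter) to stretch the seed indefinitely.
        # Distinguishing this output from 'true' randomness would mean
        # breaking the underlying primitive.
        out = b''
        while len(out) < n:
            block = hashlib.sha256(self.seed +
                                   self.counter.to_bytes(16, 'big')).digest()
            out += block
            self.counter += 1
        return out[:n]

After the single seeding there is nothing left to block on, ever.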

My advice to the Mozilla foundation (and again, this is my reasoned opinion, not said in anger, and I won't be changing my mind later): find out who was responsible for this policy of requiring lots of 'true' random numbers, and fire them. Fire them today. They have demonstrated gross incompetence, a total lack of understanding of the very most basic concepts in security.

Some people might think that if I knew more about who was behind this and what their specific motivations are, then that might change my mind. That is incorrect. The security field is filled with people who to non-experts seem very impressive and knowledgeable, especially when they're advocating, and even more so demanding, very painful and difficult things in the name of security. Most of these people are frauds. I have had it with paying homage to the concept of impartiality when discussing these people's opinions. If someone spouts a bunch of technical mumbo-jumbo to bring the conversation to a place which a lay person has trouble understanding, they may be able to make the argument rest on pure rhetoric, but gross incompetence is still gross incompetence, and despite having found an effective way to bullshit their way through, they're still wrong.

Ludology in City of Heroes

City of Heroes has had some interesting issues with its gameplay, involving a character named Twixt using some tactics which made everybody hate him.

Several years ago I happened to be seated next to the designer of City of Heroes at an event where he won something. He was a pudgy guy, wearing big round glasses, a white City of Heroes t-shirt, and a blue cape. We got into a conversation about his game, and I asked what it was that made it compelling. He said that it's every kid's fantasy of being a superhero, and it was very obvious that he'd based the game's design on his own fantasy. I asked him whether City of Heroes is compelling as a pure abstract game, and the interesting response was that he didn't understand the question. After a few minutes of conversation he got what I was asking, and his answer, which really perplexed me at the time, was that it was a good question, but he didn't know.

Consider a game with the following semantics: you sit, unmoving, for two hours, with no user feedback, no buttons to push, nothing, completely passive, while the game plays out in front of you, exactly the same way it would for anybody else. This sounds like a terrible game, but it's exactly what movies are, and movies are very popular and are rarely criticized for being terrible games.

The Twixt problem was caused not so much by any one person behaving unreasonably as by a problem in the game engine. There's a battle tactic which is quite effective but wipes out an enemy without even giving them a chance to play, making it not much fun for them while not even earning much credit for you. Because City of Heroes is more fantasy than game, players have a convention of not using this tactic, because that maximizes the fun of play. This comes at the expense of individual success if you treat the game as a sport, but since the game isn't a sport, people don't worry about that too much. Real sports don't involve dressing up as superheroes (except for figure skating, but that isn't a real sport). What really should be done is to modify the rules so that the particular tactic isn't so nasty. It's a general rule of game design that all players should get a chance to play and have fun, even if they aren't very good, and tactics which let a better player win without the weaker player even having a chance to retaliate are no fun.

Bandwidth fundamentals

A random person asks about something they read on Wikipedia:

Example from wiki below:

http://en.wikipedia.org/wiki/Bram_Cohen

quote from site:

MojoNation allows people to break up confidential files into encrypted chunks and distribute those pieces on computers also running the software. If someone wanted to download a copy of this encrypted file, he would have to download it simultaneously from many computers. This concept, Cohen thought, was perfect for a file sharing program, since programs like KaZaA take a long time to download a large file because the file is (usually) coming from one source (or "peer"). Cohen designed BitTorrent to be able to download files from many different sources, thus speeding up the download time, especially for users with faster download than upload speeds. Thus, the more popular a file is, the faster a user will be able to download it, since many people will be downloading it at the same time, and these people will also be uploading the data to other users.

This explanation was lifted from an actual news article, which doesn't necessarily mean it's true. In fact, it's somewhere between grossly misleading and wrong.

There's a classic fallacy: because one person who stands up during a concert gets a better view, it seems to follow that if everybody stood up during a concert they'd all get a better view. This is of course not true - they all wind up slightly worse off by standing, because they're competing with each other for a view. The same thing happens with downloading from a server. In general, web servers will give about the same rate to every client downloading from them, so if you open many more connections than everybody else you get a greater proportion of the bandwidth and hence a better rate. But you do so simply by taking bandwidth from other downloaders. The overall supply of upload is unchanged; it's simply being shuffled around. If everybody does the same thing, the result is overall slightly worse performance, and you're basically back where you started, but with a bunch of headaches tacked on.
 
So why does BitTorrent perform so well? Quite simply, because it does a better job of finding more places to upload from. Any peer which is downloading is in general willing to upload as well, and its uplink is usually unutilized, so if you can get a peer to start uploading as soon as it starts downloading, keep it uploading as long as possible, and saturate its link while it's uploading, then overall performance will be better. It doesn't necessarily help to transfer over more connections, or to make more different things available at the same time, or to use error correcting codes. In fact, all of those are a complex tradeoff between benefits and costs; the net result is that small amounts of them can help reliability and robustness, but in general it's good to keep things simple and be polite to the network.

On the internet, the formula is bytes downloaded = bytes uploaded. It's that simple.
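
A toy calculation makes the difference obvious (all numbers are made up for illustration):

seed_upload = 1.0   # MB/s from the original source
peers = 50
peer_upload = 0.1   # MB/s each downloader contributes in a swarm

# Client-server: everyone splits the source's upload.
print('server only: %.3f MB/s each' % (seed_upload / peers))

# Swarm: the downloaders' otherwise-idle uplinks join the supply.
total_upload = seed_upload + peers * peer_upload
print('swarm: %.3f MB/s each' % (total_upload / peers))

The swarm is faster not because anyone opened more connections, but because the total supply of upload went up.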

The Uncanny Cube



From deep in the uncanny valley of the Rubik's Cube, it's the Uncanny Cube. At first blush this appears to be a slight variant, but it is quite profoundly and perversely different.

Bram's Cube



This is Bram's Cube, an idea I'm very fond of. It's very interesting to solve, since the middle layer and everything else can each be thought of independently and solved on their own - but solving one scrambles the other.