Log in

No account? Create an account

Mon, Sep. 14th, 2009, 04:21 pm
Awful Programming Advice

I just came across a blog post going over that old saw of Object Oriented Design, a square being a subclass of a rectangle.

This advice is worse than useless. It's either wrongheaded or meaningless, depending on how literally you take it.

Taken literally, it would never make sense to make a full-blown class for such a trivial piece of functionality. There simply would be more lines of code taken up making declarations than could possibly be saved by convenience. Taken less literally, it's just gibberish, a completely nonsensical way of thinking about the problem, like teaching a drawing class where you cover pentagrams.

So what would be a sane way of building this functionality? Well, first you have to decide what the functionality actually is, since there wasn't any actual functionality in the first example. A reasonable set of functionality would be polygons. Polygons can be rotated, translated, and scaled, and you can take their union and intersection with other polygons, and calculate their area. This is a nontrivial set of functionality which it makes sense to encapsulate. How then to make rectangles and squares? The simplest way is to have two convenience functions - one which builds rectangles, and one which builds squares. Or just make the convenience function accept a variable number of parameters, and if it only gets one to return a square.

But this example doesn't use any subclassing! I can hear the OOP gurus exclaiming now. How are people supposed to learn subclassing if you don't give them any examples of it? This is a deranged attitude. Subclassing is not an end in and of itself, it's a technique which is occasionally handy. And I'll let you in on a little secret - I personally almost never use subclassing. It's not that I one day decided that subclassing is bad and that one should avoid it, it's that as I got better at coming up with simple designs I wound up using it less and less, until eventually I almost stopped using it entirely. Subclassing is, quite simply, awkward. Any design which uses subclassing should be treated with skepticism. Any design which requires subclassing across encapsulation boundaries should be assumed to be a disaster.

Unfortunately this is hardly atypical of introductory object oriented design examples. Even the ones which are more real world tend to be awful. Take, for example, networking APIs. A typical example of an API is one which allows the creation of sockets, with blocking read calls and maybe-blocking write calls. The first few versions of Java had this sort of API exclusively. This approach is simple, seems logical to people unfamiliar with it, and is an unmitigated disaster in practice. It leads to a ton of threads floating around, with ridiculous numbers of race conditions, and awful performance because of all the threads swapping in and out. Such awfulness unfortunately hasn't stopped it from being the default way people are shown how to do things.

So what would be a better API? This is something I have a lot of experience with, so I'll give a brief summary. I'm glossing over some details here, but some of that functionality, like half-open sockets, is perhaps best not implemented.

The networking object constructor takes a callback function. Each socket is referred to using an identifier (yes, that's right, an identifier, they don't warrant individually having objects).

The methods of the networking object are as follows:

Start listening on a port

Stop listening on a port

make a new outgoing connection (returns the connection id)

write data to a socket (returns the number of bytes written)

check if a socket has buffered data to write

read data from a socket

close a socket

Here are the methods of the callback:

incoming socket created

socket closed

socket has data to be read

socket has flushed all write data from buffer

You've probably noticed that this API, while simple, isn't completely trivial and there is considerable subtlety to what exactly all the methods mean. That is an important point. For functionality of less that this level of complexity object orientation is generally speaking a bad idea, and trying to force simpler examples in the name of instruction results in poor and misleading instruction.

Unfortunately not all asynchronous networking APIs are in agreement with my example, which really says something about the state of the art in software design. I daresay that if Twisted followed this example straightforwardly (it does something similar, but awkwardly and obfuscatedly) then Tornado would never have been written.

Thu, Sep. 17th, 2009 11:06 pm (UTC)

Hi Bram,

That is a terrible example of object orientated design and inheritance, but there was one even more obvious flaw that wasn't pointed out.

Inheritance should express an "is-a" relationship: *Everything that is true of the base should also be true of the derived*. This is not the case here, because a rectangle can have an arbitrary width and height, but a square cannot - its width and height must be equal.

The last time I read this exact same example (in Scott Meyer's "Effective C++" I believe), it was as an example of how something that was mathematically or intuitively true (a square "is-a" rectangle) doesn't hold in practice for object orientated design - you can manipulate a square through a rectangle interface (pointer/reference), and make one side longer than the other, violating the very invariant that makes a square different to a rectangle. I guess you could throw an exception and enforce it that way, but I wouldn't. I'd consider that wrong headed.

Could you please elaborate on the remark "Any design which requires subclassing across encapsulation boundaries should be assumed to be a disaster"?

I strongly agree that object orientated programming isn't the be all and end all.

Peter Geoghegan

Thu, Sep. 17th, 2009 11:33 pm (UTC)

The canonical example of an API which requires subclassing across boundaries is 'to make a new button, subclass the Button class and override the handle_click() method', which was long considered the standard and reasonable way of doing that.

Fri, Sep. 18th, 2009 09:51 pm (UTC)

Trivia that may be useful in job interviews: The "everything that is true of the base should also be true of the derived" principle that I stressed in my original post is called the Liskov substitution principle. I can't think of any reason why anyone would ever want to violate it, so I consider it an immutable rule.

Virtual functions, as I understand them, represent a hook to customise some functionality of a class (a role that is sometimes in conflict with their other role as interfaces to the classes functionality if they're non-private, hence the NVI idiom, where all virtual functions are private). Inheritance is most often misunderstood as a way of re-using a base class's functionality, when in fact it is a way of facilitating re-use by other code which may have been written before the derived class, using a base class reference. If you want to re-use, just use simple composition. To quote someone else, "when dynamic polymorphism is correct and appropriate, composition is selfish; inheritance is generous".

I wouldn't consider the design you described that "requires subclassing across encapsulation boundaries" to be necessarily a disaster, except insofar as it requires you to subclass a class when there is invariably a far simpler, more succinct way of achieving the same end through the use of signals and slots, messages, events, etc. I can't really put my finger on what you mean beyond that, if in fact you do mean anything beyond that. Do you?

I'm a C++ weenie, which admittedly informs my opinions about object orientated design. To clarify my earlier remarks: Yes, object orientation and inheritance are overrated, but that's only because they're widely considered to be the alpha and the omega. I still think they're very important.