Mon, Sep. 14th, 2009, 04:21 pm
Awful Programming Advice
I just came across a blog post going over that old saw of Object Oriented Design, a square being a subclass of a rectangle.
This advice is worse than useless. It's either wrongheaded or meaningless, depending on how literally you take it.
Taken literally, it would never make sense to make a full-blown class for such a trivial piece of functionality. There simply would be more lines of code taken up making declarations than could possibly be saved by convenience. Taken less literally, it's just gibberish, a completely nonsensical way of thinking about the problem, like teaching a drawing class where you cover pentagrams.
So what would be a sane way of building this functionality? Well, first you have to decide what the functionality actually is, since there wasn't any actual functionality in the first example. A reasonable set of functionality would be polygons. Polygons can be rotated, translated, and scaled, and you can take their union and intersection with other polygons, and calculate their area. This is a nontrivial set of functionality which it makes sense to encapsulate. How then to make rectangles and squares? The simplest way is to have two convenience functions - one which builds rectangles, and one which builds squares. Or just make a single convenience function accept a variable number of parameters, and have it return a square if it gets only one.
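As a rough sketch of that idea in Python, assuming a polygon is just an ordered tuple of (x, y) vertices: the names `Polygon`, `translated`, `scaled`, and `rectangle` are made up for illustration, and rotation, union, and intersection are left out to keep it short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Polygon:
    """A simple polygon stored as an ordered tuple of (x, y) vertices."""
    vertices: tuple

    def area(self):
        # Shoelace formula over consecutive vertex pairs, wrapping around.
        pts = self.vertices
        total = 0.0
        for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
            total += x1 * y2 - x2 * y1
        return abs(total) / 2

    def translated(self, dx, dy):
        return Polygon(tuple((x + dx, y + dy) for x, y in self.vertices))

    def scaled(self, factor):
        return Polygon(tuple((x * factor, y * factor) for x, y in self.vertices))


def rectangle(width, height=None):
    """Convenience builder: a single argument yields a square."""
    if height is None:
        height = width
    return Polygon(((0, 0), (width, 0), (width, height), (0, height)))
```

Here `rectangle(3)` and `rectangle(2, 5)` both return plain `Polygon` values, so there is no square or rectangle class anywhere, only one way of building them.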
But this example doesn't use any subclassing! I can hear the OOP gurus exclaiming now. How are people supposed to learn subclassing if you don't give them any examples of it? This is a deranged attitude. Subclassing is not an end in and of itself, it's a technique which is occasionally handy. And I'll let you in on a little secret - I personally almost never use subclassing. It's not that I one day decided that subclassing is bad and that one should avoid it, it's that as I got better at coming up with simple designs I wound up using it less and less, until eventually I almost stopped using it entirely. Subclassing is, quite simply, awkward. Any design which uses subclassing should be treated with skepticism. Any design which requires subclassing across encapsulation boundaries should be assumed to be a disaster.
Unfortunately this is hardly atypical of introductory object oriented design examples. Even the ones which are more real world tend to be awful. Take, for example, networking APIs. A typical example of an API is one which allows the creation of sockets, with blocking read calls and maybe-blocking write calls. The first few versions of Java had this sort of API exclusively. This approach is simple, seems logical to people unfamiliar with it, and is an unmitigated disaster in practice. It leads to a ton of threads floating around, with ridiculous numbers of race conditions, and awful performance because of all the threads swapping in and out. Such awfulness unfortunately hasn't stopped it from being the default way people are shown how to do things.
So what would be a better API? This is something I have a lot of experience with, so I'll give a brief summary. I'm glossing over some details here, but some of that functionality, like half-open sockets, is perhaps best not implemented.
The networking object constructor takes a callback function. Each socket is referred to using an identifier (yes, that's right, an identifier, they don't warrant individually having objects).
The methods of the networking object are as follows:
Start listening on a port
Stop listening on a port
make a new outgoing connection (returns the connection id)
write data to a socket (returns the number of bytes written)
check if a socket has buffered data to write
read data from a socket
close a socket
Here are the methods of the callback:
incoming socket created
socket has data to be read
socket has flushed all write data from buffer
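The two lists above might be sketched in Python roughly as follows. This is one possible reading built on the standard `selectors` and `socket` modules, not a reference implementation. The class name `Network`, every method name, the `poll()` method that drives the callbacks, and the choice to have `write()` buffer whatever it could not send immediately are all assumptions made for illustration. To keep it short it binds to localhost only, the outgoing connect blocks, and error handling is omitted.

```python
import selectors
import socket


class Network:
    """Single-threaded socket manager; connections are plain integer ids."""

    def __init__(self, callback):
        self.callback = callback   # needs incoming(), readable(), flushed()
        self.sel = selectors.DefaultSelector()
        self.listeners = {}        # port -> listening socket
        self.socks = {}            # connection id -> socket
        self.outbuf = {}           # connection id -> bytes not yet sent
        self.next_id = 0

    def _add(self, sock):
        sock.setblocking(False)
        cid, self.next_id = self.next_id, self.next_id + 1
        self.socks[cid], self.outbuf[cid] = sock, b""
        self.sel.register(sock, selectors.EVENT_READ, cid)
        return cid

    def listen(self, port):
        srv = socket.create_server(("127.0.0.1", port))
        srv.setblocking(False)
        port = srv.getsockname()[1]          # resolves port 0 to the real one
        self.listeners[port] = srv
        self.sel.register(srv, selectors.EVENT_READ, None)
        return port

    def stop_listening(self, port):
        srv = self.listeners.pop(port)
        self.sel.unregister(srv)
        srv.close()

    def connect(self, host, port):
        # Simplification: the connect itself blocks; only I/O is non-blocking.
        return self._add(socket.create_connection((host, port)))

    def write(self, cid, data):
        """Queue data and return how many bytes went out immediately."""
        self.outbuf[cid] += data
        return self._flush(cid)

    def has_buffered(self, cid):
        return bool(self.outbuf[cid])

    def read(self, cid, maxbytes=65536):
        """Return bytes, b"" if the peer closed, or None if nothing is ready."""
        try:
            return self.socks[cid].recv(maxbytes)
        except BlockingIOError:
            return None

    def close(self, cid):
        sock = self.socks.pop(cid)
        del self.outbuf[cid]
        self.sel.unregister(sock)
        sock.close()

    def _flush(self, cid):
        sock = self.socks[cid]
        try:
            sent = sock.send(self.outbuf[cid])
        except BlockingIOError:
            sent = 0
        self.outbuf[cid] = self.outbuf[cid][sent:]
        # Watch for writability only while something is still buffered.
        events = selectors.EVENT_READ
        if self.outbuf[cid]:
            events |= selectors.EVENT_WRITE
        self.sel.modify(sock, events, cid)
        return sent

    def poll(self, timeout=None):
        """Wait for activity once and dispatch callbacks."""
        for key, mask in self.sel.select(timeout):
            cid = key.data
            if cid is None:                  # a listening socket is ready
                conn, _addr = key.fileobj.accept()
                self.callback.incoming(self._add(conn))
                continue
            if mask & selectors.EVENT_WRITE and cid in self.socks:
                self._flush(cid)
                if not self.outbuf[cid]:
                    self.callback.flushed(cid)
            if mask & selectors.EVENT_READ and cid in self.socks:
                self.callback.readable(cid)
```

The application supplies one callback object with `incoming(cid)`, `readable(cid)`, and `flushed(cid)` methods and calls `poll()` in a loop from a single thread. Even in a sketch this small the subtleties show up: what the return value of `write()` means, and how a closed peer is reported (here, `read()` returning `b""`).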
You've probably noticed that this API, while simple, isn't completely trivial and there is considerable subtlety to what exactly all the methods mean. That is an important point. For functionality of less than this level of complexity object orientation is generally speaking a bad idea, and trying to force simpler examples in the name of instruction results in poor and misleading instruction.
Unfortunately not all asynchronous networking APIs are in agreement with my example, which really says something about the state of the art in software design. I daresay that if Twisted followed this example straightforwardly (it does something similar, but awkwardly and obfuscatedly) then Tornado would never have been written.
Tue, Sep. 15th, 2009 12:30 am (UTC)
I think there are three approaches to software design: data oriented, algorithm oriented and object oriented. I can tell that you are an algorithm oriented designer like I am. A data designer thinks the design is done when all the SQL tables are designed, and an object designer thinks the design is done when the object model is complete. You and I, we like to know how things actually work :-) The data structures and classes we end up with are a by-product of the design, not the design itself.
Tue, Sep. 15th, 2009 01:30 am (UTC)
Yeah, that's basically it. I'm not above starting with SQL or object diagrams from time to time, but I view those as rather narrow perspectives on the system as a whole, and try to keep them as simple as possible.
Tue, Sep. 15th, 2009 04:47 am (UTC)
Subclassing is a way to save typing when you have two situations where you need similar but not quite identical behavior. (But then you will discover that they are more different than you thought. Which is okay, just don't be surprised...)
Also in some languages (C++) it's the name for how you get "interfaces", but that's not really subclassing anyway.
This is totally typical of formal education in programming, though; we have practical skills that we don't know how to articulate very well (which is in no way unique to programming) so when we're *forced* to articulate something in teaching then we just make stuff up that uses the same words and sounds like it makes sense. (Something similar seems to happen in academic CS, where the value of a system is based on how convincingly you can describe that value in a paper.) ...Then sometimes the stuff we make up sounds *so* good that it gets built into all new languages. Thus, OO.
I wish POSIX exposed more information about socket write buffers.
Tue, Sep. 15th, 2009 02:50 pm (UTC)
Yeah, using pure abstract base classes in C++ is completely reasonable, and were I using C++ I'd probably still do that, but you got what I meant.
Tue, Sep. 15th, 2009 06:35 am (UTC)
The reason why such an API doesn't exist is that it's not flexible enough while adding considerable overhead to the raw syscall-based API. For example, this API states that any 'can read' condition will lead to unconditional method call.
Some APIs, though, go further. For example, the Perl module AnyEvent::Handle (http://search.cpan.org/~mlehmann/AnyEvent-5.2/lib/AnyEvent/Handle.pm) adds the ability to maintain a list of 'read' and 'write' callbacks (for convenience it also allows higher-level callbacks which do simple parsing or serializing/deserializing of values of certain types). This eases programming because it allows implementing 'expect'-like functionality.
Of course such an approach is only viable in dynamic languages. In static languages event-based programming is very complicated, because messages/events are not first-class features of most "static" environments and compilers don't know how to optimize/inline their processing. So the API which you propose is great but is limited by the restrictions of common "Simula-based OO implementations". If we had more general support for event/message-based OO in everyday languages, it would be easier to standardize such approaches to OO design.
Tue, Sep. 15th, 2009 01:58 pm (UTC)
I've written code substantially similar to what Bram describes in C, Python, and C#. C doesn't even have objects, much less messages or events. I'm not sure why you believe any of those features would be required.
"For example, this API states that any 'can read' condition will lead to unconditional method call."
Under what circumstances would you not want to read data from the socket when there is data available?
Tue, Sep. 15th, 2009 12:01 pm (UTC)
I've only had a cursory glance of Tornado.
What problems of design in Twisted does Tornado actually solve?
My initial guess was that Twisted was a too-large kitchen-sink solution and the FriendFeed guys wanted something lighter.
Tue, Sep. 15th, 2009 02:56 pm (UTC)
Twisted could in principle work, in fact someone's already implemented swapping out the Tornado networking layer with Twisted (at some cost to performance). The main problem is that Twisted looks like a convoluted monster, and it's extremely unclear how to get started from just looking at it.
Tue, Sep. 15th, 2009 10:42 pm (UTC)
You're comparing bad design in OO versus good design in FP. In practice they aren't really all that different - a function is just an object with only one method, although Java culture in particular tends to encourage excessive objectification.
Tue, Sep. 15th, 2009 07:16 pm (UTC)
I agree that the standard OO example of shapes is a bad example, and I'll go further to say that it's actively damaging. The easiest way to talk about subclassing (and you do, if only for the purpose of introducing features of the language to the learner) is from the interface perspective... different things you can use in the same places. You can get most of what you want with just interface inheritance, and full inheritance is really for minimizing code duplication (a very good thing).
I personally blame bad OO understanding on C++, with having to declare functions virtual, allowing multiple-inheritance, outrageous degrees of operator overloading, etc. C++ is one of my least favorite languages.
Tue, Sep. 15th, 2009 10:43 pm (UTC)
Yeah, interfaces are completely reasonable. I tend to forget about them because I'm used to programming in Python, where they're implicit.
Thu, Sep. 17th, 2009 04:41 am (UTC)
I've always wondered if my projects were just too small for subclassing to really shine. It's always left me with a sour taste-- seems like most of the time it does more to increase code complexity than it does to provide benefit of saving typing or reducing duplicated code. I'd agree with others that under certain conditions it has value. Anyway, it's kind of reassuring to hear your opinion about this.
Thu, Sep. 17th, 2009 08:35 pm (UTC)
If you end up thinking about a non-trivial tree of classes, like shape-circle-oval, square-rect-polygon, triangle - you're doing it wrong! I don't know why, but it just never seems to work. You can generalize it to a polygon, tho, and it would work.
What you really want from subclassing is interfaces (Set and List come to mind immediately) and civilized monkey-patching (you have a class, you want some of its behavior changed, so you override a method or two)
Thu, Sep. 17th, 2009 11:06 pm (UTC)
That is a terrible example of object orientated design and inheritance, but there was one even more obvious flaw that wasn't pointed out.
Inheritance should express an "is-a" relationship: *Everything that is true of the base should also be true of the derived*. This is not the case here, because a rectangle can have an arbitrary width and height, but a square cannot - its width and height must be equal.
The last time I read this exact same example (in Scott Meyers' "Effective C++", I believe), it was as an example of how something that is mathematically or intuitively true (a square "is-a" rectangle) doesn't hold in practice for object orientated design: you can manipulate a square through a rectangle interface (pointer/reference) and make one side longer than the other, violating the very invariant that makes a square different from a rectangle. I guess you could throw an exception and enforce it that way, but I wouldn't. I'd consider that wrongheaded.
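The broken invariant is easy to reproduce. Below is a minimal Python sketch with made-up class and method names (`Rectangle`, `Square`, `set_width`, `stretch`), standing in for the C++ pointer/reference case described above.

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def set_width(self, width):
        self.width = width

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    """Naive subclass: relies on width == height but cannot enforce it."""

    def __init__(self, side):
        super().__init__(side, side)


def stretch(rect):
    # Written against the Rectangle interface; knows nothing about Square.
    rect.set_width(rect.width * 2)
    return rect


sq = stretch(Square(3))
# sq is still an instance of Square, but its width is now 6 and its height 3.
```

Any function that accepts a `Rectangle` can do this, which is why the "is-a" relationship fails here.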
Could you please elaborate on the remark "Any design which requires subclassing across encapsulation boundaries should be assumed to be a disaster"?
I strongly agree that object orientated programming isn't the be all and end all.
Thu, Sep. 17th, 2009 11:33 pm (UTC)
The canonical example of an API which requires subclassing across boundaries is 'to make a new button, subclass the Button class and override the handle_click() method', which was long considered the standard and reasonable way of doing that.
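For illustration, here is a small Python sketch of the two styles side by side. It does not model any real GUI toolkit: `Button`, `SaveButton`, `CallbackButton`, and `click()` are hypothetical names, and `handle_click()` is taken from the sentence above.

```python
class Button:
    """Toolkit-side class in the subclass-and-override style."""

    def __init__(self, label):
        self.label = label

    def handle_click(self):
        pass  # application code is expected to override this

    def click(self):
        # Called by the (imaginary) toolkit when the user clicks.
        self.handle_click()


class SaveButton(Button):
    """Application code reaching across the boundary by subclassing."""

    def __init__(self, log):
        super().__init__("Save")
        self.log = log

    def handle_click(self):
        self.log.append("saved")


class CallbackButton:
    """Alternative: the toolkit class takes a plain function instead."""

    def __init__(self, label, on_click):
        self.label = label
        self.on_click = on_click

    def click(self):
        self.on_click()


log = []
SaveButton(log).click()
CallbackButton("Save", lambda: log.append("saved")).click()
```

Both buttons do the same thing, but the second leaves the toolkit's class hierarchy alone: the application hands over a function and never needs to know how `CallbackButton` is built.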
Fri, Sep. 18th, 2009 03:18 am (UTC)
I agree entirely that typical OO examples like Squares and Rectangles are baloney. I also agree that complicated inheritance hierarchies work about as well in keeping software orderly as they do in royal succession of power. I'm worried whenever I see inheritance trees that are deeper than two links.
That being said, there are a few cases where it's really useful. One of the things that I find handy is when you need to handle big chunks of a process in similar, complicated ways, and smaller parts in special-case ways.
For example, I recently built myself a REST controller that handles getting things into and out of the db fairly generically, but often needs to do some special stuff to handle many-to-many relationships, or special authorization bits. I have most all of the code in a base class that calls hooks at the appropriate points. The base class implementations of the hooks do simple things which are generic across all my resources, and then when I need to do something special for a particular resource, I can get by just overriding a hook or two.
Using inheritance like this has gotten me a nice little web app for managing specifications, issues and projects in under 1200 lines of python, and I'm pretty grateful for that.
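The hook-based base class described in this comment might look something like the following Python sketch. The commenter's actual code isn't shown, so everything here is an assumption: the names `ResourceController`, `ProjectController`, `authorize`, and `before_save` are invented, and an in-memory dict stands in for the database.

```python
class ResourceController:
    """Generic create/fetch over an in-memory store, with overridable hooks."""

    def __init__(self):
        self.rows = {}
        self.next_id = 1

    # Hooks: the defaults do the generic thing for every resource.
    def authorize(self, user, action):
        return True

    def before_save(self, data):
        return data

    # Generic flow: calls the hooks at the appropriate points.
    def create(self, user, data):
        if not self.authorize(user, "create"):
            raise PermissionError("create not allowed")
        row_id, self.next_id = self.next_id, self.next_id + 1
        self.rows[row_id] = self.before_save(dict(data))
        return row_id

    def fetch(self, user, row_id):
        if not self.authorize(user, "fetch"):
            raise PermissionError("fetch not allowed")
        return self.rows[row_id]


class ProjectController(ResourceController):
    """A resource with special needs overrides only the hooks it cares about."""

    def authorize(self, user, action):
        return action == "fetch" or user == "admin"

    def before_save(self, data):
        # Normalise the many-to-many side: unique, sorted member names.
        data["members"] = sorted(set(data.get("members", [])))
        return data
```

The inheritance tree is one link deep, and both classes live on the same side of the encapsulation boundary.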