Log in

No account? Create an account

Mon, Sep. 14th, 2009, 04:21 pm
Awful Programming Advice

I just came across a blog post going over that old saw of Object Oriented Design, a square being a subclass of a rectangle.

This advice is worse than useless. It's either wrongheaded or meaningless, depending on how literally you take it.

Taken literally, it would never make sense to make a full-blown class for such a trivial piece of functionality. There simply would be more lines of code taken up making declarations than could possibly be saved by convenience. Taken less literally, it's just gibberish, a completely nonsensical way of thinking about the problem, like teaching a drawing class where you cover pentagrams.

So what would be a sane way of building this functionality? Well, first you have to decide what the functionality actually is, since there wasn't any actual functionality in the first example. A reasonable set of functionality would be polygons. Polygons can be rotated, translated, and scaled, and you can take their union and intersection with other polygons, and calculate their area. This is a nontrivial set of functionality which it makes sense to encapsulate. How then to make rectangles and squares? The simplest way is to have two convenience functions - one which builds rectangles, and one which builds squares. Or just make the convenience function accept a variable number of parameters, and if it only gets one to return a square.

But this example doesn't use any subclassing! I can hear the OOP gurus exclaiming now. How are people supposed to learn subclassing if you don't give them any examples of it? This is a deranged attitude. Subclassing is not an end in and of itself, it's a technique which is occasionally handy. And I'll let you in on a little secret - I personally almost never use subclassing. It's not that I one day decided that subclassing is bad and that one should avoid it, it's that as I got better at coming up with simple designs I wound up using it less and less, until eventually I almost stopped using it entirely. Subclassing is, quite simply, awkward. Any design which uses subclassing should be treated with skepticism. Any design which requires subclassing across encapsulation boundaries should be assumed to be a disaster.

Unfortunately this is hardly atypical of introductory object oriented design examples. Even the ones which are more real world tend to be awful. Take, for example, networking APIs. A typical example of an API is one which allows the creation of sockets, with blocking read calls and maybe-blocking write calls. The first few versions of Java had this sort of API exclusively. This approach is simple, seems logical to people unfamiliar with it, and is an unmitigated disaster in practice. It leads to a ton of threads floating around, with ridiculous numbers of race conditions, and awful performance because of all the threads swapping in and out. Such awfulness unfortunately hasn't stopped it from being the default way people are shown how to do things.

So what would be a better API? This is something I have a lot of experience with, so I'll give a brief summary. I'm glossing over some details here, but some of that functionality, like half-open sockets, is perhaps best not implemented.

The networking object constructor takes a callback function. Each socket is referred to using an identifier (yes, that's right, an identifier, they don't warrant individually having objects).

The methods of the networking object are as follows:

Start listening on a port

Stop listening on a port

make a new outgoing connection (returns the connection id)

write data to a socket (returns the number of bytes written)

check if a socket has buffered data to write

read data from a socket

close a socket

Here are the methods of the callback:

incoming socket created

socket closed

socket has data to be read

socket has flushed all write data from buffer

You've probably noticed that this API, while simple, isn't completely trivial and there is considerable subtlety to what exactly all the methods mean. That is an important point. For functionality of less that this level of complexity object orientation is generally speaking a bad idea, and trying to force simpler examples in the name of instruction results in poor and misleading instruction.

Unfortunately not all asynchronous networking APIs are in agreement with my example, which really says something about the state of the art in software design. I daresay that if Twisted followed this example straightforwardly (it does something similar, but awkwardly and obfuscatedly) then Tornado would never have been written.

Tue, Sep. 15th, 2009 12:30 am (UTC)

I think there are three approaches to software design. Data oriented, algorithm oriented and object oriented. I can tell that you are an algorithm oriented designer like I am. A data designer thinks the design is done when all the SQL tables are designed, and an object designer thinks the design is done when the object model is complete. You and I, we like to know how things actually work :-) The data structures and classes we end up with are a by product of the design, not the design in itself.

Tue, Sep. 15th, 2009 01:30 am (UTC)

Yeah, that's basically it. I'm not above starting with SQL or object diagrams from time to time, but I view those as rather narrow perspectives on the system as a whole, and try to keep them as simple as possible.

Tue, Sep. 15th, 2009 04:47 am (UTC)

Subclassing is a way to save typing when you have two situations where you need similar but not quite identical behavior. (But then you will discover that they are more different than you thought. Which is okay, just don't be surprised...)

Also in some languages (C++) it's the name for how you get "interfaces", but that's not really subclassing anyway.

This is totally typical of formal education in programming, though; we have practical skills that we don't know how to articulate very well (which is in no way unique to programming) so when we're *forced* to articulate something in teaching then we just make stuff up that uses the same words and sounds like it makes sense. (Something similar seems to happen in academic CS, where the value of a system is based on how convincingly you can describe that value in a paper.) ...Then sometimes the stuff we make up sounds *so* good that it gets built into all new languages. Thus, OO.

I wish POSIX exposed more information about socket write buffers.

Tue, Sep. 15th, 2009 02:50 pm (UTC)

Yeah, using pure abstract base classes in C++ is completely reasonable, and were I using C++ I'd probably still do that, but you got what I meant.

Tue, Sep. 15th, 2009 06:35 am (UTC)

The reason why such API doesn't exist is that it's not flexible enough while adding conciderable overhead to the raw syscall-based API. For example, this API states that any 'can read' condition will lead to unconditional method call.

Some API, though, come further. For example, Perl module AnyEvent::Handle ( http://search.cpan.org/~mlehmann/AnyEvent-5.2/lib/AnyEvent/Handle.pm ) adds ability to maintain list of 'read' and 'write' callbacks (for convenience it allows also higher level callbacks which will do simple parsing or serializing/deserializing of values of certain types). This eases programming because it allows implementing an 'expect'-like functionality.

Of course such approach is only viable in dynamic languages. In static languages the event-based programming is very complicated because messages/events are not first-class features of most "static" environments and compilers don't know how to optimize/inline their processing. So the API which you propose is great but is limited by common "Simula-based OO implementations" restrictions. If we have more general support for events/messages-based OO in everydays languages it'll be easier to standardize such approaches to OO design.

Tue, Sep. 15th, 2009 01:58 pm (UTC)

I've written code substantially similar to what Bram describes in C, Python, and C#. C doesn't even have objects, much less messages or events. I'm not sure why you believe any of those features would be required.

For example, this API states that any 'can read' condition will lead to unconditional method call.

Under what circumstances would you not want to read data from the socket when there is data available?

Tue, Sep. 15th, 2009 02:53 pm (UTC)

If you're trying to rate limit a download connection you can do it (awkwardly) by only reading from the socket at the rate you want. This will cause the sending side to slow down, although it's kind of clumsy under the hood due to technical issues with how TCP works. There unfortunately isn't a better way to rate limit TCP on the receiving end.

Tue, Sep. 15th, 2009 03:22 pm (UTC)

There unfortunately isn't a better way to rate limit TCP on the receiving end.

If it's a protocol I'm designing myself, I'd explicitly incorporate the rate limiting into the protocol. Either by telling the sender what data rate to use, or by designing the protocol so I can request the data in smaller chunks (and request them slower if I'm receiving data too quickly). I'd also consider jettisoning TCP in favor of UDP, if I'm going to be doing congestion/bandwidth allocation by hand anyway.

If you need to rate limit someone else's protocol (e.g., HTTP) within your application, then, yeah, you're kind of screwed. Although even HTTP lets you request portions of a file.

In any case... 99% of network applications don't need to worry about this? :-)

Tue, Sep. 15th, 2009 07:12 pm (UTC)

If it's a protocol I'm designing myself, I'd explicitly incorporate the rate limiting into the protocol.

You're awfully trusting of unknown clients....

Tue, Sep. 15th, 2009 07:31 pm (UTC)

If it's very important that my application follow a strict bandwidth budget, then I go with my other suggestion: design the protocol so I can request the data in smaller chunks (and request them slower if I'm receiving data too quickly).

Tue, Sep. 15th, 2009 10:39 pm (UTC)

Requesting smaller chunks doesn't really do what you want - it tends to make packets come in spurts rather than steadily, and if you make the chunks small enough it tends to limit the thoughput to less than what you intended. Congestion control is hard.

Tue, Sep. 15th, 2009 11:58 pm (UTC)

Congestion control is hard.

I know. Some years ago I wrote a TCP implementation. I don't even want to think about how many hours I spent staring at packet traces while debugging. :-)

Tue, Sep. 15th, 2009 10:38 pm (UTC)

Trusting people not to flood you is something you have to accept if you want to be on the internet.

Tue, Sep. 15th, 2009 10:33 pm (UTC)

Rate limiting is a difficult problem. TCP is window-based rather than rate-based, for fairly good reasons, and controlling a rate by adjusting the receive window is clumsy at best. When using a new protocol you can simply give the sending side a max rate and the other side can implement it straightforwardly with token buckets or something like that, but in a legacy system that isn't an option.

In any case, yes, 99% of the time simply having the callback hand over the new data rather than reporting that a read is possible and requiring another call to get it is simpler and higher performing behavior. I did say there's a lot of subtlety even to that fairly simple API.

Tue, Sep. 15th, 2009 10:37 pm (UTC)

TCP is window-based rather than rate-based, for good reasons, and controlling rate by limiting the receive window is clumsy at best. For a new protocol one can simply rate limit on the sending side, which works far better, but that isn't always an option in legacy systems.

In any case, yeah, most of the time having a callback simply hand over the data which was received is more straightforward and higher performing. I did say there's a lot of subtlety to this API.

Tue, Sep. 15th, 2009 09:22 pm (UTC)

> I've written code substantially similar to what Bram describes in C, Python, and C#.

I didn't say that it isn't possible. I just showed the rationale why it's not common in standard OO libraries.

> I'm not sure why you believe any of those features would be required.

They're not required but having generic programming tools in language would raise the value of standardized event/callback based I/O OO API. Today there are so many AIO OO APIs that it's frustrating to stick to one of them knowing that this API is only for the one language/environment and to switch to another language one should get much more experience.

> Under what circumstances would you not want to read data from the socket when there is data available?

For example, when designing high load, resource constrained application we may want to release resources that are already locked before having a chance to require even more resources. And to do this we might want in particular to 'write' first. Also on high-bandwith tcp downloads there's a chance that more data could be 'read' later from buffers (when we're done with writing/checking EOFs/checking timeouts) - therefore less overhead. I see that Bram mentioned another trick with pace making uncontrollable downloads - myself I didn't mean that.

Tue, Sep. 15th, 2009 12:01 pm (UTC)

I've only had a cursory glance of Tornado.

What problems of design in Twisted does Tornado actually solve?

My initial guess was that twisted was a too-large kitchen-sink solution and the friendfeed guys wanted something lighter.

Tue, Sep. 15th, 2009 02:56 pm (UTC)

Twisted could in principle work, in fact someone's already implemented swapping out the Tornado networking layer with Twisted (at some cost to performance). The main problem is that Twisted looks like a convoluted monster, and it's extremely unclear how to get started from just looking at it.

Tue, Sep. 15th, 2009 10:02 pm (UTC)

Do you think there's room to do much better than Twitter does? How?

Thu, Sep. 17th, 2009 11:35 pm (UTC)

How does twitter do networking?

Fri, Sep. 18th, 2009 07:10 am (UTC)

Twisted. Much better than Twisted does.

There's almost nothing Twitter does that there isn't room to do much better.

(Deleted comment)

Tue, Sep. 15th, 2009 10:42 pm (UTC)

You're comparing bad design in OO versus good design in FP. In practice they aren't really all that different - a function is just an object with only one method, although Java culture in particular tends to encourage excessive objectification.

Tue, Sep. 15th, 2009 07:16 pm (UTC)

I agree that the standard OO example of shapes is a bad example, and I'll go further to say that it's actively damaging. The easiest way to talk about subclassing (and you do, if only for the purpose of introducing features of the language to the learner) is from the interface perspective... different things you can use in the same places. You can get most of what you want with just interface inheritance, and full inheritance is really for minimizing code duplication (a very good thing).

I personally blame bad OO understanding on C++, with having to declare functions virtual, allowing multiple-inheritance, outrageous degrees of operator overloading, etc. C++ is one of my least favorite languages.

Tue, Sep. 15th, 2009 10:43 pm (UTC)

Yeah, interfaces are completely reasonable. I tend to forget about them because I'm used to programming in Python, where they're implicit.

Thu, Sep. 17th, 2009 04:41 am (UTC)

I've always wondered if my projects were just too small for subclassing to really shine. It's always left me with a sour taste-- seems like most of the time it does more to increase code complexity than it does to provide benefit of saving typing or reducing duplicated code. I'd agree with others that under certain conditions it has value. Anyway, it's kind of reassuring to hear your opinion about this.

Thu, Sep. 17th, 2009 08:35 pm (UTC)

If you end up thinking about non-trivial tree of classes, like shape-circle-oval, square-rect-polygon, triangle - you're doing it wrong! I don't know why, but it just does never seem to work. You can generalize it to a polygon, tho, and it would work.
What you really want from subclassing is interfaces (Set and List come to mind immediately) and civilized monkey-patching (you have a class, you want some of its behavior changed, so you override a method or two)

Thu, Sep. 17th, 2009 11:06 pm (UTC)

Hi Bram,

That is a terrible example of object orientated design and inheritance, but there was one even more obvious flaw that wasn't pointed out.

Inheritance should express an "is-a" relationship: *Everything that is true of the base should also be true of the derived*. This is not the case here, because a rectangle can have an arbitrary width and height, but a square cannot - its width and height must be equal.

The last time I read this exact same example (in Scott Meyer's "Effective C++" I believe), it was as an example of how something that was mathematically or intuitively true (a square "is-a" rectangle) doesn't hold in practice for object orientated design - you can manipulate a square through a rectangle interface (pointer/reference), and make one side longer than the other, violating the very invariant that makes a square different to a rectangle. I guess you could throw an exception and enforce it that way, but I wouldn't. I'd consider that wrong headed.

Could you please elaborate on the remark "Any design which requires subclassing across encapsulation boundaries should be assumed to be a disaster"?

I strongly agree that object orientated programming isn't the be all and end all.

Peter Geoghegan

Thu, Sep. 17th, 2009 11:33 pm (UTC)

The canonical example of an API which requires subclassing across boundaries is 'to make a new button, subclass the Button class and override the handle_click() method', which was long considered the standard and reasonable way of doing that.

Fri, Sep. 18th, 2009 09:51 pm (UTC)

Trivia that may be useful in job interviews: The "everything that is true of the base should also be true of the derived" principle that I stressed in my original post is called the Liskov substitution principle. I can't think of any reason why anyone would ever want to violate it, so I consider it an immutable rule.

Virtual functions, as I understand them, represent a hook to customise some functionality of a class (a role that is sometimes in conflict with their other role as interfaces to the classes functionality if they're non-private, hence the NVI idiom, where all virtual functions are private). Inheritance is most often misunderstood as a way of re-using a base class's functionality, when in fact it is a way of facilitating re-use by other code which may have been written before the derived class, using a base class reference. If you want to re-use, just use simple composition. To quote someone else, "when dynamic polymorphism is correct and appropriate, composition is selfish; inheritance is generous".

I wouldn't consider the design you described that "requires subclassing across encapsulation boundaries" to be necessarily a disaster, except insofar as it requires you to subclass a class when there is invariably a far simpler, more succinct way of achieving the same end through the use of signals and slots, messages, events, etc. I can't really put my finger on what you mean beyond that, if in fact you do mean anything beyond that. Do you?

I'm a C++ weenie, which admittedly informs my opinions about object orientated design. To clarify my earlier remarks: Yes, object orientation and inheritance are overrated, but that's only because they're widely considered to be the alpha and the omega. I still think they're very important.

Fri, Sep. 18th, 2009 03:18 am (UTC)

I agree entirely that typical OO examples like Squares and Rectangles are baloney. I also agree that complicated inheritance hierarchies work about as well in keeping software orderly as they do in royal succession of power. I'm worried whenever I see inheritance trees that are deeper than two links.

That being said, there are a few cases where it's really useful. One of the things that I find handy is when you you need to handle big chunks of a process in similar, and complicated ways, and smaller parts in special case ways.

For example, I recently built myself a REST controller that handles getting things into and out of the db fairly generically, but often needs to do some special stuff to handle many to many relationships, or special authorization bits. I have most all of the code in a base class that calls hooks at the appropriate points. The base class implementation of the hooks do simple things which are generic across all my resources, and then when I need to do something special for a particular resource, I can get by just overwriting a hook or two.

Using inheritance like this has gotten me a nice little web app for managing specifications, issues and projects in under 1200 lines of python, and I'm pretty grateful for that.