Mon, Sep. 14th, 2009, 04:21 pm
Awful Programming Advice

I just came across a blog post going over that old saw of Object Oriented Design, a square being a subclass of a rectangle.

This advice is worse than useless. It's either wrongheaded or meaningless, depending on how literally you take it.

Taken literally, it would never make sense to make a full-blown class for such a trivial piece of functionality. There simply would be more lines of code taken up making declarations than could possibly be saved by convenience. Taken less literally, it's just gibberish, a completely nonsensical way of thinking about the problem, like teaching a drawing class where you cover pentagrams.

So what would be a sane way of building this functionality? Well, first you have to decide what the functionality actually is, since there wasn't any actual functionality in the first example. A reasonable set of functionality would be polygons. Polygons can be rotated, translated, and scaled, and you can take their union and intersection with other polygons, and calculate their area. This is a nontrivial set of functionality which it makes sense to encapsulate. How then to make rectangles and squares? The simplest way is to have two convenience functions - one which builds rectangles, and one which builds squares. Or just make a single convenience function which accepts a variable number of parameters and returns a square if it only gets one.
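As a rough illustration of the approach just described, here is a minimal Python sketch. The names Polygon, rectangle, translated, scaled, and area are made up for this example, and only a few of the operations are shown (no rotation, union, or intersection).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Polygon:
    # (x, y) pairs listed in order around the boundary.
    vertices: tuple

    def translated(self, dx, dy):
        """Return a copy moved by (dx, dy)."""
        return Polygon(tuple((x + dx, y + dy) for x, y in self.vertices))

    def scaled(self, factor):
        """Return a copy scaled about the origin."""
        return Polygon(tuple((x * factor, y * factor) for x, y in self.vertices))

    def area(self):
        """Area of a simple polygon via the shoelace formula."""
        pts = self.vertices
        twice_area = sum(
            x1 * y2 - x2 * y1
            for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1])
        )
        return abs(twice_area) / 2


def rectangle(width, height=None):
    """Convenience constructor: one argument builds a square."""
    if height is None:
        height = width
    return Polygon(((0, 0), (width, 0), (width, height), (0, height)))
```

Here rectangle(3, 4) and rectangle(5) both return plain Polygon values, so there is no Square or Rectangle class to declare at all.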

But this example doesn't use any subclassing! I can hear the OOP gurus exclaiming now. How are people supposed to learn subclassing if you don't give them any examples of it? This is a deranged attitude. Subclassing is not an end in and of itself, it's a technique which is occasionally handy. And I'll let you in on a little secret - I personally almost never use subclassing. It's not that I one day decided that subclassing is bad and that one should avoid it, it's that as I got better at coming up with simple designs I wound up using it less and less, until eventually I almost stopped using it entirely. Subclassing is, quite simply, awkward. Any design which uses subclassing should be treated with skepticism. Any design which requires subclassing across encapsulation boundaries should be assumed to be a disaster.

Unfortunately this is hardly atypical of introductory object oriented design examples. Even the ones which are more real world tend to be awful. Take, for example, networking APIs. A typical example of an API is one which allows the creation of sockets, with blocking read calls and maybe-blocking write calls. The first few versions of Java had this sort of API exclusively. This approach is simple, seems logical to people unfamiliar with it, and is an unmitigated disaster in practice. It leads to a ton of threads floating around, with ridiculous numbers of race conditions, and awful performance because of all the threads swapping in and out. Such awfulness unfortunately hasn't stopped it from being the default way people are shown how to do things.

So what would be a better API? This is something I have a lot of experience with, so I'll give a brief summary. I'm glossing over some details here, but some of that functionality, like half-open sockets, is perhaps best not implemented.

The networking object constructor takes a callback function. Each socket is referred to using an identifier (yes, that's right, an identifier, they don't warrant individually having objects).

The methods of the networking object are as follows:

start listening on a port

stop listening on a port

make a new outgoing connection (returns the connection id)

write data to a socket (returns the number of bytes written)

check if a socket has buffered data to write

read data from a socket

close a socket

Here are the methods of the callback:

incoming socket created

socket closed

socket has data to be read

socket has flushed all write data from buffer
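The two lists above could be written down as a pair of Python interfaces roughly like the following. This is only a sketch of the shape of the API, not a working networking layer: every class and method name is hypothetical, and it assumes that write may accept only part of the data, leaving the caller to retry the rest once the flush callback fires. EchoServer is a toy callback included to show how the pieces fit together.

```python
from abc import ABC, abstractmethod


class NetworkCallback(ABC):
    """Events reported by the networking object (hypothetical names)."""

    @abstractmethod
    def incoming_socket_created(self, sock_id): ...

    @abstractmethod
    def socket_closed(self, sock_id): ...

    @abstractmethod
    def data_readable(self, sock_id): ...

    @abstractmethod
    def write_flushed(self, sock_id): ...


class Network(ABC):
    """The networking object; sockets are plain integer identifiers."""

    def __init__(self, callback):
        self.callback = callback

    @abstractmethod
    def start_listening(self, port): ...

    @abstractmethod
    def stop_listening(self, port): ...

    @abstractmethod
    def connect(self, host, port):
        """Start an outgoing connection and return its identifier."""

    @abstractmethod
    def write(self, sock_id, data):
        """Return how many bytes were accepted (assumed possibly partial)."""

    @abstractmethod
    def has_buffered_writes(self, sock_id): ...

    @abstractmethod
    def read(self, sock_id):
        """Return whatever bytes are currently available."""

    @abstractmethod
    def close(self, sock_id): ...


class EchoServer(NetworkCallback):
    """Toy callback that writes back whatever it reads."""

    def __init__(self):
        self.net = None    # set once the Network has been constructed
        self.pending = {}  # sock_id -> bytes still waiting to be written

    def incoming_socket_created(self, sock_id):
        self.pending[sock_id] = b""

    def socket_closed(self, sock_id):
        self.pending.pop(sock_id, None)

    def data_readable(self, sock_id):
        self.pending[sock_id] = self.pending.get(sock_id, b"") + self.net.read(sock_id)
        self._pump(sock_id)

    def write_flushed(self, sock_id):
        self._pump(sock_id)

    def _pump(self, sock_id):
        # Hand over as much as the network accepts and keep the remainder.
        data = self.pending.get(sock_id)
        if data:
            written = self.net.write(sock_id, data)
            self.pending[sock_id] = data[written:]
```

Because the constructor takes the callback, the callback gets its reference to the networking object afterwards (echo.net = net); a real design might resolve that circularity differently.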

You've probably noticed that this API, while simple, isn't completely trivial, and there is considerable subtlety to what exactly all the methods mean. That is an important point. For functionality of less than this level of complexity, object orientation is generally speaking a bad idea, and trying to force simpler examples in the name of instruction results in poor and misleading instruction.

Unfortunately not all asynchronous networking APIs are in agreement with my example, which really says something about the state of the art in software design. I daresay that if Twisted followed this example straightforwardly (it does something similar, but awkwardly and obfuscatedly) then Tornado would never have been written.

Tue, Sep. 15th, 2009 06:35 am (UTC)

The reason why such an API doesn't exist is that it's not flexible enough, while adding considerable overhead to the raw syscall-based API. For example, this API states that any 'can read' condition leads to an unconditional method call.

Some APIs, though, go further. For example, the Perl module AnyEvent::Handle ( http://search.cpan.org/~mlehmann/AnyEvent-5.2/lib/AnyEvent/Handle.pm ) adds the ability to maintain a list of 'read' and 'write' callbacks (for convenience it also allows higher-level callbacks which do simple parsing or serializing/deserializing of values of certain types). This eases programming because it allows implementing 'expect'-like functionality.
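To illustrate the general idea of a queue of read callbacks (this is not AnyEvent::Handle's actual interface; the Python class ReadQueue and its methods expect and feed are hypothetical), a minimal sketch might look like this. Each queued request names how many bytes it wants, and requests are satisfied strictly in order as data arrives.

```python
from collections import deque


class ReadQueue:
    """Buffers incoming bytes and hands them to queued callbacks in order."""

    def __init__(self):
        self.buffer = b""
        self.waiting = deque()  # entries are (byte_count, callback)

    def expect(self, byte_count, callback):
        """Queue a request for exactly byte_count bytes."""
        self.waiting.append((byte_count, callback))
        self._drain()

    def feed(self, data):
        """Called by the event loop whenever new bytes arrive."""
        self.buffer += data
        self._drain()

    def _drain(self):
        # Satisfy requests from the front of the queue while data suffices.
        while self.waiting and len(self.buffer) >= self.waiting[0][0]:
            byte_count, callback = self.waiting.popleft()
            chunk = self.buffer[:byte_count]
            self.buffer = self.buffer[byte_count:]
            callback(chunk)
```

A protocol parser can then be written as a sequence of expect calls (say, a fixed-size header followed by a body) instead of one large 'can read' handler.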

Of course such an approach is only viable in dynamic languages. In static languages, event-based programming is very complicated because messages/events are not first-class features of most "static" environments and compilers don't know how to optimize/inline their processing. So the API which you propose is great, but it is limited by the restrictions of common "Simula-based OO implementations". If we had more general support for event/message-based OO in everyday languages, it would be easier to standardize such approaches to OO design.
(Deleted comment)

Tue, Sep. 15th, 2009 02:53 pm (UTC)

If you're trying to rate limit a download connection you can do it (awkwardly) by only reading from the socket at the rate you want. This will cause the sending side to slow down, although it's kind of clumsy under the hood due to technical issues with how TCP works. There unfortunately isn't a better way to rate limit TCP on the receiving end.
(Deleted comment)

Tue, Sep. 15th, 2009 07:12 pm (UTC)

If it's a protocol I'm designing myself, I'd explicitly incorporate the rate limiting into the protocol.

You're awfully trusting of unknown clients....
(Deleted comment)

Tue, Sep. 15th, 2009 10:39 pm (UTC)

Requesting smaller chunks doesn't really do what you want - it tends to make packets come in spurts rather than steadily, and if you make the chunks small enough it tends to limit the throughput to less than what you intended. Congestion control is hard.
(Deleted comment)

Tue, Sep. 15th, 2009 10:38 pm (UTC)

Trusting people not to flood you is something you have to accept if you want to be on the internet.

Tue, Sep. 15th, 2009 10:33 pm (UTC)

Rate limiting is a difficult problem. TCP is window-based rather than rate-based, for fairly good reasons, and controlling a rate by adjusting the receive window is clumsy at best. When using a new protocol you can simply give the sending side a max rate and the other side can implement it straightforwardly with token buckets or something like that, but in a legacy system that isn't an option.
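For the new-protocol case, a sender-side token bucket can be sketched in a few lines of Python. The class name, the take method, and the injectable clock parameter are all illustrative assumptions rather than part of any particular protocol: tokens (bytes of allowance) accumulate at the agreed max rate up to a burst capacity, and the sender only writes as many bytes as it has tokens for.

```python
import time


class TokenBucket:
    """Sender-side limiter: rate in bytes per second, capacity in bytes."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the bucket can be tested
        self.tokens = capacity
        self.last = clock()

    def _refill(self):
        # Credit tokens for the time elapsed, capped at the burst capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def take(self, wanted):
        """Return how many bytes (at most wanted) may be sent right now."""
        self._refill()
        allowed = min(wanted, int(self.tokens))
        self.tokens -= allowed
        return allowed
```

The sender would call take with the size of its pending data before each write and queue whatever is not allowed yet.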

In any case, yes, 99% of the time simply having the callback hand over the new data rather than reporting that a read is possible and requiring another call to get it is simpler and higher performing behavior. I did say there's a lot of subtlety even to that fairly simple API.

Tue, Sep. 15th, 2009 10:37 pm (UTC)

TCP is window-based rather than rate-based, for good reasons, and controlling rate by limiting the receive window is clumsy at best. For a new protocol one can simply rate limit on the sending side, which works far better, but that isn't always an option in legacy systems.

In any case, yeah, most of the time having a callback simply hand over the data which was received is more straightforward and higher performing. I did say there's a lot of subtlety to this API.

Tue, Sep. 15th, 2009 09:22 pm (UTC)

> I've written code substantially similar to what Bram describes in C, Python, and C#.

I didn't say that it isn't possible. I just showed the rationale why it's not common in standard OO libraries.

> I'm not sure why you believe any of those features would be required.

They're not required, but having generic programming tools in the language would raise the value of a standardized event/callback-based OO API for I/O. Today there are so many AIO OO APIs that it's frustrating to stick to one of them, knowing that it only applies to one language/environment and that switching to another language means learning a different one from scratch.

> Under what circumstances would you not want to read data from the socket when there is data available?

For example, when designing a high-load, resource-constrained application we may want to release resources that are already locked before we get a chance to require even more resources. To do this we might want, in particular, to 'write' first. Also, on high-bandwidth TCP downloads there's a chance that more data could be 'read' from the buffers later (when we're done with writing/checking EOFs/checking timeouts), and therefore with less overhead. I see that Bram mentioned another trick, pacing otherwise uncontrollable downloads - that isn't what I meant.