Tatsat Banerjee


We don’t need no steenking abstractions



I have been reading a great book over the past couple of days. It is called The Best Software Writing and I picked it up at the local Borders when I dropped in to browse as I am wont to do on the odd occasion. Reading it fired up the creative juices, so I thought I would put finger to keyboard about something that has been gnawing at me for a while now: the fact that we seem to be making things more complex than they need to be because of some misguided attempt to model the “real world”.

Now, I am the first to admit to being an old-world developer. I learnt how to program in the late 70s, at a time when OOP was simply unheard of out in the wild. I clearly remember the famous hot air balloon cover on the issue of Byte that introduced Smalltalk. Yes, not only was I alive then, I was old enough to buy Byte and read it.

I have worked through the evolution of our craft and the changes in the way we approach the development of software. Having learnt to program before there was OOP, I initially had a tough time really understanding how it worked, simply because there was a lot of unlearning that I had to do. It took me a while, and there were many times when I told myself that now I ‘got’ OOP, only to admit a little while later that I really didn’t get it at all.

If that sounds at all negative about OOP, it isn’t meant to. I am actually a big proponent of the benefits of OOP and think that the binding of data with the procedures that operate on it is A Good Thing™.

What I am finding, however, is that a new generation of programmers who don’t know anything except OOP is losing track of the objective. When you learn OOP, you are inevitably given examples of real-world objects that you try to model. I think that if I ever see another example of an Animal class, with Dog and Cat descendants, I will scream (even though I actually wrote a student manual for a VB4 course that used these self-same classes). We learn about inheritance and its use to create polymorphic behaviour by having Cat.Speak emit a miaow while Dog.Speak emits a woof. Or we have traffic lights control their own sequencing by sending messages to each other. All very cool.

Then off we go to do commercial programming. Except, there is a real disconnect at this point. ‘EDP’ (to use an outdated but nonetheless expressive term - Electronic Data Processing) is not the same as traffic control. In most business data manipulation, the entities we are dealing with are nothing but data points. The only reason the system exists is to maintain a repository of data, furnish a controlled view into that repository and provide a mechanism to allow a predefined set of transformations of that data to take place in a controlled manner. There are no dogs or cats in the wild that we are trying to model. What exactly does an account do when it is not being “modelled” by a piece of code? The answer, of course, is nothing at all – it just sits quietly in a database somewhere.

I have seen too many systems that model an Account object with all sorts of complex behaviours. From one perspective, this can be useful – we can encapsulate all the things that we can do to an account as methods of the Account object, thereby binding the data with the code and (at least in theory) hiding the implementation details. I have no problem with this; in fact, this is exactly what makes sense.

The problem arises when we refuse to acknowledge that the account lives in a database with a whole lot of other accounts. Worse still, we refuse to acknowledge that the database exists at all. Instead, we abstract it away with a “factory” that magically produces Account objects. It can give us a particular Account object if we can give it some distinguishing attribute, like an ID. But heaven forbid that we ask for any complex filtering. Why, that’s almost like SQL, and we wouldn’t want to pollute our neat, abstract, HashMap in the Sky with any of that old “relational stuff”.

So instead of crafting a simple SQL statement that reflects the set of data we actually need, we write code that does all but the simplest filtering on the client side. You don’t want all the columns of data? Who cares? Just don’t reference those attributes of the objects. You don’t want all the rows? Well, use one of the simple factory methods to do a first cut at the filtering, then just iterate through the list of objects that are returned and remove the ones you don’t want.

Does anybody else see a problem here?

Decades ago, we came up with a way to manage operations on tabular data. A whole relational algebra was developed that provided a clean set of operations that could be requested by a client program, so that only the data it actually wanted would be returned by the database. The SQL language provided a reasonably straightforward way to express those operations, and databases got really, really good at processing SQL, so that, at least in most cases, they could quickly and efficiently get just the required data together. That’s what they are designed to do, and they do it very well. And when they have done this, just the data that is actually required goes over the wire to the program that requested it, which in turn can be simpler because it doesn’t need to do the whole post-processing dance.
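To make the contrast concrete, here is a minimal sketch in Python using the standard sqlite3 module. The account table, its columns, the sample rows and both function names are invented for illustration; they are assumptions, not anything taken from a real system. The first function mimics the factory-then-iterate approach, the second lets the database do the filtering and the column selection.

```python
import sqlite3

# Hypothetical schema and sample rows, made up purely for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, owner TEXT, branch TEXT, balance REAL, notes TEXT)"
)
conn.executemany(
    "INSERT INTO account (owner, branch, balance, notes) VALUES (?, ?, ?, ?)",
    [
        ("Alice", "North", 1200.0, "long free-text notes"),
        ("Bob", "North", -50.0, "long free-text notes"),
        ("Carol", "South", -300.0, "long free-text notes"),
        ("Dave", "North", -75.5, "long free-text notes"),
    ],
)


def overdrawn_client_side(conn, branch):
    """Factory style: pull every column of every row for the branch,
    then throw away what we did not want on the client."""
    rows = conn.execute(
        "SELECT * FROM account WHERE branch = ?", (branch,)
    ).fetchall()
    # r[1] is owner, r[3] is balance; the other columns travelled for nothing.
    return sorted((r[1], r[3]) for r in rows if r[3] < 0)


def overdrawn_in_sql(conn, branch):
    """Ask the database for exactly the rows and columns we need."""
    return conn.execute(
        "SELECT owner, balance FROM account "
        "WHERE branch = ? AND balance < 0 ORDER BY owner",
        (branch,),
    ).fetchall()


# Both approaches give the same answer; only the second ships just that answer.
print(overdrawn_client_side(conn, "North"))
print(overdrawn_in_sql(conn, "North"))
```

With four rows the difference is invisible. The point of the sketch is only that the second version states the requirement once, in the query, and leaves nothing to post-process.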

Today, OOP developers think that their code has somehow become impure if it contains an SQL statement, or acknowledges the existence of a database, or a table, or a row or column. Yet these are the actual big-O Objects that are being manipulated. The row in the table (OK, the rows in related tables in the database) is the account. That is the object we are dealing with. Why do we feel the need to abstract it away?
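One possible reading of this, sketched in Python with the standard sqlite3 module, is an Account class that still binds the data to the operations on it but makes no secret of the row behind it. The class, its method names, the table layout and the choice to store the balance in whole cents are all assumptions made up for the sketch.

```python
import sqlite3


class Account:
    """Hypothetical thin wrapper: the row is the account, and each
    method is an openly visible SQL statement against that row."""

    def __init__(self, conn, account_id):
        self.conn = conn
        self.id = account_id

    def balance(self):
        row = self.conn.execute(
            "SELECT balance FROM account WHERE id = ?", (self.id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"no account with id {self.id}")
        return row[0]

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        # Using the connection as a context manager commits on success
        # and rolls back if the statement raises.
        with self.conn:
            self.conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, self.id),
            )


# Illustrative setup: one account holding 1000 cents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account (id, balance) VALUES (1, 1000)")

acct = Account(conn, 1)
acct.deposit(250)
print(acct.balance())
```

The methods still hide the details of each operation from the caller, which is the encapsulation worth keeping. What they do not do is pretend that there is no table underneath.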

It’s not like a cat. It is not practical for a program to deal with an actual Cat object. The Cat interface is not well understood, and the Cat Query Language is still in its infancy. When dealing with cats, it makes sense to create a simpler abstraction of a Cat, that models all the attributes of a real cat that we are interested in, and use that as a surrogate for a real cat in our program. Besides, cat fur really clogs up the fan vent in a computer case.

An account, on the other hand, like a large number of the “objects” we deal with in business software development, is quite easily accessible directly.

We don’t need no steenking abstractions.