Biofractal: An Irrational Love of the Relational

Is a relational database the best data-store for serializing object-graphs?

Caveats

This applies to coders who employ Agile and Domain Driven Design
It assumes the application you are building is not a data mining app

Imagine building an object-oriented application. You find that you need to save and subsequently retrieve an object's state (serialize and de-serialize your object graph). Is a relational database (RDB) the best data store for you?

The answer seems pretty clear to me - relational databases are just about the worst kind of data store for a hierarchical object-graph.

Of course an RDB can store object state. Mapping-tables, foreign keys and normalisation can all be used to project your natural object hierarchy into tables, rows and columns in the way that a globe can be projected onto the flat page of an atlas.

But the map is not the terrain. The RDB projection is nothing less than a distortion of the object graph. Translating between the object-graph and its distorted, relational representation requires work - and not just computer cycles, although it needs plenty of those, but also the hand-rolled code / bugs required to manage the transformation of data to and from the RDB object-store.
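
To make the projection concrete, here is a minimal Python sketch. It is not taken from any real project: the `Order` and `Line` classes and the `to_rows` / `from_rows` helpers are hypothetical names invented for illustration. The nested graph is flattened into two lists of rows linked by an order id, then stitched back together.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Line:
    product: str
    quantity: int


@dataclass
class Order:
    customer: str
    lines: List[Line] = field(default_factory=list)


def to_rows(orders):
    """Project the nested graph onto two flat 'tables' linked by a foreign key."""
    order_rows, line_rows = [], []
    for order_id, order in enumerate(orders, start=1):
        order_rows.append((order_id, order.customer))
        for line in order.lines:
            # The parent/child link survives only as the order_id column.
            line_rows.append((order_id, line.product, line.quantity))
    return order_rows, line_rows


def from_rows(order_rows, line_rows):
    """Rebuild the graph by joining each line row back to its parent order."""
    by_id = {order_id: Order(customer) for order_id, customer in order_rows}
    for order_id, product, quantity in line_rows:
        by_id[order_id].lines.append(Line(product, quantity))
    return [by_id[order_id] for order_id, _ in order_rows]


orders = [Order("Ada", [Line("globe", 1), Line("atlas", 2)]), Order("Bob")]
order_rows, line_rows = to_rows(orders)
restored = from_rows(order_rows, line_rows)
```

Even in this toy case the hierarchy has to be taken apart on the way in and reassembled on the way out, and both directions are code that somebody has to write and keep in step with the classes.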

Traditionally this code was encapsulated into a bespoke Data Access Layer (DAL). It makes me shudder to remember just how much of my life I have wasted writing and debugging DAL code. But all that wasted time is not the critical problem. The real kicker is that a DAL severely limits how Agile you can be.
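
As a rough illustration of what a hand-rolled DAL looks like, here is a small Python sketch using the standard library's sqlite3 module. The `Customer` class and `CustomerDal` are made-up names, and because SQLite has no stored procedures the inline SQL strings stand in for the SPROCs.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    email: str


class CustomerDal:
    """Hypothetical hand-rolled DAL: every property is repeated in each statement."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS customer ("
            "id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
        )

    def save(self, customer):
        # Stand-in for an insert SPROC: one parameter per property.
        cur = self.conn.execute(
            "INSERT INTO customer (name, email) VALUES (?, ?)",
            (customer.name, customer.email),
        )
        return cur.lastrowid

    def load(self, customer_id):
        # Stand-in for a select SPROC: one column per property.
        row = self.conn.execute(
            "SELECT name, email FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None


conn = sqlite3.connect(":memory:")
dal = CustomerDal(conn)
new_id = dal.save(Customer("Ada", "ada@example.com"))
loaded = dal.load(new_id)
```

Note how `name` and `email` each appear in the class, the table definition, the INSERT and the SELECT. A change to one property has to be repeated in every one of those places.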

Take a typical agile process: the quick refactoring of a class definition, say the addition of a new public property.

Add new property to class
Add equivalent mapping to DAL class
Add equivalent parameter to SPROC
Add equivalent column to RDB table

Notice how that list feels back to front? Surely life would be easier if I did things the other way around?

Add new column to RDB table
Add equivalent parameter to SPROC
Add equivalent mapping to DAL class
Add equivalent property to class

This shows up another problem with using RDBs - they promote data-driven design.

[Data Driven Design] = Design your objects by designing tables

[Domain Driven Design] = Design your objects by designing objects

To me the natural way to design a domain is to play with the objects but an RDB + DAL approach flips this natural flow on its head and makes you design backwards. It makes you design the data-store before the domain, from tables -> objects.

Data driven design means that you do all the work up-front (table, sproc, DAL then finally object) which greatly increases the cost of experimentation. This ossifies the design process. Data-driven design strongly inhibits a successful design evolving out of a series of cheap experiments. This is why agile coders tend to use domain driven design.

So why suffer all this RDB pain? Why not use a data store whose intrinsic architecture fits the structure of an object graph and does not require piles of buggy DAL code just to satisfy the basics of object serialization? DBAs often cite two main reasons:

You can run reports that cut across the object graph
You can keep your data application agnostic, allowing future applications to use the data

Reason 1 - Because I am not writing a data-mining app I know my report designs in advance. Therefore I don't need ad-hoc, dynamic reports. Since my reports are pre-defined they can be represented in my object-graph as a collection of serializable report objects. Reports are just filtered collections of report objects.
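
Here is a minimal Python sketch of what I mean by a report being a filtered collection. The `Sale` and `SalesReport` classes are hypothetical, and a real report would carry more than two criteria. Each report is a plain object whose fields describe the filter, so it can be stored alongside the rest of the graph.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sale:
    region: str
    amount: float


@dataclass
class SalesReport:
    """A pre-defined report: plain data describing which sales it includes."""
    title: str
    region: Optional[str] = None  # None means any region
    min_amount: float = 0.0

    def rows(self, sales: List[Sale]) -> List[Sale]:
        # The report is just the subset of the collection that matches.
        return [
            sale
            for sale in sales
            if (self.region is None or sale.region == self.region)
            and sale.amount >= self.min_amount
        ]


sales = [Sale("north", 120.0), Sale("south", 40.0), Sale("north", 15.0)]
northern = SalesReport("Northern sales", region="north")
large = SalesReport("Sales of 100 or more", min_amount=100.0)
```

Calling `northern.rows(sales)` returns the two northern sales and `large.rows(sales)` returns the single sale of 120.0. This only works because the criteria are known when the report class is written, which is exactly the assumption stated in the caveats.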

Reason 2 - I adhere to strict YAGNI (you aren't going to need it) principles, so I only write code to do the job. No matter how tempting it is, I do not write code just in case it might be needed in the future.

So what can the modern object-oriented coder do to make life a bit easier?

Use an Object Database

You can use a database that has been specifically designed to store object state data. They are called object databases. I have played around with the open source object database called DB4Objects, which I found to be very fast and easy to use (as easy as NHibernate), but at the time its development was in a state of rapid flux. If you have used an object database then please leave a comment.
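
To show the general idea of a store whose shape matches the graph, here is a small Python sketch. It is not DB4Objects and does not use its API: it simply writes the whole graph out as one nested document with the standard library's json module, as a stand-in. The `Node` class and `node_from_dict` helper are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def node_from_dict(data):
    """Rebuild a Node and all of its descendants from nested dictionaries."""
    return Node(data["name"], [node_from_dict(child) for child in data["children"]])


root = Node("root", [Node("left", [Node("leaf")]), Node("right")])

# The stored form keeps the nesting of the graph: no tables, no foreign keys.
stored = json.dumps(asdict(root))
restored = node_from_dict(json.loads(stored))
```

A real object database adds querying, indexing and transactions on top of this. The sketch only shows that the hierarchy goes in and comes out in the same shape, without being projected onto rows first.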

Scrap the DAL

Object Relational Mappers (ORMs), e.g. NHibernate, allow you to scrap the DAL. ORMs automate, as far as possible, the cruddy DAL code and get you closer to the agile ideal of fast and cheap refactoring. You can create new classes, add and remove interface elements and the ORM will take care of adding new tables and columns. You need never write another object persistence save / update / delete SPROC again (almost).

Whilst much better than writing DAL code, ORMs are not perfect and are not pain free. Every once in a while you have to go to the DB and mess around. This might seem trivial but you will be amazed how quickly all that DB experience evaporates.

Also, to ORM-enable your objects you need to somehow provide the mapping information that links the objects to their equivalent tables. These days this can be handled by marking up your classes with fairly simple [attribute] metadata but it is still a surprising amount of work to keep the attributes up to date.
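
The flavour of attribute-driven mapping can be sketched in a few lines of Python. This is a toy mapper written for illustration, not NHibernate or any real ORM: the `Product` class, the `column` / `type` metadata keys and the two helper functions are all assumptions of mine. The field metadata plays the role of the mapping attributes, and the SQL is generated from it.

```python
import sqlite3
from dataclasses import astuple, dataclass, field, fields


@dataclass
class Product:
    # The metadata stands in for mapping attributes on the class.
    name: str = field(metadata={"column": "product_name", "type": "TEXT"})
    price: float = field(metadata={"column": "unit_price", "type": "REAL"})


def create_table_sql(cls):
    """Derive the table definition from the class's mapping metadata."""
    columns = ", ".join(
        f.metadata["column"] + " " + f.metadata["type"] for f in fields(cls)
    )
    return "CREATE TABLE " + cls.__name__.lower() + " (" + columns + ")"


def insert(conn, obj):
    """Generate and run an INSERT for any mapped object."""
    table = type(obj).__name__.lower()
    column_list = ", ".join(f.metadata["column"] for f in fields(obj))
    placeholders = ", ".join("?" for _ in fields(obj))
    conn.execute(
        "INSERT INTO " + table + " (" + column_list + ") VALUES (" + placeholders + ")",
        astuple(obj),
    )


conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(Product))
insert(conn, Product("globe", 19.5))
rows = conn.execute("SELECT product_name, unit_price FROM product").fetchall()
```

Adding a property now means touching one place, the class, but the metadata on every field still has to be kept correct by hand, which is the upkeep described above.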

So ORMs are not the final answer because they are really just papering over the underlying problem: A relational database is not the best data store for serializing object-graphs.

1 comment:

Anonymous 12:10 am
There are many tools available that make translation between OO and RDBMS easy (hibernate for example). The power of RDBMS is being hugely underestimated here and its maturity + stability totally forgotten about. SQL's simplicity is also never mentioned.

You never mention how you query an Object Database.
Also, what if you need to add a new field to an object? You may have millions of objects that need to be modified. How does this happen?

Part of the success of SQL and RDBMS is that it allows you to manipulate data in ways you didn't think of when you designed the db schema.

Your comment that you know your report designs up front seems somewhat naive and totally non-agile. There's a very good chance the report design will change whilst you're coding your object model (probably several times). Also, over the course of an application's lifetime, business is in constant flux, and therefore the chances are your reports will be too. Agile is about being able to adapt quickly. Taking your YAGNI approach will mean a rewrite at possibly the same cost and time as the original.

For me, I'm not convinced, but I'll watch this approach mature before committing.

Friday, February 13, 2009
