Monday, August 31, 2009

two databases, one column

I feel fortunate that I don't experience a "WTF?!" moment on a daily basis. Although sometimes, it does feel like I'm in Groundhog Day, perpetually experiencing the same WTF?!s over and over again. Today was such a day.

The application that I'm working on talks to two databases. In one database, there's a table called user with columns like id, firstname, lastname, ssn, and so on. The information in some of the rows in this table has also been stored in the second database. This is done for reasons of efficiency, so that when querying the second database, we don't always have to dip into the first database to get user information.

This seems perfectly reasonable to me, except for the way that it was done. Way back in the stone ages of our application, someone decided to store all the user information in a single column in the second database. It is stored as a set of key-value pairs in a CSV string, in a column called user, like this: "user=3378,first=John,last=Smith,ssn=123456789,..."

Since I first encountered this, I've been cursing scratching my head about why someone would decide to store all of this information in a single column. It makes queries more difficult to read and write, because it requires regular expressions to extract single items like the social security number from the user column. I can imagine this is also a performance hit.

In addition, whoever decided on this format did not replicate the user table column names for the keys in the CSV string. So the user.firstname column in database 1 maps to first in the CSV, user.lastname maps to last, and so on. This makes life more difficult when searching the code base for all queries related to a particular piece of user information. I recently fixed a bug caused by this mapping difference.

I'd love to have a conversation with the person who decided to do things this way. It would either be enlightening, or just plain crazy.

No comments:

Post a Comment