My first T-SQL Tuesday post on the new blog is in response to an invitation from my friend and SQL sister Mickey Stuewe (b|t). The topic is data modeling practices that are best avoided, and how problems arising from bad data models can be fixed.
I have several experiences to recount here, since my first ever job involved data modeling and design. I will pick one of them for this post. Several years ago, I was working for a startup as a junior DBA. DBAs had a say in how data models were designed, although we didn’t always get the final word. Since it was a new company building its own software, we had some freedom to design things as we wanted.
The issue we had on hand was to do with some code tables. Or rather, a code table. We had tables for products, customers and invoices, and we needed a few lookup tables to go with them: a product type, a customer type and an invoice type. In addition, we needed payment type, address type and a whole lot of other ‘types’. The structure of these was more or less the same: a code and a description, in most cases. In a true third normal form design, each of these would have been its own table, with a foreign key relationship to the table it was used in. Small tables, no doubt, but specifically intended to store the lookup data in question.
The senior DBA I reported to felt that a lot of small tables would make the schema large and therefore difficult to manage, and that the queries would get too ‘deep’ over time. So he made the call that we would have a generic ‘code table’: a simple denormalized table with three fields, a code, a code type, and a code description. All of our codes (product type, customer type, invoice type, address type, everything) would go in that one table, with the code type telling us what it was and the code description giving some additional information.
Programmers were thrilled with the idea, since it meant they had only one table to work with for most of their queries. Initially, this code table contained only about 100 records and seemed like a workable idea.
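To make the design concrete, here is a minimal sketch of what such a generic code table might have looked like. It is not the actual schema from that company: the table name, column names and sample rows are all assumptions for illustration, the meaning given to code type 'I' is a guess, and SQLite (via Python's built-in sqlite3 module) stands in for SQL Server so the example can run anywhere.

```python
import sqlite3

# Hypothetical reconstruction of the single generic code table.
# All names and rows are assumptions; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE code_table (
        code_type        TEXT NOT NULL,  -- 'P' product, 'A' address, 'I' invoice (assumed)
        code             TEXT NOT NULL,
        code_description TEXT NOT NULL,
        PRIMARY KEY (code_type, code)
    )
""")
conn.executemany(
    "INSERT INTO code_table VALUES (?, ?, ?)",
    [
        ("P", "HW", "Hardware"),
        ("P", "SW", "Software"),
        ("A", "BILL", "Billing address"),
        ("A", "SHIP", "Shipping address"),
        ("I", "STD", "Standard invoice"),
    ],
)

# Every lookup in the application goes through the same table,
# distinguished only by a one-letter code_type filter.
address_types = conn.execute(
    "SELECT code, code_description FROM code_table "
    "WHERE code_type = 'A' ORDER BY code"
).fetchall()
print(address_types)
```

With a handful of rows this looks harmless, which is probably why the idea was so appealing at the start.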
But the business started to grow, and the table grew along with it. We started running into the following issues:
1 Almost all the queries in the application were referring to this one table – leading to it getting hit too many times.
2 The queries got increasingly cryptic when they tried to refer to one specific range of records. For example, an address type was indicated by the code type A, so one query would say code_type = ‘A’, another would say code_type = ‘I’, and so on. When the programmer left, nobody really knew what these letters meant. And yes, there was no third table to decode the meaning.
3 The business wanted to manage the data, and there was no easy way to give them an interface. The data was a jumble of many different things and could not be managed with one screen. The programming effort needed to give them multiple screens was huge.
4 Some of the ‘codes’ started needing additional attributes. Phone number type, for example, was initially just cell phone or residential. Then some customers started asking for a second residential line, so you had to say residential 1 or 2, or cell phone 1 or 2.
5 There were no foreign key relationships defined, so we depended on application code to enforce the relationships. The code had bugs, and that resulted in bad data getting in very often.
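The missing foreign keys are the easiest of these problems to demonstrate. The sketch below is an illustration under assumed names (the customer table, the 'C' code type and the sample rows are all hypothetical, and SQLite again stands in for SQL Server). It shows how a bug in application code can store an address type code as a customer type without the database objecting, and how such rows can only be found after the fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE code_table (
        code_type        TEXT NOT NULL,
        code             TEXT NOT NULL,
        code_description TEXT NOT NULL,
        PRIMARY KEY (code_type, code)
    );
    -- No foreign key: nothing ties customer_type to a row in code_table.
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        customer_type TEXT NOT NULL
    );
    INSERT INTO code_table VALUES ('C', 'RET', 'Retail customer');
    INSERT INTO code_table VALUES ('A', 'SHIP', 'Shipping address');
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme', 'RET')")
# A buggy code path stores an address type code as a customer type.
# The database accepts it silently.
conn.execute("INSERT INTO customer VALUES (2, 'Globex', 'SHIP')")

# Bad rows can only be detected afterwards, with a query that has to
# know the magic letter for customer types.
orphans = conn.execute("""
    SELECT c.customer_id, c.customer_type
    FROM customer AS c
    LEFT JOIN code_table AS t
      ON t.code_type = 'C' AND t.code = c.customer_type
    WHERE t.code IS NULL
""").fetchall()
print(orphans)
```

Note that even the cleanup query depends on remembering that 'C' means customer type, which is the same cryptic-letter problem from item 2.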
In short, it got so complicated that the CIO had to approve a project to normalize the whole thing. The rewrite involved creating several smaller tables, putting in the right foreign keys and relationships, and creating interfaces for the business to manage the data. The big lesson that came out of it was that a smaller schema or fewer tables does not really mean a simpler or easier-to-manage design; in fact, it can be the exact opposite.
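As a rough sketch of what the normalized version buys you, here is one lookup table with a real foreign key. The table and column names are hypothetical, not the ones from the actual rewrite, and SQLite is used in place of SQL Server (note that SQLite only enforces foreign keys once the pragma is switched on, whereas SQL Server enforces them by default).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    -- One small, dedicated lookup table per kind of code.
    CREATE TABLE customer_type (
        customer_type_code TEXT PRIMARY KEY,
        description        TEXT NOT NULL
    );
    CREATE TABLE customer (
        customer_id        INTEGER PRIMARY KEY,
        name               TEXT NOT NULL,
        customer_type_code TEXT NOT NULL
            REFERENCES customer_type (customer_type_code)
    );
    INSERT INTO customer_type VALUES ('RET', 'Retail customer');
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme', 'RET')")

# The same buggy insert that a generic code table would accept is now
# rejected by the database itself.
try:
    conn.execute("INSERT INTO customer VALUES (2, 'Globex', 'SHIP')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(rejected, customer_count)
```

A dedicated table also has room to grow: if phone types later need a line number, that becomes an extra column on the phone type table rather than a convention squeezed into a shared description field.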
Good database design is the backbone of a well-designed and well-performing application, so always try to learn what goes into it. A great article by SQL Server MVP Louis Davidson describes several issues like this and is worth reading.