Normalizing your data model
A positive thing about Qlikview is that it is oriented to technical and business users. However, having inexperienced people creating data models will, with no doubt, create headaches in the near future.
I thought it would be useful to share with you how a good data model should look like, by normalizing the tables or data we have.
Normalization is the processes of organizing data in a database. When we say “organizing” we mean, creating proper table structures in a way that we can then stablish relationships among them. One of the main reasons for this is to avoid data redundancy.
In this article we are going to cover only the first three normal forms.
Let’s start with a real example that you can easily find in any of your company spreadsheets.
In this example, a sales person can have 2 managers, and they report sales from up to 3 countries for one of the managers and one country for the other manager. The Cons of this approach:
– Not easily escalable
– Repeating the sames type of data in different columns.
FIRST NORMAL FORM
The first step to normalize a table or dataset is to eliminate redundant columns. In our example we have 2 columns for the managers and 4 columns to display sales by country by managers.
We want to eliminate Manager1, Manager2, Country1-Manager1, Country2-Manager1, Country3-Manager1, Country1-Manager2, and replace them with 2 columns. Manager and Country.
At this stage we can also start thinking about primary keys. A primary key is an identifier that can uniquely identify a record in a table. Each table will have a primary key.
In our example, SalesPerson is a primary key. This is because we can identify one person by it’s name (altohugh we will use IDs in real life) and with its name we can have only one record with one phone number and one email address.
SECOND NORMAL FORM
In this step we want to eliminate redundant data. We can eliminate redundant data by identifying fields with repeated values.
Our model will now look like this:
THIRD NORMAL FORM
In this step, we eliminate everything that is not dependant on the primary key.
Rember that a primary key is a unique identifier that can uniquely identify a row.
Our primary keys are:
SalesPerson: by name or ID we can identify a sales person and their contact details
Managers: by name or ID we can identify a manager. One person can have more than one manager and one manager can manage more than one sales person, so we will have 2 keys (Sales person and Manager)
Country: same as managers. Two keys to identify which country a sales person is responsible for.
Our final third normal form tables would look like this:
Note that Manager and Country have a many to many relationship. In order to identify which sales person reports to a specific manager, we need both fields, SalesPerson and Manager.
Finally, our Qlikview model would look like this: