Working with data at different levels of granularity

I’m currently working in a project for a client that reports on data with different levels of granularity. Of course this becomes a problem as the architecture becomes a bit messy. However, it becomes really interesting the way you can play with Qlikview even when you have a de-normalize schema.
I’ll try to explain what I want to achieve with an example.
Let’s suppose we have two fact tables. I will not include any dimension tables to make the example simpler, but I will still have the dimension values within the fact table.
My first fact table contains data at the day/city level. It looks like this:

Date
Continent
Country
State
City
Downloads

From this table I can know the number of downloads for a song I uploaded to my site. This is by day and City. Having the city I can then create a hierarchy up to the continent level.

My second table, on the other hand, represents the number of times my song was played online (streaming) on my website. This is by day, and country.

Date
Country
NumOfTimesPlayed

Knowing that we cannot de-aggregate this last table to the city/day level, I needed to find a way to show the number of times my song was played when the users choose continent.

Joining the tables by day is not an option as we might have data in the first table for a certain day that we don’t have on the second table and vice versa. For this example, we assume that we have the same countries in the first and second table.

Within our physical model, there’s no connection between my first and second fact table. The good thing is Qlikview allows us to create logical data models.

A Logical data is based on business requirements and doesn’t have to follow strict DB definition. To clarify, joining two tables with different levels of granularity is not correct when working on your physical model, but it can be done in a logical model.

After joining my two tables I have this:

And the data in my tables would look like this:

If I select America, both tables will filter USA and Argentina but I will see different dates. What I need is my second table to show the exact same dates and countries as my first one.

Naturally, I can’t achieve what I want with this data model, so I need to find a way to make the second table show whatever I select in my first table and for this we can use the P() function. The P() function represents all possible values and allows a field name as a parameter.

I will now change my expression in the second table from:

sum(NumOfTimesPlayed)

To:

Sum({<DatePlayed P(DateDownload)>} NumOfTimesPlayed)

My model will now respond as expected and the second table will show only those countries and dates filtered in my first table.

Conclusion

It is ideal that our model contains data with the same level of granularity, but there are many cases that this won’t happen. Qlikview allow us to easily find a work around to manage this complex, de-normalize models.

JV

Advertisements

Tags: , , , ,

4 responses to “Working with data at different levels of granularity”

  1. Erdöl Biramen says :

    Why don’t you just join the two tables by country AND date?

  2. Erdöl Biramen says :

    Joining the tables by day is not an option as we might have data in the first table for a certain day that we don’t have on the second table and vice versa.

    If I select America, both tables will filter USA and Argentina but I will see different dates. What I need is my second table to show the exact same dates and countries as my first one.

    These are conflicting statements. If you want the 2nd table to show exactly the same dates as 1st table, then you should use a left join. If you want to see all dates from both tables then use a full outer join.

    • Business Intelligence Experience says :

      Hi Erdol,

      A left join will make me lose data from my second table.
      Ex:
      TableA:
      Jan
      Feb

      TableB:
      Feb
      Mar

      If I left join A with B, I would lose March from table B.

      The full outer join could be a solution but I wouldn’t be able to explain the use of the P() function which is the purpose of this post even if I don’t explicitly mention it 🙂

      Thank you for your input!

      Juan

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: