A New Choice of Armor: The Flag Matrix

May 22, 2011 at 2:51 pm 9 comments

As you know already, I’m a big fan of using Set Analysis in my QlikView applications, especially for Point In Time Reporting. However, there might be some specific situations in which Set Analysis just won’t cut it and you need an alternative.

In Context

I found myself in that situation this month in a project where the QlikView application was required to have Sales Data to the highest detail (resulting in a Fact Table of about 300 million rows for two years worth of data) and several analyses of the Pint-In-Time kind were to be made. So, I started doing the usual stuff: built the Data Model, prepared the calendar with some fields to optimize the use of Set expressions, and started creating the objects. To my surprise, when I created the PITR analysis objects, I noticed a poor performance, a very poor performance indeed. I say “to my surprise” because I was counting on having a high performance app, despite having that amount of data in memory, based on two things:

  • The Server I was working on had 512 GB of RAM, and 4 octo core processors. (What else would you need with a server like that, right?)
  • The Data Model was not that complex, I had already worked with similar models and required analyses, and had not had any problems with my set expressions.

But there you go, not everything goes as first planned in terms of performance, and I started making some adjustments: I tried summarizing the data, simplifying the Data Model a little bit, and even considered having the pre-built fields for MTD, LYMTD, etc. This last option was not the best actually and I was just not satisfied with settling for that because it would mean the user would not be able to “play” and analyze or make comparisons based on different months.

After several days trying to figure it out, I was presented with an option I discarded instinctively: using Flags in the calendar in order to use expressions like Sum(Sales * FlagMTD). The reason I first doubted using this approach was very obvious to me: How would an expression that had to be evaluated on a row-by-row basis would be better than Set Analysis? Plus, that is an old-fashion method, used when there was no Set Analysis. I was wrong, not in the second part of my reasoning, but in the first.

Turns out that on the first test, Flags performed a lot better than Set Analysis: From around 16 seconds in a single and basic object using Set expressions, to less than 1 second using Flags on the same object. The Dashboard Sheet, which was taking nearly 50 seconds to calculate using Set expressions, took less than 5 using Flags. Sometimes it took less than 1, depending on the selections I made.

I felt relief for being able to deliver a high performing app, and at the same time I was taught a lesson: Try it before you trash it.

Technical Details.

So, how does it work? If this approach is new to you, I will advice you to see the following explanation from the “associative” perspective, since I think it would be easier to understand it that way. The following diagram illustrates the concept.

Calendar and Fact Tables Association.

We have the Fact Table to the left, and the Caleendar Table to the right, linked by a field called “KeyDate”, which is not the field used for the UI selections. We have split the Date Field in order to associate ONE date in the Calendar Table to SEVERAL dates in the Fact Table. You can see it as a “Translation” of Dates, if you like, in which you have on one side the Dates that will actually be used for selections in the UI, and the actual Dates with which QlikView will do its work, on the other side. In the above example, Feb 2011 in the “User Date” gets translated/associated to the following “Transaction Dates”:

  • Feb 2011 itself, with the Flag for “This Year Month To Date”
  • Feb 2010, with the Flag for “Last Year Month To Date”
  • Jan and Feb 2010, with the Flag for “Last Year YTD”
  • Jan and Feb 2011, with the Flag for “This Year YTD”

The most important thing that you need to understand when using this approach is that, even when the User Date is associated with different Transaction Dates, you will be able to choose which values to actually take into account by using the Flag in your expression. In this way, all the dates that you want to exclude in the calculation, will be zeroed out by the Flag itself.

Your final expression should look something similar to:
  • For TYMTD Calculation: Sum(SalesAmount * _TYMTD)
  • For LYMTD Calculation: Sum(SalesAmount * _LYMTD)
  • and so on…

Things to take care of

There are a few things you should consider when creating your Data Model using this approach:
  1. Nulls. I’ve noticed that, since what is being done here is a direct multiplication (row by row), having nulls in the fields being multiplied may slow down the calculation. So, for best performance, make sure you don’t have any nulls netiher in the SalesAmount field nor in any of the flags.
  2. You might want to use variables for your different expressions. It will give you more control and is better for maintenance.
  3. The user needs to always have one selected value in the Month and year field for this to work. If this is not the case, the calculations will yield unexpected results.

Conclusion

The Flag Approach is an alternative to Set Analysis, and that doesn’t mean than one is better than the other. I guess it would depend on several factors such as the Data Model, the hardware, and the Data Volume. I encourage you to try and see how it performs on distinct scenarios before picking your default armor for dealing with Point In Time Analyses.
Want to see the Flag Matrix in action? Download the script here.


About these ads

Entry filed under: Data Model. Tags: , , .

QlikView mobile: AJAX Touch Client Power Up your Big Apps with Binary Index Expansion

9 Comments Add your own

  • 1. Rob Wunhderlich  |  June 5, 2011 at 6:15 pm

    Great post. I think some people believed that with the advent of Set Analysis, the flag technique would become obsolete. You’ve published a great case study illustrating “it depends”.

    Set Analysis has to redrive the data association code. Multiplying by 1 and addition are simple common operations that probably run highly optimized.

    Thanks for posting such a detailed case.

    Reply
  • 2. Jay Jakosky  |  July 2, 2011 at 4:19 pm

    Excellent post! Set Analysis is not a panacea.

    Flags and Set Analysis are not exclusive! It is common for most expressions on a chart or sheet to benefit from flags and a few expressions to require set analysis.

    Reply
  • 3. Jason Michaelides  |  August 30, 2011 at 9:52 am

    Hi there. This is a great post – thanks. I have a question though – why does the user always need to have a month and year selected? Not sure why the results will be unexpected…

    Thanks,

    Jason

    Reply
    • 4. Mike  |  August 30, 2011 at 10:24 pm

      Hello Jason! Thanks for leaving your comment.

      The reason for requiring the user to have only one month and year selected is the nature of the associations built with the Flags table, i.e. the calendar table does not have a one-to-one relationship with the actual dates in the fact table. Basically, you need to specify a reference point for the Flags Table to work their way through your model.

      For example, The period 2011-08 in our calendar table, would have different associations with the fact table, such as 2011-07 (Previous Month Flag), 2010-08 (Last Year MTD Flag). Therefore, the fact that you specify the use of one flag (say, Previous Month), doesn’t specify which month do you want to take as reference point.

      I hope I clarified the point. It’s actually a little difficult to come around the idea, and I would suggest, if you haven’t already done so, that you play around with the sample script and figure out the associations.

      Best regards,
      Mike.

      Reply
      • 5. Jason Michaelides  |  August 31, 2011 at 2:00 am

        Thanks Mike. Yesterday I did indeed download the qvw and start investigating! As you say, it is a little challenging to get your head around but I’ll perservere as it seems a very elegant solution to a comon challenge.

        Jason

  • 6. Matthew Stephen  |  January 14, 2012 at 3:00 pm

    Excellent solution!

    I’m trying to think how this could be modified to calculate a YTD flag for a fiscal year starting 5th April – but no luck so far

    Reply
    • 7. Mike  |  January 14, 2012 at 3:14 pm

      HI Matthew,

      Thanks for your comment. So far, I have not needed to adapt the script for Fiscal Calendars, and I agree it might be a little tricky to get the algorithm right. If you get it done, I would be interested in taking a look at it, so please share.

      Have a good day.
      Mike.

      Reply
  • 8. Ed DeMeo  |  February 10, 2012 at 12:53 pm

    Matthew,

    Hopefully the information I am providing below will help you create the FISCAL YEAR flags…Below is an excerpt from a script where I am creating Fiscal Year flags.

    LOAD “DATA_DATE”,
    Day(DATA_DATE) as DAY,
    “CALENDAR_MONTH”,
    “CALENDAR_YEAR”,
    “CALENDAR_YEAR” & ‘C’ & “CALENDAR_MONTH” as CalendarMonth,
    “CALENDAR_NAME”,
    “FISCAL_MONTH_NUM”,
    “FISCAL_YEAR_NUM”,
    “FISCAL_YEAR_NUM” & ‘M’ & “FISCAL_MONTH_NUM” as FiscalMonth,

    //Fiscal Year Flags
    If(((FISCAL_YEAR_NUM >=2008 AND FISCAL_MONTH_NUM >=1) AND (FISCAL_YEAR_NUM < 2009 AND FISCAL_MONTH_NUM =2009 AND FISCAL_MONTH_NUM >=1) AND (FISCAL_YEAR_NUM < 2010 AND FISCAL_MONTH_NUM =2010 AND FISCAL_MONTH_NUM >=1) AND (FISCAL_YEAR_NUM < 2011 AND FISCAL_MONTH_NUM =2011 AND FISCAL_MONTH_NUM >=1) AND (FISCAL_YEAR_NUM < 2012 AND FISCAL_MONTH_NUM =2012 AND FISCAL_MONTH_NUM >=1) AND (FISCAL_YEAR_NUM < 2013 AND FISCAL_MONTH_NUM =2013 AND FISCAL_MONTH_NUM >=1) AND (FISCAL_YEAR_NUM < 2014 AND FISCAL_MONTH_NUM =2014 AND FISCAL_MONTH_NUM >=1) AND (FISCAL_YEAR_NUM < 2015 AND FISCAL_MONTH_NUM =2015 AND FISCAL_MONTH_NUM >=1) AND (FISCAL_YEAR_NUM < 2016 AND FISCAL_MONTH_NUM <=12)), 1, 0) AS FY15

    Reply
  • 9. Ed DeMeo  |  February 13, 2012 at 12:14 pm

    //Fiscal Year Flags
    If(((FISCAL_YEAR_NUM >=2008 AND FISCAL_MONTH_NUM >=1) AND (FISCAL_YEAR_NUM < 2009 AND ISCAL_MONTH_NUM =2009 AND FISCAL_MONTH_NUM >=1) AND (FISCAL_YEAR_NUM < 2010 AND FISCAL_MONTH_NUM =2010 AND FISCAL_MONTH_NUM >=1) AND (FISCAL_YEAR_NUM < 2011 AND FISCAL_MONTH_NUM =2011 AND FISCAL_MONTH_NUM >=1) AND (FISCAL_YEAR_NUM < 2012 AND FISCAL_MONTH_NUM =2012 AND FISCAL_MONTH_NUM >=1) AND (FISCAL_YEAR_NUM < 2013 AND FISCAL_MONTH_NUM =2013 AND FISCAL_MONTH_NUM >=1) AND (FISCAL_YEAR_NUM < 2014 AND FISCAL_MONTH_NUM =2014 AND FISCAL_MONTH_NUM >=1) AND (FISCAL_YEAR_NUM < 2015 AND FISCAL_MONTH_NUM =2015 AND FISCAL_MONTH_NUM >=1) AND (FISCAL_YEAR_NUM < 2016 AND FISCAL_MONTH_NUM <=12)), 1, 0) AS FY15

    Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Trackback this post  |  Subscribe to the comments via RSS Feed


Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 1,303 other followers

Recent Posts

Twitter


Follow

Get every new post delivered to your Inbox.

Join 1,303 other followers

%d bloggers like this: