Posted by Chris Reeves on 29 January 2021
Welcome to the third part in this series on historical reporting, subtitled Part Three – “When things are a-changin’ (slowly changing dimensions), under the hood”… I’ll assume it’s only a technical audience from here. You may recognize that in the previous post we just implemented an SCD Type II solution. This is a simple form of historical reporting because we have no measures to concern ourselves with. So let’s lay down a technical foundation, which the other historical types will build upon.
(Read the previous post in this blog series here: "Business data reporting against historical sales data and historical ERP data – Part 2")
First let’s quickly review the reporting requirement, which is to generate this:
From source data such as:
And we would like to accommodate the fact that one or more of our salespeople may migrate. In this example, we’ll have Jenny migrate from East to West.
The following code is the SQL-equivalent query to generate the above chart. However, the catch is that we to need to join each Sales Amount row to the correct Sales Person row as it was at the time of the transaction.
We’ll fill in the join criteria later, but first, we need to take our Sales Person data and convert it into a form that includes changes over time. So instead of this:
We require this:
We only want to track changes in the Region column. Needlessly tracking columns results in excess processing and storage. It can also result in incorrect information, such as when an address is corrected in the source system, but the history of the error is maintained in the warehouse. For these reasons, and for backup efficiency purposes, a dedicated history table is used, as in the following:
Here the first two columns are the key of the table and the next two provide the range of applicability of the records. We now require the ability to populate this table as the source system changes over time. And now for some more challenging SQL:
I’ve included some comments, but I won’t go through the syntax line by line. Microsoft’s online documentation does an excellent job explaining the Merge statement.
There are three use cases the SQL statement is accommodating:
The net result is that, running the above query on a scheduled basis will result in the history table being maintained with the data necessary to meet our requirements.
Note: for simplicity, this SQL populates RevisionStart with “GetDate()”, while in practice this value would be derived from a source system column.
We now need SQL to combine our historical columns (from [History].[Sales Persons]) and our non-historical column(s) (First Name from [Staging].[Sales Person]).
The SQL is a relatively simple outer join, as below. Which we can use to populate a [Warehouse].[Sales Person] table or create a view.
Just to recap, before we tie it all together, we have:
The only remaining challenge is to complete our chart-query by filling in the join criteria between our fact table (Sales Amounts) and our historical dimension (Sales Persons). We could specify a range-join here, but this would require us duplicating such logic for each reporting query. There is also a slight complication that some reporting tools may not handle composite keys (such as Power BI as at Nov 2018). So here we make use of the concatenated key (alluded to above) on our [Warehouse].[Sales Persons] table. At this point we’ll create one view to sit on top of the Sales Amounts table, as follows (as before you can also choose to use this SQL to populate a table):
Note: CreatedTime should be provided by a source system column.
And finally, we can fill in our join criteria by making use of the new concatenated key column from the above view:
At this point you are now armed with the knowledge to implement your own SCD II solution or at the very least, to appreciate the value of the advanced historical capabilities provided by ZAP Data Hub. Please join me next time around where we take a non-technical look at how to track measures that change over time.
(Want to continue reading? Check out the next and final post in this blog series here: "Business data reporting against historical sales data and historical ERP data – Part 4")
Chris joined ZAP in 2006 as a developer and has gone on to become our Chief Technology Officer. He's based out of our Brisbane, Australia office and blogs on technical aspects of data warehousing.
View my social profiles: LinkedIn | Twitter