Data Management, BI Analytics, ERP Reporting & Data Warehousing Blog | Talk Data to Me by ZAP

Business data reporting against historical sales data and historical ERP data – Part 3

Chris Reeves

Posted by Chris Reeves on 29 January 2021

Business data reporting against historical sales data and historical ERP data – Part 3

Welcome to the third part in this series on historical reporting, subtitled Part Three – “When things are a-changin’ (slowly changing dimensions), under the hood”… I’ll assume it’s only a technical audience from here. You may recognize that in the previous post we just implemented an SCD Type II solution. This is a simple form of historical reporting because we have no measures to concern ourselves with. So let’s lay down a technical foundation, which the other historical types will build upon.

(Read the previous post in this blog series here: "Business data reporting against historical sales data and historical ERP data – Part 2")


Running SQL Queries for Historical Sales reporting

First let’s quickly review the reporting requirement, which is to generate this:

SQL Queries Historical Sales Reporting
From source data such as:

Source Data Historical Sales Reporting

Source Data Historical Sales Reporting

And we would like to accommodate the fact that one or more of our salespeople may migrate. In this example, we’ll have Jenny migrate from East to West.

The following code is the SQL-equivalent query to generate the above chart. However, the catch is that we to need to join each Sales Amount row to the correct Sales Person row as it was at the time of the transaction.

SQL Query Historical Reporting

We’ll fill in the join criteria later, but first, we need to take our Sales Person data and convert it into a form that includes changes over time. So instead of this:

Sales Person Data Historical Reporting


We require this:

Sales Person Historical Reporting


We only want to track changes in the Region column. Needlessly tracking columns results in excess processing and storage. It can also result in incorrect information, such as when an address is corrected in the source system, but the history of the error is maintained in the warehouse. For these reasons, and for backup efficiency purposes, a dedicated history table is used, as in the following:

Historical Reporting Sales Data Chart


Here the first two columns are the key of the table and the next two provide the range of applicability of the records. We now require the ability to populate this table as the source system changes over time. And now for some more challenging SQL:

SQL Statement Historical Data


I’ve included some comments, but I won’t go through the syntax line by line. Microsoft’s online documentation does an excellent job explaining the Merge statement.

There are three use cases the SQL statement is accommodating:

  • New key detected: Insert revision 1 with a null RevisionEnd.
  • Existing key differs on a tracked column: Update the RevisionEnd for the top row for that key and insert a new revision.
  • Existing key no longer present in source: Update the RevisionEnd for the top row for that key.

The net result is that, running the above query on a scheduled basis will result in the history table being maintained with the data necessary to meet our requirements.

Note: for simplicity, this SQL populates RevisionStart with “GetDate()”, while in practice this value would be derived from a source system column.

Combining Historical and Non-Historical Data

We now need SQL to combine our historical columns (from [History].[Sales Persons]) and our non-historical column(s) (First Name from [Staging].[Sales Person]).

Historical and Non-Historical Data Combined

The SQL is a relatively simple outer join, as below. Which we can use to populate a [Warehouse].[Sales Person] table or create a view.

SQL Statement Combine Non-Historical and Historical Data

Notes:

  • The Duration column records the time between changes in tracked columns. In this example, the Duration column provides the time that each salesperson resided within a region. This provides yet further justification for appropriately setting which columns to track, as erroneously tracked columns will result in erroneous duration values.
  • And we’ll get to the concatenated key soon.

Just to recap, before we tie it all together, we have:

  • A historical table that holds only the tracked column(s).
  • A Merge statement that we can schedule to populate and maintain this table.
  • And a view (or table) combining the historical columns with the non-historical columns. Notably, the key of this view (or table) now includes a revision Id.

The only remaining challenge is to complete our chart-query by filling in the join criteria between our fact table (Sales Amounts) and our historical dimension (Sales Persons). We could specify a range-join here, but this would require us duplicating such logic for each reporting query. There is also a slight complication that some reporting tools may not handle composite keys (such as Power BI as at Nov 2018). So here we make use of the concatenated key (alluded to above) on our [Warehouse].[Sales Persons] table. At this point we’ll create one view to sit on top of the Sales Amounts table, as follows (as before you can also choose to use this SQL to populate a table):

SQL Query Table Warehouse Sales Amounts

Note: CreatedTime should be provided by a source system column.

And finally, we can fill in our join criteria by making use of the new concatenated key column from the above view:

SQL Query Sales Amounts


At this point you are now armed with the knowledge to implement your own SCD II solution or at the very least, to appreciate the value of the advanced historical capabilities provided by ZAP Data Hub. Please join me next time around where we take a non-technical look at how to track measures that change over time.

(Want to continue reading? Check out the next and final post in this blog series here: "Business data reporting against historical sales data and historical ERP data – Part 4")

Get actionable historical reporting from your ERP, CRM and all other data sources


About the Author:

Chris Reeves
Chris Reeves
Chris joined ZAP in 2006 as a developer and has gone on to become our Chief Technology Officer. He's based out of our Brisbane, Australia office and blogs on technical aspects of data warehousing.
View my social profiles: LinkedIn |

Any feedback or questions? Leave a comment below...

Subscribe to Talk Data To Me

Latest Posts: