Posted by Trey Johnson on 18 December 2020
I’ve often found that many people do not consider Data Warehousing to be an easy undertaking. It could be the challenge of a data warehouse being IT-driven, or it could simply be that people want to explore and follow hunches with data before asking for deep investments around their data platform. Maybe it’s human nature: to want to move beyond something perceived as difficult towards something perceived as easier or less effort.
With various recent announcements, the tools people use for visualizing data – notably Power BI and Tableau – have each received some essential improvements in the areas of Data Acquisition, Preparation and Enrichment.
For Tableau, it is the Tableau Prep feature set, which is part of their Creator License along with Tableau Desktop. Meanwhile, Microsoft has been hard at work delivering capabilities through Power BI practically monthly since the product’s inception (or at least since Power BI 2.0). And, for the last six months, Microsoft has been talking about the evolution of Dataflows from what was formerly Common Data Service for Analytics, along with the envisioned features of Datapools. (James Serra’s blog is worth checking out for more on this.) Also see what ZAP provides for your Tableau Data Warehouse setup.
In November, Microsoft published a good deal of introductory information about the Dataflows technology. They detailed some of the key capabilities of dataflows and announced its availability in ‘public preview’. There are commercial implications of dataflows (it’s not free, after all), but we’ll wait for Microsoft to formalize what those implications are. (Matthew Roche’s blog on Dataflows is also well worth checking out.)
If you don’t have a Data Warehouse, I’d encourage you to spend some time reading the ZAP blog (and website) to learn more about what a Data Warehouse – and, more importantly, automated Data Management – provides to businesses that want to get fast reporting from single or multiple data sources such as ERPs, CRMs, HR and financial systems.
(You may also be interested to read "5 Reasons Why a Data Warehouse Improves Business Reporting")
There are also various capabilities which naturally will never evolve within Data Preparation tools but are needed for building Data Warehouses or Structured Data Platforms. Realistically, self-service Data Preparation is a step towards, but not the full journey to, maturity in the full realm of Self-Service Data Exploration and Presentation. Tableau, Microsoft and many others have unlocked the ‘first stage’ of Self-Service in a way where organizations achieve real value. Next, through the use of Self-Service Data Preparation and Enhancement, the work of individuals can be made stronger, but that is not the greatest strength of these tools.
Data Prep and Dataflows give the traditional non-IT user a means of blending data for unique investigation – possibly “hunch following” – with the standard data (data warehouse) of the organization.
In this way, data preparation is well suited to ensuring that the standard metrics of the organization are preserved, while the ability to blend data from new processes (temporarily, at first) is key to either understanding more about what is happening or supporting assertions about what might happen.
One caveat: some data delivered via data prep activities may become useful, provided there is the required value and focus behind it. In fact, discoveries made with additionally prepared data can mean that insight evolves to cover additional current or future business processes. And the great news is that the effort applied to blend data via a data prep tool is generally less significant or involved than the work put into the data warehouse.
Data Preparation opens the door to expanding the information portrayed by a fairly static data source (like a Data Warehouse) by combining it with data from a fairly dynamic, and new, process. The work being done by everyone from Analysts to Data Scientists thus creates new and enriched data which, when inexpensively and possibly temporarily blended with the main data from a Data Warehouse, highlights the exact information that requires action and affects business strategy.
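To make the idea of blending concrete, here is a minimal sketch in plain Python rather than in Tableau Prep or Power BI dataflows. Everything in it is hypothetical: the `warehouse_sales` and `campaign_leads` tables, their fields and figures, and the `blend` helper are assumptions made for illustration only. It shows one way a temporary blend could add fields from a new process to warehouse rows while leaving the standard metrics untouched.

```python
# Hypothetical data: none of these names or figures come from a real system.

# A standard metric as it might be served from a data warehouse.
warehouse_sales = [
    {"region": "North", "revenue": 1200},
    {"region": "South", "revenue": 800},
    {"region": "West", "revenue": 950},
]

# Ad hoc data from a new process, e.g. a spreadsheet kept by an analyst.
campaign_leads = [
    {"region": "North", "leads": 40},
    {"region": "West", "leads": 10},
]


def blend(standard_rows, extra_rows, key):
    """Left-join extra_rows onto standard_rows without changing the originals.

    Fields already present in a standard row always win, so the
    organization's standard metrics are never overwritten by the blend.
    """
    extra_by_key = {row[key]: row for row in extra_rows}
    blended = []
    for row in standard_rows:
        merged = dict(row)  # copy, so the warehouse rows stay as they were
        for field, value in extra_by_key.get(row[key], {}).items():
            merged.setdefault(field, value)  # only add fields that are new
        blended.append(merged)
    return blended


blended_view = blend(warehouse_sales, campaign_leads, key="region")
for row in blended_view:
    print(row)
```

Rows with no match in the new data (here, the South region) simply keep their warehouse fields, which mirrors the point that the blend enriches the standard data rather than replacing it.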
(You might also want to check out the infographic "Business data preparation")
As I mentioned in this IT Toolbox article on building BI dashboards, visibility into the next piece of information (even if unanticipated) is key to ongoing success. In short, BI dashboards fail when they don’t provide the ability to get the next answer. Most dashboards provide an initial answer, but being unable to ask the next question and get the next answer means that people move on to other analytical resources. With Data Preparation, the ability to fill temporary gaps in insight is strengthened, which, as we’ve mentioned, creates a longer-term opportunity.
A fundamental belief of mine is that using Data Preparation tools is most valuable when enriching structured data sets, like the Data Warehouse. In doing so, you are not replacing the data the business has accumulated OR replacing the need for this data, just making sure more of the underlying story is told. In summary, let’s look at three scenarios:
If you do, and if there are members of staff or partners wanting to use tools like Tableau Prep, Power BI data prep with dataflows or other data preparation tools (and you know your users, I hope), go ahead – let them become better data storytellers, if it is within their capabilities.
If you are, and if you want your self-study of the data to be part of a larger story, remember that structured data sets from multiple systems are often born out of semantics or “rules” behind the data which might not be easily understood. In other words, adding confidence to existing data is far more valuable than trying to replace it.
If it does, and if a functional data store/data warehouse seems out of reach, I would highly encourage you to use data preparation tools to confirm the business’ data needs and consider the evolution to a resilient platform, much like a data warehouse, over time.
In all cases, data preparation can play an important part in getting (new) information to the business quickly. Having an automated data management platform like ZAP Data Hub matches this velocity in establishing a data warehouse environment where you can store data on your premises on SQL Server or in the cloud (Azure and Azure SQL) and make those incremental data discoveries a permanent part of your organization’s ongoing information.
Trey is Chief Evangelist and leader of ZAP’s Americas business with a background of 25 years working with SQL Server and Microsoft Data Platform technologies. His other roles include being an industry speaker, published author, Board Member of the PASS organization, member of Microsoft’s Advisory Councils and community enthusiast for the last two decades.