Businesses have always made decisions based on data. But as the volume of data available has grown exponentially, a new discipline has evolved. Data scientists, according to data from LinkedIn and other sources, are in high demand and are commanding massive salaries. Yet, faced with burgeoning data volume, variety and velocity, businesses are struggling to gain better insights because their business intelligence and decision support systems are trapped in thinking that was hatched almost four decades ago.
How do you make sense of all that data to gain insights that allow you to make better and faster decisions? We keep hearing about businesses needing to be more agile, but we still duplicate lots of data and perform complex transformations to make it fit a specific data model so that business users can conduct analysis.
But that analysis is limited by what data is extracted. That decision is often made by software engineers, based on what is most easily accessible and on their own understanding of the business problem. The data has also been transformed using programmatic logic that is opaque to the people who need to know what the data means.
For decades, this model was the science of data warehousing. Business people would define the questions they wanted to ask and data engineers would determine the best sources of data to answer those questions. Data from those sources would be extracted, transformed and loaded into a new database. While today we talk about ‘data lakes’, the principle is the same. Someone decides what data is needed, extracts it to a central location and lets users take advantage of tools to ask questions of the data.
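To make the extract, transform and load pattern concrete, here is a minimal sketch in Python. The source records, field names and the `dim_customer` table are hypothetical and exist only for illustration; a real warehouse pipeline would be far larger. Note how the `notes` field never reaches the central database, simply because the extract step was not written to carry it.

```python
import sqlite3

# Hypothetical CRM records; the field names are illustrative assumptions.
CRM_ROWS = [
    {"customer_id": 1, "name": "Acme", "region": "apac", "notes": "prefers email"},
    {"customer_id": 2, "name": "Globex", "region": "emea", "notes": "renewal due"},
]


def extract(rows):
    """Extract: pull only the fields someone decided were needed."""
    return [(r["customer_id"], r["name"], r["region"]) for r in rows]


def transform(records):
    """Transform: reshape values to fit the warehouse's data model."""
    return [(cid, name.upper(), region.upper()) for cid, name, region in records]


def load(conn, records):
    """Load: copy the reshaped records into a separate, central database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id INTEGER, name TEXT, region TEXT)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(rows):
    """Run the three steps against an in-memory database and return it."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(rows)))
    return conn
```

Anyone querying `dim_customer` sees only what `extract` and `transform` allowed through, which is the constraint the rest of this article is concerned with.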
There are many problems with this process. Deciding what data sources to use and designing the extraction, transformation and loading (ETL) processes is time-consuming and labour-intensive. In many cases, by the time ETL is in place, the questions the business wants answered have changed or the opportunity that needed the answers has passed.
The business logic that underpins these decision support systems is trapped within the program logic of the ETL system. It is invisible to the people who most need the system, and it is costly to change. It also makes companies dependent on a limited pool of people who understand the ETL logic and, potentially, locks them into a single vendor because transferring that knowledge to another system is too hard.
And even if you had access to the best data scientists in the world, the algorithms they use to conduct analyses would be limited by what the ETL process gives them.
It’s little wonder that Gartner reported that about half of data warehousing projects fail to pass user acceptance. And that success rate may even have been optimistic considering the impact of cloud and hybrid systems. The data businesses need to consider when making decisions is now spread far and wide.
There is a better way. What if we could make everyone in the business a data scientist by letting them access whatever data they want, wherever it is? Instead of being limited by the cost and availability of data scientists, you could empower every person in the business to act as one.
What’s needed is a way to point the great new query tools we see, which give users the power to ask almost any question, at data wherever it is. For example, say your business maintains an on-prem general ledger system, Salesforce as its CRM, a point-of-sale system and Zendesk for customer support. Throw in social media channels such as Twitter and the view of a customer becomes quite complex.
In a traditional data warehousing project, someone would decide which data would be used for customer analytics. That view might neglect the general ledger because the CRM is seen as more important. Or it might ignore social media because someone has decided it does not matter.
Business moves faster than ever before, with more data from more sources, and the old ways, which are still dominant today, simply don’t cut it. But a virtual data fabric can be used with familiar business analysis tools and allows business users to draw on data from any source they choose, not only the sources that were easiest or cheapest for the traditional ETL approach.
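As a rough illustration of the difference, the sketch below builds a customer view at question time from whichever sources the user names, rather than from a pre-built copy. It is a toy model under stated assumptions, not any vendor's API: the in-memory lists stand in for the general ledger, CRM and support systems, and `make_source`, `SOURCES` and `customer_view` are hypothetical names invented for this example.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Source = Callable[[int], List[Row]]

# Hypothetical stand-ins for live systems; a real fabric would query them in place.
LEDGER = [{"customer_id": 1, "invoice_total": 1200.0}]
CRM = [{"customer_id": 1, "account_owner": "Priya"}]
SUPPORT = [{"customer_id": 1, "open_tickets": 2}]


def make_source(rows: List[Row]) -> Source:
    """Wrap a system so it answers questions on demand instead of being copied."""
    def query(customer_id: int) -> List[Row]:
        return [r for r in rows if r["customer_id"] == customer_id]
    return query


SOURCES: Dict[str, Source] = {
    "ledger": make_source(LEDGER),
    "crm": make_source(CRM),
    "support": make_source(SUPPORT),
}


def customer_view(customer_id: int, source_names: List[str]) -> Dict[str, List[Row]]:
    """Assemble a view from the sources the user chose, at the time of asking."""
    return {name: SOURCES[name](customer_id) for name in source_names}
```

Calling `customer_view(1, ["crm", "support"])` and then `customer_view(1, ["ledger"])` answers two different questions without anyone redesigning a pipeline in between; the choice of sources sits with the person asking.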
A virtual data fabric allows users to apply their business knowledge and become data scientists, unencumbered by the limitations of the old ETL approach. The shortage of data scientists becomes a non-issue.
This is where the world of data analytics is moving. It takes away the limitations of the ETL approach, replacing it with one that allows business users to decide what data they need to answer a specific question. It takes away the time and cost of ETL, empowering people to use data to make decisions in real time without waiting for a data engineer to decide what data they can use.