The Unified Analytics Warehouse: An Idea Whose Time Has Arrived
The Unified Analytics Warehouse (UAW) is a marriage between two topics in enterprise data management that have been around for some time: the data warehouse and unified analytics.
Unified analytics is an attempt to close the gap between data engineering and data analytics/data science, bridging the disciplines to make concepts such as accelerated AI and data-driven decision-making feasible for organizations.
The data warehouse has been around since the 1990s, and can be summarized as an attempt to take an organization’s data and organize it in a central repository so that it can be operationalized, analyzed, and used for data discovery. It sounds great in theory, but there are a few notable challenges:
It doesn’t take into account the dispersed, distributed nature of today’s enterprise data architecture. To be consolidated in a warehouse, data has to be moved from one storage platform to another, an endeavor that just isn’t practical for most organizations.
Also, it is RDBMS-centric. The data warehouse is built for the relational database model and structured data. It’s not designed for semi-structured and unstructured data running on various NoSQL models or data lakes. This alienates data teams from a huge portion of the data they need.
Data lakes ran into the mirror image of this problem. They were optimized for semi-structured, non-relational data but didn’t offer the consistency and reliability, such as ACID (atomicity, consistency, isolation and durability) compliance, that many enterprise applications need. As a result, the data warehouse concept has fallen short of meeting the needs of modern data teams.
The data team’s workspace is a mess of repositories and tools
This creates all kinds of challenges for organizations, but the pain is particularly acute for data engineers, data analysts and data scientists, who are forced to use a complicated network of platforms and applications to do their work. Imagine, for instance, that a data team is tasked with determining the effectiveness of certain branding tones and language on various customer demographics for a diversified product line. They might need to work with web stream data from customer-facing apps, email text, images, customer data from CRM and marketing automation systems and even geospatial data.
Consequently, they might have to gather data from Hadoop, Oracle Database, MongoDB, Teradata and other systems using SQL and maybe even Pig, and then analyze it in Jupyter Notebooks, Colab, Tableau and other platforms. Chances are they’ll have to wrestle with the different departments that ‘own’ the data, a problem that even the most well-maintained enterprise data catalog won’t solve on its own. The drag this puts on data-driven decision-making is hard to overstate: an answer to one simple question can take months.
Vendors have tried to address this, but their efforts have been inadequate, either failing to meet the stringent RDBMS-related needs of enterprises or lacking the flexibility and scale of the data lake. Conventional approaches require compromise. Data scientists and engineers, however, don’t have time for such compromises if they want to take their organizations to the next level (or even simply remain afloat in today’s hypercompetitive landscape). Data teams need to be able to work across all platforms and data types simultaneously without having to hunt for data or continually switch gears. This is where the Unified Analytics Warehouse comes in.
The fundamentals of the Unified Analytics Warehouse (UAW)
As John Santaferraro of EMA asserted, UAW is unified because it handles multi-structured data in a single platform, and a warehouse because it stores multi-structured data in an organized and accessible manner. He further notes that a UAW needs to:
Be storage agnostic, able to tie together multi-structured data stored on any hardware or cloud platform and to provide analytical capabilities across all storage tiers
Meet security, regulatory and compliance requirements
Offer embedded machine learning algorithms for advanced analytics
Automate the more time-consuming and rote aspects of data prep
Recommend structure, schema and data relationships
Handle the full performance spectrum in terms of data volume, concurrent users and compute-intensive requirements
Support the full range of analytics approaches, including R, Python, Tableau, Looker, Power BI, as well as Jupyter and other notebooks
Offer ready access to multi-structured data using SQL
In a nutshell, a UAW needs to unify all interactions with both data and analytics tools through a ‘single pane of glass’.
How can it be done?
Many vendors are trying to be the be-all and end-all platform, but this is problematic for one reason: data is difficult to move. Most companies now have a vast patchwork of databases, data warehouses and data lakes that are operated by different vendors, accessed through different BI tools, and run by different departments and bureaucracies. Modern data infrastructure might span Hadoop, Snowflake, S3 buckets, Oracle, Tableau, Power BI, Looker, and other tools and platforms. Trying to upend this is a mammoth undertaking that is doomed to failure.
Virtualization, however, provides an alternative approach that involves using software to create a “virtual layer” of simplification that shields users from the underlying complexities of the IT architecture. It enables you to skip time- and resource-intensive ETL (extract, transform, load), and allows data to be generated and analyzed on the fly without actually moving anything, using the tools of your choice.
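To make the idea of a virtual layer more concrete, here is a minimal toy sketch in Python. It is not how Promethium, Starburst or any other product implements virtualization; the `VirtualLayer` class, the `clicks_by_region` function and the sample data are all hypothetical names invented for illustration. An in-memory SQLite table stands in for a relational CRM system, and a JSON string stands in for semi-structured event data in a data lake. The point is only that the two sources are read where they live and combined at query time, rather than being copied into a central store first.

```python
import json
import sqlite3

# Source 1 (assumed): a relational store. In-memory SQLite stands in for a CRM database.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
crm.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "EMEA"), (2, "APAC"), (3, "EMEA")],
)

# Source 2 (assumed): semi-structured JSON events, standing in for a data lake.
EVENTS_JSON = (
    '[{"customer_id": 1, "clicks": 4},'
    ' {"customer_id": 2, "clicks": 7},'
    ' {"customer_id": 1, "clicks": 3}]'
)


class VirtualLayer:
    """Hypothetical virtual layer: maps logical table names to live sources."""

    def __init__(self):
        self._sources = {}

    def register(self, name, fetch):
        # fetch is a zero-argument callable that yields dict rows on demand.
        self._sources[name] = fetch

    def scan(self, name):
        # Rows are read from the underlying source only when a query asks for them.
        return self._sources[name]()


def crm_rows():
    # Reads the relational source in place and yields rows as dicts.
    for cid, region in crm.execute("SELECT id, region FROM customers"):
        yield {"id": cid, "region": region}


def event_rows():
    # Parses the semi-structured source and yields one dict per event.
    yield from json.loads(EVENTS_JSON)


def clicks_by_region(layer):
    # Join the two sources at read time; nothing is loaded into a shared store.
    region_of = {row["id"]: row["region"] for row in layer.scan("customers")}
    totals = {}
    for event in layer.scan("events"):
        region = region_of.get(event["customer_id"])
        if region is not None:
            totals[region] = totals.get(region, 0) + event["clicks"]
    return totals


layer = VirtualLayer()
layer.register("customers", crm_rows)
layer.register("events", event_rows)
print(clicks_by_region(layer))
```

A real virtualization engine would also push filters down to each source, handle security and scale to far larger data volumes; this sketch leaves all of that out and shows only the "query in place" idea.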
If you’d like to learn more about how Promethium has partnered with Starburst to use virtualization to build a Unified Analytics Warehouse, read on.