Data Fabric vs. Data Warehouse
Organizations use several models to manage access to their data. The newest of these, and in many cases the most useful, is the data fabric. To understand its value, it helps to see how it fundamentally differs from the approach that has long been the mainstay of data management: the data warehouse.
In many ways a data warehouse is similar to a public library. It’s an attempt to collect all the data that’s relevant to the organization in a single location, and then organize it so that it’s easy to find. In theory, this allows analysts to quickly locate the data they need to answer a business-related question. However, the warehouse model also has drawbacks, not the least of which is that it takes a lot of time and resources to maintain. For this reason, the data industry has been looking for ways to keep the benefits of a warehouse while reducing that overhead, which leads us to the data fabric.
If a data warehouse is like a library, a data fabric is more like the internet: a virtual access layer for information that might be located anywhere in the world. Let’s look at why this is the case by exploring some of the similarities and differences between the two approaches.
How data warehouses and data fabrics are similar
Data warehouses and data fabrics have a few things in common, both conceptually and functionally. For example, both of these systems:
Make data assets searchable, both by the specific dataset, and by topic
Allow users to connect to the data sources, once they’re located
Support governance, for example by identifying sensitive information
How data warehouses and data fabrics are different
While both of these systems create a way for users to access specific data assets, they also differ in fundamental ways:
The amount of accessible data. A data warehouse is a curated selection of specific data assets, while a data fabric can give you access to all the data in every source it is connected to, wherever that data lives.
The need to move the data. Data warehouse solutions require a significant amount of ETL (extract, transform, load) to copy data into the warehouse, but a data fabric uses the data wherever it lies. A data fabric is a unified virtual data layer that sits on top of all your data repositories (Hadoop, Oracle, Snowflake, Teradata, etc.) and allows you to retrieve the data as quickly as you locate it. The data itself stays in its source system; nothing has to be copied into a central store first.
Scalability. Data warehouses have natural limits on their scalability because they’re a physical collection of data. Data fabrics, on the other hand, are abstract layers that connect data sources. When you launch a new website, you don’t worry about whether the internet can handle it, because the ‘internet’ isn’t really a single thing; it’s a virtual access layer that’s been abstracted over a lot of different real things.
Central vs. department-level data management. A data warehouse requires that a central data team directly manage data assets. A data fabric, on the other hand, allows the department that owns the data asset to control the data.
One final point is that a data fabric may contain a data warehouse, but a data warehouse will never contain a data fabric. A data fabric exists at a higher level: it is an abstraction of all kinds of data sources, one of which may be a data warehouse.
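To make the "virtual access layer" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how any particular data fabric product works: the DataSource and DataFabric classes, their method names, and the sample tables are all hypothetical. The point it shows is that the fabric keeps a catalog of where data lives and forwards each request to the owning source, and that a warehouse can be registered as just one source among several.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Row = Dict[str, object]


@dataclass
class DataSource:
    """Hypothetical stand-in for a real repository (a warehouse, a CRM, etc.)."""
    name: str
    owner: str  # the department that controls this source
    tables: Dict[str, List[Row]] = field(default_factory=dict)

    def read(self, table: str) -> List[Row]:
        # A real source would run the query itself; here we just return rows.
        return list(self.tables[table])


class DataFabric:
    """Toy virtual layer: it catalogs sources but stores no data of its own."""

    def __init__(self) -> None:
        self._sources: Dict[str, DataSource] = {}

    def register(self, source: DataSource) -> None:
        self._sources[source.name] = source

    def search(self, keyword: str) -> List[str]:
        """Return sorted 'source.table' names whose table name contains keyword."""
        return sorted(
            f"{source.name}.{table}"
            for source in self._sources.values()
            for table in source.tables
            if keyword.lower() in table.lower()
        )

    def query(self, qualified_name: str) -> List[Row]:
        """Forward the request to the source that owns the table."""
        source_name, table = qualified_name.split(".", 1)
        return self._sources[source_name].read(table)


# The warehouse is simply one of the sources behind the fabric.
warehouse = DataSource(
    "warehouse", "central data team",
    {"sales_summary": [{"region": "EU", "total": 120}]},
)
crm = DataSource(
    "crm", "sales department",
    {"sales_leads": [{"lead": "Acme"}]},
)

fabric = DataFabric()
fabric.register(warehouse)
fabric.register(crm)

matches = fabric.search("sales")
rows = fabric.query("warehouse.sales_summary")
print(matches)
print(rows)
```

In this sketch, adding another repository is a single register call, and each source keeps its own owner, which mirrors the scalability and department-level management points above.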
In summary, think of a data warehouse as a collection of data assets drawn from your organization and curated in a very tightly controlled environment. That control has real benefits, and the model may work well for certain critical data assets, but it’s generally not sufficient to handle today’s skyrocketing data demands. The data fabric, on the other hand, offers the flexibility, speed and scalability that today’s organizations require.