The Data Fabric: A Definition
The ‘data fabric’ is a concept that is rapidly gaining traction in enterprise data management, and it can dramatically increase the value your organization derives from its data.
A data fabric provides a virtual layer that connects the entirety of an organization’s data, along with all of the processes and platforms that might be connected to it. It continually uses machine learning (ML) and artificial intelligence (AI) to make sense of and apply structure to data from various sources.
Just as the ‘Web’ refers not to a single software platform or piece of hardware but rather to a layer of connectivity, so too the data ‘fabric’ refers to the connecting of many pieces of data-related software and hardware into a unified system. A data fabric integrates data that is connected to it through all standard data delivery methods, including streaming, ETL, replication, messaging, virtualization or microservices, and connects repositories that might range from relational and NoSQL databases to data warehouses, data marts and even data lakes built on Hadoop.
A Contextual Layer for All an Organization’s Data
Once integrated into the system through one of these methods, a data fabric overlays the data sources with context. It does this by leveraging ML and AI algorithms to canvass and understand the ‘metadata’ -- the data that describes a dataset. For example, it might look at the various columns, data types and constraints placed on a data table in order to predict how that table may relate to data located in other repositories.
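The idea of inferring relationships from metadata alone can be sketched in a few lines. This is a minimal, hypothetical illustration -- the table names, column names and scoring rule are all assumptions, not the algorithm any particular fabric product uses:

```python
# Sketch: guess candidate join keys between two tables using only their
# metadata (column names and data types). All names here are hypothetical.

def candidate_joins(table_a, table_b):
    """Score column pairs that could link two tables, using only metadata."""
    matches = []
    for col_a, type_a in table_a.items():
        for col_b, type_b in table_b.items():
            if type_a != type_b:
                continue  # incompatible types cannot form a join key
            # Identical names score highest; partial name overlap scores lower.
            score = 1.0 if col_a == col_b else (
                0.5 if col_a in col_b or col_b in col_a else 0.0)
            if score > 0:
                matches.append((col_a, col_b, score))
    return sorted(matches, key=lambda m: -m[2])

# Metadata for two hypothetical repositories owned by different departments.
crm = {"customer_id": "int", "email": "str", "region": "str"}
supply_chain = {"order_id": "int", "customer_id": "int", "sku": "str"}

for col_a, col_b, score in candidate_joins(crm, supply_chain):
    print(f"{col_a} <-> {col_b} (confidence {score})")
```

A real fabric would combine many more signals (value distributions, constraints, usage patterns) and refine the scores with ML over time, but the principle -- reasoning about metadata rather than the data itself -- is the same.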
The data fabric continually learns from existing data and from new data added to your system. As time goes on, it makes better predictions about potential relationships and points of integration between data owned by different departments, such as data held in a CRM system and data held in supply chain software.
These relationships are made accessible to both technical and non-technical users through ‘knowledge graphs’, which visually map the data sources and their relationships. These graphs are presented as user-friendly charts, similar to flowcharts, overlaid with descriptive information that lets users easily identify the data relationships that will answer their analytics questions. In this way, the system not only identifies relationships between data residing in different systems -- such as CRM and supply chain software -- but also provides a user-friendly visual means to interpret those relationships. A user can quickly see what information is held in their own department and in others, and how it all connects together.
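Under the chart, a knowledge graph is simply datasets as nodes and inferred relationships as labeled edges. The following toy sketch, with hypothetical node and edge names, shows how a user (or tool) could walk such a graph to discover every dataset connected to their own:

```python
# Sketch: a knowledge graph over data sources, assuming the fabric has
# already inferred the relationships. All node/edge names are hypothetical.
from collections import defaultdict

graph = defaultdict(list)

def relate(source, target, label):
    """Record a relationship; it is navigable from either side."""
    graph[source].append((target, label))
    graph[target].append((source, label))

relate("CRM.customers", "SupplyChain.orders", "customer_id")
relate("SupplyChain.orders", "Warehouse.inventory", "sku")

def connected(start):
    """List every dataset reachable from `start` by following relationships."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbor, _label in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen

print(connected("CRM.customers"))
```

A production fabric renders this same structure as an interactive, annotated chart rather than a printed set, but the navigation idea is identical: start from the data you know and follow labeled edges into other departments' data.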
Automating Time-consuming Processes to Speed Analytics
As AI and ML algorithms continue to learn from your data sources, they can move beyond merely identifying relationships to automating time-consuming processes that users typically perform manually. For example, the fabric may catalog previous queries -- data analytics questions asked by users -- so that those processes don’t have to be re-engineered every time the same or even a similar question is asked. This not only makes data-derived insight available to non-analysts, but also frees up analysts and data scientists to deal with more complex problems.
Benefits of a Data Fabric
As you can imagine, a data fabric provides several unique benefits:
A unified data environment. This makes it possible to grant any authorized user access to all of your organization’s data assets, regardless of where those assets might reside. If your New Jersey-based procurement team needs to access production data housed in a datacenter located in Nepal, they can do that just as easily as they could if the data resided in Hoboken.
A much faster data analytics lifecycle. As analysts and data scientists no longer have to spend inordinate amounts of time hunting down datasets, the process of answering questions is sped up significantly.
Compliance and risk mitigation. A data fabric provides complete visibility into all of your data, so if a regulator requests a particular dataset, you can locate it easily.
Less need for ETL (extract, transform, load). The abstracted ‘fabric’ means that you don’t have to move data in order to access it or integrate it with the system. This means significant reductions in the cost and risk associated with moving or copying data into warehouses and other repositories.
More easily handle the Volume, Velocity and Variety of Big Data. Because you don’t have to move data in order to integrate it into your architecture, and because a data fabric works with all forms of data ingestion, including heavy-workload processes like streaming, you can scale your data environment rapidly. You simply connect the new data source to the fabric, and it becomes part of your overall system.
Want to learn more? Comparisons can be helpful: Data Fabric vs. Data Virtualization.