In the history of digital evolution, one apparent benefit gained by organizations was (and still is) access to and control over their data, and lots of it. No longer did companies need to rely strictly on paid service providers that specialize in collecting, analyzing, and interpreting a company’s actions and efforts in order to understand the market’s responses, reactions, and projections for the future. Now they too could easily, and in near real time, generate, collect, and access data and analytics, and do with them as they wished.
First generation of data management
Cloud adoption and evolution of data warehouses to data lakes
While there are many benefits to data warehousing, it did not prove very conducive to rapid scaling. Corporations therefore needed a more scalable approach to collecting and processing the ever-expanding volume of data, requiring data management principles to evolve. The four Vs of big data (volume, velocity, variety, and veracity) became the design considerations for data pipelines.
Cloud adoption, MapReduce, micro-batching, stream processing, and other technical advancements pushed the boundaries of operational data management techniques and principles to form data lakes. Data lakes enabled processing and storing huge amounts of unstructured data, shifting the responsibility for orchestrating and transforming datasets closer to the consumption side of the data management organization.
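As an illustration (not drawn from any particular framework), the MapReduce pattern that underpinned early data lake processing can be sketched in a few lines of Python; the map, shuffle, and reduce phases below are simplified stand-ins for their distributed counterparts.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group values by key, as the framework would across nodes.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: aggregate the values collected for each key.
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
print(reduce_phase(shuffle_phase(map_phase(docs))))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```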
There were also significant developments on the analytical data management front:
Web analytics and their supporting algorithms matured over time, from plotting and extrapolating graphs, to training machine learning models on the data, and eventually into the makings of deep neural networks.
Monetization and valuation of data became common practice as causal relationships, and the weighted data and measures behind them, became accessible.
Even sentiment, affinity, and loyalty analysis, using metrics from social media engagement and qualitative studies, offered organizations and individuals (especially public figures and influencers) insights for measuring their standing and effectiveness with their target audience.
A discoverable and composable data management platform
An organization’s ability to read the market accurately and promptly, to anticipate rising trends or a change in direction, and to minimize the response time to, and the impact of, disruptions and disruptors differentiates the leaders from the rest of the industry. Performing well in these areas requires meaningful, actionable, and timely data at the organization’s fingertips at all times, which demands a faster and shorter data management lifecycle: the next generation in the evolution of data management, the data fabric.
A data fabric is more of a design concept, enabled by “weaving” together a variety of tools and technologies available in the market to collect, store, catalog/organize, integrate, process/transform, and infuse data for consumption, in order to continuously optimize 1) the decisions being made, and 2) the benefits of the actions taken by the business.
A data fabric operates on an abstraction layer and can be characterized by the following notions (a toy sketch of the knowledge graph and semantic inference ideas follows the list):
- An integrated layer of data and connecting processes
- Semantic inferences and active metadata discovery
- Knowledge graphs
- Composable by design
- Spans all applications and platforms of the organization
- Follows a metadata-driven approach
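To make the knowledge graph and semantic inference notions a little more concrete, consider the toy sketch below (not drawn from any particular product; the dataset names and the PII propagation rule are invented for illustration). Metadata lives as subject-predicate-object triples, and a simple rule infers new facts from existing ones:

```python
# Metadata as subject-predicate-object triples (a toy knowledge graph).
triples = {
    ("customer_raw", "contains", "pii"),
    ("customer_curated", "derived_from", "customer_raw"),
    ("churn_features", "derived_from", "customer_curated"),
    ("weblogs_raw", "contains", "clickstream"),
}

def infer_pii(triples):
    # Semantic inference: if a dataset is derived from one that contains
    # PII, infer that it contains PII as well. Iterate to a fixed point so
    # the classification propagates through chains of derivations.
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for subj, pred, obj in list(facts):
            if pred == "derived_from" and (obj, "contains", "pii") in facts:
                new_fact = (subj, "contains", "pii")
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts

inferred = infer_pii(triples)
print(sorted(s for s, p, o in inferred if p == "contains" and o == "pii"))
# ['churn_features', 'customer_curated', 'customer_raw']
```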
Data fabric architecture
Data sources
- Relational databases or non-relational databases
- Data warehouses or data lakes
- Various cloud sources
- Streams from social media handles
- Web analytics streams and various data pipelines
- Logs from applications
- Spreadsheets, word-processing documents, emails
- Unstructured data sources
- Third-party data service providers
- Data from enterprise systems, etc.
Data catalog and passive metadata associations
Semantic inference
Knowledge graph
Active metadata
Package and publish
All the knowledge collected or inferred is of little use unless it is appropriately packaged according to the needs of its consumers. The functional data preparation and delivery layer of the data fabric does exactly that: composition, aggregation, grouping, and filtering happen at this layer, through a self-service interface exposed to the consumers. Low-code or no-code platforms come in handy here, empowering non-technical users.
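As a hypothetical illustration, the kind of composition such a self-service interface might generate on a consumer’s behalf could look like the following sketch (assuming pandas; the dataset and column names are invented):

```python
import pandas as pd

# Hypothetical published dataset; in practice this would come from the
# fabric's delivery layer rather than an inline literal.
orders = pd.DataFrame({
    "region": ["EMEA", "EMEA", "APAC", "APAC", "AMER"],
    "status": ["shipped", "returned", "shipped", "shipped", "shipped"],
    "amount": [120.0, 80.0, 200.0, 50.0, 310.0],
})

# Filter, group, and aggregate: the kind of composition a low-code or
# no-code interface would assemble for a consumer.
summary = (
    orders[orders["status"] == "shipped"]
    .groupby("region", as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "shipped_revenue"})
)
print(summary)
```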
The orchestration and integration layer, on the other hand, is responsible for onboarding consumers and providing the endpoints or interfaces through which they access the information. To decision makers, the layer may provide live report dashboards, notifications over email, push alerts, and so on. For automation platforms or IoT consumers, it exposes data through web services or streams, letting those systems or devices trigger appropriate actions.
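A minimal sketch of such an endpoint, assuming Flask (the route and payload shapes are hypothetical), might look like this:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory stand-in for data served by the delivery layer.
SHIPPED_REVENUE = {"EMEA": 120.0, "APAC": 250.0, "AMER": 310.0}

@app.route("/datasets/shipped-revenue/<region>")
def shipped_revenue(region: str):
    # Return the published figure for one region, or a 404 if unknown.
    value = SHIPPED_REVENUE.get(region.upper())
    if value is None:
        return jsonify(error=f"unknown region {region!r}"), 404
    return jsonify(region=region.upper(), shipped_revenue=value)

if __name__ == "__main__":
    app.run(port=8080)
```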
How can an organization go about building a data fabric?
- Organization-wide awareness and alignment
- Subject matter experts (SMEs) with the domain knowledge
- SMEs on AI & ML to set up:
  - Augmented data catalogs
  - Semantic inferences
  - Insights and recommendation engines
- SMEs on cloud and multi-cloud architectures to build each data fabric layer
- DevSecOps and MLOps teams to roll out the workloads
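As a closing illustration of what those teams would curate, the sketch below contrasts passive metadata with active metadata in a single catalog entry (a hypothetical structure; all field names are invented):

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    # Passive metadata: static facts recorded when the dataset is onboarded.
    name: str
    owner: str
    schema: dict
    # Active metadata: operational signals refreshed as the data is used,
    # which an insights and recommendation engine can act on.
    queries_last_30d: int = 0
    freshness_hours: float = 0.0
    quality_score: float = 1.0
    tags: list = field(default_factory=list)

entry = CatalogEntry(
    name="customer_curated",
    owner="crm-team",
    schema={"customer_id": "string", "ltv": "float"},
    queries_last_30d=412,
    freshness_hours=2.5,
    quality_score=0.97,
    tags=["pii", "gold"],
)

# An insights engine might flag stale or unused datasets, for example:
if entry.freshness_hours > 24 or entry.queries_last_30d == 0:
    print(f"Review {entry.name}: possibly stale or unused")
```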