Today’s enterprise data landscape is complex. Data comes to the enterprise from everywhere at high speed, from a myriad of sources. Some data is structured, some is unstructured, some is binary, and some is semi-structured. Some data arrives in batches at a steady cadence, while other data arrives in real time. Some data arrives as streams, such as from social channels and smart IoT devices. Against this backdrop, it’s no wonder every enterprise is striving to be data-driven, tackling this data deluge as a way to glean insight, transform the business, and innovate better products and services.
To achieve this goal, data must be treated as an enterprise asset for it to have real value. Data must be trustworthy and of high quality.
The problem is that every enterprise is attempting to be data-driven as part of its digital and technology transformation initiatives, but very few are succeeding.
Let’s discuss why data transformation is difficult, starting with some common data integration challenges.
Modern Data Integration Challenges
While the data challenges discussed below apply to any industry striving to be data-driven as part of digital transformation, we will focus specifically on the clinical research industry.
- The clinical trials process is lengthy and expensive, currently averaging 8 years and $2B to launch a new drug, and high-quality data is the lifeblood of clinical research. If the power of data can be leveraged at scale and in near real time, particularly to streamline patient recruitment, retention, monitoring, and adherence using real-world data (RWD), clinical trials can be executed faster and cheaper, and life-saving drugs and therapies can come to market sooner.
- RWD is normally gathered outside the confines of randomized controlled trials (RCTs) and consists of data from patient registries, electronic health records, pharmacy and health insurance databases, claims and billing databases, social media pertaining to quality of life and non-adherence, patient-powered research networks (PPRNs) and platforms, and patient-reported outcomes (PROs) systems. RWD produces real-world evidence (RWE), which becomes actionable when powered by analytics, machine learning, and artificial intelligence (AI). RWE provides insight beyond traditional clinical trial data: the potential to link data from different sources, improve trial efficiency, identify new indications, and offer a real-world perspective on risks and benefits to inform decisions beyond traditional clinical trials.
- Data sources in clinical research also span various data types: real-time streaming data from IoT sensors such as smart watches, sleep trackers, and continuous blood pressure monitors; unstructured data from EHR and EMR systems, including doctors’ notes and pathology reports; binary data from images and scans; and structured data from a myriad of internal and external systems. Each data type requires a methodical data architecture approach to ingest, process, and convert it before it becomes RWE.
- Then there is semi-structured data from the various logs produced across the overall system architecture: system logs, application logs, network logs, and device logs, among the many sources in the clinical trials ecosystem that constantly interact to execute clinical workflows and processes.
- Not only is this collective data voluminous, it also arrives at high velocity and with uneven veracity. Real-time data from sensors and medical monitoring devices, for example, is inherently messy and full of gaps. Data from EHR systems is not clean either, and one EHR system rarely plays nicely with another. While standards such as Fast Healthcare Interoperability Resources (FHIR) promise interoperability, progress has been slow. Data is also often incomplete and hard to integrate with data from other systems because of differing storage formats, data models, and semantic meanings.
- As if this were not enough, organizations also want to build “smart data products” on top of integrated data for data monetization and operational efficiency, using advanced analytics and predictive algorithms built with AI/ML/NLP/NLU. That means data must be cleaned, enriched, integrated, governed, and of high quality, with auditability, ownership, lineage, and provenance designed in from the start, so that results from these data products can be explained and trusted.
- Finally, from a data infrastructure deployment perspective, the clinical data landscape is increasingly hybrid: some data sits on premises while some is distributed across multiple public and private clouds, creating not only a data integration nightmare but also data management and data governance challenges. Not to mention, data privacy laws such as GDPR and CCPA further compound data integration complexity, and data lineage, auditability, and provenance are in constant demand.
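To make the integration challenge above concrete, here is a minimal Python sketch of harmonizing two of those source types, a simplified FHIR-style Observation and a raw wearable sensor payload, into one common record shape. The sensor field names (`device_owner`, `ts`, `val`) and the flattened record layout are illustrative assumptions, not any standard’s actual schema; real FHIR resources are far richer than what is handled here.

```python
from datetime import datetime, timezone

# Target shape shared by all sources; missing values stay None rather than guessed.
COMMON_FIELDS = ("patient_id", "metric", "value", "unit", "recorded_at", "source")

def from_fhir_observation(resource: dict) -> dict:
    """Map a simplified FHIR Observation resource into the common record shape."""
    qty = resource.get("valueQuantity", {})
    subject = resource.get("subject", {}).get("reference", "")
    return {
        "patient_id": subject.removeprefix("Patient/") or None,
        "metric": resource.get("code", {}).get("text"),
        "value": qty.get("value"),
        "unit": qty.get("unit"),
        "recorded_at": resource.get("effectiveDateTime"),
        "source": "ehr",
    }

def from_sensor_reading(raw: dict) -> dict:
    """Map a hypothetical wearable payload; device feeds are messy, so gaps stay None."""
    ts = raw.get("ts")
    return {
        "patient_id": raw.get("device_owner"),
        "metric": raw.get("type"),
        "value": raw.get("val"),
        "unit": raw.get("unit"),  # often absent in device feeds
        "recorded_at": datetime.fromtimestamp(ts, tz=timezone.utc).isoformat() if ts else None,
        "source": "sensor",
    }
```

Because both mappers emit the same keys, downstream quality checks and RWE pipelines can treat EHR and sensor records uniformly; a real implementation would also carry provenance metadata on every record.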
Against this backdrop of enterprise challenges and the opportunities that data brings, we discuss the emergence of two data architecture patterns below: Data Mesh and Data Fabric.
Remember, these emerging patterns are not meant to work in isolation. If architected right, they work harmoniously side by side with existing data architecture patterns such as the operational data store (ODS), the enterprise data warehouse (EDW), and the enterprise data lake (EDL). For example, data fabric and data mesh can feed real-time data into the EDW, among many other use cases.
Let’s get into the basics of Data Mesh and Data Fabric.
The Data Mesh and Data Fabric design patterns offer a new way of approaching the complex enterprise data landscape described above. They support the dynamic delivery of semantically enriched data and create a resilient digital data platform foundation with self-service enablement for all enterprise data consumers.
Data democratization is the goal.
By weaving together data from internal silos and external sources, as well as from the existing ODS, EDW, and data lake, the data fabric and data mesh design patterns create a network of information that powers countless applications and empowers power users and data scientists, unleashing innovation and enabling digital transformation.
Data Mesh Architecture Pattern
Much in the same way that software engineering teams transitioned from monolithic applications to microservice architectures, the data mesh is the data platform version of microservices. The data mesh architecture pattern embraces the ubiquity of data through a domain-oriented, self-serve design. The connective tissue between these domains and their associated data assets is a universal interoperability layer that applies the same syntax and data standards, driven by master metadata management and master data management, with support from the enterprise data catalog and data governance. The data mesh design pattern is primarily composed of four components: data sources, data infrastructure, domain-oriented data pipelines, and interoperability. The critical layer is the universal interoperability layer, which reflects domain-agnostic standards as well as observability, provenance, auditability, and governance.
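The domain-oriented design and the interoperability layer can be sketched in a few lines of Python. This is a toy illustration, with invented names (`DataProduct`, `MeshCatalog` are not from any specific framework): each domain team publishes its own data product, while one shared catalog enforces a uniform addressing syntax and carries governance metadata (owner, lineage) across all domains.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProduct:
    """A domain-owned data product, published against shared standards."""
    domain: str          # e.g. "patient-recruitment"
    name: str            # e.g. "eligible-patients"
    owner: str           # the accountable domain team
    schema: tuple        # the published contract consumers rely on
    lineage: tuple = ()  # upstream sources, for provenance/auditability

class MeshCatalog:
    """Universal interoperability layer: one naming syntax, one discovery point."""
    def __init__(self):
        self._products = {}

    def publish(self, product: DataProduct) -> str:
        # Domain-agnostic addressing standard applied to every domain's products.
        address = f"{product.domain}/{product.name}"
        self._products[address] = product
        return address

    def discover(self, address: str) -> DataProduct:
        return self._products[address]
```

The point of the sketch is the division of responsibility: domains own their pipelines and products, while the catalog, not any single domain, owns the standards by which products are named, discovered, and governed.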
Data Fabric Architecture Pattern
The data fabric architecture pattern encourages a single, unified data architecture with an integrated set of technologies and services designed specifically to deliver integrated, enriched, high-quality data, at the right time, in the right form, and to the right data consumer, in support of both operational and analytical workloads. This pattern combines key data management and governance technologies, including the data catalog, data governance, data integration, data monitoring, data pipelining, and data orchestration.
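As a minimal sketch of that combination, the hypothetical `FabricPipeline` below runs ordered transformation steps while automatically recording lineage and monitoring events in one place; it stands in for the catalog, lineage, and monitoring services a real data fabric platform would provide.

```python
class FabricPipeline:
    """Run ordered steps over a record, logging lineage and monitoring centrally."""
    def __init__(self, name: str):
        self.name = name
        self._steps = []
        self.lineage = []               # stand-in for a shared lineage/governance service
        self.metrics = {"records": 0}   # stand-in for data monitoring

    def step(self, fn):
        """Register a transformation step; usable as a decorator."""
        self._steps.append(fn)
        return fn

    def run(self, record: dict) -> dict:
        for fn in self._steps:
            record = fn(record)
            # Record who touched the data, in order, without the steps opting in.
            self.lineage.append((self.name, fn.__name__))
        self.metrics["records"] += 1
        return record
```

The design choice worth noting is that lineage and monitoring are captured by the orchestration layer itself, so individual transformations stay simple and governance is applied uniformly rather than re-implemented per pipeline.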
Both data mesh and data fabric complement existing enterprise investments in operational data stores, data warehouses, and data lakes; the trick is to make the best use of all these patterns together to unlock data value within the enterprise.
Data transformation is a difficult undertaking that requires a disciplined approach from an enterprise architecture perspective. It also requires patience and modern skillsets across teams, including knowledge of cloud computing, distributed computing, and building scalable data pipelines, not to mention a deep understanding of AI and ML techniques. It needs a refresh of the “data culture”.
Data transformation is also a journey. As the business evolves, new data sources will be added, and enterprises may acquire or merge with other enterprises. The data landscape underneath is bound to stay dynamic.
This requires a robust data architecture that can scale as business scales.
Finally, instituting a “data literacy program” throughout the enterprise is a critical component of a successful data transformation. This includes implementing a data architecture review board and a data governance working council as a cross-functional business initiative.
Data transformation is not just a technology and architecture problem – people and process transformation along with culture transformation are a huge part of it.