Government systems produce and store large amounts of data every day. Government leaders want to use this data to make decisions faster and more efficiently. Well-informed decisions are nearly impossible, however, if that data is not visible, accessible, organized, and seamlessly discoverable across the enterprise.
Conversely, superior strategic and tactical capabilities become available when data is easily discoverable, readily accessible, secure, and able to be merged and analyzed. A critical first step toward this vision is to ensure that data stores federate with each other and share information about the data they host.
———-
Federated Data Stores
Data federation is the ability to combine data from different data sources into a single view. When done correctly, data federation eliminates the need to store the same data in multiple repositories and improves data quality and accessibility. Data stores that support the ability to federate with other systems are ‘Federated Data Stores’.
Data federation does not copy data across data stores. Instead, it makes data visible without moving it from its current location. Essentially, data stores that are federated make it easy to access and find information residing in various data sources.
This federation is made possible by exchanging data-about-data (i.e., metadata). Federated data stores (FDS) make available metadata about the data they house.
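To make the idea of exchanging metadata concrete, here is a minimal Python sketch. The record fields (`dataset_id`, `title`, `provider`, `location`), the `store-a://` style locations, and the `federated_view` helper are all assumptions made for illustration; the article does not prescribe a metadata schema. The point it shows is that the combined view holds only descriptions of the data and pointers to it, while the data stays in its original store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMetadata:
    """Metadata describing a dataset; the data itself is not included."""
    dataset_id: str
    title: str
    provider: str
    location: str  # where the data lives (hypothetical URI format)


def federated_view(*store_catalogs: list[DatasetMetadata]) -> list[DatasetMetadata]:
    """Combine metadata from several stores into one list, sorted by title."""
    combined = [record for catalog in store_catalogs for record in catalog]
    return sorted(combined, key=lambda record: record.title)


# Two independent stores, each publishing metadata about the data it houses.
store_a = [
    DatasetMetadata("a-1", "Vehicle Registrations", "Agency A", "store-a://datasets/a-1"),
]
store_b = [
    DatasetMetadata("b-7", "Road Maintenance Logs", "Agency B", "store-b://datasets/b-7"),
]

view = federated_view(store_a, store_b)
for record in view:
    print(f"{record.title} ({record.provider}) -> {record.location}")
```

Nothing in `view` contains dataset contents. A consumer that wants the data follows `location` back to the store that owns it.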
Concept Diagrams
Figure 1 below depicts a conceptual federated data store use case. The architecture depicted allows interoperability and metadata sharing between independent data stores.
Participating data stores expose a common, agreed-upon API. The API makes available metadata about the data in the store. Authorized users and systems can consume this metadata for purposes such as enterprise-wide search and data analysis.
Figure 2 below depicts another use case of the federated data environment. In this instance, users of the federated data store can discover data across both data stores without needing to integrate with a new data source.
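One reading of this use case can be sketched in Python as follows. Each store is modeled as a function that returns its metadata records, and a single hypothetical `discover` function searches all registered stores. The store names, the `title` field, and the sample records are invented for illustration. The sketch suggests why consumers would not need new integration work: registering another store changes the registry, not the search code.

```python
from typing import Callable

# A store is represented here by a function returning its metadata records.
MetadataSource = Callable[[], list[dict]]


def discover(sources: dict[str, MetadataSource], keyword: str) -> list[tuple[str, str]]:
    """Return (store name, dataset title) pairs whose title contains keyword."""
    needle = keyword.lower()
    hits = []
    for store_name, fetch in sources.items():
        for record in fetch():
            if needle in record["title"].lower():
                hits.append((store_name, record["title"]))
    return hits


# Two federated stores (sample metadata is hypothetical).
sources: dict[str, MetadataSource] = {
    "store_a": lambda: [{"title": "Bridge Inspection Reports"}],
    "store_b": lambda: [
        {"title": "Bridge Traffic Counts"},
        {"title": "Permit Applications"},
    ],
}
before = discover(sources, "bridge")

# A third store joins the federation; discover() itself is unchanged.
sources["store_c"] = lambda: [{"title": "Bridge Repair Budgets"}]
after = discover(sources, "bridge")

print(before)
print(after)
```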
Federated Data Stores adhere to a common contract to share metadata information. The aforementioned API is how the contract can be implemented.
For example, a Minimum Viable Product (MVP) version of a Federated Data Store could implement the following API calls:
- Get a list of all datasets housed in the repository
- Get particulars about a dataset
- Get a list of all data providers
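The three calls listed above could be sketched in Python as an in-memory class. This is only an illustration under stated assumptions: the class name `FederatedDataStore`, the method names, the metadata fields, and the sample records are hypothetical, and a real store would likely expose these calls over a network API with access control rather than as local methods.

```python
class FederatedDataStore:
    """In-memory stand-in for a store exposing the three MVP calls."""

    def __init__(self, datasets: dict[str, dict]):
        # Maps dataset ID -> metadata about that dataset (not the data itself).
        self._datasets = datasets

    def list_datasets(self) -> list[str]:
        """Get a list of all datasets housed in the repository."""
        return sorted(self._datasets)

    def get_dataset(self, dataset_id: str) -> dict:
        """Get particulars about a dataset."""
        if dataset_id not in self._datasets:
            raise KeyError(f"unknown dataset: {dataset_id}")
        # Return a copy so callers cannot alter the store's metadata.
        return dict(self._datasets[dataset_id])

    def list_providers(self) -> list[str]:
        """Get a list of all data providers, without duplicates."""
        return sorted({meta["provider"] for meta in self._datasets.values()})


# Hypothetical sample metadata.
store = FederatedDataStore({
    "ds-001": {"title": "Facility Locations", "provider": "Agency A"},
    "ds-002": {"title": "Inspection Results", "provider": "Agency B"},
    "ds-003": {"title": "Facility Staffing", "provider": "Agency A"},
})

print(store.list_datasets())
print(store.get_dataset("ds-002"))
print(store.list_providers())
```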
Federating enterprise data stores offers multiple advantages, including reduced storage requirements, faster data discovery, ease of use, and cost savings.
A working Federated Data Stores environment will enable rapid discovery, sharing, and analysis of enterprise data. The MVP listed above is an easy way to start developing and validating the FDS concept.
Author Information
Sanjeev Chauhan is a Lead Data Architect at MITRE Corporation. He has many years of software and data architecture experience. He has successfully led product development across a range of industries.
Approved for Public Release; Distribution Unlimited. Public Release Case Number 23-0143
©2023 The MITRE Corporation. ALL RIGHTS RESERVED.
The author’s affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended to convey or imply MITRE’s concurrence with, or support for, the positions, opinions or viewpoints expressed by the author.