For Enterprise Data Architects, the most creative work—the epitome of professional satisfaction—is developing elegant designs for the target data layer and its integrated components, perfectly aligned with business needs. For an organization, engaging in major transformation of the data layer, while very expensive and sustained over multiple years, is important enough and flashy enough (new technologies, better data, sharper analytics, lower costs, etc.) to grab and hold executive attention.
BUT—the elephant in the room, casting its huge shadow over the bright, visionary efforts to craft and implement an optimized data layer—is the lack of metadata about existing legacy data stores and repositories, and, most prominently, the lack of metadata about data movement among them—point-to-point interfaces, services, snapshots, database links, bulk data transfers, etc. For the sake of brevity, we’ll refer to all of them as “interfaces.”
If illustrated in one floor-to-ceiling diagram with every line drawn, the ‘God’s-eye view’ of incoming and outgoing interfaces throughout the legacy environment would chill the soul of the most impassioned data warrior. It is not uncommon for organizations to relegate the overall legacy environment to intractable status, which could also be expressed as ‘we’ll limp along until we can build our way out of it.’ This posture is, in effect, denial; it blunts the urge to understand this major challenge by avoiding any real analysis of its current and future impacts. It does, however, spare the CIO from having to apologize to the lines of business for past decades of ad-hoc approaches, which led to an under-documented and under-controlled legacy environment.
Let’s expose the consequences of ‘limping along’ and inject a dose of practical reality. ‘You’ in the statements below refers to the organization:
- Data is persistent. As long as you remain in the business you’re in, you will need the data you create, manage, and use. Operational business needs are perpetual. You can’t stop the train.
- Data is scattered. Interfaces—data in motion—are largely unmapped. You can’t recreate the past, but you can’t ignore it either. This state of affairs causes persistent challenges:
  - You spend an enormous amount of money (!) on integration testing annually, involving hundreds, perhaps thousands of interfaces. These costs are buried in the operations and maintenance budgets for applications; if you were able to aggregate them, the total would be staggering, and they are a major contributor to the undesirable annual increases in steady-state IT spend.
  - You can’t easily trace data lineage, even for business-critical data—impacting business decisions, accurate analytics, and regulatory compliance.
  - Because you can’t say with certainty what the data is and where it comes from, you can’t easily designate authoritative data sources. When attempting to select authoritative data sources, many organizations realize there are no good options (clean, accurate, timely, and approved by suppliers and consumers) and have to settle for choosing the least bad option.
- Transformation of the data layer is an imperative and must continue. However, your projects and programs are dragging a heavy anchor:
  - You can’t easily develop standard data representations. You’ve called entities and attributes by multiple terms over the years, with the resulting complexity codified in numerous legacy databases. The effort and collaborative agreements required to streamline are time-consuming and expensive. You only undertake them by necessity – for example, when the ERP system must be implemented by the end of the year.
  - In the process of insulating the legacy data layer to lessen the impact on operational applications as target architecture components are brought online, you’ve created data provisioning services. However, unless you’ve developed a sound To-Be baseline to conform to, for example, a solid Business Glossary and approved Business or Enterprise Data Models, you end up creating a second complex layer of interfaces on top of the first, adding to your annual costs.
  - You can’t easily scope or precisely define your architectural transition plan. Because existing interfaces aren’t mapped, you have to allocate time and dollars to every individual project to evaluate and plan for interface consolidation and replacement (and hope that your schedule and budget estimates were correct).
  - Your transformation projects are inhibited by the effort to clean up the pieces of your legacy world. Because interfaces aren’t specified and mapped, or the initial metadata is not kept up to date, the project team’s delivery schedule is often impacted by necessary but unplanned work. This can cause complaints from the business lines and senior executives as the unexpected costs and delays mount up.
I could add to this litany of woes, which are constant underlying issues in virtually every organization, but I’ll close with a real-life example. A large Federal organization was engaged in a massive Service-Oriented Architecture transformation. Once hardware, infrastructure, and middleware services were successfully implemented, the next phase involved design and implementation of 75 business and data provisioning services. For one key data provisioning service, master data about contracts, the systems integrator analyzed many legacy applications to determine the authoritative data source and found:
- There was no single source that contained the desired data set – no repository.
- Of the many sources then evaluated to ‘piece together’ the required data elements, the tables and columns were all represented by different names.
- There was no reliable metadata available for any of the data sources – no interface documentation or integration data models.
The project team had to halt work and engage in an unplanned, effort-intensive manual analysis of hundreds of paper contracts, spending significant time with business experts to validate the intended data set. They then designed and developed a complex script to pull data from multiple sources, cleanse and integrate it in a staging area, and finally render it into an XML document for orchestration.
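To make that last step concrete, here is a minimal sketch, in Python, of the kind of consolidation script the team had to build: pull records from multiple legacy sources, map divergent column names to standard ones, cleanse and integrate them in a staging structure, and render the result as an XML document for orchestration. The source names, field mappings, and sample rows are hypothetical stand-ins, not the project's actual code.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Each legacy source names the same business concepts differently.
# These mappings, and the sample rows below, are hypothetical stand-ins
# for what the integrator had to reverse-engineer by hand.
FIELD_MAPPINGS = {
    "finance_db":    {"CNTRCT_NO": "contract_number", "VND_NM": "vendor_name"},
    "contracts_app": {"ContractId": "contract_number", "Supplier": "vendor_name"},
}

# Stand-in extracts; in practice these rows would come from database queries
# or bulk file transfers out of each legacy application.
SOURCE_EXTRACTS = {
    "finance_db":    [{"CNTRCT_NO": "C-1001 ", "VND_NM": "acme corp"}],
    "contracts_app": [{"ContractId": "C-1002", "Supplier": "Globex, Inc."}],
}


def cleanse(value: str) -> str:
    """Apply simple cleansing rules: collapse whitespace and standardize case."""
    return " ".join(str(value).split()).title()


def stage(extracts: dict) -> list[dict]:
    """Map each source's columns to standard names and cleanse the values."""
    staged = []
    for source, rows in extracts.items():
        mapping = FIELD_MAPPINGS[source]
        for row in rows:
            record = {std: cleanse(row[src]) for src, std in mapping.items()}
            record["source_system"] = source  # retain lineage for later validation
            staged.append(record)
    return staged


def render_xml(records: list[dict]) -> str:
    """Render the integrated records as a single XML document for orchestration."""
    root = ET.Element("Contracts")
    for record in records:
        item = ET.SubElement(root, "Contract")
        for name, value in record.items():
            ET.SubElement(item, name).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_xml(stage(SOURCE_EXTRACTS)))
```

Even in this toy form, the sketch shows why the work is expensive when metadata is missing: every mapping, cleansing rule, and lineage notation had to be discovered and validated manually before a line of the real script could be written.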
The problem of interface management falls into the category of vitally important – but not urgent. So how does an organization chart a path to bring order to this chaos, and how would that empower the Enterprise Data Architect? Let’s start with the diagram below, illustrating a reasonable path based on where most organizations find themselves.
Head in the Sand – not wanting to accept that the organizational burden of legacy interfaces is as serious as it is—addressed above.
We Get It—Where to Start? – Organizations typically adopt a just-in-time approach. For example, if the organization wants to implement a Customer Master Data hub, it needs to identify the existing data stores and interfaces that are currently capturing and managing customer data, within the context of the approved scope. The operations and maintenance project teams responsible for the data sources need to provide information, e.g., an interface control specification or similar document. The data architect team, in close collaboration with the owners and stewards of the relevant data stores, defines the current sources and selects the most accurate and timely data from each. The team can then finalize the design of the MDM data store and corresponding supplier-to-hub services.
The approach described works for this example. The customer data becomes organized, metadata is generated for future use, and the effort also benefits other data management activities: expanding the Business Glossary, enhancing metadata and quality rules, informing consuming components of the target architecture, enhancing the enterprise data model, and so on. However, when the interfaces must be documented for the first time within the project, the design frequently takes longer and requires more time and effort from the business customers. Rinse and repeat—the same additional effort, the same higher costs for every similar project.
Natural Events and a Strategy – Leveraging natural events, as described above, is the usual approach. This can yield increased benefits by intentionally introducing proactive planning, which extends the reuse potential of these ad-hoc efforts.
To gain efficiency, save costs, and bring the data layer into order more quickly, the organization should develop a strategy for managing interfaces and set phased implementation milestones over time. In creating this strategy, a few key points are useful:
- Outline the extent of the complexity, e.g., do you know how many interfaces you are currently maintaining? Does each major data store owner know the incoming and outgoing interfaces, where the data comes from and where it goes? How much of this information is captured and updated in project documentation and the metadata repository?
- Align the strategy with the transition plan to the target architecture to identify high-yield natural events. For example, the organization has committed to full identification and specification of privacy data by Year 2, and redesign of the EDW is planned for Year 3; what is the list of interfaces that need to be understood and documented by then?
- Determine how the organization can improve its current interface management processes. This is the pre-work that both slows the growth of complexity and enables step-by-step achievement of a rational, well-orchestrated transition to the target state.
The strategy should address, at a minimum:
- Definition of organizational priorities. These will still largely be dictated by natural events, e.g., redesign of an EDW is a significant opportunity for progress.
- A policy to require, over time, interface specifications for all business application data stores. The timing can be brokered with each major data store owner, e.g., required for the next major release. If an application is expected to have a useful life of several years, this should be required. Implementing this mandate adds work for the operations and maintenance teams, which should be accounted for in project plans and budgets.
- Determination of where this metadata will be stored. For instance, if the first destination is the project library, can links be provided to a central location? Should the organization create a data asset repository? How will it be accessed by business experts and project teams?
- A sequence plan, with a mapping to upcoming development projects and maintenance projects for key data stores. The bigger the planned effort, the earlier the work should start in the project schedule.
- Key tasks to accelerate and standardize the organization’s management of its interfaces:
  - Development of a standard template for interface specification—including table and column names, corresponding business terms, definitions, allowed values, data types, lengths, and, if used, XML representations (a minimal sketch of such a template follows this list). The content is more important than the method—what the source data is, and how it is transformed or rendered (e.g., business rules for aggregations within a view).
  - Development and implementation of a standard interface management process with a process model, roles, and responsibilities—e.g., once documented and baselined, what are the criteria for updating interface metadata?
  - Development of a data services registry, populated by the owners/custodians of supplying data sources, with a description and drill-down—clearly identifying the data provided, its timeliness, etc.
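To make the specification template concrete, here is a minimal sketch of how it might be captured in machine-readable form, using Python dataclasses. The field names and the example entry are illustrative assumptions, not a prescribed organizational standard; the same structure could equally well live in a metadata repository or a registry table.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ColumnSpec:
    """One column (or XML element) carried by the interface."""
    table_name: str                           # physical table or view name
    column_name: str                          # physical column name
    business_term: str                        # corresponding Business Glossary term
    definition: str                           # agreed business definition
    data_type: str                            # e.g., VARCHAR, NUMBER, DATE
    length: Optional[int] = None
    allowed_values: list[str] = field(default_factory=list)
    xml_representation: Optional[str] = None  # element/attribute path, if used


@dataclass
class InterfaceSpec:
    """One entry in the interface specification / data services registry."""
    interface_id: str                         # registry key
    source_system: str                        # supplying data store or application
    target_system: str                        # consuming data store or application
    transfer_method: str                      # e.g., service, snapshot, bulk transfer
    refresh_frequency: str                    # timeliness, e.g., "daily", "real-time"
    transformation_rules: list[str] = field(default_factory=list)  # business rules applied
    columns: list[ColumnSpec] = field(default_factory=list)


# Example entry a data store owner might contribute; all values are illustrative.
example = InterfaceSpec(
    interface_id="IFC-0042",
    source_system="ContractsApp",
    target_system="EDW_Staging",
    transfer_method="nightly bulk transfer",
    refresh_frequency="daily",
    transformation_rules=["aggregate line items to contract level"],
    columns=[
        ColumnSpec(
            table_name="CONTRACT",
            column_name="CNTRCT_NO",
            business_term="Contract Number",
            definition="Unique identifier assigned to an awarded contract.",
            data_type="VARCHAR",
            length=20,
        )
    ],
)
```

Whatever form the organization chooses, the point is that each supplying data store owner fills in the same fields, so the registry can answer, without a project-by-project investigation, what data an interface carries, where it originates, and how it is transformed.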
Step by Step Gains in Control – Imagine that the organization has accomplished all of the work described above in two years. It has developed an interface management strategy and a policy, defined processes, established roles, captured and stored the resulting metadata in a standard manner, and mapped the long-term effort to upcoming target projects in a sequence plan with milestones. Project teams are delivering their specifications by adding tasks to planned major releases, when revisiting the interfaces requires less effort.
Now Year 3 is here, and our example project, the EDW Redesign, is being launched. In addition to new ETL scripts for bulk movement, it requires consumption of existing data services and of data from many point-to-point operational applications. The project manager knows in advance which interfaces have not yet been specified, and where the project requires tasking to update the metadata. The business owner has increased confidence in the design level-of-effort estimates.
The design team has worked with the business owner(s) to define the scope, and is evaluating well-structured, up-to-date specifications, provided by data store project teams, for precise information about the data—where it originates and exactly what is contained in the interface, including business rules applied for calculations or aggregations. They can easily determine where redundancies reside, and provide the data governance representatives with source checklists to speed up their assessments of data quality, precedence, timeliness, etc. The data architect can quickly learn which data elements need to be analyzed for possible consolidation or decomposition; this frees up time and creative space for the creation of optimal structures.
With some attention, proactive preparation, and manageable chunks of effort, an organization can streamline design efforts, provide higher-quality data, accelerate the realization of the target architecture, and lay a sound foundation for governance approvals, minimizing the burden on the business as it develops better and faster solutions—a big step in attaining Data Agility.