Data Speaks for Itself: Fly-By-Wire Data Management

For several years, I have been an advocate for the concept of positive data control (PDC). It is the idea of placing a layer, or a firewall if you will, around your data assets and requiring that every request to manage (access, move, or transform) the data inside be mediated by this control layer. To help explain this concept, I have tried several analogies in recent years. In one of the early articles in this column, “Waldo Where are You?”, I compared PDC to the Waldo machine developed to handle radioactive materials in the 1940s. However, that example was obviously directed at the wrong generation. Another was a comparison to the enterprise network firewall, an intelligent layer that controls which network connections can be made from both inside and outside the network. Yet another was to think of PDC as an extension of self-service analytics, where the control layer (the user interface) allows users to access only specially prepared data products using a limited, but configurable, set of operations. In the case of PDC, however, the interface would control access to all enterprise data assets and tools.

At the inaugural South African Chief Data Officer and Information Quality Symposium, held at the University of Pretoria last year, I gave a presentation that included the PDC concept and some recent PDC research. I have been excited that new generative AI models have the potential to make PDC a reality in the near future. When I returned to speak at the symposium this year, I was approached by Howard Diesel, the President of DAMA in South Africa. Howard, who had heard my talk the previous year, had been thinking about PDC and suggested a much better analogy.

It goes like this. Airline pilots use the expression “fly-by-wire” to describe a system that replaces mechanical control linkages with electronic signals processed by computers. In smaller aircraft, the pilot’s yoke (control column) and foot pedals are connected directly by cables to the control surfaces, such as the ailerons, elevators, and rudder, and move them mechanically. In modern jet aircraft, this is no longer the case. Instead, the pilot’s (and co-pilot’s) yoke and foot pedals only generate digital signals that go to a flight control computer, which, in turn, sends digital signals to mechanical actuators that cause the surfaces to move.

Fly-by-wire has the obvious advantage of relieving the pilot of having to provide the physical force needed to move these large surfaces at high speed. Perhaps more importantly, the flight control computer can be programmed to prevent the pilot from making maneuvers that could endanger the aircraft, such as stalling or overloading the airframe. In other words, the flight control computer provides a level of automated governance over flight operations.

I think you can see where this is going. Just as aircraft have evolved, modern data management systems have become larger and more complex. While data governance is recognized as a necessary control over data management decisions, compliance with it still depends largely on human goodwill and oversight. Like the fly-by-wire flight control computer, a PDC control layer inserts automation between the data and the system operators and data users who act on it, ensuring that the operations they request are performed correctly, efficiently, and in compliance with governance policies. In the same way that a pilot using fly-by-wire only requests a control surface movement from the flight control computer, a PDC system user only requests a data operation from the PDC control layer. In both cases, the request is automatically reviewed and, if deemed compliant with governance policies, the control layer performs the action on behalf of the user.
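
To make the request, review, and perform cycle a little more concrete, here is a minimal Python sketch. It is illustrative only and rests on stated assumptions: the names DataRequest, PDCControlLayer, no_export_of_pii, only_stewards_delete, and toy_executor are hypothetical, and the hand-written rule functions stand in for the LLM-based interpretation of natural language policies that a real PDC layer would rely on. The point it shows is structural: the user never touches the data directly, only submits a request that the layer checks and then carries out (or refuses) on the user's behalf.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class DataRequest:
    """A user's request to the control layer (hypothetical structure)."""
    user: str
    role: str
    operation: str  # e.g. "read", "export", "delete"
    dataset: str


# Assumption: a policy returns a reason string if the request violates it,
# otherwise None. A real PDC layer would interpret natural language policy.
Policy = Callable[[DataRequest], Optional[str]]


def no_export_of_pii(req: DataRequest) -> Optional[str]:
    # Hypothetical rule: datasets prefixed "pii." may not leave the layer.
    if req.operation == "export" and req.dataset.startswith("pii."):
        return "PII datasets may not be exported"
    return None


def only_stewards_delete(req: DataRequest) -> Optional[str]:
    # Hypothetical rule: only the "steward" role may delete data.
    if req.operation == "delete" and req.role != "steward":
        return "only data stewards may delete data"
    return None


class PDCControlLayer:
    """Reviews each request against every policy, then acts for the user."""

    def __init__(self, policies: List[Policy], executor: Callable[[DataRequest], str]):
        self.policies = list(policies)
        self.executor = executor  # performs the operation on the user's behalf

    def submit(self, req: DataRequest) -> dict:
        # Collect the reason from every policy the request violates.
        violations = [msg for p in self.policies if (msg := p(req)) is not None]
        if violations:
            return {"status": "denied", "reasons": violations}
        return {"status": "performed", "result": self.executor(req)}


def toy_executor(req: DataRequest) -> str:
    # Stand-in for the code that would actually access, move, or transform data.
    return f"{req.operation} on {req.dataset} done"


layer = PDCControlLayer([no_export_of_pii, only_stewards_delete], toy_executor)
print(layer.submit(DataRequest("ana", "analyst", "read", "sales.orders")))
# {'status': 'performed', 'result': 'read on sales.orders done'}
print(layer.submit(DataRequest("ana", "analyst", "export", "pii.customers")))
# {'status': 'denied', 'reasons': ['PII datasets may not be exported']}
```

Reporting every violated policy, rather than stopping at the first, is a design choice in this sketch; it gives the requester (and the audit trail) the full picture of why a request was refused.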

While fly-by-wire is a good analogy for explaining PDC, it is not perfect. The analog movements of the pilot’s yoke and foot pedals can easily be converted into digital signals for the actuators. In the case of PDC, a user’s natural language request must be understood semantically and logically, then converted into one or more computer commands, or perhaps code, needed to carry out the requested action. Governance differs as well. The unsafe flight configurations of an aircraft are well understood and can be pre-programmed as rules into the flight control computer. In the case of PDC, not all user requests can be anticipated. Instead, they must be interpreted in real time and then logically compared against the governance policy for approval.

Large language models (LLMs) have opened the door to a practical implementation of PDC through their ability both to understand governance policies written in natural language and to carry out requests expressed in natural language. Ensuring that data processing requests comply with data governance policies, and then carrying out the valid ones, are two of the three major components of PDC. The third component is recording and reporting. Just as pilot actions are captured by the cockpit flight recorders, every PDC user request must pass through the PDC control layer, where it is recorded. This makes all user requests and resulting actions visible, so there can be no unrecorded actions, whether unintentional or intentional. It would be a chief data officer’s dream to have a real-time dashboard of all data processing activities underway at any moment, along with a complete archive of data processing requests that could be analyzed to find idle systems, unused data, processing bottlenecks, and other issues whose remediation would lead to more efficient and effective data management.
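
The recording and reporting component can also be sketched in a few lines of Python. Again, this is a hypothetical illustration rather than a description of any existing PDC system: AuditRecord, AuditLedger, and its methods are invented names, and the "catalog" of datasets is assumed to be a simple list. Because every request passes through the control layer, a log like this one is complete by construction, and simple queries over it already answer some of the dashboard questions above, such as which datasets are in demand, which requests were refused, and which data nobody uses.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class AuditRecord:
    """One entry per request seen by the control layer (hypothetical schema)."""
    timestamp: datetime
    user: str
    operation: str
    dataset: str
    outcome: str  # "performed" or "denied"


@dataclass
class AuditLedger:
    """Log of every request; this sketch only appends and never edits entries."""
    records: List[AuditRecord] = field(default_factory=list)

    def record(self, user: str, operation: str, dataset: str, outcome: str,
               when: Optional[datetime] = None) -> AuditRecord:
        rec = AuditRecord(when or datetime.now(timezone.utc),
                          user, operation, dataset, outcome)
        self.records.append(rec)
        return rec

    def requests_by_dataset(self) -> Counter:
        # Demand per dataset: a simple input for a real-time dashboard.
        return Counter(r.dataset for r in self.records)

    def denied(self) -> List[AuditRecord]:
        # Requests the layer refused, for governance review.
        return [r for r in self.records if r.outcome == "denied"]

    def unused_datasets(self, catalog: Iterable[str]) -> List[str]:
        # Datasets in the (assumed) catalog that no recorded request touched.
        touched = {r.dataset for r in self.records}
        return sorted(set(catalog) - touched)


ledger = AuditLedger()
ledger.record("ana", "read", "sales.orders", "performed")
ledger.record("ana", "export", "pii.customers", "denied")
ledger.record("raj", "read", "sales.orders", "performed")

print(ledger.requests_by_dataset())       # Counter({'sales.orders': 2, 'pii.customers': 1})
print([r.user for r in ledger.denied()])  # ['ana']
print(ledger.unused_datasets(["sales.orders", "pii.customers", "hr.legacy"]))  # ['hr.legacy']
```

In practice such a ledger would live in durable, tamper-evident storage rather than an in-memory list; the sketch only shows the kinds of questions a complete request archive makes easy to answer.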

Some have raised the issue of LLM mistakes (hallucinations). These can be mitigated, and I would also point out that humans make mistakes too: they often do not understand or follow policy, and many of their actions are either not recorded or buried deep in obscure log files. Also, PDC doesn’t have to be adopted all at once. It is possible to build PDC “islands” around specific datasets, and these islands can be enlarged or connected to other islands incrementally. Research and development into PDC is ongoing, and there are still many problems to solve, but there is no doubt in my mind that it is coming.

Dr. John Talburt

Dr. John Talburt is Professor of Information Science and Acxiom Chair of Information Quality at the University of Arkansas at Little Rock (UALR), where he serves as the Coordinator for the Information Quality Graduate Program. He also holds appointments as Executive Director of the UALR Laboratory for Advanced Research in Entity Resolution and Information Quality, Associate Director of the Acxiom Laboratory for Applied Research, and Co-Director of the MIT Information Quality Program’s Working Group on Customer-Centric Information Quality Management. His current research is at the intersection of information quality and information integration, particularly the areas of entity resolution and entity identification. Prior to his appointment at UALR, he was a leader for research and development and product innovation at Acxiom Corporation. Professor Talburt is an inventor on several patents related to customer data integration, the author of numerous articles on information quality and entity resolution, and the winner of the 2008 DAMA International Academic Award. He can be reached at (501) 371-7616 or by email at jrtalburt@ualr.edu.