The Ghost in the Machine

Three decades ago, during the reign of hierarchical databases, full hierarchical data processing was commonplace. Hierarchical query languages were supporting the most complex and powerful multipath
queries that could be thrown at hierarchical databases. This ended with the advent of relational databases. Now that hierarchical data structures are popular again because of XML, their full
hierarchical processing is still being limited to flat two dimensional linear path processing by relational processing. This will change when database professionals realize that ANSI SQL relational
processing can now support full multipath nonlinear hierarchical processing. The physical or scientific proof that this technology exists and works can be difficult to find since relational
processing was not designed to support full hierarchical processing and could be hiding in plain site or buried deep in the relational engine. This article will find it and expose it.


Relational Hierarchical Processing is Possible

Most database professionals still do not know or believe that ANSI SQL can now inherently perform hierarchical data processing. I still see this belief and supposed fact being mentioned often as the
reason for the impedance mismatch problem preventing full relational/XML seamless integration. The fact is ANSI SQL can perform full hierarchical processing directly out of the box today. This makes
relational data and XML data directly compatible at a full hierarchical level. More amazingly, this is a full multipath navigationless hierarchical processing (schema-free query) that far exceeds
today’s commonly limited single path linear processing. This processing can bring back powerful full multipath nonprocedural query processing again, this time naturally in ANSI SQL.

This advanced ANSI SQL hierarchical processing also allows nontechnical users to once again specify any multipath query without having to know or specify the data structure. This increases the number
of queries possible many times over, and the additional semantics natural existing between the hierarchical pathways dynamically increases the value of the database data as needed.

There is good reason for skepticism about ANSI SQL’s inherent full multipath hierarchical processing. After all, how is it possible that this powerful capability, very advanced beyond
today’s standards, suddenly appeared in SQL with no design or additional code added for its support? The required hierarchical processing logic required to support this capability certainly
seems too powerful, seamless, and well thought out for it to appear spontaneously.


Hierarchical Data Modeling and Data Preservation

Full hierarchical data modeling and hierarchical data preservation was introduced unknowingly in the SQL-92 standard with the introduction of the Outer Join operation, specifically the Left Outer
Join (or Left Join), which preserves unmatched rows on the left side of the operation. This preserves the data rows on the left side of the operation if there is no matching row on the right side.
This results in a powerful basic hierarchical operation of modeling the left side data over the right side data. 

Additionally with the Left Join operation, another very important addition was made that completed the hierarchical modeling processing capability. This is the ON clause which replaces the WHERE
clause for joining tables based on matching relationships and is specified locally at each specific join point affecting only the join point where it is used. The WHERE clause is still necessary to
specify global data filtering that affects the entire structure. In hierarchical terminology, the ON clause specifies the hierarchical node linkage. This gives the Left Join and the ON clause
combination extremely precise and flexible control to model and define full multipath hierarchical structures and process them hierarchically. The Left Join/ON clause combination supplies the
hierarchical data modeling syntax and its associated hierarchical data processing semantics for controlling the hierarchical data processing and preservation principles.

Figure 1 is a hierarchical view that uses a Left Join sequence to define the hierarchical structure shown in Figure 2. The entire hierarchical structure defined by the ANSI SQL standard hierarchical
view will operate fully hierarchically automatically when processed by SQL.  A close examination of the ON clause use in Figure1 will show how the pathways are extended and how new pathways are
created to model the hierarchical structure shown in Figure 2.


Lowest Common Ancestor (LCA) Logic
The hierarchical data processing described so far is quite surprising to have occurred with no hierarchical processing design or forethought behind it. But this is only the beginning. Full
hierarchical processing also contains and requires advanced capabilities that occur across multiple pathways supported automatically by semantics that naturally occur across the pathways. These
different capabilities are used in two ways.  One way is for selecting data from one pathway based on data in another pathway. The other way is needed to process compound WHERE data filtering
logic that references different pathways. Each of these reasons uses the Lowest Common Ancestor (LCA) node between its pathways to control the range of operation under it to assure the result is
meaningful for the hierarchical query.

The locating of the LCA node and the controlled range of the processing of the pathways under the LCA node was originally performed in physical hierarchical structures by hierarchical tree walking
requiring many navigational steps. The finding of the LCA node required an algorithm and much research went into locating it efficiently. Compounding this processing is that LCA nodes can be nested
depending on the number of different pathways requiring accessing in a single query.

The selection of qualified data under an LCA from another pathway is limited to only the data occurrences under the LCA node data occurrence. When the WHERE clause is performing compound decision
logic, all of the combinations of the data in multiple paths being tested for a single match is limited to the data under the LCA node occurrence. In both of these LCA uses, the LCA node data
occurrence limits the processing to keep the data meaningfully related. This LCA technology has not made it into commercial database software today except for its natural occurrence in ANSI
SQL.

When relating two pieces of data in a hierarchical structure, the lower in the hierarchy they are related, the stronger the relationships are. For this reason, by default unless overridden, the
convention is to always use the Lowest Common Ancestor node as the highest level to restrict the testing range to. This type of hierarchical association measurement is also known as hierarchical
proximity.


Examples of Determining LCA
This section has additional examples of the two types of LCA usage using Figure 2 above. The examples use the ViewX view defined in Figure 1 and shown in Figure 2.  Starting with a data
selection example: SELECT E FROM ViewX WHERE C=2. The LCA in this case is node A found by following nodes C and E up to their Lowest Common Ancestor.
Let’s look at a WHERE qualification logic LCA as in: SELECT A FROM ViewX WHERE F=2 AND G=4. In this case the LCA is node E.

What happens if we combine the last two types of LCA use as in: SELECT C FROM ViewX WHERE F=2 AND G=4. In this example, the LCA’s are nested. Starting
inside out, the WHERE qualification logic has determined node E is the LCA. Continuing outward, the selecting data logic has chosen node A as the LCA using node C and the nested LCA node E in its LCA
determination.

Both of these LCA uses can generate nested or multiple LCA by themselves.  For example, in the case of data selection: SELECT D, G FROM ViewX WHERE
C=2
.  Node combinations D and C produce the LCA node B, and node combinations C and G produce the LCA node A. Similarly, WHERE C=2 AND D=4 OR G=8,
would also produce nested LCA nodes of B and A. Nodes C and D produce LCA node B. Then LCA node B and node G produce LCA node A.

It is worth mentioning that the OR conditional processing in the WHERE operation directly above can dynamically affect which LCAs are used during processing depending on which side of the OR is true.
This also means that both sides of the OR operation always needs testing and operating on depending on the result even if the first OR condition is true. This is necessary in hierarchical processing
logic and ANSI SQL performs this operation correctly in its Cartesian processing. For a more detailed look at this OR conditional hierarchical processing in SQL, see my blog entry. www.adatinc.com/blog1/?p=19

The above examples demonstrate the logic required to process multipath hierarchical processing. You can also see why procedural navigation would be impractical and error prone performing these
complex operations. Academic projects have been attempting to add LCA functionality to XQuery using externally supplied functions. This will have significant complexity limitations and require
knowledge of how to use with each different query requiring a different LCA programming.


How LCA Logic Works in Relational Processing
The above complex LCA logic being performed in relational processing is hard to explain when it was not designed into SQL and because the LCA concept is totally foreign to relational processing.
Another problem is that while the LCA logic was originally processed by tree walking on physical hierarchical structures, relational processing is basically processed a row at a time. The solution to
the relational processing of the LCA logic can be found in the relational Cartesian product internal operation which is the brain of relational processing. This has also made the current problem of
locating LCAs efficiently during processing a non issue.

The relational Cartesian product operates by generating all combinations of data around the join points. With hierarchical relationship modeling using the Left join, these Cartesian product join
points represent the LCA nodes. The combination of the data in Cartesian products causes replicated data to be generated. This properly generated data with its replicated data allows the two
previously discussed types of LCA logic processing to be naturally simulated a row at a time in standard ANSI SQL processing when it is processing hierarchical structures modeled by the Left Join
operation. Complex LCA nesting is automatically handled correctly by the algorithm used by the Cartesian product.


The Ghost in the Machine
Multipath hierarchical query processing (known academically as LCA Query or Schema-free Query) which is inherent in ANSI SQL can be isolated and explained logically. The fact that this
internally complex and powerful hierarchical operation does work automatically and fully in ANSI SQL without any design or forethought being put into its operation is still quite amazing.

Multidimensional hierarchical structure processing is more complex than two-dimensional flat relational data processing. But what accounts for the additional ability to perform this more complex
hierarchical logic in standard SQL? It is inherent in the Cartesian product operation. This totally unexpected hierarchical LCA processing is “the ghost in the machine.” But the LCA
operation has now been located and its processing explained in this article and can now be relied upon to be hierarchically accurate. This also proves that hierarchical processing is a valid subset
of relational processing opening the door to the next generation of advanced hierarchical XML processing.


Conclusion
In the same way it is hard to visualize a multidimensional hierarchical processing occurring in two dimensional relational SQL, it is also difficult to see the hierarchical result in SQL’s
two-dimensional result set. But it is reassuring to know that the result is not just relationally accurate, it is also hierarchically correct. This makes sense because hierarchical processing is a
subset of relational processing that goes through a more restricted specialized hierarchical processing by limiting the join operations to hierarchical Left Outer Joins.

The hierarchical structure has not been lost in the two-dimensional relational rowset. It can be hierarchically mapped using the extracted structure metadata in the hierarchical Left Outer Join data
modeled SQL view to transform the data result into a physical hierarchical structure such as XML. This can be utilized in ANSI SQL to automatically support transparent XML hierarchical processing,
input and output. If you want to understand hierarchical processing principles and LCA processing more fully, access the article and interactive ANSI SQL Transparent XML Hierarchical Processor demo
below.

Navigationless Database XML: Hierarchical Data Processing article

ANSI SQL Transparent XML Hierarchical Processor demo

Share

submit to reddit

About Michael David

Michael is is the founder of Advanced Data Access Technologies, Inc. Previously, he was the lead XML architect for NCR/Teradata, and served as their representative to the ANSI SQLX Group. Before that he was a staff scientist for Teradata and designed high level multi-featured SQL utilities. From his earlier career, he has more than 25 years of experience researching and designing commercial nonprocedural heterogeneous database hierarchical query processing products using flat, relational and hierarchical data. From this experience, he authored the book Advanced ANSI SQL Data Modeling and Structure Processing, as well as numerous papers and articles on this subject. His research on hierarchical and relational systems and data integration has resulted in discoveries that led to the development of an ANSI SQL transparent XML hierarchical processor prototype that integrates and processes relational and XML data at a full multipath hierarchical level. This also proves that inherent hierarchical processing capability is possible in ANSI SQL.
Contact Mike at mike@adatinc.com, and read his blog at www.adatinc.com/blog1.

Top