Introduction
This is the third and final part in this series of articles on ANSI SQL’s inherent hierarchical processing capability and its support of transparent native XML integration. Part 1, Using XML to Link Below the Root, described how SQL can naturally integrate native XML at a full hierarchical processing level utilizing
the Left Outer Join operation to inherently model and process hierarchical data structures. Part 2, Processing XML’s
Complex Hierarchical Structures, demonstrated how standard SQL can naturally support advanced XML hierarchical operations such as logical network structures, variable structures, and structural
transformations.
This third and final article will address issues of efficiency and ease of use utilizing SQL hierarchical views and how they come together to form an extremely powerful combination. The synergy
created from this combination delivers a totally seamless, hierarchically optimized unified heterogeneous virtual view defined entirely in standard SQL. This assures an accurate, consistent, and
efficient ANSI SQL hierarchical operation across the entire heterogeneous view.
Figure 1 (below) demonstrates the processing of a heterogeneous hierarchical ANSI SQL query. The query involves the dynamic hierarchical joining and processing of a physical XML hierarchical
structure and a logical hierarchically modeled relational structure. It will be used to demonstrate the SQL hierarchical capabilities described in this article necessary for ANSI SQL seamless
native XML integration.
Hierarchical Views
As shown at the top of Figure 1, the left outer joins that model and define the XML documents and the relational hierarchical structures can be defined separately in their own SQL views. An
important concept here is that these outer joins are also defining the hierarchical structure metadata and do not necessarily have to represent or perform join operations as in the case of XML
physical structures. In all cases, they still represent their structure and data portion in the query’s virtual view shown on the right middle side of Figure 1. The ability to utilize this
hierarchical metadata raises these SQL views to a higher more abstract level allowing significantly more freedom in how they are used to perform their ANSI SQL function.
Hierarchical SQL views can be embedded in other hierarchical views or joined dynamically on the invoking SQL statement to hierarchical create larger views. The latter allows for ad hoc data
modeling or joining of hierarchical structures at query invocation. This is demonstrated in Figure 1 starting at the top down where the XML view is joined over the RDB view which follows the
semantics of the left join operation with its hierarchical data preservation. The ON clause is used to specify the exact join points between structures.
Figure 1 demonstrates a typical hierarchical joining of hierarchical structures where the lower level structure is joined by its root node in a standard manner, keeping the newly formed structure
and its hierarchical semantics intuitive. This does not have to be case, Part 1 used an example where the lower level structure was joined under its root node. This powerful view combining
capability still produces a valid hierarchical result. It significantly increases the hierarchical view’s flexibility and transparency, and establishes SQL’s left outer join syntax as
an advanced hierarchical data modeling language.
Unified Heterogeneous Virtual View
The natural view expansion of the invoking SQL query shown in the middle of Figure 1 demonstrates how the heterogeneous view types automatically merge together to form a single unified SQL view.
Since this unified view also includes XML data which may be remotely located, it is also heterogeneous and virtual. This enables a consistent SQL operation across heterogeneous structure types.
Also of interest is that the expanded views are automatically nested. This nesting is controlled by the delayed matching ON clause placement caused by expanding embedded views inserting additional
joins and their matching ON clauses. Since outer joins are not performed until their matching ON clause is processed, this causes SQL views to be fully materialized before being joined. This occurs
in the example in Figure 1 where the lower level RDB view is materialized fully before being joined to the previously materialized XML view. This operation, like the hierarchical operations already
covered, is naturally present in the ANSI SQL standard and occurs automatically.
While the nesting of views is not necessary for the query in Figure 1, there are many situations where this automatic operation becomes necessary as in the linking of the lower level structure
below its root described in Part 1. It also protects other participating views from possible side effects caused by a view materialization because the nesting process causes the current view being
materialized to be isolated in its own working set. This view nesting operation demonstrates that the left outer join is a powerful data modeling language for logical and physical structures and
possesses many recombinant SQL capabilities.
For distributed and heterogeneous processing there is an inverse operation to the recombinant SQL capability. When the virtual view is broken up and sent as separate SQL sections to their remote
locations, the left outer join query statements that represent these different substructures also inherently carry their hierarchical structure metadata. This automatically enables the distributed
processing to be naturally carried out hierarchically in autonomous pieces. It will be described later how the hierarchical processing can enable the distributed processing to perform in a linear
fashion.
It is worth mentioning that the left outer join hierarchical data modeling does produce less logic and coding errors. This is because it is more formal in nature and adheres to sound hierarchical
principles. This is very similar to the advantages gained by structured programming. These characteristics also enable powerful semantic optimizations that are described in the next section.
Optimization
Earlier in this article, it was mentioned that the data modeling outer join views can be used as metadata that defines the hierarchical structure rather than a more physical (code) definition. This
allows both the hierarchical definition of physical structures like XML and the logical data modeling of relational structures. This also allows the hierarchical optimization of physical and
logical structures by pruning the unneeded legs of the hierarchical data structure to avoid physical access navigation or logical join operations. This is possible by utilizing the hierarchical
semantics of the metadata. Basically, only referenced nodes and nodes on the path to referenced nodes need to be accessed and processed. Referenced nodes are determined at runtime, so that
every request is optimally optimized based on the data requested which is easily added or deleted from the SQL SELECT list at execution.
Notice in Figure 1, the Relational Working Set located along the lower left side and the corresponding Hierarchical Working Data along the right side, do not contain the M or B nodes specified in
their respective views. These nodes have been identified in the Unified SQL View by their dashed box and dashed connection lines indicating that they are not referenced or on a path to a referenced
node. This is shown halfway down the right side of Figure 1. This optimization not only reduces data access, but significantly decreases Cartesian product data explosions which cause
replicated data and their required additional processing. This has been a significant efficiency problem with XML integration in SQL.
Standard inner join views can not utilize this powerful hierarchical optimization. With inner joins, all tables (or nodes) must be accessed to detect missing data (dangling tuples) in order to
remove their incomplete rows from the result. This is not the case with left outer joins that hierarchically preserve rows with missing data. One reason that most SQL databases do not support this
optimization is because they do not recognize hierarchical structures. Another reason is that the ANSI SQL specification defines the outer join operation in terms of the inner join causing this
optimization not to be considered.
Use of this powerful hierarchical optimization is easily triggered when recognizing the hierarchical virtual structure, and can be applied simply as a pre-step to the normal inner join optimization
which is still applicable. This also makes these hierarchical outer join views even more powerful and user friendly than the older inner join views because a single large hierarchical view can do
the job of many tailored inner join views.
Normally views comprised of code would be very difficult to optimize globally, code is usually optimized in chunks, but this is not so for hierarchical outer join views with their hierarchical
structure metadata. Their Unified SQL Virtual View defined by the expanded outer join views, shown midway down the right side of Figure 1, can be globally optimized utilizing its hierarchical
structure metadata. This total control over the virtual view enables the transparent replacement of the relational Cartesian product engine with a hierarchical engine. This will completely
eliminate the Cartesian product data replication efficiency problems discussed earlier and enable a linear processing.
The use of a hierarchical engine is depicted near the bottom of Figure 1 where the left side represents relational processing and the right side demonstrates its corresponding hierarchical
processing for the same query. Note that in hierarchical processing, the XML data access routine at the bottom center can supply the data directly in hierarchical data form. This complete
hierarchical processing can be extremely efficient and can approach linear processing because there is no replicated data and the hierarchical joins are now performed by simple linking operations.
Hierarchical View and Optimization Synergy
Combining the hierarchical SQL view capability with the hierarchical optimization capabilities significantly enhances and extends ANSI SQL’s hierarchical processing capabilities. By
optimizing views, larger more global views can be utilized without increasing overhead, this cuts down on the number of hierarchical views needed. This increases reuse, view abstraction, and
ease of use. More significantly, this also allows a predefined view to automatically process any single or multi-leg query under its global view control. This allows fully optimized general
purpose views and flexible ad hoc queries unique to SQL for hierarchical processing and native XML integration.
Conclusion
The first two articles in this series demonstrated that ANSI SQL can naturally support basic through advanced native XML hierarchical integration nonprocedurally. This third article has shown how
the natural ANSI SQL support of hierarchical views and additional hierarchical optimization have supplied the ease of use and efficiency that makes this SQL native XML integration world class. In
addition, the left outer join hierarchical metadata was shown to provide significant flexibility in how the hierarchical processing is carried out internally. This allows the hierarchical
processing performed by the Cartesian product relational engine to be transparently replaced by a hierarchical engine which is considerably more suited for hierarchical processing.
There are two extremely powerful leverage points that ANSI SQL can capitalize on for SQL native XML integration. This is SQL’s inherent hierarchical processing capability and through this
capability the automatic utilization of the vast amount of hierarchical semantics naturally in the data structure being accessed. This allows for a full nonprocedural operation where the
hierarchical semantics naturally available are utilized automatically to supply the correct hierarchical semantic processing. Because of this automatic operation, the user does not have to specify
how the data is to be processed hierarchically. The user just specifies what hierarchical results are needed. Because this nonprocedural processing can automatically process hierarchical queries
involving extremely complex hierarchical logic utilizing the data’s own semantics, it increases the value of the data by making this advanced processing more practical and available.
The nonprocedural capabilities described above are not being fully utilized today by XML query products. This is evident by their required procedural programming of the complex hierarchical
semantics needed to process the query. For further information on the SQL hierarchical processing and native XML integration capabilities described in this article please refer to: www.adatinc.com.
Recap of unique SQL Native XML integration capabilities:
* Full Nonprocedural Multi-leg Hierarchical Processing
Efficient and accurate hierarchical processing
Advanced XML processing naturally supported
Uses hierarchical semantics to increase the value of data
* Uses Only ANSI SQL Centric Syntax
XML integration is transparent and complete
All results are ANSI SQL accurate
Fully dynamic operation supports ad hoc queries
* Hierarchical Unified SQL Virtual Views
Ease of use, user does not need to know the data structure
Seamless heterogeneous processing
Single global view works for all SQL queries under its control
Copyright 2004 by Advanced Data Access Technologies, Inc.