Controlling Hierarchical Structure Processing From SQL

This is the third TDAN.com article in a series on using ANSI SQL’s inherent hierarchical processing to integrate and process relational and XML data at a full hierarchical processing level. The first article, The Ghost in the Machine – ANSI SQL Hierarchical Processing, introduced the basics of this full multipath hierarchical processing capability naturally existing in ANSI SQL. The second article, Creating More Value than is Captured with Hierarchical Structures Using SQL, presented the advantages of using and operating hierarchically on these structures.

This third article presents some of the more powerful and advanced hierarchical processing operations and capabilities possible with ANSI SQL hierarchical full multipath processing. These include heterogeneous structure integration, hierarchical structure data mashups, hierarchical processing optimization, global hierarchical views and queries, hierarchical data filtering, dynamic structure control, and dynamic structured XML output.

Example Data StructuresAs explained in the previous articles, ANSI SQL can model and process full hierarchical structures by modeling them using Left Outer Join operations which naturally defines the hierarchical data preservation and operational semantics. This includes multipath hierarchical structures too as demonstrated below in Figure 1. This capability was describe in the first article of the series. Two example hierarchical structures are defined below. The first is a logical relational hierarchical structure and the second is a physical XML hierarchical structure.

 Figure 1: Relational Hierarchical Structure

The relational hierarchical structure above in Figure 1 is modeled in SQL using the Left Outer Join operation in Example 1. In the above Relational hierarchical structure, the hierarchical nodes are relational tables. The series of Left Outer Joins below model hierarchical structures with their left side data preservation. Each join also specifies a combination of join points using their own ON clause. ON clauses have replaced the single WHERE clause for defining the joining of tables. Multiple pathways are possible by defining more than one pathway exiting from the same node as shown in the SQL in Example 1. The Relational view name is RDB.



Example 1:
Relational Logical View Definition

Figure 2 below shows the XML hierarchical structure that will also be used for examples in this article. The nodes in this case are XML elements. XML structures unlike relational hierarchical structures are not linked by relationship keys. They are usually related by a contiguous hierarchically nested structure.

 

 Figure 2: XML Hierarchical Structure

The same type of SQL hierarchical data modeling processing as performed for relational data in Example 1 is performed below in Example 2 to define the XML hierarchical structure in Figure 2. Since data relationships are not needed in XML contiguous structures, the ON clause uses only node references to specify the linkages. Because of this difference in specifying linkages between nodes, logical and physical structures are defined differently. This required adding a new view type named VIEWX in our ANSI SQL XML hierarchical processor to define physical XML views. In the VIEWX view definition below in Example 2, the view is named XML.

 Example 2: XML Physical View Definition

Heterogeneous Structure JoinsThe XML view in Figure 2 above, even though defining a native XML structured value, is still defined using ANSI SQL using Left Outer Joins that can be joined directly with relational SQL logical views. This is shown below in the defined view in Example 3 which joins the RDB view to the XML view. The contiguous physical XML view itself does not require being defined using specific ON clause join data points because it is already a single fixed structure. But, data points are still required by the ON clause enabling SQL to join the XML view to the RDB view shown in Example 3 below. This involves combining both of the Left Outer Join views to define the entire heterogeneous unified structure shown in Figure 3.


 
 Example 3: Heterogeneous Unified View Definition

The view in Figure 3 is a heterogeneous hierarchical structure that operates seamlessly. This is possible because logical and physical hierarchical structures both operate exactly the same by following the same hierarchical operational principles. This naturally integrates relational and SQL data at a hierarchical lossless level. The heterogeneous structure in Figure 3 is comprised of XML elements and relational tables. These both represent hierarchical nodes. The term node is how they both will be referred to in data structures from this point on. XML data is introduced into SQL processing when accessed and mapped into a relational rowsets. The rowsets are hierarchically mapped by their XML views which also use Left Outer Joins to map the structure. The heterogeneous view in Example 3 maps the entire combined structure into a single Left Outer Join string and allows the ANSI SQL transparent XML processor to operate hierarchical structure–aware. This meta structure information is used for hierarchical optimization and automatic structured XML output.

 Figure 3: Heterogeneous Data Structure

Hierarchical Processing OptimizationWith physical XML structures and relational logical structures, there is a powerful hierarchical processing optimization that is possible. This hierarchical processing optimization will also work for heterogeneous queries as shown in Figure 3. This optimization is based only on nodes that are referenced or on the path to referenced nodes as shown below in Figure 4. These are the only nodes that need to be accessed from the view in Figure 3. This optimization is based on hierarchical structure data preservation semantics and is applied to the hierarchical structure of the input data by the ANSI SQL transparent XML hierarchical processor.
 

 

 Figure 4: Internal Structure After Optimization

The query in Figure 4 only references data from nodes B and M. Node R and X are still necessary in order to access nodes B and M. But nodes D and X were not referenced and are not needed for processing the query. This optimization does not affect the query XML result in any way.

Global Hierarchical Views The input heterogeneous view used in Figure 4 is a global view because of its hierarchical optimization. This means that this global view can define an entire structure and the query can access any portion of the structure without introducing any unnecessary overhead. This is true of the SQL query used in Figure 4. This means that fewer views are needed and the user does not need to know the structure of the view. This also allows any query possible for the global view to be used. This makes SQL querying easier to use and more powerful.

Dynamic Structure ControlToday with XML processing using the SQL/XML standard and XQuery, the standard XML operation is static and not dynamic. If a dynamic operation is needed, it has to be built into the application for its specific dynamic use. It is not surprising then that the output capability for these XML processors is also designed for static output formats. This is not surprising because there was no reason in the early days of XML support to need dynamic support. Today this is changing, but is happening slowly because there is little or no automatic XML support for this dynamic processing. Dynamic on-demand XML publishing is becoming popular in many industries to produce tailored up-to-date documents from enterprise-wide structured data content with higher quality information, reduced publishing costs, faster times to market, etc.

ANSI SQL’s hierarchical processing naturally includes dynamic processing. Making a change or addition to a SQL/XML standard or XQuery application requires going into the code to make a change and then it should be tested. XQuery designers and book authors claim its design has preserved SQL’s FROM, WHERE, and SELECT capabilities. This is not true for SQL’s SELECT list operation which can Add, Change, or Remove output data types from the query automatically. The output requirements are the driving force in an application for what processing is necessary. The SELECT list dynamically controls this processing. This is made more powerful by the use of global views, described above, used along with SQL’s dynamic SELECT List.

Dynamic Output ProcessingThe SQL example in Figure 5 below is a good example of dynamic processing and output. Using a global view allows any query to be specified that is contained in the global views domain. The hierarchical optimization which removes all unnecessary portions of the global view from the active query keeps the processing down to the required minimum. This means each specific query’s processing is dynamically determined and tailored to its required processing. Three requirements are necessary for this advanced processing. These are: dynamic SELECT list type capability, dynamic hierarchical optimization which enables global views, and dynamic hierarchical structure-awareness of the current structure being processed. XQuery does not support any of these capabilities.

 Figure 5: Dynamic Output Structure

In the SQL example in Figure 5, only data from the B, M and L nodes have been selected for output. The optimization has limited the processing to only the portions of the view structure in Figure 3 needed to produce this output. This output result is actually a fragment of the original view structure. The root was not selected for output and is not included in the result. Notice that this did not have a deleting effect on the structure, but rather a slicing out effect. This can also be seen with the slicing out of the unselected X node. This effect is known as node promotion in hierarchical processing.

Dynamic Node Promotion and CollectionNode promotion is where lower level selected nodes move up and around the unselected output node or nodes. If more than one lower level node moves up and around to a higher level node from more than one path as shown in Figure 5 for the X and L nodes, this condition is called node collection a form of node promotion. Node promotion produces a very dynamic controlled output by the SELECT list’s dynamic output data selection. In addition, this output structure was composed of portions of two structures that were dynamically combined using the Left Outer Join causing a new structure to be dynamically created and this also adds to the dynamic processing and output capability.

The question might arise, how to explain this node promotion in SQL to XML mapping as natural. It happens like most hierarchical processing performed in SQL, it is natural because hierarchical processing is a valid subset of hierarchical processing. In this node promotion example, the SQL SELECT operation known relationally as projection slices out data that is not selected for output from the relational result set. This exactly models node promotion in SQL hierarchical processing and is an intuitive operation. If no data items from a node are output, the empty node is removed and not output. This is the default operation and can be overridden so that empty nodes are output. This override helps with data navigation for applications expecting a fixed structure, this preserves the original structure.

Global Hierarchical QueriesGlobal hierarchical views as shown in Figure 3 with their optimization in Figure 4 can also support Global Queries. This is where the entire global query is selected for output using a SELECT ALL or SELECT * and hierarchical filtering can be applied to the entire view structure which will be output. This allows entire structures to be quickly filtered. It is important to remember from the previous articles in this series that data filtering in hierarchical structues is hierarchical. This means that filtering data on one pathway can filter data on all other pathways. This is why Global Queries are so powerful and useful when filtering. There is also no limit to the complexity of the filtering criteria. ANSI SQL’s natural hierarchical processing can automatically handle this global hierarchical filtering. This can not be practically done today with XML’s navigational processing because of its complexity. SQL’s navigationless processing avoids user navigation problems and overhead.

Data Structure MashupsOnce our interactive ANSI SQL transparent XML hierarchical processor prototype was up and running, it was possible to experiment with pushing the limits of SQL hierarchical processing. One of the areas experimented with was relaxing the rules for joining hierarchical structures. The current believed understanding for joining or combining hierarchical structures is that the lower level structure has to be joined to its root; otherwise the semantics are not obvious. This has been an open issue for a long time. Figure 3 demonstrated the standard hierarchical structure joining capability limiting the join to linking only to the root of the lower level structure. The semantics and structure of the combined data structure go together and are intuitive.

Since SQL had been naturally operating correctly performing hierarchical structure processing and structure joining following the standard hierarchical rules, it was investigated how SQL would handle joining structures where the lower level structure was linked to at a node below its root. This structure performing the join is show below in Figure 6. Also notice that this join of structures is performed dynamically and not previously stored in a view.

 Figure 6: Heterogeneous Dynamic Mashup Structure

There is no question whether the query in Figure 6 would produce results. The question is whether the results are meaningful, predictable and determining what the semantics are. In other words, what are the results showing us?

After experimenting with this unrestricted structure joining capability, it was determined that the results were always meaningful as performed by ANSI SQL. More surprisingly, the semantics were stable and more straightforward than expected. It turns out that regardless of where in the lower level structure is linked to, its resulting hierarchical structure are always as if it was linked directly to its root. This would be the case for the structure in Figure 6 making its structure the same as the structure in Figure 3.

This mashup operation will work for both XML physical structures and SQL logical structures. The reason this works for hierarchical physical structures is that the structure is already constructed in one hierarchical piece and is unaffected by where it is linked. This also works for SQL logical structures because the lower level structure view is fully expanded before it is linked to the higher level structure, making it act as a physical structure. This occurs because of how the left out join views are nested. In the SQL in Figure 6 each view in the SQL join statement has its own embedded ON clauses and the visible ON clause for the joining of the views is the most outer ON clause causing both views to be expanded before joining the expanded view.

Since the lower level structure is already fixed into a hierarchically related form when joined, this means the only combined hierarchically modeled representation is to use the root node of the lower level structure as the data structure link point. This is because it represents the focal point of the lower level fixed structure. This does not conflict with the lower level’s already defined hierarchical structure. To do otherwise would change the lower level structure and semantics which would not properly reflect the hierarchical processing result.

Additional Mashup AdvantageThere is also an additional operational advantage besides the flexibility to be able to link below the root anywhere in the lower structure. This is the capability to apply the ON clause data filtering to the lower level link point. This is also very intuitive and expected. This filtering location is shown in Figure 6 and does alter the semantics by performing data filtering at this lower level link point. This data filtering is applied at a hierarchical processing level which will cause all other pathways to be filtered too.

This lower level data filtering processing for relational data is handled normally. It is naturally processed in the relational row set. But, is this true for physical XML structures too? Physical structures like XML have already been mapped into relational rowsets when accessed. So they operate the same as hierarchically modeled relational data allowing all the flexibility of relational data. All of these natural operations allows for the combining of structures to be “mashup-ed” in unlimited ways.

ConclusionThis article is the third in a series of articles on ANSI SQL transparent XML hierarchical processing. It concentrated on how ANSI SQL can control hierarchical processing in dynamic hierarchical queries. This is a new capability and this article has shown how SQL’s dynamic processing has all of the controls and capabilities necessary to do this.

Our online interactive prototype of an ANSI SQL navigationless XML multipath hierarchical processor that demonstrates the capabilities described in this article can be accessed at www.adatinc.com/demo.html.

Share

submit to reddit

About Michael David

Michael is is the founder of Advanced Data Access Technologies, Inc. Previously, he was the lead XML architect for NCR/Teradata, and served as their representative to the ANSI SQLX Group. Before that he was a staff scientist for Teradata and designed high level multi-featured SQL utilities. From his earlier career, he has more than 25 years of experience researching and designing commercial nonprocedural heterogeneous database hierarchical query processing products using flat, relational and hierarchical data. From this experience, he authored the book Advanced ANSI SQL Data Modeling and Structure Processing, as well as numerous papers and articles on this subject. His research on hierarchical and relational systems and data integration has resulted in discoveries that led to the development of an ANSI SQL transparent XML hierarchical processor prototype that integrates and processes relational and XML data at a full multipath hierarchical level. This also proves that inherent hierarchical processing capability is possible in ANSI SQL.
Contact Mike at mike@adatinc.com, and read his blog at www.adatinc.com/blog1.

Top