Creating More Value than is Captured with Hierarchical Data Structures Using SQL

What is as valuable as your data or using your data? The answer is the capability to automatically increase your data’s value in an unlimited manner. This capability is available today, and it relies on hierarchical data structures. Hierarchical structures are very unique not only in how well they organize data, but also in how they naturally capture more meaning than is stored with the data. Even more impressive is their ability to dynamically process this natural goldmine of meaning in unlimited ways that further increases the value of the stored data. With the increased use of hierarchical XML data structures today, there is an incredible amount of unused data value potential available. This article will explore ways hierarchical structures can be used to significantly increase data value and its utilization.

The Natural Increasing Data Value of Hierarchical StructuresHierarchical structures are easily built and expanded naturally over time. Suppose we start building a hierarchical structure by creating and populating node A over node B. Then at some point we add node C also directly under node A and also start populating it. This is an example of reuse by reusing the already existing node A and its data. We also know what nodes node A has control over and how nodes B and C are related to each other. With the addition of each new node, the data value will increase in a nonlinear fashion. At this point, this multipath nonlinear hierarchical structure looks like the one below in Figure 1.



Figure 1
: Hierarchical Data Structure

SQL Can Model and Process Hierarchical Data StructuresThe hierarchical data structure in Figure 1 is composed of three node types: A, B and C. When modeling a hierarchical structure using relational data, each node is a relational table. Tables can have many occurrences of data rows. This means each node can have multiple data occurrences. This then means data paths such as A/B can have multiple data occurrences. XML and its element structures can be modeled in the same fashion. This allows relational and XML data to be seamlessly integrated at a hierarchical processing level.

The hierarchical structure in Figure 1 can be hierarchically modeled in SQL using the Left Outer Join that preserves its left argument but not its right argument. This places node A over node B hierarchically. This allows node A to exist even if there are no matching node B occurrences. If node B has occurrences that are not matched to node A, these are discarded. The addition of node C follows the same logic. Node A controls the existence of both Node B and C. The Left Outer Join contains an ON clause that specifies the join condition at each specific join. This controls how the nodes are interconnected. This is demonstrated below in Example 1 with the hierarchical SQL view modeling of the structure shown in Figure 1.
 


Example 1: Multipath Structure View Definition

 
The view structure defined in Example 1 using SQL’s Left Outer Join syntax exactly models the hierarchical structure shown in Figure 1. The associated Left Outer Join semantics controlling the data preservation operation controls the operation of this view when the view definition is processed directly in an SQL query.

Today hierarchical structure processing is limited to single path processing. In Figure 1 this means that path A/B can be processed or A/C can be processed, but a combination of both (nodes A, B and C) cannot be accurately processed together without utilizing additional structure semantic logic. The semantics covered previously describe how nodes A and B are hierarchically related, and how nodes A and C are hierarchically related, but did not include how the different paths are related to each other and what their associated semantics would be. This is because the required technology to utilize the necessary structure semantics is not well known or in use today. This is covered directly below.

Utilizing Multiple Paths TogetherA hidden and natural semantic layer between hierarchical structure pathways offers the capability to significantly increase the data value dynamically by allowing powerful multipath hierarchical data processing to be performed correctly. What is required is a multipath processing logic known as Lowest Common Ancestor (LCA) logic that produces meaningful results for multipath queries. This process is automatically carried out in SQL controlled by the structure semantic information inherent in the hierarchical view in Example 1. This little known automatic operation was described in the previous article, The Ghost in the Machine, ANSI SQL Inherent Hierarchical Data Processing.

LCA logic in SQL query processing is used in two different situations. One is used in determining the range of qualified SELECT output items in one hierarchical pathway based on data values from other pathways. The other way used is in filtering data based on data values across multiple pathways that also uses ranges of data items used in the WHERE clause processing. Each of these situations produces its own LCAs, and these can become nested complicating the LCA processing further. SQL internally performs this processing naturally and accurately regardless of the complexity involved.

Automatically Increasing Data Value by Structure SemanticsWith the structure in Figure 1, we can use query paths A/B and A/C separately as today. In addition, we can also use this separately populated structure to also specify the following multipath hierarchical structure queries in Example 2 which utilize multiple pathways. Each query references both nodes B and C, which are on different hierarchical pathways.

Example 2: Sample Multipath Queries

The above nonlinear multipath queries in Example 2 automatically utilize the nonlinear structure semantics that exist naturally between the separate linear path additions. Multipath queries utilize more semantics than linear path queries because every node added by a linear path addition is also related to every node in every other pathway in the structure and each has its own meaning associated with it. This adds considerable potential data value, enabling an unlimited number of possible queries for a given data structure and significantly increasing the overall data value of the entire hierarchical structure with no additional effort for the SQL user.

The Flexibility to Specify Schema-Free QueriesMultipath hierarchical query processing is too complex for the user to specify procedurally with navigation, looping logic control and the required LCA logic. With schema-free queries, the user does not have to have knowledge of the data structure or have to navigate the structure so that multipath queries become possible. Today this Schema-Free processing capability is not in XQuery, but is being attempted in academic projects by adding LCA functions on top of XQuery. SQL’s inherent hierarchical processing handles this schema-free processing today. The capability of schema-free multipath processing allows the queries defined in Example 2 to be automatically performed. The benefits are described directly below.

The first SQL query in Example 2 above demonstrates the increased level of query processing power added by the capability to select data from one path of the structure by utilizing data from another path. This automatically draws on the semantics between the structure paths making this simple multipath query many times more powerful and useful than linear queries.

The second SQL query in Example 2 is demonstrating the same type multipath query as the first example described directly above. Its purpose is to demonstrate how the same pathways of this different query can be reversed in use increasing the flexibility and use of the same data in different ways showing that an unlimited number of queries are possible.

The third SQL query in Example 2 demonstrates that the query filtering process can contain powerful multipath data filtering query conditions. Very powerful search and decision support queries can be easily specified while the complex processing for multipath data filtering is automatically handled internally by limiting the conditional processing to only the valid hierarchical range of input values.

The fourth SQL query in Example 2 demonstrates how output fields from multiple paths can be specified and processed correctly. This is a powerful and internally complex query because the data filtering semantics is different for each output field. This is because they are on different pathways having their own semantics and are each controlled differently for output to match their different semantics. This allows the automatic output of only meaningful results.

The fifth SQL query in Example 2 demonstrates the complete freedom to specify and process any query, regardless of the internal complexity. This query is the same as the previous example directly above with output and processing of multiple pathways except this query also has filtering requirements across multiple paths. This further complicates the query semantics and its complex internal processing logic which can still be performed by SQL automatically. This powerful multipath query capability can also be used for decision support queries.

Goldmine of Unused Hierarchical Structure Semantics Now AvailableCurrently all nonlinear hierarchical structures are being limited to linear processing today and because of this are utilizing only a fraction of their data value. Nonlinear hierarchical processing allows any combination of pathways to be queried to produce the desired result. In addition, multipath queries must support navigationless processing which allows even nontechnical users to specify these more powerful queries making them even more useful.
Deriving additional information from already available data is similar to utilizing the information contained in unstructured data except with hierarchical data structures the additional information is free for the taking using SQL. This is because it involves no additional work by the user. So it just makes good business sense to utilize this capability.

Additional Data Value Increases Using SQL CapabilitiesThe above advanced natural multipath nonlinear hierarchical processing capabilities are naturally available in full multipath hierarchical structures and their processing. SQL’s advanced hierarchical processing can additionally leverage these naturally existing hierarchical processing based capabilities to further increase data value as described below. The order they are specified in is significant because the different capabilities build upon each other.

Automatically Increasing Data Value Using Global QueryThe data preservation in hierarchical structures discussed in connection to the operation of Figure 1 allows for powerful hierarchical optimization. This optimization allows for the dynamic removal of unnecessary pathways in SQL views based on the active query. This enables the creation of SQL global views that have no overhead. This allows for unlimited and unplanned queries using a single global view of the data. This increases the value of the data by increasing its range of usefulness.

Automatically Increasing Data Value by Automatic OperationMultipath queries require nonprocedural navigationless access because user navigation is too slow and complex to handle the complexity of multipath queries. The new level of nonlinear complex automatic processing allows for increasing data value by being able to applying more and deeper semantics.

Automatically Increasing Data Value by Ease of UseNavigationless nonprocedural processing means that the query user does not need to know the structure or have to code complex queries and can be used by non-technical users. This makes data queries more easily available to more users increasing data value.

Automatically Increasing Data Value by Data CorrectnessAutomatic processing naturally follows SQL principles producing the result is correct. Correct results increase the data value of the data because its processing has a high level of accuracy and can be trusted for critical and important applications.

Automatically Increases Data Value by Interactive Dynamic QueriesAutomatic and navigationless processing means that the hierarchical processing can be performed interactively. Being able to have immediate interactive use of data increases its data value by supplying immediate, fresh and usable information.

ConclusionUtilizing hierarchical structures to store your data will allow for the most use and flexibility of your data. This involves multipath nonlinear processing. The ability to specify multipath queries and their correct processing is considerably more complicated to process than linear single path queries. This means that this processing needs to be performed automatically without user navigation because of its complexity and multipath semantic knowledge necessary. This is automatically performed utilizing the semantics between the pathways used in the query. For example, selecting data from one pathway based on data values from a different pathway. This opens the door to unlimited queries allowing any desired query to be automatically processed. This also assures that the results are correct.

Each different multipath query made against its global view has its own unique semantics based on the naturally occurring semantics that exists between all the pathways being accessed to solve the query. This significantly increases the data value of all the data in the structure, and it can be queried without the user having to know the data structure.

An online interactive prototype of an ANSI SQL navigationless XML multipath hierarchical processor that demonstrates the capabilities described in this article can be accessed at www.adatinc.com/demo.html.

Share

submit to reddit

About Michael David

Michael is is the founder of Advanced Data Access Technologies, Inc. Previously, he was the lead XML architect for NCR/Teradata, and served as their representative to the ANSI SQLX Group. Before that he was a staff scientist for Teradata and designed high level multi-featured SQL utilities. From his earlier career, he has more than 25 years of experience researching and designing commercial nonprocedural heterogeneous database hierarchical query processing products using flat, relational and hierarchical data. From this experience, he authored the book Advanced ANSI SQL Data Modeling and Structure Processing, as well as numerous papers and articles on this subject. His research on hierarchical and relational systems and data integration has resulted in discoveries that led to the development of an ANSI SQL transparent XML hierarchical processor prototype that integrates and processes relational and XML data at a full multipath hierarchical level. This also proves that inherent hierarchical processing capability is possible in ANSI SQL.
Contact Mike at mike@adatinc.com, and read his blog at www.adatinc.com/blog1.

Top