Where is the Query in XQuery?

Before the advent of relational databases, I worked as a designer and implementer on some of the earliest hierarchical database query languages. Back then, a query language was truly
nonprocedural and meant exactly what the word “query” implies: a simple question that requires no procedural instructions to answer. Over the years, this meaning of
“query” and “nonprocedural” has been steadily diluted by query product designers and developers to include user navigation with XPath and
processing logic in XQuery’s FLWOR expression to control looping constructs.

A number of years ago I wrote two articles for TDAN, listed below, that described how standard ANSI SQL could naturally operate hierarchically and how this inherent capability
could be seamlessly extended to access native XML. This capability would allow standard SQL to query native XML transparently and retain all of the characteristics and capabilities of a true query
language as defined above. To research this capability further, my company developed a hierarchical XML processor prototype that integrates relational and XML data at a full hierarchical
level using only ANSI SQL syntax and semantics, as described in those articles.

In XML: Using SQL to Link Below the Root – Part 1

In SQL: Processing XML’s Complex Hierarchical Structures – Part 2

XQuery was designed from the ground up to process hierarchical XML data, but it is still missing capabilities that a hierarchical query processor product should have and did have originally. In
this article, I will list these natural hierarchical query capabilities functioning in the ANSI SQL SQL/XML prototype mentioned above and compare them to XQuery to describe the missing query
capabilities in question.

A Specific Isolated Query Specification

The SQL/XML prototype uses ANSI SQL’s simple, intuitive query specification, built from the SELECT, FROM, and WHERE keywords, to specify a hierarchical query over native XML
without any procedural logic. This makes it easy to change or extend the query. The user does not need to know the structure of the data being processed or
how to access a hierarchical database structure. The SELECT, FROM, and WHERE keywords are enough to query hierarchically structured data and produce hierarchical results.
ANSI SQL’s Left Outer Join is used to naturally define full multi-path hierarchical structures.
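
As a minimal sketch of this style of query, the following Python script runs a plain SQL query against an in-memory SQLite database. It is not the SQL/XML prototype: the dept, emp, and proj tables and their data are hypothetical, and SQLite returns ordinary flat rows rather than hierarchical XML. It only shows how two Left Outer Joins from the same parent describe a structure with two paths, and how SELECT and WHERE state what is wanted without any looping logic.

```python
import sqlite3

# Hypothetical sample data: dept is the root, emp and proj are sibling paths.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, dept_id INTEGER, emp_name TEXT);
    CREATE TABLE proj (proj_id INTEGER PRIMARY KEY, dept_id INTEGER, proj_name TEXT);
    INSERT INTO dept VALUES (1, 'Research'), (2, 'Sales');
    INSERT INTO emp  VALUES (10, 1, 'Ann'), (11, 1, 'Bob');
    INSERT INTO proj VALUES (100, 1, 'Parser');
""")

# The FROM clause models the hierarchy; SELECT and WHERE state the question.
query = """
    SELECT d.dept_name, e.emp_name, p.proj_name
    FROM dept d
    LEFT OUTER JOIN emp  e ON e.dept_id = d.dept_id
    LEFT OUTER JOIN proj p ON p.dept_id = d.dept_id
    WHERE d.dept_name = 'Research'
    ORDER BY e.emp_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Adding or dropping a column in the SELECT list is the only change needed to ask a different question of the same structure.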

Much has been made of XQuery’s supposedly close mapping to SQL’s SELECT, FROM, and WHERE, but XQuery’s comparable SELECT and FROM operations are deeply embedded within
XQuery’s looping logic. XQuery requires the user to write procedural logic using the FOR and LET clauses of its FLWOR expression, which control the looping constructs. In addition, the
data selected for output is placed at strategic locations throughout the looping logic. The essence of the query is embedded in the processing logic, so any change has to be reprogrammed. The
query user has to know the structure being accessed and be familiar with nontrivial access techniques and principles of hierarchical databases.
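
The shape of this kind of logic can be sketched in Python, used here only as a stand-in for XQuery. It is an analogue and not XQuery syntax, and the document, element names, and research_report function are hypothetical. The loop plays the role of FOR, the intermediate variables play the role of LET, and the output items are picked up at different points inside the loop body.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document with two paths (emp and proj) under each dept.
doc = ET.fromstring("""
<depts>
  <dept name="Research">
    <emp>Ann</emp>
    <emp>Bob</emp>
    <proj>Parser</proj>
  </dept>
  <dept name="Sales"/>
</depts>
""")

def research_report(root):
    results = []
    for dept in root.findall("dept"):                    # FOR: iterate over departments
        if dept.get("name") != "Research":               # WHERE: filter inside the loop
            continue
        emps = [e.text for e in dept.findall("emp")]     # LET: navigate the emp path
        projs = [p.text for p in dept.findall("proj")]   # LET: navigate the proj path
        results.append((dept.get("name"), emps, projs))  # RETURN: assemble the output
    return results

report = research_report(doc)
print(report)
```

Returning one more item, for example a dependent below each employee, means editing the body of the loop rather than a single list of output names.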

Dynamic Query Generation with the SELECT List

The SELECT list of the SQL/XML prototype is used to specify the desired output data quickly and dynamically, and it automatically controls the range of processing
necessary to satisfy the query request. Simply adding or removing data items in the SELECT list automatically adds or removes access to different portions of the overall virtual
structure being processed, in a true nonprocedural fashion.
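
One way to picture this is as a pruning step driven by the SELECT list. The sketch below is an assumption about how such a step might look, not the prototype’s actual implementation; the node names, the PARENT and COLUMN_NODE tables, and the nodes_to_access function are all hypothetical.

```python
# Hypothetical hierarchy, stored as child -> parent (None marks the root).
PARENT = {"dept": None, "emp": "dept", "dependent": "emp", "proj": "dept"}

# Hypothetical mapping from each selectable column to the node that owns it.
COLUMN_NODE = {
    "dept_name": "dept",
    "emp_name": "emp",
    "dep_name": "dependent",
    "proj_name": "proj",
}

def nodes_to_access(select_list):
    """Return the nodes that must be read to satisfy the SELECT list.

    Each selected column pulls in its own node plus every ancestor up to
    the root; paths that hold no selected column are never added.
    """
    needed = set()
    for column in select_list:
        node = COLUMN_NODE[column]
        while node is not None and node not in needed:
            needed.add(node)
            node = PARENT[node]
    return needed

print(sorted(nodes_to_access(["dept_name", "dep_name"])))
print(sorted(nodes_to_access(["proj_name"])))
```

In this sketch, selecting only proj_name leaves the emp and dependent path untouched, while adding dep_name to the list brings that path back in without any other change.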

In XQuery, changing which data is desired often requires changes to the looping logic, which means recoding the query and then
re-testing it. XML functions with variables do allow some pre-planned flexibility in controlling the operation of the query, but they do not compare to the SQL/XML prototype’s ability to
configure query processing automatically from the SQL SELECT list and limit the area of the defined structure that requires processing.

Nonprocedural Navigationless Operation

The SQL nonprocedural and navigationless operation of the SQL/XML prototype means that no procedural navigation by the user is necessary. This allows the user to be nontechnical and have no
knowledge of the data’s hierarchical structure. The user does not need to know the location of the data in the structure or care if the data is located on more than one hierarchical pathway.
Nonprocedural processing also means that the SQL processing engine is performing the hierarchical processing correctly even though there is no standard for hierarchical processing today.

XQuery requires XPath navigation, which is a form of procedural logic that requires user knowledge of the structure being processed and of how to navigate that structure using
XPath. Correct hierarchical processing is left up to the user, and it is made extremely difficult because the navigation logic must itself take hierarchical processing into consideration. The rules
for correct hierarchical processing can get quite complex depending on the query and structure.

Full Nonlinear Hierarchical Query Processing

The SQL/XML prototype’s hierarchical processing has taken SQL from simple linear single-path hierarchical processing to fully principled multi-path nonlinear hierarchical
processing by recognizing SQL’s full multi-path hierarchical capability. The additional semantics available in multi-path queries are automatically utilized to enable many new hierarchical
capabilities that make more use of the customer’s data and greatly increase the power of the query, the value of the data, and the number of different queries possible.

XQuery cannot realistically process multi-path queries because it relies on procedural navigation, which becomes too complex to code by hand for multi-path hierarchical processing. Multi-path
processing also requires following specific hierarchical processing principles and logic that can quickly become very complex and demand exact coding. These requirements become increasingly
complicated with multiple layers and combinations of hierarchical processing procedures as the number of paths referenced increases. This makes it impractical for XQuery to perform multi-path processing.

Structure Aware Operation

The SQL FROM clause of the SQL/XML prototype specifies the input source objects and how they are related hierarchically using the Left Outer Join. The SQL/XML prototype analyzes how the
Left Outer Joins model the data to dynamically determine the exact hierarchical structure being processed. This makes the SQL/XML prototype aware of the hierarchical
structure and lets it naturally enhance and extend its hierarchical structure processing capabilities for advanced XML database processing. It also will not let the user specify a structure
that is not hierarchically valid, which ensures that the results are always hierarchically correct.

XQuery is unaware of the structure of the data it is processing. It is surprising that XQuery supports operations such as its default relational Inner Join, which is not a hierarchical
operation and will invalidate hierarchical structures. XQuery can produce results that are not hierarchically correct, and it cannot dynamically enhance the query operation hierarchically. Since XQuery
is unaware of the structure it is processing, it cannot give a warning when hierarchical processing errors occur, which is quite possible with its procedural processing.
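
The effect of an Inner Join on a hierarchical structure can be seen with ordinary SQL. The following sketch uses Python’s sqlite3 module with hypothetical dept and emp tables; it is not XQuery and not the prototype, only a demonstration of the join semantics involved. A parent with no children survives a Left Outer Join but silently disappears under an Inner Join.

```python
import sqlite3

# Hypothetical data: the Sales department has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, dept_id INTEGER, emp_name TEXT);
    INSERT INTO dept VALUES (1, 'Research'), (2, 'Sales');
    INSERT INTO emp  VALUES (10, 1, 'Ann');
""")

def dept_names(join_kind):
    """List the departments that remain after joining dept to emp."""
    sql = f"""
        SELECT DISTINCT d.dept_name
        FROM dept d {join_kind} JOIN emp e ON e.dept_id = d.dept_id
        ORDER BY d.dept_name
    """
    return [name for (name,) in conn.execute(sql)]

outer_result = dept_names("LEFT OUTER")  # parent kept even without children
inner_result = dept_names("INNER")       # childless parent is dropped
print(outer_result, inner_result)
```

No error or warning is raised in the Inner Join case; the Sales department is simply missing from the result.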

Hierarchical Global Views and Optimization

The SQL/XML prototype supports ANSI SQL hierarchical global views. These views are composed of Left Outer Joins whose syntax and natural hierarchical use define entire multi-path hierarchical
structures. This knowledge of the hierarchical structure is valuable metadata that allows the views to be dynamically optimized hierarchically based on the data selected by each query,
which also controls the hierarchical output. This powerful semantic optimization removes hierarchical paths of the processed structure that do not require access for the active query. This means
that global views can be supported with no additional overhead, since each view is optimized to fit the query and accesses only the portion of the hierarchical structure needed to satisfy
it. A single reusable global view can therefore serve any single-path or multi-path query contained in the global structure with no overhead.

Since XQuery is not aware of the structure it is processing, it does not know if the structure is hierarchical, so it cannot hierarchically optimize hierarchical structures and global views
supported by functions. Nonhierarchical relational structures require all paths of the structure to be accessed to preserve the validity of the view. This is because missing data anywhere in
the nonhierarchical view (containing Inner Joins) can affect the results even if that data was not referenced. This limits XQuery to performing optimizations at the operational level and not at the
more powerful global hierarchical structure semantic level. It also means that the access requirements of every XQuery query have to be specifically defined by its procedural
navigation, which significantly limits XQuery’s query reuse.
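
The claim that unreferenced data can affect an Inner Join view, but not a Left Outer Join view, can be checked with plain SQL. This sketch uses Python’s sqlite3 module with hypothetical tables, and the dept_emp helper is an illustration only. It selects department and employee names, with and without a proj branch that the SELECT list never references. DISTINCT is used because flat SQL rows repeat parent data once per sibling row.

```python
import sqlite3

# Hypothetical data: Sales has an employee but no project.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, dept_id INTEGER, emp_name TEXT);
    CREATE TABLE proj (proj_id INTEGER PRIMARY KEY, dept_id INTEGER, proj_name TEXT);
    INSERT INTO dept VALUES (1, 'Research'), (2, 'Sales');
    INSERT INTO emp  VALUES (10, 1, 'Ann'), (20, 2, 'Cy');
    INSERT INTO proj VALUES (100, 1, 'Parser');
""")

def dept_emp(join_kind, include_proj):
    """Select dept and emp names, optionally keeping the unreferenced proj path."""
    proj_join = f"{join_kind} JOIN proj p ON p.dept_id = d.dept_id" if include_proj else ""
    sql = f"""
        SELECT DISTINCT d.dept_name, e.emp_name
        FROM dept d
        {join_kind} JOIN emp e ON e.dept_id = d.dept_id
        {proj_join}
        ORDER BY d.dept_name, e.emp_name
    """
    return conn.execute(sql).fetchall()

# Left Outer Join: removing the unreferenced path does not change the answer.
print(dept_emp("LEFT OUTER", True) == dept_emp("LEFT OUTER", False))
# Inner Join: removing the unreferenced path changes the answer.
print(dept_emp("INNER", True) == dept_emp("INNER", False))
```

With Left Outer Joins the unreferenced proj path can be dropped without changing the answer, which is the property that makes pruning safe. With Inner Joins the Sales row depends on data the query never asked for.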

Automatic and Correct Structured XML Output

Since the SQL/XML prototype is fully aware of the hierarchically processed result and what output data has been selected, it can automatically format the result as fully hierarchically structured
XML output based on the processed input hierarchical structure. Required transformations can be specified in ANSI SQL if needed. The structure of the XML result automatically adapts to changes
in the selected data and follows standard hierarchical processing operations, such as node promotion around nodes not selected for output. Nodes that have no
data selected are sliced out of the structure, but their descendant nodes are preserved if they have data selected for output. This is standard hierarchical processing and formatting,
which can be overridden if necessary to force empty nodes to be output to preserve the fully processed structure.
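
Node promotion can be sketched in a few lines of Python. This is an illustration under assumed names (the prune function and the (name, children) tuple layout are hypothetical), not the prototype’s formatter.

```python
def prune(node, selected):
    """Return the list of output nodes that replace `node`.

    A node is a (name, children) tuple. A node whose name is not in
    `selected` is sliced out, and any surviving descendants are promoted
    to take its place under the nearest selected ancestor.
    """
    name, children = node
    kept_children = []
    for child in children:
        kept_children.extend(prune(child, selected))
    if name in selected:
        return [(name, kept_children)]
    return kept_children  # promote descendants past the unselected node

# Hypothetical structure: dept -> emp -> dependent, and dept -> proj.
tree = ("dept", [("emp", [("dependent", [])]), ("proj", [])])

# emp has no data selected, so dependent is promoted directly under dept.
result = prune(tree, {"dept", "dependent"})
print(result)
```

In this sketch the unselected emp node is removed, dependent moves up to sit under dept, and the proj path drops out entirely because nothing on it was selected.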

XQuery’s formatted XML output is specified more formally and is more fixed. There is no automatic structure formatting because XQuery does not know the hierarchical structure being processed.
Dynamic changes to the output structure can be programmed into the execution logic, but any unplanned change in the output has to be coded and tied into the processing logic. The designers of
XQuery claim that this more fixed control of the output format is desirable, but full dynamic ad hoc query processing requires automatic output formatting. The
automatic output processing performed by the SQL/XML prototype also ensures that the output structure is based correctly on the hierarchical structure result and its processing. With
procedural processing, there is always the possibility of a semantic mismatch between the processed structure result and the desired output structure result specified by the user.

Effective Interactive Use

Since the SQL/XML prototype is fully nonprocedural, is navigationless, and automatically formats the hierarchical XML output, it can support interactive use effectively. In addition, the advanced
multi-path query processing, with its greater level of semantic information, automatically increases the value of the data and allows a higher level of processing that can be used in decision
support. Global views also greatly increase the ease and value of interactive use because they allow unlimited ad hoc query specifications from the same view.

XQuery’s procedural logic and navigation greatly limit its practical use for interactive operation. It is too programmatic for the user to specify effectively in an interactive session. In addition,
as mentioned previously, complex hierarchical rules must be followed to get correct results. This makes the query even more difficult to specify interactively, and incorrect results can be common
and go unnoticed. Another limitation is that XQuery’s ability to dynamically modify stored queries and functions is restricted by its fixed, hard-coded processing logic.

XQuery’s Over-Extended Problem

The problem with XQuery is that it is trying to solve all of XML’s processing needs. It is also trying to act as a replacement for SQL with its relational processing capability.
Unfortunately, this limits and conflicts with XQuery’s hierarchical processing capabilities, because it allows nonhierarchical results to be produced and output as XML, permitting invalid results.
XQuery also supports both XML markup and XML database data processing at the same time without distinguishing between them. This limits XML database processing because XQuery processing must be driven
by the user navigation that markup data requires. User navigation is not necessary for XML database data processing, which requires a more restricted set of hierarchical processing operations.
This more restricted set is what enables the full multi-path navigationless processing capability that the standard SQL processor of the SQL/XML prototype utilizes, and it avoids the user navigation that would otherwise allow invalid
hierarchical operations for database processing.

The SQL/XML prototype only processes hierarchical XML and hierarchically modeled relational database data, allowing it to always perform consistently correct multi-path hierarchical processing at
a full hierarchical level without user navigation. This navigationless operation also makes full hierarchical queries available to nontechnical users. The SQL/XML prototype does not attempt to process
markup data differently than database data. Markup data is primarily used in text strings. Such a string can be handled in SQL via an external function that knows how to process XML markup in a text string,
which needs to be processed in a less hierarchically strict manner where hierarchical proximity is taken into consideration.


The SQL/XML prototype is nonprocedural, which allows it to be used by untrained users, to support full unrestricted multi-path queries, to always produce correct hierarchical XML results automatically, and
to be used interactively. XQuery’s operation is procedural, requiring looping logic to be specified, and it cannot support these additional nonprocedural query capabilities.

In reality, XQuery is not a true query language from the user’s point of view. It is actually a high level procedural language that allows exact, unrestricted, and flexible control not
possible with a true nonprocedural language like the SQL/XML prototype, but it is always described and marketed as a nonprocedural query language. This is doing XQuery a disservice and is holding
back the XML database data market and industry from fully utilizing XML database hierarchical data structures and from always producing correct hierarchical results.

More information on the ANSI SQL SQL/XML hierarchical processor prototype is available at www.adatinc.com.


About Michael David

Michael is the founder of Advanced Data Access Technologies, Inc. Previously, he was the lead XML architect for NCR/Teradata, and served as their representative to the ANSI SQLX Group. Before that he was a staff scientist for Teradata and designed high level multi-featured SQL utilities. From his earlier career, he has more than 25 years of experience researching and designing commercial nonprocedural heterogeneous database hierarchical query processing products using flat, relational and hierarchical data. From this experience, he authored the book Advanced ANSI SQL Data Modeling and Structure Processing, as well as numerous papers and articles on this subject. His research on hierarchical and relational systems and data integration has resulted in discoveries that led to the development of an ANSI SQL transparent XML hierarchical processor prototype that integrates and processes relational and XML data at a full multipath hierarchical level. This also proves that inherent hierarchical processing capability is possible in ANSI SQL.
Contact Mike at mike@adatinc.com, and read his blog at www.adatinc.com/blog1.