Data Publishing Architecture for the Extended Enterprise

Published in TDAN.com July 2003

Where small numbers of experts are responsible for analyzing large amounts of data, large server-based static Web reporting and data analysis systems have clearly shown their usefulness in the
enterprise. Users of such systems have access to all data in the same virtual space, and can perform a wide variety of analysis operations on their reports to satisfy a wide variety of needs.

While powerful, this architecture does not scale particularly well to large numbers of users without significant investments in both back-end server systems and front-end software licensed on a
per-user basis. To keep IT costs under control while servicing enterprise needs, the answer has been to provide MOLAP capabilities to a relative few “expert users”, with data analysis results
distributed to wide user populations via static reports.

Where large numbers of users of different levels throughout the extended enterprise need not only Web reports, but also the ability to perform free-form analysis on the data and reports they
receive, analysis computation must be transferred to client machines. This forms the basis of data publishing architecture. Multi-dimensional cubes of data are accessed by the user over the Web as
a form of content. Further, the highly efficient data publishing paradigm permits product licensing to be limited to the server, giving large populations of users access to free
“viewer-type” software of the kind that is now common on the Internet for audio, document and video files.

But unlike these content types, a client-side applet is also shipped to the user’s PC (and can then be cached for subsequent sessions) with each click on the content. While the user experiences
the virtual data space as it is explored, he or she has full analysis capabilities on the Web-top. So rather than being Web-enabled with a system that requires calculations and queries to revert
back to the server for processing, the user in a data publishing architecture is truly Web-based.


More Rapid Decision-Making Required by Many

The dual pressures of e-business productivity potential and challenging global economies are pushing executives for more rapid decision-making and hands-on analysis both inside and far beyond
corporate firewalls. This being the case, a data publishing architecture can find its place in disseminating data to large user audiences.

This article will outline the operational components and data path in this architecture, from raw data in a database to Web reports and data analysis available throughout the extended enterprise.
It will also describe how the combination of desktop and network-awareness of the software is used to provide additional functionality, including saving and sharing reports between sessions,
navigating a network of MOLAP cubes, and reaching back through to source data while doing analysis on the client.

On the server side, a design application samples the data and enables data publishers to quickly create the desired multi-dimensional data structures – dimensions, measures, etc. A
builder application then runs in batch mode or on request, converting data into highly compressed multi-dimensional files that are hosted on the Web server. Static or dynamic Web reports refer to
the data and analysis software and can be requested by the user’s browser as Web content. In sophisticated installations, design definitions can be generated dynamically by application logic (by
querying the user for data and presentation parameters) and a custom on-demand data file is delivered to the user.
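
The design and build steps described above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the CubeDefinition class, the build_cube function and the sample sales rows are hypothetical and are not taken from any particular product. The sketch only shows the general idea of a definition (dimensions and measures) being applied to raw rows to produce aggregated cells.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CubeDefinition:
    """Hypothetical output of the design step: which columns to keep."""
    dimensions: tuple  # columns used to slice the data
    measures: tuple    # numeric columns that are summed


def build_cube(definition, rows):
    """Hypothetical builder: one aggregated cell per dimension combination."""
    cells = defaultdict(lambda: dict.fromkeys(definition.measures, 0))
    for row in rows:
        key = tuple(row[d] for d in definition.dimensions)
        for measure in definition.measures:
            cells[key][measure] += row[measure]
    return dict(cells)


# Illustrative raw data, as it might be sampled from a database table.
rows = [
    {"region": "East", "product": "A", "amount": 100},
    {"region": "East", "product": "A", "amount": 50},
    {"region": "West", "product": "B", "amount": 70},
]
definition = CubeDefinition(dimensions=("region", "product"), measures=("amount",))
cube = build_cube(definition, rows)
```

In a real deployment the resulting cells would be written to a compressed file and hosted on the Web server; the sketch stops at the in-memory structure.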


Business Model Change Driving Technology Architecture Changes

Business efficiency and business model change are two major factors in driving innovation in technology and software architecture, so it’s worth noting that the analytics marketplace is still a
healthy one relative to overall declines in IT spending.

The reason for that relative health is that in tough economic times operationally-oriented, ROI-led IT projects get priority over “nice-to-have” (and often big-ticket) infrastructure investments. A recent
IDC study states that organizations that have successfully implemented and utilized analytic applications have realized returns ranging from 17 percent to more than 2000 percent with a median
return on investment of 112 percent.

As the enterprise extends outside the corporate firewall in pursuit of better efficiencies, analytical applications are extending as well. Established BI vendors are talking about “information
democracy,” while mass-market software companies – suspecting the market has moved beyond the early adopter stage – are now offering solutions. The realities of the extended enterprise
and a challenging economy mean more managers are expected to deliver better insights to corporate analysts as well as suppliers, customers, regulators and contract employees who all comprise the
decision-making DNA of today’s organizations.

The challenge is to find a reporting and data analysis technology architecture that can embrace not just dozens of experts, but large populations made up of thousands of unnamed ad hoc
information consumers who increasingly understand the value and potential of analytics to help them achieve their goals.


The Case(s) for Mass Deployment

Analytics, especially MOLAP, has traditionally been deployed to very small audiences of specialists. In most organizations, the analytics (and the data itself) are managed by and accessible only to an
exclusive group of users called upon to produce reports and analysis for various members of that organization. Driven from that “all-controlled-in-one-place” environment, analytics has
concentrated on working with ever larger amounts of data – only a small portion of which would ever be needed for any given task – and ever longer lists of features to simplify
producing a very wide variety of reports.

But as any CEO running a Web-enabled extended enterprise will attest, Web reporting and data analysis capability is too valuable to keep in the hands of a select few, or to be bottlenecked by a
structure that funnels everything through a small number of experts.

Mass deployment of MOLAP analytics is natural for many organizations where data is moving broadly internally, and especially where it must be made available to a remote or external audience.
E-billing applications are a perfect corporate example, but there are many more inside and outside the private sector:

  1. Financial institutions providing portfolio information, market statistics, etc. to their clients;
  2. Data services companies, market analysis companies, and a plethora of other organizations whose product is data, making all kinds of it available (for a price) to client audiences;
  3. Health/Medicare organizations looking to disseminate prescription cost and tracking data to users ranging from individual doctors right up to state health bureaucrats;
  4. Governments mandated to make various types of regulatory and economic data available to citizens over the Internet.

What is common about these applications?

  1. User populations can be very large, in the thousands or hundreds of thousands;
  2. Users are interested in a particular slice of data, which may be their own (in the case of e-billing, for example), or may be of particular interest to them (in the case of competitive
    intelligence data);
  3. The users are PC-enabled and online; and,
  4. Interpreting reports and performing data analysis is only a part-time interest to these users. They need easy-to-comprehend reports and ad hoc analysis capability, but the prospect of complex
    software installations, training classes or consulting manuals doesn’t interest them. They expect one-click gratification (as they get with other Web content from MP3s to PDFs).


The Mass Deployment Conundrum

Web reporting and data analysis software has followed the trends of most IT applications: Before the desktop revolution, users accessed central computers through dumb terminals. As personal
computers became more common, applications migrated to the desktop, where Business Intelligence as we know it today was really born in the early 1990s. When data warehousing became popular, that
data became the target of analysis applications, but it required moving the application back to the centralized computer, and so the client-server age of BI was born.

Web protocols have replaced proprietary client-server protocols in the current implementations of these applications, but otherwise, they have essentially stayed the same.

Scaling the number of users in a client-server environment stresses the shared resource – the server. The traditional solution is to move the server application to a bigger machine, or to
layer onto it elaborate load-balancing algorithms for running on clusters of machines. But this only pushes the boundary back incrementally when what is really needed is an architecture where
computational resources grow as the demand for them grows, that is, as the number of users grows.
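
The scaling argument can be made concrete with a little arithmetic. The sketch below is a deliberately simplified model, not a measurement: the function name compute_per_user and the capacity figures (a shared server worth 100 units, a PC worth 1 unit) are assumptions chosen only to show the shape of the two curves.

```python
def compute_per_user(users, architecture, server_units=100.0, client_units=1.0):
    """Compute capacity available to each user under a hypothetical model.

    client-server:   one shared server is divided among all users.
    data-publishing: every user brings a PC, so the share stays constant.
    """
    if users <= 0:
        raise ValueError("users must be positive")
    if architecture == "client-server":
        return server_units / users
    if architecture == "data-publishing":
        return client_units
    raise ValueError(f"unknown architecture: {architecture}")


# Per-user share for growing audiences (illustrative numbers only).
shares = {
    n: (compute_per_user(n, "client-server"), compute_per_user(n, "data-publishing"))
    for n in (100, 1_000, 10_000)
}
```

Under these assumptions the shared-server share shrinks in proportion to the audience, while the per-user share in the second case does not change. A bigger server raises server_units but does not alter that trend.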


The Data Publishing Architecture

The data publishing architecture is based on the premise that interacting with Web reports and performing data analysis is best done on the end-user’s desktop. A server should be used to deliver
data and analysis software as content; the server must not be a shared computational element or effective scaling is simply not possible.

The second key driver in shaping such an architecture is the nature of the Web as a means to deliver content. The data, formatted reports and the analysis application should reach a user in the
same way – and as effortlessly – as any other Web content. It should be as easy to view as static content like text and images, and as easy to manipulate as active content like forms or
video clips.

The primary components in the architecture are the designer, the builder, and the Web reporting and analysis client-side applet. In the simplest scenario, an administrator uses the designer to
prepare a profile or definition, runs the builder, and links the resulting MOLAP cube into a web site. The flow of data through this process is shown in Figure 1.

 

In a standard environment, published data and Web reporting/analysis software are delivered to the client as would any other Web content, causing no greater load on the server than any other
content. Whatever the number of users, the computing power they require grows with them since the analysis is done on each user’s PC.
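
To illustrate what "analysis on each user's PC" can mean in practice, here is a minimal sketch of two common operations run entirely against a cube that has already been downloaded. The cube layout (a dictionary keyed by region, product and month) and the function names roll_up and slice_cube are assumptions for illustration, not the format used by any specific viewer.

```python
from collections import defaultdict

# Hypothetical cube as received by the client: (region, product, month) -> sales.
DIMENSIONS = ("region", "product", "month")
cube = {
    ("East", "A", "Jan"): 150,
    ("East", "B", "Jan"): 40,
    ("West", "A", "Jan"): 70,
    ("West", "A", "Feb"): 30,
}


def roll_up(cube, dimensions, keep):
    """Sum the cube down to the dimensions named in `keep`."""
    positions = [dimensions.index(d) for d in keep]
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[i] for i in positions)] += value
    return dict(totals)


def slice_cube(cube, dimensions, dimension, member):
    """Keep only the cells where `dimension` equals `member`."""
    position = dimensions.index(dimension)
    return {k: v for k, v in cube.items() if k[position] == member}


by_region = roll_up(cube, DIMENSIONS, ("region",))
january_by_product = roll_up(slice_cube(cube, DIMENSIONS, "month", "Jan"), DIMENSIONS, ("product",))
```

Neither call contacts the server; each works only on the local copy of the cube.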

In a client-server system, by contrast, the user suffers network latency every time they interact with the data – not to mention waiting for computing time on the shared server – and cannot
access or analyze data and reports when disconnected.

In this architecture the server is not used as a computing resource, but it is, of course, still available for other services that may extend the value of an analysis session. The user may, for
example, wish to drill through to the source records behind a presented Web report, or share a particular view of the data with others. Or the data publisher/administrator may wish to provide links
from the report to other parts of the web site – possibly other MOLAP cubes – or to pages on other web sites.


Comparison of Data Publishing to Client/Server Architectures

The data publishing architecture essentially offers a MOLAP solution where cube size is kept small by building cubes with subsets of data for specific problems – those that a single user
would concentrate on at any given time. It takes advantage of the compression inherent in cube formation due to record aggregation, without suffering the data explosion experienced with large data
sets. The cubes can also be further compressed to reduce the cost of transporting them to the client. Commonly, megabytes of raw data are compressed into a cube of tens or hundreds of kilobytes,
suitable for access using standard Web-top/IP technology.
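
The two effects described here, aggregation followed by further compression, can be tried out with a small sketch. The data is synthetic and the sizes it produces say nothing about real-world ratios; the record layout, the JSON serialization and the use of zlib are all assumptions chosen to keep the example self-contained.

```python
import json
import zlib
from collections import Counter

# Synthetic raw data: 10,000 sales records over 3 regions, 4 products, 12 months.
REGIONS = ("East", "West", "North")
PRODUCTS = ("A", "B", "C", "D")
records = [
    {
        "region": REGIONS[i % 3],
        "product": PRODUCTS[(i // 3) % 4],
        "month": (i // 12) % 12 + 1,
        "amount": 10 + i % 7,
    }
    for i in range(10_000)
]


def aggregate(records):
    """Collapse records into one summed cell per (region, product, month)."""
    cells = Counter()
    for r in records:
        cells[(r["region"], r["product"], r["month"])] += r["amount"]
    return cells


def packed_size(cells):
    """Size in bytes of the cells after JSON serialization and zlib compression."""
    payload = json.dumps([[*key, total] for key, total in sorted(cells.items())])
    return len(zlib.compress(payload.encode("utf-8")))


cells = aggregate(records)
raw_size = len(json.dumps(records).encode("utf-8"))
cube_size = packed_size(cells)
```

Because many records fall into the same cell, the cube holds far fewer entries than the source, and the compressed file is smaller again. How large the saving is in practice depends on the data and on how many dimensions the cube keeps.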

Figure 2 shows a comparison of data publication to ROLAP and MOLAP client/server architectures.

 


Extensions of the Basic Data Publishing Architecture

There are a few notable variations on the simple data publishing deployment scenario described thus far, including scheduled cube building, cubes on demand, interacting with networked components
and collaboration.


Scheduled Cube Building

The simple scenario dealt with a monolithic block of static data. In many applications, the data changes over time, and the data is subset for different purposes. An example would be e-government
demographic data which naturally changes over time, and would be subset in different ways, depending on the goals of the user.

The administrator may define a small set of cube definitions, representing the different types of problems users are interested in solving. Those cube definitions can be applied to different
subsets of data for different time periods, for example, or for different geographic regions. In such a case, the administrator would set up scheduled cube builds for each cube definition, with
parameters for each build to cause it to use the desired subset of data.
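
One way to picture such a schedule is as the cross product of cube definitions and build parameters. The sketch below assumes hypothetical definition names, regions, years and an output file naming scheme; a real scheduler would hand each job to the builder at the configured time.

```python
from itertools import product

# Hypothetical cube definitions prepared by the administrator.
CUBE_DEFINITIONS = ("population_by_age", "income_by_household")
# Hypothetical build parameters, each selecting a subset of the source data.
REGIONS = ("north", "south")
YEARS = (2002, 2003)


def plan_builds(definitions, regions, years):
    """Return one build job per (definition, region, year) combination."""
    return [
        {"definition": d, "region": r, "year": y, "output": f"{d}_{r}_{y}.cube"}
        for d, r, y in product(definitions, regions, years)
    ]


jobs = plan_builds(CUBE_DEFINITIONS, REGIONS, YEARS)
```

Two definitions, two regions and two years yield eight jobs, which shows how quickly the number of scheduled cubes grows with each added parameter.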


Cubes on Demand

While the scenario above makes it easy to generate the data cubes for a web site on a fixed schedule, it may be that there are so many possible cubes that generating and storing them is
unattractive. On the other hand, it may be that only an unpredictable subset of the possible cubes would be accessed during a generation cycle, so it would be wasteful to generate them all each
time.

A more appropriate setup for cube building in such an environment would have OLAP cubes built on demand, using parameters specific to (or provided by) the user when the request is submitted. This
makes the transaction with the server more than just a file retrieval, but the computations involved in analyzing the data are still performed on the user’s machine, maintaining the scalability of
the architecture.
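
A minimal sketch of building on demand follows. The OnDemandBuilder class, its parameters and the sample rows are hypothetical. Keeping built cubes in a cache is an added assumption rather than something the architecture requires; it is included because repeated requests for the same cube are a likely pattern.

```python
class OnDemandBuilder:
    """Hypothetical server-side helper that builds a cube only when requested."""

    def __init__(self, rows):
        self.rows = rows
        self.cache = {}   # (region, year) -> built cube
        self.builds = 0   # number of builds actually performed

    def get_cube(self, region, year):
        key = (region, year)
        if key not in self.cache:
            self.builds += 1
            cube = {}
            for row in self.rows:
                if row["region"] == region and row["year"] == year:
                    group = row["age_group"]
                    cube[group] = cube.get(group, 0) + row["population"]
            self.cache[key] = cube
        return self.cache[key]


# Illustrative source rows.
ROWS = [
    {"region": "north", "year": 2002, "age_group": "0-17", "population": 120},
    {"region": "north", "year": 2003, "age_group": "0-17", "population": 125},
    {"region": "north", "year": 2003, "age_group": "18-64", "population": 300},
    {"region": "north", "year": 2003, "age_group": "18-64", "population": 20},
    {"region": "south", "year": 2003, "age_group": "0-17", "population": 90},
]
builder = OnDemandBuilder(ROWS)
```

The server does some work per request here, but only to assemble the file. All subsequent slicing and aggregation of the delivered cube still happens on the user's machine.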


Network Services That Extend Analysis

There are a number of reasons a user may want to interact with networked components while analyzing MOLAP data within a data publishing architecture:

  1. Retrieve a subset of the source data for the cube being analyzed;
  2. Save a report (the end-product of an analytics session) for another user to make use of;
  3. Navigate to a related cube of data – maybe a more detailed view of a particular element; and,
  4. Navigate to a related web page – maybe a document describing the data or the portion of it the user indicates.
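
The first of these services, retrieving the source data behind a cell, can be sketched as a simple filter. The record layout and the drill_through function are assumptions for illustration; in a real system the lookup would typically be a database query issued by a server-side service on behalf of the client.

```python
# Hypothetical source records held on the server.
SOURCE_RECORDS = [
    {"id": 1, "region": "East", "product": "A", "amount": 100},
    {"id": 2, "region": "East", "product": "A", "amount": 50},
    {"id": 3, "region": "West", "product": "B", "amount": 70},
]


def drill_through(records, cell):
    """Return the source records whose dimension values match the selected cell.

    `cell` maps dimension names to the members the user has selected,
    for example {"region": "East", "product": "A"}.
    """
    return [r for r in records if all(r[d] == v for d, v in cell.items())]


east_a = drill_through(SOURCE_RECORDS, {"region": "East", "product": "A"})
```

Only the matching subset travels back to the client, so the request stays small even when the source table is large.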


Collaboration

An important aspect of data analysis is the saving of self-customized reports derived from online analysis sessions. Once saved on the server, these reports are not only available to the user in
later sessions, but can be shared with other users. And because data and analysis are delivered as content in a format similar in size to many email attachments, it’s entirely viable to email a
URL to the analytics content within a spontaneously created workgroup. Also, if an IT administrator permits it, members of the workgroup can then save their own self-customized reports on the same
OLAP cube. This makes analytics an integral part of work-sharing/collaboration projects – the new team sport in e-business.
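
A saved report of this kind can be thought of as a small description of a view that points at a cube rather than copying it. The sketch below assumes a hypothetical JSON layout and an example URL; the field names are illustrative only.

```python
import json


def save_report(cube_url, rows, columns, filters):
    """Serialize a report view; the cube itself is referenced, not copied."""
    view = {"cube": cube_url, "rows": rows, "columns": columns, "filters": filters}
    return json.dumps(view, sort_keys=True)


def load_report(text):
    """Restore a saved view so another user can reopen it."""
    return json.loads(text)


# Hypothetical report: sales of product A, regions down the side, months across.
saved = save_report(
    "https://example.com/cubes/sales.cube",
    rows=["region"],
    columns=["month"],
    filters={"product": "A"},
)
restored = load_report(saved)
```

Because the saved view is only a short piece of text, it is cheap to store on the server and easy to pass around by URL.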


Conclusion

The data publishing architecture described in this article meets the needs of a Web-enabled extended enterprise seeking to drive decision-making inside and beyond the corporate firewall without
the burden of per-user licensing costs. Equipped with nothing more than a browser and connection to the Internet, large populations of ad hoc information consumers can have one-click access to Web
reports and data analysis that is made available as highly efficient Web content files.

After that click, these people are able to create their own self-customized reports that can in turn be shared and modified within spontaneously created workgroups. For financial, health, data
services and e-government organizations seeking the documented ROI of analytics but unable to afford the hardware infrastructure or named user license schemes of traditional client-server analytics
systems, this architecture is a breakthrough worth serious investigation.

About Michael Kelly

A Databeacon co-founder, Mike Kelly brings a Ph.D. in Electrical Engineering and more than ten years of related experience in the computing industry to Databeacon. Much of this was at the Computing Research Laboratory at Nortel Inc., creating innovative products utilizing the emerging technologies from industry and academia. Most recently, his focus was on the design and development of client-server applications, network-based programming, and the design and creation of Java-based software applications. www.databeacon.com
