Published in TDAN.com January 2000
Publisher’s note: This paper was extracted from the book: “Building Corporate Portals with XML”, by Clive Finkelstein and Peter Aiken, published by McGraw-Hill in September 1999 [ISBN:
0-07-913705-9]. The paper, serving as an introduction to the book, references other chapters of the book that are not included with this paper.
One thing we are not short of today is information. We are swimming in it! Our information comes from traditional printed sources such as books, magazines, newspapers, subscription reports and
newsletters; from audio sources such as radio; from video sources such as free-to-air television or cable TV; from email and from word-of-mouth. The saving grace with these information sources –
apart from radio and free-to-air TV – is that they are limited only to those who have subscribed to receive that information.
Not any more. Even today, and certainly more so in the future, each of these sources is moving to the Internet. They are offered as free services, where the cost of preparation is paid not by
subscription but by advertising. Even word-of-mouth, previously a reliable source of information from people you knew personally and whose opinion you respected, has moved to the Internet in
newsgroups and chat rooms – but with opinions offered by people, perhaps in another country, who are totally unknown to you. Both accurate and inaccurate comments now circle the globe not at
word-of-mouth speed, but at electronic speed.
Email is the killer application of the Internet; even more so of the corporate Intranet. Enormous knowledge is retained in corporate email archives – much to the chagrin of Microsoft, with certain
email messages used by government prosecutors in the Microsoft Antitrust trial as smoking guns to illustrate alleged abuses of monopoly power. Corporate email is a knowledge resource that is of
great value, yet until now it has been largely inaccessible.
Text searches on the Internet by traditional search engines are largely ineffective; a simple query can return thousands of links containing the entered keywords or search phrase. Only a small
fraction of these may be relevant, yet each link must be manually investigated to assess its content – if relevancy ratings are not also provided.
The problem is no less severe with enterprises. We are inundated with information. To the credit of the Information Technology (IT) industry, at least this information is being organized and made
more readily available through Data Warehouses. We discuss the building of Data Warehouses extensively in this book.
Most information in Data Warehouses is based on structured data sources, such as the operational databases used by older legacy systems and relational databases. Data Warehouse products that use
Internet technologies are also now becoming available. These valuable information tools can now be used within an enterprise across the corporate Intranet. The information is thus more readily available.
We discussed earlier that structured data represents only 10% of the information and knowledge resource in most enterprises. The remaining 90% exists as unstructured data that has been largely
inaccessible to Data Warehouses. Text documents, email messages, reports, graphics, images, audio and video files all are valuable sources of data, information and knowledge that have been
untapped. They exist in physical formats that have been difficult to access by computer – as if they were behind locked doors.
The technologies are now available to open these doors. XML is one technology, as we have briefly seen. XML enables structured and unstructured data sources to be integrated easily, where this was
extremely difficult before. Organizations will develop new business processes and systems based on this integration, using Business Reengineering and Systems Reengineering methods. They will at
last be able to break away from the business process constraints that have inhibited change in the past.
Process Technologies in the Industrial Age
Most organizations today still use processes based on principles that are no longer effective. They were designed using the process engineering “bible”. Here is a short quiz: which book are we
referring to? Who was the author? When was it published?
Was the process engineering bible written by Michael Hammer, acknowledged by many as the “Father” of Business Process Reengineering [Hammer 1990]? Was it [Hammer and Champy 1993]? No, it was
before them …
Was it written by Ed Yourdon, Tom DeMarco, Ken Orr or Gane and Sarson – all giants of the Structured Software Engineering era, which was process-driven? No to all of these …
Was it written by W. Edwards Deming, regarded by many as the “Father” of the quality movement? No, not him …
What about Peter Drucker, considered the “Father” of management gurus? Not him, either …
Was it Henry Ford, the “Father” of the assembly line? No, not him …
Yet each of these giants has contributed in his own way to improve the design, operation and functioning of enterprises and of information systems. We owe them all our thanks; we are in
their debt. They contributed greatly to the theory and practice of management, of organization and process design, of systems design and development. We draw on their works many times throughout
this book.
No, the process engineering bible was written long before each of these esteemed gentlemen.
We are in fact referring to “The Wealth of Nations” by Adam Smith, first published in 1776 and reprinted as [Smith 1910]. This has been the basis of most business processes used
in enterprises today!
Expressing what he wrote, but in today’s terminology, Adam Smith took complex processes and broke them down into simple steps. These were then carried out using the technology of his day – a
workforce that was largely illiterate. He showed that people could be trained to carry out these simple process steps, which they repeated endlessly. He then combined each of these steps in
different ways to build complex processes. While we have greatly simplified what he wrote and translated it into today’s environment, essentially this was its impact. For these became the
processes that fueled the Industrial Age.
Organizations grew as complex processes were built in this way. Manual work was supplemented by other technologies. Mechanical, electrical, electronic and other
technologies led to corresponding engineering disciplines: mechanical engineering, electrical engineering and so on. Yet the basic principle behind all of these processes was the work done by Adam Smith.
Henry Ford made a great contribution, with the assembly line. But still essentially the same approach was being used to design processes. And as these processes were automated, they were
implemented on computer in much the same way as the processes were carried out in the enterprise. The computer was used basically to do the same tasks, yet faster and more accurately.
The processes referred to relevant data. Each part of the enterprise maintained its own copy of the data that was required. As the processes were automated, the data was also automated. The same
data was implemented often in different versions, redundantly. The Information Engineering (IE) methodology, developed from 1976, was designed to address this problem – evolving in the mid 1980s
into Enterprise Engineering (EE) [Finkelstein 1981a, 1981b, 1989, 1992].
By the late 1980s, the inhibiting factor in the effectiveness and operation of processes in many enterprises was seen to be due to this evolutionary approach to business process design. The
Business Process Reengineering (BPR) revolution of the early 1990s began to address these problems. This was largely started by Michael Hammer in his landmark paper, provocatively titled:
“Reengineering Work: Don’t Automate, Obliterate!” [Hammer 1990].
XML and Enterprise Portals offer technologies that will progress these methods further. We will discuss their impact on Business Reengineering and on Systems Reengineering in Chapters 12 and 13.
Data Technologies in the Information Age
Our focus in this book is on Data Warehouses and Enterprise Portals. Data Warehouses provide access to structured data as discussed earlier. We will discuss data, warehouses and engineering later
in this chapter. We introduce Enterprise Portals here.
We believe the term “Enterprise Information Portal” (EIP) was first used in a report published by Merrill Lynch on November 16, 1998. A summary of this report is available from the [SageMaker]
web site. The full report can be downloaded as an Adobe Acrobat Portable Document Format (PDF) file from this same web site. The Merrill Lynch summary and report define EIPs as:
The Merrill Lynch report and summary highlight the emergence of Enterprise Information Portals as an investment opportunity for their clients and others. InfoWorld presented a summary of the report
as a Front Page article of the January 25, 1999 issue. A copy of that article is available from the [InfoWorld Electric] web site. A financial summary of the potential of the EIP market from the
Merrill Lynch report was provided in the InfoWorld article. This is reproduced here as Figure 1.2.
The summary states: “We have conservatively estimated the 1998 total market opportunity of the EIP market at $4.4 billion. We anticipate that revenues could top $14.8 billion by 2002,
approximately 36% CAGR (Compound Annual Growth Rate) for this sector.”
As Figure 1.2 illustrates, software is required for Content Management, which is projected to grow from a market worth $1.2 billion in 1998 to one worth $4.7 billion in 2002. Products in the
Business Intelligence EIP market are expected to grow from $2.0 billion to $7.2 billion. The Data Warehouse and Data Mart EIP market is projected to grow from nearly $1 billion to $2.5 billion,
while the Data Management market will grow from $184 million to $360 million. The total EIP market therefore was projected in the Merrill Lynch report to grow from $4.4 billion to $14.8 billion
over the period 1998 to 2002.
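The arithmetic behind these projections can be checked with a short sketch. The segment figures below are those quoted above; treating “nearly $1 billion” as exactly 1.0 is our assumption, and the function and variable names are illustrative only. Computed this way, the segment figures sum to roughly the quoted totals, and the growth from $4.4 billion to $14.8 billion over four years works out to a little over 35% per year, close to the rounded figure quoted in the summary.

```python
# Segment projections in billions of US dollars (1998, 2002), as quoted in the text.
# Assumption: "nearly $1 billion" for Data Warehouse and Data Mart is taken as 1.0.
segments = {
    "Content Management": (1.2, 4.7),
    "Business Intelligence": (2.0, 7.2),
    "Data Warehouse and Data Mart": (1.0, 2.5),
    "Data Management": (0.184, 0.360),
}


def cagr(start, end, years):
    """Compound Annual Growth Rate: the constant yearly rate that grows start into end."""
    return (end / start) ** (1.0 / years) - 1.0


years = 2002 - 1998

# Sum the segments to compare against the quoted market totals.
total_1998 = sum(start for start, _ in segments.values())
total_2002 = sum(end for _, end in segments.values())

# Growth rate implied by the quoted totals of $4.4 billion and $14.8 billion.
overall_cagr = cagr(4.4, 14.8, years)

for name, (start, end) in segments.items():
    print(f"{name}: {cagr(start, end, years):.1%} per year")
print(f"Segment totals: {total_1998:.2f} -> {total_2002:.2f} billion")
print(f"Overall CAGR from quoted totals: {overall_cagr:.1%}")
```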
Discussing the potential of the EIP market, the authors of the Merrill Lynch report believe it will “eventually reach or exceed the investment opportunities provided by the Enterprise Resource
Planning (ERP) market.” They give three main reasons why: “Enterprise Information Portals will emerge from a consolidation within and between the Business Intelligence, Content
Management, Data Warehouse, Data Mart and Data Management markets:
- EIP systems provide companies with a competitive advantage: Corporate management is just realizing the competitive potential lying dormant in the information stored in its
  enterprise systems. … EIP applications combine, standardize, index, analyze and distribute targeted, relevant information that end users need to do their day-to-day jobs more efficiently
  and productively. The benefits include lowered costs, increased sales and better deployment of resources.
- EIP systems provide companies with a high return on investment (ROI): The emergence of ‘packaged’ EIP Applications are more attractive to customers because they
  are less expensive than customized systems, contain functionality that caters to specific industries, are easier to maintain and faster to deploy. … EIP products help companies cut costs
  and generate revenues.
- EIP systems provide access to all: The Internet provides the crucial inexpensive and reliable distribution channel that enables companies to make the power of information
  systems available to all users (employees, customers, suppliers). Distribution channels include the Internet, Intranet and Broadcasting. … Companies will need to use both “publish”
  (pull) and “subscribe” (push) mediums to ensure the right information is available or distributed to the right people at the right time.”
Source: [InfoWorld Electric] Web Site and Merrill Lynch.
They go on to say that they: “envision the Enterprise Information Portal as a Browser-based system providing ubiquitous access to business related information in the same way that Internet
content portals are the gateway to the wealth of content on the Web.”
The Merrill Lynch report and the InfoWorld Front Page article triggered a flurry of articles in other publications. Software companies in these markets scrambled to refocus their software
development plans to deliver products for the new emerging market that had been identified.
Enterprise Information Portal Directions
The market potential had been identified, the software vendors had begun to develop products, but there was no clear definition of the EIP market apart from general directions in the Merrill Lynch
report. And there was no technical guidance that would help software vendors and their enterprise customers to build these Enterprise Information Portals.
The report also affected us, your authors. We had been writing a book on Data Warehousing. Our purpose was to publish a book that would help enterprises move their Data Warehouses and Data
Marts to the Internet, Intranet and Extranet. We felt that this would provide benefit to the enterprises, their employees, customers, suppliers and business partners.
This was a difficult task, as another author whom we respect had found. Richard Hackathorn had published “Web Farming for the Data Warehouse” [Hackathorn 1999]. He was writing
this around the time when the groundswell of support for XML had begun to build up following its acceptance as a recommended standard by the W3C Committee in February 1998 [W3C].
As discussed earlier in this chapter, XML is a technology that enables many applications and databases to overcome the great constraints of legacy systems and databases that had evolved as
redundant data versions. We also saw it as an important component in moving Data Warehouses and Data Marts to the Web. The Merrill Lynch report identified the market potential that justified what
was, until then, just a “gut feel” for us. It also highlighted a glaring omission: the absence of clear technical direction on how to build for this new environment. As authors, we do not pretend to
have all of the answers. But this is our field: we have built many Data Warehouses, Data Marts, Web Sites and Electronic Commerce applications as consultants, instructors and webmasters over many
years.
We will share our knowledge with you in this book. The three of us, together, will discuss problems and solutions. And there will be others after us who will add more, based on their experience.
They will also write, or consult or teach: this new discipline will further evolve – that is the nature of the Information Technology industry.
Enterprise Portal Terminology
A number of terms have emerged along with the growing interest in Enterprise Information Portals. Internet Content Portals such as NetCenter (Netscape), MyYahoo (Yahoo), MSN (Microsoft) and AOL
became popular in 1998 as a central point that could be visited by millions on the Internet – as a gateway or jumping-off point to other locations on the World Wide Web. Some of these are content
providers; others are search engines. The terminology differs, but we feel a general term describing all of these is “Internet Portal”. This is the term we will use in this book for
reference to WWW consumer portals.
In the many articles that have appeared since publication of the Merrill Lynch report and the InfoWorld article, the terms “Enterprise Information Portal” (EIP), “Corporate Portal” (CP) and
“Enterprise Portal” (EP) have been variously used.
“Enterprise Information Portal”, being the first used, is the obvious term. But we find many articles are using “Enterprise Portal” and “Corporate Portal” as equivalent
terms to refer to an EIP. This is a new field and the terminology has not settled yet. So we will use all three terms interchangeably in this book to refer to portals for all enterprises: large
Corporations; Small or Medium Enterprises (SMEs); Federal, State or Local Government departments; and Defense departments.
Enterprise Portal Concepts
We will introduce some of the basic concepts of an Enterprise Portal in this section, with related concepts covered later in this chapter. The remainder of the book will progressively introduce you
to the concepts and methods that can be used to build Enterprise Portals.
In Part I: Enterprise Portal Design, there is a great parallel with Data Warehouse design. Part II: Enterprise Portal Development also parallels Data Warehouse development. Our
focus in these two parts is therefore mainly on Data Warehouses and Data Marts.
In Part III: Enterprise Portal Deployment we cover XML in Chapter 11. XML is an enabling technology that offers great benefit for Business Reengineering and Systems Reengineering. These
are covered in Chapters 12 and 13. Many enterprises are struggling to move out from under the weight of legacy systems and processes that are not appropriate or responsive enough for the
Information Age. Enterprise Portals and XML will enable these enterprises to transform themselves more effectively, without first having to throw all those legacy systems away and develop new
systems at great cost. Chapter 14 addresses quality in these transformed enterprises.
Finally, in Chapter 15 we will return to discuss the central role of Enterprise Portals, summarizing the main points from the book.
The main concepts of Enterprise Portals are illustrated in Figure 1.3, from the InfoWorld article on the [InfoWorld Electric] web site. The focus of Data Warehouses is Structured Data,
shown in the top part of Figure 1.3. Source data is drawn from online transactional databases such as ERP applications, legacy files or other relational databases. Source data may
also be point of sale data. This source data is first extracted, transformed and loaded by ETL and data quality tools into Relational OLAP databases and/or the Data Warehouse.
Data marts take subject area subsets from the Data Warehouse for query and reporting. Analytical applications carry out OLAP analysis using OLAP tools. Business
Intelligence tools also provide analytical processing, such as EIS and DSS products. Data mining tools are used to drill down and analyze data in the warehouse. Warehouse management
operates to manage the ETL and data quality stage, the Relational OLAP databases and Data Warehouse and the analytical applications.
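The flow described above, from source data through ETL into the warehouse and on to a data mart, can be sketched in a few lines. This is a minimal illustration only, not taken from the book or from any product: the point-of-sale records, the table layout and the function names are all hypothetical, and an in-memory SQLite database stands in for the Data Warehouse.

```python
import sqlite3

# Hypothetical point-of-sale source records; names and values are illustrative only.
SOURCE_ROWS = [
    {"store": " north ", "product": "widget", "amount": "19.99"},
    {"store": "South", "product": "Widget", "amount": "5.00"},
    {"store": "north", "product": "gadget", "amount": "not-a-number"},
]


def extract():
    """Extract: read raw records from the source system (an in-memory list here)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: standardize text fields and drop rows that fail a basic quality check."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # data quality step: skip rows whose amount is not numeric
        clean.append((row["store"].strip().lower(), row["product"].strip().lower(), amount))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (store TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def build_data_mart(conn, store):
    """Data mart: a subject-area subset of the warehouse, here the sales of one store."""
    cursor = conn.execute(
        "SELECT product, SUM(amount) FROM sales WHERE store = ? GROUP BY product ORDER BY product",
        (store,),
    )
    return cursor.fetchall()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
north_mart = build_data_mart(warehouse, "north")
loaded_count = warehouse.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(north_mart, loaded_count)
```

In practice the extract, transform and load steps would be carried out by dedicated ETL and data quality tools, with warehouse management coordinating the stages; the sketch only shows the order in which the data moves.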
The bottom part of Figure 1.3 lists Unstructured Data sources that are used by Enterprise Portals. In Chapter 11 we see how XML can use meta data tags to integrate unstructured data
sources with the Structured Data sources above. These unstructured data sources are managed by a Content Management Repository as Content Management Applications and
Database.
Figure 1.3
Enterprise Portal Concepts. Source: [InfoWorld Electric] Web Site
While the Content Management Applications and Database are conceptual in Figure 1.3, we will see these referenced as XML databases later in the book. Enterprise Portals extend Data
Warehouses to the Intranet and Internet. But unlike Data Warehouses, which are data-driven, Enterprise Portals are also process-driven. They enable organizations to change their business processes
and workflow practices in dramatic ways. We introduce some of these ways when we discuss reengineering in Chapters 12 and 13. We cover many more changes and opportunities in Chapter 15.
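As a small illustration of how XML meta data tags can tie an unstructured source to structured data, consider the sketch below. It is our own hypothetical example, not one from the book: the email, the tag and attribute names and the customer record are all invented for illustration, and Python's standard xml.etree.ElementTree module is used to read the markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured data: rows as they might come from a relational customer table.
CUSTOMERS = {"C042": {"name": "Acme Pty Ltd", "region": "APAC"}}

# Hypothetical unstructured source: an email whose content carries XML meta data tags.
# The tag and attribute names are illustrative, not part of any standard vocabulary.
EMAIL_XML = """
<email>
  <from>sales@example.com</from>
  <subject>Renewal query</subject>
  <body>Customer <customer id="C042">Acme</customer> asked about renewing
  contract <contract>K-1187</contract> next quarter.</body>
</email>
"""


def integrate(xml_text, customers):
    """Join the tagged content of one email with the matching structured customer row."""
    root = ET.fromstring(xml_text.strip())
    body = root.find("body")
    customer_id = body.find("customer").get("id")
    record = dict(customers[customer_id])  # copy the structured fields
    record["customer_id"] = customer_id
    record["subject"] = root.findtext("subject")
    record["contract"] = body.findtext("contract")
    # itertext() yields the body text with the tags stripped, for full-text search.
    record["text"] = " ".join("".join(body.itertext()).split())
    return record


combined = integrate(EMAIL_XML, CUSTOMERS)
print(combined)
```

The tags identify what each piece of text means, so the email can be queried alongside the customer table rather than only searched as free text.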
[Aiken 1997] Aiken P, “Some (Incomplete) Thoughts on the Role of Maintenance in Systems Engineering”, in Joint Logistics Commanders/Joint Group on Systems Engineering Workshop, Functional
Working Group on Systems Engineering Life-Cycle Process. July 28-August 1, 1997. San Diego, CA.
[Aiken 1999] Aiken P, Ngwenyama O and Broome L, “Reverse Engineering New Systems”, IEEE Software, (March/April 1999).
[Appleton 1984] Appleton D, “Business Rules: The Missing Link”, Datamation, (October 1984), 30(16):145-150.
[Drucker 1974] Drucker, P F, “Management: Tasks, Responsibilities, Practices”, (1974), Harper & Row: New York, NY.
[Finkelstein 1981a] Finkelstein C, “Information Engineering”, a series of six InDepth articles published in US Computerworld, (May-June, 1981), IDG Communications, Framingham: MA
[Finkelstein 1981b] Finkelstein C and Martin J, “Information Engineering”, two volume Technical Report, Savant Institute, (Nov, 1981), Carnforth, Lancs:UK
[Finkelstein 1989] Finkelstein C, “An Introduction to Information Engineering”, Addison-Wesley, (1989), Sydney, Australia
[Finkelstein 1992] Finkelstein C, “Information Engineering: Strategic Systems Development”, Addison-Wesley, (1992), Sydney, Australia
[Hackathorn 1999] Hackathorn R, “Web Farming for the Data Warehouse”, Morgan Kaufmann, (1999), ISBN: 1-55860-503-7. Includes use of XML for data sources from Internet (368 pages).
[Hammer 1990] Hammer M, “Reengineering Work: Don’t Automate, Obliterate”, Harvard Business Review, Cambridge: MA (Jul-Aug 1990).
[Hammer and Champy 1993] Hammer M and Champy J, “Reengineering the Corporation: A Manifesto for Business Revolution”, Harper Business, (1993).
[InfoWorld Electric] InfoWorld Electric Web Site – http://www.infoworld.com/cgi-bin/displayStory.pl?/features/990125eip.htm
[Inmon 1993] Inmon B, “Data Architecture: The Information Paradigm”, (1993) QED Technical Publishing Group.
[ISO] ISO 11179:195-1996 “Information Technology – Specification and Standardization of Data Elements”.
[Martin 1986] Martin J, “Information Engineering”, Prentice Hall, (1986), Englewood Cliffs: NJ
[Moriarty 1992] Moriarty T, “Migrating the Legacy: as the Industry Migrates to the PC, don’t give up your Mainframe Products yet”, Database Programming & Design, Dec 1992, 5(12):73(2).
[Rechtin 1996] Rechtin E and Maier M, “The Art of Systems Architecting”, (November 1996) CRC Press; ISBN: 0849378362.
[Sage 1977] Sage A, “Introduction to Systems Engineering: Methodology and Applications – Part 1”, IEEE Transactions on Systems, Man, and Cybernetics July 1977, SMC-7(7):499-504.
[SageMaker] SageMaker web site – http://www.sagemaker.com/company/lynch.htm
[Smith 1910] Smith A, “The Wealth of Nations”, London: Dent (1910)
[WebFarming] Web Farming web site – http://www.webfarming.com/
[W3C] WWW Consortium: XML, XSL and XLL specifications – http://www.w3.org/
[Zachman 1987] Zachman J.A, “A Framework for Information Systems Architecture”, IBM Systems Journal 26(3):276-292 IBM Publication G321-5298.
[Zachman 1991] Zachman J.A, “Zachman Framework Extensions: An Update”, Data Base Newsletter July/August 1991 19(4):1-16.
[Zachman 1992] Zachman J. A and Sowa J.F, “Extending and Formalizing the Framework for Information Systems Architecture”, IBM Systems Journal (1992), Vol. 31, No. 3, IBM Publication G321-5488.
[Zachman 1996] Zachman J. A, “Concepts of the Framework for Enterprise Architecture”, (1996) Zachman International, Los Angeles: CA
This paper is an extract from the book: “Building Corporate Portals with XML”, by Clive Finkelstein and Peter Aiken, published by McGraw-Hill in September 1999 [ISBN: 0-07-913705-9]. The paper
addresses one of the most significant developments of the Computer industry for the future. It shows how Meta data and Data Administration will shortly move into the mainstream and become one of
the most important aspects of the WWW, and of systems development in general. The paper introduces the Extensible Markup Language (XML) – the successor to HTML for the Internet, for corporate
Intranets and for Extranets. XML incorporates Meta data in any document, to define the content and structure of that document and any associated (or linked) resources. It has the potential to
transform integration of structured data (such as in relational databases or legacy files) with unstructured data (such as in text documents, reports, email, graphics, images, audio and video
files) for innovative application integration opportunities.
Further information about the book and additional extracts can be read online from the Online Store at http://bne002i.webcentral.com.au/catalogue/visible/default.shtml. Click the Read Extract link
below the image of the book front cover on the Home page. The book can be ordered directly from Amazon.com by clicking the Order Book link.
© Copyright 1999 The McGraw-Hill Companies, Inc. All Rights Reserved.