Published in TDAN.com April 2003
  Abstract
  Business Intelligence (BI) is vital to the decision-making process within a company. Enterprises need to analyze ever-increasing amounts of data on a regular basis to better understand all aspects
  of their business. Existing BI infrastructures are unable to handle the sheer quantity of data being stored, accessed and analyzed. Appliances are devices that are designed for high performance,
  efficiency and ease of use. The concept of a database appliance has been debated in the past, but has gained new relevance for today’s BI activities in an environment of explosive data
  growth. Given the wealth of third-party applications, the maturity of DBMSs, and inexpensive storage and processing power, the time to apply tera-scale data appliances to BI is now.
  Introduction
  In the current age of data warehousing, Business Intelligence (BI) can make or break a company. The timely processing and retrieval of vast amounts of data is vital to the decision-making process.
  However, just as important as timeliness is the depth of data analysis possible. With the growing size of the average data warehouse, achieving these goals has become increasingly difficult;
  already, terabyte-sized data warehouses are fairly common.
  According to Greg’s Law, data is estimated to double on average every nine months. Vendors have thus far handled this rapid growth in database size with costly, continual
  upgrades of hardware and software, but over the past few years it has become clear that existing infrastructures cannot effectively handle the demands of in-depth analysis on large amounts
  of data. Furthermore, the Internet has brought a greater level of user access to databases. As demand for access and analysis continues to grow, users relying on general-purpose hardware and
  software will have to search for solutions specifically designed to address this problem.
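The nine-month doubling estimate above can be turned into a rough projection. The following is a minimal sketch, assuming steady exponential growth; the function name `projected_size_tb` and the 1 TB starting size are illustrative assumptions, not figures from any particular warehouse.

```python
def projected_size_tb(initial_tb: float, months: float, doubling_months: float = 9.0) -> float:
    """Return the projected size in terabytes after `months` of growth.

    Assumes steady exponential growth that doubles every `doubling_months`.
    """
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return initial_tb * 2 ** (months / doubling_months)


if __name__ == "__main__":
    # Hypothetical starting point: a 1 TB warehouse.
    for months in (9, 18, 36):
        print(f"after {months} months: {projected_size_tb(1.0, months):.1f} TB")
```

Under this assumption, three years contain four doubling periods, so a 1 TB warehouse would grow to 16 TB.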
  The challenge is to provide a purpose-built solution for the problem that is both specific and flexible. That is, it must be suited to the task of handling vast amounts of data and yet be
  compatible with the customer’s existing BI applications and infrastructure. Furthermore, such a solution should be relatively simple to put into place, in comparison with the highly complex
  (from a database administration point of view) systems currently available. Such systems are purpose-built appliances—expandable, affordable and uniquely suited to the ever-growing needs of
  users in terms of speed and sophistication of data analysis.
  The Current State of Business Intelligence
The current BI infrastructure is a patchwork of hardware, software and storage that is growing ever more complex. Consider a typical BI solution:
- The Database Management System (DBMS) was initially architected for transaction processing, holding several hundred megabytes’ worth of records with a few internal users;
- The DBMS has been improved incrementally over the years to support terabyte-sized databases, Internet-scale users and the evolving SQL standard;
- The hardware/operating system is a clustered set of generic boxes that are optimized for everything from mathematical queries to genome investigation; and
- The system is attached to generic file systems that manage and serve data for a variety of applications.
 
  Some systems are optimized for performance, but these optimizations have been performed in stages over time, and the underlying architecture has remained general in nature. Several Database
  Administration (DBA) and DBMS packages have been put in place; Symmetric Multiprocessing (SMP) servers and disk arrays from a variety of vendors serve the data; and an even larger selection of
  client applications are placed on top of this warehouse behemoth. For example, a company may use an Oracle DBMS, an HP server, and a storage solution from EMC, and as its system grows, it may
  add Hitachi storage and a second server. With these types of systems, data and user applications have to be continuously tuned and optimized.
  Tera-scale databases that continue to grow steadily put tremendous strain on these systems. In addition, the queries run against the database grow more complex. Sophisticated analytical methods
  require complex queries and models; for example, Web log and customer segmentation analyses are taxing current database systems. The problem here is twofold. First, the complex queries strain the
  system and slow the other queries being run. Second, a business user who cannot get results in real time is unlikely to try another query of equal or greater complexity. Therefore, the
  process behind obtaining useful information quickly becomes impaired.
  Even in cases where the user base and data set are relatively stable, current BI systems often fail to meet their basic goal of delivering vital business information so that timely decisions may be
  made. From an administration standpoint, this current ‘patchwork’ of solutions is a nightmare. From the point of view of the business user, it is frustrating and does not provide the
  agility and performance the users are looking for. These strains occur because vendors have upgraded these systems incrementally over the years rather than changing the underlying architecture to
  address the unique requirements of today’s tera-scale databases.
  The issues with current BI architectures are evident across a broad range of companies and industries. While the system strain will become worse in the next few years, these problems exist today
  and are plaguing both business users and database administrators. Patchwork solutions can only hold together for so long; as database growth continues its exponential rise, the weak points in
  current systems are only going to become more aggravated. A new solution must be engineered now—that solution is a tera-scale data appliance that is purpose-built for BI.
  Brief History
  The concept of a tera-scale data appliance is rooted in decades of academic and industry discussion about database appliances, or machines. A brief review of this history is helpful in
  understanding the evolution of the appliance, and its challenges over time.
Evolution of the Database Appliance (Machine)
  In 1983, Haran Boral and David J. DeWitt wrote a paper hailing the end of the era of database machines (Boral and DeWitt, 1983). Almost twenty years ago, progress in the development of database
  machines was halted by the I/O bottleneck: I/O speeds grow slowly compared with the growth in CPU speeds associated with Moore’s Law. This lag in I/O
  speed versus CPU speed continues to this day. The database machines of the early 1980s found a solution to this problem by using custom hardware (memory and disks) that promised greater reliability
  and speed. However, the authors claimed that database machines were doomed because they were built around specialized hardware, which was expensive and difficult to maintain and integrate.
  Nearly ten years later, DeWitt, one of the authors of the original paper, published a second paper with Jim Gray claiming that parallel database systems were the future of database technology (DeWitt and
  Gray, 1992). Making reference to such successful database machine ventures as Teradata and Tandem, DeWitt and Gray pointed out that a system based on ‘conventional shared-nothing hardware,’
  rather than specialized hardware, has the potential to be more robust and yield a higher level of performance.
  In 1995, Kjell Bratbergsengen of the Norwegian Institute of Technology released a paper charting the rocky history of the database machine (Bratbergsengen, 1995). The author discussed previous
  attempts, including intelligent secondary storage devices, filters, associative memory systems, multiprocessor database computers and text processors. He claimed the area of most promise was in
  multiprocessor machines—and in fact, this is the direction most research and commercial ventures have taken.
Examples of database appliances/machines from both academic and commercial sectors include:
- GAMMA (Research/education)
 - Clustra DataCenter Appliance
 - Network Appliance™ DAFS Database Accelerator
 - IBM SP2
 
  One of the challenges of the early attempts was the lack of powerful off-the-shelf components, including disk storage and memory, which would make a database appliance affordable. In addition,
  massively parallel processors were still at an early stage, and not powerful enough to handle tera-scale databases. Shared-nothing architectures were not sufficiently developed to help alleviate
  the disk I/O bottleneck. At the time that Bratbergsengen’s paper was written, disks were just reaching several gigabytes in size, and the transfer rates were only on the order of 3 MB/sec.
  Today, although disk I/O is still a bottleneck for all systems, intelligent architectures have found ways of circumventing this problem.
  The Appliance
What is an Appliance?
  Webster defines an appliance as “an instrument or device designed for a particular use.” An appliance is an opaque drop-in solution that provides interfaces for all manner of tools
  without disturbing the inner workings of the appliance. An appliance is built with the end user’s point of view at the forefront of the design process. Appliances in any market come about as
  a result of the maturity of the technology. Introducing the appliance to a market is a logical next step, because appliances are efficient and affordable.
  There are simple appliances we now take for granted, like the toaster: the consumer of toast is going to want a toasted piece of bread as quickly as possible, toasted to the level of his choice.
  Thus, the user is given a place to put the bread as well as a knob to control the level of toasting. Most important, a toaster can be put in place with an absolutely minimal amount of
  ‘configuration.’
  Appliances in the computer world are so common that we often forget about them. A computer appliance is an integrated box that retrieves information at the request of external applications and keeps its inner
  workings hidden in order to maintain simplicity and ease of use. Take, for example, the network router. The majority of routers can be put in place with almost no configuration (other than setting
  the router’s IP address) and will start storing and forwarding packets. Hubs and switches are even simpler. The point is that these devices aid network transport greatly, yet are essentially
  transparent to the user.
  Another example is the video streaming appliance. These devices are put in place to enhance video stream quality and the speed of video delivery. The device is transparent to the end user, but the
  performance boost is evident. Most important, from an administration standpoint, the device is simple to install and configure, and requires a low level of maintenance.
These devices were developed to address particular problems and marketed as elegant solutions. Examples of appliances in the computer world include:
- Network Appliance™ (Filers, Storage, Cache)
 - Streaming Media (NetApp, Inktomi/HP, SeaChange, Avid)
 - Server Appliances (Sun Cobalt)
 - Information Appliances (Google Search Appliance)
 
  The concepts presented by these appliances were extended to the world of databases, and development of a database machine began. Academic and commercial research sought out solutions to the
  problems facing databases and proposed machines to handle these issues.
  Simplicity is the name of the game—we would not expect our toaster owner (‘administrator’) to have to open up the toaster and tinker with it to add extra slots for toast or to
  make the toast crispier. Likewise, why should database administrators be required to fine-tune the database system as the requirements increase? Appliances make our lives simpler. Why can’t
  this analogy be carried into the database world?
  The Case for a Tera-Scale Data Appliance for Business Intelligence
  Applied to BI, a tera-scale data appliance is a purpose-built machine capable of retrieving valuable decision-aiding intelligence from terabytes of data on the order of seconds or minutes as
  opposed to hours or days. Appliances represent the difference between making a decision using stale data and making a decision with the freshest information possible. Tera-scale data appliances are
  engineered for the purpose of delivering results while the results are still relevant.
A tera-scale data appliance that is purpose-built for BI is:
- Optimized for maximum performance
 - Scalable
 - Reliable
 - Easy to use
 
Optimization. Optimization affects both the storage and retrieval of data. A data appliance is engineered to deliver intelligence quickly and efficiently, no matter
  the database size. The appliance also allows for real-time updates to data, eliminating the delivery of stale data to the end user. The most important factors in BI are the timeliness and freshness
  of the results; they should be returned in a useful time frame, allowing a company to maximize its options. The appliance provides the real-time updates and retrievals critical to BI; such
  optimizations are done automatically by the appliance, without heavy DBA involvement.
Scalability. A tera-scale data appliance should be truly scalable. That is, the addition of extra storage to accommodate a larger data warehouse should not
  adversely affect performance. Specifically, the business users running queries against the data should not feel the effects of the growth. In order to accomplish this, the major bottleneck points
  must be distributed in the system rather than placed centrally. For large data transfers, the bottlenecks are internal network speed and disk transfer speed; for complex queries, the bottleneck is
  often the CPU. An ideal data appliance should be able to scale to support a multi-terabyte-sized database without major performance degradation.
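One way to picture distributing the bottleneck points is a shared-nothing aggregation, in which each node scans only its own slice of the data and a coordinator merges small partial results. The sketch below is a generic illustration under that assumption, not a description of any particular appliance; the `(region, amount)` row layout, the function names and the CRC32-based placement are all hypothetical.

```python
import zlib
from collections import defaultdict


def partition(rows, node_count):
    """Assign each (region, amount) row to a node by hashing its key."""
    nodes = [[] for _ in range(node_count)]
    for region, amount in rows:
        # CRC32 is used only as a stable, deterministic hash for placement.
        index = zlib.crc32(region.encode("utf-8")) % node_count
        nodes[index].append((region, amount))
    return nodes


def local_totals(node_rows):
    """Work done independently on one node: sum amounts per region."""
    totals = defaultdict(float)
    for region, amount in node_rows:
        totals[region] += amount
    return dict(totals)


def merge(partials):
    """Coordinator step: combine the small per-node results."""
    merged = defaultdict(float)
    for partial in partials:
        for region, total in partial.items():
            merged[region] += total
    return dict(merged)


def distributed_totals(rows, node_count):
    """Partition the rows, aggregate on each node, then merge."""
    return merge(local_totals(node) for node in partition(rows, node_count))
```

Because every node reads only its own rows, adding nodes adds disk and CPU capacity together, and only the compact partial totals cross the internal network.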
Reliability. Reliability is critical. One level of reliability comes from the inherent abstraction of an appliance. By keeping the inner workings from being
  modified by the users or administrators, the potential for failure decreases. Another level of reliability is provided by the homogeneous nature of an appliance; all parts of the system come from
  one vendor. The customer does not have to integrate disk arrays, operating systems, and database software, hoping that they will all work together flawlessly. Reliability increases as the number of
  vendors decreases, and multiple general-purpose offerings are replaced with a single solution.
Ease of Use. Obviously, we cannot do away with database administration entirely, as a certain level of management is necessary in order to maintain database integrity and performance.
  However, we can make the database system administrator’s job much easier, specifically in the area of end user software compatibility. By making the appliance compatible with all common
  database standards (ODBC, etc.) and placing it through rigorous testing, the appliance manufacturer can ensure that applications can interoperate with the appliance. Thus, the ongoing support
  issues can be minimized.
Why Now?
Given the long history of database development and the existence of previous attempts at database appliances/machines, why is now the time for a tera-scale data appliance in BI?
  There are several reasons that the appliance is now possible, but the most important of these is the maturity of database technology. The database standards have been set, and this allows the
  system to be built completely around the desires and needs of the end user. Furthermore, the concept of a relational database is well defined and the users are experienced and eager to run
  increasingly complex queries. A wide variety of sophisticated applications and tools with standard interfaces allow widespread access to the database. And, as noted earlier, terabyte-sized
  databases, an influx of users and a demand for complex queries have placed unprecedented strain on the existing patchwork infrastructure.
  Users of BI and data warehousing, therefore, need a system that yields high performance, both in speed and storage. High-powered, specialized hardware drove the database machines of the past, but
  now there is a need for better performance at a lower cost. The power of current technology is great enough that commercial, off-the-shelf components, which are dropping in price, can be used to
  construct a tera-scale data appliance. This appliance can provide valuable BI at a fraction of the cost of current industry database systems.
What is Today’s Tera-Scale Data Appliance for BI?
  People often associate appliances with simplicity, and databases by nature are not simple. The high-performance, tera-scale data appliance, however, is not a simple tool mechanically; rather, it
  makes BI more useful to the end user. The appliance starts over from the beginning, addressing the problems and concerns of the end user and the issues raised by the growing size of databases. The
  tera-scale data appliance is clean, efficient, expandable and powerful.
  A tera-scale data appliance integrates the hardware, DBMS and storage into one opaque device. It combines the best elements of SMP and Massively Parallel Processing (MPP) architectures into a new
  architecture to allow a query to be processed in the most optimized way possible. It is architected to remove all the bottlenecks to data flow so that the only remaining limit is the disk
  speed—a ‘data flow’ architecture where data moves at ‘streaming’ speeds. Through standard interfaces, it is fully compatible with existing BI applications, tools and
  data. And it is extremely simple to use.
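If disk speed is the only remaining limit, a full scan takes roughly the data size divided by the aggregate transfer rate of all disks working in parallel. The following is a back-of-the-envelope sketch under that assumption; the function name and the 100-disk, 50 MB/sec configuration are hypothetical, while 3 MB/sec is the mid-1990s transfer rate mentioned earlier.

```python
def full_scan_seconds(data_bytes: float, disk_count: int, mb_per_sec_per_disk: float) -> float:
    """Estimate the time to read all data once when disks stream in parallel.

    Assumes data is spread evenly and nothing but disk transfer limits the scan.
    """
    if disk_count < 1 or mb_per_sec_per_disk <= 0:
        raise ValueError("need at least one disk and a positive transfer rate")
    aggregate_bytes_per_sec = disk_count * mb_per_sec_per_disk * 1_000_000
    return data_bytes / aggregate_bytes_per_sec


if __name__ == "__main__":
    one_terabyte = 10**12
    # Single disk at the 3 MB/sec rate cited for the mid-1990s.
    print(full_scan_seconds(one_terabyte, 1, 3.0))
    # Hypothetical configuration: 100 disks at 50 MB/sec each.
    print(full_scan_seconds(one_terabyte, 100, 50.0))
```

With these numbers, one terabyte on a single 3 MB/sec disk takes about 333,000 seconds, or almost four days, while the hypothetical 100-disk configuration streams the same terabyte in 200 seconds.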
  How Businesses Benefit from a Tera-Scale Data Appliance for Business Intelligence
  A tera-scale data appliance for BI provides speed for the business user. The time of waiting hours or days for queries to finish is past. Patience may be a virtue, but when it comes to BI, decision
  makers need results now. The size of the average data warehouse is increasing and showing no signs of slowing down, and with this increased store of knowledge comes an increased demand for BI.
  Businesses should not need to discard customer data from two months ago because their database slows to a crawl when the data is kept.
  A tera-scale data appliance for BI provides freedom to the business user. Right now, users are limited in the queries they can run because of the time required to run them. Thus, users end up
  running the same set of queries against the database. With the time required to run a complex query reduced to seconds, users can not only run their old queries more often but also find the time
  to devise and run whole new sets of queries.
  A tera-scale data appliance provides simplicity for the administrator. The integrated nature of an appliance means that the time typically spent troubleshooting a complex database system can be
  spent in more productive endeavors. The effort is not to simplify a complex system, but rather to remove the appearance of being complex, by abstracting away the mechanical details. The end result
  is the removal of legacy systems and piecemeal components.
  A tera-scale data appliance provides ease of database growth. The inherent scalability in a modified-MPP architecture stems from the modularity of the nodes. Ideally, we want a database with
  linear scaleup (DeWitt and Gray, 1992); that is, with n times the hardware, we should be able to handle a task n times as large in the same amount of time. The tera-scale data appliance
  provides us just that flexibility.
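Linear scaleup can be checked with a simple ratio: the elapsed time of the small system on the small task divided by the elapsed time of the n-times-larger system on the n-times-larger task. A ratio of 1.0 is linear scaleup. The following is a minimal sketch; the function names, the 5% tolerance and the timings are illustrative assumptions.

```python
def scaleup(small_elapsed_s: float, big_elapsed_s: float) -> float:
    """Ratio of small-system/small-task time to big-system/big-task time.

    1.0 means linear scaleup; below 1.0 means the larger system fell behind.
    """
    if small_elapsed_s <= 0 or big_elapsed_s <= 0:
        raise ValueError("elapsed times must be positive")
    return small_elapsed_s / big_elapsed_s


def is_linear_scaleup(small_elapsed_s: float, big_elapsed_s: float, tolerance: float = 0.05) -> bool:
    """Treat the scaleup as linear if it is within `tolerance` of 1.0."""
    return abs(scaleup(small_elapsed_s, big_elapsed_s) - 1.0) <= tolerance


if __name__ == "__main__":
    # Hypothetical timings: 4 nodes on 1 TB versus 16 nodes on 4 TB.
    print(scaleup(120.0, 123.0))
    print(is_linear_scaleup(120.0, 123.0))
```

A result well below 1.0 suggests that some shared resource, such as a central coordinator or the internal network, is not growing with the node count.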
  A tera-scale data appliance provides the lowest total cost of ownership. Although purpose-built, it is constructed from commodity hardware, eliminating the overhead of special-purpose
  hardware. The appliance has one source, one vendor, and therefore the costs associated with support are reduced. With existing technologies, the process of data growth typically incurs costs;
  hardware must be added and ongoing maintenance must be performed. The tera-scale data appliance reduces these costs with inexpensive yet powerful hardware from one source.
  With the simple, efficient solution provided by a tera-scale data appliance for BI, businesses will run more efficiently. Results will be returned within seconds or minutes—orders of
  magnitude faster than with current architectures. Businesses today demand rapid response times to generate rapid results.
  Conclusions
  The success of decision-making in a company relies on Business Intelligence. BI, in turn, relies on the underlying database architecture. Current database architectures are patchwork systems, built
  in pieces and not optimized for delivering timely results. The maturity and stability of the relational database, paired with the power of consumer computer components, allow the database
  system to be broken down and rebuilt. Starting with a clean slate, the next-generation database system should be engineered with the end user in mind. The system should be clean and scalable, and should enable optimized
  BI. A new generation of tera-scale data appliances holds promise for companies that depend on Business Intelligence.
  References
  Boral, H. and D.J. DeWitt. ‘Database Machines: An Idea Whose Time Has Passed?—A Critique of the Future of Database Machines,’ Proceedings of the 1983 Workshop on Database
  Machines, (Springer-Verlag), (1983), 166-187.
Bratbergsengen, K. ‘Parallel Database Machines,’ Rivista di Informatica, Vol.XXV, n.4, (ottobre-dicembre 1995).
DeWitt, D. J. and J. Gray. ‘Parallel Database Systems: The Future of High Performance Database Processing,’ Communications of the ACM, Vol. 35, No. 6, (June 1992), 85-98.
  DeWitt, D., R. H. Gerber, G. Graefe, M. L. Heytens, K. B. Kumar, M. Muralikrishna. ‘GAMMA—A High Performance Dataflow Database Machine,’ Proceedings of the 1986 VLDB
  Conference, Japan, (August 1986), 228-237.
  DeWitt, D. J., S. Ghandeharizadeh, D. A. Schneider, A. Bricker, H. Hsiao, R. Rasmussen. ‘The Gamma Database Machine Project,’ IEEE Transactions on Knowledge and Data Engineering, Vol.
  2, No. 1, (March 1990), 44-62.
  Sood, A.K. and A.H. Qureshi (Eds) ‘Database Machines, Modern Trends and Applications,’ NATO ASI Series F: Computers and Systems Sciences, Vol. 24. Springer-Verlag
  (1986).
Stonebraker, M., ‘The Case for Shared Nothing,’ Database Engineering, Vol. 9, No. 1, (1986), 4-9.
