Data Warehouse:

Introduction

The classic definition of a Data Warehouse is an architecture used to maintain critical historical data that has been extracted from operational data storage and transformed into formats accessible
to the organization’s analytical community. The creation, implementation and maintenance of a data warehouse requires the active participation of a large cast of characters, each with his or her
own set of skills, but all functioning as a series of teams within a large team.

An identification of the major roles and responsibilities for managing a data warehouse environment would normally include these functions. They are presented in the general order in which they
would participate in the warehouse development and implementation.

Business Sponsorship: This is the primary officer responsible for the targeted business area. He or she should have prior experience in systems development efforts, to
understand the issues that will confront the warehouse team, and the managerial authority over some of the business team members. He or she will own the data stored in the data warehouse for the
targeted business area. As each business area is included in the warehouse, the targeted area’s business sponsor will own the portion of the data for their area. The business sponsor champions the
data warehouse effort, helps communicate the value of data warehousing to the business community, and serves as project owner for that business area’s phase of the warehouse development. The
business sponsor shares funding responsibility for the inclusion of the targeted business phase’s data into the warehouse.

Systems Sponsorship: This is the person with ultimate responsibility for all systems warehouse activities such as those listed in the roles and responsibilities chart, and
should have managerial authority over the systems staff. The systems sponsor shares funding responsibility for the warehouse with the targeted business area sponsor. The systems sponsor will own
and control the operational aspects of the warehouse, such as software, hardware. Usually, the systems sponsor is the organization’s Chief Information Officer.

Facilitation Coordination: Most of the development of the data warehouse will be performed by teams. The use of a skilled facilitator will enable the group to properly
structure and conduct the meetings to gather the information and maintain the focus needed to achieve the goals within the project’s deadlines. The facilitator should be engaged at the start of
the warehouse development process and should be a member of the development team through the modification of the chosen development methodology. The facilitator’s skills will be needed as each new
area is incorporated into the warehouse, as much of the development effort will be repeated for each business area.

Education, Training and Documentation: With complex technologies, educating users and systems staff members is essential to a successful implementation. Educators should
have a deep understanding of warehouse concepts and technological solutions. Strong interpersonal and communication talents are also important. Educators should develop the materials they will use
in these sessions. Some areas for education should include the business case for a data warehouse, the differences between developing operational systems and warehouses, the need for strong Data
Administration (also called Information Resource Management) in building and maintaining a data warehouse.

There will be several layers of training during the warehouse development process. The initial training will be for those assigned to integrate the various infrastructure tools within the pilot
warehouse. This training will be oriented primarily to systems staff and business members of the development team. Later, training in the use of the warehouse, in analysis skills and in the tools
given to the warehouse user community will involve both systems and business staff. Initial training should be conducted by skilled trainers, with later user training possibly taught by experienced
warehouse users from the organization.

Data Administration: The goals of Data Administration (also called Information Resource Management – IRM) include minimizing redundant operational data and structuring data
to serve the organization’s overall purpose in capturing that data. These functions are essential in successfully creating and maintaining a data warehouse. Before serious warehouse development
effort can begin business and systems staff members must develop and publish naming standards, data element mapping procedures and guidelines for database creation. Without solid definitions of the
data, users will not have confidence in the information they are accessing from the warehouse, and systems staff cannot readily identify the data needed from the operational systems to populate the
warehouse. The organization should have a commitment to active data administration with a small group of employees overseeing the IRM efforts, especially if contractors are used in the data
warehouse development and maintenance.

Database Architecture: The database architect has primary administrative responsibility for the actual physical design of the warehouse environment. He or she participates
in the modeling activities as the representative of the physical implementation of the model’s entities. He or she oversees the creation of the database tables and the maintenance of the
warehouse’s physical environment and monitors the changes made to the environment by more junior database analysts. The database architect’s strength lies in his or her ability to develop the
vision of the physical view of the warehouse.

Repository Administration: The purpose of a repository is to provide a single point of access and control for an enterprise’s meta data. Meta data indicates where the data
comes from, how it should be translated or transformed, its form and functions. Data models and processes can be stored in a repository, giving a central location for business and systems data
gathered during the development effort. Managing a repository for a warehouse environment requires the skills of both a data (or IRM) administrator and a database administrator (not one person). A
Repository Administrator serves as the liaison between the technical and user communities for the operational and warehouse meta data, manages the integration of the logical models of the
operational and warehouse systems and particiaptes on the standards development team.

Business Area Requirements Analysis: The goal of a data warehouse business area analysis is to provide understanding of the analytical processes and data needed for business
examination. Representatives from the business area and Information Resource Management will convene to identify their requirements for a data warehouse, which will lead to the creation of a
warehouse data model. The requirements gathering phase should answer the questions: “What data do we want analytical access to? What processes will this data come from? What critical decisions
will this data support?” The facilitator’s function is to ensure that all the proper questions have been identified and answered satisfactorily, and that the meetings adhere to a stated agenda to
conserve time and staff resources.

Data Modeling: Modeling for operational systems and modeling for informational systems use the same techniques but result in two different types of models: a.) a detailed,
transaction processing representation of the operational business requirements; b.) a less detailed, more summarized analytical processing representation of the informational business requirements.
Both are necessary for the proper, cohesive development of systems, and should be conducted as part of any systems development or enhancement effort. The informational data model should support the
targeted business area’s analytical requirements, and be based on the area’s operational data model. A team of users builds the models, and then validates them by mapping data from the
operational models to the model of the warehouse. A well-designed model, either operational or informational, should: a.) answer business users’ questions, b.) map business processes to customer
data needs, c.) recognize the similarity and differences in data stored in various operational systems and d.) extract only the pertinent data. Participants in this exercise should have some
modeling training, and should be led by an experienced modeler who is part of the data administration staff.

Selection of Infrastructure Tools: Infrastructure can be described as the platforms, databases and front-end access tools necessary for a functioning warehouse environment.
Their selection is of paramount importance, since many subsequent decisions will be based on the tools selected in this phase. Both business and systems staff members as well as consultants should
be on the selection committee, since usability and compatibility may be competing needs and must be adequately addressed for a successful warehouse implementation. This team should be led by a
systems architect who will be ultimately responsible for procuring and integrating the various components.

Methodology Modification: Generally, a methodology guides complex development efforts by providing a map of tasks and roles. No methodology is complete for all types of
development and no methodology takes an individual enterprise’s characteristics into account. Therefore, the chosen methodology must be modified to encompass these developmental and organizational
needs. A team of methodologists, business and systems staff will be assembled to customize the methodology.

Physical Development of Warehouse Tables: When the modeling tasks are complete and the DBMS and hardware platforms have been delivered, the creation of the physical
warehouse tables can begin with the database analysts and database architect. There will be some interaction between the modeling team and the database administration team to address the recurring
question of normalized-versus-denormalized data. Performance and data accessibility are often competing ideals and the relative merits of each position must be weighed. The database architect and
the modeling team will resolve most outstanding issues between the logical and physical representations of the data, and the questions of “star schema, snowflakes, et al” can be addressed here.
Actual creation and maintenance of the physical tables and their associated structures such as indices can be performed by database adminsitrators and database analysts.

Creation of Extraction and Transformation Programs: The actual creation of a data warehouse entails the extraction of specific data elements from various systems, databases
and files into a relational database. Before being loaded into this database, some form of data transformation or integration will be required. This may involve changing the formats of certain
elements to conform to a standard, making all iterations of an element have the same data type and size or altering the algorithm that summarizes certain elements. These programs for transforming
and extracting data can be developed, or they can be purchased from vendors and customized for the particular environment.

Warehouse Physical Access, Cross-Environment Connectivity and Security: The issues of access, connectivity and security will be addressed at first during the infrastructure
tools selection process. Later, during database design and programming, further identification of issues and their resolutions should occur.

Creation of complex queries for warehouse users: Even though users will begin to access the warehouse data through the tools provided, some analysis may require the creation
of complex queries against the database. These queries should be written by an experienced SQL applications programmer, to maximize the performance of the warehouse’s database. A series of
expected complex queries should be developed before the warehouse is implemented, so users can perform their analysis independently and with little delay. A process must be created to ensure that
all queries created for use within the warehouse are saved for reuse, and are available for modification.

Testing components integration, data loading and query execution: Any development effort is not complete until it has been thoroughly tested. The testing for the data
warehouse should include developing test cases, scenarios and scripts to ensure the quality of the application; verification of the usability of the tools and the proper execution of all functions
of the warehouse. Testers should have a quality assurance background and use accepted testing methods. Some understanding of the infrastructure’s integration is expected, so the test team can
identify all possible problem areas and test the interfaces and application usages. Members of the test teams should be certain that no test issue is overlooked.

Warehouse Administration: As with any complex system, a data warehouse will need a warehouse administrator. He or she should be familiar with both the logical and physical
models of the warehouse, and will co-ordinate the scheduling of warehouse refreshment processes, manage the creation and reuse of queries written for the warehouse, serve as liaison to the
warehouse with the repository administrator, and act as a co-ordinator during the integration of business areas into the warehouse environment.

Warehouse production support for databases and users: All production environments require continual support, such as help desk activities for users and systems support for
the hardware and applications. Hardware and software upgrades would also be included in the production support team’s responsibilities.

Conclusion

The roles and responsibilities in a complex systems development and implementation process such as a data warehouse can be generally identified, but refinement and assignment of these roles will
continue over the life of the project. Most work will occur in teams, with the active participation of consultants and contract personnel to enhance the staff capabilities within the organization.
Where teams consist of both company employees and contract personnel for systems and business areas, there should be one person identified as the team member with primary responsibility for the
tasks assigned to that team.

Share this post

Anne Marie Smith, Ph.D.

Anne Marie Smith, Ph.D.

Anne Marie Smith, Ph.D., is an acclaimed data management professional, consultant, author and speaker in the fields of enterprise information management, data stewardship and governance, data warehousing, data modeling, project management, business requirements management, IS strategic planning and metadata management. She holds a doctorate in Management Information Systems, and is a certified data management professional (CDMP), a certified business intelligence professional (CBIP), and holds several insurance certifications.

Anne Marie has served on the board of directors of DAMA International and on the board of the Insurance Data Management Association.  She is a member of the MIS faculty of Northcentral University and has taught at several universities. As a thought leader, Anne Marie writes frequently for data / information management publications on a variety of data-oriented topics.  She can be reached through her website at http://www.alabamayankeesystems.com and through her LinkedIn profile at http://www.linkedin.com/in/annemariesmith.

scroll to top