Getting Down to Data Basics

Published in TDAN.com October 2002

Some daring young enterprises embark on Web-enabled database implementation with nothing more than a bevy of application developers. This approach is sure to fail. If you take nothing else away
from this article, make sure you understand this: every organization that manages data using a database management system (DBMS) requires a database administration group to ensure the effective use
and deployment of the company’s databases.

In short, e-businesses that are brand new to database development need a primer on database design and administration. So, with that in mind, it’s time to get back to data basics.

Data Modeling and Database Design

Novice database developers frequently begin with the quick-and-dirty approach to database implementation. They approach database design from a programming perspective. That is, novices do not have
experience with databases and data requirements gathering, so they attempt to design databases like the flat files they are accustomed to using. This is a major mistake, as anyone using this
approach quickly finds out once the databases and application moves to production.

At a minimum, performance will suffer and data may not be as readily available as required. At worst, data integrity problems may arise rendering the entire application unusable. A relational
database design can not be thrown together quickly by novices.

What is required is a practiced and formal approach to gathering data requirements and modeling that data. This modeling effort requires that the naming entities and data elements follow an
established and standard naming convention. Failure to apply standard names will result in the creation of databases that are difficult to use because no one knows its actual contents.

Data modeling also requires the collection of data types and lengths, domains (valid values), relationships, anticipated cardinality (number of instances), and constraints (mandatory, optional,
unique, etc.). Once collected and the business usage of the data is known, a process called normalization is applied to the data model.

Normalization is an iterative process that I will not cover in detail here. Suffice it to say, the a normalized data model reduces data redundancy and inconsistencies by ensuring that the data
elements are designed appropriately. A series of normalization rules are applied to the entities and data elements, each of which is called a “normal form.” If the data conforms to the first
rule, the data model is said to be in “first normal form,” and so on.

A database design in First Normal Form (1NF) will have no repeating groups and each instance of an entity can be identified by a primary key. For Second Normal Form (2NF), instances of an entity
must not depend on anything other than the primary key for that entity. Third Normal Form (3NF) removes data elements that do not depend on the primary key. If the contents of a group of data
elements can apply to more than a single entity instance, those data elements belong in a separate entity.

There are further levels of normalization that I will not discuss in this column to keep the discussion moving along. For an introductory discussion of normalization visit http://wdvl.com/Authoring/DB/Normalization.

Physical Database Design

But you can not stop after developing a logical data model in 3NF. The logical model must be adapted to a physical database implementation. Contrary to popular belief this is not a simple
transformation of entities to tables. Many other physical design factors must be planned and implemented. These factors include:

  • A relational table is not the same as a file or a data set. The DBA must design and create the physical storage structures to be used by the relational databases to be implemented.
  • The order of columns may need to be different than that specified by the data model based on the functionality of the RDBMS being used. Column order and access may have an impact on database
    logging, locking, and organization. The DBA must understand these issues and transform the logical model appropriately.
  • The logical data model needs to be analyzed to determine which relationships need to be physically implemented using referential integrity (RI). Not all relationships should be defined using RI
    due to processing and performance reasons.
  • Indexes must be designed to ensure optimal performance. To create the proper indexes the DBA must examine the database design in conjunction with the proposed SQL to ensure that database
    queries are supported with the proper indexes.
  • Database security and authorization must be defined for the new database objects and its users.

These are not simple tasks that can be performed by individuals without database design and implementation skills. That is to say, DBAs are required.

The DBA Management Discipline

Database administration must be approached as a management discipline. The term discipline implies planning and implementation, according to that plan. When database administration is treated as a
management discipline, the treatment of data within your organization will improve. It is the difference between being reactive and proactive.

All too frequently the DBA group is overwhelmed by requests and problems. This occurs for many reasons, including understaffing, overcommitment to application development projects, lack of
repeatable processes, lack of budget and so on.

When operating in this manner, the database administrator is being reactive. The reactive DBA functions more like a firefighter. His attention is focused on resolving the biggest problem being
brought to his attention. A proactive DBA can avoid many problems altogether by developing and implementing a strategic blueprint to follow when deploying databases within their organization.

The 17 Skills Required of a DBA

Implementing a DBA function in your organization requires careful thought and planning. The previous sections of this article are just a beginning. The successful eDBA will need to acquire and hone
expertise in the following areas:

  1. Data modeling and database design. The DBA must possess the ability to create an efficient physical database design from a logical data model and application specifications. The
    physical database may not conform to the logical model 100 percent due to physical DBMS features, implementation factors, or performance requirements. If the data resource management discipline
    has not been created, the DBA also must be responsible for creating data modeling, normalization, and conceptual and logical design.
  2. Metadata management and repository usage. The DBA is required to understand the technical data requirements of the organization. But this is not a complete description of his
    duties. Metadata, or data about the data, also must be maintained. The DBA, or sometimes the Data Administrator (DA), must collect, store, manage, and enable the ability to query the
    organization’s metadata. Without metadata, the data stored in databases lacks true meaning.
  3. Database schema creation and management. A DBA must be able to translate a data model or logical database design into an actual physical database implementation and to manage
    that database once it has been implemented.
  4. Procedural skills. Modern databases manage more than merely data. The DBA must possess procedural skills to help design, debug, implement, and maintain stored procedures,
    triggers, and user-defined functions that are stored in the DBMS. For more on this topic check out www.craigsmullins.com/db2procd.htm.
  5. Capacity planning. Because data consumption and usage continues to grow, the DBA must be prepared to support more data, more users, and more connection. The ability to predict
    growth based on application and data usage patterns and to implement the necessary database changes to accommodate the growth is a core capability of the DBA.
  6. Performance management and tuning. Dealing with performance problems is usually the biggest post-implementation nightmare faced by DBAs. As such, the DBA must be able to
    proactively monitor the database environment and to make changes to data structures, SQL, application logic or the DBMS subsystem to optimize performance.
  7. Ensuring availability. Applications and data are more and more required to be up and available 24 hours a day, seven days a week. The DBA must be able to ensure data availability
    using non-disruptive administration tactics. SQL code reviews and walk-throughs. Although application programmer usually write SQL, DBAs are usually blamed for poor performance. Therefore, DBAs
    must possess in-depth SQL knowledge so they can understand and review SQL and host language programs and to recommend changes for optimization.
  8. Backup and recovery. Everyone owns insurance of some type because we want to be prepared in case something bad happens. Implementing robust backup and recovery procedures is the
    insurance policy of the DBA. The DBA must implement an appropriate database backup and recovery strategy based on data volatility and application availability requirements.
  9. Ensuring data integrity. DBAs must be able to design databases so that only accurate and appropriate data is entered and maintained. To do so, the DBA can deploy multiple types
    of database integrity including entity integrity, referential integrity, check constraints, and database triggers. Furthermore, the DBA must ensure the structural integrity of the database.
  10. General database management. The DBA is the central source of database knowledge in the organization. As such he must understand the basic tenets of relational database
    technology and be able to accurately communicate them to others. Data security. The DBA is charged with the responsibility to ensure that only authorized users have access to data. This requires
    the implementation of a rigorous security infrastructure for production and test databases.
  11. General systems management and networking skills. Because once databases are implemented they are accessed throughout the organization and interact with other technologies, the
    DBA must be a jack of all trades. Doing so requires the ability to integrate database administration requirements and tasks with general systems management requirements and tasks (like job
    scheduling, network management, transaction processing, and so on).
  12. ERP and business knowledge. For e-businesses doing Enterprise Resource Planning (ERP) the DBA must understand the requirements of the application users and be able to administer
    their databases to avoid interruption of business. This sounds easy, but most ERP applications (SAP, Peoplesoft, etc.) use databases differently than homegrown applications. So DBAs require an
    understanding of how the ERP packaged applications impact the e-business and how the databases used by those packages differ from traditional relational databases. Some typical differences
    include application-enforced RI, program locking, and the creation of database objects (tables, indexes, etc.) on-the-fly. These differences require different DBA techniques to manage the ERP
    package effectively.
  13. Extensible data type administration. The functionality of modern DBMSes can be extended using user-defined data types. The DBA must understand how these extended data types are
    implemented by the DBMS vendor and be able to implement and administer any extended data types implemented in their databases.
  14. Web-specific technology expertise. For e-businesses, DBAs are required to have knowledge of Internet and Web technologies to enable databases to participate in Web-based
    applications. Examples of this type of technology include HTTP, FTP, XML, CGI, Java, TCP/IP, Web servers, firewalls and SSL. Other DBMS-specific technologies include IBM’s Net.Data for DB2 and
    Oracle Portal (formerly WebDB).
  15. Storage management techniques. The data stored in every database resides on disk somewhere (unless it is stored on one of the new Main Memory DBMS products). The DBA must
    understand the storage hardware and software available for use, and how it interacts with the DBMS being used. Storage technologies include raw devices, RAID, SANs and NAS.


Meeting the Demand

The number of mission-critical Web-based applications that rely on back-end databases is increasing. Established and emerging e-businesses achieve enormous benefits from the Web/database
combination, such as rapid application development, cross-platform deployment and robust, scalable access to data. E-business usage of database technology will continue to grow, and so will the
demand for the eDBA. Make sure your organization is prepared to manage its Web-enabled databases before moving them to production. Or be prepared to encounter plenty of problems.

Share

submit to reddit

About Craig Mullins

Craig S. Mullins is a data management strategist and principal consultant for Mullins Consulting, Inc. He has three decades of experience in the field of database management, including working with DB2 for z/OS since Version 1. Craig is also an IBM Information Champion and is the author of two books: DB2 Developer’s Guide and Database Administration:The Complete Guide to Practices and Procedures. You can contact Craig via his website.

Top