Data modeling and database design form an in-demand career path, and database design is a crucial skill set that experienced DBAs must possess. In this article, we’re going to look at some common “beginner” questions, written with the assumption that you’re already familiar with some aspects of data modeling and database design.
Typical Database Design Life Cycle
The exact steps, and the order they happen in, depend on the specific project, but there are several stages that every database system goes through during development.
For smaller database systems, the development life cycle is typically quite simple, with only a few stages. Those stages might go something like this:
- Requirements analysis: Mapping out what the system must do. At this stage, the scope and boundaries of the database system are defined.
- Database design: Designing the logical and physical models. The logical model is designed on paper, without any physical implementation or specific DBMS considerations. The physical model then implements the logical model while taking those platform-specific details into account.
- Implementation: Converting and loading data. This stage includes importing and converting data from the old system into the new database, along with testing.
The difference between logical and physical models
A logical model explains a database design, but does not typically go down to the level of detail needed to specify the database’s platform specifics or physical format. An example of a logical model is an Entity-Relationship Diagram (ERD), which contains only generalized data types and attributes such as NULL/NOT NULL and check constraints.
A physical model uses platform-specific syntax throughout, which means, for example, that a physical model for Oracle would be incompatible with a SQL Server instance. A physical model also contains details about the database such as its location on the file system, log files, and other information relevant to your instance. For example, each data platform implements a monotonically increasing primary key differently, such as using IDENTITY on SQL Server but a SEQUENCE on Oracle.
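To make that concrete, here’s a minimal sketch of how the same logical design diverges at the physical level (the table and column names are illustrative, not from any particular system):

```sql
-- SQL Server: auto-incrementing key via IDENTITY
CREATE TABLE Customer (
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,
    Email      VARCHAR(320) NOT NULL
);

-- Oracle: the same logical design, but the key comes from a SEQUENCE
CREATE TABLE Customer (
    CustomerID NUMBER PRIMARY KEY,
    Email      VARCHAR2(320) NOT NULL
);
CREATE SEQUENCE Customer_Seq START WITH 1 INCREMENT BY 1;

-- Oracle pulls the next key value from the sequence at insert time
INSERT INTO Customer (CustomerID, Email)
VALUES (Customer_Seq.NEXTVAL, 'user@example.com');
```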
What are common rookie mistakes in database design?
One of the most common rookie mistakes in database design is under-engineering, or not doing enough. For example, if you need to store long lists of names and email addresses, an under-engineered design would let the customer put values other than an email address into the email column, with nothing in place to enforce the format (see the sketch below).
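A simple check constraint catches the worst of this. This is a hedged sketch, not full email validation, and the table and constraint names are hypothetical:

```sql
CREATE TABLE Subscriber (
    SubscriberID INT IDENTITY(1,1) PRIMARY KEY,
    FullName     VARCHAR(200) NOT NULL,
    Email        VARCHAR(320) NOT NULL,
    -- Requires at least something@something.something;
    -- a LIKE pattern only rejects the most obvious garbage
    CONSTRAINT CK_Subscriber_Email CHECK (Email LIKE '%_@_%._%')
);
```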
The opposite of under-engineering is over-engineering: doing too much design to solve a basic problem. Going with the above example, you could over-address the issue by adding five new tables for details such as phone number, mailing address, each part of the name, and so on, as in the sketch below.
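Here’s what that over-engineered version might look like (all names hypothetical):

```sql
-- One table per attribute of what is really just a contact list
CREATE TABLE Person        (PersonID INT IDENTITY(1,1) PRIMARY KEY);
CREATE TABLE PersonName    (PersonID INT REFERENCES Person (PersonID),
                            GivenName VARCHAR(100), FamilyName VARCHAR(100));
CREATE TABLE PersonEmail   (PersonID INT REFERENCES Person (PersonID),
                            Email VARCHAR(320));
CREATE TABLE PersonPhone   (PersonID INT REFERENCES Person (PersonID),
                            Phone VARCHAR(30));
CREATE TABLE PersonAddress (PersonID INT REFERENCES Person (PersonID),
                            MailingAddress VARCHAR(400));
-- Now it takes four joins to read back a single mailing-list row
```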
The golden middle ground depends on the exact purpose of the database, but generally speaking, you should aim to neither under- nor over-deliver to your customer.
Another rookie mistake is simply not taking advantage of little tweaks and hacks that boost performance, like these hacks to improve SSAS memory.
How are release candidates determined?
Release candidates are generally published once the company has stopped implementing new features, with the expectation that people will use the candidate and report any remaining minor bugs. In this sense, a release candidate is essentially the nearly completed version, although some performance polish on the UI is likely still being tuned, and any debugging and tracing code left over from the alpha and beta stages is still being removed.
How difficult are migration plans for inherited databases?
Let’s say you inherit a database or design that hasn’t followed best practices, and now you need to create a migration plan toward a design that does. In this situation, you’d want to consider incremental changes: create physical tables that adhere to higher standards, and implement views that make the new objects appear like the old system to existing code (sketched below).
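A minimal sketch of that approach, assuming a hypothetical legacy table name of tblCusts:

```sql
-- New, well-designed table
CREATE TABLE dbo.Customer (
    CustomerID INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Customer PRIMARY KEY,
    Email      VARCHAR(320) NOT NULL
);
GO

-- Compatibility view that presents the legacy name and column shape,
-- so existing application code keeps working during the migration
CREATE VIEW dbo.tblCusts
AS
SELECT CustomerID AS cust_id,
       Email      AS cust_email
FROM dbo.Customer;
```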
In an ideal situation, a database is used by only one application, so even a poorly designed database affects just that one system, and addressing its issues over time isn’t as difficult.
If your goal is to refactor the design of a migrated database by implementing better practices, it’s beneficial to start by standardizing the object names. From there, make sure the data types follow a consistent design across tables, views, parameters, and expressions, including procedures, triggers, and functions. This minimizes the risk of implicit conversions, which are notorious performance killers in SQL Server applications (see the sketch below).
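Here’s a sketch of a typical mismatch (names hypothetical): the column is VARCHAR but the parameter is NVARCHAR, so SQL Server implicitly converts the column, which can turn an index seek into a scan:

```sql
CREATE TABLE dbo.Account (
    AccountNumber VARCHAR(20) NOT NULL PRIMARY KEY
);

-- Mismatched: the NVARCHAR parameter forces an implicit conversion
DECLARE @acct NVARCHAR(20) = N'A-1001';
SELECT AccountNumber FROM dbo.Account WHERE AccountNumber = @acct;

-- Matched: same type as the column, no conversion needed
DECLARE @acct2 VARCHAR(20) = 'A-1001';
SELECT AccountNumber FROM dbo.Account WHERE AccountNumber = @acct2;
```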
You’ll also want to give every table a clustered index and a primary key, and declare all foreign keys while creating non-clustered indexes on them, for a modest performance improvement, as sketched below.
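A hedged sketch, reusing the hypothetical dbo.Customer table from the earlier example:

```sql
CREATE TABLE dbo.Orders (
    OrderID    INT IDENTITY(1,1) NOT NULL,
    CustomerID INT NOT NULL,
    -- Clustered primary key on the surrogate key
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID),
    -- Declared foreign key back to the parent table
    CONSTRAINT FK_Orders_Customer
        FOREIGN KEY (CustomerID) REFERENCES dbo.Customer (CustomerID)
);

-- SQL Server does not index foreign key columns automatically;
-- a non-clustered index on the FK helps joins and delete checks
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID);
```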
What are recommended modeling tools?
It depends on the scope of your database, but for the most part, the modeling tools most highly recommended across the industry are:
- ER/Studio
- Erwin
- Navicat
- SQLDBM
- Archi
Some of them, such as Erwin and ER/Studio, are a bit pricier and intended for more enterprise customers. But inexpensive and even free modeling tools with great features do exist, such as SQLDBM, a reputable tool with the bonus of portability. Here’s a good comparison of the various popular data modeling tools.