The model below is an example of an Event and Event Line concept, a familiar structure in many of our organizations. Each of these entities contains data elements that are not shown on this model aswell as other relationships that are not shown such as to Customer and Supplier.
Once an Order is considered “closed” and leaves the system, an Order Number value can be used again on a future order. You’ve heard the slogan “Reduce, Reuse, Recycle”– here it is being applied to natural key values. Therefore, Order Number is not a unique natural key. We can add the Order Create Date to Order’s primary key (Option 1 below), or createa surrogate key called Order Id and make Order Number and Order Create Date an alternate key (Option 2 below).
Would you choose Option 1, Option 2, or is there a better solution and why?
Once a month in my newsletter, I include a data modeling puzzle and many solve the puzzle and their solutions (as well as my own) are published in a very highly regarded data management publication(i.e., TDAN). Below please find a summary on how we solved this challenge and make sure your email address is on our monthly list by signing up here: www.stevehoberman.com/challenges.htm.
Top 3 Reasons for Choosing Option 1
- To avoid introducing a new data element. Option 1 uniquely identifies orders without introducing a new key (i.e., Order Id). As Jeff Lawyer, Senior Data Architect, summarizes: “No need to manufacture a surrogate key for Order…No need to create an alternate key for Order nor incur additional overhead to maintain an alternate index.”
- To facilitate partitioning. With Order Create Date as part of Order’s primary key, we can more efficiently manage partitioning of Order and Order Line by calendar year. In fact, not only can we partition by year, we can also more easily purge (or move to offline storage) Order and Order Line records after a specified number of years.
- To improve performance. Depending on how the database generated from this model will be used, retrieval performance can be quicker using the natural key. If both an Order Number and Order Date are known for example, we can get to Order Line instances directly without having to join through Order.
Top 3 Reasons for Choosing Option 2
- To reduce the size of Order’s primary key. Navin Ladda, Data Architect, summarizes this reason: “I would choose Option 2 for the prime reason that it will help with reducing the number of columns that will migrate over to the tables that are child to Order.” Also, because what is only shown is a model subset, it is possible that Order Line is a parent in one or more other relationships and therefore Order Line’s primary key will propagate as foreign keys and Option 2 will minimize the size of all of these downstream primary keys.
- To reduce errors. Both Rich Kier, Data Analyst and Susan Earley, Senior Data Modeler/Project Architect, stress that the developers for this application might misinterpret Order Create Date in the Order Line entity in Option 1. Susan says “Option 1 carries the risk that someone managing that data model doesn’t know that the Order Create Date must be propagated in order to preserve the uniqueness of the key.” Pete Stiglich, Data Architect, adds “If a developer or user forgets to include Order Create Date in the join, double counting can occur. Using the surrogate key (Order Id), there is only one possible way to join Order to Order Line.”
- To facilitate integration. In the future orders may arrive from other applications and the Order surrogate key provides a buffer in case these other applications identify orders differently, as we can assign these orders a new unique Order Id value. As Gary J. Deffendall, Lead Data Modeler, says, “With a surrogate key you should be able to guarantee uniqueness in the primary key regardless of the number of order systems you are integrating in your warehouse.”
Possibly a Better Solution
- Rethink the Order Line primary key. Several challengers including Steve White, Information Architect, and Tim Bowders, Information Security Analyst, questioned Order Line’s primary key. Steve says “Is Order line sequence number part of the key of Order line? I suppose it could be if you allow people to order the same item twice on the same order with different shipping instructions. Is product number part of the key if I have the sequence? Probably not – the order and line sequence uniquely identify the record.”
- Change the business process. We could just use the Order Number to identify orders if we can change the business process that allows order numbers to be reused. Diana Wild, Business & Data Architect, says, “I would suggest that you go back to the business and challenge the original business rule that says an order number can be reused. In fact, the business people can probably cite their own examples of where there was an issue with an older order and they had a great deal of difficulty figuring out which data in their existing systems was related to that order, for example, in processing a return.”
- Question whether reusing Order Numbers is an issue. Perhaps duplicate order numbers is not an issue. Michael Anderson, Manager, says, “Does my customer want to keep Order History where I reference previous Order numbers? If so, for how long? If the retention period is 2 years, then I can ‘reduce, reuse and recycle’ every two years.”