In my TDAN columns so far, we have described the need for the patterns, principles and definitions that underpin an architectural framework.
Now we move on to the more concrete things that an architectural framework needs to define – namely, the answers to: What business activities are being supported? What information is being
processed? Which business activities require it? Where is the information located?
Needless to say, these are not trivial questions and, at an enterprise level, they are far too big to cover as a single subject. Like most complex problems, this one is easier if it is broken down into simpler parts that can
then be defined separately (though not in isolation) from each other.
In one of my previous columns, four standard views in the architectural framework were mentioned:
- The information or data view representing the data that is required by the business to support its activities. This answers the “what information is being processed” question.
- The functional business or domain view representing all the business processes and activities that must be supported. This answers the “what business activities are being carried out” question.
- The integration or data-flow view representing the flow of information through the business, where it comes from and where it needs to go. This answers the “which business activities require it” question.
- The deployment or technology view representing the physical configuration and technology components used to deploy the architecture in the operating environment. This answers the “where is the information located” question.
Traditionally the production of these would be seen as business analysis or design or implementation deliverables and, because of the detail contained in them, produced as part of individual
projects and generally not completed until well after the architecture is published.
It’s true that much of the detail is mere implementation detail (the reader will hopefully recall this definition from one of my earlier columns – namely, “information that’s only important at a level of detail below
this one”) and not everything must be known in advance about these models in order to outline a robust architecture.
Still, even with the minimum amount of detail, these are very big subject areas that need to be treated separately.
In this column, we are going to focus on the business information model and what needs to be considered and addressed to incorporate it into an enterprise data architecture.
What Does the Business Information Model Enable?
The business information model is probably the single most important component of an enterprise data architecture. As stated previously in my column on establishing architectural patterns, the purpose of IT from a business perspective is to move information around the company and
deliver it to people (or computers operating as proxy people) to act upon, and then store it away somewhere safe so that it can be examined again later or reused by another business activity.
In order to do this, we must have some idea of what the data actually is.
Unfortunately, many of us know from hard-won experience that, in both the “AS-IS” and “TO-BE” architectures, we often have a problem with “not knowing” pertinent information about the data, particularly:
- Not knowing the “full picture” – knowledge is confined to data silos fragmented across the enterprise.
- Not knowing the business significance of a particular data item – data silos tend to only care about their own use of the data and document it from the application perspective rather than the business perspective.
- Not knowing if a data item is defined consistently across the entire enterprise. In many business environments, information is regularly chained, or even networked, across multiple applications that each apply their own interpreted definition based on data analysis that may not be the same interpretation used elsewhere.
- Not knowing what validation rules should be applied to a data item. It’s commonplace for individual application development teams to embed data validation “rules” in their application that other users of the data only discover by accident when they “cleanse” the data.
- Not knowing the impact of changing the definition – this is a natural consequence of the above.
- Not knowing…? (we don’t know what we don’t know)
The business information model provides a way to address many of these issues and provides a central point of reference – the “single version of the
truth” – regarding a single, consistent definition of the information of interest to the business.
However, producing a business information model is more than just a documentation exercise. It also underpins a number of other desirable business activities such as:
- It enforces “coherence boundaries” within the enterprise.
In the big wide world, coherence exists when the language used by two parties to communicate with each other about a particular subject area is both logical and consistent. For an enterprise architecture, this means that any information exchange across any boundary with another party must be logical, consistent and easily understood by both parties and any intermediaries.
A coherence boundary is the point at which the semantic definition in one business or business area must coincide with the semantic definition in another business or business area.
A single enterprise-wide business information model that is actively governed and enforced will provide coherence and, once the domain is decomposed into business areas, coherence boundaries naturally form as a result. (We’ll be coming to this in the next installment.)
- It acts as the beginning of the data quality capability.
The fundamental purpose of any data quality initiative is to build trust in the information that is being presented to the data consumers. Trust is directly dependent on consistency, which can only be achieved if there is a baseline, platform-independent definition against which individual instances can be compared.
- It acts as the beginning of the business intelligence capability.
Business intelligence is a core activity aimed at understanding what the business is doing and how well it is doing it. Its main problem is identifying and prioritising the important things that
need to be monitored and the key characteristics that provide the most insight.
We could, of course, just waste a lot of money monitoring everything that moves, but business intelligence needs to be more intelligent than that.
- It describes all the key business concepts that need to be managed, so it acts as a checklist to ensure full coverage within the technical architecture. There must be at least one business activity that either creates, deletes or uses the information defined by the business entity (“update” activities are optional).
I was once bitten by not having this: some essential reference data was being created manually by one application development team but used by many other applications. Nobody knew where the data came from, and the gap was only discovered when that team was disbanded.
- It forms the basis for data governance.
Data governance, as with data quality, needs a set of established rules against which to govern, and those rules need to be maintained independently of the thing being governed (that’s the whole concept of “government”).
- It forms the basis for data integration, data federation or even data virtualisation.
All three of these integration approaches become easier once coherence, data quality and data governance are in place because consistency, trust and common understanding reduce the need for “defensive” processing by consuming applications.
- If model-driven generation is being considered as an implementation approach, then the business information model is the platform-independent foundation from which all platform-specific models are derived.
- If you’re into service-oriented architecture (SOA), then it will describe the public form of each service interface and the high-level service groups from which everything else is derived.
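The coverage checklist mentioned in the list above can be sketched as a simple check over an activity-to-concept matrix. This is a minimal illustration, not a prescribed method: the activity and concept names are hypothetical, and it assumes the stricter reading that each of create, use and delete should be covered by at least one activity (update being optional).

```python
from collections import defaultdict

# Hypothetical data: which business activities touch which business concepts.
# C = create, U = use (read), D = delete. "Update" is deliberately not required.
ACTIVITY_MATRIX = [
    ("Onboard customer", "Customer", "C"),
    ("Close account", "Customer", "D"),
    ("Raise sales invoice", "Customer", "U"),
    ("Raise sales invoice", "Sales Invoice", "C"),
    ("Price enquiry", "Currency Code", "U"),  # used, but nobody creates it
]

# Assumption: all three operations must be covered for every concept.
REQUIRED = ("C", "U", "D")


def coverage_gaps(concepts, matrix, required=REQUIRED):
    """Return {concept: [missing operations]} for concepts lacking coverage."""
    seen = defaultdict(set)
    for _activity, concept, operation in matrix:
        seen[concept].add(operation)
    gaps = {}
    for concept in concepts:
        missing = [op for op in required if op not in seen[concept]]
        if missing:
            gaps[concept] = missing
    return gaps


if __name__ == "__main__":
    concepts = ["Customer", "Sales Invoice", "Currency Code"]
    for concept, missing in coverage_gaps(concepts, ACTIVITY_MATRIX).items():
        print(f"{concept}: no activity for {', '.join(missing)}")
```

A gap such as "Currency Code: no activity for C" is exactly the kind of orphaned reference data described in the anecdote above: something is consuming the data, but no known activity is responsible for creating it.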
Of course, to deliver these benefits and capabilities the business information model needs to start from a solid business and technical foundation and be managed in a systematic manner with a
pre-defined set of architectural patterns and principles that are actively governed independently of the implementation activities.
Producing the Business Information Model
A common criticism against producing and maintaining an information model is that it is labour intensive, and, in a fast changing business environment, is always out of date. Some people even go
as far as saying that the information model is irrelevant and “the code is the only model we need.” That may be a valid opinion in many environments; but in a large scale enterprise
environment, this silo mentality quickly breaks down and becomes unworkable.
Even if the need for a business information model is accepted, there is always concern about the effort required to produce and maintain it. Unfortunately, in the same way that a piece of software is
never complete, a data model will never be finished because there is one more rule that could potentially be discovered.
However, the business information model does not have to be fully formed before it is usable.
Practically, it is sufficient to just have a conceptual data model (the business concept map) that acts as the top layer of the business information model and defines the major business concepts.
Definition can then be added over time as the architecture is rolled out over the organisation and the various business systems and sub-systems are integrated into it.
The architecture then only has to concern itself with the basics of governing the business information model and defining its characteristics.
So what are the minimum artefacts required to include the business information model as an architectural artefact? The main architectural artefacts are:
- The business concept map mentioned above which describes all the main business entities – the primary business concepts – around which all other information is organised. This
establishes the purpose and scope of the data that is being processed within the domain boundary covered by the architecture. (Obviously, we can assume no knowledge of any separate architecture
that might cover information outside this boundary.)
- A set of “modelling guidelines” identifying the principles and practices that will be adopted when adding detail to the business concept map.
- Optionally, depending on which business capabilities need to be supported, an appropriate modelling language with an extensible meta-model that will be applied to the business information model and anything derived from it.
Establishing the Business Concept Map
How much detail is included in the framework business information model depends on the extent of the business analysis that has already been completed and the depth of understanding of the
existing business domain.
There is never too much information (extraneous information can always be filtered out), but there can be too little, and if there is too little then assumptions will have to be made.
As a minimum, the top-level business concept map should contain:
- The primary business concepts – these are the externally recognised things around which all other information is gathered. The primary business concepts will be business-sector specific and shared across all organisations that operate in the same business sector.
- Secondary business concepts that are significant, i.e., frequently referred to by the business and that define concepts that a knowledgeable business stakeholder (but not necessarily a subject-matter expert) would expect to be made aware of.
- The relationships the main business concepts have with each other – these generally reflect the main paths the business stakeholders follow when navigating through the stored information.
This doesn’t have to be every single relationship that may exist but should be detailed enough to ensure understanding of the overall structure of the business domain.
This is probably better illustrated with an example. The following is a small part of a business information model for an organisation involved with financial market data.
The important points to note are:
- The packages denote known “business areas” that would have responsibility for managing the life cycle of the information that resides within that business area.
This isn’t a mandatory feature at this point, but everything must eventually have an owner because we can’t assign responsibility without ownership and hence are unable to enforce governance. (The ideas of business areas, ownership and governance will be further explored in my next column on “Domain Decomposition.”)
- The model is not exhaustive – each of these classes may have further specialisations; or where specialisations are defined, there may be other ones that aren’t defined here because they are not of significance.
- A package does not have to contain any classes if there aren’t any that are significant or we don’t know what they are yet.
For example, we know that financial accounting is a defined business area and will contain a lot of information that is relevant to its function, and we know that both sales invoice and purchase
invoice feed into it somehow. What we’re not yet sure of is how they relate, so the associations just point to the package rather than an individual class.
This might be legitimate, e.g. financial accounting is an externally managed (outsourced) service whose internals are hidden from the rest of the business. This affects the scope of the model and would need to be resolved before the model can be finalised.
- Relationships can genuinely be with a package rather than any specific class within that package. For example, the model implies that a customer may legitimately sign a sales contract for access to any category of financial markets information.
- Not everything resides within a business area package because we don’t always know who “owns” the business entity. For example, is sales invoice the responsibility of customer management or financial accounting (it has relationships to both) or another business area entirely?
Initially the important thing is to capture the concept and find an owner for it as soon as we can.
- The model defines a number of secondary business entities (those that are components or specialisations of another business entity) because the concept is significant to the business and needs to be identified (e.g., quoted price as a component of listed instrument).
- The model doesn’t define any attributes of any of the classes – they might be hidden or they might not be defined. It’s not important at this stage because this is all mere implementation detail.
- The model doesn’t define cardinality for any of the relationships, though it could if they were known and significant.
How the business information model for each of these concepts evolves depends on the requirements of the business, but it contains sufficient detail to categorise all the significant information
required by the business and identify the extension points where additional information may be added later.
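To make the points above concrete, here is a minimal sketch of how such a concept map might be held as a data structure. The class name `ConceptMap`, its methods, the business area name "Financial Markets Information" and the ownership assignments are all assumptions for illustration; only the concept names come from the example discussed above.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptMap:
    """Hypothetical container for business areas, concepts and relationships."""
    packages: set = field(default_factory=set)
    owner_of: dict = field(default_factory=dict)       # concept -> business area or None
    relationships: list = field(default_factory=list)  # (source, target) pairs

    def add_package(self, name):
        # A package may stay empty until its significant concepts are known.
        self.packages.add(name)

    def add_concept(self, name, owner=None):
        # A concept may be captured before its owning business area is known.
        if owner is not None and owner not in self.packages:
            raise ValueError(f"unknown business area: {owner}")
        self.owner_of[name] = owner

    def relate(self, source, target):
        # Either end of a relationship may be a concept or a whole package.
        for end in (source, target):
            if end not in self.owner_of and end not in self.packages:
                raise ValueError(f"unknown concept or business area: {end}")
        self.relationships.append((source, target))

    def unowned(self):
        """Concepts that still need an owner before governance can be enforced."""
        return sorted(c for c, owner in self.owner_of.items() if owner is None)


model = ConceptMap()
for area in ("Customer Management", "Financial Accounting",
             "Financial Markets Information"):
    model.add_package(area)

# Ownership assignments below are assumed, not taken from the original model.
model.add_concept("Customer", owner="Customer Management")
model.add_concept("Sales Contract", owner="Customer Management")
model.add_concept("Listed Instrument", owner="Financial Markets Information")
model.add_concept("Quoted Price", owner="Financial Markets Information")
model.add_concept("Sales Invoice")     # owner not yet decided
model.add_concept("Purchase Invoice")  # owner not yet decided

model.relate("Customer", "Sales Contract")
model.relate("Sales Contract", "Financial Markets Information")  # to a package
model.relate("Sales Invoice", "Financial Accounting")            # to an empty package
model.relate("Purchase Invoice", "Financial Accounting")

print(model.unowned())
```

Note that the structure deliberately carries no attributes or cardinalities, in keeping with the level of detail described above.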
Selecting the Modelling Language
Choosing an appropriate modelling language is mostly a case of deciding what characteristics of the modelling language are important and then selecting an appropriate language from the choices
on offer.
Most organisations would probably choose the Unified Modelling Language as their in-house standard; but in terms of producing a business information model, there are other languages that may be far
better suited to the task.
A comparison of all the widely used languages is outside the scope of this article (see the End Note), but the selection should be an objective decision made by matching the language to the business requirements, NOT based on an unproven assertion that it is the de facto standard.
As a case in point, the only things that are mandatory in many modelling languages are the type of element and an assigned “name” for the element – everything else might well be
optional within the language specification, though, if we’re lucky, with a default defined for each optional element.
Often we want to tighten up these rules and make some of the optional characteristics mandatory (e.g., all attributes must have a specified datatype; strings must have a maximum length specified;
numbers must always have an allowed range defined; a description must always be provided; and so on).
There may be any number of desired “tweaks” in order to enforce “good practice” and ensure that the resulting models contain all the information to support all the desired
implementation activities.
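As a rough sketch of what tightening the rules could look like in practice, the check below treats datatype, description, string length and numeric range as mandatory. The `Attribute` structure, the datatype names "String" and "Number", and the rule wording are assumptions for illustration, not features of any particular modelling language.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attribute:
    """Hypothetical model element; only the name is mandatory by default."""
    name: str
    datatype: Optional[str] = None
    description: Optional[str] = None
    max_length: Optional[int] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def check_attribute(attr):
    """Return a list of violations of the tightened (house) rules."""
    problems = []
    if not attr.datatype:
        problems.append("datatype is missing")
    if not attr.description:
        problems.append("description is missing")
    if attr.datatype == "String" and attr.max_length is None:
        problems.append("string has no maximum length")
    if attr.datatype == "Number" and (
        attr.min_value is None or attr.max_value is None
    ):
        problems.append("number has no allowed range")
    return problems


if __name__ == "__main__":
    candidate = Attribute("Instrument Name", "String", "Display name")
    for problem in check_attribute(candidate):
        print(f"{candidate.name}: {problem}")
```

Running a check of this kind over the whole model is one way to enforce "good practice" consistently rather than relying on each modeller remembering the house rules.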
In addition, all modelling languages are incomplete! Each of them was originally created with a specific focus (e.g., UML was focussed on application
development), and anything outside of that focus needs to be added by extension of the meta-model for the modelling language.
For example, we might want to record aliases and synonyms against various elements to support a data dictionary because businesses often have more than one name for any given concept. Many modelling
languages only support a single name for anything so this would need to be supported as an extension to the basic language.
If extensions are required, then the selected modelling language must be built on an extensible meta-model and be implemented on a platform that allows access to the necessary extension points.
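The alias and synonym extension described above can be sketched outside any modelling tool as a small lookup structure. In a real tool this would be done through the language's own extension mechanism; the class `ConceptDictionary` and the alias values used here are hypothetical.

```python
class ConceptDictionary:
    """Hypothetical data dictionary mapping many names onto one concept."""

    def __init__(self):
        self._canonical = {}  # normalised name or alias -> canonical name

    @staticmethod
    def _key(name):
        # Normalise case and whitespace so lookups are forgiving.
        return " ".join(name.lower().split())

    def register(self, canonical, aliases=()):
        for name in (canonical, *aliases):
            key = self._key(name)
            existing = self._canonical.get(key)
            # One name must never point at two different concepts.
            if existing is not None and existing != canonical:
                raise ValueError(f"{name!r} already refers to {existing!r}")
            self._canonical[key] = canonical

    def resolve(self, name):
        """Return the canonical concept name, or None if unknown."""
        return self._canonical.get(self._key(name))


if __name__ == "__main__":
    dictionary = ConceptDictionary()
    # The aliases are invented examples, not taken from any real model.
    dictionary.register("Listed Instrument", aliases=["Security", "Ticker"])
    print(dictionary.resolve("  SECURITY "))
```

Rejecting an alias that already belongs to another concept is a design choice in this sketch: it keeps each name unambiguous, which is the point of having a single consistent definition.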
Establishing the Modelling Guidelines
Given the wide variety of notations that already exists for information or data modelling, it might seem odd that an additional specification of domain-specific modelling conventions would still
need to be established.
However, all modelling notations contain a lot of flexibility that allows the individual modeller to apply personal preferences to how a particular modelling problem is solved; and, as the literature
easily demonstrates, there are many different approaches that can be taken to ultimately achieve the same outcome.
Where many different groups are responsible for different parts of the model, this subjectivity will lead to inconsistency within the model.
So, if we accept that architecture is primarily about defining patterns and principles and removing subjectivity, then a formal definition of the modelling conventions that will be used and what they
mean is a natural product of the architectural activity.
In this context, a key architectural document is a “Modelling Conventions & Guidelines” document (or something similar) that should answer questions such as:
- How do we categorise the different classes of data that exist within the organisation?
The generic “class” or “entity” in many modelling languages represents any composite data structure and does not differentiate between true business entity classes, datatype classes, reference data classes (e.g., enumerated domains), stereotype classes and component classes.
- When and how should we use the different classes that we have identified? They each have a different purpose, so there must be criteria for when to use one type of class instead of another.
- How do we represent “identification” in the model?
Some modelling notations (e.g., UML) do not support the concept of “externally identifiable” business entities (i.e., things that are identified by a combination of data items instead of a pseudo-key such as an object ID).
Being able to externally identify things that reside outside of a domain boundary is an essential capability. Doing it consistently is even more important.
- How and when should we use enumerated values?
Enumerated data items form a significant part of any information model, and very rarely is just documenting the values in a simple code list sufficient to describe the nuances of the information being processed.
For example, an enumeration may indicate an undocumented classification of data (i.e., alternative sub-typing), support an internal hierarchy (e.g., business sector codes) where one code implies other codes are also applicable, or represent a data abstraction that “hides” information for ease of implementation.
All of these usages imply more complexity than just a code list, so an enumerated domain may not be appropriate.
- What naming convention should be used?
Platform-specific naming conventions make models difficult to read for users not familiar with the convention (e.g., calling something SharePriceMovingAverage or, even worse (and I’ve seen this!), ShrPrcMvngAvg when we mean “Share Price Moving Average” can be confusing).
The naming convention should be platform independent and standardised as far as possible.
- In what circumstances should we use data abstraction and when should we explicitly declare characteristics?
- What basic datatypes should be used?
Most modelling notations and programming languages define a whole bundle of datatypes that don’t have any meaning whatsoever to a business stakeholder (e.g., Byte, Long, NMTOKEN, uinteger, VARCHAR).
An organisation may also have “non-standard” datatypes that are widespread within the organisation and need a standard definition to be applied (e.g., “Money” [currency code and value] and “Date Range” [start date and end date] are datatypes that appear regularly in many models).
Datatypes are the sub-atomic particles of the IT world from which everything else is built, so it is imperative that they are known and agreed across the board.
- What language should we use for declaring data validation rules so that they are unambiguous and understandable?
Few of the existing data modelling notations answer these types of question; so to ensure consistency across the entire business information model, they need to be answered as part of the
overall architecture.
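As an illustration of agreeing basic datatypes once, the two business datatypes named above might be given a single shared definition along the following lines. This is a sketch under stated assumptions: the three-letter currency code check, the refusal to add amounts in different currencies and the inclusive end date are all choices made for the example, not rules taken from any standard model.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Business datatype: a currency code and a value."""
    currency_code: str
    value: Decimal

    def __post_init__(self):
        # Assumption: codes are three letters, in the style of ISO 4217.
        if len(self.currency_code) != 3 or not self.currency_code.isalpha():
            raise ValueError("currency code must be three letters")

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        # Assumption: mixing currencies is an error, not an implicit conversion.
        if other.currency_code != self.currency_code:
            raise ValueError("cannot add amounts in different currencies")
        return Money(self.currency_code, self.value + other.value)


@dataclass(frozen=True)
class DateRange:
    """Business datatype: a start date and an end date."""
    start: date
    end: date

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("end date precedes start date")

    def contains(self, day):
        # Assumption: both the start and the end date are inclusive.
        return self.start <= day <= self.end


if __name__ == "__main__":
    total = Money("GBP", Decimal("1.50")) + Money("GBP", Decimal("2.25"))
    print(total)
    january = DateRange(date(2024, 1, 1), date(2024, 1, 31))
    print(january.contains(date(2024, 1, 31)))
```

Each of the labelled assumptions is precisely the sort of decision that the modelling conventions would need to record, so that every team applies the same meaning.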
The “modelling conventions” should be of use to anyone that will be:
- Maintaining or extending the corporate data model in a manner consistent with the existing model.
- Using the business information model to generate other artefacts and so needing to understand the specific meaning of the various techniques that have been used.
- Needing to understand, in a business context, the information that is available in the corporate data model and the constraints that apply to it.
That’s pretty much the entire stakeholder and IT development community! If nothing else, that alone would place it firmly in the architectural domain as something that needs to be governed
independently of any vested interests.
Conclusion
In this column, we have looked at the importance of the business information model and what purpose it serves within an enterprise data architecture. One aspect of this that has been mentioned a
couple of times but not elaborated is the inclusion of a domain decomposition based on the business information model.
In my next column, we will look at domain decomposition and a systematic approach to partitioning the business concept map into the business areas responsible for managing and maintaining each primary business entity.
End Note:
- I’m not aware of any exhaustive comparison of existing modelling languages but the article “An Ontology of Data Modelling Languages: A Study Using a Common-Sense Realistic Ontology” [Milton & Kazmierczak,
Journal of Database Management, Apr-2004] provides interesting background reading on the general concepts that should be present.