Published in TDAN.com April 2003
Introduction
that makes as much good sense today as it did when I wrote it. I defined meta-data as …
“Information documented in IT tools that improves both business & technical understanding of data and data-related processes.”
I have seen my definition repeated in DM Review magazine, Intelligent Enterprise magazine and several other publications by some astute 🙂 authors. I can see sticking with that definition because
it says so much more than “data about data”. However, when looking closely at my definition … or the industry definition, the question that could pop into people’s minds is – what exactly is
data? My definition mentions data and data-related processes but doesn’t clearly specify structured or unstructured data.
This article is not intended to define or debate the differences between structured and unstructured data. This author considers structured data to be tabular or delimited by nature and recorded in
a file or database table. For the purpose of this article, unstructured data will be referred to as “artifacts”. Artifacts include
data/documents/content recorded in electronic format that can be managed and leveraged for the benefit of your company, your customers, your suppliers, etc. Artifacts include word processing files,
html files (web pages), project plans, presentation files, spreadsheets, graphics, audio files, video files, emails … any data that is not in tabular or delimited format. Some people call this
recorded knowledge. Some people call this web content. Some people call this data documents as in document management. Everybody calls it valuable. For this article, that is the definition of
unstructured data.
Just like structured data … to manage artifacts of unstructured data, a company needs to record meta-data about those artifacts, organize that meta-data, and make that meta-data available to the
knowledge workers of the organization so they can locate artifacts when they need them. The conceptual model (Figure 1) described in this article represents many of the types of meta-data that can
be recorded about artifacts. The model may not include absolutely everything that you need to know about the artifacts, but it should provide a good start toward understanding the relationship
between meta-data and unstructured data.
The rest of the article walks through the conceptual model entity by conceptual entity and offers a brief description of each of the types of meta-data that can and should be recorded about
unstructured data and artifacts.
Artifact
The artifact sits in the middle of the conceptual meta-model for unstructured data. As stated earlier, artifacts are the basic occurrences of unstructured data. Artifacts can include word
processing files, html files (web pages), project plans, presentation files, spreadsheets, graphics, audio files, video files, emails … any data that is not in tabular or delimited format. All of
the meta-data recorded in the conceptual entities surrounding the artifact relate directly to the artifact.
While it makes sense to have naming standards for managed artifacts, such a standard is difficult to enforce across an organization. An overwhelming number of artifacts already exist
throughout your company and many more are constantly being created. Most companies will not consider going back and renaming existing artifacts to follow a naming convention. The naming of
artifacts tends to follow personal preference and the artifact name originates when the artifact is stored locally on a desktop, laptop, network drive, … typically by the author of that artifact.
Therefore it can be difficult to locate artifacts by their name alone, thus the need for the additional artifact-related meta-data that is listed below.
Samples of meta-data related to Artifacts …
- Artifact Name – represents the full technical name of the artifact – for example … projectname.doc, presentation.ppt, datamovement.xml, audiofile.wav, and so on.
- Artifact Description – represents a brief textual description of what is stored in the artifact.
The rest of the meta-data described in this article is meta-data related to artifacts. Some companies may consider putting timestamps on the meta-data to store historic information that is known
about the artifact. Some companies may elect to manage only a subset of the meta-data entities that are covered below.
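As a minimal sketch of the Artifact entity, the two sample attributes can be held in a small record and searched by description rather than by name. The `recorded_at` field and the `find_by_description` helper are illustrative assumptions, not part of the conceptual model itself.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Artifact:
    """Core meta-data about one unstructured-data artifact."""
    name: str          # full technical name, e.g. "projectname.doc"
    description: str   # brief text describing what the artifact contains
    # Assumed field: a timestamp so historic meta-data can be kept.
    recorded_at: datetime = field(default_factory=datetime.now)


def find_by_description(artifacts, keyword):
    """Return artifacts whose description mentions the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [a for a in artifacts if keyword in a.description.lower()]


# Hypothetical catalog entries for illustration only.
catalog = [
    Artifact("projectname.doc", "Project charter for the store opening"),
    Artifact("presentation.ppt", "Quarterly sales review slides"),
]
matches = find_by_description(catalog, "store opening")
```

Searching the description sidesteps the naming problem: the knowledge worker does not need to know what the author happened to call the file.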
Business Function
Many organizations are designed by business function. For example, your organization may have an accounting function, a payroll function, a human resources function, sales, marketing,
manufacturing, purchasing, IT … and the list goes on. Your company may have many of each of these business functions. In many organizations, these functions are divided into sub-functions or
sub-sub functions that correspond to the organizational chart. Organization by business function makes sense for many companies and categorizing your artifacts by business function makes sense too.
This conceptual entity identifies the meta-data that links specific artifacts to business functions; whether that business function is the origination point for the artifact or another business
function area that makes use of the artifact.
To be successful tagging artifacts with Business Function meta-data, organizations might start with the highest level of the organizational chart (all companies, subsidiaries, …) and define
company codes to represent the companies and business function codes to represent the functions. The list of codes and names needs to be defined appropriately, recorded and utilized consistently,
reviewed periodically, and managed to eliminate duplicates so it stays consistent with the company’s business.
Samples of meta-data related to Business Function …
- Company Code/Business Function Code – these two pieces of meta-data may need to be stored together for companies that have multiple business entities (companies) that have similar
business functions. Depending on your organization, you may need to capture a company code for each Business Function Code.
- Company/Business Function Relationship Type Code – this code would identify the type of relationship between the company/business function and the artifact. For example – “ORIGIN” in
this code could tell the knowledge worker that the artifact was authored by a specific company/business function(s), “CONSUMER” can represent that this company/business function makes use of the
artifact, “APPROVAL” can represent that this company/business function has the responsibility of approving the artifact.
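One way to picture this entity is as a link record between an artifact and a company/business function pair. The sketch below assumes hypothetical codes ("ACME", "PAYROLL", "HR") and a hypothetical helper, `artifacts_for_function`; only the three relationship type codes come from the text.

```python
from dataclasses import dataclass

# Relationship type codes named in the text.
RELATIONSHIP_TYPES = {"ORIGIN", "CONSUMER", "APPROVAL"}


@dataclass(frozen=True)
class BusinessFunctionLink:
    artifact_name: str
    company_code: str             # hypothetical code, e.g. "ACME"
    business_function_code: str   # hypothetical code, e.g. "PAYROLL"
    relationship_type_code: str   # must be one of RELATIONSHIP_TYPES

    def __post_init__(self):
        # Reject codes that are not on the managed list.
        if self.relationship_type_code not in RELATIONSHIP_TYPES:
            raise ValueError(f"unknown relationship type: {self.relationship_type_code}")


def artifacts_for_function(links, company_code, function_code, relationship=None):
    """Return sorted artifact names linked to a company/business function,
    optionally restricted to one relationship type."""
    return sorted({
        link.artifact_name
        for link in links
        if link.company_code == company_code
        and link.business_function_code == function_code
        and (relationship is None or link.relationship_type_code == relationship)
    })


links = [
    BusinessFunctionLink("payroll_policy.doc", "ACME", "PAYROLL", "ORIGIN"),
    BusinessFunctionLink("payroll_policy.doc", "ACME", "HR", "CONSUMER"),
    BusinessFunctionLink("benefits.ppt", "ACME", "HR", "ORIGIN"),
]
used_by_hr = artifacts_for_function(links, "ACME", "HR")
authored_by_hr = artifacts_for_function(links, "ACME", "HR", relationship="ORIGIN")
```

Validating the relationship code when the link is created is one simple way to keep the code list consistent over time.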
Subject Area
Subject area meta-data may be used to relate an artifact to an enterprise data model or business models with the “subject area” type of categorization or breakdown. Creating a code for each
subject area and linking subject area codes to artifacts gives knowledge workers an additional way of searching for artifacts.
Just like company/business function code – this code will require its share of due diligence to make certain subject areas are clearly defined, recorded, managed to eliminate duplicates, and
that they stay consistent with the business.
Samples of meta-data related to Subject Area …
- Subject Area Code – represents the specific subject area to which the artifact is linked
- Subject Area Type Code – represents the type of subject area classification – enterprise data model, business model, organizational model, …
Purpose
Purpose meta-data identifies the uses (intended or otherwise) of the artifact. Purposes may also be broken into sub-purposes. For example – if “store opening” was defined as the
purpose, “equipment delivery” may be a sub-purpose. The same holds true for a purpose of “financial reporting” and a sub-purpose of “report distribution”. The
purpose code can be used to identify why the artifact was created and how it is to be used.
Just like the other codes listed above – this code will require its share of due diligence to make certain purposes and sub-purposes are clearly defined, recorded, managed to eliminate
duplicates, and that they stay consistent with the business. The recording and management of purpose codes themselves may be very useful in describing how the business operates.
Samples of meta-data related to Purpose …
- Purpose Code – represents the specific purpose to which the artifact is linked
- Purpose Sub-Code – represents a sub-type within a purpose to further define how the artifact will be used.
Steward
Steward meta-data, in the context of an artifact, is meta-data about a person who is accountable for the artifact. Notice that I said “a person” and not “THE person”.
Depending on your organization and how you define stewardship, there may be several steward types and thus the need to record a steward type code along with the steward (person) information. Keep
in mind that you may have several stewards of the same or different types that may be associated with a single artifact.
Different steward type codes may include “AUTHOR”, “REVIEWER”, “APPROVER”, “USER” (knowledge workers). Consider using several steward types (as opposed to assigning a single steward per
artifact) to identify, record and track the different types of accountability for the management of the specific artifact. The link between the steward and the artifact may also be important if the
artifact is time-sensitive and must be reviewed periodically to make certain it is current.
Just like the other codes listed above – the steward type code will require its share of due diligence to make certain accountabilities per steward type are clearly defined, recorded, managed
to eliminate duplicates, and that they stay consistent with the business.
Samples of meta-data related to Steward …
- Steward Type Code – represents the type of steward that is being associated to the artifact.
- Steward Person Identifier – represents the link to the steward (person). It is not recommended that this identifier is the steward’s name (for obvious reasons). This meta-data may
contain an employee id, social security number, or the specific data at your company that can be used to associate an artifact to a person.
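Because a single artifact can have several stewards of the same or different types, the steward meta-data behaves like a many-to-many link. A minimal sketch, assuming hypothetical employee identifiers such as "E1001" and a hypothetical helper `stewards_by_type`:

```python
from collections import defaultdict
from dataclasses import dataclass

# Steward type codes suggested in the text.
STEWARD_TYPES = {"AUTHOR", "REVIEWER", "APPROVER", "USER"}


@dataclass(frozen=True)
class StewardLink:
    artifact_name: str
    steward_type_code: str   # one of STEWARD_TYPES
    steward_person_id: str   # an identifier such as an employee id, not a name


def stewards_by_type(links, artifact_name):
    """Group the steward identifiers for one artifact by steward type."""
    grouped = defaultdict(list)
    for link in links:
        if link.artifact_name == artifact_name:
            grouped[link.steward_type_code].append(link.steward_person_id)
    return dict(grouped)


links = [
    StewardLink("plan.doc", "AUTHOR", "E1001"),
    StewardLink("plan.doc", "REVIEWER", "E2002"),
    StewardLink("plan.doc", "REVIEWER", "E3003"),
    StewardLink("other.xls", "AUTHOR", "E2002"),
]
plan_stewards = stewards_by_type(links, "plan.doc")
```

Here "plan.doc" has one author and two reviewers, which a single steward-per-artifact column could not express.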
Location
A location may be as specific or as varied as a country, region, state, city, complex, building, floor, or mail-stop. Location information can be used to link people associated with a specific
location to the artifact. Location can also be used to secure who sees or uses the artifact and to identify where the artifact is relevant or should be distributed.
It may be necessary to record information about numerous locations for each artifact and also to delineate the locations by location type (market region, office complex, building,
…).
Like the other codes listed above – the location codes and location types will require their share of due diligence to make certain location definitions are clearly defined, recorded, managed
to eliminate duplicates, and that they stay consistent with the business.
Samples of meta-data related to Location …
- Location Code – represents the code for the location being linked to the artifact.
- Location Type Code – represents the type of location that is being associated to the artifact (sales region, physical location, mail center, …).
Community & Audience
Community or Audience meta-data identifies a group of knowledge workers that are associated to an artifact. Depending on your organization, you may have communities made up of other communities or
you may need ways to identify types of communities. Examples of community types could include user-group communities, management teams, project teams, etc. that may be related or linked to a
specific artifact. Sub-communities may break these down further – User Groups may break down to ER/Win Users, Portal Users, Data Warehouse Users; Project Teams may break down to specific
projects by name; Management Teams may break down into different levels of managers, and so on.
Like the other meta-data listed above – the creation of community codes and type codes will require due diligence to make certain the codes are clearly defined, recorded, managed to eliminate
duplicates, and that they stay consistent with the business. Companies should consider both opt-in and organization-defined communities that are designed for specific purposes.
Samples of meta-data related to Communities …
- Community Code – represents the specific community of interested people to which the artifact is linked
- Community Type Code – represents the type of community classification – user group, management team, project team, …
Security
Security meta-data can be used to ensure that only specific people, groups, or locations are granted access to an artifact and to identify the type of access that the group has to the
artifact. This type of meta-data may also be used to answer questions such as “What artifacts can John Smith view? change? eliminate?” and “What artifacts can be changed by the ABC
Management Team?”
Like the other meta-data listed above – the security groups and their relationship to knowledge workers will require their share of due diligence to make certain security group definitions
are clearly defined, recorded, managed to eliminate duplicates, and that they stay consistent with the business.
Samples of meta-data related to Security …
- Security Grouping Code – represents the security grouping by community, location, steward type, others … to which the artifact is linked.
- Security Grouping Type Code – represents the type of grouping – by “COMMUNITY”, “LOCATION”, “STEWARD TYPE”, … to which security is being applied.
- Security Type Code – this field may specify the type of security that is associated with the artifacts … “ALL ACCESS”, “CHANGE”, “READ-ONLY”, …
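The question “What artifacts can John Smith view?” can be sketched as a lookup over the three security codes. The mapping from security type code to permitted actions, the grouping codes ("ABC-MGMT", "HQ", "FINANCE"), and the artifact names below are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed meaning of each security type code (not defined in the article).
ALLOWED_ACTIONS = {
    "ALL ACCESS": {"view", "change", "eliminate"},
    "CHANGE": {"view", "change"},
    "READ-ONLY": {"view"},
}


@dataclass(frozen=True)
class SecurityRule:
    artifact_name: str
    grouping_code: str        # e.g. a community or location code
    grouping_type_code: str   # "COMMUNITY", "LOCATION", "STEWARD TYPE", ...
    security_type_code: str   # key into ALLOWED_ACTIONS


def artifacts_allowed(rules, memberships, action):
    """Return sorted artifact names on which a person may perform an action.

    memberships is a set of (grouping_type_code, grouping_code) pairs
    describing the groupings the person belongs to.
    """
    return sorted({
        rule.artifact_name
        for rule in rules
        if (rule.grouping_type_code, rule.grouping_code) in memberships
        and action in ALLOWED_ACTIONS[rule.security_type_code]
    })


rules = [
    SecurityRule("budget.xls", "ABC-MGMT", "COMMUNITY", "CHANGE"),
    SecurityRule("handbook.doc", "HQ", "LOCATION", "READ-ONLY"),
    SecurityRule("forecast.xls", "FINANCE", "COMMUNITY", "ALL ACCESS"),
]
# Hypothetical memberships for John Smith.
john_smith = {("COMMUNITY", "ABC-MGMT"), ("LOCATION", "HQ")}
can_view = artifacts_allowed(rules, john_smith, "view")
can_change = artifacts_allowed(rules, john_smith, "change")
can_eliminate = artifacts_allowed(rules, john_smith, "eliminate")
```

The same rules answer the group-level question too: pass a set containing only the management team's grouping to see what that team can change.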
Data Related
Data Related meta-data is included on the conceptual meta-model to secure a place to relate artifacts to specific pieces of structured data in the organization. Business rules, data models, data
flow diagrams, architecture definitions (whether technology architecture, data architecture, or enterprise architecture), data dictionaries, application development documentation, report catalogs
and reports themselves, … are all considered vital artifacts for the operation of your business and can be related to specific structured data in the organization. These artifacts need to be
managed just like all others. This entity allows the knowledge worker to identify and locate artifacts that have a relationship to structured data in the organization.
Like the other meta-data listed above – the data names and type codes will require their share of due diligence to make certain names and type codes are clearly defined, recorded, managed to
eliminate duplicates, and that they stay consistent with your company’s information technology. The data-related meta-data can also be used to link an artifact to your “data about data” in your
enterprise or application meta-data repositories.
Samples of meta-data related to Data …
- Data Name – represents a database name, table name, column name, core element name, any specific structured data (logical or physical) that can be related to an artifact.
- Data Type Code – specifies the type of structured data represented by the Data Name above – “Table”, “Column”, “Entity”, “Database”, …
- Repository Link Code – represents a unique identifier that couples the artifact with specific data in an enterprise or application repository.
Time or Time/Date
There are many ways that the Time meta-data can be related to artifacts. The Time meta-data by itself has no meaning and must include an additional Time Type Code that denotes what the specific
time represents.
Time Type Codes can specify “CREATE DATE”, “RETIRE DATE”, “REVIEW DATE”, “LAST UPDATE DATE”, or other dates pertaining to the specific artifact. For these Time Type Codes, it may also be
important to record meta-data that specifies the accountable party who took a specific action associated with the artifact at the recorded time in the past (created, updated) or in the future
(retire, review).
Time meta-data may also represent ranges of times. For example, if an artifact should only be available from a certain date to a certain date, this information can be recorded with start and end
Time/Date meta-data.
Samples of meta-data related to Time/Date …
- Artifact Time(s)/Date(s) – represents the specific occurrences of Time/Date associated with the artifact and further defined by the Time/Date Type code.
- Time/Date Type code – represents the meaning of the Time/Date recorded … Examples of values that may be recorded in this code are listed in the second paragraph in this section.
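A date only becomes meaningful once it is paired with its type code, and a range is simply a start and an end record for the same artifact. The sketch below assumes two hypothetical type codes, "AVAILABLE START" and "AVAILABLE END", alongside the “CREATE DATE” code from the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArtifactDate:
    artifact_name: str
    time_type_code: str   # gives the date its meaning
    value: date


def is_available(dates, artifact_name, on):
    """Check whether an artifact is inside its availability range on a given day.

    A missing start or end record is treated as an open-ended range.
    """
    start = end = None
    for record in dates:
        if record.artifact_name != artifact_name:
            continue
        if record.time_type_code == "AVAILABLE START":   # hypothetical code
            start = record.value
        elif record.time_type_code == "AVAILABLE END":   # hypothetical code
            end = record.value
    return (start is None or start <= on) and (end is None or on <= end)


dates = [
    ArtifactDate("promo.ppt", "CREATE DATE", date(2003, 9, 15)),
    ArtifactDate("promo.ppt", "AVAILABLE START", date(2003, 11, 1)),
    ArtifactDate("promo.ppt", "AVAILABLE END", date(2003, 12, 31)),
]
during = is_available(dates, "promo.ppt", date(2003, 12, 1))
after = is_available(dates, "promo.ppt", date(2004, 1, 1))
```

Note that the “CREATE DATE” record is ignored by the availability check, which is exactly why the type code has to travel with every date.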
Media Type
Media Type meta-data represents the format type of the artifact and may be associated with the tools or software that are required to view, edit and manage the artifact. For example – an
artifact may be a word processing document, spreadsheet, graphics, audio, video, … that can only be opened using specific tools (Word, Excel, Access, Illustrator, etc). It is important to
identify the format of the artifact so the knowledge worker knows what media types are standard, acceptable, and the tools they will need to view the artifact.
Like the other meta-data listed above – the media type codes will require their share of due diligence to make certain media types are clearly defined, recorded, managed to eliminate
duplicates, and that they stay consistent with your company’s information technology.
Samples of meta-data related to Media Type …
- Media Type Code – represents the media used to record the artifact, for example – word processing document, spreadsheet, presentation file, graphic, audio file, …
- Media Tool Code – represents the tools or software that is required to open, view, or change the artifact, for example – Microsoft Word, Lotus, PowerPoint, Visio, Wave Player, …
- Media Version Code – represents the version of the Media Tool that will be required to open, view, or change the artifact.
Package
Package meta-data can be used to associate artifacts to other artifacts. For example – user manuals, books, procedure guides, … are often made up of many “chapters”, “pieces”
or numerous graphics. The artifacts that make up a package can refer to other artifacts. The individual chapters, pieces, and graphics may stand alone and require separate management as individual
artifacts, as well as management as part of a larger package of artifacts.
Like the other meta-data listed above – the package type codes will require their share of due diligence to make certain package types are clearly defined, recorded, managed to eliminate
duplicates, and that they stay consistent with the company’s need for packaging artifacts.
Samples of meta-data related to Package …
- Package Name – represents the name of the artifact that includes other artifacts. For example, Data Architecture Plan, Equipment Operating Manual, Store Layouts, …
- Package Type Code – represents the type of package of artifacts and contains values such as “USER MANUAL”, “PROCEDURE GUIDE”, “GRAPHICS PACKAGE”, … depending on how your company
packages artifacts.
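Since a package is itself an artifact that contains other artifacts, and those may in turn be packages, the relationship is naturally recursive. A minimal sketch, assuming a simple mapping from package name to member names and a hypothetical helper `expand_package`:

```python
def expand_package(contents, package_name, _seen=None):
    """Return every artifact reachable from a package, depth-first.

    contents maps a package name to the list of artifacts it directly contains.
    Artifacts already visited are skipped so circular references cannot loop.
    """
    seen = set() if _seen is None else _seen
    result = []
    for member in contents.get(package_name, []):
        if member in seen:
            continue
        seen.add(member)
        result.append(member)
        # A member may itself be a package made up of further artifacts.
        result.extend(expand_package(contents, member, seen))
    return result


# Hypothetical package contents for illustration.
contents = {
    "Equipment Operating Manual": ["chapter1.doc", "chapter2.doc", "Graphics Package"],
    "Graphics Package": ["diagram1.png", "diagram2.png"],
}
all_parts = expand_package(contents, "Equipment Operating Manual")
```

Each member keeps its own identity in the mapping, so a graphic can still be managed on its own while also being retrieved as part of the manual.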
Status and Version
Status meta-data can be recorded about each artifact to identify the history and the present state of each artifact including “PRODUCTION”, “TEST”, “UNDER REVIEW”, “RETIRED”, … The status
code, when used along with a status date field, can be used to monitor the activity of each artifact by recording when the artifact was a draft, approved, under review, in production, back under
review, and so on.
Version meta-data can be used when multiple copies or versions of the same artifact exist. For example, user manuals may contain different information for different releases of applications,
software products, procedures, and so on.
Like the other meta-data listed above – the status and version meta-data will require their share of due diligence to make certain that this meta-data is clearly defined, recorded, managed to
eliminate duplicates, and that they stay consistent with the company’s need for applying status and versioning to artifacts.
Samples of meta-data related to Status and Version …
- Status Code – represents the status of an artifact at the point in time represented by the status/version date. Examples are listed in the first paragraph in this section above.
- Version Code – represents the numbered or codified version of an artifact. For example, typical values may include “Release 1.0”, “Release 1.1”, “English Version”, “Spanish Version”, …
- Status/Version Date – represents the date or point in time when the status or version of the artifact was recorded and made available.
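Recording a status code together with a status date yields a history that can be queried for the state of an artifact at any point in time. The sketch below uses status codes from the text; the record layout, the sample dates, and the helper `status_as_of` are assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StatusRecord:
    artifact_name: str
    status_code: str    # e.g. "UNDER REVIEW", "PRODUCTION", "RETIRED"
    version_code: str   # e.g. "Release 1.0"
    status_date: date   # when this status took effect


def status_as_of(history, artifact_name, version_code, on):
    """Return the most recent status of one artifact version on a given day,
    or None if no status had been recorded by then."""
    candidates = [
        record for record in history
        if record.artifact_name == artifact_name
        and record.version_code == version_code
        and record.status_date <= on
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda record: record.status_date).status_code


history = [
    StatusRecord("manual.doc", "UNDER REVIEW", "Release 1.0", date(2003, 1, 10)),
    StatusRecord("manual.doc", "PRODUCTION", "Release 1.0", date(2003, 2, 1)),
    StatusRecord("manual.doc", "RETIRED", "Release 1.0", date(2003, 6, 1)),
    StatusRecord("manual.doc", "UNDER REVIEW", "Release 1.1", date(2003, 5, 15)),
]
march_status = status_as_of(history, "manual.doc", "Release 1.0", date(2003, 3, 1))
```

Keeping every status record, rather than overwriting a single status field, is what makes it possible to monitor the activity of an artifact over time.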
Project and Process
Project and Process meta-data can be used to relate a specific artifact to a project, process, or specific task. Companies that create project plans may intend to link a specific project task to a
deliverable (which will become an artifact).
Companies may elect to link artifacts to a specific step or section of a process. For example, when setting up display cases in a new store, there can be pictures that demonstrate exactly
how each section of the display case is supposed to look; when building a house, there are schematic diagrams (read: artifacts) that are required for the landscape, architecture, wiring,
plumbing & HVAC; and when describing a process, there are supporting documents for each of the steps of the process along the way.
The ability to create a link between projects, processes and artifacts potentially requires several pieces of meta-data depending on how projects and processes are developed at your company. If
your company follows a strict planning process including the development and maintenance of detailed project plans, the coding that is used for that planning can be linked to the artifacts via this
Project and Process meta-data. Additionally, you may need to develop a Project/Process Artifact Type Code that defines the relationship between the artifact and the process – “DELIVERED”,
“UTILIZED”, “INPUT”, “OUTPUT”.
Like the other meta-data listed above – the project and process meta-data will require their share of due diligence to make certain that the projects and processes to be linked to the
artifacts are clearly defined, recorded, managed to eliminate duplicates, and that they stay consistent with the company’s need for managing projects and processes.
Samples of meta-data related to Project and Process …
- Project Name – represents the business name of the project that is being linked to an artifact.
- Process Name or Code – represents the process name or coded value of the process that is linked to the artifact.
- Project/Process Artifact Type Code – represents the relationship between the project or process and the artifact. Potential values for this meta-data are defined in the next-to-last
paragraph of this section.
Event
Event meta-data can be used to associate artifacts with events (happenings) that take place at your company or within part of your company. Events can be “one-time” events or events that repeat
themselves periodically. And … to be consistent with previously defined meta-data, events can be categorized into types of events and events can be parts of other events. For example, a
store may have a certain procedure or set of procedures that they follow for their annual holiday sale or promotion, a manufacturing company may produce certain goods only during an event at a
certain time of year, a conference may be held once a year, and the information, pictures, processes, … that are used to support these events are likely recorded in artifacts (documents,
graphics, …).
Artifacts should be linked to events so that companies can retrieve the information they need to prepare for or follow through with these events. Event meta-data can identify specific events, event
types, and can relate events to other events.
As with the other meta-data listed above, creating codes for events and event types requires its share of due diligence to make certain this meta-data is clearly defined, recorded, managed to
eliminate duplicates, and that they stay consistent with the business. The recording and management of event codes and event type codes also may be very useful in describing how the business
operates.
Samples of meta-data related to Events …
- Event Code – represents the specific event to which the artifact is linked
- Event Type Code – represents the type of event that is represented in the Event Code. For example, “CONFERENCE”, “MEETING NOTES”, “BANQUET”, “2003 SUMMER AUDIT”, … and so on.
Conclusion
This article walked through a conceptual meta-model for unstructured data entity by conceptual entity and offered a brief description and some samples of each of the types of meta-data that should
be considered when recording meta-data about unstructured data and artifacts. Hopefully this article broadened your thinking about the meta-data component of managing unstructured data.
Perhaps in the future I will consider changing my meta-data definition to read as follows …
Meta-Data is …
understanding of data and data-related people and processes.”
Whew! That’s a mouthful. “Data About Data” is a lot easier to remember.