Robert S. Seiner (RSS): Hello Susan. Thank you for taking the time to speak with me about the DAMA Dictionary of Data Management 2nd Edition that was released earlier this year. Please tell me about your involvement with DAMA and the role that you played in the compilation of the 2nd Edition.
Susan Earley (SE): I was asked to be the co-secretary of the DAMA Chicago chapter in early 2007. I went to the EDW conference that year, and won the DAMA Dictionary on CD. That’s also when I heard about the DAMA DMBOK that was being developed. I volunteered to contribute, and wound up becoming the assistant editor when the project was starting to run behind. I liked the experience editing, and learned a lot about how publishing works. I then volunteered to update the dictionary. I am also going to be assistant editor on the DAMA DMBOK version 2, which is going to start development soon.
RSS: Can you give me a brief history of the DAMA Dictionary, when the initial version was published, the purpose, and the audience … whatever you want to share with the TDAN.com readers.
SE: The original DAMA Dictionary was published in 2008 as sort of a precursor to the DMBOK. Having a dictionary/glossary was the first step towards a common vocabulary for data management. The audience was all of the Data Management profession – sort of a first baby step to creating standards that other professions enjoy.
RSS: What is the difference between the first edition and the second edition?
SE: The second edition has about twice as many terms as the first edition. We removed the last section from the first edition, which just grouped terms into subject areas, in order to make room for more terms. Some areas of terms were only lightly covered in the first edition, such as types of visualization for Business Intelligence, but are more thoroughly covered in the second edition.
RSS: Has the purpose or use of the dictionary changed over the years? How has the DAMA Dictionary evolved and where do you see it going from here? Is there a 3rd edition in the works?
SE: The purpose is to create a common language of preferred terms and definitions. We have so many terms used to identify the same thing; we want to have each ‘thing’ have one ‘name’ as much as possible, in order to reduce confusion. The dictionary has doubled this time, but I don’t see it complying with Moore’s law; most of the main terms are now included. Next time will probably see only a few new terms, a few more preferred terms from continuing evolution of the topics, and maybe some new topic areas.
RSS: What can you tell us about the process of collecting data terms and definitions, getting people to provide feedback and the final acceptance of the terms and definitions?
SE: I started with the first edition dictionary. I converted it to a spreadsheet to make it easier to manage. The processes I used were to browse the DAMA DMBOK for text that defined a term, browse through the ICCP test data bank looking for terms and definitions that are included on the tests, browse the web for common terms and definitions, and ask for terms from the DAMA group on LinkedIn. Several people submitted terms and sometimes even definitions for those terms, which was nice. Where people submitted terms that I did not adjust for copyright clearance, there is a citation. We then put the terms and definitions on a website for commentary. That generated more terms and good commentary, which tightened up the definition texts.
RSS: The final acceptance of the definition of a term within a single company or organization can be a difficult task. Knowing who I know in DAMA circles (and some of the personalities involved), how were you able to get the DAMA participants to come to agreement across industries and across companies?
SE: I mostly made executive decisions based on the commentary, giving preference to more specific terms over more general or colloquial terms. I would also do some Internet searches for those terms, and the one with the most hits (used most often) would usually get preference. For terms with multiple words, I’d look up the English definitions and use the most appropriate words. There were some terms submitted that I couldn’t find a definition for on the Internet, so they didn’t make it in.
We aren’t looking necessarily to change any terms or definitions to something DAMA owns; we’re looking to document what is currently in popular use.
RSS: Were there certain terms that were overly difficult and how did DAMA settle disagreements? Is there any advice you have for organizations that are having similar difficulties?
SE: There were some odd colloquial terms in the first edition that I removed, and there are some differences of opinion on what to call some things (like unstructured/polystructured/superstructured/multistructured data), and where I had room, I’d add the additional terms with a link to the preferred term.
For organizations working on their own internal vocabulary and encountering conflict, I suggest doing the same thing that dimensional data modelers do – use the term, but prefix it with something showing which version it is, such as the name of the team preferring a specific definition for that term. So if you have one term, but three meanings depending on who is using the term, you really have three terms. For example, Sale could be Marketing Sale (a customer buys something), Inventory Sale (any time inventory leaves due to a transaction) or Financial Sale (any time inventory leaves due to a sale transaction for greater than $0). You still have Sale, but now it’s qualified so you know they aren’t the same when those departments compare reports.
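[Editor's note: the qualified-term approach described above can be sketched as a small glossary structure. This is a minimal illustration only, not anything taken from the DAMA Dictionary itself. The class name `QualifiedTerm`, the `GLOSSARY` list, and the `meanings` helper are hypothetical names chosen for this example; the three Sale definitions are paraphrased from the answer above.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedTerm:
    """One meaning of a shared term, qualified by the team that uses it."""
    qualifier: str   # hypothetical field: team or department preferring this meaning
    term: str        # the shared base term, e.g. "Sale"
    definition: str

    @property
    def name(self) -> str:
        # The qualified name is simply the qualifier prefixed to the base term.
        return f"{self.qualifier} {self.term}"


# Illustrative entries only, based on the Sale example in the interview.
GLOSSARY = [
    QualifiedTerm("Marketing", "Sale", "A customer buys something."),
    QualifiedTerm("Inventory", "Sale", "Inventory leaves due to a transaction."),
    QualifiedTerm("Financial", "Sale",
                  "Inventory leaves due to a sale transaction for greater than $0."),
]


def meanings(term: str) -> list[str]:
    """Return every qualified name recorded for a base term, in glossary order."""
    return [entry.name for entry in GLOSSARY if entry.term == term]


if __name__ == "__main__":
    for entry in GLOSSARY:
        print(f"{entry.name}: {entry.definition}")
```

[Looking up `meanings("Sale")` then lists three distinct qualified terms, which makes it visible that reports from the three departments are not counting the same thing.]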
RSS: The Dictionary is available on a CD from DAMA and through Technics Publications. The CD has a great User Interface. Tell us a little bit about the interface, who and what it was designed for, and how it was developed.
SE: The CD version is in a hyperlinked PDF. I wrote macros to convert the spreadsheet with the dictionary contents into a bookmarked Word document, put it in the template/wrapper, and then trolled through the document with search/replace to insert the hyperlinks for significantly referenced dictionary terms.
RSS: In your experience at U.S. Cellular, and in speaking with other people that use the DAMA Dictionary, can you tell us how organizations use the tool and when and how it makes the life of a data management professional better/easier?
SE: At U.S. Cellular, we tried to use it when we needed to look up a term during training. Several associates skimmed through it to see if they would learn anything. It wasn’t anything we used formally, though. At Sears Holdings, where I am now, I will be using it as a reference when I’m putting together data management materials for training or documentation.
RSS: There is a lot of talk in the information management industry about semantics and the development of common data language and vocabulary for business and technical professionals. How does the DAMA Dictionary fit into that discussion?
SE: So much progress can be made, and time saved, just by having a common vocabulary. This effort is the first (now second) step in that direction. Having a single source for common terms and language definitions (even if specific to data management) will go a long way towards having an adoptable language reference.
RSS: DAMA International is obviously working hard to keep up with the times. Are there any other DAMA projects taking place that the TDAN.com readers should know about?
SE: We are currently starting work on the next version of the DAMA DMBOK. Pat Cupoli is the primary editor for that, and she is currently working on the author instructions. DAMA DMBOK version 1 is still being translated into multiple languages. One good thing is that when the translators went through the first edition, they found only a handful of typos or editing goofs. Considering the size of the document, that’s awesome.
RSS: Is the DAMA Dictionary of Data Management 2nd Edition available in printed format as well as the CD version? Where should people go if they are interested in purchasing a copy of the Dictionary? Do you have any final words for the TDAN.com readers?
SE: Yes, it is available in both formats. We got feedback that paper versions were desirable after the DAMA DMBOK was released only in CD format. You can get them both from Amazon or from technicspub.com. Enterprise Server licenses for the CD version of both are also available from Technics Publications.
Thank you for the opportunity to tell this story. I have learned so much from doing this, met so many lovely people, and I look forward to learning more with the next version of the DAMA DMBOK.
RSS: Thank you, Susan, for telling us more about this excellent publication.