Welcome to the first “The Data Maze” column. I am grateful for the chance to expound in some detail on one of my favorite topics, data. I hope to pass along some of the knowledge I have gathered over the years, and to learn from you through your feedback on this column.
As soon as I found out I was going to write a regular column, I quickly turned to the first task: the column name. As I pondered it, I wanted, for some unknown reason, the name to start with the words “The Data…”. As any good architect or designer knows, you don’t want to paint yourself into a corner at the beginning, but I was obstinate in my approach to the problem. Recalling my preschool days and my friends Ned, Ted and Ed, I immediately wanted to pursue a rhyming pattern that I thought would be cool for all involved. Imagine my disappointment when I googled “words that rhyme with data” and got words such as “beta,” “pro rata” and “errata,” which either didn’t make sense or, in the case of “errata,” meant errors. How would a column called “The Data Errata” do with readers?
In hindsight, a column on data quality issues could have kept me writing daily for hundreds of years, but we would probably both feel dirty after the whole experience. After several sleepless nights, I decided to abandon rhyme in favor of metaphor. This broadened my thinking, but I still had nothing. Then one day I was scampering around the house, eating cheese, on my way to the treadmill in the basement, when somehow I got lost between the living room and the basement. As I wandered wild-eyed around the sofa, behind the hutch and then around again multiple times, it hit me like a mousetrap. I was literally reliving my career, and what better name for my column than “The Data Maze”?
I may have exaggerated a bit about the origins of the column name, but I think “The Data Maze” fits. I have been involved with IT, predominantly data architecture, for more than 20 years, and each time I thought I might have figured it out, a new twist or paradigm shook up the market and, with it, my carefully laid plans. Conquering a maze means that we, as data professionals, must constantly learn and adapt to reach the finish line; otherwise we stagnate, take a wrong career turn or get lost in a never-ending cycle of wrong turns. To complicate matters, we must traverse this maze on several levels at once: tools, technology, applications and even techniques.
From a tools perspective, I am really going to date myself, but I remember the day I came into work to find that a wonderful new CASE tool (anyone remember KnowledgeWare IEW?) had just been installed on my huge, slow desktop PC. I spent hours trying out all the cool features, making up all kinds of quick and dirty data models, generating DDL from them, looking at the results and simply admiring my handiwork. This kind of development tool was a luxury for data professionals back then; today its successors, such as Embarcadero’s ER/Studio or Rational’s integrated development environment including InfoSphere Data Architect, are essential tools in any large, complex and integrated development project.
From a technology perspective, I started out working with an enterprise DBMS called IDMS; when I moved into the DoD arena, I worked with a more obscure DBMS called Model 204. I eventually moved into the mainstream and gained experience with multiple DBMSs, including Oracle, Sybase and DB2. Fast forward to today: not only do we still have the traditional vendors, but a horde of new vendors and database technologies has entered the market at a rapid pace. Think of vendors such as Aster Data, Greenplum and Netezza, to name a few, and of concepts such as in-database analytics, enterprise mashups and MapReduce, all operating on volumes of data that were unheard of just a few years ago.
In this era of rapidly changing technology and multi-terabyte, even multi-petabyte, databases, all of us need to chart a path through this maze and reach the finish line as unscathed as possible. Keep in mind that a solution that worked for a particular problem several years ago may no longer make sense because a new solution, design pattern or product has since become available. In IT this should be a foregone conclusion, yet I have seen many an architect hang onto a favorite hammer long after it has gone out of style.
Things have also changed from an applications perspective (from green screens to the Internet and web services) and from a techniques perspective (e.g., structured vs. unstructured modeling techniques). Suffice it to say that, for all of these reasons, the purpose of this column is to help guide you through the current and upcoming data maze. The first series of columns will concentrate on the rapidly growing area of analytics and its impact on data professionals. It will cover the foundations of data warehousing and why database specialization has emerged in recent years. Future columns will examine the different platforms in more detail, including their characteristics, uses and main vendors. From there, we will build on this foundation to explore the fascinating world of analytics and other data topics, and why it will profoundly affect us as data professionals. Thanks for tuning in, and happy wandering.
To comment on ‘The Data Maze’ Feature Column, please respond through the publication’s comment form and/or send an email to Dan Sutherland at sutherl99@msn.com.