Readers of my column know my aversion to buzzwords.[1] I approach the hot catchphrase “Data as Code” with trepidation. Already we have, to name a few:
– Infrastructure as Code (with its own acronym, IaC)
– Configuration as Code (Config as Code — why not CaC?)
– Environment as Code (EaC is not available, as we will soon see)[2]
I am not surprised that on AWS’s Well-Architected page, we find “Everything as Code.”[3] Terrence Wong supplies an acronym (EaC) and a definition in his post for Octopus[4]:
“Everything as Code is an approach to software development and DevOps that uses code to define and manage IT resources. The code representation of resources makes it easier for developers to:
– Audit changes
– Improve consistency
– Scale resources
– Transfer settings from one environment to another
Taken literally, EaC is an ideal state where every part of the software lifecycle is code.”
I admit that I am not a developer, but I understand the concept. Extending DevOps beyond development to these other aspects of the software lifecycle makes sense. But Terrence’s definition does limit the scope to technology. It’s not leading us to “Management as Code (MaC),” “Relationships as Code (RaC),” “Ice Cream as Code (ICaC),” and so on. To be honest, I have not searched for these on the web, because I needed to start drafting this article. ICaC may be out there.
But “Data as Code”? More often than not, I’ve seen data management and the software lifecycle kept separate. The paper “Data Management Lifecycle and Software Lifecycle Management in the Context of Conducting Science” compares these two.[5] The authors offer suggestions for linking the two but keep data and software distinct.
I’ve read a number of articles promoting Data as Code (DaC, of course!). Sven Balnojan describes DaC as “using the same good practices we learned in software and applying them to data in all fields of data, operational data, analytical data, data science data… versioning, automated testing, CI systems… with the goal of delivering high-value data fast.”[6] He explains elsewhere that “DaC is simply a mental model which might help you or not.”[7] Balnojan likens whether this resonates with the reader to one’s preference for jazz. I like jazz, but am not sure I like DaC. Balnojan goes on to write, “DaC means stopping [talking] only about ‘data pipelines’ and DAGs [directed acyclic graphs] and start to think about data delivery pipelines.”
Read Balnojan’s article about the differences between data pipelines and data delivery pipelines. To me, this all sounds like DataOps (which I do like) extended — a fine idea, but I question whether it deserves to spawn another “as Code” acronym.
DataOpsZone offers a similar definition: “Data as Code is about adopting proven software development best practices within data management. Drawing inspiration from Infrastructure as Code (IaC), DaC extends these principles to the realm of data.”[8] The blog goes on to name the “core tenets” as versioning, automated testing, and continuous integration.
Yes, DaC sounds like DataOps to me.
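To make the “core tenets” quoted above a little more concrete, here is a minimal Python sketch of two of them applied to a small dataset: versioning (a content hash that changes whenever the data changes) and automated testing (checks that a continuous integration job could run on every change). The `customers` rows, the field names, and both function names are hypothetical, invented only for illustration; they do not come from Balnojan or DataOpsZone.

```python
import hashlib
import json


def dataset_version(rows):
    """Return a short content hash that changes whenever the data changes."""
    # Serialize deterministically so the same data always yields the same hash.
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def check_customers(rows):
    """Automated test: return a list of failures (empty when the data is clean)."""
    failures = []
    seen_ids = set()
    for i, row in enumerate(rows):
        customer_id = row.get("customer_id")
        if customer_id in seen_ids:
            failures.append(f"row {i}: duplicate customer_id {customer_id}")
        seen_ids.add(customer_id)
        if not row.get("country"):
            failures.append(f"row {i}: missing country")
    return failures


# Hypothetical sample data.
customers = [
    {"customer_id": 1, "country": "AT"},
    {"customer_id": 2, "country": "IT"},
]

print("version:", dataset_version(customers))
print("failures:", check_customers(customers))
```

In a DataOps-style pipeline, a run would presumably be blocked whenever `check_customers` returns a non-empty list, and the version string would be recorded alongside the delivered data.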
We could spend the rest of this column analyzing the difference between DataOps and DaC. But that is not why I decided to write about the “as Code” obsession.
If you read my previous column, “The Unique Identifier of the Rose,” you may remember my rediscovery of Umberto Eco during my visit to Melk Abbey, Austria.[9] I wrote about his expertise in semiotics: “Eco authored books about it, and I look forward to reading them.” With my words published publicly, I had no choice but to find one of these books to read.
I thought I’d start with “Semiotics and the Philosophy of Language.”[10] “Times Literary Supplement” raved that “this collection can be read with pleasure by those unversed in semiotic theory.” With pleasure, yes, but not with ease, for me at least. I pride myself on my vocabulary, but I have spent more time looking up word definitions while reading Eco’s book than I can recall with any other volume. Synecdoche, metonymy, alethic vs. deontic, for starters, plus a liberal sprinkling of Latin, Greek, German, French, and Italian. It’s like an exhausting and exhilarating semantic obstacle course.
When I read chapter five, “Codes,” I recalled Data as Code, and so here we are. Eco devotes this chapter to a survey of the semiotic and linguistic approaches to codes. These disciplines became as fixated on applying codes to everything as software developers have.
“Thus, one can record such expressions as phonological code, linguistic codes, semantic codes, kinship codes, codes of myths, literary and artistic codes, cultural codes, genetic code, discrete vs. analogical coding, coded vs. uncoded communication, behavioral codes, gestural kinesic, paralinguistic, proxemic, physiognomic, perceptual codes — to give only some prudent examples.”[11]
Some prudent examples, indeed!
Before plunging into Eco’s elucidation and what illumination it throws on Data as Code, we need to go to the authoritative source for the origin and meaning of the word “code.” This is why I have a subscription to the Oxford English Dictionary online!
Per the OED, code originated from the “Anglo-Norman and Middle French, French code collection of laws or statutes (early 13th cent. in Old French).”[12] Eco refers to this original meaning when he defines institutional codes.[13] Centuries later, the meaning developed to encompass a system of signs or symbols, implemented as codes and cyphers.[14] Morse code is an example of these correlation codes.[15] The adoption of code in computing and linguistics is a 20th-century phenomenon. In both cases, the word is associated with language, for humans or digital machines.[16]
Eco introduces an exercise in creating a basic computer code to illustrate that most codes are not simply correlative (1 = A, 2 = B, etc.), but need inference or interpretation. This is important in considering whether language is a code, or a code can be language. In his example, Eco postulates a simple set of instructions for a computer to find two numbers in two different cells, multiply them, and record the product in a third cell. His point is that the same expression, in this case, “03” can mean an operation (to multiply) or the location of a number subject to the operation, based on the location of “03” in the sequence of commands. “Such a complex code implies contextual selections,” Eco observes.[17]
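Eco’s point about contextual selection can be sketched in a few lines of Python. This is not Eco’s own notation: the four-position instruction layout (operation, two source cells, destination cell), the cell names, and the `run` function are assumptions made for illustration. The only detail taken from his example is that the same expression, “03,” stands for the multiply operation in one position and for a memory location in another.

```python
# Hypothetical opcode table: in the first position, "03" means "multiply".
OPERATIONS = {"03": lambda a, b: a * b}


def run(instruction, memory):
    """Execute one instruction of the assumed form [operation, cell, cell, cell]."""
    opcode, src_a, src_b, dest = instruction
    operate = OPERATIONS[opcode]  # position 0: the token is read as an operation
    # Positions 1-3: the tokens are read as cell addresses.
    memory[dest] = operate(memory[src_a], memory[src_b])
    return memory


# Cells "01" and "02" hold the two numbers; cell "03" will receive the product.
memory = {"01": 6, "02": 7, "03": 0}

# The token "03" appears twice: first as the operation, last as the destination.
run(["03", "01", "02", "03"], memory)
print(memory["03"])  # 42
```

Nothing in the token itself says which reading applies. Only its position in the sequence does, which is the sense in which such a code requires interpretation and not just lookup.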
Ah, context. I’ve written before about context and data; how data quality itself is contextual.[18] There is no absolute measure of data quality. In fact, you need to consider some of the same factors all those codes Eco lists address, from semantics (what does the data mean, in this use case, environment, business) to cultural (what does the data mean to the users within their cultural framework). If “data as code” accounts for these contextual selections, then it may be a useful mental model.
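As a small illustration of data quality being contextual, the following Python sketch validates the same value under different contexts. The rule table, the country codes used as context, and the function name are hypothetical, and the patterns are deliberately simplified: four digits for an Austrian postal code, five digits with an optional four-digit extension for a United States ZIP code.

```python
import re

# Simplified, assumed validation rules, keyed by context (here, a country code).
POSTAL_RULES = {
    "AT": re.compile(r"\d{4}"),
    "US": re.compile(r"\d{5}(-\d{4})?"),
}


def is_valid_postal_code(value, country):
    """Return True or False when a rule exists for the context, otherwise None."""
    rule = POSTAL_RULES.get(country)
    if rule is None:
        return None  # no rule for this context: quality is unknown, not bad
    return rule.fullmatch(value) is not None


print(is_valid_postal_code("1010", "AT"))  # True
print(is_valid_postal_code("1010", "US"))  # False
print(is_valid_postal_code("1010", "IT"))  # None
```

The value “1010” never changes. Whether it counts as good data depends entirely on the context in which it is read.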
These definitions of code all suggest rules, structure, a framework. As a data governance practitioner, I face skepticism when I propose a set of rules. I always hear that these are too confining, restrictive, and innovation-deadening. I now have a response. Towards the end of his code exegesis, Eco veers away from the academic tone of the chapter and reminds us of his evocative powers[19]:
“A code is not only a rule which closes but also a rule which opens. It not only says ‘you must’ but says also ‘you may’ or ‘it would also be possible to do that.’”[20]
I began my TDAN.com residency with “Data Governance: Providing the Structure for Invention,” in which I wrote about how structure encourages creativity and innovation rather than stifling it.[21] I showed how J. S. Bach, working within the stringent framework of the fugue (an institutional code if there ever was one in music), created stunning masterwork after masterwork. Code, in the sense that Eco memorably ascribes above, works just as well in this context as structure.
But I digress, so back to Data as Code. If this is about managing data as code in the sense of the OED’s computing definition, then I’m not sure that this will advance the art and science of data management. But if we think of Data as a Code, a code that opens as well as closes, that expands possibilities for using data within a structure, then we have a mental model worth exploring.
[1] tdan.com/through-the-looking-glass-care-of-the-soul-of-data/30172
[2] octopus.com/blog/what-is-everything-as-code
[3] docs.aws.amazon.com/wellarchitected/latest/devops-guidance/everything-as-code.html
[4] octopus.com/blog/what-is-everything-as-code
[5] openresearchsoftware.metajnl.com/articles/10.5334/jors.ax#B1
[6] towardsdatascience.com/data-as-code-principles-what-it-is-and-why-now-aaf1e24fa732
[7] thdpth.com/p/data-as-code-a-new-mental-model-for
[8] dataopszone.com/data-as-code-dac-explained/#:~:text=Data%20as%20Code%20is%20about,to%20the%20realm%20of%20data.
[9] tdan.com/through-the-looking-glass-the-unique-identifier-of-the-rose/31352
[10] Eco, Umberto, Semiotics and the Philosophy of Language, 1984, First Midland Book edition 1986
[11] Eco, Ibid., pg. 166
[12] “Code, N., Etymology.” Oxford English Dictionary, Oxford UP, December 2023, doi.org/10.1093/OED/1081909886.
[13] Eco, Ibid., pg. 165
[14] “Code, N., Sense II.4.b.” Oxford English Dictionary, Oxford UP, December 2023, doi.org/10.1093/OED/1750064856.
[15] Eco, Ibid., pg. 165
[16] “Code, N., Sense II.5.b.” Oxford English Dictionary, Oxford UP, December 2023, doi.org/10.1093/OED/2747285473 and “Code, N., Sense II.7.” Oxford English Dictionary, Oxford UP, December 2023, doi.org/10.1093/OED/8027718018.
[17] Eco, Ibid., pp. 174-175
[18] Through the Looking Glass: The Context of Out-of-Tune Data – TDAN.com
[19] I’ll save you the trouble of looking this up: Exegesis: An explanation or interpretation of a text, esp. of scripture or a scriptural passage. Also more generally: a critical discourse or commentary. “Exegesis, N., Sense 2.a.” Oxford English Dictionary, Oxford UP, July 2023, doi.org/10.1093/OED/1065616672.
[20] Eco, Ibid., pg. 187
[21] tdan.com/towards-data-governances-4th-era-part-iii-providing-the-structure-for-invention/26793