I recently taught an online class on BCBS 239: Effective Risk Data Aggregation and Reporting for Risk.net. Preparing the course materials took me back to 2007-2008, when I worked for Merrill Lynch managing the Credit Risk Reporting team. I recall how difficult it was for the banks to provide the aggregated risk data the regulators demanded during the crisis. A significant contributor to the banks’ troubles was the lack, or the inconsistent use, of a unique identifier for each counterparty.
In fact, I can recall even before I worked in banking that this issue bedeviled an insurance company I consulted with. We were attempting to provide a 360-degree view of the business the company had with their clients (yes, this was 25-plus years ago, so nothing has changed!) and I vividly remember studying the data model of their “Client Master.” At one point, the lead consultant laughed, and said, “Your Client Master has no client!” This was true. The model had nothing to uniquely identify the client.
When I got to Merrill Lynch in 1998, the institutional business was in the midst of a multi-year migration to a new counterparty master system. I thought that was wonderful, until I came across “counterparties” whose unique IDs carried names such as “PIMCO O/B/O IBM Pension Fund.”
“There must be some mistake,” I said to a colleague. They told me that this was not a bug, but a feature. The ID for PIMCO O/B/O IBM Pension Fund represented the relationship between PIMCO and IBM Pension Fund when the former trades on behalf of the latter. PIMCO and IBM Pension Fund each had their own unique IDs. This “Account ID” linked the two of them together to depict the trading relationship. Trades could be booked to PIMCO O/B/O IBM Pension Fund or to IBM Pension Fund directly, without the trading system needing to do anything different.
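The construct my colleague described can be sketched in a few lines. This is only an illustration, not the Merrill Lynch data model: the class names, field names, and IDs such as `CP001` and `AC900` are all hypothetical. The point it shows is that a booking can carry either an account ID or a counterparty ID, and both roll up to the same underlying counterparty.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Counterparty:
    counterparty_id: str  # arbitrary unique ID (hypothetical format)
    name: str


@dataclass(frozen=True)
class Account:
    account_id: str
    principal_id: str        # counterparty the trade is done for
    agent_id: Optional[str]  # counterparty trading on its behalf, if any


def resolve_principal(booking_id: str, accounts: Dict[str, Account]) -> str:
    """Return the counterparty ID that a booking ultimately refers to."""
    account = accounts.get(booking_id)
    # An account ID resolves to its principal; anything else is assumed
    # to already be a counterparty ID.
    return account.principal_id if account else booking_id


def exposure_by_counterparty(
    trades: Iterable[Tuple[str, float]], accounts: Dict[str, Account]
) -> Dict[str, float]:
    """Sum trade notionals by the resolved principal counterparty."""
    totals: Dict[str, float] = {}
    for booking_id, notional in trades:
        principal = resolve_principal(booking_id, accounts)
        totals[principal] = totals.get(principal, 0.0) + notional
    return totals


counterparties = {
    "CP001": Counterparty("CP001", "PIMCO"),
    "CP002": Counterparty("CP002", "IBM Pension Fund"),
}
accounts = {
    "AC900": Account("AC900", principal_id="CP002", agent_id="CP001"),
}
# One trade booked through the O/B/O account, one booked directly.
trades = [("AC900", 100.0), ("CP002", 50.0)]
totals = exposure_by_counterparty(trades, accounts)
```

The roll-up only works while every account points at a principal that has its own counterparty record.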
To my colleagues, this was a perfectly logical construct. But over time, expediency corrupted this construct. Teams used the account ID structure for all sorts of scenarios. For example, in certain countries, we needed to hide the name of the underlying counterparty from most employees. The trading desk requested an account ID with a name such as “PIMCO O/B/O AB 24.” Again, PIMCO had its own identifier, but “AB 24” was just part of the name of the Account ID. The actual counterparty had no unique identifier in the master counterparty system. Only employees in the specified country could view the name, which was typically stored offline.
I had one opportunity to strike a blow for sanity. I coordinated business requirements for a new derivatives master agreement system. The existing system identified each agreement by a combination of master ID, agreement type, and internal entity ID. Thus, you could have many ISDA agreements with master ID 1111, each with a different internal entity. (“ISDA” stands for International Swaps and Derivatives Association. ISDA agreements were the most common agreement type.) All the business stakeholders agreed this was terrible, and that a master ID must be unique. We set that down as one of our highest-priority requirements.
The developers realized a single ID presented challenges. Data conversion required a mapping from each existing agreement record to its new ID. Downstream systems had adapted to the weirdness of the three-part agreement identification scheme. Early in the project, the head of credit risk technology, who was leading the development team, came to me. He complained that this single ID requirement was turning out to be quite difficult. He asked if we could drop it. I said no; it had to stay. I must have been more persuasive than I thought, and I was certainly stubborn. When we went live later in 2002, we had one unique ID per agreement. I like to think this is one aspect of the system that contributed to its longevity: it was still in use after the Bank of America merger and, as far as I’m aware, even after I left Bank of America in 2013.
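To make the conversion problem concrete, here is a minimal sketch of the kind of crosswalk such a migration needs, from the old three-part key to one new ID per agreement. It is not how our developers built it. The function name, the `AGR` prefix, and the sample entity codes are assumptions made up for illustration.

```python
from itertools import count
from typing import Dict, Iterable, Tuple

# Legacy key: (master ID, agreement type, internal entity ID).
LegacyKey = Tuple[str, str, str]


def build_crosswalk(legacy_keys: Iterable[LegacyKey], start: int = 1) -> Dict[LegacyKey, str]:
    """Assign exactly one new agreement ID to each distinct legacy key."""
    next_number = count(start)
    crosswalk: Dict[LegacyKey, str] = {}
    for key in legacy_keys:
        if key not in crosswalk:  # repeated records reuse the same new ID
            crosswalk[key] = f"AGR{next(next_number):06d}"
    return crosswalk


legacy_records = [
    ("1111", "ISDA", "ENTITY_A"),
    ("1111", "ISDA", "ENTITY_B"),  # same master ID, different internal entity
    ("1111", "ISDA", "ENTITY_A"),  # duplicate of the first record
]
crosswalk = build_crosswalk(legacy_records)
```

Downstream systems that still speak the old three-part key can look up the new ID in the crosswalk until they are converted.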
I’ve found this challenge with every employer I’ve worked for since. As I prepared to teach the BCBS 239 class, I asked myself why creating and maintaining unique identifiers is so difficult. Surely, part of the challenge with big banks is system integration and legacy data, but I’ve seen smaller firms struggle as well. Data in silos, common to most organizations, contributes to the problem. But that should make the unique identification of customers and other critical data even more attractive. So, what makes this so hard?
As I write this, I’ve just returned from a European river cruise, and one of our stops was in Melk, Austria, home of the famous Melk Abbey. We toured the Abbey, including the renowned library, home of over 100,000 volumes (none of which we were allowed to peruse). This medieval marvel reminded me of a book I had read decades ago, Umberto Eco’s “The Name of the Rose.”[1] For those of you not familiar with the book or the movie (which starred Sean Connery), “The Name of the Rose” is a murder mystery. It takes place in another Benedictine Abbey in Northern Italy, and its vast library features prominently in the story. I looked up the book on Wikipedia to refresh my memory. Not only did the fictional library and the Melk library share their massive size, but the narrator and leading protagonist is Adso of Melk!
But what I did not recall is that Umberto Eco, who passed away in 2016, was a professor of semiotics. His theories of semiotics factor into the symbolism and layers of meaning embedded in his book. Semiotics did not initially ring a bell for me, but it does sound similar to semantics. We data professionals are all acquainted with semantics, and semantic layers. Indeed, the terms are related. They both have the same etymology, from the Greek word for “significant,” which is further related to the Greek words for “to interpret a sign” and “sign.”
So, what is semiotics? Eco authored books about it, and I look forward to reading them. For now, we can start with the Oxford English Dictionary’s definition: “The science of communication studied through the interpretation of signs and symbols as they operate in various fields, especially language.” Daniel Chandler, in his “Semiotics for Beginners” paper, writes: “We seem as a species to be driven by a desire to make meaning: above all, we are surely Homo significans - meaning-makers. Distinctively, we make meanings through our creation and interpretation of ‘signs’.”[2]
The “Signs” chapter of Chandler’s paper is long and comprehensive, given that the paper is for beginners like me. He covers the two principal models for defining what a sign is and its relation to what it signifies. Ferdinand de Saussure proposed a two-part model, composed of a “signifier,” the form of the sign, and the “signified,” the concept it represents. Charles Sanders Peirce, not to be outdone, formulated a three-part model:
- The Representamen: The form which the sign takes (not necessarily material).
- An Interpretant: Not an interpreter but rather the sense made of the sign.
- An Object: To which the sign refers.
Now if that all sounds esoteric to you, I agree! Chandler dives deep into the pros and cons of these models and the commentary offered by Eco and other major figures in semiotics. And this is just one chapter in the paper! I was not going to spend hours of my vacation reading the rest of the paper, so I stopped here. I did find something that struck me as relevant to the topic of unique identifiers: the three modes of signs.[3]
- Symbol/Symbolic: A mode in which the signifier does not resemble the signified, but which is fundamentally arbitrary or purely conventional, so that the relationship must be learned. Among the examples: languages, numbers, Morse code, traffic lights.
- Icon/Iconic: A mode in which the signifier is perceived as resembling or imitating the signified — being similar in possessing some of its qualities. Examples: a portrait, a scale model, a metaphor, and realistic sounds in program music — think of the simulated thunder from Beethoven’s “Symphony No. 6” or Berlioz’s “Symphonie Fantastique.”
- Index/Indexical: A mode in which the signifier is not arbitrary but is directly connected in some way to the signified. Think of medical symptoms or signals like a knock at the door.
Unique identifiers, as we typically encounter them in the context of data, clearly fall into the first category. I’ve always argued that these types of identifiers should be arbitrary, with no embedded meaning. Many accounting systems feature account numbers with embedded meanings. For instance, the first several characters in the account number could indicate a product type. The next couple might be a country code. I’ve never liked accounting systems much for this reason.
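A small sketch shows why embedded meaning bothers me. The account number layout below (three characters of product, two of country, then a sequence) is a hypothetical one invented for the example, as are the function names. The opaque alternative uses Python’s standard `uuid` module and keeps the attributes in a separate record.

```python
import uuid
from typing import Dict


def parse_smart_account(account_number: str) -> Dict[str, str]:
    """Decode a hypothetical 'smart' account number by character position."""
    return {
        "product": account_number[:3],
        "country": account_number[3:5],
        "sequence": account_number[5:],
    }


# Opaque alternative: the ID means nothing; attributes are stored beside it.
account_attributes: Dict[str, Dict[str, str]] = {}


def open_account(product: str, country: str) -> str:
    """Create an account under an arbitrary ID and record its attributes."""
    account_id = uuid.uuid4().hex
    account_attributes[account_id] = {"product": product, "country": country}
    return account_id


smart = parse_smart_account("SWPUS000123")

opaque_id = open_account(product="SWP", country="US")
# The account moves to another country: only the attribute changes.
account_attributes[opaque_id]["country"] = "GB"
```

With the smart number, the same move would force a new account number, and every system holding the old one would need to be told. With the opaque ID, nothing that references the account has to change.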
But there may be an innate difficulty grasping that a random string of alphanumeric characters signifies a customer, a department, or another entity. The name of a company, for example, can itself signify what the company does, the product it produces, or the image it seeks to promote. The name can be iconic or indexical. A unique ID cannot. I remember certain counterparty IDs from my days at Merrill Lynch even today. That’s only because we used those IDs for test cases again and again. Shakespeare, in “Romeo and Juliet,” says that “a rose by any other name would smell as sweet.”[4] But what would a rose’s unique ID smell like?
I asked the question earlier about why implementing unique identifiers is so difficult. I speculate this is because our minds struggle to make the association between these abstract, arbitrary signs and the concrete entities they signify. Perhaps there is merit in embedding some degree of meaning in an identifier after all. But full stop if that means I need to accept the merit of account numbers where digits 11 to 13 tell us what the product is!
[1] Eco, Umberto. The Name of the Rose. Translated by Richard Dixon and William Weaver, Vintage, 2014.
[2] Chandler, Daniel. “Semiotics for Beginners.” 1999.
[3] Chandler, ibid.
[4] Shakespeare, William. Romeo and Juliet. Act 2, Scene 2.