This isn’t the column I was intending to write for TDAN.com. I had something else nearly ready that expanded on the broad questions of ethics in information and data management I discussed last time, drawing on some work I’m doing with an international client and a recent roundtable discussion I had with regulators and other stakeholders (under the Chatham House Rule) about the concept of an Ethical Data Review Board in commercial contexts.
But events have occurred. Events that bring to mind, once again, the importance of data management professionals taking some time to think the unthinkable and take our risk models outside the boundaries of our own immediate learned experiences.
In other circumstances, this might be an article about the need for developers of convenient devices for tracking things to consider the potential for those technologies to be misused or abused to track people, and to design protections in before launching the product. All too often, the risk of malicious use of data is not considered in the design and implementation of products and systems (one reason for this is often a lack of diversity on design teams). And even where it is considered, there is always a risk deemed so unthinkable, or so improbable, that it is discounted. After all, things will never get that bad, surely?
Well, this past month we have seen two things that are having and will continue to have a profound effect on risk modelling for data processes in organisations. And both of them are teachable moments about how we think about ethics and risk in data management. The first is the invasion of Ukraine and the Digital Iron Curtain that has begun to descend on Russia as the State tightens controls on sources of information and as sanctions cause companies to withdraw from that market. The second is the hardening of regulatory attitudes in the EU towards cross-border data transfers to the United States and elsewhere, which has a potentially significant impact on the functioning of cross-border data flows and the ability of EU-based organisations to use cloud services or third-party providers.
But First, a History Lesson.
The modern era of data management arguably began with the Hollerith punch card tabulation machines developed for the 1890 US Census, which were later refined and sold by a small company called International Business Machines. In the early part of the 20th century, these machines were sold to governments around the world, including in Europe.
This was an era of technological advancement in government, with industrial rationalisation and the development of bureaucracy driving a need for accessible, timely information. In France, one of the pioneers of this work was René Carmille, a former army officer turned civil servant, who promoted the value of codification of geographic areas, economic activity classification, professions, and a citizen registration number.
In effect, he was a pioneer of master data and reference data management for statistical and operational purposes at the dawn of the information age. He realised early in the development of the census tabulation systems that the recording of data such as religion could lead to misuse or abuse, and he resisted the collection of this data in the national census on the grounds that France was a secular nation and, therefore, a civil census didn’t need to know about people’s religion. (Note: religious belief is still a special category of data under EU law and most international data protection laws.)
After the Nazi invasion of France, Carmille continued his work for the Vichy government (the collaborationist French government) while also being an active member of the French Resistance. His statistical services office was ordered by the occupying forces to use the national registration scheme to help identify Jewish people living in France. Using a combination of “work-to-rule” tactics, which delayed the production of automated reports, and sabotage of the tabulation machines so they would never record data about religion from punch cards, Carmille managed at least to slow the plans of the Nazis, who had to rely on manual records instead. He also used data from death notices to help create false identities for resistance fighters.
Carmille died in Dachau concentration camp in January 1945.
In the Netherlands, a comprehensive population registration system for administrative purposes had been established by the late 1930s. It was designed to track people from “cradle to grave” and was extended in the early 1940s to include ethnicity and religion data for Dutch citizens. The efficiency and effectiveness of this peacetime system is recognised by many as a key reason the occupying forces were able to identify and target Jewish residents of the Netherlands so effectively in the 1940s.
In more modern times, we have seen population census data or population identity data used to support genocide. The Rwandan massacres in the 1990s were facilitated by an identity card system inherited from colonial times that brought the concept of ethnic grouping to the fore.
Lessons to Learn
Today, our data landscape is different. But it is also very much the same. We have governments with large data holdings about people. These often record sensitive information about or relating to people that, in a benign context, needs to be kept secure from accidental or unauthorised disclosure and from unintended uses. We also have private companies of all sizes holding varying amounts of data about us, our likes and interests, our health, and who we communicate with.
But when this data falls into the hands of a non-benign actor, whether through an information security breach or, more fundamentally, through the physical presence of a hostile government or invading force at the desk and keyboard of the staff who are authorised to access the data, there are no guarantees that the data will be kept safe and used as intended. There is every chance it will, much like the data in the Dutch national registration system in the 1940s, be used in a manner that is harmful to individuals.
Organisations focus a lot on information security risk: the risk of exfiltration of data by a criminal gang (or, potentially, a government actor), or the crippling of national infrastructure through ransomware attacks like the one that took out the IT infrastructure of the entire Irish Health Service for nearly a year during a pandemic.
But there are clearly other threats we need to be considering in our design and implementation of data management systems. Therefore, we need to think the unthinkable in our threat modelling. What if the government ceases to be a benign actor, either through the election of a totalitarian strongman or the arrival on the doorstep of an invading army?
The Interconnectedness of Things
Of course, René Carmille was dealing with a tabulation system that was in the same building as his team. The massive advances in information technology over the past three decades (happy 30th birthday, world-wide web) have created a massively integrated and interconnected information exchange. Things are a lot more complicated now than hacking the pins in a punch card reader to prevent data being read from the card.
Data crosses borders. Data moves through jurisdictions with different laws. And not every jurisdiction that data moves through is benign; some engage in monitoring or interception of certain data. Equally, hostile regimes need to restrict access to information in their countries, so they limit access to the global internet, or they monitor at a granular level the sites and services accessed, the metadata of messages sent and received, and a host of other telecommunications metadata.
(As an aside, this is also the same type of data that the adtech industry tries to harvest to enable them to sell beer and diapers more effectively to the discerning consumer).
Privacy- and security-conscious users will try to minimise the risk of this monitoring through the use of VPN services and data encryption. But that requires people to have planned for the threat model of “someone snooping on you who wishes you ill”.
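To make that threat model concrete, consider the simplest defence: encrypting data on your own device before it is transmitted or stored anywhere else, so that a snooping intermediary, or whoever ends up in control of the server, sees only ciphertext. Below is a minimal sketch in Python using the third-party cryptography library; the sample data and the idea of holding the key locally are illustrative assumptions, not a production design.

```python
# A minimal sketch: encrypt data locally before it leaves the device,
# so a hostile network or a seized server only ever sees ciphertext.
# Assumes the third-party 'cryptography' package (pip install cryptography).
from cryptography.fernet import Fernet

# Generate a key once and keep it off the systems that store the data.
# (In practice it would live in a local keystore, not alongside the data.)
key = Fernet.generate_key()
fernet = Fernet(key)

plaintext = b"membership list: names, addresses, religion"
ciphertext = fernet.encrypt(plaintext)  # safe to transmit or store remotely

# Only someone holding the key can recover the original data.
assert fernet.decrypt(ciphertext) == plaintext
```

The design decision that matters is where the key lives: encryption only protects against a hostile actor if the key is never within their reach.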
The problem arises when a government decides to flex its muscles in respect of how people can access on-line services or data.
Reportedly, Russian social media users are trying to get around the banning of western media sites through the use of VPNs. However, access to app stores has also been blocked since the start of the month, meaning that only people who had already downloaded reliable and secure VPNs, or who are technically savvy enough to install them on devices without an app store, can do so now.
However, on the other side of the coin, this month has also seen Data Protection Regulators in Europe begin to issue decisions and guidance on cross-border data transfers to the US under GDPR. These cases have adopted an absolutist approach to the perceived risk of US authorities having unfettered access to data about or relating to people in the EU transferred to or processed by US-owned electronic communication service providers.
The data in question would include things like IP addresses, search logs, files stored in cloud-computing environments such as popular office productivity suites, email addresses, and so on. My colleague, Carey Lening, has written a post on LinkedIn about the looming headache posed by these decisions.
The summary for this article is this: in an effort to increase the protections over data relating to people in the EU and to apply the law as it is written, regulators may have overlooked the implications at a technical level of pulling up the drawbridge. As it is unlikely the US will change the statutory provisions and Executive Orders which are at the heart of this issue, we are in a game of chicken when it comes to the infrastructure of our modern data environments. So, it is going to get ‘complicated’.
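At the technical level, one partial mitigation worth sketching is data minimisation at the point of capture: pseudonymising identifying fields such as IP addresses before log records are shipped to a provider in a third country. The Python sketch below uses a keyed hash to illustrate the idea; the field names and secret handling are hypothetical assumptions, not a compliance recipe, and pseudonymised data can still be personal data under GDPR.

```python
# A sketch of one partial mitigation: pseudonymise IP addresses before
# log records leave the jurisdiction. Field names are illustrative.
import hashlib
import hmac

SECRET = b"rotate-me-regularly"  # held by the EU controller, never shipped with the logs

def pseudonymise_ip(ip: str) -> str:
    """Replace an IP with a keyed hash: stable enough for joins and
    analytics, but not reversible without the secret."""
    return hmac.new(SECRET, ip.encode(), hashlib.sha256).hexdigest()[:16]

record = {"ip": "203.0.113.42", "path": "/search?q=example"}
record["ip"] = pseudonymise_ip(record["ip"])
print(record)  # the raw IP address never crosses the border
```

This does not make the regulatory question go away, but it narrows what is exposed if the worst-case assumptions about access by foreign authorities turn out to be true.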
What is a Data Geek to do?
As I write this, there is a war going on a five-hour flight from me, and refugee children are arriving in my country with just the clothes on their backs. Their perception of the threat and risk model for data will probably be very different to mine. But, as I ponder the lessons of history, I think there are some things we can do as data leaders.
1. Always think of the worst-case scenario when identifying risks to people arising from data.
On a simple level, data glitches can harm people. In 2018, a glitch in the supply chain systems for KFC in the UK led to riots. Bugs in the Horizon software used by the Post Office resulted in subpostmasters being wrongly accused of theft, losing their jobs, and in many cases going to prison. At the macro level, we should never assume that our governments or organisations will not misuse data, and we need to plan our controls and safeguards to undermine the misuse of data, just like René Carmille did.
2. We need to get ready for a great disruption in EU/US data flows as Regulators are now taking action.
For better or worse, EU data protection supervisory authorities are now taking action and closing the door on EU/US data transfers due to the failure of the US to address gaps, identified since the 1990s, in its legislative frameworks for personal data. This may have knock-on implications for other jurisdictions, even those that currently enjoy an “Adequacy Decision” from the EU Commission.
Rather than seeing this as a negative, we must see this as a liminal moment where we are choosing the path for the future of data architectures operating at a trans-national level. We need to choose carefully.
3. We need to develop an understanding of data history.
Just as there is a movement now around data literacy, we need to match that with a focus on data history. This will help us understand how data can be a cause and cure of risks to people and society. It may also provide a refreshed understanding of the nuances of lived experiences.
For example, European data protection laws grew out of the experiences of World War 2, but not as a restrictive structure to prevent the processing of data: their original framing (via the OECD and the Council of Europe) was intended to promote higher common international standards of protection for people in the context of their data, precisely so that data could move across borders.
But there are countless other examples of how data can be, and is, misused to the detriment of people. We need to remember and learn from these, whether it is identity cards in Rwanda, filter bubbles on social media in Myanmar, or simply the failure to accept and speak up that a financial accounting system in the Post Office was so bug-ridden it was not fit for purpose.
Conclusion
We are living through interesting times where we need to think the unthinkable about the threats and risks to people, to data, and to people as a result of the use of data. Just as COVID pushed us into one cycle of digital transformation, the current crisis has put the focus on a range of issues from data security and the ability to access fair and unbiased information to energy consumption and stability.
Data and data management can be both the cure and the cause of these issues. When in doubt, perhaps ask yourself, “What would René Carmille do?”