Data is Risky Business: The Data Crisis Unmasked

As I am writing this, countries around the world are fighting a global pandemic. This is the first global pandemic of the modern information age. There have been a number of epidemics in recent years, such as Ebola, but they (thankfully) have been regional in nature. This time around, we have the power of AI, smartphones, and modern data analytics to help us react and respond.

However, with great power comes great responsibility. And this global crisis has exposed a number of critical data management issues that those of us who have been around the block a few times will recognize. Whether it is the challenges of comparing outbreak data from different countries due to differences in recording and calculating statistics, the effectiveness of contact tracing applications and their potential impacts on data privacy, or the ethics of how different administrations (local or national government level) are reporting data, we see similar themes to what we have encountered for decades in organisations. Only this time lives (and economies) are on the line.

That’s before we even consider the data management and information risk management aspects of the post-pandemic work place, where knowledge workers will increasingly be working remotely. Because of this arrangement, many of the traditional “soft-skill” sticking-plasters or manual process workarounds that we have implemented in organisations to declare “success” on data management projects will need to be reassessed and replaced as they simply won’t work (or at least won’t work as well) anymore. The days of being able to swing by the desk or cubicle of Jane in Finance with a friendly coffee and a muffin to have a quick chat about the issues in the latest data analysis runs before the problem becomes a crisis are over. Organisations that have had to embrace remote working overnight are joining the rest of us who have had to learn that managing in a remote context is different from managing when everyone is in the same place. And those challenges go far beyond dealing with interruptions from your kids when you are on a conference call.

Bluntly, the global pandemic has catapulted most organisations forward a full five years in their digital transformations but without any of the formal planning for how to use, deliver, manage, and govern data in this new world. It has also exposed the limitations of technology to deliver miracles when the data or processes are not clear or are simply unavailable.

Risk Management

A global pandemic of a potentially highly virulent disease with no vaccine available is something that has probably been on the risk registers of organisations at some level. But, as many governments have learned, the assumptions you make in framing a risk and the way in which that risk is presented can impact your planning and response. While many countries had identified a global influenza pandemic similar to the 1918-1919 influenza pandemic as a risk, assumptions about the availability of a vaccine and the existence of known effective treatments for influenza affected the risk treatment.

The United Kingdom, for example, hadn’t updated its risk profile or risk treatment for a flu pandemic since at least 2014. The UK, and other countries, disbanded or downgraded pandemic response preparation teams in recent years. Many other countries under-invested in the infrastructure for their public health infection control functions. The Irish Health Surveillance Protection Unit, for example, has had no substantial investment in information management capabilities in over a decade.

In effect, governments were preparing for the wrong risk. They didn’t keep their planning, governance, and response process documentation up-to-date to reflect changes in risk profile or the available methods for gathering or disseminating information about a new threat. All the while, they were removing or reducing funding to key governance functions that would have assisted in preparation to mitigate emergent risks.

Now, who among us hasn’t been part of that data quality or data governance programme in an organisation? Who among us hasn’t had to respond to a data security or data quality issue only to find that the documentation was out of date or didn’t exist? In organisations that have been implementing data protection and data privacy governance structures in response to changing legislation around the world, those teams and those CPO/DPO roles often struggle for resources. Just like public health and pandemic preparation teams.

Data Quality

When is a COVID-19 death not a COVID-19 death? When is a test not a test? These fundamental questions are key to comparing data between different jurisdictions. In Chile, the health ministry records people as “recovered” after 14 days, regardless of their medical status, because they are no longer contagious. This means that people recorded as recovered who later die might not be counted as COVID-19 related deaths. In the UK, the government counted tests that had been ordered or posted out to people as completed tests in order to hit a target of 100,000 tests, even though some of the kits posted out had no return address labels for sending back the completed test. The CDC appears to be counting PCR tests for active infections and serology tests for antibodies as the same thing in its data (they are not).

These examples highlight two key data quality issues:

Data definition (information product specification) and data lineage (understanding where your data came from and what it is actually telling you).
The problems of “managing by quota” and reporting success based on the same.

In respect of the former, comparisons of data for infection, deaths, and recovery between different jurisdictions need to be scrutinised to ensure that the comparison is between things that are the same. Whether a country is counting deaths in elder care facilities or not, or whether it is classifying people as recovered by default after 14 days, should be highlighted. This is also a key test of the data literacy of media reporting on the pandemic. It is important to look behind the numbers and understand their derivation.

With regard to management by quota, it is all too easy for a government (or any other organisation) to set a target for something and then adjust how the thing is measured so that the target is reached. Deming wrote about this in Out of the Crisis when he described how manufacturers would ship defective goods to meet end of period quotas, knowing that the returns and complaints would be dealt with in a later period. Mission accomplished.

A key difference here is that management by quota, without a clear definition of what that quota is measuring, can lead to false confidence which, in the context of a pandemic, can cost lives. If the operational definition of a “COVID test” is “a test that has been sent out, correctly administered, returned, and a diagnostic analysis completed,” then that should be stated.

That, of course, means understanding an end-to-end information value chain and associated processes to make sure that everything works so that a valid test can be completed and counted. Anything less is theatre.
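To make the gap between a quota figure and an operationally defined figure concrete, here is a minimal sketch in Python. The stage names, the `KitRecord` type, and the sample records are all hypothetical and invented for illustration; they are not drawn from any real testing system or dataset. The only point is that the headline number changes depending on which stage of the information value chain you decide to count.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """Hypothetical stages in the life of a single test kit."""
    ORDERED = "ordered"
    SENT_OUT = "sent_out"
    ADMINISTERED = "administered"
    RETURNED = "returned"
    ANALYSED = "analysed"  # diagnostic analysis completed


@dataclass(frozen=True)
class KitRecord:
    kit_id: str
    stage: Stage  # furthest stage this kit has reached


def count_for_quota(records):
    """Counts every kit that has merely entered the pipeline."""
    return len(records)


def count_completed(records):
    """Counts only kits that meet the stated operational definition:
    sent out, administered, returned, and analysed."""
    return sum(1 for r in records if r.stage is Stage.ANALYSED)


# Illustrative records only, not real data.
sample = [
    KitRecord("A1", Stage.ORDERED),
    KitRecord("A2", Stage.SENT_OUT),
    KitRecord("A3", Stage.RETURNED),
    KitRecord("A4", Stage.ANALYSED),
    KitRecord("A5", Stage.ANALYSED),
]

print("Counted towards quota:", count_for_quota(sample))
print("Completed tests:", count_completed(sample))
```

Publishing both numbers side by side, along with the definition behind each, is one way of letting a reader see how much of the target was met by tests that were actually completed.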

Data Ethics

Another area that has been highlighted over recent weeks is the importance of ethics in the handling and presentation of data and information management technologies.

Allegations have emerged of different jurisdictions (such as the State of Georgia in the US) adjusting graphs to create a visual of a downward trend by putting time series data out of sequence. Florida removed the data scientist who was running their COVID-19 data reporting from her role, despite the reporting having been praised for its transparency.

In addition, we have the ethical issues of data-driven responses to managing the pandemic, from contact tracing applications to thermal scanning. Deploying technologies such as these requires a balancing of privacy and public interest, but it also requires objective rigour in assessing whether the technology will actually work for the purpose for which it is proposed.

For example, thermal cameras sound like a great idea. Anyone registering a temperature over 38 degrees Celsius can be denied entry to the building to keep everyone safe. Only, what do you do about false positives and negatives with those technologies? Are there other things that might cause someone to run a high skin temperature (for that is what they measure) from time to time? Any hormonal conditions? Do any of your staff cycle or run to the office? Is there anything that a malicious actor (or an overly diligent staff member) could do to fake out the scanner by suppressing their temperature, like taking paracetamol?

We also have the ethical and data management issues that are raised by remote working, where the organisation might have a need to measure productivity in some way but is measuring and monitoring in a space that is also the employee’s personal space, which they may share with other people. The decisions organisations take about how they manage and measure work in this context will matter a great deal for trust in the work place, particularly as decades of research show that knowledge workers don’t like to be treated like battery hens.

Data Debt

Organisations are also going to have to face up to their process and data debt. The disintermediation of work that will arise from remote working will mean that staff won’t always have access to the human subject matter expert and there will need to be codification and governance of data and data related processes.

If a team member can’t do their work because they can’t access or understand the data, that is not their fault; it is management’s failing.

We need to stop talking about this stuff and actually get down and do it. And we need to stop declaring “mission accomplished” on challenges we’ve only just gotten a start on.

A Call to Action

The COVID pandemic has unmasked the crisis in data management that was lurking below the surface. “Black Swan” risks arising from data quality, data governance, and the patchwork of information management tools and technologies in many organisations, combined with underinvestment in the people and process aspects of data compared with the technology, are coming home to roost as organisations see their digital transformations accelerated.

As data management professionals, we owe a duty to those we work with, for, and on behalf of to start tackling these endemic issues so that the resilience of our organisations and economies can be improved ahead of the next “will never happen” level of global crisis.

Daragh O Brien
