There is a famous Dilbert cartoon from a few years ago in which Dogbert advises Dilbert’s company on how it can generate revenue from the information it holds about its customers, but first they have to dehumanize the enemy by calling them data. This pretty accurately summarizes one approach to data management that has pervaded the early years of this Information Revolution. Indeed, it is reported that a Google representative at a recent event in Denmark told the audience that there wouldn’t be any change in data ethics until there was a market for it.
However, as the tools and technologies for gathering, analyzing, and acting on information become increasingly powerful, we find ourselves facing a tipping point in our love affair with these technologies. This tipping point is all the more pronounced as we consider the impact of data-driven technologies on democratic processes and human rights around the world.
The question of ethics in information management is often conflated with the challenges of managing data privacy, particularly in an increasingly interconnected information landscape. However, privacy is merely the entry point for any meaningful discussion of ethical issues in information management. When we begin to look at the various ethical issues that arise in the implementation of ‘big data’, we see that the real privacy issue is not simply the potential loss of privacy and individual agency in an age when we are transparent to the algorithms, but rather the issues that arise when we must trade privacy off against other interests or benefits. Recital 4 of the General Data Protection Regulation tells us that the processing of data should be designed to serve mankind; as we dig deeper into the ethical issues, we find further questions of ethics and ethical conduct that bear on that fundamental principle of ethical information management.
In Chapter 3 of the book I’ve recently written with my colleague Katherine O’Keefe, Ethical Data and Information Management, we look at examples such as the ethical questions raised when the tools for big data analytics can only run on technology that is affordable in the First World, a problem which has led one data scientist and blogger to explore the potential for what he calls “Cheap Data Science”. The ethical question here is simple: is it fair that the future, to paraphrase the science fiction author William Gibson, “is here, but not yet evenly distributed”? We can also ask why the very people who might benefit most from improved data analytics of issues such as soil erosion or the spread of disease are unable to do so, because of the barrier to entry created by the assumptions about system performance and network capability that developers living in affluent Western economies have baked into the design of these technologies. Is it ethically responsible, or sustainable, to design software and tools that only work reliably in wealthier developed nations?
We also look at the potential benefits and harms of granular tracking and microtargeting of students at university level. There is a prevailing mind-set of ‘more data is better’, which has led to the development of technologies that analyze and predict student behavior, performance, and likelihood of dropping out. However, there is every reason to believe that the headline success stories are describing correlation rather than causation. This raises additional ethical issues in a data-driven world where success stories are often not subjected to the rigorous scrutiny they deserve. In the case of the burgeoning Ed Tech sector, the unanswered question is whether the investment in technologies that track students’ performance and interactions with course work actually causes the higher grades and better performance that are claimed, or whether students who would perform better anyway are attracted to courses that have these cutting-edge facilities available. Is the relationship causation or merely correlation?
Furthermore, even if there is a causal relationship, there has been limited research on the potential downsides of this type of invasive student tracking. The research that has been done raises concerns about the impact on pedagogic methods in universities, and also raises concerns about student privacy and the chilling effects on independence of thinking and expression among students, as well as the choices that students or parents might make about course selection or their academic performance.
The issues of algorithmic bias in artificial intelligence also give rise to ethical concerns, particularly when the inherent bias in training data is taken into account. While these algorithmic processes are often hailed as beneficial to society through time and cost savings, they frequently come with a hidden cost. For example, in Chapter 4 of our book we look at the problems with systems like COMPAS, a sentencing support tool used in US courts, which journalists at ProPublica found to be ‘remarkably unreliable’ in its predictions: white defendants were nearly half as likely as African-American defendants to be flagged for potential risk of reoffending, and the sentences recommended for African-American defendants tended to be longer.
The question of how we train AI systems is, in and of itself, an ethical choice. In many respects, when we develop AI systems we are acting as parents: imparting values and supporting the development of ways of thinking about issues and inferring facts from the available data. The quality of the models we develop is directly influenced by the quality of the models we develop them from. In the case of COMPAS, a likely root cause of the inherent bias in the system is, for want of a better expression, the inherent bias in the system: historical court rulings and case studies were used to train the AI, and historically certain ethnic groups have fared better or worse in the US criminal justice system. Similarly, facial recognition machine learning inherits biases from the images used to train it.
Other aspects of algorithmic bias are more subtle in their societal impact. When women tend to be shown advertisements for lower-paying jobs, and hiring algorithms replicate similar results, we see an undesirable societal effect that raises ethical questions of fairness in the workplace. The emergence of Lethal Autonomous Weapons Systems raises the prospect of armed conflicts becoming more commonplace, as the risk to human life (on the combatant side) is reduced through the deployment of autonomous weapons platforms.
Of course, it is not all doom and gloom. There are also many examples of the use of machine learning, AI, and data analytics to support and enhance the human condition. Companies such as Microsoft and Accenture have demonstrated smartphone-based assistive technologies for the visually impaired, which can assist with a range of tasks and narrate information to the user. Developments such as these have the potential to significantly improve the lives of countless people, and they are impossible without the analytics, machine learning, and AI technologies at the cutting edge of our data-driven world. However, there are open ethical questions to be resolved. For example, building facial recognition into these technologies might mimic the human eye and brain identifying a familiar face in a crowd. But where that processing and matching takes place, and who else has access to the biometric data of the people you know and want to recognize, becomes a balancing act. After all, with an assistive technology the matching will not happen inside your skull but in a service hosted on the device or in a cloud environment.
In the Information Revolution we are generating, capturing, and processing an increasingly wide array of data about people, products, locations, events, and the relationships between them. As the potential to affect people’s lives in increasingly subtle but significant ways continues to grow, it is essential that we put ethics at the heart of the data-driven world. This does not, however, mean we need to invent a new discipline or develop ‘new ethics’. Many of the questions we struggle with today have been discussed for thousands of years. The philosopher Martin Heidegger famously extolled the need to recognize technology as a means to an end, not an end in itself. Immanuel Kant equally famously exhorted us to treat people as ends in themselves, not simply as means to an end. Plato decried the impact of new technologies on the way knowledge is codified and imparted (he was talking about the development of writing, but that was the ‘big data’ of its day).
What we need to do is incorporate fundamental ethical concepts and principles into the defined data management disciplines we already have. Anything else would simply be reinventing the wheel. The integration of ethics into Data Governance, its influence on Data Quality management, and the need to explicitly recognize the motivation for data processing activities are all key issues that need to be addressed. Appropriate mechanisms need to be introduced into organizations to ensure an effective alignment between the Ethic of the Individual, the Ethic of the Organization, and the Ethic of Society. After all, if the Ethic of the Organization is to consider all customers as nothing more than “data”, then it will be difficult for any individual to act outside that ethical frame, even when the organization is at odds with the Ethic of Society and is being lambasted for failing to control fake news and other issues.
Organizations that succeed in addressing this challenge will develop a strong, sustainable competitive advantage, as they will find it easier to attract and retain both staff and customers. Organizations that don’t will drift increasingly counter to the Ethic of Society, finding that they can only attract and retain staff at the more extreme negative end of the Ethic of the Individual. With higher staff turnover and a greater chance of being blind to an ethical crisis, such organizations will ultimately fail.
Another consideration is the extent to which legislation lags behind technology innovation. Legislation and regulation are usually considered only when the Ethic of Society has been so outraged by the actions of an organization that legal sanctions are imposed or legislation for such sanctions is introduced. By adopting and adapting to an ethical information management culture, organizations can keep ahead of the requirement for regulation by simply striving to do the right thing in the context of Society, not just the Organization.
The tipping point is now. Those organizations that put ethics at the core of their data-driven world will reap long-term benefits. We have been here before. In the 1980s Joseph Juran gave a speech in which he set out the core thesis of his Quality Trilogy. In the book, we map some of the key statements Juran made to the current ethics environment in information management. That mapping is reproduced below for your reference.
Figure 1: Extracted from Chapter 10 of Ethical Data and Information Management, published by Kogan Page
The Quality Trilogy: A Universal Approach to Managing for Quality, a paper delivered to the ASQ, accessible online at http://pages.stern.nyu.edu/~djuran/trilogy1.doc