What is the work of data management and analysis for if we don’t properly gather, utilize, and listen to the information we say we’re seeking? While it is reasonable to believe in the burden of proof, endless data collection with no clear plan for addressing the phenomena it describes is a waste of everyone’s time. In the next few iterations of this column, we will look at how to improve equity in each step of data utilization, beginning with collection.
As stated in this column’s previous issues, one of the most obvious ways to ensure that data is both equitably collected and utilized is to ensure diversity within the group of people utilizing it. However, as Caroline Criado Perez stated in a 2020 TIME article, “We Need to Close the Gender Data Gap By Including Women in Our Algorithms,” “Closing this data gap is both easy and hard. It’s easy because it has a very simple solution: collect sex-disaggregated data. But it’s hard because the gender data gap is not the product of a conspiracy by a group of misogynistic data scientists.” Bias is ingrained in all of us, and even if those in charge were split roughly half female and half male (as is the general makeup of the world, not accounting for those who identify outside the gender binary), it would not solve all of our data issues. Is this an excuse to slack off on hiring women in the data and information fields? Certainly not, but it is a reason to double-check the prejudice and partiality of everyone’s work, regardless of gender or any other identity.
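In practice, “sex-disaggregated” simply means never collapsing results into a single aggregate. A minimal sketch of what that looks like in Python (the column names and figures here are hypothetical, purely for illustration):

```python
import pandas as pd

# Hypothetical trial results; in a real study these would come from your data source.
trials = pd.DataFrame({
    "participant_id": [1, 2, 3, 4, 5, 6],
    "sex": ["F", "M", "F", "M", "F", "M"],
    "efficacy": [0.42, 0.71, 0.39, 0.68, 0.45, 0.74],
})

# The aggregate mean hides the gap entirely.
print("Overall efficacy:", round(trials["efficacy"].mean(), 2))

# Disaggregating by sex surfaces it immediately.
print(trials.groupby("sex")["efficacy"].mean().round(2))
```

The point of the sketch is the second print statement: reporting by group is a one-line habit, but only if sex was recorded in the first place.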
As we cannot rely on our instincts to guarantee representation in studies and beyond, it is crucial to advocate for the implementation of clear policies and systems to keep us in check. Consider, for example, the Rooney Rule. It originated as a National Football League policy requiring teams to interview minority candidates for senior football operations and head coaching jobs. There is no hiring quota; the interview pool is simply not allowed to be entirely white. The practice has since been extended to many other fields, where candidate pools for various positions must include a certain percentage of women and people of color. In this way, though we know that hiring almost always involves some bias, a methodology is in place that attempts to subvert prejudice and push processes in a more equitable direction. The Rooney Rule (while a good practice that everyone should use when recruiting new hires) does not exactly fit the bill for data collection. It does, however, leave some numbers games up to a set procedure, as sketched below, which I propose all data-gathering organizations and individuals adopt as well.
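To make the analogy concrete, a Rooney-Rule-style check is just a hard gate in a process, not a quota. A hypothetical sketch (the threshold, field names, and self-reporting simplification are mine, not the NFL’s):

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    is_underrepresented: bool  # self-reported; a hypothetical simplification

def pool_passes_rule(pool: list[Candidate], minimum: int = 1) -> bool:
    """Return True if the interview pool includes at least `minimum`
    candidates from underrepresented groups. No one is hired or rejected
    by this check; it only gates whether interviews may proceed."""
    return sum(c.is_underrepresented for c in pool) >= minimum

pool = [Candidate("A", False), Candidate("B", False), Candidate("C", True)]
assert pool_passes_rule(pool)  # the pool is not homogeneous, so interviews proceed
```

The value is that the check runs every time, regardless of how unbiased the people involved believe themselves to be.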
It is an unfortunate reality that studies are very rarely representative of the people they attempt to survey, and that we cannot trust our own judgment when it comes to representation. Research has shown that when scanning a large group of people, we overestimate how well represented women and minorities are within it. For example, if you walk by a crowd that is 40% female and 60% male, you will likely assume it is majority female. Criado Perez continues in her TIME article:
“From cars that are 71% less safe for women than men (because they’ve been designed using a 50th-percentile male dummy), to voice-recognition technology that is 70% less likely to accurately understand women than men (because many algorithms are trained on 70% male data sets), to medication that doesn’t work when a woman is on her period (because women weren’t included in the clinical trials), we are living in a world that has been designed for men because for the most part, we haven’t been collecting data on women.”
It seems that we must install a rule wherein trials and studies must use samples that are representative of the population projected to be affected by their outcomes. If a medication will be used by people of all genders, then those genders must all be represented in the trials. Just as we cannot assume male, we cannot assume white, cisgender, or any other default identity. And if we cannot leave the work of counteracting assumptions to error-prone humans or machines, we might have to take pen to paper when accounting for accurate groupings. Algorithms reflect the partiality of their creators and will not save us, but good old-fashioned rules could.
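One way to encode such a rule is stratified sampling: fix the demographic proportions of the sample to match the target population before recruitment begins, rather than hoping a convenience sample lands there. A minimal sketch, assuming known population shares (the figures are illustrative, not census data):

```python
def stratified_quota(total_n: int, population_shares: dict[str, float]) -> dict[str, int]:
    """Allocate sample slots per group in proportion to the population.
    Note: with many groups, rounding may require a small adjustment
    so the quotas sum exactly to total_n."""
    return {group: round(total_n * share) for group, share in population_shares.items()}

# Illustrative shares; a real study would pull these from census or registry figures.
shares = {"female": 0.51, "male": 0.49}
quotas = stratified_quota(1000, shares)
print(quotas)  # {'female': 510, 'male': 490}

# Recruitment then fills each stratum separately, instead of
# sampling from whoever happens to show up.
```

The design choice matters: the quotas are set by the population the study claims to serve, not by whoever is easiest to recruit.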
Inaccurate data harms all of us. No one wins when a 17-year-old girl dies in a car crash because the driver’s-seat airbags were modeled exclusively on the average male body. No one wins when half the population’s heart attacks are more likely to be brushed off than diagnosed. As of 2019, 51% of iPhone users identified as female; designing voice-recognition software for that user base on a 70% male data set is negligent. So, tell me: what rules will you put in place to improve data collection, and therefore our world?