A commonly lamented problem in machine learning is that algorithms are biased. This bias can come from different sources and be expressed in different ways, sometimes benignly and sometimes dramatically.
I don’t disagree that there is bias in these algorithms, but I’m inclined to argue that in some senses, this is a feature rather than a bug. That is: all methodological choices are biased, all data are biased, and all models are wrong, strictly speaking. The problem of bias in research is not new, and the current wave of despair is simply a reframing of this problem with automated approaches as the culprit.
To be clear, there are serious cases in which algorithmic biases have led to deeply problematic outcomes. For example, when a proprietary, black-box algorithm regularly suggests stricter sentencing for black defendants and those suggestions are taken to be unbiased, informed wisdom – that is not something to be taken lightly.
But what I appreciate about the bias of algorithmic methods is its visibility: it gives us a starting point for questioning, and hopefully addressing, the inherent social biases that we might otherwise be blind to, given our own personal embedding in the social context.
After all, strictly speaking, an algorithm isn’t biased; its human users are. Humans choose what information becomes recorded data and they choose which data to feed into an algorithm. Fundamentally, humans – both as specific researchers and through the broader social context – choose what counts as information.
As urban planner Bent Flyvbjerg writes: Power is knowledge. Those with power not only hold the potential for censorship, but they play a critical role in determining what counts as knowledge. In his ethnographic work in rural Appalachia, John Gaventa similarly argues that a society’s power dynamics become so deeply entrenched that the people embedded in that society no longer recognize these power dynamics at all. They take for granted a shared version of fact and reality which is far from the unbiased Truth we might hope for – rather it is a reality shaped by the role of power itself.
In some ways, algorithmic methods may exacerbate this problem – as algorithmic bias is applied to documents resulting from social bias – but a skepticism of automated approaches opens the door to deeper conversations about biases of all forms.
Ted Underwood argues that computational algorithms need to be fundamentally understood as tools of philosophical discourse, as “a way of reasoning.” These algorithms – even something as seemingly benign as rank-ordered search results – deeply shape what information is available and how it is perceived.
I’m inclined to agree with Underwood’s sentiment, but I would expand his argument more broadly, to a diverse set of research methods. Good scientists question their own biases and they question the biases in their methods – whether those methods are computational or not. All methods have bias. All data are biased.
Automated methods, with their black-box aesthetic and hopefully well-documented Git pages, may make it easier to do bad science, but for good scientists, they convincingly raise the specter of bias, implicit and explicit, in methods and data.
And those are concerns all researchers should be thinking about.