Category Archives: Computer Science

Computational Models of Belief Systems & Cultural Systems

Work on belief systems is similar to research on cultural systems: both use agent-based models to explore how complex systems evolve given a simple set of actor rules and interactions. There are, however, important conceptual differences between the two lines of work.

Research on cultural systems takes a macro-level approach, seeking to explain if, when, and how distinctive communities of similar traits emerge, while research on belief systems uses comparable methods to understand if, when, and how distinctive individuals come to agree on a given point.

The difference between these approaches is subtle but notable. The cultural systems approach begins with the observation that distinctive cultures do exist, despite local tendencies for convergence, while research on belief systems begins from the observation that groups of people are capable of working together, despite heterogeneous opinions and interests.

In his foundational work on cultural systems, Axelrod begins, “despite tendencies towards convergence, differences between individuals and groups continue to exist in beliefs, attitudes, and behavior” (Axelrod, 1997).

Compare this to how DeGroot begins his exploration of belief systems: “consider a group of individuals who must act together as a team or committee, and suppose that each individual in the group has his own subjective probability distribution for the unknown value of some parameter. A model is presented which describes how the group might reach agreement on a common subjective probability distribution for the parameter by pooling their individual opinions” (DeGroot, 1974).

In other words, while cultural models seek to explain the presence of homophily and other system-level traits, belief systems more properly seek to capture deliberative exchange. The important methodological difference here is that cultural systems model agent change as a function of similarity, while belief systems model agent change as a process of reasoning.
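To make the pooling idea concrete, here is a minimal sketch of DeGroot-style opinion updating. It is a simplification and not DeGroot's full model: each agent holds a single number rather than a whole probability distribution, and the trust weights and starting opinions are hypothetical values I made up for illustration. In each round, every agent replaces its opinion with a weighted average of everyone's opinions.

```python
def degroot_step(weights, opinions):
    """One round of pooling: each agent takes a weighted average of all opinions."""
    return [sum(w * x for w, x in zip(row, opinions)) for row in weights]


def degroot(weights, opinions, tol=1e-9, max_rounds=10_000):
    """Repeat pooling until opinions stop changing (or max_rounds is reached)."""
    for _ in range(max_rounds):
        updated = degroot_step(weights, opinions)
        if max(abs(a - b) for a, b in zip(updated, opinions)) < tol:
            return updated
        opinions = updated
    return opinions


# Hypothetical trust weights: row i says how much agent i weighs each agent's
# opinion (including its own). Each row sums to 1.
weights = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]
initial = [0.0, 0.5, 1.0]  # hypothetical starting opinions

consensus = degroot(weights, initial)
print(consensus)
```

With these symmetric weights the three agents settle on a shared value close to the average of their starting opinions. Other weight matrices can lead to a different consensus, or to none at all.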



Computational Models of Cultural Systems

Computational approaches to studying the broader social context can be found in work on the emergence and diffusion of communities in cultural systems. Spicer makes an anthropological appeal for the study of such systems, arguing that cultural change can only be properly considered in relation to more stable elements of culture. These persistent cultural elements, he argues, can best be understood as ‘identity systems,’ in which individuals bestow meaning on symbols. Spicer notes that there are collective identity systems (i.e., culture) as well as individual systems, and chooses to focus his attention on the former. Spicer talks about these systems in implicitly network terms: identity systems capture “relationships between human beings and their cultural products” (Spicer, 1971). To the extent that individuals share the same relationships with the same cultural products, they are united under a common culture; they are, as Spicer says, “a people.”

Axelrod presents a more robust mathematical model for studying these cultural systems. Similar to Schelling’s dynamic models of segregation, Axelrod imagines individuals interacting through processes of social influence and social selection (Axelrod, 1997). Agents are described with n-length vectors, with each element initialized to a value between 0 and m. The elements of the vector represent cultural dimensions (features), and the value of each element represents an individual’s state along that dimension (traits). Two individuals with the exact same vector are said to share a culture, while, in general, agents are considered culturally similar to the extent to which they hold the same trait for the same feature. Agents on a grid are then allowed to interact: two neighboring agents are selected at random. With a probability equal to their cultural similarity, the agents interact. An interaction consists of selecting a random feature on which the agents differ (if there is one), and updating one agent’s trait on this feature to its neighbor’s trait on that feature. This simple model captures both the process of choice homophily, as agents are more likely to interact with similar agents, and the process of social influence, as interacting agents become more similar over time. Perhaps the most surprising finding of Axelrod’s approach is just how complex this cultural system turns out to be. Despite the model’s simple rules, he finds that it is difficult to predict the ultimate number of stable cultural regions based on the system’s n and m parameters.
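The interaction rule described above can be sketched in a few lines. This is a rough illustration rather than Axelrod's own implementation, and several details are my assumptions: traits are drawn from 0 to m-1, the grid does not wrap at its edges, each agent has up to four neighbors, and the function names (`similarity`, `step`, `run`, `count_cultures`) are just labels I chose.

```python
import random


def similarity(a, b):
    """Fraction of features on which two agents hold the same trait."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def neighbors(r, c, size):
    """Up/down/left/right neighbors on a grid that does not wrap around."""
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < size and 0 <= j < size]


def step(grid, size, rng):
    """Pick a random agent and neighbor; interact with probability = similarity."""
    r, c = rng.randrange(size), rng.randrange(size)
    nr, nc = rng.choice(neighbors(r, c, size))
    agent, other = grid[r][c], grid[nr][nc]
    if rng.random() < similarity(agent, other):
        differing = [k for k in range(len(agent)) if agent[k] != other[k]]
        if differing:  # identical agents have nothing left to copy
            k = rng.choice(differing)
            agent[k] = other[k]


def run(size=5, n_features=3, n_traits=4, steps=20_000, seed=0):
    """Initialize a size x size grid of random cultures and run the dynamics."""
    rng = random.Random(seed)
    grid = [[[rng.randrange(n_traits) for _ in range(n_features)]
             for _ in range(size)] for _ in range(size)]
    for _ in range(steps):
        step(grid, size, rng)
    return grid


def count_cultures(grid):
    """Number of distinct culture vectors (not contiguous regions) on the grid."""
    return len({tuple(agent) for row in grid for agent in row})


final = run()
print(count_cultures(final))
```

Note that `count_cultures` counts distinct vectors anywhere on the grid, which is a cruder measure than the contiguous stable regions Axelrod reports. Varying `n_features` and `n_traits` is a quick way to get a feel for how hard the outcome is to predict.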

This concept of modeling cultural convergence through simple social processes has maintained a foothold in the literature and has been slowly gaining more widespread attention. Bednar and Page take a game theoretic approach, imagining agents who must play multiple cognitively taxing games simultaneously. Their finding that in these scenarios “culturally distinct behavior is likely and in many cases unavoidable” (Bednar & Page, 2007) is notable because classic game-theoretic models fail to explain the emergence of culture at all: rather, rational agents simply maximize their utility and move on. In their simultaneous game scenarios, however, cognitively limited agents adopt the strategies that can best be applied across the tasks they face. Cultures, then, emerge as “agents evolve behaviors in strategic environments.” This finding underscores Granovetter’s argument of embeddedness (M. Granovetter, 1985): distinctive cultures emerge because regional contexts influence adaptive choices, which in turn influence an agent’s environment.

Moving beyond Axelrod’s grid implementation, Flache and Macy (Flache & Macy, 2011) consider agent interaction on the small world network proposed by Watts and Strogatz (Watts & Strogatz, 1998). This model randomly rewires a regular lattice with select long-distance ties. Following Granovetter’s strength of weak ties theory (M. S. Granovetter, 1973), the rewired edges in the Watts-Strogatz model should bridge clusters and promote cultural diffusion. Flache and Macy also introduce the notion of the valence of interaction, considering social influence along dimensions of assimilation and differentiation, and taking social selection to consist of either attraction or xenophobia. In systems with only positively valenced interaction (assimilation and attraction), they find that the ‘weak’ ties have the expected result: cultural signals diffuse and the system tends towards cultural integration. However, the introduction of negatively valenced interactions (differentiation and xenophobia) leads to cultural polarization, resulting in deep disagreement between communities which themselves have high internal consensus.
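For a sense of what the rewiring step looks like, here is a small sketch in the spirit of the Watts-Strogatz construction. It is not Flache and Macy's setup: I assume a ring lattice as the starting network, and the parameter names `n`, `k`, and `p` (number of nodes, neighbors per node, and rewiring probability) along with the function names are my own labels. Each original tie is replaced, with probability `p`, by a tie from the same node to a randomly chosen node elsewhere in the network.

```python
import random


def ring_lattice(n, k):
    """Ring of n nodes, each tied to its k nearest neighbors (k/2 per side)."""
    edges = set()
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n
            edges.add((min(i, j), max(i, j)))
    return edges


def rewire(edges, n, p, rng):
    """With probability p, swap each original tie for a random new tie."""
    edges = set(edges)  # work on a copy
    for i, j in sorted(edges):  # snapshot of the original ties
        if rng.random() < p:
            # Possible new partners: not i itself, and not already tied to i.
            candidates = [t for t in range(n)
                          if t != i and (min(i, t), max(i, t)) not in edges]
            if candidates:
                t = rng.choice(candidates)
                edges.remove((i, j))
                edges.add((min(i, t), max(i, t)))
    return edges


lattice = ring_lattice(20, 4)
small_world = rewire(lattice, 20, 0.1, random.Random(0))
long_ties = small_world - lattice  # the ties created by rewiring
print(len(lattice), len(small_world), len(long_ties))
```

The number of ties stays fixed, so any change in how culture spreads comes from where the ties point rather than from how many there are.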


The Joint Effects of Content and Style on Debate Outcomes

I am heading out later today for the Midwest Political Science Association (MPSA) conference. My advisor, Nick Beauchamp, will be presenting our joint work on “The Joint Effects of Content and Style on Debate Outcomes.”

Here is the abstract for that work:

Debate and deliberation play essential roles in politics and government, but most models presume that debates are won mainly via superior style or agenda control. Ideally, however, debates would be won on the merits, as a function of which side has the stronger arguments. We propose a predictive model of debate that estimates the effects of linguistic features and the latent persuasive strengths of different topics, as well as the interactions between the two. Using a dataset of 118 Oxford-style debates, our model’s combination of content (as latent topics) and style (as linguistic features) allows us to predict audience-adjudicated winners with 74% accuracy, significantly outperforming linguistic features alone (66%). Our model finds that winning sides employ stronger arguments, and allows us to identify the linguistic features associated with strong or weak arguments.


Demographic bias in social media language analysis

Before the break, I had the opportunity to hear Brendan O’Connor talk about his recent paper with Su Lin Blodgett and Lisa Green: Demographic Dialectal Variation in Social Media: A Case Study of African-American English.

Imagine an algorithm designed to classify sentences. Perhaps it identifies the topic of the sentence or perhaps it classifies the sentiment of the sentence. These algorithms can be really accurate – but they are only as good as the corpus they are trained on.

If you train an algorithm on the New York Times and then try to classify tweets, for example, you may not have the kind of success you might like – the language and writing style of the Times and a typical tweet being so different.

There’s a lot of interesting stuff in the Blodgett et al. paper, but perhaps most notable to me is their comparison of the quality of existing language identification tools on tweets by race. They find that these tools perform poorly on text associated with African Americans while performing better on text associated with white speakers.

In other words, if you got a big set of Twitter data and filtered out the non-English tweets, that algorithm would disproportionately identify tweets from black authors as not being in English, and those tweets would then be removed from the dataset.

Such an algorithm, trained on white language, has the unintentional effect of literally removing voices of color.

Their paper presents a classifier to eliminate that disparity, but the finding is eye-opening: a cautionary tale for anyone undertaking language analysis. If you’re not thoughtful and careful in your approach, even the most validated classifier may bias your data sample.


Visualizing Pareto Fronts

As the name implies, multi-objective optimization problems are a class of problems in which one seeks to optimize over multiple, conflicting objectives.

Optimizing over one objective is relatively easy: given information on traffic, a navigation app can suggest which route it expects to be the fastest. But with multiple objectives the problem becomes complicated: suppose, for example, that you want a reasonably fast route that won’t use too much gas and gives you time to take in the view outside your window.

Or, perhaps, you have multiple deadlines pending and you want to do perfectly on all of them, but you also have limited time and would like to eat and maybe sleep sometime, too. How do you prioritize your time? How do you optimize over all the possible things you could be doing?

This is not easy.

Rather than having a single, optimal solution, these problems have a set of solutions, known as the Pareto front. No solution on the front can be improved on one objective without doing worse on another, so none is mathematically better than the rest; each simply represents a different trade-off among the objectives.
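As a minimal sketch of what that means in practice, the snippet below filters a handful of candidate routes down to their Pareto front. The routes and their numbers are hypothetical, and I assume both objectives (travel time and fuel) are to be minimized. A route is kept unless some other route is at least as good on every objective and strictly better on at least one.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical routes as (minutes, liters of fuel); lower is better for both.
routes = [(30, 4.0), (35, 3.0), (45, 2.5), (40, 3.5), (50, 2.5)]

front = pareto_front(routes)
print(front)  # [(30, 4.0), (35, 3.0), (45, 2.5)]
```

The 40-minute route drops out because the 35-minute route is both faster and thriftier, and the 50-minute route drops out because the 45-minute route uses the same fuel in less time. Choosing among the three that remain is a question of priorities, not of math.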

Using 3D Rad-Viz, Ibrahim et al. have visualized the complexity of the Pareto front, showing the bumpy landscape these solution spaces have.

Chen et al. take a somewhat different approach – designing a tool to allow a user to interact with the Pareto front, visually seeing the trade-offs each solution implicitly makes and allowing a user to select the solutions they see as best meeting their needs.



The Use of Faces to Represent Points in k-Dimensional Space Graphically

This is my new favorite thing.

Herman Chernoff’s 1972 paper, “The Use of Faces to Represent Points in k-Dimensional Space Graphically.” The name is pretty self-explanatory: it’s an attempt to represent high dimensional data…through the use, as Chernoff explains, of “a cartoon of a face whose features, such as length of nose and curvature of mouth, correspond to components of the point.”

Here’s an example:


I just find this hilarious.

But as crazy as this approach may seem, there’s something really interesting about it. Most standard efforts to represent high dimensional data revolve around projecting that data into lower dimensional (e.g., two-dimensional) space. This allows the data to be shown on standard plots, but risks losing something valuable in the data compression.

Showing k-dimensional data as cartoon faces is probably not the best solution, but I appreciate the motivation behind it: the question, ‘how can we present high dimensional data high dimensionally?’


Interactive Machine Learning

For one of my class projects, I’ve been reading a lot about interactive machine learning – an approach which Karl Sims describes as allowing “the user and computer to interactively work together in a new way to produce results that neither could easily produce alone.”

In some ways, this approach is intuitive. Michael Muller, for example, argues that any work with technology has an inherently social dimension. “Must we always analyze the impact of technology on people,” he asks, “or is there just as strong an impact of people on technology?” From this perspective, any machine learning approach which doesn’t account for both the user and the algorithm is incomplete.

Jerry Fails and Dan Olsen fully embrace this approach, proposing a paradigm shift in the fundamental way researchers approach machine learning tasks. While classic machine learning models “require the user to choose features and wait an extended amount of time for the algorithm to train,” Fails and Olsen propose an interactive machine learning approach which feeds a large number of features into a classifier, with human judgement continually correcting and refining the results. They find this approach removes the need to pre-select features, reduces the burden of technical knowledge on the user, and significantly speeds up training.


A Lesson from the West Area Computers

I really want to read Hidden Figures, the new book by Margot Lee Shetterly which chronicles “the untold story of the Black women mathematicians who helped win the space race.” If you aren’t as excited about this book as I am, it highlights the work and experiences of the West Area Computers – a group of black, female mathematicians who worked at NASA Langley from 1943 through 1958.

I haven’t gotten a chance to yet, but I was particularly struck by one incident I heard on the podcast Science Friday and which I found recounted in the Smithsonian Magazine:

But life at Langley wasn’t just the churn of greased gears. Not only were the women rarely provided the same opportunities and titles as their male counterparts, but the West Computers lived with constant reminders that they were second-class citizens. In the book, Shetterly highlights one particular incident involving an offensive sign in the dining room bearing the designation: Colored Computers.

One particularly brazen computer, Miriam Mann, took responding to the affront on as her own personal vendetta. She plucked the sign from the table, tucking it away in her purse. When the sign returned, she removed it again. “That was incredible courage,” says Shetterly. “This was still a time when people are lynched, when you could be pulled off the bus for sitting in the wrong seat. [There were] very, very high stakes.”

But eventually Mann won. The sign disappeared.

I love this story.

Not because it has a hopeful message about how determination always wins – but because it serves as a reminder of the effort and risk people of color face every day just in interacting with their environment.

The West Computers were tremendously good at their jobs and were respected by their white, male colleagues. I imagine many of these colleagues considered themselves open-minded, even radical for the day, for valuing the talent of their black colleagues.

When I hear the story about how Mann removed the “Colored Computers” sign every day, I don’t just hear a story of the valiant strength of one woman.

I hear a story of white silence.

I hear a story about how other people didn’t complain about the sign. I imagine they barely even noticed the sign. It didn’t affect them and never weighed upon their world.

John Glenn reportedly refused to fly unless West Area Computer Katherine Johnson verified the calculations first – such respect he had for her work.

And yet it never crossed anyone’s mind that a “Colored Computers” sign might not be appropriate.

That’s just the way the world was then.

And that makes me wonder – what don’t I see?

To me, this story is a reminder that people of color experience the world differently than I do – because people like me constructed the world I experience. There must be so many things every day that just slip past my notice, no matter how open minded or progressive I’d like to be.

It’s easy to look back at the 1940s and see that a “Colored” sign is racist. What’s hard is to look at the world today and to see that sign’s modern day equivalent.



Multivariate Network Exploration and Presentation

In “Multivariate Network Exploration and Presentation,” authors Stef van den Elzen and Jarke J. van Wijk introduce an approach they call “Detail to Overview via Selections and Aggregations,” or DOSA. I was going to make fun of them for naming their approach after a delicious south Indian dish, but since they comment that their name “resonates with our aim to combine existing ingredients into a tasteful result,” I’ll have to just leave it there.

The DOSA approach – and now I am hungry – aims to allow a user to explore the complex interplay between network topology and node attributes. For example, in company email data, you may wish to simultaneously examine assortativity by gender and department over time. That is, you may need to consider both structure and multivariate data.

This is a non-trivial problem, and I particularly appreciated van den Elzen and van Wijk’s practical framing of why this is a problem:

“Multivariate networks are commonly visualized using node-link diagrams for structural analysis. However, node-link diagrams do not scale to large numbers of nodes and links and users regularly end up with hairball-like visualizations. The multivariate data associated with the nodes and links are encoded using visual variables like color, size, shape or small visualization glyphs. From the hairball-like visualizations no network exploration or analysis is possible and no insights are gained or even worse, false conclusions are drawn due to clutter and overdraw.”

YES. From my own experience, I can attest that this is a problem.

So what do we do about it?

The authors suggest a multi-pronged approach which allows non-expert users to select nodes and edges of interest, to see a detailed view and an infographic-like overview simultaneously, and to examine the aggregated attributes of a selection.

Overall, this approach looks really cool and very helpful. (The paper did win the “best paper” award at the IEEE Information Visualization 2014 Conference, so perhaps that shouldn’t be that surprising.) I was a little disappointed that I couldn’t find the GUI implementation of this approach online, though, which makes it a little hard to judge how useful the tool really is.

From their screenshots and online video, however, I find that while this is a really valiant effort to tackle a difficult problem, there is still more work to do in this area. The challenge with visualizing complex networks is indeed that they are complex, and while DOSA gives a user some control over how to filter and interact with this complexity, there is still a whole lot going on.

While I appreciate the inclusion of examples and use cases, I would have also liked to see a user design study evaluating how well their tool met their goal of providing a navigation and exploration tool for non-experts. I also think that the issues of scalability with respect to attributes and selection that they raise in the limitations section are important topics which, while reasonably beyond the scope of this paper, ought to be tackled in future work.


Facts, Power, and the Bias of AI

I spent last Friday and Saturday at the 7th Annual Text as Data conference, which draws together scholars from many different universities and disciplines to discuss developments in text as data research. This year’s conference, hosted by Northeastern, featured a number of great papers and discussions.

I was particularly struck by a comment from Joanna J. Bryson as she presented her work with Aylin Caliskan-Islam and Arvind Narayanan on A Story of Discrimination and Unfairness: Using the Implicit Bias Task to Assess Cultural Bias Embedded in Language Models:

There is no neutral knowledge.

This argument becomes especially salient in the context of artificial intelligence: we tend to think of algorithms as neutral, fact-based processes which are free from the biases we experience as humans. But such a simplification is deeply faulty. As Bryson argued, AI won’t be neutral if it’s based on human culture; there is no neutral knowledge.

This argument resonates quite deeply with me, but I find it particularly interesting through the lens of an increasingly relativistic world, one in which facts are increasingly seen as matters of opinion.

To complicate matters, there is no clear normative judgment that can be applied to such relativism: on the one hand this allows for embracing diverse perspectives, which is necessary for a flourishing, pluralistic world. On the other hand, nearly a quarter of high school government teachers in the U.S. report that parents or others would object if they discussed politics in a government classroom.

Discussing “current events” in a neutral manner is becoming increasingly challenging if not impossible.

This comment also reminds me of the work of urban planner Bent Flyvbjerg who turns an old axiom on its head to argue that “power is knowledge.” Flyvbjerg’s concern doesn’t require a complete collapse into relativism, but rather argues that “power procures the knowledge which supports its purposes, while it ignores or suppresses that knowledge which does not serve it.” Power, thus, selects what defines knowledge and ultimately shapes our understanding of reality.

In his work with rural coal miners, John Gaventa further showed how such power dynamics can become deeply entrenched, so the “powerless” don’t even realize the extent to which their reality is dictated by those with power.

It is these elements which make Bryson’s comments so critical; it is not just that there is no neutral knowledge, but that “knowledge” is fundamentally controlled and defined by those in power. Thus it is imperative that any algorithm take these biases into account – because they are not just the biases of culture, but rather the biases of power.