Monthly Archives: April 2016

Comparing Texts with Log-Likelihood Word Frequencies

One way to compare the similarity of documents is to examine the comparative log-likelihood of word frequencies.

This can be done with any two documents, but it is a particularly interesting way to compare the similarity of a smaller document with the larger body of text it is drawn from. For example, with access to the appropriate data, you may want to know how similar Shakespeare was to his contemporaries. The Bard is commonly credited with coining a large number of words, but it’s unclear exactly how true this is – after all, the work of many of his contemporaries has been lost.

But imagine you ran across a treasure trove of miscellaneous documents from 1600 and wanted to compare them to Shakespeare’s plays. You could do this by calculating the expected frequency of a given word and comparing it to the observed frequency. First, you can calculate the expected frequency as:

$$E_i = \frac{N_i \sum_{j} O_j}{\sum_{j} N_j}$$

Where N_i is the total number of words in document i, O_i is the observed frequency of a given word in document i, and the sums run over both documents. That is, the expected frequency of a word in your subcorpus is (number of words in your subcorpus) * (sum of its observed frequencies in both corpora) / (number of words in both corpora).



Then, you can use this expectation to determine a word’s log-likelihood given the larger corpus as:

$$LL = 2 \sum_i O_i \ln\left(\frac{O_i}{E_i}\right)$$

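Sorting words by their log-likelihood in descending order, you can then see the most unexpected (that is, the most distinctive) words in your smaller corpus. A high score means only that a word’s frequency departs strongly from expectation, in either direction, so compare a word’s observed and expected frequency in the smaller corpus to see whether it is overused or underused there.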
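The calculation described in this post can be sketched in a few lines of Python. This is a minimal illustration, assuming both corpora are already tokenized into lists of words and treated as two separate corpora; `log_likelihood` and `keywords` are illustrative names, not a library API. Because the score is high for words that are either unusually common or unusually rare in the smaller corpus, the sketch also labels each word as over- or underused.

```python
import math
from collections import Counter

def log_likelihood(o1: int, o2: int, n1: int, n2: int) -> float:
    """Log-likelihood for a word seen o1 times in corpus 1 (n1 tokens)
    and o2 times in corpus 2 (n2 tokens)."""
    total_words = n1 + n2
    e1 = n1 * (o1 + o2) / total_words  # expected frequency in corpus 1
    e2 = n2 * (o1 + o2) / total_words  # expected frequency in corpus 2
    ll = 0.0
    for o, e in ((o1, e1), (o2, e2)):
        if o > 0:  # a zero count contributes 0 (0 * ln 0 is taken as 0)
            ll += o * math.log(o / e)
    return 2 * ll

def keywords(small: list[str], large: list[str], top: int = 10):
    """Rank words by log-likelihood; return (score, word, direction) tuples."""
    c1, c2 = Counter(small), Counter(large)
    n1, n2 = len(small), len(large)
    scored = []
    for word in set(c1) | set(c2):
        o1, o2 = c1[word], c2[word]
        e1 = n1 * (o1 + o2) / (n1 + n2)
        direction = "over" if o1 > e1 else "under"
        scored.append((log_likelihood(o1, o2, n1, n2), word, direction))
    scored.sort(reverse=True)  # most distinctive first
    return scored[:top]
```

A word used at the same rate in both corpora scores 0; the further its rates diverge, the higher it climbs in the ranking.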

Linguistic Clues and Evidence Relationships

Earlier in the week, I wrote about the theory of coherent argument structure introduced by Robin Cohen in 1987. Her model also included two other elements: a theory of linguistic clue interpretation and a theory of evidence relationships. These theories, the focus of today’s post, are closely connected both to each other and to the theory of argument structure.

Theory of Linguistic Clue Interpretation
Cohen’s theory of linguistic clue interpretation argues for the existence of clue words: “those words and phrases used by the speaker to directly indicate the structure of the argument to the hearer.” Capable of being identified through simple n-gram models as well as more sophisticated means, these linguistic cues, or discourse markers, are a common feature of argument mining. Cohen outlines several common clue types, such as redirection, which redirects the hearer to an earlier part of the argument (“returning to…”), and connection, a broad category encompassing clues of inference (“as a result of…”), clues of contrast (“on the other hand…”), and clues of detail (“specifically…”). Most notably, though, Cohen argues that clues are necessary for arguments whose structure is more complex than those covered by the coherent structure theory. That is, the function of linguistic cues is not “to merely add detail on the interpretation of the contained proposition, but to allow that proposition an interpretation that would otherwise be denied” (Cohen 1987).
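At its simplest, detecting clue words can be a lookup against a list of known phrases. The sketch below assumes a tiny hand-built dictionary: the first phrase under each type is Cohen’s example quoted above, the others are illustrative additions, and `CLUE_TYPES` and `find_clues` are hypothetical names. A real system would need a much larger curated or learned inventory, and would still face phrases that only sometimes act as markers.

```python
import re

# Illustrative clue phrases per type. The first phrase in each list is Cohen's
# example; the rest are hypothetical additions.
CLUE_TYPES = {
    "redirection": ["returning to"],
    "inference": ["as a result of", "therefore"],
    "contrast": ["on the other hand", "however"],
    "detail": ["specifically", "for example"],
}

def find_clues(text: str) -> list[tuple[str, str]]:
    """Return (clue type, phrase) pairs in the order they appear in text."""
    lowered = text.lower()
    hits = []
    for clue_type, phrases in CLUE_TYPES.items():
        for phrase in phrases:
            # \b anchors each match at word boundaries.
            for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", lowered):
                hits.append((m.start(), clue_type, phrase))
    hits.sort()  # order by position in the text
    return [(clue_type, phrase) for _, clue_type, phrase in hits]
```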

Discourse markers are also “strongly associated with particular pragmatic functions” (Abbott, Walker et al. 2011), making them valuable for sentiment analysis tasks such as determining agreement or disagreement. This is the approach Abbott, Walker et al. used to classify types of arguments within a corpus of 109,553 annotated posts from an online forum. Because the forum allowed users to explicitly quote another post, Abbott, Walker et al. identified 8,242 quote-response pairs, in which a user quoted another post and then added a comment of their own.

In addition to the classification task of determining whether the response agrees or disagrees with the preceding quote, the team analysed the pairs along a number of sentiment spectrums: respect/insult, fact/emotion, nice/nasty, and sarcasm. Twenty discourse markers, identified through manual inspection of a subset of the corpus, as well as the use of “repeated punctuation,” served as key features in the analysis.

Using a JRip classifier built on n-gram and bi-gram discourse markers, as well as a handful of other features such as post meta-data and topic annotations, Abbott, Walker et al. found the best performance (0.682 accuracy, compared to a 0.626 unigram baseline) using local features from both the quote and response. This indicates that the contextual features do matter, and, in the words of the authors, vindicates their “interest in discourse markers as cues to argument structure” (Abbott, Walker et al. 2011).

While these discourse markers can provide vital clues to a hearer trying to reconstruct an argument, relying on them in a model requires that a speaker not only try to be understood, but be capable of expressing themselves clearly. Stab and Gurevych, who are interested in argument mining as a tool for automating feedback on student essays, argue that discourse markers make a poor feature, since, in their corpora, these markers are often missing or even misleadingly used (Stab and Gurevych 2014). Their approach to this challenge will be further discussed in the state of the art section of this paper.

Theory of Evidence Relationships
The final piece of Cohen’s model is evidence relationships, which explicitly connect one argument to another and govern the verification of evidence relations between propositions (Cohen 1987). While the coherent structure principle lays out the different forms an argument may take, evidence relationships are the logical underpinnings that tie an argument’s structure together. As Cohen explains, the pragmatic analysis of evidence relationships is necessary for the model because the hearer needs to be able to “recognize beliefs of the speaker, not currently held by the hearer.” That is, whether or not the hearer agrees with the speaker’s argument, the hearer needs to be able to identify the elements of the speaker’s argument as well as the logic which holds that argument together.

To better understand the role of evidence relationships, it is helpful to first develop a definition of an “argument.” In its most general form, an argument can be understood as a representation of a fact as conveying some other fact. In this way, a complete argument requires three elements: a conveying fact, a warrant providing an appropriate relation of conveyance, and a conveyed fact (Katzav and Reed 2008). However, one or more of these elements often takes the form of an implicit enthymeme and is left unstated by the speaker. For this reason, some studies model arguments in their simplest form as a single proposition, though humans generally require at least two of the elements to accurately distinguish arguments from statements (Mochales and Moens 2011).
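Katzav and Reed’s three-element definition maps naturally onto a small data structure in which an unstated element is simply absent. A minimal sketch, assuming each element is a plain string; the `Argument` class and its methods are hypothetical, and the two-element threshold in `is_recognizable` encodes the human-annotation finding cited above rather than any published algorithm.

```python
from dataclasses import dataclass
from typing import Optional

ELEMENTS = ("conveying_fact", "warrant", "conveyed_fact")

@dataclass
class Argument:
    """Three-element argument after Katzav and Reed; None marks an unstated element."""
    conveying_fact: Optional[str]
    warrant: Optional[str]
    conveyed_fact: Optional[str]

    def missing(self) -> list[str]:
        """Names of the elements the speaker left implicit."""
        return [name for name in ELEMENTS if getattr(self, name) is None]

    def is_enthymeme(self) -> bool:
        return bool(self.missing())

    def is_recognizable(self) -> bool:
        # Heuristic from the text: people generally need at least two stated elements.
        return len(self.missing()) <= 1
```

For instance, `Argument("Joey is a shark.", None, "Joey is dangerous.")` reports `["warrant"]` as missing: the generalization that sharks are dangerous is left for the hearer to supply.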

The ubiquity of enthymemes in both formal and informal dialogue has proved to be a significant challenge for argument mining. Left for the hearer to infer, these implicit elements are often “highly context-dependent and cannot be easily labeled in distant data using predefined patterns” (Habernal and Gurevych 2015). It is important to note that the existence of these enthymemes does not violate the initial assumption that a speaker argues with the intent of being understood by a hearer. Rather, enthymemes, like other human heuristics, provide computational savings to a speaker/hearer pair with a sufficiently shared context. Thus, enthymemes indicate the elements of an argument that a speaker assumes a listener can easily infer, which poses a particular challenge when a speaker is a poor judge of the listener’s knowledge or when the listener is an AI model.

To complicate matters further, there are no definitive rules for the roles enthymemes may take. Any of an argument’s elements may appear as enthymemes, though psycholinguistic evidence indicates that the relationship of conveyance between two facts, the argument’s warrant, is most commonly left implicit (Katzav and Reed 2008). Similarly, the discourse markers which might otherwise serve as valuable clues for argument reconstruction “need not appear in arguments and thus cannot be relied upon” (Katzav and Reed 2008). All of this poses a significant challenge.

In her work, Cohen bypasses this problem by relying on an evidence oracle which takes two propositions, A and B, and responds ‘yes’ or ‘no’ as to whether A is evidence for B (Cohen 1987). In determining argument relations, Cohen’s oracle identifies missing premises, verifies the plausibility of these enthymemes, and ultimately concludes that an evidence relation holds if the missing premise is deemed plausible. In order to be found plausible, the inferred premise must be both plausibly intended by the speaker and plausibly believed by the hearer. In this way, the evidence oracle determines the structure of the argument while also overcoming the presence of enthymemes.
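Cohen’s oracle is a black box, but the decision procedure described above can be written as a function whose hard parts are passed in. A sketch under that assumption: `infer_missing_premise`, `plausibly_intended`, and `plausibly_believed` are hypothetical callables standing in for the inference and belief modeling that Cohen’s model leaves to the oracle.

```python
from typing import Callable, Optional

def evidence_oracle(
    a: str,
    b: str,
    infer_missing_premise: Callable[[str, str], Optional[str]],
    plausibly_intended: Callable[[str], bool],
    plausibly_believed: Callable[[str], bool],
) -> bool:
    """Return True if A is judged to be evidence for B.

    All three callables are hypothetical stand-ins for the knowledge
    Cohen's model delegates to the oracle.
    """
    premise = infer_missing_premise(a, b)  # e.g. a bridging generalization
    if premise is None:  # no premise could link A to B
        return False
    # The inferred premise must be plausible from the speaker's side
    # (intended) and from the hearer's side (believed).
    return plausibly_intended(premise) and plausibly_believed(premise)
```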

To reduce reliance on such an oracle, Katzav and Reed develop a model to automatically determine the evidence relations between any pair of propositions within a text. Their model allows for two possible relations between an observed pair of argumentative statements: a) one of the propositions represents a fact which supposedly necessitates the other proposition (i.e., the conveyance is missing), or b) one proposition represents a conveyance which, together with a fact represented by the other proposition, supposedly necessitates some missing proposition (i.e., the claim is missing) (Katzav and Reed 2008). The task, then, is to determine the type of relationship between the two statements and use that relationship to reconstruct the missing element.

Their notable contribution to argumentative theory is to observe that arguments can be classified by type (e.g., “causal argument”), and that this type constrains the possible evidence relations of an argument. Central to their model is the identification of an argument’s warrant: the conveying element which defines the relationship between fact A and fact B. Since this is the element most often left as an enthymeme, Katzav and Reed devote significant attention to reconstructing an argument’s warrant from two observed argumentative statements. If, on the other hand, the observed pair falls into type b) above, with the final proposition missing, then the process is trivial: “the explicit statement of the conveying fact, along with the warrant, allows the immediate deduction of the implicit conveyed fact” (Katzav and Reed 2008).

This framework cleverly redefines the enthymeme reconstruction challenge. Katzav and Reed argue that no relation of conveyance can reasonably be thought to relate just any type of fact to any other type of fact. Therefore, given two observed propositions, A and B, a system can narrow the class of possible relations to warrants which can reasonably be thought to relate facts of the type A to facts of the type B. Katzav and Reed find this to be a “substantial constraint” which allows a system to deduce a missing warrant by leveraging a theory of “which relations of conveyance there are and of which types each such relation can reasonably be thought to relate” (Katzav and Reed 2008).
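The type constraint can be illustrated as a lookup: given the fact types of two observed propositions, keep only the warrants that can reasonably relate those types. The fact types and warrant inventory below are invented for illustration and are not taken from Katzav and Reed; building a real inventory is exactly the theory of warrants their approach depends on.

```python
# Hypothetical warrant inventory: (warrant name, conveying fact type, conveyed fact type).
# The fact types and warrants are invented for illustration.
WARRANTS = [
    ("causal", "event", "event"),
    ("sign", "event", "event"),
    ("classification", "kind_membership", "property"),
    ("example", "instance", "generalization"),
]

def candidate_warrants(type_a: str, type_b: str) -> list[str]:
    """Warrants that can reasonably relate a fact of type_a to a fact of type_b."""
    return [name for name, src, dst in WARRANTS if (src, dst) == (type_a, type_b)]
```

Given two event-type propositions, the search narrows to two candidate warrants; a pair of types with no entry yields no candidates, which is the "substantial constraint" at work.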

While this approach does represent an advancement over Cohen’s entirely oracle-dependent model, it is not without its own limitations. For successful warrant recovery, Katzav and Reed require a corpus of statements annotated with the types of facts they represent and a system with relevant background information similarly marked up. Furthermore, it requires a robust theory of warrants and relations, a subject only loosely outlined in their 2008 paper. Reed has, however, advanced such a theory elsewhere through his collaborations with Walton. This line of work is picked up by Feng and Hirst in a slightly different approach to enthymeme reconstruction.

Before inferring an argument’s enthymemes, Feng and Hirst argue, one must first classify the argument’s scheme. While a warrant defines the relation between two propositions, a scheme is a template which may incorporate more than two propositions. Unlike in Cohen’s argument structures, the order in which statements occur does not affect an argument’s scheme. A scheme, then, is a flexible model which combines elements of Cohen’s coherent structure theory with elements of her evidence relations theory.

Drawing on the 65 argument schemes developed by Walton et al. in 2008, Feng and Hirst seek to classify arguments under the five most common schemes. While their ultimate goal is to infer enthymemes, their current work treats this challenge primarily as a classification task: once an argument’s scheme is properly classified, reconstruction becomes a simpler task. Under their model, an argument mining pipeline would classify an argument’s scheme, fit the stated propositions into the scheme, and then use this template to infer enthymemes (Feng and Hirst 2011).
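The three-step pipeline can be sketched as follows. The scheme classifier and slot assigner are passed in as hypothetical callables, and the single template uses slot names loosely based on Walton’s practical reasoning scheme (a goal, a means to it, and a conclusion to act); everything else is illustrative. The sketch stops at naming which slot is empty, which is where actual enthymeme generation would begin.

```python
from typing import Callable, Optional

# Hypothetical template; slot names loosely follow Walton's practical reasoning scheme.
SCHEMES = {"practical_reasoning": ["goal", "means", "conclusion"]}

def reconstruct(
    propositions: list[str],
    classify_scheme: Callable[[list[str]], str],
    assign_slot: Callable[[str, list[str]], str],
) -> tuple[str, dict[str, Optional[str]], list[str]]:
    # Step 1: classify the argument's scheme.
    scheme = classify_scheme(propositions)
    slots: dict[str, Optional[str]] = {name: None for name in SCHEMES[scheme]}
    # Step 2: fit each stated proposition into a slot of the template.
    for p in propositions:
        slots[assign_slot(p, SCHEMES[scheme])] = p
    # Step 3: empty slots are the enthymemes still to be inferred.
    missing = [name for name, value in slots.items() if value is None]
    return scheme, slots, missing
```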

Working with 393 arguments from the Araucaria dataset, Feng and Hirst achieved best average accuracies above 90% for two of their schemes, with the other three schemes scoring in the 60s and 70s. They did this using a range of sentence- and token-based features, as well as a “type” feature, annotated in their dataset, which indicates whether the premises contribute to the conclusion in linked or convergent order (Feng and Hirst 2011).

A “linked” argument has two or more interdependent propositions which are all necessary to make the conclusion valid. In contrast, exactly one premise is sufficient to establish a valid conclusion in a “convergent” argument (Feng and Hirst 2011). They found this type feature to improve classification accuracy in most cases, though that improvement varied from 2.6 points for one scheme to 22.3 points for another. Unfortunately, automatically identifying an argument’s type is not an easy task in itself and therefore may not ultimately represent a net gain in enthymeme reconstruction. As future work, Feng and Hirst propose attempting automatic type classification through rules such as defining one premise to be linked to another if either would become an enthymeme if deleted.
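The linked/convergent distinction can be expressed as a test over subsets of premises. A sketch, assuming a hypothetical `supports(premises, conclusion)` oracle that judges whether a set of premises establishes a conclusion; the "linked" check mirrors the proposed rule that deleting any linked premise leaves the argument incomplete.

```python
from typing import Callable, Sequence

def argument_type(
    premises: Sequence[str],
    conclusion: str,
    supports: Callable[[Sequence[str], str], bool],
) -> str:
    """Label premises as 'linked', 'convergent', 'single', or 'mixed/unknown'.

    `supports` is a hypothetical oracle: do these premises establish the conclusion?
    """
    if len(premises) < 2:
        return "single"
    # Convergent: each premise is sufficient on its own.
    if all(supports([p], conclusion) for p in premises):
        return "convergent"

    def without(i: int) -> list[str]:
        return [q for j, q in enumerate(premises) if j != i]

    # Linked: together they suffice, but dropping any one premise breaks the argument.
    if supports(list(premises), conclusion) and not any(
        supports(without(i), conclusion) for i in range(len(premises))
    ):
        return "linked"
    return "mixed/unknown"
```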

While their efforts showed promising results in scheme classification, it is worth noting that best average accuracies varied significantly by scheme. Their classifier achieved remarkable results for an “argument from example” scheme (90.6%) and a “practical reasoning” scheme (90.8%). However, the schemes of “argument from consequences” and “argument from classification” were not nearly as successful – achieving only 62.9% and 63.2% best average accuracy respectively.

Feng and Hirst attribute this disparity to the low-performing schemes not having “such obvious cue phrases or patterns as the other three schemes which therefore may require more world knowledge encoded” (Feng and Hirst 2011). Thus, while the scheme classification approach cleverly merges the challenges introduced by Cohen’s coherent structure and evidence relationship theories, this work also shows that the challenges of linguistic cues cannot be neglected.

State of the Art Techniques in Argument Mining

As part of a paper I’m working on, here’s a review of select recent state of the art efforts in the area of argument mining.

In their work on automatic, topic-independent argument extraction, Swanson, Ecker et al. introduce an Implicit Markup hypothesis which harkens back to the earlier work of Cohen. This hypothesis is built of four elements: discourse relation, dialogue structure, syntactic properties, and semantic density (Swanson, Ecker et al. 2015). In their model, discourse relations can be determined by any two observed arguments. The second argument, or claim, is defined as the one to which a warrant – if observed – is syntactically bound. Dialogue structure considers the position of an argumentative statement within a post. Notably, with its focus on relative position, this is more similar to Cohen’s model of coherent structure than to the concept of schemes introduced by Walton. A sophisticated version of Cohen’s linguistic cues, syntactic properties are a clever way to leverage observed discourse markers in order to infer missing discourse markers. For example, observing sentences such as “I agree that <x>” can help identify other argumentative content of the more general form “I <verb> that <x>.”
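The pattern generalization in this example can be approximated with regular expressions. A rough sketch, assuming sentences are already split; Swanson, Ecker et al. work from syntactic properties rather than surface regexes, so `SEED`, `GENERAL`, and `generalized_matches` are hypothetical stand-ins that show the idea of relaxing a seed pattern, not their implementation.

```python
import re

# SEED matches only the observed phrasing; GENERAL relaxes the verb slot.
SEED = re.compile(r"\bI agree that (?P<x>.+)")
GENERAL = re.compile(r"\bI (?P<verb>\w+) that (?P<x>.+)")

def generalized_matches(sentences: list[str]) -> list[tuple[str, str]]:
    """Return (verb, content) pairs for sentences of the form 'I <verb> that <x>'."""
    results = []
    for s in sentences:
        m = GENERAL.search(s)
        if m:
            results.append((m.group("verb").lower(), m.group("x").rstrip(".!?")))
    return results
```

The relaxed pattern trades precision for recall: it also catches non-argumentative sentences such as "I hope that you are well," which is why further filtering is needed.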

The final element, semantic density, is a notable advancement in processing noisy data. Composed of a number of features, such as sentence length, word length, deictic pronouns, and specificity, semantic density filters out those sentences which are not relevant to a post’s core argument. When dealing with noisy forum post data, filtering out sentences which are harder to interpret provides valuable computational savings without losing an argument’s core claim. Furthermore, this filtering can help with the enthymeme challenge: in fact, Swanson, Ecker et al. filter out most enthymemes, focusing instead on claims which are syntactically bound to an explicit warrant.
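A density filter of this kind can be sketched with a few surface features. The sketch covers only sentence length, word length, and deictic pronouns (specificity would need a trained model and is omitted), and the deictic word list and all thresholds are hypothetical values chosen for illustration, not those of Swanson, Ecker et al. Counting every "that" as deictic is a deliberately crude simplification.

```python
# Hypothetical deictic word list and thresholds, chosen for illustration only.
DEICTIC = {"this", "that", "these", "those", "here", "there"}

def density_features(sentence: str) -> dict[str, float]:
    words = [w.strip(".,!?;:\"'()").lower() for w in sentence.split()]
    words = [w for w in words if w]
    n = len(words)
    if n == 0:
        return {"n_words": 0, "mean_word_len": 0.0, "deictic_ratio": 0.0}
    return {
        "n_words": n,
        "mean_word_len": sum(len(w) for w in words) / n,
        "deictic_ratio": sum(w in DEICTIC for w in words) / n,
    }

def is_dense(sentence: str, min_words: int = 6, min_word_len: float = 4.0,
             max_deictic: float = 0.2) -> bool:
    """Keep long, content-heavy sentences; drop short or pointer-heavy ones."""
    f = density_features(sentence)
    return (f["n_words"] >= min_words
            and f["mean_word_len"] >= min_word_len
            and f["deictic_ratio"] <= max_deictic)
```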

With this model, Swanson, Ecker et al. take on the interesting task of trying to automatically predict argument quality, a particularly timely challenge given the ubiquity of argumentative data from noisy online forums. With a corpus of over 100,000 posts on four political topics, Swanson, Ecker et al. compare the predictions of their model to human annotations of argument quality. Testing their model with three regression algorithms, they found that a support vector machine (SVM) performed best, explaining nearly half of the variance for some topics (R² = 0.466 and 0.441 for gun control and gay marriage, respectively).

Stab, Gurevych, and Habernal, all of the Ubiquitous Knowledge Processing Lab, have also made important contributions to the state of the art in argument mining. As noted above, Stab and Gurevych were among the first to expressly tackle the challenge of poorly structured arguments in their work identifying discourse structures in persuasive essays (Stab and Gurevych 2014).

In seeking to identify an argument’s structure and the relationship between its elements, this work has clear ties back to earlier argumentative theory. Indeed, while unfortunately prone to containing poorly-formed arguments, student essays are a model setting for Cohen’s theory: a single speaker does their best to form a coherent and compelling argument while a weary reader is tasked with trying to understand their meaning.

A notable contribution of Stab and Gurevych was to break this effort into a two-step classification task. The first step uses a multiclass classifier to identify the components of an argument, while the second step is a simpler binary classification of a pair of argument components as either support or non-support. As future work, they propose developing this as a joint inference problem, since the two pieces of information are indicators of each other. However, they found current accuracy in identifying argument components to be “not sufficient for increasing the performance of argumentative relation identification” (Stab and Gurevych 2014). Their best-performing relation identification classifier, an SVM built with structural, lexical, and syntactic features, achieved “almost human performance” with 86.3% accuracy, compared to a human accuracy of 95.4%. Emphasizing the challenges of linguistic cues in noisy text, a model using discourse markers in student essays yielded an F1-score of only 0.265.
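The two-step design can be outlined as a small pipeline. A sketch, assuming the two classifiers are supplied as hypothetical callables; the component labels in the comments are an assumed label set, and testing every ordered pair of components is a simplification of how candidate pairs might be chosen.

```python
from itertools import permutations
from typing import Callable

def two_step(
    sentences: list[str],
    classify_component: Callable[[str], str],
    classify_relation: Callable[[str, str], bool],
):
    """Step 1 labels each sentence; step 2 tests ordered component pairs for support."""
    # Step 1: multiclass labels, assumed here to be
    # "major_claim", "claim", "premise", or "none".
    labeled = [(s, classify_component(s)) for s in sentences]
    components = [s for s, label in labeled if label != "none"]
    # Step 2: binary support / non-support decision for each (source, target) pair.
    relations = [(src, tgt) for src, tgt in permutations(components, 2)
                 if classify_relation(src, tgt)]
    return labeled, relations
```

Because step 2 only sees what step 1 keeps, component errors propagate into relation identification, which is one motivation for the joint inference they propose.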

Finally, in what may be the most promising line of current argument mining work, Habernal and Gurevych build a classifier for their labeled data using features derived in an unsupervised manner from noisy, unlabeled data. Using text from online debate portals, they derive features by “projecting data from debate portals into a latent argument space using unsupervised word embeddings and clustering” (Habernal and Gurevych 2015).

While this debate portal data contains “noisy texts of questionable quality,” Habernal and Gurevych are able to leverage this large, unlabeled dataset to build a successful classifier for their labeled data using a sentence-level SVM-Hidden Markov Model. To do this, they employ “argument space” features: they compose vectors containing the weighted average of all word embeddings in a phrase, and then project those vectors into a latent vector space. The centroids found by clustering sentences from the debate portal in this way represent “a prototypical argument,” implied by the clustering but not actually observed. Labeled data can then be projected into this latent vector space, and the computed distances to the centroids are encoded as features. To test cross-domain performance, the model was trained on five domains and tested on a sixth.
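The argument space pipeline can be illustrated in plain Python: average the embeddings in each sentence, cluster the unlabeled sentence vectors, and use distances to the centroids as features. This is a toy sketch under several assumptions: embeddings and per-word weights are given as dictionaries, plain k-means stands in for whatever clustering the authors used, and the separate projection into a latent space is skipped. None of the function names come from Habernal and Gurevych.

```python
import math
import random

def phrase_vector(tokens, embeddings, weights):
    """Weighted average of the embeddings of in-vocabulary tokens (default weight 1)."""
    dim = len(next(iter(embeddings.values())))
    total, weight_sum = [0.0] * dim, 0.0
    for t in tokens:
        if t in embeddings:
            w = weights.get(t, 1.0)
            total = [a + w * b for a, b in zip(total, embeddings[t])]
            weight_sum += w
    return [a / weight_sum for a in total] if weight_sum else total

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def argument_space_features(vector, centroids):
    """Distance from one sentence vector to each 'prototypical argument' centroid."""
    return [math.dist(vector, c) for c in centroids]
```

Clustering runs once on the unlabeled portal sentences; each labeled sentence then contributes one distance per centroid as extra features for the supervised classifier.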

While this continues to be a challenging task, the argument space features consistently increased the model’s performance in classifying an argument’s component type. The best classification of claims (F1-score: 0.256) came from combining the argument space features with semantic and discourse features. This compares to a human-baseline F1-score of 0.739 and a random assignment F1-score of 0.167.

Importantly, Habernal and Gurevych ground their approach in argumentative theory. Building on the work of Toulmin, they take each document of their corpora to contain a single argument composed of five components: a claim to be established, premises which give reason for the claim, backing which provides additional information, a rebuttal attacking the claim, and finally a refutation attacking the rebuttal. Each type of component is classified in their vector space, allowing them to assess which elements are more successfully classified as well as to gain insight into which argument structures prove particularly problematic.

Argument Structure

In her 1987 paper “Analyzing the structure of argumentative discourse,” Robin Cohen laid out a theory of argument understanding composed of three core components: coherent structure, linguistic clue interpretation, and evidence relationships. As the title suggests, this post focuses on the first of those elements: argument structure.

Expecting a coherent structure minimizes the computational requirements of argument mining tasks by limiting the possible forms of input. The coherent structure theory parses arguments as a tree of related statements, with every statement providing evidence for some other statement, and one root statement serving as the core claim of the argument. The theory posits that argument structures may vary, but there are a finite number of unique structures, and those structures are discoverable. Cohen herself introduces two such structures: pre-order “where the speaker presents a claim and then states evidence” and post-order, “where the speaker consistently presents evidence and then states the claim” (Cohen 1987).
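Cohen’s two orderings correspond to the familiar pre-order and post-order traversals of the argument tree. A minimal sketch, assuming each node holds one statement and a list of the statements offered as evidence for it; `Node`, `pre_order`, and `post_order` are illustrative names.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    statement: str
    evidence: list["Node"] = field(default_factory=list)  # statements supporting this one

def pre_order(node: Node):
    """Claim first, then the evidence for it (Cohen's pre-order)."""
    yield node.statement
    for child in node.evidence:
        yield from pre_order(child)

def post_order(node: Node):
    """Evidence first, then the claim it supports (Cohen's post-order)."""
    for child in node.evidence:
        yield from post_order(child)
    yield node.statement
```

For a root claim C supported by A and B, where B is in turn supported by B1, pre-order yields C, A, B, B1, while post-order yields A, B1, B, C.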

Argument structure is a particularly notable and challenging element of argument mining. Identifying argument structures is essential for evaluating the quality of an argument (Stab and Gurevych 2014), but it is a difficult task which has gone largely unexplored. A key challenge is the lack of argument delimiters: one argument may span multiple sentences, and multiple premises may be contained in a single sentence. In the resulting segmentation problem, we are able to determine which information belongs to the arguments, but not how this information is split among the different arguments (Mochales and Moens 2011).

To address this challenge, Mochales and Moens have sought to expand models of argument structure, parsing texts “by means of manually derived rules that are grouped into a context-free grammar (CFG)” (Mochales and Moens 2011). Restricting their focus to the legal domain, where arguments are consistently well-formed, Mochales and Moens manually built a context-free grammar in which a document has a tree structure (T) formed by an argument (A) and a decision (D). Further rules specify which elements may form the argument and which may form the decision. By maintaining a tree structure for identified arguments, Mochales and Moens broadened the range of possible argument structures without taking on too much computational complexity.
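The grammar-based approach can be illustrated with a toy recursive-descent parser. Beyond the top-level rule T → A D, the rules here are invented for illustration (an argument is one or more premises followed by a conclusion), and the input is assumed to be a sequence of sentence labels, P, C, and D, produced by some earlier classifier. It is not Mochales and Moens’s grammar.

```python
# Toy grammar over sentence labels: P = premise, C = conclusion, D = decision.
#   T -> A D        (document = argument + decision, as described above)
#   A -> P C | P A  (hypothetical: one or more premises, then a conclusion)

def parse_T(tags: list[str]) -> tuple:
    a, rest = parse_A(tags)
    if rest != ["D"]:
        raise ValueError("expected exactly one decision after the argument")
    return ("T", a, ("D",))

def parse_A(tags: list[str]) -> tuple[tuple, list[str]]:
    if len(tags) >= 2 and tags[0] == "P" and tags[1] == "C":
        return ("A", ("P",), ("C",)), tags[2:]
    if tags and tags[0] == "P":
        sub, rest = parse_A(tags[1:])
        return ("A", ("P",), sub), rest
    raise ValueError("an argument must start with a premise")
```

Because every rule expands into nested tuples, the output is always a tree, which is what keeps parsing tractable compared with arbitrary graphs of support.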

Using this approach, Mochales and Moens were able to obtain 60% accuracy when detecting argument structures, measured manually by comparing the structures given by the CFG and the structures given by human annotators. This is a notable advancement over the simple structures introduced by Cohen, but there is still more work to be done in this area. Specifically, as Mochales and Moens point out, future work includes broadening the corpora studied to include additional types of argumentation structure, developing techniques which can identify and computationally handle structures more complex than trees, and incorporating feedback from those who generate the arguments being parsed. The limitation of this model to legal texts is particularly notable, as “it is likely it will not achieve acceptable accuracy when applied to more general texts in which discourse markers are missing or even misleadingly used (e.g. student texts)” (Stab and Gurevych 2014).


A common theme in community work is questioning what it means to be an expert. 

Given the complex and technical issues our communities face, it seems reasonable, perhaps, to rely on the knowledge of experts. After all, there’s a reason why people undertake years of schooling to become urban planners, architects, or other types of experts.

A prevalent challenge to this model is that it overlooks the knowledge which “average” community members have. An architect may know how to design a building that won’t fall down, but the ‘community’, broadly speaking, knows what aesthetics and functionality are most important and needed. They are experts in their own right.

I was reminded of this debate earlier this week through, surprisingly, an article in Nature about quantum physics work by J. J. W. H. Sørensen et al.:

With particles that can exist in two places at once, the quantum world is often considered to be inherently counterintuitive. Now, a group of scientists has created a video game that follows the laws of quantum mechanics, but at which non-physicist human players excel.

There are a few interesting points here. First, the work is advancing human understanding of quantum physics. Second, the human brain seems to be more capable of understanding quantum physics than we previously thought.

Finally, and germane to the point above, the physicists on the team who designed the game…found it extremely challenging. Being a physicist, or having expertise in physics, didn’t determine someone’s ability to succeed at this quantum game. Gamers, on the other hand, with their own type of expertise, did better than the physicists and the computer models combined.

Argument Mining

In 1987, computer scientist Robin Cohen outlined a theory of argument structure which laid the groundwork for modern argument mining tasks. Taking argument to be a process in which a speaker intentionally tries to convince a hearer, her approach focused on understanding the structure arguments can take.

This structure is generally tree-like: the speaker’s primary claim is the root, and supporting arguments appear as branches. Secondary arguments may further expand the tree, as the speaker makes claims to reinforce a supporting argument. That is, a simple argument can take the form “A and B, therefore C,” or it could take the form “A, therefore B, therefore C.”

In this way, a complex argument can be modeled as a tree, with all the various supporting and secondary arguments pointing back up to the core claim at the root.

The problem that Cohen noted, which has continued to be a challenge in more recent argument mining techniques, is that core premises often go unsaid.

Take, for example, the simple argument structure “P therefore Q.” In many contexts, a speaker will state P and Q but leave out the link between them: that P implies Q. For human interpreters, filling this gap is often a trivial task. Consider the simple argument:

Joey is dangerous.
Joey is a shark.

It is left to the reader to infer that Joey is dangerous because he is a shark…and that all sharks are dangerous. (This, of course, could be debated…)

While there are no doubt instances where this lack of clarity causes confusion for a human reader, in general, this is a challenge which is easy for people with their broad array of contextual knowledge – and terribly difficult for machines.

Joel Katzav and Chris Reed formalize this missing argument (enthymeme) challenge. Defining an argument as “a representation of a fact as conveying some other fact,” a complete argument then has three elements: a conveying fact, the appropriate relation of conveyance, and the conveyed fact.

In parsing content, then, an algorithm could label each sentence (or other unit of text) as either a “non-argument” or one of the argument elements above. This makes the computer’s job a little easier: it only has to recognize pieces of an argument and can flag which arguments are incomplete.

Furthermore, syntactic clues often give both humans and machines some insight into the structure of an implied argument: because X, therefore Y. Annotated debate texts can then help machines learn the relevant syntactic clues, allowing them to better parse arguments.

This is still somewhat unsatisfying, though, as annotating texts is difficult, expensive…and may still be inaccurate. In one study of online debate, Rob Abbott et al. employed 5-7 annotators per post and still found non-trivial disagreement on some measures. Most notably, it seems, people are not very good at recognizing sarcasm.

Furthermore, arguments are not always…formal.

In legal texts or a public debate, it might be reasonable to assume that a given speaker makes the best possible argument as clearly as possible for a general human audience. This assumption cannot be extended to many online forums or to other domains, such as student essays. In colloquial writing, syntactic clues may be missing…or may even be misused.

The latest work in argument mining has focused on overcoming these challenges.

A 2015 paper by Ivan Habernal and Iryna Gurevych, for example, aimed to build an argument mining system that could work across domains, on unlabeled data. An earlier paper by Christian Stab and Iryna Gurevych focused on trying to parse (often poorly formatted) student essays.

By projecting argument elements into a vector space – or argument space – researchers can use unsupervised techniques to cluster arguments and identify argument centroids, which represent “prototypical arguments” not actually observed in the text.

There’s still more work to do, but these recent approaches have been reasonably successful and show a lot of promise.


I wonder if the process of learning is like…sediment on a shore.

That doesn’t sound very glamorous, but it feels appropriate somehow. A wave comes in, carrying all sorts of knowledge – far more than one person could possibly manage. It’s a little overwhelming. You might lose your footing. Or recklessly risk being swept out to sea.

It’s exhilarating.

And then the wave recedes, eclectic flotsam left in its wake.

You gather up what bits you can; painfully little compared to the vast sea before you. And you wait for the next wave to come in; awash with possibilities.

A good class is like a good book: once you finish it, you want to read it again; to rediscover its mysteries anew.

Fitness Landscapes and Probability Distributions

Imagine trying to solve a problem of unknown complexity. You have to start somewhere, so you try a solution more or less at random. If you’re lucky, you know enough about the situation to start with an educated guess.

Regardless of how successful – or unsuccessful – your attempt was, you learn something about the best way to tackle the problem.

Next time you do a little bit better.

Perhaps there are other people around you trying to solve the same or similar problems. You can learn from their efforts as well.

Eventually you converge on what seems like the best possible solution, and then, problem solved, you keep deploying the same solution.

In several disciplines, this process can be described as exploring a fitness landscape. There are optimal solutions, really bad solutions, and everything in between. Some combination of a priori knowledge and learned exploration gives you an intuition of what the fitness landscape looks like.
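The trial-and-improvement loop described above can be sketched as stochastic hill climbing on a toy landscape. This is an illustrative assumption rather than a model from any particular study. The one-peak `fitness` function, the step size, and the iteration count are all made up. The search starts from a poor guess, tries nearby variations, and keeps a variation only when it scores better.

```python
import random


def fitness(x):
    """Hypothetical one-peak landscape: the best solution is x = 3."""
    return -(x - 3.0) ** 2


def hill_climb(fitness, start, step=0.5, iterations=1000, seed=0):
    """Stochastic hill climbing: try nearby variations, keep only improvements."""
    rng = random.Random(seed)
    best_x, best_f = start, fitness(start)
    for _ in range(iterations):
        # Propose a small random change to the current best solution.
        candidate = best_x + rng.uniform(-step, step)
        candidate_f = fitness(candidate)
        # Learn from the attempt: adopt the candidate only if it scores higher.
        if candidate_f > best_f:
            best_x, best_f = candidate, candidate_f
    return best_x, best_f


if __name__ == "__main__":
    # Start from a poor initial guess far from the peak.
    x, f = hill_climb(fitness, start=-10.0)
    print(f"converged near x = {x:.3f} (fitness {f:.5f})")
```

On a rugged landscape with several peaks, this simple rule can settle on a local optimum. That is one reason restarting from new guesses, or learning from other people's attempts, can help.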

Imagine the quick calculations you do in your head when trying to figure out how long it will take you to get somewhere. If you’ve gone there before, you might have a sense of the average length of travel. If you’re familiar with an area’s traffic patterns, you might have a sense of how much traffic to expect, or what routes to avoid. You may also have a sense of whether it would be more socially proper to arrive a little bit late or a little bit early.

You almost effortlessly predict an optimal solution to a complex problem.

There’s a great deal of interesting research being done to understand how individuals and groups explore or exploit these complex landscapes. For simplicity, in what is already a challenging problem, it is common to study problems in which an optimal solution is optimal for everyone.

That is – if every person had perfect knowledge of the fitness landscape, they would each make the same normative judgements about what solutions are “good.”

For my own research interests, this is an important piece of the challenge. One’s definition of “good” or “optimal” is a crucial piece of what policy solutions one might seek – or, more generally, how one might interpret the “fitness landscape.”

If two or more people are exploring the same landscape but have different normative judgements as to what is optimal, this poses a huge challenge.

One solution to this challenge is to hope for the convergence of opinion – so a group may not start with normative agreement on the fitness landscape, but with good deliberation they will come to collective agreement eventually.

There’s a great deal of social science research looking at how consensus forms in groups, with an eye towards possible biases and poorly formed consensus. Does a group agree with the loudest voice in the room? Does it converge on whatever idea was most popular before discussion began? Did it give full attention and weight to all possible alternatives before a final decision was reached?

Yet, on top of all the things that could go wrong in consensus forming, one of the most disconcerting thoughts is that such ideal consensus is not possible at all.

Examining such a question means understanding just what causes a person’s opinion to form in the first place – an understanding, I’m afraid, we are quite far from.

Some opinions may be formed on the spot, with no clear reason why. Other opinions may have been cemented through some past series of experiences.

But here’s a thought experiment – imagine dozens of clones of the same person, each starting their life in an identical setting. With every event they encounter – on whatever time scale you prefer – the effect of that experience on them is given by some probability distribution.

If you were so inclined, you could build your preferred balance of nature and nurture into this probability distribution.

After some series of n events, would the people produced…be different?

If that were the case, even among a group starting with identical initial conditions, it would pose a significant challenge to the idea of consensus. It would ultimately require some method to make sense of overlapping conceptions of a similar fitness landscape.
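The clone thought experiment can be simulated under deliberately simple assumptions. In this hypothetical sketch, each clone's "opinion" is a single number that starts at 0.0. Each event shifts that number by an independent draw from a normal distribution. The number of clones, the number of events, and the distribution itself are illustrative choices, not claims about how opinions actually form.

```python
import random
import statistics


def simulate_clones(num_clones=36, num_events=100, seed=0):
    """Return final 'opinions' for identical clones who all start at 0.0.

    Each event nudges every clone's opinion by an independent draw from a
    normal distribution, a hypothetical stand-in for the effect of experience.
    """
    rng = random.Random(seed)
    opinions = [0.0] * num_clones
    for _ in range(num_events):
        for i in range(num_clones):
            opinions[i] += rng.gauss(0.0, 1.0)
    return opinions


if __name__ == "__main__":
    final = simulate_clones()
    print(f"range after 100 events: {min(final):.2f} to {max(final):.2f}")
    print(f"standard deviation: {statistics.stdev(final):.2f}")
```

Every clone starts identical and faces the same distribution, so any spread in the final values comes entirely from chance. With zero events, all clones remain identical. One way to give "nature" a stronger role might be a term that pulls each opinion back toward a shared baseline.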

Political Parties

Much has been made this election cycle of the influence of the “political establishment.”

On the Democratic side, some are arguing that unpledged delegates – “superdelegates” – are undermining the democratic nature of the system. On the Republican side, at least one candidate has been crying foul over party rules, and there certainly does seem to have been a concerted effort by the Republican establishment to prevent the nomination of the current party frontrunner.

My impression is that most people’s opinions on this topic are driven largely by how a party is treating their favored candidate. A reasonable reaction, I think – as a general rule, things seem fair when you’re winning and unfair when you’re not.

But this debate raises a more broadly interesting question: what should the role of parties be in a democratic society?

While the role of a political party in determining its candidates is arguably less than democratic, there’s simultaneously something laughable about outrage over their influence. That is – this is exactly the way U.S. political parties are supposed to work.

Our political parties are not unbiased voices of the people – they are organizations, designed to advance a given platform.

Again, one may still have qualms with the democratic nature of this system – there’s no democracy in a system where the only choices are Pepsi and Coke – but this is the way our system is designed to work.

And that’s not inherently a bad thing. Representative democracy is more than a practical alternative to pure democracy – there are, in fact, some benefits to a system which (thoughtfully) aggregates the breadth of public views.

It strikes me that, as much as the party infrastructure is decried as unjust, the real problem here is that – in the United States – we are deeply entrenched in a two-party system. No doubt parliamentary systems have their own challenges, but this is a major challenge of the U.S. system.

The problem, that is, isn’t that political parties have too much power over who their nominees are – it’s that a dramatically sparse field of political parties has too much power over our system.