I had the opportunity today to attend a talk by Oren Tsur, a post-doc in my lab who has done a lot of work around Natural Language Processing (NLP). He spoke about his work analyzing political text, which was published by the Association for Computational Linguistics (ACL) earlier this year.
Tsur noted that political writing and speech are intentionally crafted to influence audiences. This provides an interesting framework to explore the question: can we automatically identify and quantitatively measure topical framing and agenda setting campaigns?
That is, using Natural Language Processing techniques, can a computer identify framing, spin, and agenda setting in political speech?
Tsur and his coauthors used a dataset from VoteSmart of “all individual statements and press releases in a span of four years (2010-2013), a total of 134000 statements made by 641 representatives.”
Data sets of that size are what make “unsupervised” analysis so important. It’s not practical for a human to read through and categorize that many statements…but can a computer be taught to do so effectively?
Each document was considered as a “bag of words,” and each word was associated with various topics with different probabilities. Topics might be similar, but were fine-grained enough to pick up subtle differences.
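To make the “bag of words” idea concrete, here is a minimal Python sketch. It is not the model Tsur and his coauthors used: the topic names and the word-to-topic probabilities below are made up for illustration, whereas a real topic model would learn them from the statements themselves.

```python
from collections import Counter
import re


def bag_of_words(text):
    """Lowercase the text and count each word, ignoring word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


# Hypothetical word-to-topic probabilities, invented for this example.
# A real topic model would estimate these from the corpus.
WORD_TOPIC_PROBS = {
    "obamacare": {"health_repeal": 0.9, "social_benefits": 0.1},
    "repeal": {"health_repeal": 0.8, "social_benefits": 0.2},
    "social": {"health_repeal": 0.1, "social_benefits": 0.9},
    "benefits": {"health_repeal": 0.3, "social_benefits": 0.7},
}


def topic_mixture(text):
    """Estimate a document's topic shares from its word counts."""
    totals = Counter()
    for word, count in bag_of_words(text).items():
        # Words missing from the table contribute nothing.
        for topic, prob in WORD_TOPIC_PROBS.get(word, {}).items():
            totals[topic] += count * prob
    norm = sum(totals.values())
    # Normalize so the shares sum to 1; empty if no known words appear.
    return {topic: weight / norm for topic, weight in totals.items()} if norm else {}


print(topic_mixture("Repeal Obamacare now"))
```

The point is only that word order is thrown away and each word nudges the document toward one topic or another with some probability.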
One topic caught words like “Obamacare” and “repeal” while another caught words like “social” and “benefits.” And, yes, you can then connect each topic to who is using it to determine which of those topics is “owned” by Republicans and which is “owned” by Democrats.
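One simple way to think about topic “ownership” is to count, for each topic, which party produced most of the statements assigned to it. The sketch below assumes each statement has already been labeled with a party and a single topic; the labels and the `topic_ownership` helper are hypothetical, and the paper's actual measure may well differ.

```python
from collections import Counter, defaultdict


def topic_ownership(statements):
    """Map each topic to (leading party, that party's share of statements).

    `statements` is an iterable of (party, topic) pairs.
    """
    counts = defaultdict(Counter)
    for party, topic in statements:
        counts[topic][party] += 1

    ownership = {}
    for topic, by_party in counts.items():
        party, n = by_party.most_common(1)[0]
        ownership[topic] = (party, n / sum(by_party.values()))
    return ownership


# Toy data, invented for illustration.
statements = [
    ("R", "health_repeal"),
    ("R", "health_repeal"),
    ("D", "health_repeal"),
    ("D", "social_benefits"),
]
print(topic_ownership(statements))
```

Raw counts like these would favor whichever party issues more statements overall, so a more careful version would normalize by each party's total output.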
Furthermore, Tsur could compare how frequently the same words or phrases (n-grams) appeared in different documents, demonstrating that Republicans tend to be much more “on message.” That is, at any given time, Republican politicians are more likely to have phrases in common with each other, perhaps because they are sticking to the same talking points.
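For a rough sense of how shared phrasing could be measured, the sketch below breaks each statement into word n-grams and averages the Jaccard overlap across every pair of statements in a group. Jaccard similarity is my own choice here, not necessarily the measure used in the paper, and the example sentences are invented.

```python
from itertools import combinations
import re


def ngrams(text, n=2):
    """Return the set of word n-grams in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Shared items divided by all items across the two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_overlap(docs, n=2):
    """Average n-gram overlap across all pairs of documents in a group."""
    sets = [ngrams(doc, n) for doc in docs]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Invented statements: the first group repeats a talking point.
on_message = ["we must repeal the law", "it is time to repeal the law"]
off_message = ["we must repeal the law", "protect social benefits for seniors"]
print(mean_pairwise_overlap(on_message), mean_pairwise_overlap(off_message))
```

A group whose members keep reusing the same phrases scores higher, which is the intuition behind calling one party more “on message.”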