Mapping blog posts

The last few months I’ve been doing a lot of network mapping. Students in my class created network maps of their morals, and I’ve played around with mapping themes from Hamlet’s soliloquies.

Last summer I created my own moral network map by responding to prompts like “What abstract moral principles seem compelling to you?” and “What personal virtues do you strive to develop?”

Creating my own map, and indeed creating one for the tragic Dane, was an interesting, reflective process, but somewhat unsatisfying in how unscientific it felt.

It was too arbitrary, too driven by my own whims and biases. And while perhaps a map of my own morals should be driven by my own whims and biases, I can’t help but think there’s a better way to capture this information.

A more proper methodology would be to conduct interviews, develop a coding scheme, and capture which ideas came up in which contexts, noting not only the ideas themselves but also which ideas are connected to each other. David Williamson Shaffer takes this approach in much of his work on the development of epistemic frames, that is, specific professional ways of thinking.

Well, that sounds great, but it’s way too much work for a weekend. Also, I can’t exactly conduct a focus group interview with myself, now can I?

But as I thought about this more, I realized that I do have rich data on my ways of thinking, captured in nearly a year’s worth of blog posts.

Text analysis can be a rich and complex process, but, curious to see what I could come up with in a short Sunday afternoon, as a first step I intentionally kept it overly simple. Looking at my first two weeks of blog posts (10 posts), I relied upon word counts to extract key themes:

[Network map image: 071913_v2]

The above network was created with the following rules:

  • A word is recorded if it’s used four or more times in a single post. Four was an arbitrary cutoff based on the distribution of word counts; I would have had a lot more words if I’d included those used three or fewer times. Interestingly, one post has no impact on this map as a result of that cutoff.
  • “Common words” are removed from the count. The word count software had a filter for this, but I ultimately elected to also remove words like “really” and “truly” as well as various forms of “you,” “your,” and “you’re.” After some debate, I elected to keep the word “what.”
  • Words/nodes are sized by frequency – the more times I used a word, the larger it appears.
  • Each word is connected to every other word taken from the same blog post. If a word appears in two blog posts, it is connected to both clusters of words.
  • Colors were generated by an analysis of clusters within the network – groups of words that are highly connected to each other.
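
For the curious, the rules above can be sketched in a few lines of Python. This is a rough illustration rather than what I actually ran: the stop-word list, the tokenizing pattern, the function names, and the sample posts are all made up for the example, and the cluster coloring step is left out entirely.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical stop-word list; the real filter came from the word count software.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "it", "i",
              "really", "truly", "you", "your", "you're"}
MIN_COUNT = 4  # a word must appear at least this many times in a single post


def key_words(text, min_count=MIN_COUNT, stop_words=STOP_WORDS):
    """Return {word: count} for non-common words used min_count or more times."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return {w: c for w, c in counts.items() if c >= min_count}


def build_network(posts):
    """Build the word network from a {title: text} mapping.

    Returns (node_sizes, edges). node_sizes adds up each word's counts from
    the posts where it passed the cutoff; edges holds one unordered pair for
    every two words kept from the same post.
    """
    node_sizes = Counter()
    edges = set()
    for text in posts.values():
        kept = key_words(text)
        node_sizes.update(kept)  # Counter.update adds the counts together
        edges.update(frozenset(p) for p in combinations(sorted(kept), 2))
    return node_sizes, edges


# Made-up sample posts, only to show the shape of the result.
posts = {
    "post_a": "justice " * 4 + "power " * 4 + "you " * 4,
    "post_b": "power " * 4 + "voice " * 5,
    "post_c": "hello world",  # nothing reaches the cutoff, so no impact
}
sizes, edges = build_network(posts)
```

In this toy run “power” clears the cutoff in two posts, so it ends up as the largest node and bridges the two clusters, while the third post drops out of the map entirely, just like the one post mentioned above.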

It’s particularly notable that this blog-generated map includes many independent clusters of various sizes, including two nodes that stand alone. This is very different from the map I generated through self-reflection, which was highly interconnected.

However, the challenges of the word count approach can also be seen. For example, my post “Petty Bourgeois Radicals and the Freerider Problem” was inspired by reading Roberto Unger. But with only one mention of Unger in that post, the cluster it generated stands alone and isn’t connected to the post which explicitly references Unger multiple times.

Elinor Ostrom, who was also mentioned once, doesn’t appear at all.

But, problems aside, this is an interesting and relatively quick approach for network mapping. And here’s the coolest bit: by connecting a blog post’s date to the words from that post, you can create a time-lapse animation of the network’s evolution. (Try a different browser if the animation isn’t working…)
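
The idea behind the time-lapse can be sketched the same way: sort the posts by date and let the network accumulate one post at a time, so each step becomes a frame. Again, this is only an illustration under assumptions. The input shape (a list of date and key-word-set pairs), the function name, and the dates and words below are hypothetical, and the actual rendering of the animation is not shown.

```python
from datetime import date
from itertools import combinations


def snapshots(dated_posts):
    """Yield (day, nodes, edges) frames, each covering all posts up to that day.

    dated_posts is a list of (date, set_of_key_words) pairs (assumed shape).
    """
    nodes, edges = set(), set()
    for day, words in sorted(dated_posts, key=lambda post: post[0]):
        nodes |= words
        edges |= {frozenset(p) for p in combinations(sorted(words), 2)}
        # Copy the sets so earlier frames are not changed by later posts.
        yield day, set(nodes), set(edges)


# Made-up dates and words, deliberately listed out of order.
frames = list(snapshots([
    (date(2000, 1, 2), {"power", "voice"}),
    (date(2000, 1, 1), {"justice", "power"}),
]))
```

Each frame is a complete picture of the network as of that post, so drawing the frames in sequence gives the growing map.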

