Languages which are still being spoken are generally referred to as living languages. The metaphor is apt: languages are “living” not only insofar as their speakers are biologically living, but in that the language itself grows and changes over time. In a genuinely meaningful sense of the word, the language is alive.
This is a beautiful metaphor, but problematic for text analysis. It is, after all, difficult to model something which is changing while you observe it.
Language drift can be particularly problematic for digital humanities projects with corpora spanning a century or more. As Ben Schmidt has pointed out, topic models trained on such corpora produce topics which are not stable over time: a single topic can represent different or drifting concepts during different windows of time.
But the changes of a language are not restricted to such vast time scales. On social media and other online platforms, words and meanings come and go, sometimes quite rapidly. Indeed, there’s no a priori reason to think such rapid change isn’t a feature of all everyday language; it is simply better documented through digital records.
This raises interesting questions and problems for scholars doing text analysis – at what time scales do you need to worry about language change? What does language change indicate for an individual or for a society?
One particularly interesting paper which tackles some of these questions is Danescu-Niculescu-Mizil et al.’s “No country for old members: User lifecycle and linguistic change in online communities.”
Studying users of two online beer discussion forums, they find, remarkably, that users have a consistent life cycle. New users adopt the language of the community, getting closer and closer to its linguistic norms. At a certain point, however, their similarity peaks: users cease changing with the community and, as a result, drift further and further from it linguistically.
The language of the community continues changing, but the language of these “older” users does not.
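One rough way to make “distance from the community’s language” concrete is to score each post against a model built from what the community was writing at the time. The sketch below is only an illustration of that idea, not the paper’s method: it swaps in a simple add-one-smoothed unigram model, and the function names and toy posts are invented for the example. A lower cross-entropy means the post looks more like the community snapshot.

```python
import math
from collections import Counter


def unigram_model(posts):
    """Count word frequencies across a snapshot of community posts."""
    counts = Counter(word for post in posts for word in post.lower().split())
    return counts, sum(counts.values())


def cross_entropy(post, model):
    """Average negative log2-probability per word of `post` under `model`.

    Uses add-one smoothing, reserving one extra vocabulary slot
    for words the snapshot has never seen.
    """
    counts, total = model
    vocab_size = len(counts) + 1
    words = post.lower().split()
    if not words:
        raise ValueError("post must contain at least one word")
    log_prob = sum(
        math.log2((counts[w] + 1) / (total + vocab_size)) for w in words
    )
    return -log_prob / len(words)


# Toy data, invented for illustration only.
snapshot = ["hoppy aroma with a citrus finish", "citrus aroma and a dry finish"]
model = unigram_model(snapshot)

in_step = cross_entropy("citrus aroma dry finish", model)
out_of_step = cross_entropy("smells kinda fruity overall", model)
```

Here `in_step` comes out lower than `out_of_step`, since the first post reuses the snapshot’s vocabulary. Tracking this score for one user’s posts, each measured against the snapshot from the same period, would trace out the kind of life cycle described above.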
This finding is reminiscent of earlier studies on organizational learning, such as those by James March, in which employees learn from an organization while the organization simultaneously learns from its employees. In his simulations, organizations in which people learn too quickly fail to converge on optimal information. Organizations in which people learn more slowly, or in which employees come and go, ultimately converge on better solutions.
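The mutual-learning dynamic can be sketched as a toy simulation. This is a heavily simplified illustration loosely modeled on the setup described above, not a reproduction of March’s model: the update rules, the parameter names (`p_learn`, `p_code`) and the default values are all assumptions made for the example, and employee turnover is left out.

```python
import random


def simulate(p_learn, p_code=0.5, n_people=50, n_dims=30, periods=100, seed=0):
    """Return the fraction of 'reality' the organization gets right at the end.

    p_learn: chance an individual adopts the organization's belief on a dimension.
    p_code:  chance the organization adopts its best members' majority belief.
    """
    rng = random.Random(seed)
    reality = [rng.choice((-1, 1)) for _ in range(n_dims)]
    code = [0] * n_dims  # the organization starts with no beliefs
    people = [
        [rng.choice((-1, 0, 1)) for _ in range(n_dims)] for _ in range(n_people)
    ]

    def score(beliefs):
        # Number of dimensions on which the beliefs match reality.
        return sum(b == r for b, r in zip(beliefs, reality))

    for _ in range(periods):
        code_score = score(code)
        # Snapshot the members who currently know more than the organization.
        experts = [list(p) for p in people if score(p) > code_score]

        # Individuals learn from the organization.
        for person in people:
            for d in range(n_dims):
                if code[d] != 0 and person[d] != code[d] and rng.random() < p_learn:
                    person[d] = code[d]

        # The organization learns from the experts' majority view.
        for d in range(n_dims):
            vote = sum(p[d] for p in experts)
            if vote != 0 and rng.random() < p_code:
                code[d] = 1 if vote > 0 else -1

    return score(code) / n_dims


slow = simulate(p_learn=0.1)
fast = simulate(p_learn=0.9)
```

Comparing `slow` and `fast` across many seeds is one way to probe whether slower individual learning leaves the organization with more accurate beliefs. A single run is noisy, so no particular outcome should be read into one pair of numbers.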
Both of these findings reflect the sociolinguistic theory of adult language stability: the idea that your learning, and specifically your language, stays steady after a certain age. The findings from Danescu-Niculescu-Mizil et al., however, suggest something more interesting: your language becomes stable over time within a given community. It’s not clear that your overall language will stabilize; rather, you learn the norms of a given community. Since these communities may change over time, your overall language may still be quite dynamic.