This weekend, I had the opportunity to attend a rich discussion hosted by The Welcome Project with local author Jennifer De Leon. The conversation focused on De Leon’s 2013 short story The White Space.
While helping her father put together his first résumé, the U.S.-born De Leon writes:
Without cell phone or fax numbers, email or website addresses, the top of the page looks lonely. Where do I write that my father grew up along the southern coast of Guatemala, where his father worked for the U.S.-owned United Fruit Company (UFC), which helped kick Communism to the world curb while pretending to care about Guatemalan citizens’ intake of bananas? They were only interested in profits and maintaining a capitalist economy.
…On my own résumés over the last ten years, phrases like terminal degree, academic honors, and double major are arranged neatly under the canopy of this section. But I can’t use any of these terms here. My father was denied the opportunity to complete secondary school in Guatemala because he needed to help support his brothers and sisters. Instead he plucked feathers off dead chickens in a small factory in Guatemala City from the time he was 14 years old.
…So tonight, as I help my father write his first résumé, I struggle to find words to fill this white space.
There is much in De Leon’s story which would resonate with any adult child: that feeling that you don’t really know your parents the way you might know a friend; that there is something intangibly distant about their experiences; that they lived in and were shaped by a world which ceased to exist before you were born; that the rich texture of their experience will always be beyond your grasp.
There is much in her story which would resonate with any first-generation college student: feeling that vast void which palpably disconnects generational experience; realizing the values and norms you so blithely take for granted can seem foreign and obscure; coming to the inescapable conclusion that those same norms glibly dismiss the experiences of people whom you know to have real value.
And, as De Leon and others discussed this weekend, there is much in her story which resonates broadly with children of immigrants: feeling the generational and cultural divide even more sharply; feeling ashamed at your lack of fluency in your parents’ language; feeling like you’re torn between selves, between worlds, between identities.
Feeling like nothing you can do will ever make up for the sacrifice your parents made on your behalf.
In reflecting on all these interwoven, sweet and painful complications, De Leon concluded:
“Like most beautiful things in life, it’s not so simple. I just do my best.”
Ally Lee Steinfeld had been missing since early September. Her body was found recently, mutilated and burned. She was 17.
Her death made Steinfeld at least the 21st transgender person killed in the United States this year. A record high of 22 such murders was recorded by the Human Rights Campaign last year.
We have to do better.
Steinfeld’s case is not being pursued as a hate crime. The sheriff overseeing the case told the Associated Press: “You don’t kill someone if you don’t have hate in your heart. But no, it’s not a hate crime.” That talking point was echoed by the prosecutor in the case, who told Time: “I would say murder in the first-degree is all that matters. That is a hate crime in itself.”
Perhaps this is accurate in a practical sense – in Missouri, where the crime took place, first-degree murder is punishable by execution or life imprisonment. A hate crime charge would be unlikely to add to the penalty.
Such comments, however, miss the point. A woman is dead. We have to do better.
Some advocates have even started to question whether hate crimes prosecution is an effective strategy. As one ACLU lawyer put it, “I worry that what hate crime laws do is narrow our focus on certain types of individual violence while absolving the entire system that generates the violence.”
And that’s the thing – it is a problem with the entire system. We are all culpable in perpetuating the gross transphobia of our society – through violent transphobic acts, through subtle jokes and misgendering, or by being complicit through silence while such hateful acts take place.
We have to do better.
Personally, I’m not prepared to abandon hate crime legislation just yet – whether adding to a punishment or not, ignoring the hate of a crime seems to implicitly indicate that while the crime may be punishable, the hate itself is sanctioned. But I’ve met a lot of good, smart lawyers who tell me that sometimes you have to sacrifice framing in the legal system – you go for the toughest penalty you can go for.
I do not know whether we can best accomplish our work through hate crime legislation or through other modes of advocacy. I only know that we have to do better.
We tell young women that they can be anything, that they can do anything. That they should shut down the haters and embrace their true selves. We tell women that it is their right in the 21st century to be the person they want to be. We tell them this is America. We tell them they are free.
Three months before she died, Steinfeld posted to Instagram: “I am proud to be me I am proud to be trans I am beautiful I don’t care what people think.”
Languages which are still being spoken are generally referred to as living languages. The metaphor is apt – a language is “living” not only insofar as its speakers are biologically living, but in that the language itself grows and changes throughout time. In a genuinely meaningful sense of the word, the language is alive.
This is a beautiful metaphor, but problematic for text analysis. It is, after all, difficult to model something which is changing while you observe it.
Language drift can be particularly problematic for digital humanities projects with corpora spanning a century or more. As Ben Schmidt has pointed out, topic models trained on such corpora produce topics which are not stable over time – e.g. a single topic represents different or drifting concepts during different windows of time.
But the changes of a language are not restricted to such vast time scales. On social media and other online platforms, words and meanings come and go, sometimes quite rapidly. Indeed, there’s no a priori reason to think such rapid change isn’t a feature of all everyday language – it is simply better documented through digital records.
This raises interesting questions and problems for scholars doing text analysis – at what time scales do you need to worry about language change? What does language change indicate for an individual or for a society?
Studying users of two online beer discussion forums, Danescu-Niculescu-Mizil and colleagues find, remarkably, that users have a consistent life cycle – new users adopt the language of the community, getting closer and closer to linguistic norms. At a certain point, however, their similarity peaks – users cease changing with the community and move further and further away linguistically as a result.
The language of the community continues changing, but the language of these “older” users does not.
This finding is reminiscent of earlier studies on organizational learning, such as those by James March – in which employees learn from an organization while the organization simultaneously learns from the employees. In his simulations, organizations in which people learn too quickly fail to converge on optimal information. Organizations in which people learn more slowly – or in which employees come and go – ultimately converge on better solutions.
Both these findings reflect the sociolinguistic theory of adult language stability – the idea that your learning, and specifically your language, stays steady after a certain age. The findings from Danescu-Niculescu-Mizil, however, suggest something more interesting: your language becomes stable over time in a given community. It’s not clear that your overall language will stabilize; rather, you learn the norms of a given community. Since these communities may change over time, your overall language may still be quite dynamic.
I made the mistake of going outside today, so now all I can think about is how incredibly hot it is. For people who bask in warm weather, I suppose, it is not too miserable – but, for me, upper 80s at the end of September is more than I would hope for.
Mid-60s would do just fine.
If you’re wondering, the average high for Boston in September is a reasonable 73 degrees Fahrenheit. The record high, however, is a discomforting 102, achieved in 1881.
I was curious to learn more about that heat wave – hoping, perhaps, for some eloquently antiquated newspaper articles on the subject.
Instead, I found something much more interesting. The record 102 temperature was reached on September 7, 1881 – the day after the “Yellow Day,” when a “saffron curtain” mysteriously blanketed the New England states.
It was eventually traced back to the great Thumb Fire of Michigan, one of the most devastating fires in that state’s history, burning over a million acres, but at the time, no one had any idea what was going on.
Yesterday Boston was shrouded, and nature’s gloom soon infusing itself into the hearts of all made it a day long to be remembered, reminding one vividly of the famous dark day of years ago. About 7 O’Clock in the morning the golden pall shrouded the city in its embrace, and the weird unreal appearance continued throughout the day. As one approached a doorway from within and glanced out upon the sidewalk and street, it was difficult to dispel the illusion that an extensive conflagration was raging near, and that it was the yellow, gleaming light from the burning houses that produced the singular effect. Stepping to the sidewalk and glancing upward the roofs of the houses cut sharp and clear against the depths beyond.
The air became still, and calm, during that Tuesday, and people remarked about the odd tinge that colors took on as the day wore on. Plants were particularly brilliant – the odd light sharpening their green and blue hues. Lawns, usually a mundane green, took on brilliant color, and looked oddly bluish, in the day’s strange light. Yellow objects appeared colorless and white, and the color in red objects popped, while blue objects became ghostly. People in the street looked sickly and yellowish. Overhead, birds flew low in the skies.
The event was particularly startling because professed Prophetess Mother Shipton had reportedly predicted some two centuries before:
The world to an end shall come, In eighteen hundred and eighty one.
As far as I can tell, however, the world did not actually come to an end that day.
Reading articles skeptical of the veracity of topic model outputs has reminded me of this passage from Wittgenstein’s Philosophical Investigations:
Our language can be seen as an ancient city: a maze of little streets and squares, of old and new houses, and of houses with additions from various periods; and this surrounded by a multitude of new boroughs with straight regular streets and uniform houses.
In short: words are complicated. Their meaning and use shift over time, building a complex infrastructure which can be difficult to interpret. Indeed, humanists can spend a whole career examining and arguing over the implications of words.
In theory, topic models can provide a solution to this complication: if a “topic” accurately represents a “concept,” then it dramatically reduces the dimensionality of a set of documents, eliciting the core concepts while moving beyond the complication of words.
Of course, topics are also complicated. As Ben Schmidt argues in Words Alone: Dismantling Topic Models in the Humanities, topics are even more complicated – words, at least, are complicated in an understood and accessible way. Topic models, on the other hand, are abstract and potentially inaccessible to people without the requisite technical knowledge.
To really understand a topic returned by a topic model, it is not enough to look at the top N words – a common practice for evaluating and presenting topics – you need to look at the full distribution.
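To make the point concrete, here is a minimal sketch of how little of a topic a top-N list can cover. The topic below is not the output of any real model: it is a toy distribution with Zipf-like weights over a hypothetical 1,000-word vocabulary, and the function name top_n_mass is made up for this illustration.

```python
def top_n_mass(topic, n):
    """Fraction of a topic's probability mass carried by its n most probable words."""
    ranked = sorted(topic.values(), reverse=True)
    return sum(ranked[:n]) / sum(ranked)

# Toy topic (an assumption, not real model output): Zipf-like weights
# over a hypothetical 1,000-word vocabulary, normalized to sum to 1.
weights = {f"word{i}": 1.0 / i for i in range(1, 1001)}
total = sum(weights.values())
topic = {word: weight / total for word, weight in weights.items()}

# Share of the topic that a "top 10 words" summary actually shows.
mass_top10 = top_n_mass(topic, 10)
print(f"Top 10 words cover {mass_top10:.0%} of the topic")
```

In this toy case the ten most probable words account for a bit under 40% of the topic, so a top-10 list leaves most of the distribution unexamined. Real topics may be more or less concentrated than this.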
But what does it even look like to examine the distribution of words returned by a topic model? The question itself defies understanding.
While “words” are generally complicated, Schmidt finds a clever opportunity to examine a distribution of “words” using ships’ logs. Each text contains the voyage of a single ship and each “word” is given as a single longitude and latitude. The “words” returned by the topic model can then be plotted precisely in 2D space.
With these visualizations of topic distributions, Schmidt raises important questions about the coherence and stability which topic models assume.
He doesn’t advocate entirely against topic models, but he does warn humanists to be wary. And, importantly, he puts forth a call for new methods to bring the words back to topic models – to find ways to visualize and understand entire distributions of words rather than simply truncating topics to lists of top words.
Yesterday, President Trump issued his third travel ban. As you may recall, the previous Executive Order on this topic called for the “assessment of current screening and vetting procedures.” While the ban itself was suspended by numerous legal challenges, apparently the information gathering work was in fact completed.
The new travel ban affects nationals of 8 countries – Chad, Iran, Libya, Syria, Venezuela, Yemen, Somalia, and North Korea. Sudan was removed from the previous travel ban list, while Venezuela and North Korea were added. Six of the affected countries have majority-Muslim populations.
The new ban will remain in effect indefinitely.
Experts indicate that the new ban will be harder to challenge in court. It is more polished, more precise, and more removed from President Trump’s numerous anti-Muslim campaign comments. It ameliorates some of the most egregious problems with the initial, January 27 ban: there will be a several-week delay before the new ban goes into effect, people who currently hold valid visas will not be affected by the new ban, and restrictions vary slightly by country, allowing, for example, Iranians with valid student visas to enter the country.
In short, this is what a politically savvy travel ban would have looked like in the first place. It has been thoroughly considered and vetted; carefully dressed up to give the impression of a relatively reasonable piece of U.S. policy.
But make no mistake: this travel ban still represents a grave overreach based in fear and racism. It is still unacceptable.
I have attended several travel ban protests in the last nine months and it looks as though in the near future I’ll be attending more.
And while attending those protests, I suppose I’ll be remembering Machiavelli’s advice to his beloved prince: If you’re going to do something terrible, start by doing something as terrible as possible. Then, when you benevolently scale back to something slightly less terrible, the people will appreciate your reasonableness and moderation.
Both gender and language are social constructs, and sociological research indicates a link between the two.
In Lakoff’s classic 1973 paper, Language and woman’s place, she argues that “the marginality and powerlessness of women is reflected in both the ways women are expected to speak, and the ways in which women are spoken of.” This socialization process achieves its end in two ways: teaching women the ‘proper’ way to speak while simultaneously marginalizing the voices of women who refuse to follow the linguistic norms dictated by society. As Lakoff writes:
So a girl is damned if she does, damned if she doesn’t. If she refuses to talk like a lady, she is ridiculed and subjected to criticism as unfeminine; if she does learn, she is ridiculed as unable to think clearly, unable to take part in a serious discussion: in some sense, as less than fully human. These two choices which a woman has – to be less than a woman or less than person – are highly painful.
Lakoff finds numerous lexical and syntactic differences between the speech of men and women. Women tend to use softer, more ‘polite’ language and are more likely to hedge or otherwise express uncertainty within their comments. While she acknowledges that – as of the early 70s – these distinctions have begun to blur, Lakoff also notes that the blurring comes almost entirely in the direction of “women speaking more like men.” That is, language is still gendered, but acceptable language has grown in breadth for women, while ‘male’ language remains narrow and continues to be taken as the norm.
A more recent study by Sarawgi et al looks more closely at algorithmic approaches to identifying gender. They present a comparative study using both blog posts and scientific papers, examining techniques which learn syntactic structure (using a context-free grammar), lexico-syntactic patterns (using n-grams), and morphological patterns (using character-level n-grams).
Sarawgi et al further argue that previous studies made the gender-identification task easier by neglecting to account for possible topic bias, and they therefore carefully curate a dataset of topic-balanced corpora. Additionally, their model allows for any gamma number of genders, but the authors reasonably restrict this initial analysis to the simpler binary classification task, selecting only authors who fit a woman/man gender dichotomy.
Lakoff’s work suggests that there will be lexical and syntactic differences by gender, but surprisingly, Sarawgi et al find that the character-level n-gram model outperformed the other approaches.
This, along with the fact that the finding holds in both formal and informal writing, seems to suggest that gender-socialized language may be more subtle and profound than previously thought. It is not just about word choice or sentence structure, it is more deeply about the very sounds and rhythm of speech.
The character n-gram approach used by Sarawgi et al is taken from an earlier paper by Peng et al which uses character n-grams for the more specific task of author attribution. They test their model on English, Greek, and Chinese corpora, achieving impressive accuracy on each. For the English corpus, they are able to correctly identify the author of a text 98% of the time, using a 6-gram character model.
Peng et al make an interesting case for the value of character n-grams over word n-grams, writing:
The benefits of the character-level model in the context of author attribution are that it avoids the need for explicit word segmentation in the case of Asian languages, it captures important morphological properties of an author’s writing, it can still discover useful inter-word and inter-phrase features, and it greatly reduces the sparse data problems associated with large vocabulary models.
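For readers who haven’t worked with them, here is a minimal sketch of what character n-gram features look like. This is only the feature-counting step, not the language model Peng et al build on top of it; the function name char_ngrams and the sample sentence are made up for illustration.

```python
from collections import Counter

def char_ngrams(text, n):
    """Count every overlapping run of n characters, spaces included."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

# A toy sentence; no tokenizer or word list is needed.
counts = char_ngrams("the cat sat on the mat", 3)

print(counts["the"])  # a whole short word
print(counts["at "])  # a word ending plus the following space
print(counts["t s"])  # a trigram spanning the boundary between two words
```

Because the window slides over raw characters, the same function works on text with no spaces at all, and features like “t s” capture a little of what happens between words without any explicit segmentation.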
While I initially found it surprising that a character-level n-gram approach would perform best at the task of gender classification, the Peng et al paper seems to shed computational light on this question – though the area is still undertheorized. If character n-grams are able to so accurately identify the single author of a document, and that author has a gender, it seems reasonable that this approach would be able to infer the gender of an author.
Still, the effectiveness of character n-grams in identifying an author’s gender indicates an interesting depth to the gendered patterns of language. Even as acceptable language for women converges to the acceptable language of men, the subtleties of style and usage remain almost subconsciously gendered – even in formal writing.
Last week, an altercation related to a “What is Gender?” event occurred in Speaker’s Corner – “a traditional site for public speeches and debates” in London.
The event was organized by a group of self-identified “gender-critical feminists” – essentially, women who don’t believe that all women deserve equal rights.
As you might imagine, in the face of such an event a group of protestors showed up to demonstrate in favor of the opposite: all women deserve to be treated with dignity and respect.
From there, details begin to get fuzzy, but it appears that a woman from the first group – the “gender-critical feminists” – began harassing and attacking a woman from the second group – those supporting equality. The attacker was eventually pulled off the victim, getting clocked in the face in the process.
Afterwards, pictures of the attacker’s bruised face began to circulate online, along with a questionable story. The woman – who can be seen in a video to be shaking another woman like a rag doll until a third woman intervenes – claimed that she was the real victim; the other women attacked her.
Except, she didn’t say women.
“Gender-critical feminist” is a palatable label adopted by women more colloquially known as TERFs – Trans-Exclusionary Radical Feminists. They are fervently passionate self-identified feminists whose feminism does not have space for all women.
In short, the attacker, having incited violence with seeming intention, proceeded to misgender her victims and successfully paint herself in popular media as just a normal old woman who was wrongly attacked while attempting to mind her own business.
This narrative is exceedingly dangerous.
Taken by itself, the event is unfortunate. Indeed, any time a person is attacked in the street is cause for concern.
But the narrative that emerged from this incident plays dangerously into broader misconceptions and stereotypes. It reinforces the idea that some women are inherently dangerous and that other women would be wise to distance themselves; it tacitly assumes that only some women are ‘truly’ women in some mystically vague sense of the word, while other women are not; and it erases and attempts to overlay the experience of women for whom these first two statements ring so obviously false.
It is gaslighting on a social scale.
Consider the account described in a statement by Action For Trans Health London, one of the organizations leading the demonstration against the TERFs:
Throughout the action, individuals there to support the ‘What is Gender?’ event non-consensually filmed and photographed the activists opposing the event. Often photos and videos taken by transphobes are posted online with the intention of inciting violence and harassment against trans activists. Due to this clear and documented history of transphobes violently ‘outing’ individuals of trans experience, visibility can be a high risk to trans individual’s personal safety.
During the action, a transphobe approached activists whilst filming with their camera. An individual then attempted to block their face from the lens of camera, leading to a scuffle between both individuals. This altercation was quickly and efficiently broken up by activists from both sides.
Action for Trans Health London later shared personal accounts from women who were assaulted by TERFs during the events of that evening.
Activists had good reason to be concerned for their safety.
Yet the stories emerging from that night don’t talk about the women who were assaulted. They don’t talk about the valid fear these women experienced when someone got up in their face with a camera. They don’t talk about the pattern of violence and harassment these women face while just trying to lead their normal lives.
In fact, the stories do worse than ignore the incident altogether. They blare the headline that a woman was hit during the altercation while reserving the full sense of ‘woman’ for the perpetrator, implicitly directing compassion to the person who did the attacking.
If you’re not familiar with the term, gaslighting is “a form of manipulation that seeks to sow seeds of doubt in a targeted individual or members of a group, hoping to make targets question their own memory, perception, and sanity.”
If you have never experienced gaslighting, be glad. If you have experienced gaslighting, you know that it is one of the worst possible sensations. You lose the ability to trust yourself, to trust your own instincts and senses. You lose the ability to know what is real due to the unwavering insistence of those around you that your reality is false.
And make no mistake, the dominant narrative emerging from the incident at Speaker’s Corner is a sophisticated form of gaslighting.
It is gaslighting when an attacker is allowed to mischaracterize their victims, it is gaslighting when the injuries suffered by an attacker are treated as more concerning than the injuries they inflicted, and it is gaslighting to pretend that people who have been systematically and zealously victimized are somehow the real perpetrators deserving of our scorn.
The sad truth is that there is an epidemic of violence against trans people. In the United States alone, at least 20 transgender people have been violently killed so far in 2017. Seven were murdered within the first six weeks of the year. Almost all were transgender women of color.
We cannot pretend that this violence isn’t occurring, and we cannot stay silent in the face of false narratives which wrongfully defame and mischaracterize an entire population of women.
I don’t know how to say it more plainly than that. To deny the rights of all women, to deny the existence of all women, and to deny the richly varied experiences of all women is simply unconscionable. You cannot do those things and call yourself a feminist.
I am not much of anyone, and it is always daunting to wonder what one small person can do in the face of terrible, complex, and systemic problems. I endeavor to do more, but literally the least I can do is to say this:
To all my transgender sisters: I see you. I believe you. And I will never, ever, stop fighting for you. I will not be silent.
Text processing algorithms are notoriously bad at processing humor. The subtle, contradictory humor of irony and sarcasm can be particularly hard to automatically detect.
If, for example, I wrote, “Sharknado 2 is my favorite movie,” an algorithm would most likely take that statement at face value. It would find the word “favorite” to be highly correlated with positive sentiment. Along with some simple parsing, it might then reasonably infer that I was making a positive statement about an entity of type “movie” named “Sharknado 2.”
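A face-value reading of this kind can be sketched in a few lines. This is a deliberately naive illustration, not how any particular sentiment library works: the tiny lexicon and the function name naive_sentiment are made up for the example.

```python
import re

# Hypothetical, hand-picked lexicon (an assumption for illustration):
# +1 for words assumed positive, -1 for words assumed negative.
LEXICON = {
    "favorite": 1, "great": 1, "love": 1,
    "terrible": -1, "awful": -1, "bad": -1,
}

def naive_sentiment(text):
    """Sum the lexicon scores of the words in text, ignoring context entirely."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

score = naive_sentiment("Sharknado 2 is my favorite movie")
print(score)  # positive: the scorer only sees the word "favorite"
```

Nothing in the scorer knows what Sharknado 2 is, so it has no way to suspect that “favorite” might be doing something other than its usual job.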
Yet, if I were indeed to write “Sharknado 2 is my favorite movie,” you, a human reader, might think I meant the opposite. Perhaps I mean “Sharknado 2 is a terrible movie,” or, more generously, “Sharknado 2 is my favorite movie only insofar as it is so terrible that it’s enjoyably bad.”
This broader meaning is not indicated anywhere in the text, yet a human might infer it from the mere fact that…why would Sharknado 2 be my favorite movie?
There was nothing deeply humorous in that toy example, but perhaps you can see the root of the problem.
Definitionally, irony means expressing meaning “using language that normally signifies the opposite,” making it a linguistic maneuver which is fundamentally difficult to operationalize. A priori, how can you tell when I’m being serious and when I’m being ironic?
Humans are reasonably good at this task – though, suffering from resting snark voice myself, I do often feel the need to clarify when I’m not being ironic.
Algorithms, on the other hand, perform poorly on this task. They just can’t tell the difference.
This is an active area of natural language processing research, and progress is being made. Yet it seems a shame for computers to be missing out on so much humor.
I feel strongly that, should the robot uprising come, I’d like our new overlords to appreciate humor.
Something would be lost in a world without sarcasm.
I had the pleasure of attending a talk today by Dashun Wang, Associate Professor at Northwestern’s Kellogg School of Management. While one of our lab groups is currently studying the ‘science of success,’ Wang – a former member of that lab – is studying the nature of failure.
Failure, Wang argued, is much more ubiquitous than success. Indeed, it is a “topic of the people.”
It is certainly a topic those of us in academia can relate to. While people in all fields experience failure, it can perhaps more properly be considered as a way of life in academia. The chances of an average doctoral student navigating the long and winding road to success in academia are smaller than anyone wants to think about. There aren’t enough jobs, there’s not enough funding, and the work is really, really hard. More than that, it’s ineffable: how do you know when you’re ‘generating knowledge’? What does that look like on an average day?
Mostly it looks like failure.
It looks like not knowing things, not understanding things, and not getting funding for the things about which you care most. It looks like debugging for hours and it looks like banging your head against the wall.
It looks like a lot of rejections and a lot of revise & resubmits.
Those successful in academia – subject as they are to survivorship bias – often advise becoming comfortable with the feeling of failure. With every paper, with every grant, simply assume failure. It is even becoming common for faculty to share their personal CV of Failures as a way to normalize the ubiquity of failure in academia.
But, Wang argues, failure is the key to success.
I suppose that’s a good thing, since, as he also points out, “in life you don’t fail once, you fail repeatedly.”
Failure is a thinning process, no doubt – many people who experience significant failure never come back from it. But a series of failures is no guarantee of future failure, either.
People who stick with it, who use failures as an opportunity to improve, and who learn – not just from their most immediate failure but from their history of failure – can, with time, luck, and probably more failures, eventually succeed.