I’ve been taking a great class this semester in Natural Language Processing – a computer science field which deals, as you may have guessed, with the processing of “natural” language. NLP is the foundation of technologies like spellcheck, automatic translation (a work in progress!), and Siri.
Essentially, you feed a bunch of human-generated text into a computer and it gives you something in response, with the “something” varying greatly based on what you’re trying to do.
A few weeks ago I deleted all the vowels from the Declaration of Independence.
(And then nondeterministically put them back in.)
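The post doesn't say how the vowels were restored, so here is one minimal sketch, assuming a hypothetical toy vocabulary. It strips the vowels, indexes each known word by its consonant "skeleton," and then picks a random matching word for each skeleton. A more serious attempt would probably weight the candidates with a language model rather than choosing uniformly. The names `strip_vowels`, `build_skeleton_index`, `restore_vowels`, and `VOCABULARY` are illustrative only.

```python
import random
import re
from collections import defaultdict

VOWELS = re.compile(r"[aeiouAEIOU]")  # 'y' is treated as a consonant here

# Hypothetical toy vocabulary; a real run would use a large word list.
VOCABULARY = ["we", "woe", "hold", "held", "these", "those", "this",
              "truths", "to", "toe", "be", "bee", "self-evident"]


def strip_vowels(text):
    """Delete every vowel. Words made only of vowels ("a", "I") vanish entirely."""
    return VOWELS.sub("", text)


def build_skeleton_index(vocabulary):
    """Map each consonant skeleton (e.g. 'ths') to the sorted words that produce it."""
    index = defaultdict(set)
    for word in vocabulary:
        index[strip_vowels(word.lower())].add(word.lower())
    return {skeleton: sorted(words) for skeleton, words in index.items()}


def restore_vowels(stripped, index, rng=random):
    """Replace each skeleton with a randomly chosen matching word (nondeterministic).

    Tokens with no match in the index are kept as they are.
    """
    restored = []
    for token in stripped.split():
        candidates = index.get(token.lower())
        restored.append(rng.choice(candidates) if candidates else token)
    return " ".join(restored)


if __name__ == "__main__":
    stripped = strip_vowels("We hold these truths to be self-evident")
    print(stripped)  # W hld ths trths t b slf-vdnt
    print(restore_vowels(stripped, build_skeleton_index(VOCABULARY)))
```

Because "ths" could have come from "these," "this," or "those," each run can produce a different sentence, which is what makes the restoration nondeterministic.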
But at more sophisticated levels, you can analyze the sentiment of a text, mimic human dialogue, or generate new text in the style of a given author. Eventually, I hope to use NLP techniques to process transcripts of political and civic dialogue, but for now I’m enjoying learning the basics of the field.
The fundamentals of NLP are fascinating – in our native language, we each easily construct our own sentences and fairly easily interpret the sentiment and meaning of others’ sentences. We’re generally familiar with the basic syntax and parts of speech of our native language, but we rarely give them much thought as we communicate with those around us.
And, as spoken languages are living languages, in casual conversation we effortlessly change the rules and adapt to new words and styles.
One might think that teaching a computer all the rules of grammar, as well as the flexibility of our unspoken rules, would be quite complicated. And that’s true to some extent, but more generally, the challenge of computer-interfaced language is just different.
ELIZA, one of the early successful NLP programs, is relatively simple. Programmed to respond to human-typed input as a Rogerian psychotherapist, ELIZA is based on pattern matching: you say, “I am sad,” and ELIZA responds, “I’m sorry you are sad.”
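ELIZA’s actual scripts ranked keywords and used more elaborate decomposition and reassembly rules, but the core pattern-matching idea fits in a few lines of Python. The rules, the `REFLECTIONS` table, and the `respond` function below are hypothetical, chosen to reproduce the exchange above: try each regular expression in order, swap first- and second-person words in the captured fragment, and slot it into a canned reply.

```python
import re

# Hypothetical ELIZA-style rules: (pattern, response template).
# Rules are tried in order; the first match wins.
RULES = [
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "I'm sorry you are {0}."),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."

# Swap first- and second-person words so echoed fragments read naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}


def reflect(fragment):
    """Rewrite a captured fragment from the speaker's point of view to ELIZA's."""
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in fragment.split())


def respond(utterance):
    """Return the reply for the first matching rule, or a generic fallback."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(group) for group in match.groups()))
    return FALLBACK


if __name__ == "__main__":
    print(respond("I am sad"))                      # I'm sorry you are sad.
    print(respond("I feel ignored by my family."))  # Why do you feel ignored by your family?
    print(respond("Hello"))                         # Please go on.
```

Nothing here "understands" sadness; the program only recognizes the shape "I am X" and echoes X back, which is why a handful of rules can feel surprisingly conversational.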
On the other hand, satire and sarcasm continue to elude NLP programs…such humor is just too subtle to capture in rules, I suppose.
The rules for a given NLP program can become quite elaborate, and yet the underlying theory is relatively simple: you start at the beginning of a sentence and then apply rules, each of which has a certain probability of being chosen. When you reach an end symbol (e.g., a period), you are done.
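What this describes is essentially sampling from a probabilistic grammar. Here is a minimal sketch, assuming a hypothetical toy grammar (the symbols, words, and probabilities are invented for illustration). The program starts from a sentence symbol `S` and repeatedly rewrites the leftmost unexpanded symbol by picking one of its rules according to the rule probabilities. It stops when it reaches the end symbol.

```python
import random

END = "."  # the end symbol: once it is reached, the sentence is finished

# Hypothetical toy grammar: each nonterminal maps to (probability, right-hand side)
# pairs whose probabilities sum to 1. Any symbol that is not a key is a word.
GRAMMAR = {
    "S": [(1.0, ["NP", "VP", END])],
    "NP": [(0.6, ["Det", "N"]), (0.4, ["Name"])],
    "VP": [(0.7, ["TV", "NP"]), (0.3, ["IV"])],
    "Det": [(0.5, ["the"]), (0.5, ["a"])],
    "N": [(0.5, ["senator"]), (0.5, ["voter"])],
    "Name": [(1.0, ["ELIZA"])],
    "TV": [(0.5, ["questions"]), (0.5, ["thanks"])],
    "IV": [(0.5, ["listens"]), (0.5, ["laughs"])],
}


def generate(rng=None):
    """Expand symbols left to right, sampling one rule per nonterminal, until END."""
    rng = rng or random.Random()
    pending = ["S"]  # symbols still to process, leftmost first
    words = []
    while pending:
        symbol = pending.pop(0)
        if symbol in GRAMMAR:
            rules = GRAMMAR[symbol]
            rhs = rng.choices([r for _, r in rules], weights=[p for p, _ in rules], k=1)[0]
            pending = list(rhs) + pending  # replace the nonterminal in place
        elif symbol == END:
            break
        else:
            words.append(symbol)
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:] + END


if __name__ == "__main__":
    rng = random.Random(7)
    for _ in range(3):
        print(generate(rng))  # e.g. a sentence like "The voter questions ELIZA."
```

Making the grammar "more elaborate" just means adding nonterminals and rules. The generation loop itself stays the same.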