While considering how to disambiguate business name tokens for my CTO, I read and web-surfed lustily along, tracing the history of speech recognition - nothing like a CTO who trusts your abilities and gives you some room to play - thank you, John Rausch. Thanks also go to Mary Hallebeek, witness to the salubrious effects of speech recognition on a friend of hers who suffers from muscular dystrophy. Mary's email vignette described how speech recognition allowed her friend to finish a dissertation. The friend was attached enough to her speech-recognition software to say good-night to her PC at day's end. That heart-tugger inspired me. (For uses of language and speech technology for the deaf and mute, and other things CL, see R. Orton's Geocities site.)

In late-1970s work on speech recognition, researchers at IBM noticed that grammars for speech recognition were brittle, and introduced Hidden Markov Models as a statistical means of modeling sequences of phones as a time series. For a brief explanation for programmers, click here; my own explanation is here. The observations - phones, in this case - are emitted by a state. The observer cannot see which state (hence "Hidden"), but can guess on the basis of probabilities. Efficient algorithms (Viterbi, Forward-Backward) aid in this probabilistic guessing; another algorithm, Baum-Welch, estimates the model's parameters from the observations. A few names from the speech-recognition community involved in these efforts are Jelinek, Rabiner, and Juang. Of Jelinek it is anecdotally said that he remarked, "Every time I fire a linguist, my performance goes up!" It took a while for the textual NLP community to make a similar discovery and engage in statistical efforts. One of the first people to apply the stochastic method to find n-grams was Kenneth Church.
The beginning of his Stochastic Parts article describes the choices other types of parsers have to make when choosing between rules; adding a stochastic score simplifies the choice. In an interview in the Dutch NLP student magazine Ta!, Dr. Church explains how statistics squelches combinatorial explosion by nipping it in the bud at the lexical level. Church warns against over-using computational grammars and argues that Exploratory Data Acquisition a la Ken Hale might benefit NLP - note the part about the distributional arguments of field linguists, and re-read my theoretical linguistics page. Moreover, Church warns against mixing up the questions Chomsky asked with the promise of NLP to deliver systems that society wants, using applied linguistics to get there. A computational linguist must be willing to collect data and fit them in with the architecture of her local workplace. On METAL, for instance, I used a traditional German grammar book to write a CFG covering German comparatives. That is a descriptive - if you will, EDA - activity, not an explanatory one, and it fulfilled a great need to make the system run. At BBN, Weischedel, Meteer, Grimshaw et al. also experimented with probabilistic/Bayesian methods of dealing with unknown words and ambiguity, in an almost apologetic manner. (While the history of linguistics and the debates of yesteryear are hard to trace, I would attribute the apologetic undertone to Chomsky's critique of Shannon's information theory, with its heavy use of Bayes' law and finite-state automata. When building systems that deliver on the promise of processing language and text, however, this fear of statistics is irrational. There is no reason not to use Shannon's monumental theory, and I doubt that Chomsky's statistical critique holds much water.) In any event, Weischedel et al. offer arguments why full sentential parsing may not be necessary, proposing instead to link NP and PP arguments to a verb by means of an ontology.
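For flavor, here is a toy illustration of the kind of small CFG one might write for German comparatives ("Hans ist größer als Peter"). The grammar, lexicon, and parser are my own minimal sketch, not the METAL grammar the text refers to:

```python
# A tiny CFG for German comparatives with a naive top-down recognizer.
# Nonterminals map to lists of right-hand sides; anything not in the
# grammar is treated as a terminal word.

GRAMMAR = {
    "S":   [["NP", "VP"]],
    "VP":  [["V", "AP"]],
    "AP":  [["ADJ", "als", "NP"]],
    "NP":  [["Hans"], ["Peter"]],
    "V":   [["ist"]],
    "ADJ": [["größer"], ["kleiner"]],
}

def parse(symbol, tokens, i):
    """Yield every position j such that tokens[i:j] derives `symbol`."""
    if symbol not in GRAMMAR:                      # terminal word
        if i < len(tokens) and tokens[i] == symbol:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:
        ends = [i]
        for part in rhs:                           # match rhs left to right
            ends = [j for e in ends for j in parse(part, tokens, e)]
        yield from ends

def accepts(sentence):
    tokens = sentence.split()
    return len(tokens) in parse("S", tokens, 0)

print(accepts("Hans ist größer als Peter"))        # → True
print(accepts("Hans größer ist als Peter"))        # → False
```

Even a handful of rules like these, copied out of a descriptive grammar book, is enough to cover a construction and keep a system running - which is exactly the descriptive, EDA-style activity described above.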
This paper discusses PP attachment as well. Around the same time, on the other coast - if, indeed, BBN is on a coast - workers at Xerox described how to build a practical part-of-speech tagger using Hidden Markov Models, advocating an unsupervised approach that assigns tags automatically by means of the Baum-Welch algorithm (Cutting et al.). The three approaches sketched here have much in common. They were more modest than, say, the METAL system, in that they did not seek to parse unrestricted text fully. In Church's case this was inherent in the title of his Stochastic Parts article, but even Cutting et al. modestly proposed partial parsing and the assignment of case roles through lexical semantics or ontologies (as they are now called), offering a survey of work at various sites. Cutting et al. was in effect the culmination of the three articles described here, as it offered a full and mature description of categorial tagging and disambiguation by means of HMMs, which has become the standard method for tagging (alternatives being Brill's and Daelemans's work). At the close of the article, exemplary in its clarity of purpose, the authors discuss applications of the work. Penelope Sibun, for example, claims to have reached 80% correct assignment of semantic roles without a sentential parse, even back then, with her program SOPA. Relating NPs as arguments to verbs without parsing full sentences is an instantiation of what Church expresses in the Ta! article as desirable and workable. This approach actually looks at the linguistics - lexical constraints and characteristics - quite in accord with Church's recommendations, which he traces to a linguistics tradition cast aside by Chomsky and his disciples. By the mid-90s, the statistical NLP revolution - a technology revolution rather than a scientific one - was in full swing.
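The unsupervised training behind such a tagger can be sketched in a few lines. Below is a toy version of the forward-backward computation driving one Baum-Welch re-estimation step, in the spirit of what Cutting et al. describe; the two "tags", the tiny word list, and all probabilities are invented for illustration, and a real tagger would constrain emissions with a lexicon and work in log space over much longer texts:

```python
# One Baum-Welch EM step (emission table only, to keep the sketch short)
# for a toy two-tag HMM over a three-word "corpus".

def forward(obs, states, start_p, trans_p, emit_p):
    a = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    for t in range(1, len(obs)):
        a.append({s: emit_p[s][obs[t]] *
                     sum(a[t - 1][p] * trans_p[p][s] for p in states)
                  for s in states})
    return a

def backward(obs, states, trans_p, emit_p):
    b = [{s: 1.0 for s in states} for _ in obs]
    for t in range(len(obs) - 2, -1, -1):
        for s in states:
            b[t][s] = sum(trans_p[s][n] * emit_p[n][obs[t + 1]] * b[t + 1][n]
                          for n in states)
    return b

def reestimate_emissions(obs, states, start_p, trans_p, emit_p):
    a = forward(obs, states, start_p, trans_p, emit_p)
    b = backward(obs, states, trans_p, emit_p)
    total = sum(a[-1][s] for s in states)              # P(observations)
    gamma = [{s: a[t][s] * b[t][s] / total for s in states}
             for t in range(len(obs))]                 # state posteriors
    new_emit = {}
    for s in states:
        denom = sum(g[s] for g in gamma)
        new_emit[s] = {w: sum(g[s] for t, g in enumerate(gamma)
                              if obs[t] == w) / denom
                       for w in set(obs)}
    return new_emit

states = ("N", "V")
start_p = {"N": 0.5, "V": 0.5}
trans_p = {"N": {"N": 0.4, "V": 0.6}, "V": {"N": 0.7, "V": 0.3}}
emit_p = {"N": {"fish": 0.6, "can": 0.4}, "V": {"fish": 0.3, "can": 0.7}}

print(reestimate_emissions(["can", "fish", "can"],
                           states, start_p, trans_p, emit_p))
```

Iterating this step (and the analogous one for transitions) until convergence is what lets the tagger learn from raw text without hand-tagged training data.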
With the increasing amount of online text, Church's work and the BBN and Xerox efforts remain relevant today - seemingly an eternity in computational linguistics/information science. 'Seemingly', because the algorithms do not change, but their dispersion among workers in various fields does. Assigning case or semantic roles to sentential NPs - in other words, studying the relationships between verbs and their arguments - is still topical (see Gildea and Jurafsky - insert link - and Schulte im Walde). I will take this up below.