will net us the phrase
Attorney's District Office. We have already automatically detected an important legal institution. (Essentially, this is what I did at Oracle, and, tooting my own horn, I did so before this was an NLP cliche. I will grant that the Oracle architecture was so advanced it invited this approach.) This is partial parsing, or chunking: a careful step-by-step process that is only remotely connected to the full-blown parsing at, say, METAL. Because partial parsing and chunking deal with input whose constituents do not overlap and which does not exhibit lexical ambiguity (see above), they do not produce spurious parses or combinatorial explosion.

We now write grammar rules {VERBAL_COMPLEX -> VBD VBN} and {NOUN_PHRASE_CHUNK -> DT NNS}. The rules find the verbal complex
were/VBD committed/VBN and the Noun Phrase The/DT Offenses/NNS. By using information compiled in an ontology/semantic net/thesaurus about what kinds of subjects and objects the verb commit takes, a program could automatically connect the Noun Phrase The/DT Offenses/NNS to the verbal complex were/VBD committed/VBN and posit that [The Offenses] is the semantic object of [were committed]. We have achieved this by piecemeal parsing, and in the main by lexical analysis.
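The piecemeal parsing described above can be sketched in a few lines. This is a toy sketch, not a production chunker: it greedily matches fixed tag-sequence patterns against POS-tagged tokens, using Penn Treebank tags (committed as a past participle is VBN, Offenses as a plural noun is NNS).

```python
def chunk(tagged, rules):
    """Greedily match tag-sequence rules against (token, tag) pairs."""
    chunks, i = [], 0
    while i < len(tagged):
        for label, pattern in rules:
            tags = [tag for _, tag in tagged[i:i + len(pattern)]]
            if tags == pattern:
                chunks.append((label, [tok for tok, _ in tagged[i:i + len(pattern)]]))
                i += len(pattern)
                break
        else:
            i += 1  # no rule fires here; move on
    return chunks

rules = [
    ("VERBAL_COMPLEX", ["VBD", "VBN"]),
    ("NOUN_PHRASE_CHUNK", ["DT", "NNS"]),
]
tagged = [("The", "DT"), ("offenses", "NNS"), ("were", "VBD"), ("committed", "VBN")]
print(chunk(tagged, rules))
# [('NOUN_PHRASE_CHUNK', ['The', 'offenses']), ('VERBAL_COMPLEX', ['were', 'committed'])]
```

Because the patterns never overlap and each token carries a single tag, the matcher needs no backtracking, which is exactly why chunking escapes the combinatorial explosion of full parsing.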

A different way of doing NP-chunking is by treating it as a
Named Entity Recognition type of problem. Such algorithms do not use grammars, but a kind of "change-over" algorithm. Tokens outside the series of tokens that make up the Named Entity receive a tag, e.g. OUTSIDE. When the first token of the desired and designated class is encountered, the assigned tag changes, e.g. to BORDER. After the BORDER tag, an INSIDE tag can be assigned to each subsequent token of the entity. Training data provide counts of the most likely tag given the current and previous tags and the current and previous tokens.
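This change-over scheme is what is nowadays usually written B/I/O (BORDER/INSIDE/OUTSIDE). A sketch, with an illustrative entity label and token sequence of my own choosing, plus a helper that recovers entity spans from the tags:

```python
# B-* marks the border (first token) of an entity, I-* its continuation,
# O everything outside. The ORG label and the sentence are illustrative.
tokens = ["He", "works", "at", "the", "District", "Attorney", "'s", "Office", "."]
tags   = ["O",  "O",     "O",  "O",   "B-ORG",    "I-ORG",    "I-ORG", "I-ORG", "O"]

def spans(tokens, tags):
    """Recover (label, token-list) entity spans from a B/I/O tag sequence."""
    out, cur = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # change-over: a new entity starts
            if cur: out.append(cur)
            cur = (tag[2:], [tok])
        elif tag.startswith("I-") and cur:  # still inside the same entity
            cur[1].append(tok)
        else:                             # back outside
            if cur: out.append(cur)
            cur = None
    if cur: out.append(cur)
    return out

print(spans(tokens, tags))
# [('ORG', ['District', 'Attorney', "'s", 'Office'])]
```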

The innovation of using, say, Bikel
et al.'s NER formulas over a conventional HMM is that a dependency is created between the previous and current token, not just between the current token and the current and previous states. To achieve this, Bikel et al. extend the standard HMM architecture and incorporate a Markov chain into it on the lexical level. The effect is that a series of words, word-1...word-i, is generated within a larger Markov Model that decides which tag state the machine is in. Most of Bikel's motivation for capturing co-occurrence on the lexical level seems to be that some words signal the BORDER area: the series of words word-1...word-i is inside a given tag, as is to be expected given that some leading edge signalled the transition from OUTSIDE to INSIDE. For instance, Mr. signals the likelihood of a following series of words that should be tagged PERSON, e.g. Jones. I am currently writing a critique of this approach, motivated by the idea that the HMM extension is unnecessarily cumbersome and that algorithms giving us what we need are already extant. Such algorithms operate on the trellis.
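For reference, the baseline such extensions start from is Viterbi decoding over the trellis of a standard HMM. The sketch below uses toy states and probabilities of my own invention (not Bikel et al.'s actual model parameters); note how even the plain HMM lets Mr. pull the following word into the entity, via the emission and transition tables:

```python
states = ["O", "B", "I"]
start = {"O": 0.8, "B": 0.2, "I": 0.0}
trans = {"O": {"O": 0.7, "B": 0.3, "I": 0.0},
         "B": {"O": 0.2, "B": 0.1, "I": 0.7},
         "I": {"O": 0.4, "B": 0.1, "I": 0.5}}
emit = {"O": {"Mr.": 0.1, "Jones": 0.1, "spoke": 0.8},
        "B": {"Mr.": 0.8, "Jones": 0.1, "spoke": 0.1},
        "I": {"Mr.": 0.1, "Jones": 0.8, "spoke": 0.1}}

def viterbi(words):
    """Return the most likely state sequence for the observed words."""
    # Each trellis column maps state -> (best probability so far, backpointer).
    trellis = [{s: (start[s] * emit[s][words[0]], None) for s in states}]
    for w in words[1:]:
        prev, col = trellis[-1], {}
        for s in states:
            p = max(states, key=lambda q: prev[q][0] * trans[q][s])
            col[s] = (prev[p][0] * trans[p][s] * emit[s][w], p)
        trellis.append(col)
    # Follow backpointers from the best final state.
    path = [max(trellis[-1], key=lambda s: trellis[-1][s][0])]
    for col in reversed(trellis[1:]):
        path.append(col[path[-1]][1])
    return path[::-1]

print(viterbi(["Mr.", "Jones", "spoke"]))  # -> ['B', 'I', 'O']
```

Bikel et al.'s model additionally conditions each word's generation on the previous word within a state; the point of the critique mentioned above is that much of that effect can already be had on this trellis.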

Let's return to the Klein demo tagger. There is a seemingly unglamorous aspect to this tagger and its eerie speed: in-house optimization and endless tinkering have made this program very fast and flexible. That is a lab-specific
success. Such successes would be impossible without expert programmers, a level linguists rarely attain. I have been called a decent programmer, but, to judge by the speed, I would not be able to begin to optimize code the way this code has been optimized.

As a final thought, one of the words is misspelled, and the tagger effortlessly labels this word
deponent/NN. Only about 15 years ago, parsers would grind to a halt when seeing unknown words; the lights in the building would blink, and after endless churning, unification would determine this was a noun - <insert link Weischedel>. But by then looting hordes, encouraged by the resulting black-outs, would be loose in the streets, pillaging.

Optimization, fast machines, techniques for dealing with unknown words, i.e. words not in the dictionary,
and eternal tinkering now make for very fast tagging, enabling useful partial parsing and plausibly enabling real-time semantic role assignment and other complex tasks on large text collections.
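One widespread family of unknown-word techniques, and my guess at the kind of thing at work when the tagger labels a misspelled word deponent/NN, is to back off from the dictionary to suffix and capitalization cues. The suffix list and lexicon below are illustrative, not any particular tagger's actual tables:

```python
# Illustrative suffix -> tag cues, checked in order (longest-signal first).
SUFFIX_TAGS = [("ing", "VBG"), ("ed", "VBD"), ("ly", "RB"), ("ent", "NN"), ("s", "NNS")]

def guess_tag(word, lexicon):
    """Tag a word: dictionary lookup, then suffix cues, then capitalization."""
    if word in lexicon:                       # known word: use the dictionary
        return lexicon[word]
    for suffix, tag in SUFFIX_TAGS:           # unknown word: back off to suffixes
        if word.lower().endswith(suffix):
            return tag
    return "NNP" if word[0].isupper() else "NN"  # final fallback

lexicon = {"the": "DT", "was": "VBD"}
print(guess_tag("deponnent", lexicon))  # misspelled "deponent" still comes out NN
```

Real taggers learn these suffix distributions from training data rather than hand-listing them, but the backoff structure is the same, and it is why a typo no longer brings the building's lights down.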
