I worked briefly at Oracle, and, before the project I worked on ended,  managed to implement  the CKY algorithm in C and apply it to Named Entity Recognition there - Oracle had an enormous NLP infrastructure at the time already. The CKY algorithm is a so-called dynamic programming algorithm, which traverses a matrix driven by nested loops. (Dynamic programming algorithms abound in NLP. The Forward and Backward algoritms, the Viterbi algorithm, the Forward-Backward algorithm, and the Baum-Welch algorithm - see next page - are all instances of it. All traverse a matrix after defining a base-step and avoid repetition by using an induction step. In bioinformatics, the Needleman-Wunsch and Levenshtein algorithms are examples of dynamic programming.) 

Oracle's programs had the ability to add features to online words, and I leveraged off of that and capitalization to grab business names, person names and institution names. My favorite institution was the
Budweiser Beach Volleyball League of California. When I saw this institution name glide across the screen I whistfully looked from my cube to the Bay Area, wondering if being a California beach bum was still a career option. A word to the chorus out there that says a context-free parsing algorithm for Named Entities is computationally inefficent. Yes, I know all the complexity results (you wise-asses). However, my grammar did not consist of many more than 25 rules, was completely transparent, and achieved huge coverage. Tell me that is true for your average type 1 grammar.

I returned to LexisNexis and was asked to attack ambiguity in business name for a search engine specializing in names, business names, and so on. My CTO expected me to use the usual finite state approach, but I started web surfing on new developments, and slowly caught on to the statistical revolution, which, in the mean time, has gone mainstream.

previous
next
publications
1