I am not done here!!

Allen, James.  "Ambiguity Resolution: Statistical Methods", Chapter 7 in:
Natural Language Understanding, Benjamin/Cummings, New York, 1995.
[Puts one on the right footing and provides a solid basis for further exploration.]

Benish, William A. "The Application of Information Theory to Diagnostic Testing: A Primer", online paper (draft), viewable at http://filer/case.edu/~wab4primer2.pdf,  Case Western, Cleveland, OH, 2006.
[I helped proofread Dr. Benish's work while learning basic information theory at the same time. Applying the mathematics of information theory to medical diagnostics is very innovative.]

Bikel, Daniel M, Richard Schwartz, and Ralph Weischedel, "An Algorithm That Learns What's In a Name",
Machine Learning, 34, 1-3, pp. 211-31, 1999.
[HMMs are mixed with Markov Chains in order to capture lexical co-occurrence probabilities. The resulting 'machine' is part HMM, part Markov Chain, but what is it really? And what are its properties? I am currently thinking about a more conventional approach that still takes into account the need to capture lexical co-occurrences.]

Bleau, Barbara Lee,
Forgotten Calculus, 3rd ed., 2001, Barron's.

Cutting, Doug, and Julian Kupiec, Jan Pedersen, Penelope Sibun. "A Practical Part-Of-Speech Tagger", In
Third Conference on Applied Natural Language Processing (ANLP 92), pp. 133-40, 1992.

Chapman, Nigel.
Perl: The Programmer's Companion, Wiley's Publishing Company, 1997.
[Being a computational linguist means working through the exercises as if one were, gulp, an undergrad. Not having been to Great Britain in ages, I worked from the online version.]

Church, Kenneth Ward. "A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text",  ANLP 1988,  pp.136-43.

Wagner, Carl. "Basic Probability Theory", Chapter 3 in
Choice, Chance, and Inference: An Introduction to Combinatorics, Probability and Statistics, online book, http://www.math.utk.edu/~wagner/papers/book.pdf,  Department of Mathematics, The University of Tennessee, Knoxville, TN,  37996-1300.
[For when you feel the need to acquire a college-level understanding of probability theory above and beyond the exposure in Allen.]

Winograd
Language as a Cognitive Process, Addison

Siklossy, L.
Let's Talk Lisp, University of Texas Press

Dale and Orschalik
PASCAL, University of Texas Press.

Holub, Allen I.
The C Companion, Prentice-Hall, 1987.
[This gem breaks through every anxiety-generating taboo a humanities student can imagine, and exposes the computer science concepts you need to know to be a C programmer, especially pointers and stacks. Only after reading this can you read Kernighan and Ritchie. Thank you, Franz Weckesser, NLPer with a CS background, for the reference. Holub unlocked the door for me to become a reasonable C programmer.]

Wall, Robert
Introduction to Mathematical Linguistics.

McNaughton, Robert E.,
Elementary Computability, Formal Languages, and Automata, Prentice Hall, N.J., 1981.

Dowty, Wall and Peters
Introduction to Montague Grammar.

Rabiner, L. R. and Juang, B. H., "An Introduction to Hidden Markov Models",
IEEE ASSP Magazine, pp. 4-15, 1986

Wall, Larry and Tom Christiansen, Jon Orwant.
Programming Perl, Third Ed., O'Reilly, Farnham, 2000.

Weischedel, Ralph, and Richard Schwartz, Jeff Palmucci, Marie Meteer, Lance Ramshaw.  "Coping with Ambiguity and Unknown Words through Probabilistic Models",
Computational Linguistics, pp. 358-382, 1992, 2. [Seminal.]

Zhai, ChengXiang.  A Brief Note On the Hidden Markov Models, online lecture notes, Department of Computer Science, University of Illinois at Urbana-Champaign,  March 16, 2003.
[No longer available online. Good mathematical lecture notes: no proofs, just notation, definitions, and a discussion of getting around numerical underflow by using logarithms in the implementation.]
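
[Since the notes themselves are gone, here is a minimal sketch of the logarithm trick they discuss. It is my own illustration, not code from the notes: the function names, the toy one-state model, and the choice of Python are all assumptions. The idea is to run the HMM forward recursion on log probabilities and to replace each sum of probabilities with a log-sum-exp, so that long observation sequences do not underflow to zero.]

```python
import math

def log_sum_exp(values):
    """Return log(sum(exp(v) for v in values)) without leaving log space."""
    m = max(values)
    if m == float("-inf"):          # every term has probability zero
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(obs, log_init, log_trans, log_emit):
    """Log-likelihood of an observation sequence under an HMM.

    obs       : list of observation indices (hypothetical encoding)
    log_init  : log_init[s]      = log P(first state is s)
    log_trans : log_trans[p][s]  = log P(next state s | current state p)
    log_emit  : log_emit[s][o]   = log P(observation o | state s)
    """
    n = len(log_init)
    # Initialisation: log alpha_1(s) = log pi(s) + log b_s(o_1)
    alpha = [log_init[s] + log_emit[s][obs[0]] for s in range(n)]
    # Recursion: products become sums, sums become log-sum-exp.
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + log_trans[p][s] for p in range(n)])
            + log_emit[s][o]
            for s in range(n)
        ]
    # Termination: sum over the final states.
    return log_sum_exp(alpha)

# Toy model (an assumption for illustration): one state, two equally
# likely symbols. The plain probability of 2000 symbols is 0.5 ** 2000,
# which is too small for a float, but its logarithm is unproblematic.
toy_ll = forward_log_likelihood(
    [0] * 2000,
    [math.log(1.0)],
    [[math.log(1.0)]],
    [[math.log(0.5), math.log(0.5)]],
)
```

[In the toy model the naive product 0.5 ** 2000 evaluates to 0.0 in floating point, while toy_ll stays finite at 2000 * log(0.5).]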



           