Path: news1.ucsd.edu!ihnp4.ucsd.edu!swrinde!news.sgi.com!news.msfc.nasa.gov!newsfeed.internetmci.com!news.mathworks.com!uunet!in3.uu.net!news.new-york.net!news.columbia.edu!news.cs.columbia.edu!news.cs.columbia.edu!news-not-for-mail From: radev@news.cs.columbia.edu (Dragomir R. Radev) Newsgroups: comp.ai.nat-lang,comp.ai,comp.answers,news.answers Subject: Natural Language Processing FAQ Supersedes: Followup-To: comp.ai.nat-lang Date: 28 Jul 1996 16:04:17 -0400 Organization: Columbia University, Dept. of Computer Science, NYC Lines: 1336 Approved: news-answers-request@MIT.EDU Expires: 10 Sep 1996 20:03:31 GMT Message-ID: NNTP-Posting-Host: tune.cs.columbia.edu Summary: This posting contains Frequently Asked Questions (FAQ) about natural language processing and their answers. It should be read by anyone who wishes to post to the comp.ai.nat-lang newsgroup. Keywords: language natural processing computational linguistics Cc: Xref: news1.ucsd.edu comp.ai.nat-lang:4294 comp.ai:24015 comp.answers:15817 news.answers:63184 Last-Modified: June 05, 1995 18:00 EST Posting-Frequency: Monthly Version: 0.06 Archive-Name: natural-lang-processing-faq This is the new draft of a FAQ (frequently asked questions and answers) list for the comp.ai.nat-lang newsgroup. The main reason for posting it now is for me to get as much feedback as possible before I go any further. Please don't hesitate to send me any comments, be they positive or negative. There are many blank spots in the FAQ, please help fill them. Copyright (c) 1994, 1995 Dragomir R. Radev. All rights reserved. Permission to distribute this FAQ by all volatile electronic means (mailing lists, FTP, WWW, Usenet news, etc.) is hereby given under the restriction that the file is not modified and all disclaimers and acknowledgements remain intact. This permission does NOT apply to CD-ROMS and/or commercial printed publications. 
All requests for republication in this case should be referred to the FAQ maintainer (radev@cs.columbia.edu)

Version: 0.06

TABLE OF CONTENTS
=================

 [1] What is this FAQ all about
 [2] What is Computational Linguistics
 [3] What is comp.ai.nat-lang
 [4] How to get this FAQ
 [5] World-Wide Web resources.
 [6] Which schools offer graduate programs in CL/NLP
 [7] How to apply to graduate school in CL/NLP in the USA
 [8] Where to get information on graduate programs
 [9] Major non-academic research laboratories
[10] What major publications exist in the field
[12] Electronic mailing lists
[13] Newsgroups
[14] Professional Organizations, Associations
[15] Conferences
[16] Evaluation Competitions
[17] How to join a mailing list
[18] How to obtain files by anonymous ftp
[19] FTP repositories
[20] What are some important books in NLP
[21] Encyclopedia of Artificial Intelligence
[22] Machine Translation
[23] What are the major accomplishments of the field
[24] About this FAQ

Disclaimers and Notes
---------------------

1. Please read this FAQ list before posting to comp.ai.nat-lang

2. The FAQ is a collection of materials, rather than a complete reference. Some of the information may be out of date, so please be careful and take everything with a grain of salt. The maintainer, Dragomir R. Radev (radev@cs.columbia.edu), doesn't assume any responsibility for wrong information. The list of contributors to the FAQ appears at the end of this document.

3. Any comments, contributions, and corrections are more than welcome. Please help make the FAQ really helpful and interesting.

-----------------------------------------------------------------------------
[1] What is this FAQ all about

This is an attempt to put together a list of frequently (and not so frequently) asked questions about Natural Language Processing and their answers. This document is in no way perfect or complete or 100% accurate.
In no way should the maintainer be held responsible for damage resulting directly or indirectly from using information in this FAQ.

The FAQ originated from Mark Kantrowitz's FAQ on AI. Some questions in the present document come directly from Mark's original FAQ.

-----------------------------------------------------------------------------
[2] What is Computational Linguistics

Computational linguistics (CL) is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science that is aiming at computational models of human cognition.

Computational linguistics has applied and theoretical components.

The applied component of CL is more interested in the practical outcome of modelling human language use. The goal is to create software products that have some knowledge of human language. Such products are urgently needed for improving human-machine interaction since the main obstacle in the interaction between human and computer is one of communication. Today's computers do not understand our language, but computer languages are difficult to learn and do not correspond to the structure of human thought.

Although existing CL programs are far from achieving human ability, they have numerous possible applications. Even if the language the machine understands and its domain of discourse are very restricted, the use of human language can increase the acceptance of software and the productivity of its users.

Natural language interfaces enable the user to communicate with the computer in German, English or another human language. Some applications of such interfaces are database queries, information retrieval from texts and so-called expert systems. Current advances in recognition of spoken language improve the usability of many types of natural language systems.
Communication with computers using spoken language will have a lasting impact upon the work environment, completely new areas of application for information technology will open up. Much older than communication problems between human beings and machines are those between people with different mother tongues. One of the original goals of applied computational linguistics was fully automatic translation between human languages. From bitter experience scientists have realized that they are far from achieving this. Nevertheless computational linguists have created software systems which can simplify the work of human translators and clearly improve their productivity. The future of applied computational linguistics will be determined by the growing need for user-friendly software. Even though the successful simulation of human language competence is not to be expected in the near future, computational linguists have numerous immediate research goals involving the design, realization and maintenance of systems which facilitate everyday work, such as grammar checkers for word processing programs. Theoretical CL takes up issues in theoretical linguistics. It deals with formal theories about the linguistic knowledge that a human needs for generating and understanding language. Today these theories have reached a degree of complexity that can only be managed by employing computers. Computational linguists develop formal models simulating aspects of the human language faculty and implement them as computer programmes. These programmes constitute the basis for the evaluation and further development of the theories. In addition to linguistic theories, findings from cognitive psychology play a major role in simulating linguistic competence. Within psychology, it is mainly the area of psycholinguistics that examines the cognitive processes constituting human language use. 
The special attraction of computational linguistics lies in the combination of methods and strategies from the humanities, natural and behavioural sciences, and engineering.

-----------------------------------------------------------------------------
[3] What is comp.ai.nat-lang

Here follows the original charter for comp.ai.nat-lang.

Name: comp.ai.nat-lang

Moderation: This group will be unmoderated.

Purpose: To discuss issues relating to natural language, especially computer-related issues from an AI viewpoint. The topics that will be discussed in this group will concentrate on, but are not limited to, the following:

  * Natural Language Understanding
  * Natural Language Generation
  * Machine Translation
  * Dialogue and Discourse Systems
  * Natural Language Interfaces
  * Parsing
  * Computational Linguistics
  * Computer-Aided Language Learning

This group will avoid discussing issues that are more properly covered by other newsgroups. For example, speech synthesis should be discussed in comp.speech. However, due to the interdisciplinary nature of the field, there may be overlap in material between other groups. To try to keep this to a minimum, topics should pertain to computer-related aspects of natural language.

Rules of Decorum: Because of the unmoderated format, anyone with access to this newsgroup will be able to post without review. This is meant to encourage discussion of the topics. Please refrain from "flames" or unnecessary criticism of a person's viewpoints or personality in a harsh or insulting manner. Criticisms should be constructive and polite whenever possible.

-----------------------------------------------------------------------------
[4] How to get this FAQ

This FAQ is available currently from the following newsgroups: comp.ai.nat-lang, comp.answers, comp.ai, and news.answers

The official archive of the above newsgroups is at MIT.
You can get a copy of the FAQ from

   ftp://rtfm.mit.edu/pub/usenet-by-hierarchy/comp/ai/nat-lang

The current copy can also be retrieved over HTTP from

   http://www.cs.columbia.edu/~acl/nlpfaq.txt

-----------------------------------------------------------------------------
[5] World-Wide Web resources.

The fullest archive of Web resources related to Natural Language Processing and Computational Linguistics is available from the ACL home page:

   http://www.cs.columbia.edu/~acl

Click on "NLP/CL Universe" to get to the directory of NLP-related resources.

Drago

-----------------------------------------------------------------------------
[6] Which schools offer graduate programs in CL/NLP

This list is, *of course*, completely preliminary. Please send me information about other programs. I will try to get in touch with the editors of the ACL guide to Graduate Programs in CL for more information.

Universities are given in alphabetical order. If a certain university is not included now and you feel it must be included, please send me some information about it.
Australia:
   Melbourne, University of
   Microsoft Institute of Advanced Software Technology in association with Macquarie University

Canada:
   Montreal, University of
   Ottawa, University of
   Toronto, University of
   Waterloo, University of

Finland:
   Helsinki, University of

France:
   Paris 7, Jussieu, University of

Germany:
   Bonn, University of
   Heidelberg, University of
   Humboldt University, Berlin
   Koblenz-Landau, University of
   Saarlandes, University of the
   Stuttgart, University of
   Tuebingen, University of

Italy:
   Pisa, University of
   Trento, University of

Japan:
   Kyoto University

Korea:
   Pohang University of Science and Technology, Pohang

Netherlands:
   Amsterdam, University of
   Groningen, University of
   Nijmegen, University of
   Tilburg, University of
   Utrecht, University of

Sweden:
   Goteborg (Gothenburg), University of
   Uppsala, University of

Switzerland:
   Geneva, University of
   Zurich, University of

UK:
   Brighton, University of
   Cambridge, University of
   Durham, University of
   Essex, University of
   Edinburgh, University of
   Sheffield, University of
   Sussex, University of

USA:
   Brown University
   Buffalo, SUNY at
   California at Berkeley, University of
   California at Los Angeles, University of
   Carnegie-Mellon University
   Columbia University
   Delaware, University of
   Duke University
   Georgetown University
   Georgia, University of
   Georgia Institute of Technology
   Harvard University
   Indiana University
   Johns Hopkins University
   Massachusetts at Amherst, University of
   Massachusetts Institute of Technology
   New Mexico State University
   New York University
   Pennsylvania, University of
   Rochester, University of
   Southern California, University of
   Stanford University
   SUNY, Buffalo
   Wisconsin - Milwaukee, University of
   Yale University

-----------------------------------------------------------------------------
[7] How to apply to graduate school in CL/NLP in the USA

Usually, the best timetable is as follows (given that M is the month when your studies would start, usually, in September)

M - 24 : Try to clarify your interests, is
         it really NLP that you are interested in, what possible subfields might be of interest to you, etc.
         Remember: 5 years working in an area you are not interested in will be a very painful experience.

M - 18 : Read publications in the area of your interest in order to discover the best places for you to apply in terms of research and professors.
         Remember: Unless you are familiar with the most current research, you will not be able to find the best place for you.

M - 18 : Go to your local library and consult some of the available directories (see [3-3]) - write down as much information as you can about some 15-25 universities. These universities form your preliminary list.
         Remember: There are some 100 universities in the USA offering NLP/CL programs. Some of them will be more attractive to you than others.

M - 18 : Talk to your advisers at school, talk to other students, post questions on the Internet. This way you will get advice on a few more universities that you might have skipped until this moment.
         Remember: Others have faced what you are going through. Use their experience.

M - 15 : Send letters to the universities that you have on your preliminary list. Make sure you indicate when you want to start, what degree (MA, MS, Ph.D.) you are interested in, whether or not you will be applying for financial aid, whether you will need some special visa...
         Remember: Ask for all the information that you need, give them all the information they'd need to satisfy your request.

M - 12 : Read carefully the information that you have received from the universities. Shorten your list of places to the number that you will eventually apply to (usually 5-8 is a good number).
         Remember: Make sure you include both your best choice schools and some places where you are almost certain of getting accepted.

M - 10 : Fill in all the forms that are sent to you, ask your professors to send reference letters to the schools directly.
         Remember: Professors will probably be very busy at that time of the year (any time of the year...). Give them the reference forms as early as possible and make sure you specify a reasonable time for them to fill them in and send them out.

M - 10 : (or earlier) - take the necessary tests (GRE, TOEFL, or others) that the schools want. Make sure you tell the testing service which universities you want them to send your scores to.
         Remember: Time yourself through several practice tests. The GRE General test, for example, is more about mastery of timing than knowledge.

M - 9  : (approximately) - mail your forms to the schools, preferably 2-3 weeks before the deadlines.
         Remember: You don't want your applications to get there at the same time as everyone else's. Give the admissions committee some extra time to review your application.

M - 6  : usually six months before the beginning of the semester that you are applying for, you will get a letter saying whether you have been accepted.
         Remember: Usually, thick letters, e-mails, and telegrams mean acceptance. Thin one-sheet letters will most likely be disappointing for you.

M - 5  : now, you have been accepted to a few schools. Go back to the same resources that you used when you were deciding where to apply (journals, catalogs, directories, professors, etc.). Ask the schools that accepted you to fly you in for a visit (many will do this).
         Remember: Don't forget non-academic factors such as location, financial aid, the atmosphere in the department, etc.
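
The month offsets in the timetable above can be turned into calendar dates mechanically. What follows is only a minimal sketch in Python, assuming a made-up start date of September 1997. The names MILESTONES, months_before and timetable are illustrative, and the labels are shortened paraphrases of the steps above.

```python
from datetime import date

# Offsets (months before M) taken from the timetable above;
# the labels are abbreviated paraphrases, not the FAQ's wording.
MILESTONES = [
    (24, "clarify your interests"),
    (18, "read publications, consult directories, ask for advice"),
    (15, "send letters to the universities on your preliminary list"),
    (12, "shorten your list to 5-8 schools"),
    (10, "fill in forms, request reference letters, take tests"),
    (9, "mail your applications"),
    (6, "expect decision letters"),
    (5, "choose among the schools that accepted you"),
]


def months_before(start: date, months: int) -> date:
    """Return the first day of the month `months` months before `start`."""
    index = start.year * 12 + (start.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def timetable(start: date):
    """Pair every milestone with the calendar month in which it falls."""
    return [(months_before(start, offset), label) for offset, label in MILESTONES]


if __name__ == "__main__":
    # Hypothetical example: studies starting in September 1997.
    for when, label in timetable(date(1997, 9, 1)):
        print(f"{when.year}-{when.month:02d}  {label}")
```

Working in whole months (year * 12 + month) avoids any question of how many days a month has, which is all the timetable needs.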
----------------------------------------------------------------------------- [8] Where to get information on graduate programs A: The Peterson's Guide A: The ACL Directory of Graduate Programs in Computational Linguistics ----------------------------------------------------------------------------- [9] Major non-academic research laboratories AT&T Bell Labs, Murray Hill, NJ BBN Systems and Technologies Corporation Bellcore, Morristown, NJ DFKI (German research center for AI) General Electric IRST, Italy IBM T.J. Watson Research, Yorktown Heights, NY Microsoft Research, Redmond, WA NEC Corporation SRI International, Menlo Park, CA SRI International, Cambridge, UK Xerox, Palo Alto, CA Xerox, Grenoble, France ----------------------------------------------------------------------------- [10] What major publications exist in the field JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH (JAIR) JAIR is a refereed publication, covering all areas of AI, that is distributed free of charge over the internet by WWW, ftp, electronic mail, gopher, and the newsgroups comp.ai.jair.announce (announcements and abstracts of new papers) and comp.ai.jair.papers (papers, code, and other materials, distinguished by subject line). In addition, each complete volume of JAIR is published by Morgan Kaufmann. Submissions in all areas of AI are invited. Papers should describe work that has both practical and theoretical significance. Only papers of the highest quality will be accepted. JAIR aims for a review turn-around time of about 7 weeks, with electronic publication occurring immediately after the editor receives the final version of an accepted article. 
JAIR can be accessed via the World Wide Web using the URL
   http://www.cs.washington.edu/research/jair/home.html
by gopher to
   gopher://p.gp.cs.cmu.edu/
or by anonymous ftp to
   p.gp.cs.cmu.edu:/usr/jair/pub/
   ftp.mrg.dist.unige.it:/pub/jair/pub/
For more information, send electronic mail to jair@cs.cmu.edu with the subject AUTORESPOND and the message body HELP. Or contact jair-ed@ptolemy.arc.nasa.gov.

COMPUTER SPEECH & LANGUAGE (CS&L)
Published 4 times annually. ISSN 0885-2308.
Subscriptions: Institutions $170, Individuals $75.
Harcourt Brace and Company Limited, High Street, Foots Cray, Sidcup, Kent, DA14 SHP. England.
Editors: Prof. S.J. Young & Dr. S.E. Levinson
Submissions (outside Americas): Prof. Steve Young, Cambridge University Engineering Dept., Trumpington Street, Cambridge, CB2 1PZ, England. Email: sjy@eng.cam.ac.uk
Submissions (from Americas): Dr. Steve Levinson, Head Linguistics Research, AT&T Bell Laboratories, 600 Mountain Ave., Murray Hill, New Jersey 07974. USA. Email: sel@research.att.com

MACHINE TRANSLATION
Published 4 times annually. ISSN 0922-6567.
Subscriptions: Institutions $141 plus $16 postage; Individuals $55 (members of ACL $46).
Kluwer Academic Publishers, PO Box 322, 3300 AH Dordrecht, The Netherlands, or Kluwer Academic Publishers, PO Box 358, Accord Station, Hingham, MA 02018-0358.

SPEECH TECHNOLOGY
Published quarterly, since 1981.
Media Dimensions, New York, NY, USA

NATURAL LANGUAGE & LINGUISTIC THEORY (NALA)
Published quarterly. ISSN 0167-806X
Subscriptions: Individual $59,-/Dfl.156,-; Institutional $200,-/Dfl.383,- including p&h.
Kluwer Academic Publishers
USA: Order Dept, Box 358, Accord Station, Hingham, MA 02018-0358. Phone (617) 871-6600; Fax (617) 871-6528; E-mail: Kluwer@world.std.com
Other: P.O.Box 322, 3300 AH Dordrecht, The Netherlands. Phone (31) 78 524400; Fax (31) 78 183273; Telex: kadc nl; E-mail: vanderLinden@wkap.nl

JOURNAL OF NATURAL LANGUAGE ENGINEERING (JNLE)
Published quarterly, starting in March 1995.
Emphasis: Practical (commercial) applications of computational linguistics. Cambridge University Press, 40 West 20th Street, New York, NY 10011-4211, fax 914-937-4712. Subscriptions: individuals $59, institutions $118. (These prices for USA, Canada, and Mexico only. Outside these countries write to Cambridge University Press, The Edinburgh Building, Cambridge CB2 2RU, UK.) [Note: Subtract 20% pre-publication discount through December 1, 1994.] Editors: Branimir Boguraev, Roberto Garigliano, and John Tait Submissions: From North and South America and Oceania, submit to Branimir Boguraev . From Europe, Asia, and Africa, submit to Roberto Garigliano . See also Computational Linguistics in the ACL entry. ----------------------------------------------------------------------------- [12] Electronic mailing lists Michael Everson has updated his List of Language Lists. FTP LNGLST15.TXT from /everson on . Information Retrieval: irlist Natural Language and Knowledge Representation (moderated): nl-kr@cs.rpi.edu (formerly nl-kr@cs.rochester.edu) Gatewayed to the newsgroup comp.ai.nlang-know-rep. 
Natural Language Generation: siggen@black.bgu.ac.il LFG (Lexical-Functional Grammar): majordomo@list.stanford.edu Parsing: sigparse@cs.cmu.edu Statistics, Natural Language, and Computing: empiricists@csli.stanford.edu Colibri (weekly update on Conferences, Seminars, Jobs and Shareware in NLP and speech) colibri-request@let.ruu.nl Dependency Grammar dg@ai.uga.edu Prosody: listserv@purccvm.bitnet TEI: tei-l Text Analysis and Natural Language Applications: SCHOLAR@CUNYVM.BITNET Text Corpora: corpora-request@nora.hd.uib.no Speech production and perception: foNETiks LN: ln@frmop11.bitnet Linguist: linguist@tamvm1.tamu.edu ELSNET: elsnet-list@cogsci.ed.ac.uk Eastern (European) Language Engineering list: to join, send mail to poul_andersen@eurokom.ie Preprint archive mailing list For further information about (among other topics) submission of papers to the server, subscribing or canceling your subscription, requesting full text of any of the papers above, retrieving macro files for these papers, searching past listings, or submitting comments to the server operators, send a message: To: CMP-LG@XXX.LANL.GOV Subject: help ----------------------------------------------------------------------------- [13] Newsgroups alt.usage.english English grammar, word usages, and related topics. comp.ai.nat-lang Natural language processing by computers. comp.ai.nlang-know-rep Natural Language and Knowledge Representation. (Moderated) comp.speech Research & applications in speech science & technology. sci.lang Natural languages, communication, etc. alt.etext Electronic texts. comp.text.sgml ISO 8879 SGML structured documents markup languages comp.theory.info-retrieval comp.ai.doc-analysis.misc comp.internet.library ----------------------------------------------------------------------------- [14] Professional Organizations, Associations ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL) Natural language processing research and applications. 
Members receive the journal Computational Linguistics, ISSN 0891-2017. Regular membership $40 ($25 full-time students not earning a regular income; $25 for retired and unemployed), $10 extra for first class/air postage in North America, $20 elsewhere. For more information write to Association for Computational Linguistics, PO Box 6090, Somerset, NJ 08875, or send email to acl@cs.columbia.edu. Institutions must subscribe to the journal through MIT Press Journals, 55 Hayward Street, Cambridge, MA 02142, USA, phone 617-253-2889, fax 617-258-6779, e-mail journals-orders@mit.edu. To get information about the ACL listserver, send mail to listserv@cs.columbia.edu with index acl-l in the message body. To get the membership form, include get acl-l membership-form.txt in the message body. The ACL archive can also be accessed by anonymous ftp from ftp.cs.columbia.edu:/acl-l/. The ACL Web page is accessible through the URLs http://www.cs.columbia.edu/~acl/ ASSOCIATION FOR MACHINE TRANSLATION IN THE AMERICAS (AMTA) 655 Fifteenth Street, NW, Suite 310, Washington, DC 20005 Membership: $40 Associate members, $65 active members, Institutional $200, Corporate $400. Members receive the MT News International and the MT Yellow Pages. SIGNLL is the ACL Special Interest Group on Natural Language Learning (language acquisition and related topics). To join, send mail to walter.daelemans@kub.nl or use the forms on the SIGNLL home page. For more information, see the SIGNLL home page at the URL http://www.cs.rulimburg.nl/~antal/signll/signll-home.html COGNITIVE SCIENCE SOCIETY Membership: $50 individuals, $25 student. Add $15 overseas postage. Members receive a copy of the journal Cognitive Science without additional charge. Write to Alan Lesgold, Secretary/Treasurer, Cognitive Science Society, LRDC, University of Pittsburgh, 3939 O'Hara Street, Pittsburgh, PA 15260, fax 1-412-624-9149, email al+@pitt.edu. 
AMERICAN ASSOCIATION FOR ARTIFICIAL INTELLIGENCE (AAAI)
AAAI, 445 Burgess Drive, Menlo Park, CA 94025. phone 415-328-3123, fax 415-328-4457, info@aaai.org, membership@aaai.org
Membership includes AI Magazine, and the AI Directory:
   $50 regular, $20 student, $75 institution/library (US/Canadian)
   $75 regular, $45 student, $100 institution/library (Foreign)
AAAI has several special interest groups (SIGs) on medicine, manufacturing, business, and law. (Add $10/year for each subgroup.)
Life memberships $700 (US/Canadian), $1000 (Foreign)

-----------------------------------------------------------------------------
[15] Conferences

COLING - last conference - Kyoto, Japan (August 94)
ACL - next conference - Santa Cruz, California (Summer 96)
EACL - last conference - Dublin, Ireland (Spring 1995)
IJCAI - last conference - Montreal, Canada (Summer 1995)
AAAI
PacLing - last conference - Brisbane, Australia (Spring 1995)

-----------------------------------------------------------------------------
[16] Evaluation Competitions

MUC - ARPA Message Understanding Conference
Currently running MUC-6 (1994-95) using text articles from the Wall Street Journal Corpus. Systems compete in any or all of five categories: named entity categorisation, word sense disambiguation, mini-MUC (contents scanning, template filling), coreference identification, and predicate-argument identification.

TREC - ARPA Text Retrieval Conference
Information retrieval using NLP/statistical techniques.

-----------------------------------------------------------------------------
[17] How to join a mailing list

A: Most often, you have to send mail to the listserver at the site where the mailing list resides, and put a "subscribe" command, followed by the name of the list and your own name, in the body of the mail message. The underlined text is what you have to type in.

Example:

Mail listserv@tamvm1.tamu.edu
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Subject: some text here
         ^^^^^^^^^^^^^^
subscribe LINGUIST Dragomir R. Radev
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.
^

-----------------------------------------------------------------------------
[18] How to obtain files by anonymous ftp

A: There are many ways. The most common way, however, is using a local ftp client. Suppose you want to get the file /pub/editors/webster.tar.Z from ftp.uu.net

Here is a sample session. You type in whatever is underlined here.

$ftp ftp.uu.net
 ^^^^^^^^^^^^^^
Connected to ftp.uu.net.
220 ftp.UU.NET FTP server Thu Apr 14 15:45:10 EDT 1994) ready.
Name (ftp.uu.net:radev): anonymous
                         ^^^^^^^^^
331 Password required for anonymous.
Password: radev@cs.columbia.edu
          ^^^^^^^^^^^^^^^^^^^^^ (put your email address here)
230 Guest login ok, access restrictions apply.
ftp> cd pub/editors
     ^^^^^^^^^^^^^^
ftp> binary
     ^^^^^^
ftp> get webster.tar.Z
     ^^^^^^^^^^^^^^^^^
200 PORT command successful.
150 Opening BINARY mode data connection for webster.tar.Z (148579 bytes).
226 Transfer complete.
local: webster.tar.Z remote: webster.tar.Z
148579 bytes received in 2.2 seconds (67 Kbytes/s)
ftp> quit
     ^^^^
$

-----------------------------------------------------------------------------
[19] FTP repositories

A: Here follows a list of the most popular FTP sites that carry NLP-related materials (data, tools, etc.)

* Consortium for Lexical Research (CLR)

The Consortium for Lexical Research is designed to serve as a repository for software and resources of importance to the natural language processing research community. Sharable resources, and the task of centralizing lexical data and tools, are of foremost concern in lexical research and computational linguistics. It is our objective to help alleviate the repeated recreation of basic software tools, and to assist in making essential data sources more generally available.

CLR maintains a public ftp site, and a separate library of materials only for members of CLR. Currently CLR has about 60 members, mostly academic institutions, and almost every major natural language processing center in the U.S. belongs.
Access to the members-only materials is strictly regulated by password and userid. Our catalog of current holdings is available by using anonymous ftp to clr.nmsu.edu (128.123.1.12). The file to 'get' is "catalog.ps" for a postscript version, or "catalog" for a simple ascii version. * Linguistic Data Consortium (LDC) To order LDC materials, send mail to ldc@unagi.cis.upenn.edu or fax your order to (215) 573-2175. If you require additional information before placing your order, please call (215) 898-0464. * Oxford Text Archive (OTA) ftp ota.ox.ac.uk ota/textarchive.list the current catalogue There are two classes of texts available from this FTP server (a) texts which are in TEI format and which we can make freely available (these all appear as category P texts in the shortlist) (b) texts which are available only under our standard conditions of use, (these all appear as category U or A in the shortlist) * University of Michigan Linguistics Archive (UMICH) ftp linguistics.archive.umich.edu /linguistics moderator: John Lawler (jlawler@umich.edu) ----------------------------------------------------------------------------- [20] What are some important books in NLP General: Rustin, Randall (ed.) "Natural Language Processing", Algorithmics Press, New York, NY, 1973. Schank, Roger C., and Colby, Kenneth M. (eds.) "Computer Models of Thought and Language", W.H. Freeman, San Francisco, CA, 1973, 454 pp. Charniak, Eugene and Wilks, Yorick A. (eds.) "Computational Semantics", North-Holland, Amsterdam, Netherlands, 1976, 294 pp. Metzing, Dieter (ed.) "Frame Conceptions and Text Understanding", De Gruyter, Berlin, Germany, 1980, 167 pp. Tennant, Harry R., "Natural Language Processing", Petrocelli Books, New York, NY, 1981. Lehnert, Wendy G., and Ringle, Martin H. (eds.) "Strategies for Natural Language Processing", Lawrence Erlbaum Associates, Hillsdale, NJ, 1982, 533 pp. King, Margaret (ed.) "Parsing Natural Language", Academic Press, London, England, 1983, 308 pp. Gazdar, G. 
and Mellish, C., "Natural Language Processing in Lisp: An Introduction to Computational Linguistics", Addison-Wesley, Reading, Massachusetts, 1989. (There are three different editions of the book, one for Lisp, one for Prolog, and one for Pop-11.)

Michael A. Covington, "Natural Language Processing for Prolog Programmers", Prentice-Hall, Englewood Cliffs, NJ, 1994. ISBN 0-13-629213-5.

Grosz, Barbara J., Sparck-Jones, Karen, and Webber, Bonnie L., eds. "Readings in Natural Language Processing", Morgan Kaufmann Publishers, Los Altos, CA, 1986, 664 pages. ISBN 0-934613-11-7, $44.95.

Robert C. Berwick, "Computational Linguistics", MIT Press, Cambridge, MA, 1989, ISBN 0262-02266-4.

Brady, Michael, and Berwick, Robert C., eds. "Computational Models of Discourse", MIT Press, Cambridge, MA, 1983.

Ralph Grishman, "Computational Linguistics: An Introduction", Cambridge University Press, New York, 1986, 193 pages.

Allen, James F., "Natural Language Understanding", The Benjamin/Cummings Publishing Company, Menlo Park, California, (Addison-Wesley Publishing Company, Reading, Massachusetts), 1988, 550 pages, ISBN 0-8053-0330-8. [A new edition came out in 1994] Code for the book is available from ftp.cs.cmu.edu:/user/ai/areas/nlp/bookcode/allen/

Terry Winograd, "Language as a Cognitive Process", Addison-Wesley, Reading, MA, 1983.

Schank, R. and Abelson, R. "Scripts, Plans, Goals, and Understanding," Lawrence Erlbaum Associates, Hillsdale, New Jersey, 1977.

Terminology:

David Crystal, "A Dictionary of Linguistics and Phonetics", 3rd Edition, Basil Blackwell Publishers, New York, 1991.

Parsing:

Tomita, M. (Editor), "Current Issues in Parsing Technology", Kluwer Academic Publishers, Norwell, MA, 1991.

Marcus, M. "A Theory of Syntactic Recognition for Natural Language," The MIT Press, Cambridge, MA, 1980.

Pereira, F. and Shieber, S. "Prolog and Natural-Language Analysis," Center for the Study of Language and Information, 1987.

Probabilistic Parsing:

   Ted Briscoe and John Carroll, "Generalised Probabilistic LR Parsing of
   Natural Language (Corpora) with Unification-based Grammars",
   University of Cambridge Computer Laboratory, Technical Report Number
   224, 1991.

   Zhi Biao Wu, Loke Soo Hsu, and Chew Lim Tan, "A Survey of Statistical
   Approaches to Natural Language Processing", Technical report TRA4/92,
   Department of Information Systems and Computer Science, National
   University of Singapore, 1992.

Natural Language Understanding:

   Dyer, M. "In-Depth Understanding: A Computer Model of Integrated
   Processing for Narrative Comprehension," MIT Press, Cambridge, MA,
   1983.

   Aravind Joshi, Bonnie Webber and Ivan Sag, eds. "Elements of Discourse
   Understanding", Cambridge University Press, New York, 1981.

   Cohen, P. R., Morgan, J. and Pollack, M., editors, "Intentions in
   Communication", MIT Press, Cambridge, MA, 1990.

Natural Language Interfaces:

   Raymond C. Perrault and Barbara J. Grosz, "Natural Language
   Interfaces", Annual Review of Computer Science, volume 1, J.F. Traub,
   editor, pages 435-452, Annual Reviews Inc., Palo Alto, CA, 1986.

Natural Language Generation:

   McKeown, Kathleen R. and Swartout, William R., "Language Generation
   and Explanation", in Zock, M. and Sabah, G., editors, Advances in
   Natural Language Generation, Volume 1, Pages 1-51, Ablex Publishing
   Company, Norwood, NJ, 1988. (Overview of the state of the art in
   natural language generation.)

   Mann, W. & S. Thompson. Rhetorical Structure Theory: a theory of text
   organization.

Speech:

   Ronnie W. Smith, Spoken Natural Language Dialog Systems: A Practical
   Approach

   Jonathan Allen, M. Sharon Hunnicutt and Dennis H. Klatt, "From Text to
   Speech: The MITalk System", Cambridge University Press, 1987.
   [Synthesis, precursor of DECtalk.]

   Frank Fallside and William A. Woods (editors), "Computer Speech
   Processing", Prentice Hall, Englewood Cliffs, NJ, 1985.

   X. D. Huang, Y. Ariki and M. A.
   Jack, "Hidden Markov Models for Speech Recognition", Edinburgh
   University Press, 1990. [Analysis]

   A. Nejat Ince (editor), "Digital Speech Processing: Speech Coding,
   Synthesis, and Recognition", Kluwer Academic Publishers, Boston, 1992.
   [Analysis and Synthesis]

   Kai-Fu Lee, "Automatic Speech Recognition: The Development of the
   SPHINX System", Kluwer Academic Publishers, Boston, MA, 1989.
   [Analysis]

   Douglas O'Shaughnessy, "Speech Communication: Human and Machine",
   Addison-Wesley, MA, 1987. [Analysis and Synthesis]

   Lawrence R. Rabiner and Ronald W. Schafer, "Digital Processing of
   Speech Signals", Prentice Hall, Englewood Cliffs, NJ, 1978.
   [Analysis and Synthesis]

   Lawrence R. Rabiner and Biing-Hwang Juang, "Fundamentals of Speech
   Recognition", Prentice Hall, Englewood Cliffs, NJ, 1993.
   ISBN 0-13-015157-2. [Analysis]

   Ronald W. Schafer and John D. Markel (editors), "Speech Analysis",
   IEEE Press, New York, 1979. [Analysis]

   Alex Waibel and Kai-Fu Lee (editors), "Readings in Speech
   Recognition", Morgan Kaufmann Publishers, San Mateo, CA, 1990,
   680 pages. ISBN 1-55860-124-4, $49.95. [Analysis]

   Alex Waibel, "Prosody and Speech Recognition", Morgan Kaufmann
   Publishers, San Mateo, CA, 1988. [Analysis]

Machine Translation:

   W. John Hutchins and Harold L. Somers, "An Introduction to Machine
   Translation", Academic Press, San Diego, 1992. 362 pages,
   ISBN 0-123-62830-X.

   Bonnie J. Dorr, "Machine Translation: A View from the Lexicon",
   MIT Press, Cambridge, MA, 1993. 432 pages, ISBN 0-262-04138-3.

   Kenneth Goodman and Sergei Nirenburg, editors, "The KBMT Project: A
   Case Study in Knowledge-Based Machine Translation", Morgan Kaufmann
   Publishers, San Mateo, CA, 1991. 331 pages, ISBN 1-558-60129-5,
   $34.95.

   Arnold, D.J.; Balkan, L.; Lee Humphreys, R.; Meijer, S.; and Sadler,
   L. (1994). Machine Translation: An Introductory Guide. NCC Blackwell.

   The journal "Machine Translation" is the principal forum for current
   research.

   A review of MT systems on the market appeared in BYTE 18(1),
   January 1993.

Reversible Grammars:

   Tomek Strzalkowski, editor, "Reversible Grammar in Natural Language
   Processing", Kluwer Academic Publishers, 1993.

   Proceedings of the ACL Workshop on Reversible Grammar in Natural
   Language Processing, UC Berkeley, 1991. (See especially Remi Zajac's
   paper.)

Statistical Processing:

   Eugene Charniak, "Statistical Language Learning", MIT Press,
   Cambridge, Massachusetts, 1993, 170 pages.

Categorial Grammar (CG):

   M. Moortgat, "Categorial Investigations. Logical and Linguistic
   Aspects of the Lambek Calculus", Groningen-Amsterdam Studies in
   Semantics:9, Foris, Dordrecht, Holland, 1988.

   Richard T. Oehrle, Emmon Bach and Deirdre Wheeler, "Categorial
   Grammars and Natural Language Structures", Studies in Linguistics and
   Philosophy:32, D. Reidel Publishing Company, Dordrecht, 1988.

   Mary McGee Wood, "Categorial Grammars", Linguistic Theory Guides,
   Routledge, London, 1993.

Dependency Grammar:

   Igor' Aleksandrovich Mel'cuk, "Dependency Syntax: Theory and
   Practice", State University of New York Press, 1987.

Functional Grammar (aka Systemic Grammar):

   Michael A. K. Halliday, "An Introduction to Functional Grammar",
   Edward Arnold, London, 1985.

Generalized Phrase Structure Grammar (GPSG):

   Gerald Gazdar, Ewan Klein, Geoffrey Pullum and Ivan Sag, "Generalized
   Phrase Structure Grammar", Oxford:Blackwell, 1985.

Government and Binding (GB):

   Noam Chomsky, "Lectures on Government and Binding", Foris
   Publications, 1981.

   Vivian J. Cook, "Chomsky's Universal Grammar: An Introduction", Basil
   Blackwell Publisher, New York, 1988, 201 pages.

   Victoria Fromkin and Robert Rodman, "An Introduction to Language",
   Holt, Rinehart, and Winston, New York, 4th edition, 1988, 474 pages.

   Liliane M.V. Haegeman, "Introduction to Government and Binding
   Theory", Basil Blackwell Publishers, Oxford, 1991, 618 pages.

   Geoffrey C. Horrocks, "Generative Grammar", Longman, London, 1987,
   339 pages.

   Andrew Radford, "Transformational Grammar: A First Course", Cambridge
   University Press, New York, 1988, 625 pages.

Head-driven Phrase Structure Grammar (HPSG):

   Carl Pollard and Ivan Sag, "Information-based Syntax and Semantics",
   Stanford:CSLI, University of Chicago Press, 1987.

Lexical-Functional Grammar (LFG):

   Joan Bresnan (ed.), "The Mental Representation of Grammatical
   Relations", Cambridge:MA, MIT Press, 1982.

Tree Adjoining Grammar (TAG):

   A. Joshi, L. Levy and M. Takahashi, "Tree Adjunct Grammars" In:
   Journal of Computer and System Sciences 10:136-63, 1975.

   A. Joshi, "An Introduction to Tree Adjoining Grammars" In: Alexis
   Manaster-Ramer (ed.), "The Mathematics of Language", Benjamins,
   Philadelphia, 1987.

Cognitive Grammar:

   Ronald W. Langacker, "Foundations of Cognitive Grammar", Stanford
   University Press, 1987.

Programming for NLP:

   Pereira, Fernando C.N. and Shieber, Stuart "Prolog and
   Natural-Language Analysis," Center for the Study of Language and
   Information, Stanford, CA, 1987, 264 pp.

   Gazdar, Gerald and Mellish, Christopher S., "Natural Language
   Processing in Lisp: An Introduction to Computational Linguistics",
   Addison-Wesley, Reading, Massachusetts, 1989. (There are three
   different editions of the book, one for Lisp, one for Prolog, and one
   for Pop-11.)

   Michael A. Covington, "Natural Language Processing for Prolog
   Programmers", Prentice-Hall, Englewood Cliffs, NJ, 1994.
   ISBN 0-13-629213-5.

   Peter Norvig. Paradigms of AI Programming

Bibliographies:

   Gazdar, Gerald, Alex Franz, Karen Osborne, and Roger Evans (1987).
   "Natural Language Processing in the 1980s: A Bibliography", Center for
   the Study of Language and Information (CSLI) lecture notes no. 12,
   CSLI, Stanford, CA, 240 pp.

Miscellaneous:

   Austin, J.L. How to do things with words.

   Searle, J. Speech acts.

   Levinson, S. Pragmatics.

   Ross, Don, and Dan Brink (eds.)
   (1994) "Research in Humanities Computing 3: Selected Papers from the
   ALLC/ACH Conference, Tempe, Arizona, March 1991," Clarendon Press,
   Oxford, England.

   Gazdar, Gerald, Franz, Alex, Osborne, Karen, and Evans, Roger,
   "Natural Language Processing in the 1980s: A Bibliography", Center for
   the Study of Language and Information (CSLI) lecture notes no. 12,
   CSLI, Stanford, CA, 1987, 240 pp.

   _The Multilingual PC Directory_. By Ian Tresman. 254pp. Stamford CT:
   Knowledge Computing Ltd.

   Stefan Wermter, "Hybrid Connectionist Natural Language Processing",
   Chapman & Hall Inc, 1995.

   "Connectionist Approaches to Natural Language Processing". Edited by
   Ronan G. Reilly and Noel E. Sharkey. Lawrence Erlbaum Associates,
   1992. ISBN 0-86377-179-3.

   _Natural Language Processing_. Ed. Fernando C.N. Pereira and Barbara
   J. Grosz. A Bradford Book. Cambridge, MA, and London: The MIT Press,
   1994. Rptd from _Artificial Intelligence: An International Journal_,
   Volume 63, Numbers 1-2 (1993).

   _Research in Humanities Computing 1: Selected Papers from the ALLC/ACH
   Conference, Toronto, June 1989_. Ed. Ian Lancashire. Oxford: Clarendon
   Press, 1991.

   Peter D. Smith, _An Introduction to Text Processing_. Cambridge MA and
   London: The MIT Press, 1990. ISBN 0-262-19299-3.

   Gilbert K. Krulee, "Computer Processing of Natural Language",
   Prentice Hall. ISBN 0-13-610299-3.

   Sadock, J. Toward a linguistic theory of speech acts.

   Vanderveken, D. & J. Searle. Meaning and speech acts. (2 vols.)

-----------------------------------------------------------------------------

[21] Encyclopedia of Artificial Intelligence

A GUIDE TO COMPUTATIONAL LINGUISTICS ARTICLES IN THE
ENCYCLOPEDIA OF ARTIFICIAL INTELLIGENCE, 2nd Edition
Stuart C. Shapiro (editor)
(John Wiley & Sons, 1992)

compiled by:

   William J. Rapaport
   Department of Computer Science and Center for Cognitive Science
   State University of New York at Buffalo
   Buffalo, NY 14260
   rapaport@cs.buffalo.edu

   AUTHOR
      TITLE  PAGES

Volume 1:

   Bookman, L. A., & Alterman, R.
      Analog Semantic Features  27-28
   Alvarado, S. J.
      Argument Comprehension  30-52
   Kucera, H.
      Brown Corpus  128-130
   Srihari, S. N., & Hull, J. J.
      Character Recognition  138-150
   Ballard, B., & Jones, M.
      Computational Linguistics  203-224
   Hardt, S. L.
      Conceptual Dependency  259-265
   Hindle, D.
      Deep Structure  328-330
   Ingria, R.; Boguraev, B.; & Pustejovsky, J.
      Dictionary/Lexicon  341-365
   Scha, R.; Bruce, B. C.; & Polanyi, L.
      Discourse Understanding  365-379
   Tennant, H.
      Ellipsis  445-446
   Novak, V.
      Fuzzy Logic: Applications to Natural Language  515-521
   Woods, W. A.
      Grammar, Augmented Transition Network  552-563
   Bruce, B., & Moser, M. G.
      Grammar, Case  563-570
   Gazdar, G.
      Grammar, Generalized Phrase Structure  570-573
   Joshi, A. K.
      Grammar, Phrase Structure  573-580
   Burton, R.
      Grammar, Semantic  580-583
   Bateman, J. A.
      Grammar, Systemic  583-592
   Mallery, J. C.; Hurwitz, R.; & Duffy, G.
      Hermeneutics  596-611
   Hill, J. C.
      Language Acquisition  761-772
   Fass, D., & Pustejovsky, J.
      Lexical Decomposition  806-812
   Pustejovsky, J.
      Lexical Semantics  812-819

Volume 2:

   Nagao, M.
      Machine Translation  898-902
   Klavans, J. L., & Tzoukermann, E.
      Morphology  963-972
   McDonald, D. D.
      Natural-Language Generation  983-997
   Carbonell, J. G., & Hayes, P. J.
      Natural-Language Understanding  997-1016
   Petrick, S.
      Parsing  1099-1109
   Small, S. L.
      Parsing, Word-Expert  1109-1116
   Wilks, Y., & Fass, D.
      Preference Semantics  1183-1194
   Cruse, D. A.
      Presupposition  1194-1201
   Dyer, M. G.; Cullingford, R. E.; & Alvarado, S. J.
      Scripts  1443-1460
   Sowa, J. F.
      Semantic Networks  1493-1511
   Devlin, K. J.
      Situation Theory and Situation Semantics  1541-1547
   Briscoe, E. J.
      Speech Recognition  1553-1559
   Norvig, P.
      Story Analysis  1568-1576
   Alterman, R.
      Text Summarization  1579-1587
   Sparck Jones, K.
      Thesaurus  1605-1613
   Knight, K.
      Unification  1630-1636

Additional articles from the 1st edition (1987):

   Coelho, H.
      Grammar, Definite Clause  339-342
   Berwick, R.
      Grammar, Transformational  353-361
   Newmeyer, F. J.
      Linguistics, Competence and Performance  503-508
   Wilks, Y.
      Machine Translation  564-571
   Tennant, H.
      Menu-Based Natural Language  594-597
   Koskenniemi, K.
      Morphology  619-620
   Bates, M.
      Natural-Language Interfaces  655-660
   Riesbeck, C. K.
      Parsing, Expectation-Driven  696-701
   Keyser, S. J.
      Phonemes  744-746
   Webber, B.
      Question Answering  814-822
   Smith, B. C.
      Self-Reference  1005-1010
   Hirst, G.
      Semantics  1024-1029
   Woods, W.
      Semantics, Procedural  1029-1031
   Allen, J. F.
      Speech Acts  1062-1065
   Allen, J.
      Speech Recognition  1065-1070
   Allen, J.
      Speech Synthesis  1070-1076
   Briscoe, E. J.
      Speech Understanding  1076-1083
   Lehnert, W. G.
      Story Analysis  1090-1099

-----------------------------------------------------------------------------

[22] Machine Translation

   Globalink, Inc
   9302 Lee Highway
   Fairfax, VA, 22031, USA
   Tel: +1 703 273 5600
   Fax: +1 703 273 3866

   Archers Translation Services
   203-205 Desborough Road
   High Wycombe, Bucks., HP11 2QL, UK
   Tel: +44 494 537755
   Fax: +44 494 474001

   Gesellschaft für multilinguale Systeme (GMS)
   Balanstr. 57
   81541 Munich, Germany
   http://www.gmsmuc.de

-----------------------------------------------------------------------------

[23] What are the major accomplishments of the field

Note: This section is in a very preliminary stage.

Overall:

   Weizenbaum (1966), ELIZA
   Woods (1967), Procedural semantics
   Thorne et al. and Woods (1968-70), ATNs
   Winograd (1970), SHRDLU
   Colby, Weber & Hilf, 1971; Colby, 1975, PARRY
   Wilks (1972), Preference semantics
   Woods et al.
   (1972), LSNLIS / Lunar
   Charniak (1972), Frames and demons
   Wilks (1973), Stanford machine translation project
   Grosz (1977), Focus in task-oriented dialogues
   Marcus (1977), Deterministic parsing
   Davey (1978)
   Cohen, Phil (1979), Planning speech acts
   Allen (1980), Understanding speech acts
   McDonald (1980), MUMBLE
   McKeown (1982), TEXT
   Appelt (1982), KAMP (Integration of Functional Grammar with Discourse
      Plans)
   Pollack (1986), Plan inference
   Mann & Thompson (1987), Rhetorical Structure Theory

Conceptual Dependency:

   Schank (1969), Conceptual Dependency
   Schank, Riesbeck, Rieger, Goldman (1975), MARGIE
   Cullingford (1979), SAM
   Wilensky (1979), PAM
   DeJong (1980), FRUMP
   Lebowitz (1980), IPP
   Dyer (1982), BORIS
   Lytinen (1986), MOPTRANS
   Hovy (1986), PAULINE
   Ram (1989), AQUA
   Dehn (1989), AUTHOR/STARSHIP
   Martin (1986), Direct Memory Access Parsing (DMAP)
   Fitzgerald (1995), Indexed Concept Parsing

-----------------------------------------------------------------------------

[24] About this FAQ

This FAQ is maintained by Dragomir R. Radev from Columbia University.
Please send me all your comments, suggestions, corrections, additions,
and such to my e-mail address: radev@cs.columbia.edu

-----------------------------------------------------------------------------

Large parts of the answers to questions 10, 14, and 20 come from Mark
Kantrowitz's comp.ai FAQ. Q.2 is due to Hans Uszkoreit, Q.21 comes from
William Rapaport and Stuart Shapiro.

Partial list of contributors (in alphabetical order):

   Paul Buitelaar      paulb@zag.cs.brandeis.edu
   Russell Collingham  R.J.Collingham@durham.ac.uk
   Robert Dale         rdale@microsoft.com
   Dan Fass            fass@cs.sfu.ca
   Joshua Goodman      goodman@das.harvard.edu
   Malcolm Grandis     Malcolm@celtic.demon.co.uk
   Graeme Hirst        gh@cs.toronto.ca
   Mark Kantrowitz     mkant+ai-faq@cs.cmu.edu
   Alberto Lavelli     lavelli@irst.it
   David Pautler       pautler@ils.nwu.edu
   Ashwin Ram          ashwin@cc.gatech.edu
   Daniel Radzinski    dr@tovna.co.il
   William J.
   Rapaport            rapaport@cs.buffalo.edu
   Hinrich Schuetze    schuetze@Sante.Stanford.EDU
   Stuart Shapiro      shapiro@cs.buffalo.edu
   Kevin Thomas        kevint@cdplus.com
   R. M. Thomas        rmthomas@sciolus.cistron.nl
   Hans Uszkoreit      uszkoreit@coli.uni-sb.de
   Gertjan van Noord   vannoord@let.rug.nl

--
Dragomir R. Radev                  Graduate Research Assistant
Natural Language Processing Group  Columbia University CS Department
Home: 212-749-9770                 Office: 212-939-7121
http://www.cs.columbia.edu/~radev