Speech and Language Processing: Highlights of the Second Edition of a Classic NLP Textbook
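Speech and Language Processing, by Daniel Jurafsky and James H. Martin, is a classic textbook on computer speech and language processing. The second edition covers natural language processing (NLP), computational linguistics, and speech recognition, technologies that aim to let computers perform useful tasks involving human language.
According to the book, the goal of natural language processing is to enable human-machine communication, to improve human-human communication, or to do useful processing of text and speech. One example application is the conversational agent, such as the HAL 9000 computer in the film 2001: A Space Odyssey, an artificial agent with an advanced ability to understand and use human language.
The book introduces the foundations of NLP, including morphological analysis, syntactic parsing, semantic interpretation, and discourse analysis, which are core components of language-processing systems. It also discusses applications such as statistical machine translation, speech recognition, and dialogue systems.
The second edition does not contain material added in later third-edition drafts, but it still offers a broad theoretical framework and many worked examples, and it suits researchers and engineers who need an in-depth treatment of NLP. The draft excerpted below carries the notice "Do not cite without permission."
The book explains the basic principles of natural language processing and how to apply them in practice. It is a useful reference both for professionals building a career in the field and for hobbyists interested in artificial intelligence.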
16 Chapter 1. Introduction
gence (IJCAI) meetings. Artificial intelligence journals that periodically feature work on speech and language processing include Machine Learning, Journal of Machine Learning Research, and the Journal of Artificial Intelligence Research.
There are a fair number of textbooks available covering various aspects of speech and language processing. Manning and Schütze (1999) (Foundations of Statistical Natural Language Processing) focuses on statistical models of tagging, parsing, disambiguation, collocations, and other areas. Charniak (1993) (Statistical Language Learning) is an accessible, though older and less-extensive, introduction to similar material. Manning et al. (2008) focuses on information retrieval, text classification, and clustering. NLTK, the Natural Language Toolkit (Bird and Loper, 2004), is a suite of Python modules and data for natural language processing, together with a Natural Language Processing book based on the NLTK suite. Allen (1995) (Natural Language Understanding) provides extensive coverage of language processing from the AI perspective. Gazdar and Mellish (1989) (Natural Language Processing in Lisp/Prolog) covers especially automata, parsing, features, and unification and is available free online. Pereira and Shieber (1987) gives a Prolog-based introduction to parsing and interpretation. Russell and Norvig (2002) is an introduction to artificial intelligence that includes chapters on natural language processing. Partee et al. (1990) has a very broad coverage of mathematical linguistics. A historically significant collection of foundational papers can be found in Grosz et al. (1986) (Readings in Natural Language Processing).
Of course, a wide variety of speech and language processing resources are now available on the Web. Pointers to these resources are maintained on the home page for this book at:
http://www.cs.colorado.edu/~martin/slp.html
Section 1.7. Summary 17
Allen, J. (1995). Natural Language Understanding. Benjamin
Cummings, Menlo Park, CA.
Backus, J. W. (1959). The syntax and semantics of the proposed
international algebraic language of the Zurich ACM-GAMM
Conference. In Information Processing: Proceedings of the
International Conference on Information Processing, Paris,
pp. 125–132. UNESCO.
Berger, A., Della Pietra, S. A., and Della Pietra, V. J. (1996). A
maximum entropy approach to natural language processing.
Computational Linguistics, 22(1), 39–71.
Bird, S. and Loper, E. (2004). NLTK: The Natural Language
Toolkit. In Proceedings of the ACL 2004 demonstration ses-
sion, Barcelona, Spain, pp. 214–217.
Bledsoe, W. W. and Browning, I. (1959). Pattern recognition
and reading by machine. In 1959 Proceedings of the Eastern
Joint Computer Conference, pp. 225–232. Academic, New
York.
Bresnan, J. and Kaplan, R. M. (1982). Introduction: Grammars
as mental representations of language. In Bresnan, J. (Ed.),
The Mental Representation of Grammatical Relations. MIT
Press, Cambridge, MA.
Brown, P. F., Cocke, J., Della Pietra, S. A., Della Pietra, V. J.,
Jelinek, F., Lafferty, J. D., Mercer, R. L., and Roossin, P. S.
(1990). A statistical approach to machine translation. Com-
putational Linguistics, 16(2), 79–85.
Carlson, L., Marcu, D., and Okurowski, M. E. (2001). Build-
ing a discourse-tagged corpus in the framework of rhetorical
structure theory. In Proceedings of SIGDIAL.
Charniak, E. (1993). Statistical Language Learning. MIT Press.
Chomsky, N. (1956). Three models for the description of language.
IRE Transactions on Information Theory, 2(3), 113–124.
Chomsky, N. (1959). A review of B. F. Skinner’s “Verbal Be-
havior”. Language, 35, 26–58.
Church, K. W. (1980). On memory limitations in natural lan-
guage processing. Master’s thesis, MIT. Distributed by the
Indiana University Linguistics Club.
Cohen, P. R. and Perrault, C. R. (1979). Elements of a plan-
based theory of speech acts. Cognitive Science, 3(3), 177–
212.
Colmerauer, A. (1970). Les systèmes-q ou un formalisme pour
analyser et synthétiser des phrases sur ordinateur. Internal publication
43, Département d’informatique de l’Université de Montréal†.
Colmerauer, A. (1975). Les grammaires de métamorphose GIA.
Internal publication, Groupe Intelligence artificielle, Faculté
des Sciences de Luminy, Université Aix-Marseille II, France,
Nov 1975. English version, Metamorphosis grammars. In L.
Bolc, (Ed.), Natural Language Communication with Computers,
Lecture Notes in Computer Science 63, Springer Verlag,
Berlin, 1978, pp. 133–189.
Cullingford, R. E. (1981). SAM. In Schank, R. C. and Riesbeck,
C. K. (Eds.), Inside Computer Understanding: Five Programs
plus Miniatures, pp. 75–119. Lawrence Erlbaum, Hillsdale,
NJ.
Davis, K. H., Biddulph, R., and Balashek, S. (1952). Automatic
recognition of spoken digits. Journal of the Acoustical Society
of America, 24(6), 637–642.
Dejean, H. and Tjong Kim Sang, E. F. (2001). Introduction to
the CoNLL-2001 shared task: Clause identification. In Proceedings
of CoNLL-2001.
Fillmore, C. J. (1968). The case for case. In Bach, E. W. and
Harms, R. T. (Eds.), Universals in Linguistic Theory, pp. 1–
88. Holt, Rinehart & Winston, New York.
Francis, W. N. (1979). A tagged corpus – problems and
prospects. In Greenbaum, S., Leech, G., and Svartvik, J.
(Eds.), Studies in English linguistics for Randolph Quirk, pp.
192–209. Longman, London and New York.
Francis, W. N. and Kučera, H. (1982). Frequency Analysis of
English Usage. Houghton Mifflin, Boston.
Gazdar, G. and Mellish, C. (1989). Natural Language Process-
ing in LISP. Addison Wesley.
Grosz, B. J. (1977). The representation and use of focus in a
system for understanding dialogs. In IJCAI-77, Cambridge,
MA, pp. 67–76. Morgan Kaufmann. Reprinted in Grosz et al.
(1986).
Grosz, B. J., Jones, K. S., and Webber, B. L. (Eds.). (1986).
Readings in Natural Language Processing. Morgan Kauf-
mann, Los Altos, Calif.
Hajič, J. (1998). Building a Syntactically Annotated Corpus:
The Prague Dependency Treebank, pp. 106–132. Karolinum,
Prague/Praha.
Harris, Z. S. (1962). String Analysis of Sentence Structure.
Mouton, The Hague.
Hobbs, J. R. (1978). Resolving pronoun references. Lingua,
44, 311–338. Reprinted in Grosz et al. (1986).
Joshi, A. K. and Hopely, P. (1999). A parser from antiquity. In
Kornai, A. (Ed.), Extended Finite State Models of Language,
pp. 6–15. Cambridge University Press, Cambridge.
Kaplan, R. M. and Kay, M. (1981). Phonological rules and
finite-state transducers. Paper presented at the Annual meeting
of the Linguistic Society of America. New York.
Karttunen, L. (1999). Comments on Joshi. In Kornai, A. (Ed.),
Extended Finite State Models of Language, pp. 16–18. Cam-
bridge University Press, Cambridge.
Kay, M. (1979). Functional grammar. In BLS-79, Berkeley, CA,
pp. 142–158.
Kilgarriff, A. and Palmer, M. (Eds.). (2000). Computers and the
Humanities: Special Issue on SENSEVAL, Vol. 34. Kluwer.
Kintsch, W. (1974). The Representation of Meaning in Memory.
Wiley, New York.
Kleene, S. C. (1951). Representation of events in nerve nets
and finite automata. Tech. rep. RM-704, RAND Corporation.
RAND Research Memorandum†.
Kleene, S. C. (1956). Representation of events in nerve nets and
finite automata. In Shannon, C. and McCarthy, J. (Eds.),
Automata Studies, pp. 3–41. Princeton University Press,
Princeton, NJ.
Koenig, W., Dunn, H. K., and Lacy, L. Y. (1946). The sound
spectrograph. Journal of the Acoustical Society of America,
18, 19–49.
Kučera, H. and Francis, W. N. (1967). Computational analysis
of present-day American English. Brown University Press,
Providence, RI.
Lehnert, W. G. (1977). A conceptual theory of question an-
swering. In IJCAI-77, Cambridge, MA, pp. 158–164. Morgan
Kaufmann.
Manning, C. D., Raghavan, P., and Schütze, H. (2008).
Introduction to Information Retrieval. Cambridge University
Press, Cambridge, UK.
Manning, C. D. and Schütze, H. (1999). Foundations of
Statistical Natural Language Processing. MIT Press, Cambridge,
MA.
Marcus, M. P., Santorini, B., and Marcinkiewicz, M. A. (1993).
Building a large annotated corpus of English: The Penn tree-
bank. Computational Linguistics, 19(2), 313–330.
McCulloch, W. S. and Pitts, W. (1943). A logical calculus of
ideas immanent in nervous activity. Bulletin of Mathematical
Biophysics, 5, 115–133. Reprinted in Neurocomputing:
Foundations of Research, ed. by J. A. Anderson and E. Rosenfeld.
MIT Press 1988.
Merton, R. K. (1961). Singletons and multiples in scientific
discovery. American Philosophical Society Proceedings, 105(5),
470–486.
Miltsakaki, E., Prasad, R., Joshi, A. K., and Webber, B. L.
(2004). The Penn Discourse Treebank. In LREC-04.
Mosteller, F. and Wallace, D. L. (1964). Inference and Disputed
Authorship: The Federalist. Springer-Verlag, New York. 2nd
Edition appeared in 1984 and was called Applied Bayesian
and Classical Inference.
Naur, P., Backus, J. W., Bauer, F. L., Green, J., Katz, C.,
McCarthy, J., Perlis, A. J., Rutishauser, H., Samelson, K.,
Vauquois, B., Wegstein, J. H., van Wijngaarden, A., and
Woodger, M. (1960). Report on the algorithmic language
ALGOL 60. Communications of the ACM, 3(5), 299–314.
Revised in CACM 6:1, 1–17, 1963.
Norman, D. A. and Rumelhart, D. E. (1975). Explorations in
Cognition. Freeman, San Francisco, CA.
Och, F. J. and Ney, H. (2003). A systematic comparison of var-
ious statistical alignment models. Computational Linguistics,
29(1), 19–51.
Ogburn, W. F. and Thomas, D. S. (1922). Are inventions in-
evitable? A note on social evolution. Political Science Quar-
terly, 37, 83–98.
Palmer, M., Fellbaum, C., Cotton, S., Delfs, L., and Dang,
H. T. (2001). English tasks: All-words and verb lexical sam-
ple. In Proceedings of SENSEVAL-2: Second International
Workshop on Evaluating Word Sense Disambiguation Sys-
tems, Toulouse, France.
Palmer, M., Kingsbury, P., and Gildea, D. (2005). The proposition
bank: An annotated corpus of semantic roles.
Computational Linguistics, 31(1), 71–106.
Partee, B. H., ter Meulen, A., and Wall, R. E. (1990). Mathe-
matical Methods in Linguistics. Kluwer, Dordrecht.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems:
Networks of Plausible Inference. Morgan Kaufmann,
San Mateo, CA.
Pereira, F. C. N. and Shieber, S. M. (1987). Prolog and Natural-
Language Analysis, Vol. 10 of CSLI Lecture Notes. Chicago
University Press, Chicago.
Pereira, F. C. N. and Warren, D. H. D. (1980). Definite clause
grammars for language analysis—a survey of the formalism
and a comparison with augmented transition networks.
Artificial Intelligence, 13(3), 231–278.
Perrault, C. R. and Allen, J. (1980). A plan-based analysis of
indirect speech acts. American Journal of Computational Lin-
guistics, 6(3-4), 167–182.
Quillian, M. R. (1968). Semantic memory. In Minsky, M. (Ed.),
Semantic Information Processing, pp. 227–270. MIT Press,
Cambridge, MA.
Rabiner, L. R. and Juang, B. (1993). Fundamentals of Speech
Recognition. Prentice Hall, Englewood Cliffs, NJ.
Reeves, B. and Nass, C. (1996). The Media Equation: How
People Treat Computers, Television, and New Media Like Real
People and Places. Cambridge University Press, Cambridge.
Russell, S. and Norvig, P. (2002). Artificial Intelligence: A
Modern Approach. Prentice Hall, Englewood Cliffs, NJ. Sec-
ond edition.
Schank, R. C. (1972). Conceptual dependency: A theory of nat-
ural language processing. Cognitive Psychology, 3, 552–631.
Schank, R. C. and Abelson, R. P. (1977). Scripts, Plans, Goals
and Understanding. Lawrence Erlbaum, Hillsdale, NJ.
Schank, R. C. and Riesbeck, C. K. (Eds.). (1981). Inside
Computer Understanding: Five Programs plus Miniatures.
Lawrence Erlbaum, Hillsdale, NJ.
Searle, J. R. (1980). Minds, brains, and programs. Behavioral
and Brain Sciences, 3, 417–457.
Shannon, C. E. (1948). A mathematical theory of communica-
tion. Bell System Technical Journal, 27(3), 379–423. Contin-
ued in following volume.
Shieber, S. M. (1994). Lessons from a restricted Turing test.
Communications of the ACM, 37(6), 70–78.
Sidner, C. L. (1983). Focusing in the comprehension of definite
anaphora. In Brady, M. and Berwick, R. C. (Eds.),
Computational Models of Discourse, pp. 267–330. MIT Press,
Cambridge, MA.
Simmons, R. F. (1973). Semantic networks: Their computa-
tion and use for understanding English sentences. In Schank,
R. C. and Colby, K. M. (Eds.), Computer Models of Thought
and Language, pp. 61–113. W.H. Freeman and Co., San Fran-
cisco.
Turing, A. M. (1936). On computable numbers, with an ap-
plication to the Entscheidungsproblem. Proceedings of the
London Mathematical Society, 42, 230–265. Read to the So-
ciety in 1936, but published in 1937. Correction in volume 43,
544–546.
Turing, A. M. (1950). Computing machinery and intelligence.
Mind, 59, 433–460.
Vapnik, V. N. (1995). The Nature of Statistical Learning The-
ory. Springer-Verlag, New York.
Voorhees, E. M. and Tice, D. M. (1999). The TREC-8 ques-
tion answering track evaluation. Proceedings of the TREC-8
Workshop.
Weizenbaum, J. (1966). ELIZA – A computer program for the
study of natural language communication between man and
machine. Communications of the ACM, 9(1), 36–45.
Weizenbaum, J. (1976). Computer Power and Human Reason:
From Judgement to Calculation. W.H. Freeman and Company,
San Francisco.
Wilensky, R. (1983). Planning and Understanding. Addison-
Wesley, Reading, MA.
Wilks, Y. (1975a). Preference semantics. In Keenan, E. L.
(Ed.), The Formal Semantics of Natural Language, pp. 329–
350. Cambridge Univ. Press, Cambridge.
Wilks, Y. (1975b). A preferential, pattern-seeking, semantics
for natural language inference. Artificial Intelligence, 6(1),
53–74.
Winograd, T. (1972). Understanding natural language. Cogni-
tive Psychology, 3(1), 1–191. Reprinted as a book by Aca-
demic Press, 1972.
Woods, W. A. (1967). Semantics for a Question-Answering
System. Ph.D. thesis, Harvard University.
Woods, W. A. (1973). Progress in natural language understand-
ing. In Proceedings of AFIPS National Conference, pp. 441–
450.
Speech and Language Processing: An introduction to natural language processing,
computational linguistics, and speech recognition. Daniel Jurafsky & James H. Martin.
Copyright © 2007, All rights reserved. Draft of October 11, 2007. Do not cite without permission.
2 REGULAR EXPRESSIONS AND AUTOMATA
User1: Men are all alike.
ELIZA1: IN WHAT WAY
User2: They’re always bugging us about something or other.
ELIZA2: CAN YOU THINK OF A SPECIFIC EXAMPLE
User3: Well, my boyfriend made me come here.
ELIZA3: YOUR BOYFRIEND MADE YOU COME HERE
User4: He says I’m depressed much of the time.
ELIZA4: I AM SORRY TO HEAR YOU ARE DEPRESSED.
Weizenbaum (1966)
Imagine that you have become a passionate fan of woodchucks. Desiring more information on this celebrated woodland creature, you turn to your favorite Web browser and type in woodchuck. Your browser returns a few sites. You have a flash of inspiration and type in woodchucks. This time you discover “interesting links to woodchucks and lemurs” and “all about Vermont’s unique, endangered species”. Instead of having to do this search twice, you would have rather typed one search command specifying something like woodchuck with an optional final s. Or perhaps you might want to search for all the prices in some document; you might want to see all strings that look like $199 or $25 or $24.99. In this chapter we introduce the regular expression, the standard notation for characterizing text sequences. The regular expression is used for specifying text strings in situations like this Web-search example, and in other information retrieval applications, but also plays an important role in word-processing, computation of frequencies from corpora, and other such tasks.
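As a rough illustration of the two searches just described, the sketch below uses Python's standard `re` module. The pattern names, the helper functions, and the sample sentence are assumptions made for this example and do not come from the book; the price pattern assumes a dollar sign, one or more digits, and an optional two-digit cents part.

```python
import re

# "woodchuck" with an optional final "s": the "?" makes the preceding "s" optional.
WOODCHUCK = re.compile(r"\bwoodchucks?\b", re.IGNORECASE)

# A price: a dollar sign, one or more digits, and optionally a dot plus two digits.
PRICE = re.compile(r"\$\d+(?:\.\d\d)?")


def find_woodchucks(text):
    """Return every occurrence of 'woodchuck' or 'woodchucks' in text."""
    return WOODCHUCK.findall(text)


def find_prices(text):
    """Return every substring of text that looks like a dollar price."""
    return PRICE.findall(text)


# Hypothetical sample text, made up for this sketch.
sample = "One woodchuck costs $199; two woodchucks cost $24.99 each, or $25 used."
print(find_woodchucks(sample))
print(find_prices(sample))
```

A single pattern replaces the two separate searches, and the same idea extends to the price strings: the optional group plays the role of the optional final s.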
After we have defined regular expressions, we show how they can be implemented via the finite-state automaton. The finite-state automaton is not only the mathematical device used to implement regular expressions, but also one of the most significant tools of computational linguistics. Variations of automata such as finite-state transducers, Hidden Markov Models, and N-gram grammars are important components of applications that we will introduce in later chapters, including speech recognition and synthesis, machine translation, spell-checking, and information extraction.
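To make the link between regular expressions and finite-state automata concrete, here is a minimal sketch of a deterministic automaton that accepts exactly the strings woodchuck and woodchucks. The function names and the dictionary-based transition table are assumptions chosen for this illustration; this is not the book's algorithm, only one simple way such a recognizer could be written.

```python
def build_fsa(word, optional_suffix):
    """Build a transition table for `word` followed by an optional suffix character.

    States are integers: state i means "the first i characters of word were read".
    Returns (transitions, accepting_states).
    """
    transitions = {}
    for state, ch in enumerate(word):
        transitions[(state, ch)] = state + 1
    last = len(word)
    # One extra arc for the optional final character.
    transitions[(last, optional_suffix)] = last + 1
    # Accept after the bare word, and after the word plus the suffix.
    return transitions, {last, last + 1}


def accepts(transitions, accepting, text, start=0):
    """Run the automaton over text; reject as soon as no transition applies."""
    state = start
    for ch in text:
        key = (state, ch)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting


TRANSITIONS, ACCEPTING = build_fsa("woodchuck", "s")

for candidate in ["woodchuck", "woodchucks", "woodchuckss", "woodchuc"]:
    print(candidate, accepts(TRANSITIONS, ACCEPTING, candidate))
```

The table has one state per character read, and marking two states as accepting is what encodes the optional final s.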