— Pages 4-5, Computational Linguistics: An Introduction, 1986.
Large datasets and fast computers mean that new and different things can be discovered from
large collections of text by writing and running software. In the 1990s, statistical methods and
statistical machine learning began to displace, and eventually replaced, the classical top-down
rule-based approaches to language, primarily because of their better results, speed, and
robustness. The statistical approach to studying natural language now dominates the field; it
may define the field.
Data-driven methods for natural language processing have now become so popular
that they must be considered mainstream approaches to computational linguistics.
... A strong contributing factor to this development is undoubtedly the increasing
amount of available electronically stored data to which these methods can be applied;
another factor might be a certain disenchantment with approaches relying exclusively
on hand-crafted rules, due to their observed brittleness.
— Page 358, The Oxford Handbook of Computational Linguistics, 2005.
The statistical approach to natural language is not limited to statistics per se; it also includes
advanced inference methods such as those used in applied machine learning.
... understanding natural language requires large amounts of knowledge about
morphology, syntax, semantics and pragmatics as well as general knowledge about
the world. Acquiring and encoding all of this knowledge is one of the fundamental
impediments to developing effective and robust language systems. Like the statistical
methods ... machine learning methods offer the promise of the automatic acquisition of
this knowledge from annotated or unannotated language corpora.
— Page 377, The Oxford Handbook of Computational Linguistics, 2005.
1.3.3 Statistical Natural Language Processing
Computational linguistics also became known by the name natural language processing, or
NLP, to reflect the more engineering-based or empirical approach of the statistical methods. The
statistical dominance of the field also often leads to NLP being described as Statistical Natural
Language Processing, perhaps to distance it from the classical computational linguistics methods.
I view computational linguistics as having both a scientific and an engineering side.
The engineering side of computational linguistics, often called natural language
processing (NLP), is largely concerned with building computational tools that do
useful things with language, e.g., machine translation, summarization, question-
answering, etc. Like any engineering discipline, natural language processing draws
on a variety of different scientific disciplines.
— How the statistical revolution changes (computational) linguistics, 2009.
Linguistics is a large topic of study, and, although the statistical approach to NLP has shown
great success in some areas, there is still room for, and great benefit from, the classical
top-down methods.