
Natural Language Processing Fundamentals
Build intelligent applications that can interpret the human language to deliver impactful results
Sohom Ghosh and Dwight Gunning

Natural Language Processing Fundamentals
Copyright © 2019 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system,
or transmitted in any form or by any means, without the prior written permission of the
publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of
the information presented. However, the information contained in this book is sold
without warranty, either express or implied. Neither the author, nor Packt Publishing,
nor its dealers and distributors will be held liable for any damages caused or alleged to
be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the
companies and products mentioned in this book by the appropriate use of capitals.
However, Packt Publishing cannot guarantee the accuracy of this information.
Authors: Sohom Ghosh and Dwight Gunning
Reviewer: Ankit Malik
Managing Editor: Bhavesh Bangera
Acquisitions Editor: Koushik Sen
Production Editor: Shantanu Zagade
Editorial Board: David Barnes, Ewan Buckingham, Simon Cox, Shivangi Chatterji,
Manasa Kumar, Alex Mazonowicz, Douglas Paterson, Dominic Pereira, Shiny Poojary,
Saman Siddiqui, Erol Staveley, Ankita Thakur, Mohita Vyas and Jonathan Wray
First Published: March 2019
Production Reference: 1290319
Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street
Birmingham B3 2PB, UK
ISBN: 978-1-78995-404-3

Table of Contents
Preface
Introduction to Natural Language Processing
  Introduction
  History of NLP
  Text Analytics and NLP
    Exercise 1: Basic Text Analytics
  Various Steps in NLP
    Tokenization
    Exercise 2: Tokenization of a Simple Sentence
    PoS Tagging
    Exercise 3: PoS Tagging
    Stop Word Removal
    Exercise 4: Stop Word Removal
    Text Normalization
    Exercise 5: Text Normalization
    Spelling Correction
    Exercise 6: Spelling Correction of a Word and a Sentence
    Stemming
    Exercise 7: Stemming
    Lemmatization
    Exercise 8: Extracting the base word using Lemmatization
    NER
    Exercise 9: Treating Named Entities
    Word Sense Disambiguation
    Exercise 10: Word Sense Disambiguation
    Sentence Boundary Detection
    Exercise 11: Sentence Boundary Detection
    Activity 1: Preprocessing of Raw Text
  Kick Starting an NLP Project
    Data Collection
    Data Preprocessing
    Feature Extraction
    Model Development
    Model Assessment
    Model Deployment
  Summary
Basic Feature Extraction Methods
  Introduction
  Types of Data
    Categorizing Data Based on Structure
    Categorization of Data Based on Content
  Cleaning Text Data
    Tokenization
    Exercise 12: Text Cleaning and Tokenization
    Exercise 13: Extracting n-grams
    Exercise 14: Tokenizing Texts with Different Packages – Keras and TextBlob
    Types of Tokenizers
    Exercise 15: Tokenizing Text Using Various Tokenizers
    Issues with Tokenization
    Stemming
    RegexpStemmer