Getting started with natural language processing
Author
Publisher
Varies, see individual formats and editions
Publication Date
[2022]
Language
English
More Details
ISBN
9781617296765
Table of Contents
From the eBook
Intro
inside front cover
Getting Started with Natural Language Processing
Copyright
dedication
contents
front matter
preface
acknowledgments
about this book
Who should read this book
How this book is organized: A road map
About the code
liveBook discussion forum
Other online resources
about the author
about the cover illustration
1 Introduction
1.1 A brief history of NLP
1.2 Typical tasks
1.2.1 Information search
1.2.2 Advanced information search: Asking the machine precise questions
1.2.3 Conversational agents and intelligent virtual assistants
1.2.4 Text prediction and language generation
1.2.5 Spam filtering
1.2.6 Machine translation
1.2.7 Spell- and grammar checking
Summary
Solution to exercise 1.1
2 Your first NLP example
2.1 Introducing NLP in practice: Spam filtering
2.2 Understanding the task
2.2.1 Step 1: Define the data and classes
2.2.2 Step 2: Split the text into words
2.2.3 Step 3: Extract and normalize the features
2.2.4 Step 4: Train a classifier
2.2.5 Step 5: Evaluate the classifier
2.3 Implementing your own spam filter
2.3.1 Step 1: Define the data and classes
2.3.2 Step 2: Split the text into words
2.3.3 Step 3: Extract and normalize the features
2.3.4 Step 4: Train the classifier
2.3.5 Step 5: Evaluate your classifier
2.4 Deploying your spam filter in practice
Summary
Solutions to miscellaneous exercises
3 Introduction to information search
3.1 Understanding the task
3.1.1 Data and data structures
3.1.2 Boolean search algorithm
3.2 Processing the data further
3.2.1 Preselecting the words that matter: Stopwords removal
3.2.2 Matching forms of the same word: Morphological processing
3.3 Information weighing
3.3.1 Weighing words with term frequency
3.3.2 Weighing words with inverse document frequency
3.4 Practical use of the search algorithm
3.4.1 Retrieval of the most similar documents
3.4.2 Evaluation of the results
3.4.3 Deploying search algorithm in practice
Summary
Solutions to miscellaneous exercises
4 Information extraction
4.1 Use cases
4.1.1 Case 1
4.1.2 Case 2
4.1.3 Case 3
4.2 Understanding the task
4.3 Detecting word types with part-of-speech tagging
4.3.1 Understanding word types
4.3.2 Part-of-speech tagging with spaCy
4.4 Understanding sentence structure with syntactic parsing
4.4.1 Why sentence structure is important
4.4.2 Dependency parsing with spaCy
4.5 Building your own information extraction algorithm
Summary
Solutions to miscellaneous exercises
5 Author profiling as a machine-learning task
5.1 Understanding the task
5.1.1 Case 1: Authorship attribution
5.1.2 Case 2: User profiling
5.2 Machine-learning pipeline at first glance
5.2.1 Original data
5.2.2 Testing generalization behavior
5.2.3 Setting up the benchmark
5.3 A closer look at the machine-learning pipeline
5.3.1 Decision Trees classifier basics
5.3.2 Evaluating which tree is better using node impurity
5.3.3 Selection of the best split in Decision Trees
5.3.4 Decision Trees on language data
Summary
Solutions to miscellaneous exercises
6 Linguistic feature engineering for author profiling
6.1 Another close look at the machine-learning pipeline
6.1.1 Evaluating the performance of your classifier
6.1.2 Further evaluation measures
6.2 Feature engineering for authorship attribution
6.2.1 Word and sentence length statistics as features
6.2.2 Counts of stopwords and proportion of stopwords as features
6.2.3 Distributions of parts of speech as features
6.2.4 Distribution of word suffixes as features
6.2.5 Unique words as features
6.3 Practical use of authorship attribution and user profiling
Summary
7 Your first sentiment analyzer using sentiment lexicons
7.1 Use cases
7.2 Understanding your task
7.2.1 Aggregating sentiment score with the help of a lexicon
7.2.2 Learning to detect sentiment in a data-driven way
7.3 Setting up the pipeline: Data loading and analysis
7.3.1 Data loading and preprocessing
7.3.2 A closer look into the data
7.4 Aggregating sentiment scores with a sentiment lexicon
7.4.1 Collecting sentiment scores from a lexicon
7.4.2 Applying sentiment scores to detect review polarity
Summary
Solutions to exercises
8 Sentiment analysis with a data-driven approach
8.1 Addressing multiple senses of a word with SentiWordNet
8.2 Addressing dependence on context with machine learning
8.2.1 Data preparation
8.2.2 Extracting features from text
8.2.3 Scikit-learn's machine-learning pipeline
8.2.4 Full-scale evaluation with cross-validation
8.3 Varying the length of the sentiment-bearing features
8.4 Negation handling for sentiment analysis
8.5 Further practice
Summary
9 Topic analysis
9.1 Topic classification as a supervised machine-learning task
9.1.1 Data
9.1.2 Topic classification with Naïve Bayes
9.1.3 Evaluation of the results
9.2 Topic discovery as an unsupervised machine-learning task
9.2.1 Unsupervised ML approaches
9.2.2 Clustering for topic discovery
9.2.3 Evaluation of the topic clustering algorithm
Summary
Solutions to miscellaneous exercises
10 Topic modeling
10.1 Topic modeling with latent Dirichlet allocation
10.1.1 Exercise 10.1: Question 1 solution
10.1.2 Exercise 10.1: Question 2 solution
10.1.3 Estimating parameters for the LDA
10.1.4 LDA as a generative model
10.2 Implementation of the topic modeling algorithm
10.2.1 Loading the data
10.2.2 Preprocessing the data
10.2.3 Applying the LDA model
10.2.4 Exploring the results
Summary
Solutions to miscellaneous exercises
11 Named-entity recognition
11.1 Named entity recognition: Definitions and challenges
11.1.1 Named entity types
11.1.2 Challenges in named entity recognition
11.2 Named-entity recognition as a sequence labeling task
11.2.1 The basics: BIO scheme
11.2.2 What does it mean for a task to be sequential?
11.2.3 Sequential solution for NER
11.3 Practical applications of NER
11.3.1 Data loading and exploration
11.3.2 Named entity types exploration with spaCy
11.3.3 Information extraction revisited
11.3.4 Named entities visualization
Summary
Conclusion
Solutions to miscellaneous exercises
Appendix A Installation instructions
index
inside back cover