Pre-processing text
Remove punctuation
Tokenize
Split a sentence into words
Lowercase
.lower()
Remove stop words
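The four steps above can be sketched with the Python standard library alone. This is a minimal illustration, not a reference pipeline: the function name `preprocess` and the tiny `STOP_WORDS` set are assumptions made for the example, and tokenizing is reduced to splitting on whitespace.

```python
import string

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real project would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on"}


def preprocess(text):
    """Remove punctuation, tokenize, lowercase, and drop stop words."""
    # 1. Remove punctuation (ASCII punctuation only).
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    # 2. Tokenize: split the sentence into words on whitespace.
    tokens = no_punct.split()
    # 3. Lowercase each token.
    tokens = [t.lower() for t in tokens]
    # 4. Remove stop words.
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("The cat is on the mat, and the dog barks!"))
# prints ['cat', 'mat', 'dog', 'barks']
```

Lowercasing before the stop-word check matters here: otherwise "The" would not match "the" in the list.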
Stemming
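Keep the root of the word by chopping off its suffix
Problem: words with different meanings can collapse to the same stem, e.g. Meanness, Meaning -> Mean
Explicitly correlates words with similar meanings by mapping them to the same stem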
PorterStemmer
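To make the suffix-chopping idea and its pitfall concrete, here is a toy stemmer. It is not the Porter algorithm, only a rough sketch: the name `toy_stem`, the `SUFFIXES` list, and the minimum stem length of three characters are all assumptions chosen for the example.

```python
# Toy suffix-stripping stemmer: NOT the Porter algorithm, just an illustration.
SUFFIXES = ("ness", "ing", "ed", "s")  # hypothetical suffix list, checked in order


def toy_stem(word):
    """Chop the first matching suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["meanness", "meaning", "means", "mean"]:
    print(w, "->", toy_stem(w))
# every line prints "... -> mean"
```

All four words end up as "mean", even though "meanness" and "meaning" are unrelated, which is the problem noted above. In NLTK, the `PorterStemmer` class exposes a `stem(word)` method that applies the real rule set.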
Lemmatizing
Groups together the inflected forms of a word; essentially the same goal as stemming
Lemmatizing is more accurate than stemming because it uses a more informed analysis, but it takes more time
If a word is not in WordNet, it is left unchanged
WordNet lemmatizer
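The lookup-with-fallback behaviour can be sketched with a plain dictionary. The `LEMMAS` table below is a hypothetical miniature stand-in for WordNet, and `toy_lemmatize` is an illustrative name, not part of any library.

```python
# Hypothetical miniature lemma dictionary standing in for WordNet.
LEMMAS = {
    "geese": "goose",
    "mice": "mouse",
    "ran": "run",
    "running": "run",
}


def toy_lemmatize(word):
    """Return the dictionary lemma, or the word unchanged if it is unknown."""
    return LEMMAS.get(word, word)


for w in ["geese", "running", "blorp"]:
    print(w, "->", toy_lemmatize(w))
# geese -> goose
# running -> run
# blorp -> blorp
```

Unlike suffix chopping, a lookup can handle irregular forms such as "geese", at the cost of needing the dictionary. In NLTK, `WordNetLemmatizer` provides a `lemmatize(word, pos=...)` method and needs the WordNet corpus to be downloaded first.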