NLTK – stemming


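A word stem is the base part of a word that is left once endings such as -ing, -ed or -s are stripped off. Stemming is a kind of normalization, but a linguistic one: different forms of the same word are reduced to one common form.
For example, the stem of the word waiting is wait.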


Given words, NLTK can find the stems.

Start by defining some words:

words = ["game","gaming","gamed","games"]

Import the Porter stemmer:

from nltk.stem import PorterStemmer

Then create a stemmer and call stem() on each word in the list:

from nltk.stem import PorterStemmer

words = ["game","gaming","gamed","games"]

# One stemmer object can be reused for every word
ps = PorterStemmer()

# Print the stem of each word on its own line
for word in words:
    print(ps.stem(word))

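You can stem the words of a sentence too, by splitting it into tokens first:

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# word_tokenize needs the Punkt tokenizer data. If it is missing,
# download it once with nltk.download('punkt')
# (the resource is named 'punkt_tab' in recent NLTK releases).

ps = PorterStemmer()

sentence = "gaming, the gamers play games"

# Split the sentence into word and punctuation tokens
words = word_tokenize(sentence)

# Print each token next to its stem
for word in words:
    print(word + ":" + ps.stem(word))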
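To see why stemming counts as normalization, it can help to group words by their stem. The following is only a small sketch: the helper name group_by_stem is made up for this example and is not part of NLTK. It takes a plain list of words, so no tokenizer data is needed.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer


def group_by_stem(words):
    """Map each stem to the original words that reduce to it.

    Hypothetical helper for illustration; not an NLTK function.
    """
    ps = PorterStemmer()
    groups = defaultdict(list)
    for word in words:
        groups[ps.stem(word)].append(word)
    return dict(groups)


groups = group_by_stem(["game", "gaming", "gamed", "games", "gamers"])

# Print every stem followed by the words that share it
for stem, members in groups.items():
    print(stem + ": " + ", ".join(members))
```

With the Porter stemmer, the first four words end up under the same stem, while gamers gets a stem of its own.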

NLTK ships other stemming algorithms as well, such as Snowball (SnowballStemmer) and Lancaster (LancasterStemmer), but Porter (PorterStemmer) is the most popular.
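If you want to compare algorithms, one possible approach is to run the same words through several stemmers side by side. This sketch assumes the SnowballStemmer and LancasterStemmer classes from nltk.stem; the dictionary keys and the results variable are just names chosen for the example. Results can differ between algorithms, so it is worth checking the output on your own words.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# The labels on the left are arbitrary names chosen for this example
stemmers = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),
    "lancaster": LancasterStemmer(),
}

words = ["game", "gaming", "gamed", "games"]

# Stem every word with every stemmer
results = {
    name: [stemmer.stem(word) for word in words]
    for name, stemmer in stemmers.items()
}

# Print one line per stemmer so the stems can be compared
for name, stems in results.items():
    print(name + ": " + ", ".join(stems))
```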