NLTK – stemming


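A word stem is the base part of a word that remains once suffixes such as -ing, -ed, or -s are removed. Stemming is a form of linguistic normalization: it reduces different forms of a word to one common stem.
For example, the stem of the word "waiting" is "wait".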

Given words, NLTK can find the stems.
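
As a minimal sketch of this normalization idea, the snippet below runs NLTK's PorterStemmer over several forms of one word and collects the distinct stems in a set. The word list and variable names are illustrative choices for this sketch only.

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()

# Illustrative word forms (an assumption for this sketch)
forms = ["wait", "waits", "waited", "waiting"]

# Stem every form and keep only the distinct results
stems = {ps.stem(form) for form in forms}
print(stems)
```

When all forms reduce to the same stem, the set holds a single entry.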

Start by defining some words:

words = ["game","gaming","gamed","games"]

We import the stemmer class:

from nltk.stem import PorterStemmer

Then stem each word in the list:

from nltk.stem import PorterStemmer

words = ["game", "gaming", "gamed", "games"]
ps = PorterStemmer()

# Print the stem of each word
for word in words:
    print(ps.stem(word))

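You can also stem the words of a sentence by splitting it into tokens first:

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# word_tokenize needs the Punkt tokenizer data; if it is missing,
# run nltk.download('punkt') once ('punkt_tab' in recent NLTK releases).
ps = PorterStemmer()

sentence = "gaming, the gamers play games"
words = word_tokenize(sentence)

# Print each token next to its stem
for word in words:
    print(word + ":" + ps.stem(word))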
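
One common use of stems is counting how often a base form occurs, regardless of inflection. The sketch below is one possible way to do that. It swaps word_tokenize for a simple regular expression (an assumption, chosen so the example needs no tokenizer data) and tallies the stems with collections.Counter; the names tokens and stem_counts are illustrative.

```python
import re
from collections import Counter

from nltk.stem import PorterStemmer

ps = PorterStemmer()

sentence = "gaming, the gamers play games"

# Simplified tokenizer (assumption): lowercase runs of letters only
tokens = re.findall(r"[a-z]+", sentence.lower())

# Count how many tokens share each stem
stem_counts = Counter(ps.stem(token) for token in tokens)

for stem, count in stem_counts.most_common():
    print(stem, count)
```

Because "gaming" and "games" share a stem, they are counted together rather than as separate words.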

NLTK offers other stemming algorithms as well, but Porter (PorterStemmer) is the most popular.
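
For comparison, NLTK also ships SnowballStemmer and LancasterStemmer in the same nltk.stem package. The following sketch runs all three over the same words so the outputs can be inspected side by side. The stemmers dictionary and the word list are illustrative, and the algorithms may not agree on every word.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# Three stemmers from nltk.stem; Snowball needs a language name
stemmers = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),
    "lancaster": LancasterStemmer(),
}

words = ["game", "gaming", "gamed", "games"]

# Map each stemmer name to the list of stems it produces
results = {
    name: [stemmer.stem(word) for word in words]
    for name, stemmer in stemmers.items()
}

for name, stems in results.items():
    print(name, stems)
```

Every stemmer exposes the same stem() method, so switching algorithms only requires changing which object is constructed.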

One Response to NLTK – stemming

  1. Itsthanga says:

    I tried the word "identifying" and I am getting "identifi" as the output.