
NLTK Stemming



A word stem is the base or root form of a word. Stemming, the process of reducing a word to its stem, is a common way to normalize words.
For instance, the word “waiting” has the stem “wait.”

NLTK (Natural Language Toolkit) provides stemmers that compute word stems for you, which makes it useful for text normalization.


Understanding Stemming in NLTK

To demonstrate stemming, let’s consider a set of related words:

words = ["game", "gaming", "gamed", "games"]

First, import the required modules from NLTK:

from nltk.stem import PorterStemmer
# word_tokenize relies on NLTK's Punkt tokenizer data, installed with nltk.download()
from nltk.tokenize import sent_tokenize, word_tokenize

Using the above modules, you can stem the words in the provided list:

ps = PorterStemmer()
for word in words:
    print(ps.stem(word))
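
The loop above prints one stem per line. As a quick check, the sketch below gathers the stems into a dictionary (the name `stems` is only for illustration) to show that all four forms reduce to the same stem:

```python
from nltk.stem import PorterStemmer

words = ["game", "gaming", "gamed", "games"]
ps = PorterStemmer()

# Map each word to its Porter stem.
stems = {word: ps.stem(word) for word in words}
for word, stem in stems.items():
    print(f"{word} -> {stem}")

# All four forms should share a single stem.
print(set(stems.values()))  # {'game'}
```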


Stemming can also be applied to a whole sentence by tokenizing it first and then stemming each token:

sentence = "gaming, the gamers play games"
words = word_tokenize(sentence)
for word in words:
    print(word + ":" + ps.stem(word))
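
Note that `word_tokenize` needs NLTK's Punkt tokenizer data to be installed. As an alternative sketch, assuming you only care about the alphanumeric tokens, `RegexpTokenizer` splits the sentence with a regular expression and needs no extra download. The pattern `\w+` is an assumption made for this example, and it drops punctuation such as the comma:

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

ps = PorterStemmer()
# Keep runs of word characters only; punctuation is discarded.
tokenizer = RegexpTokenizer(r"\w+")

sentence = "gaming, the gamers play games"
tokens = tokenizer.tokenize(sentence)

# Pair every token with its Porter stem.
stemmed = [(token, ps.stem(token)) for token in tokens]
for token, stem in stemmed:
    print(token + ": " + stem)
```

With the Porter stemmer, “gamers” reduces to “gamer” rather than “game”, so related words do not always collapse to a single stem.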


While several stemming algorithms exist, the Porter Stemmer (PorterStemmer) remains one of the most widely used and recognized.
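
NLTK also ships other stemmers in `nltk.stem`, such as `SnowballStemmer` and `LancasterStemmer`. The following sketch runs the same words through all three so you can compare the results yourself; the word list and the `stemmers` dictionary are chosen only for illustration. Keep in mind that a stem is not guaranteed to be a dictionary word: the Porter stemmer turns “identifying” into “identifi”.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# Three stemmers that come with NLTK; none of them needs extra data files.
stemmers = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),
    "lancaster": LancasterStemmer(),
}

words = ["waiting", "gaming", "identifying"]

# For each word, record the stem produced by every stemmer.
results = {
    word: {name: stemmer.stem(word) for name, stemmer in stemmers.items()}
    for word in words
}
for word, by_stemmer in results.items():
    print(word, by_stemmer)
```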











Itsthanga Thu, 16 Mar 2017

I tried the word “identifying” and got “identifi” as the output.