Natural Language Processing - prediction

We can use natural language processing to make predictions.

Example: Given a product review, a computer can predict whether it is positive or negative based on the text.

In this article you will learn how to make a prediction program based on natural language processing.

Related course: Natural Language Processing with Python


Given a name, the classifier will predict whether it is male or female.

To create our prediction program, we follow several steps:


  • Data preparation

  • Feature extraction

  • Training

  • Prediction

Data preparation
The first step is to prepare the data.
We use the names corpus included with NLTK. If the corpus is not installed yet, download it once with nltk.download('names').


from nltk.corpus import names

# Load the names and label each one with its gender
names = ([(name, 'male') for name in names.words('male.txt')] +
         [(name, 'female') for name in names.words('female.txt')])

This dataset is simply a collection of tuples. To give you an idea of what the dataset looks like:


[(u'Aaron', 'male'), (u'Abbey', 'male'), (u'Abbie', 'male')]
[(u'Zorana', 'female'), (u'Zorina', 'female'), (u'Zorine', 'female')]

You can define your own dataset if you wish; it is simply a list of (name, label) tuples.
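As a minimal sketch of such a hand-made dataset, the list below uses a few made-up entries (the variable name my_names and the entries themselves are assumptions for illustration, not part of the NLTK corpus):

```python
# Hypothetical hand-written dataset: a list of (name, label) tuples,
# in the same shape as the list built from the NLTK names corpus.
my_names = [
    ("Alice", "female"),
    ("Bob", "male"),
    ("Carla", "female"),
    ("David", "male"),
]

# Each item unpacks into a name and its label.
for name, label in my_names:
    print(name, label)
```

Any list in this shape can take the place of the corpus-based list in the steps that follow.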

Feature extraction
Based on the dataset, we prepare our features. The only feature we will use is the last letter of a name.
We build the list of feature sets using:


featuresets = [(gender_features(n), g) for (n,g) in names]

and the features (last letters) are extracted using:


def gender_features(word):
    return {'last_letter': word[-1]}
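To see what the extracted features look like, here is a small sketch that applies the same function to two made-up entries (the list called sample is an assumption for illustration):

```python
def gender_features(word):
    # The only feature is the final character of the name.
    return {'last_letter': word[-1]}

# Hypothetical sample data, not taken from the NLTK corpus.
sample = [('Frank', 'male'), ('Anna', 'female')]

# Pair each feature dictionary with its label.
featuresets = [(gender_features(n), g) for (n, g) in sample]
print(featuresets)
# [({'last_letter': 'k'}, 'male'), ({'last_letter': 'a'}, 'female')]
```

Each entry is a (feature dictionary, label) pair, which is the input format the classifier is trained on.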

Training and prediction
We train and predict using:


import nltk

# Train on every feature set
train_set = featuresets
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Predict
print(classifier.classify(gender_features('Frank')))
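To give a rough idea of what a Naive Bayes classifier does with a single feature, here is a simplified pure-Python sketch. It is not NLTK's implementation: the helper names train and classify, the toy data, and the add-one smoothing are all assumptions made for illustration, and NLTK's own probability estimates may differ in detail.

```python
from collections import Counter, defaultdict
import math

def gender_features(word):
    return {'last_letter': word[-1]}

def train(labeled_featuresets):
    # Count how often each label occurs, and how often each
    # last letter occurs together with each label.
    label_counts = Counter()
    letter_counts = defaultdict(Counter)
    for features, label in labeled_featuresets:
        label_counts[label] += 1
        letter_counts[label][features['last_letter']] += 1
    return label_counts, letter_counts

def classify(model, features, alphabet_size=26):
    label_counts, letter_counts = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        # log P(label) + log P(last_letter | label), with add-one
        # smoothing so unseen letters do not get zero probability.
        prior = math.log(count / total)
        seen = letter_counts[label][features['last_letter']]
        likelihood = math.log((seen + 1) / (count + alphabet_size))
        score = prior + likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data, not taken from the NLTK corpus.
toy = [('Frank', 'male'), ('Mark', 'male'), ('John', 'male'),
       ('Anna', 'female'), ('Maria', 'female'), ('Emma', 'female')]
model = train([(gender_features(n), g) for n, g in toy])

print(classify(model, gender_features('Derek')))  # male
print(classify(model, gender_features('Laura')))  # female
```

The classifier picks the label whose prior probability, combined with the probability of seeing that last letter under the label, is highest.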

Example
A classifier has a training phase and a test phase. Here is the complete program, which trains on all the names and then classifies one:


from nltk.classify import NaiveBayesClassifier
from nltk.corpus import names

def gender_features(word):
    # Use the last letter of the name as the only feature
    return {'last_letter': word[-1]}

# Load the names and label each one with its gender
# (note: this rebinds `names` from the corpus reader to the labeled list)
names = ([(name, 'male') for name in names.words('male.txt')] +
         [(name, 'female') for name in names.words('female.txt')])

# Extract the features and train on the whole dataset
featuresets = [(gender_features(n), g) for (n, g) in names]
train_set = featuresets
classifier = NaiveBayesClassifier.train(train_set)

# Predict
print(classifier.classify(gender_features('Frank')))

If you want to enter the name at runtime, change the last line to:


# Predict
name = input("Name: ")
print(classifier.classify(gender_features(name)))

In Python 2, use raw_input() instead of input().
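The program above trains on every name, so it has no separate test phase. A minimal sketch of holding some data back for testing is shown here, using nltk.classify.accuracy to score the held-out items. The toy list called labeled and the 6/2 split are assumptions for illustration; they are not taken from the names corpus.

```python
import nltk

def gender_features(word):
    return {'last_letter': word[-1]}

# Hypothetical toy data; in practice use the full labeled names list.
labeled = [('Frank', 'male'), ('Mark', 'male'), ('John', 'male'),
           ('Anna', 'female'), ('Maria', 'female'), ('Emma', 'female'),
           ('Derek', 'male'), ('Laura', 'female')]

featuresets = [(gender_features(n), g) for (n, g) in labeled]

# Training phase uses the first six items, test phase the last two.
train_set, test_set = featuresets[:6], featuresets[6:]

classifier = nltk.NaiveBayesClassifier.train(train_set)

# Fraction of held-out names whose predicted label matches the true label.
accuracy = nltk.classify.accuracy(classifier, test_set)
print(accuracy)
```

With the real corpus, the labeled list holds all male names followed by all female names, so it should be shuffled (for example with random.shuffle) before splitting; otherwise the test set would contain only one label.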
