Natural Language Processing - prediction
Example: Given a product review, a computer can predict whether it's positive or negative based on the text.
In this article you will learn how to make a prediction program based on natural language processing.
Given a name, the classifier will predict whether it is male or female. To create our analysis program, we follow several steps:
- Data preparation
- Feature extraction
- Training
- Prediction
Data preparation
The first step is to prepare the data. We use the names corpus included with NLTK. If the corpus is not installed yet, download it once with nltk.download('names').
from nltk.corpus import names
# Load the labeled names
# Note: this rebinds `names` from the corpus reader to a plain list
names = ([(name, 'male') for name in names.words('male.txt')] +
         [(name, 'female') for name in names.words('female.txt')])
This dataset is simply a collection of tuples. To give you an idea of what the dataset looks like:
[(u'Aaron', 'male'), (u'Abbey', 'male'), (u'Abbie', 'male')]
[(u'Zorana', 'female'), (u'Zorina', 'female'), (u'Zorine', 'female')]
You can define your own set of tuples if you wish; it's simply a list containing many tuples.
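As a minimal sketch of such a custom dataset, the snippet below builds the same (name, label) structure by hand. The name lists and the variable names male_names, female_names and labeled_names are illustrative assumptions, not part of the NLTK corpus.

```python
import random

# Hypothetical hand-written data (an assumption for illustration only)
male_names = ['Aaron', 'Frank', 'Peter']
female_names = ['Maria', 'Zorana', 'Zorina']

# Same structure as the NLTK-based dataset: a list of (name, label) tuples
labeled_names = ([(name, 'male') for name in male_names] +
                 [(name, 'female') for name in female_names])

# Shuffle so that the labels are not grouped together
random.seed(0)
random.shuffle(labeled_names)

print(labeled_names[:3])
```

Any list in this shape can be passed through the same feature extraction and training steps.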
Feature extraction
Based on the dataset, we prepare our features. The feature we will use is the last letter of a name. We define a feature set using:
featuresets = [(gender_features(n), g) for (n,g) in names]
and the features (last letters) are extracted using:
def gender_features(word):
    return {'last_letter': word[-1]}
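To make the feature dictionary concrete, here is a short sketch that calls the extractor on two names. The second function, gender_features_extended, is a hypothetical variant that is not part of the original article; it only illustrates how more features could be added to the same dictionary.

```python
def gender_features(word):
    # The single feature used in this article: the last letter of the name
    return {'last_letter': word[-1]}

print(gender_features('Frank'))  # {'last_letter': 'k'}
print(gender_features('Anna'))   # {'last_letter': 'a'}

def gender_features_extended(word):
    # Hypothetical variant: normalise the input and add two more features.
    # Assumes a non-empty name; an empty string would raise IndexError.
    word = word.strip().lower()
    return {
        'last_letter': word[-1],
        'last_two': word[-2:],
        'first_letter': word[0],
    }

print(gender_features_extended(' FRANK '))
```

Whether extra features actually help depends on the data, so it is worth comparing both extractors on held-out names.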
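Training and prediction
We train a Naive Bayes classifier on the feature sets and predict with it. Here train_set is the list of feature sets built in the previous step: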
classifier = nltk.NaiveBayesClassifier.train(train_set)
# Predict
print(classifier.classify(gender_features('Frank')))
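To give an intuition for what a Naive Bayes classifier does with a single feature, the following is a simplified pure-Python sketch. It is not NLTK's implementation: the function names, the toy data and the add-one smoothing are assumptions made for illustration, and NLTK's own probability estimates differ in detail.

```python
from collections import Counter, defaultdict

def gender_features(word):
    return {'last_letter': word[-1]}

def train_last_letter_model(labeled_names):
    # Count how often each label occurs and how often each
    # last letter occurs together with each label
    label_counts = Counter()
    letter_counts = defaultdict(Counter)
    for name, label in labeled_names:
        label_counts[label] += 1
        letter_counts[label][gender_features(name)['last_letter']] += 1
    return label_counts, letter_counts

def classify(model, name, alphabet_size=26):
    label_counts, letter_counts = model
    total = sum(label_counts.values())
    letter = gender_features(name)['last_letter']
    scores = {}
    for label, count in label_counts.items():
        prior = count / total
        # Add-one smoothing (an assumption of this sketch) avoids zero
        # probabilities for letters never seen with a label
        likelihood = (letter_counts[label][letter] + 1) / (count + alphabet_size)
        scores[label] = prior * likelihood
    # Pick the label with the highest P(label) * P(last_letter | label)
    return max(scores, key=scores.get)

# Toy data, invented for this example
toy_names = [('Frank', 'male'), ('Mark', 'male'), ('John', 'male'),
             ('Anna', 'female'), ('Maria', 'female'), ('Ellen', 'female')]

model = train_last_letter_model(toy_names)
print(classify(model, 'Derek'))  # male
print(classify(model, 'Julia'))  # female
```

With only six training names the result is fragile, which is why the article trains on the full names corpus.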
Example
A classifier has a training phase and a test phase. Putting the steps together gives the complete program:
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import names

def gender_features(word):
    # Use the last letter of the name as the only feature
    return {'last_letter': word[-1]}

# Load the labeled names
# Note: this rebinds `names` from the corpus reader to a plain list
names = ([(name, 'male') for name in names.words('male.txt')] +
         [(name, 'female') for name in names.words('female.txt')])
# Extract the features and train on the full dataset
featuresets = [(gender_features(n), g) for (n, g) in names]
train_set = featuresets
classifier = NaiveBayesClassifier.train(train_set)
# Predict
print(classifier.classify(gender_features('Frank')))
If you want to enter the name at runtime, replace the last line with:
# Predict
name = input("Name: ")
print(classifier.classify(gender_features(name)))
For Python 2, use raw_input instead of input.
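The program above trains on every name, so it has no separate test phase. A hedged sketch of how one could hold out part of the data and measure accuracy is shown below. The helper train_and_evaluate, its parameters and the small sample list are assumptions for illustration; with the real corpus you would pass the full list of (name, label) tuples instead.

```python
import random
from nltk.classify import NaiveBayesClassifier, accuracy

def gender_features(word):
    return {'last_letter': word[-1]}

def train_and_evaluate(labeled_names, test_fraction=0.2, seed=0):
    # Shuffle a copy so the split does not depend on the original order
    data = list(labeled_names)
    random.Random(seed).shuffle(data)
    featuresets = [(gender_features(n), g) for (n, g) in data]
    # Hold out the first part for testing, train on the rest
    n_test = max(1, int(len(featuresets) * test_fraction))
    test_set, train_set = featuresets[:n_test], featuresets[n_test:]
    classifier = NaiveBayesClassifier.train(train_set)
    return classifier, accuracy(classifier, test_set)

# Small invented sample; replace it with the names corpus for a real run
sample = [('Frank', 'male'), ('Mark', 'male'), ('John', 'male'),
          ('Peter', 'male'), ('Derek', 'male'), ('Anna', 'female'),
          ('Maria', 'female'), ('Julia', 'female'), ('Ellen', 'female'),
          ('Sophie', 'female')]

classifier, score = train_and_evaluate(sample)
print(score)
```

The score is the fraction of held-out names that were labeled correctly. On a sample this small it says very little, so treat it only as a demonstration of the split.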