
Python Tutorial

Natural Language Processing - prediction

We can use natural language processing to make predictions.

Example: Given a product review, a computer can predict whether it is positive or negative based on the text.

In this article you will learn how to make a prediction program based on natural language processing.

nlp prediction example

Given a name, the classifier will predict whether it is male or female.

To create our analysis program, we have several steps:

  • Data preparation
  • Feature extraction
  • Training
  • Prediction

Data preparation

The first step is to prepare the data. We use the names corpus included with nltk. If the corpus is not installed yet, download it once with nltk.download('names').

from nltk.corpus import names

# Load the labeled data: one (name, label) tuple per name
names = ([(name, 'male') for name in names.words('male.txt')] +
         [(name, 'female') for name in names.words('female.txt')])

This dataset is simply a collection of tuples. To give you an idea of what the dataset looks like:

[(u'Aaron', 'male'), (u'Abbey', 'male'), (u'Abbie', 'male')]
[(u'Zorana', 'female'), (u'Zorina', 'female'), (u'Zorine', 'female')]

You can define your own dataset if you wish. It is simply a list of (name, label) tuples.
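As a minimal sketch of such a hand-written dataset, the list below uses the same (name, label) layout as the nltk data. The variable name custom_names and the four entries are made up for illustration and are not part of the nltk corpus.

```python
# Hypothetical hand-written dataset in the same (name, label) format
custom_names = [
    ('Alice', 'female'),
    ('Bob', 'male'),
    ('Carla', 'female'),
    ('David', 'male'),
]

# Each entry unpacks into a name and its label
for name, label in custom_names:
    print(name, label)
```

A list like this can be used wherever the tutorial uses names, although a handful of examples is far too small to train a useful classifier.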

Feature extraction

Based on the dataset, we prepare our features. The feature we will use is the last letter of a name. We define a featureset using:

featuresets = [(gender_features(n), g) for (n,g) in names]

and the features (last letters) are extracted using:

def gender_features(word): 
    return {'last_letter': word[-1]}
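To make the shape of the features concrete, the sketch below applies gender_features to a few names and builds a featureset from a small sample list. The sample list is illustrative only. Note that the function assumes a non-empty string, since word[-1] raises an IndexError on an empty one.

```python
def gender_features(word):
    # The only feature is the final character of the name
    return {'last_letter': word[-1]}

# Inspect the feature dictionary for a few names
for name in ['Frank', 'Anna', 'Zorina']:
    print(name, gender_features(name))

# Small illustrative sample in the (name, label) format
sample = [('Frank', 'male'), ('Anna', 'female')]

# Each featureset entry pairs a feature dictionary with its label
featuresets = [(gender_features(n), g) for (n, g) in sample]
print(featuresets)
```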

Training and prediction

We train and predict using:

import nltk

# Train on every featureset
train_set = featuresets
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Predict
print(classifier.classify(gender_features('Frank')))
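For intuition about what a naive Bayes classifier does with a single last-letter feature, here is a simplified plain-Python sketch. It is not NLTK's implementation: the function names, the toy data, and the add-one smoothing over a 26-letter alphabet are assumptions chosen for illustration, and NLTK's own probability estimates will differ.

```python
from collections import Counter, defaultdict

def train_last_letter_nb(labeled_names):
    """Count how often each label and each (label, last letter) pair occurs."""
    label_counts = Counter()
    letter_counts = defaultdict(Counter)
    for name, label in labeled_names:
        label_counts[label] += 1
        letter_counts[label][name[-1]] += 1
    return label_counts, letter_counts

def classify_last_letter_nb(name, label_counts, letter_counts, alphabet_size=26):
    """Return the label maximizing P(label) * P(last_letter | label)."""
    total = sum(label_counts.values())
    letter = name[-1]
    best_label, best_score = None, -1.0
    for label, count in label_counts.items():
        prior = count / total
        # Add-one smoothing (an assumption) so unseen letters keep a nonzero probability
        likelihood = (letter_counts[label][letter] + 1) / (count + alphabet_size)
        score = prior * likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, illustrative only
toy = [('Anna', 'female'), ('Carla', 'female'), ('Emma', 'female'),
       ('Frank', 'male'), ('Hank', 'male'), ('John', 'male')]
model = train_last_letter_nb(toy)

print(classify_last_letter_nb('Zorina', *model))
print(classify_last_letter_nb('Mark', *model))
```

In this toy data every female name ends in 'a' and two of the three male names end in 'k', so the final letter alone decides both predictions.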

Example

A classifier has a training phase and a test phase. The complete program below trains on the whole dataset and then classifies a single name.

import nltk
from nltk.corpus import names

def gender_features(word):
    return {'last_letter': word[-1]}

# Load the labeled data: one (name, label) tuple per name
names = ([(name, 'male') for name in names.words('male.txt')] +
         [(name, 'female') for name in names.words('female.txt')])

# Extract features and train on every featureset
featuresets = [(gender_features(n), g) for (n, g) in names]
train_set = featuresets
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Predict
print(classifier.classify(gender_features('Frank')))

If you want to enter the name at runtime, change the last line to:

# Predict
name = input("Name: ")
print(classifier.classify(gender_features(name)))

For Python 2, use raw_input instead of input.
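The complete program trains on every name, so it never exercises a test phase. The sketch below shows one way to hold out part of the data and measure accuracy on it. The helpers split_featuresets and accuracy, the tiny dataset, and the stand-in rule_classify function are all hypothetical. They use only the standard library, so the idea can be tried without the corpus.

```python
import random

def gender_features(word):
    return {'last_letter': word[-1]}

def split_featuresets(featuresets, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and split it into (train_set, test_set)."""
    shuffled = list(featuresets)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(classify, test_set):
    """Fraction of test examples whose predicted label matches the true label."""
    correct = sum(1 for features, label in test_set if classify(features) == label)
    return correct / len(test_set)

# Tiny hand-written dataset (illustrative only, not the nltk corpus)
data = [('Anna', 'female'), ('Bob', 'male'), ('Carla', 'female'),
        ('David', 'male'), ('Emma', 'female'), ('Frank', 'male'),
        ('Gina', 'female'), ('Hank', 'male'), ('Ida', 'female'),
        ('John', 'male')]
featuresets = [(gender_features(n), g) for (n, g) in data]
train_set, test_set = split_featuresets(featuresets, test_fraction=0.2)

# Stand-in classifier (hypothetical rule): names ending in 'a' are labeled female
def rule_classify(features):
    return 'female' if features['last_letter'] == 'a' else 'male'

print(len(train_set), len(test_set))
print(accuracy(rule_classify, test_set))
```

With the nltk classifier, you would train on train_set instead of all featuresets and pass classifier.classify in place of rule_classify. NLTK also provides nltk.classify.accuracy(classifier, test_set) for the same measurement.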
