In Natural Language Processing there is a concept known as Sentiment Analysis.
Given a movie review or a tweet, a sentiment classifier can automatically assign it to a category. The categories are user-defined: for example positive and negative, or whichever classes you want.
Classification is done in two steps: training and prediction.
The training phase requires training data, which is a set of labeled examples: each example pairs an input with the category it belongs to. The classifier learns from the training data and then uses what it learned to predict the category of new input.
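To make the two phases concrete, here is a deliberately naive sketch. It is not the classifier discussed later. Training memorizes which label each word appears under in the labeled examples, and prediction lets the known words of a new sentence vote. The function names `train` and `predict`, the default label `"neu"` and the example sentences are all hypothetical.

```python
from collections import Counter, defaultdict

def train(examples):
    """Training phase: count how often each word occurs under each label."""
    counts = defaultdict(Counter)
    for text, label in examples:
        for word in text.lower().split():
            counts[word][label] += 1
    return counts

def predict(model, text, default="neu"):
    """Prediction phase: every known word votes for its most frequent label."""
    votes = Counter()
    for word in text.lower().split():
        if word in model:  # words not seen during training cast no vote
            votes[model[word].most_common(1)[0][0]] += 1
    return votes.most_common(1)[0][0] if votes else default

# Hypothetical labeled training data: (input, category) pairs.
training_data = [
    ("what a great movie", "pos"),
    ("the acting was terrible", "neg"),
    ("great sound and great actors", "pos"),
]
model = train(training_data)
print(predict(model, "great film"))     # pos
print(predict(model, "terrible plot"))  # neg
```

The point is the split: `train` only looks at labeled examples, and `predict` only looks at the learned model and the new input.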
We start by defining three classes: positive, negative and neutral. Each class is defined by a vocabulary, a list of words that belong to it.
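The actual word lists are not given in this text. The lists below are hypothetical examples, used only to illustrate the shape of the data; any list of words (or emoticons) per class will do. The names `positive_vocab`, `negative_vocab` and `neutral_vocab` are the ones the feature code below iterates over.

```python
# Hypothetical example vocabularies; replace them with your own word lists.
positive_vocab = ["awesome", "outstanding", "fantastic", "terrific", "good", "nice", "great", ":)"]
negative_vocab = ["bad", "terrible", "useless", "hate", ":("]
neutral_vocab = ["movie", "the", "sound", "was", "is", "actors", "did", "know", "words", "not"]
```

Keeping the lists disjoint avoids giving the classifier contradictory examples for the same word.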
Every word is converted into a feature using a simplified bag-of-words model:
```python
def word_feats(words):
    # Bag-of-words features: every word present maps to True.
    return {word: True for word in words}
```
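A quick check of what the function returns. Because it iterates over its argument, it expects a list of words. Passing a bare string such as `"good"` iterates over its characters instead, so a single word should be passed as a one-element list.

```python
def word_feats(words):
    # Bag-of-words features: every word present maps to True.
    return {word: True for word in words}

print(word_feats(["great", "movie"]))  # {'great': True, 'movie': True}
print(word_feats(["good"]))            # {'good': True}
print(word_feats("good"))              # {'g': True, 'o': True, 'd': True}  (characters, not words)
```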
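```python
# Label each vocabulary word with its class. Wrapping the word in a list
# makes word_feats treat it as one word rather than a string of characters.
positive_features = [(word_feats([pos]), 'pos') for pos in positive_vocab]
negative_features = [(word_feats([neg]), 'neg') for neg in negative_vocab]
neutral_features = [(word_feats([neu]), 'neu') for neu in neutral_vocab]
```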
Our training set is then the sum of these three feature sets:
```python
# Concatenate the labeled examples of all three classes into one list.
train_set = positive_features + negative_features + neutral_features
```
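The text does not show the training and prediction code at this point. The sketch below assumes a Naive Bayes classifier, written from scratch so it runs without extra packages; NLTK's `NaiveBayesClassifier.train()` accepts the same list of `(features, label)` pairs if you prefer a library. The shortened vocabularies are hypothetical, `train_naive_bayes` and `classify` are illustrative names, and the add-one smoothing is an assumption. Only features present in the input are scored, and words never seen in training are ignored.

```python
import math
from collections import Counter, defaultdict

def word_feats(words):
    # Bag-of-words features: every word present maps to True.
    return {word: True for word in words}

# Hypothetical, shortened vocabularies used only for this sketch.
positive_vocab = ["awesome", "great", "good", "nice", ":)"]
negative_vocab = ["bad", "terrible", "hate", "useless", ":("]
neutral_vocab = ["movie", "the", "sound", "was", "actors"]

train_set = ([(word_feats([w]), "pos") for w in positive_vocab]
             + [(word_feats([w]), "neg") for w in negative_vocab]
             + [(word_feats([w]), "neu") for w in neutral_vocab])

def train_naive_bayes(train_set):
    """Training: count each label and how often each feature occurs per label."""
    label_counts = Counter(label for _, label in train_set)
    feature_counts = defaultdict(Counter)
    for features, label in train_set:
        for name in features:
            feature_counts[label][name] += 1
    vocabulary = {name for features, _ in train_set for name in features}
    return label_counts, feature_counts, vocabulary

def classify(model, features):
    """Prediction: pick the label with the highest log-probability."""
    label_counts, feature_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior P(label)
        for name in features:
            if name in vocabulary:  # words never seen in training are ignored
                # Add-one smoothed P(feature present | label), assuming independence.
                score += math.log((feature_counts[label][name] + 1) / (count + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_naive_bayes(train_set)
print(classify(model, word_feats("i hate this terrible movie".split())))   # neg
print(classify(model, word_feats("what a great and nice movie".split())))  # pos
```

Because every training example here is a single word and the classes are the same size, the sketch effectively picks the class whose vocabulary matches the most words of the sentence (ties go to the first label). With richer training data the per-feature counts carry far more information.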
To type the input sentence manually, use the input() function in Python 3 or raw_input() in Python 2. The better your training data, the more accurate your predictions; the training data in this example is very small.
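A minimal sketch of reading the sentence from the keyboard on both Python versions. In Python 2, `input()` evaluates what is typed, which is why `raw_input()` is the one to use there. The helper name `read_words` and its `reader` parameter are hypothetical; the parameter only exists so the function can be called without a keyboard.

```python
try:
    read_line = raw_input  # Python 2: returns the typed text as a string
except NameError:
    read_line = input      # Python 3: input() returns the typed text as a string

def read_words(prompt="Enter a sentence: ", reader=read_line):
    """Read one line of text, lowercase it and split it into words."""
    return reader(prompt).lower().split()
```

Calling `read_words()` prompts for a sentence; typing `Great Movie` returns `['great', 'movie']`, which can then be passed to `word_feats` before classifying.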
Training sets
There are many training sets available.