What is it about?

Sentiment classification has attracted much attention in the big data era. Most existing methods rely on the bag-of-words model, which disregards contextual information. In many cases, however, the sentiment strength of a word is implicitly associated with its part of speech and context. In this paper, we present WWE (weighted word embeddings), a method that combines word embeddings with part-of-speech (POS) tagging. First, we used a continuous word representation algorithm (Word2Vec) to train a vector model; the algorithm learns optimal vectors from the context of surrounding words. A polarity score for each word can then be calculated from the cosine similarity between its vector and the vectors of seed words. The state-of-the-art SyntaxNet was used for POS tagging. We then computed an overall polarity score for the whole sentence by POS-weighting the polarity scores of its words. Finally, majority voting was applied to determine the final polarity. Our experimental results show that the WWE method performs with promising accuracy. The methodology was further demonstrated on three Twitter datasets from different domains; this robustness suggests that the method can be applied to other sentiment classification problems and domains. We also compared the performance of trained models of various dimensions, and found that higher-dimensional embeddings achieved better performance.
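The core scoring idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the word vectors, seed words, and POS weights below are all hypothetical toy values standing in for a trained Word2Vec model and SyntaxNet tags.

```python
import math

# Toy 3-d word vectors standing in for a trained Word2Vec model (hypothetical values).
vectors = {
    "good":  [0.9, 0.1, 0.2],
    "bad":   [-0.8, 0.2, 0.1],
    "great": [0.85, 0.15, 0.25],
    "awful": [-0.75, 0.3, 0.05],
    "movie": [0.1, 0.9, 0.4],
}

# Illustrative seed words anchoring the positive and negative poles.
POS_SEEDS = ["good"]
NEG_SEEDS = ["bad"]

# Hypothetical POS weights: e.g. adjectives carry more sentiment than nouns.
POS_WEIGHTS = {"ADJ": 1.0, "NOUN": 0.3}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def word_polarity(word):
    """Mean similarity to positive seeds minus mean similarity to negative seeds."""
    v = vectors[word]
    pos = sum(cosine(v, vectors[s]) for s in POS_SEEDS) / len(POS_SEEDS)
    neg = sum(cosine(v, vectors[s]) for s in NEG_SEEDS) / len(NEG_SEEDS)
    return pos - neg

def sentence_polarity(tagged_words):
    """POS-weighted average of word polarity scores over one tagged sentence."""
    num = sum(POS_WEIGHTS.get(tag, 0.0) * word_polarity(w) for w, tag in tagged_words)
    den = sum(POS_WEIGHTS.get(tag, 0.0) for _, tag in tagged_words)
    return num / den if den else 0.0

# A sentence's sign gives its predicted polarity; across several scorers
# (e.g. different seed sets), a majority vote would pick the final label.
score = sentence_polarity([("great", "ADJ"), ("movie", "NOUN")])
label = "positive" if score > 0 else "negative"
```

In practice the vectors would come from a Word2Vec model trained on the target corpus and the tags from SyntaxNet; the weighting scheme itself is the paper's contribution, so the exact weights here are placeholders.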

Read the Original

This page is a summary of: Unlock big data emotions: Weighted word embeddings for sentiment classification, December 2016, Institute of Electrical & Electronics Engineers (IEEE),
DOI: 10.1109/bigdata.2016.7841056.
