From social media to public health surveillance: Word embedding based clustering method for twitter classification

Xiangfeng Dai; Marwan Bikdash; Bradley Meyer

doi:10.1109/secon.2017.7925400

What is it about?

Social media provide a low-cost alternative source for public health surveillance, and health-related classification plays an important role in identifying useful information. In this paper, we summarize recent classification methods that use social media in public health. These methods rely on the bag-of-words (BOW) model and have difficulty grasping the semantic meaning of texts. Unlike these methods, we present a word embedding based clustering method. Word embedding is one of the strongest current trends in Natural Language Processing (NLP). It learns a vector for each word from its surrounding words, and these vectors can represent the semantic information of the words. A tweet can be represented as a few vectors and divided into clusters of similar words. According to similarity measures over all the clusters, the tweet can then be classified as related or unrelated to a topic (e.g., influenza). Our simulations show good performance, and the best accuracy achieved was 87.1%. Moreover, the proposed method is unsupervised: it does not require manual labeling of training data and can be readily extended to other classification problems or other diseases.
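The pipeline described above (word vectors, clusters of similar words, a similarity measure against a topic) can be illustrated with a minimal sketch. This is not the paper's implementation. The summary does not name the clustering algorithm, the similarity measure, or any thresholds, so the greedy clustering, the cosine similarity, the threshold values, the function names, and the three-dimensional `TOY_EMBEDDINGS` table below are all assumptions made for illustration. Real word embeddings are trained on large corpora and have far more dimensions.

```python
import math

# Hypothetical hand-made 3-d vectors, for illustration only.
# They are NOT trained embeddings and do not come from the paper.
TOY_EMBEDDINGS = {
    "flu":     [0.90, 0.10, 0.00],
    "fever":   [0.80, 0.20, 0.10],
    "cough":   [0.85, 0.15, 0.05],
    "movie":   [0.00, 0.90, 0.20],
    "popcorn": [0.10, 0.80, 0.30],
    "today":   [0.30, 0.30, 0.90],
}


def cosine(u, v):
    """Cosine similarity of two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(words):
    """Component-wise mean of the vectors of the given words."""
    vectors = [TOY_EMBEDDINGS[w] for w in words]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cluster_words(words, threshold=0.9):
    """Greedy single-pass clustering (an assumed stand-in algorithm).

    A word joins the first cluster whose centroid is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Words without a vector are skipped.
    """
    clusters = []
    for word in words:
        if word not in TOY_EMBEDDINGS:
            continue
        for cluster in clusters:
            if cosine(TOY_EMBEDDINGS[word], centroid(cluster)) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def classify(tweet, topic="flu", topic_threshold=0.9):
    """Label a tweet by its best cluster-to-topic similarity."""
    clusters = cluster_words(tweet.lower().split())
    topic_vector = TOY_EMBEDDINGS[topic]
    best = max(
        (cosine(centroid(cluster), topic_vector) for cluster in clusters),
        default=0.0,
    )
    return "related" if best >= topic_threshold else "unrelated"


if __name__ == "__main__":
    print(classify("fever cough today"))
    print(classify("movie popcorn today"))
```

With these toy vectors, "fever" and "cough" fall into one cluster that lies close to "flu", so the first tweet is labeled related, while the second has no cluster near the topic. No labeled training tweets are involved, which is the sense in which such an approach is unsupervised.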

Why is it important?

Unlike other approaches in public health, we present a word embedding based clustering method. Word embedding is one of the strongest current trends in Natural Language Processing. It learns a continuous vector representation of each word from its context words, and these vectors can represent the semantic information of the words. A tweet can be represented as a few vectors and divided into clusters of similar words. According to similarity measures over all the clusters, the tweet can then be classified as related or unrelated to a topic (e.g., influenza). Our approach is unsupervised and does not require annotated data.

This page is a summary of: From social media to public health surveillance: Word embedding based clustering method for twitter classification, March 2017, Institute of Electrical & Electronics Engineers (IEEE),
DOI: 10.1109/secon.2017.7925400.

