Estimating geographic subjective well-being from Twitter: A comparison of dictionary and data-driven language methods

  • Kokil Jaidka, Salvatore Giorgi, H. Andrew Schwartz, Margaret L. Kern, Lyle H. Ungar, Johannes C. Eichstaedt
  • Proceedings of the National Academy of Sciences, April 2020
  • DOI: 10.1073/pnas.1906364117

Diagnosing errors in Twitter-based estimates of regional well-being

Photo by John-Mark Smith on Unsplash

What is it about?

A study of 1.53 billion tweets suggests that removing “LOL” and other misleading words can improve estimates of regional well-being and the ability to monitor it.

Why is it important?

Social media posts can help us understand how people are adapting to and coping with the new normal. But our words are useful not just to understand what we – as individuals – think and feel. They’re also useful clues about the community we live in. Why is this so?


Dr. Kokil Jaidka
National University of Singapore

Words like “lol” confound word-level methods because their contemporary use on social media is out of sync with their emotion scores in typical dictionaries, which interpret them as expressions of happiness. How is internet use changing the connotations of typically positive or negative words? And how do these connotations vary with culture and region? These questions need to be addressed before standard measurements can work as expected for estimating the well-being of populations, and not merely of individuals. Our findings show that the words county residents post on social media can offer a signal of their well-being, over and above their socioeconomic markers.
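To make the confound concrete, here is a minimal sketch of a word-level dictionary score, with and without an exclusion list. It is an illustration only, not the method used in the study: the lexicon, its weights, the function name `dictionary_score` and the sample tweets are all hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> valence weight. The values are
# illustrative only and are not taken from any published dictionary.
TOY_LEXICON = {"lol": 1.0, "happy": 1.0, "great": 0.8, "sad": -1.0, "tired": -0.6}


def dictionary_score(tweets, lexicon, exclude=frozenset()):
    """Frequency-weighted mean valence of lexicon words in a set of tweets.

    Words listed in `exclude` are ignored even if the lexicon scores them.
    Returns 0.0 when no scored word is found.
    """
    counts = Counter(
        token
        for tweet in tweets
        for token in tweet.lower().split()
        if token in lexicon and token not in exclude
    )
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(lexicon[word] * n for word, n in counts.items()) / total


# Made-up tweets standing in for one county's posts.
county_tweets = ["so tired lol", "lol sad day", "lol"]

raw = dictionary_score(county_tweets, TOY_LEXICON)
filtered = dictionary_score(county_tweets, TOY_LEXICON, exclude={"lol"})
```

On this toy input the raw score is positive because “lol” dominates the word counts, while the filtered score is negative. That is the kind of reversal that can occur when a frequent word carries a dictionary score that no longer matches how it is used.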

The following have contributed to this page: Dr. Kokil Jaidka