Key language markers of depression on social media depend on race

Proc Natl Acad Sci U S A. 2024 Apr 2;121(14):e2319837121. doi: 10.1073/pnas.2319837121. Epub 2024 Mar 26.

Abstract

Depression has robust natural language correlates and can increasingly be measured in language using predictive models. However, despite evidence that language use varies as a function of individual demographic features (e.g., age, gender), previous work has not systematically examined whether and how depression's association with language varies by race. We examine how race moderates the relationship between language features (i.e., first-person pronouns and negative emotions) from social media posts and self-reported depression, in a matched sample of Black and White English speakers in the United States. Our findings reveal moderating effects of race: While depression severity predicts I-usage in White individuals, it does not in Black individuals. White individuals use more negative-emotion language related to belongingness and self-deprecation. Machine learning models trained on similar amounts of data to predict depression severity performed poorly when tested on Black individuals, even when they were trained exclusively using the language of Black individuals. In contrast, analogous models tested on White individuals performed relatively well. Our study reveals surprising race-based differences in the expression of depression in natural language and highlights the need to understand these effects better, especially before language-based models for detecting psychological phenomena are integrated into clinical practice.
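
The moderation claim in the abstract (depression severity predicts I-usage in one group but not the other) is the kind of pattern commonly tested with an interaction term in a regression. The sketch below illustrates that general idea only; it is not the authors' model or data. The function name `fit_moderation`, the 0/1 group coding, and every simulated coefficient are assumptions made purely for illustration, and the data are synthetic.

```python
import numpy as np


def fit_moderation(depression, group, outcome):
    """Ordinary least squares fit of
    outcome ~ intercept + depression + group + depression*group.

    Returns coefficients in the order
    [intercept, depression, group, interaction].
    Hypothetical helper, not taken from the paper.
    """
    depression = np.asarray(depression, dtype=float)
    group = np.asarray(group, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    design = np.column_stack(
        [np.ones_like(depression), depression, group, depression * group]
    )
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


# Synthetic data only: values are invented to show the mechanics.
rng = np.random.default_rng(0)
n = 2000
depression = rng.normal(size=n)        # standardized severity score (assumed)
group = rng.integers(0, 2, size=n)     # hypothetical 0/1 group indicator

# Simulated truth: slope of 0.0 in group 0 and 0.5 in group 1.
i_usage = (
    1.0
    + 0.0 * depression
    + 0.2 * group
    + 0.5 * depression * group
    + rng.normal(scale=1.0, size=n)
)

coef = fit_moderation(depression, group, i_usage)
slope_group0 = coef[1]            # depression slope when group == 0
slope_group1 = coef[1] + coef[3]  # depression slope when group == 1
print("interaction estimate:", coef[3])
print("slope in group 0:", slope_group0)
print("slope in group 1:", slope_group1)
```

A nonzero interaction coefficient means the depression slope differs between groups, which is what "moderation" refers to here. A real analysis would also need standard errors, covariates, and the matching procedure the abstract mentions, none of which this sketch covers.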

Keywords: depression; mental health; racial differences; social media.

MeSH terms

  • Depression* / psychology
  • Emotions
  • Humans
  • Language
  • Social Media*
  • United States