Sentiment Analysis on Social Media for

Mental Health Support


Checkpoint 1
Team 8

Bryan Vega

Sai Sharan Burugu

Tarun Komirishetty

AIT 726-DL2: NLP with Deep Learning

George Mason University

Dr. Lindi Liao

Mostafa Omidi

February 23, 2024

Table of Contents

Introduction
Related Work
Objectives
Selected Dataset
Proposed Solution
Proposed Development Platforms
References

Introduction
This project focuses on creating a tool that reads social media posts to find signs that
someone might be struggling with mental health issues such as depression or anxiety. With so
many people using social media to share their thoughts and feelings, we have a chance to help
by spotting early signs of trouble. We plan to use Natural Language Processing (NLP)
techniques to analyze these posts and classify the emotion behind the words as positive,
neutral, or negative. This is a big challenge because people express themselves online in a
wide variety of ways, including jokes and slang, which makes it hard to always understand what
they really mean.

The main problem we're addressing is the silent nature of mental health decline. It's often
difficult to notice when someone starts feeling worse, especially online. Our goal is to make a
system that can gently alert us to these signs so that help can be offered sooner. The key
contribution of this work is twofold. Technically, it pushes the boundaries of what computer
programs can understand about human emotions from text. More importantly, on a human level,
it aims to provide a new way to support people's mental health using technology. By creating a
system that can pick up on subtle emotional cues in social media posts, we hope to open up
new possibilities for early support and care, making a difference in the way we look after each
other's mental well-being.

Related Work
1. Mental Health Detection on Social Media Platforms: There have been several
studies focused on identifying markers of mental health conditions such as depression,
anxiety, and stress on platforms like Twitter, Facebook, and Reddit. These studies often
use machine learning and NLP techniques to analyze patterns in language use, posting
frequency, and changes in sentiment over time.

2. Ethical Considerations in AI for Mental Health: It's crucial to consider the ethical
implications of monitoring social media for mental health signals. Relevant research
addresses privacy, consent, data security, and the potential for misinterpretation or
misuse of the information gathered. Of particular interest are works discussing the
balance between intervention and privacy, and strategies to ensure ethical use of
technology in mental health care.

3. Psycholinguistics and Mental Health: Research in psycholinguistics can offer insights
into how language changes with mental health conditions. Studies have looked at
patterns such as word choice, sentence structure, and the use of pronouns in relation to
psychological states.

4. Sentiment Analysis and Emotion Recognition: Research in sentiment analysis involves
classifying text into emotions or sentiments. Of particular relevance are works that
focus on detecting nuanced emotional states, such as depression or anxiety, from textual
data. Emotion recognition in text is a closely related field that seeks to identify
specific emotional states from the words used in communication.

Objectives
To make our project work, we're going to use some sophisticated computer methods and
algorithms, which are like recipes that tell the computer how to understand human emotions
from text. Specifically, we're going to use two main types of algorithms:

● Long Short-Term Memory (LSTM) Networks: These are a kind of recurrent neural network
that can retain information across long sequences. They're well suited to processing
sentences and can pick up on the context and meaning over long stretches of text. This
is helpful for understanding social media posts, where people might talk about
their feelings in complex ways.

● Bidirectional Encoder Representations from Transformers (BERT): This is a more
advanced model that uses the context on both sides of each word (to its left and to its
right) at the same time to get a full picture of what the words mean together. It's like
having a really deep conversation with someone and understanding not just what they
said, but how they feel about it.
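
To make the idea of an LSTM "remembering" concrete, the following is a minimal sketch of a single LSTM step using the standard gate equations, reduced to scalar values so that it runs with only the Python standard library. The parameter names (wf, uf, bf, and so on) and their hand-picked values are illustrative assumptions, not trained weights and not part of the project code.

```python
import math

def sigmoid(x):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell (standard formulation)."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, add part of the new
    h = o * math.tanh(c)     # expose part of the memory as the hidden state
    return h, c

# Hypothetical parameters: forget gate biased open, input gate biased shut,
# so the cell state should barely change no matter what inputs arrive.
params = {"wf": 0.0, "uf": 0.0, "bf": 10.0,
          "wi": 0.0, "ui": 0.0, "bi": -10.0,
          "wo": 1.0, "uo": 0.0, "bo": 0.0,
          "wg": 1.0, "ug": 0.0, "bg": 0.0}

def run_sequence(inputs, c0=1.0):
    """Feed a list of scalar inputs through the cell, starting from memory c0."""
    h, c = 0.0, c0
    for x in inputs:
        h, c = lstm_step(x, h, c, params)
    return h, c

h_final, c_final = run_sequence([0.5, -1.0, 2.0] * 10)
print(c_final)  # stays close to the initial memory value of 1.0
```

With these settings the cell state is carried almost unchanged across 30 steps, which is the mechanism that lets a trained LSTM hold on to context from earlier in a post.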

For our project to understand social media posts, we need lots of examples. We plan to use
datasets from social media platforms that are available for research. These datasets come from
places like Twitter or Reddit, where people often share their thoughts and feelings. To get these
datasets, we'll use what's called an API (Application Programming Interface), which is a way for
our program to ask the social media platforms for data in a format it can understand.

Here's a bit more detail on what we'll use:

● Twitter API: Twitter is a place where people tweet their thoughts in short messages. We
can use the Twitter API to collect tweets that might show different emotions. This can
include tweets with specific hashtags or keywords related to mental health.

● Reddit API: Reddit has many communities (subreddits) where people discuss a wide
range of topics, including mental health. We can use the Reddit API to gather posts and
comments from these communities for our analysis.
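
As a rough illustration of selecting posts by hashtag or keyword, the sketch below filters posts that have already been collected; it does not call the Twitter or Reddit API. The keyword list and the `matches_keywords` helper are hypothetical and only show the filtering idea.

```python
import re

# Hypothetical keyword list; a real project would curate this carefully.
KEYWORDS = {"depression", "anxiety", "#mentalhealth"}

def matches_keywords(text, keywords=KEYWORDS):
    """Return True if any keyword or hashtag appears as a whole token."""
    tokens = set(re.findall(r"#?\w+", text.lower()))
    return not tokens.isdisjoint(keywords)

posts = [
    "Feeling a lot of anxiety before exams",
    "Great game last night!",
    "Sharing resources for #MentalHealth week",
]
selected = [p for p in posts if matches_keywords(p)]
print(selected)
```

Matching on whole tokens rather than substrings avoids accidental hits inside longer words, at the cost of missing misspellings and slang.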

By using these tools and datasets, we hope to teach our computer program how to understand
the wide range of human emotions expressed online. This way, we can spot signs of someone
struggling and potentially offer help or support when it's needed the most.

The overall objectives of our project, which aims to develop a sentiment analysis system for
monitoring mental health through social media posts, are as follows:

1. Detect and Analyze Emotional Sentiments: Use advanced NLP algorithms to identify
and categorize the emotional tone of social media posts into positive, neutral, or
negative sentiments, focusing on detecting expressions that may indicate mental health
issues.
2. Implement Advanced NLP Techniques: Apply sophisticated Natural Language
Processing techniques, such as LSTM networks and BERT, to accurately understand
and interpret the nuances and complexities of human language and emotion expressed
in social media content.
3. Leverage Social Media Datasets: Utilize available datasets from social media
platforms like Twitter and Reddit, accessed through their APIs, to train and test the
system on real-world data, ensuring it can handle a diverse range of expressions and
slang used in everyday online communication.
4. Promote Early Intervention for Mental Health: By identifying early signs of mental
health deterioration through sentiment analysis, the project aims to facilitate timely
support and intervention for individuals who may be in need.
5. Evaluate System Performance: Rigorously test and evaluate the system's performance
using metrics such as accuracy, recall, and precision, comparing its effectiveness to
baseline models to ensure reliability and improvement in detecting mental health
indicators.
6. Contribute to Mental Health Awareness: Through the development and application of
this system, contribute to broader efforts in mental health awareness, showcasing how
technology can be used to support mental well-being in the digital age.
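
The objectives mention comparison against baseline models without naming them. One common and very simple choice, assumed here purely for illustration, is a majority-class baseline that always predicts the most frequent training label. The labels and posts below are made-up placeholders.

```python
from collections import Counter

def majority_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda texts: [most_common_label] * len(texts)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Placeholder labels and posts, not taken from the real dataset.
train_labels = ["non-suicide", "non-suicide", "suicide", "depression", "non-suicide"]
test_texts = ["post a", "post b", "post c", "post d"]
test_labels = ["non-suicide", "suicide", "non-suicide", "depression"]

predict = majority_baseline(train_labels)
baseline_acc = accuracy(test_labels, predict(test_texts))
print(baseline_acc)
```

Any trained model should beat this number by a clear margin; note that such a baseline never detects the minority classes, which is why accuracy alone is not enough for this task.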

Selected Dataset
The dataset is a curated collection of posts from Reddit, specifically from the "SuicideWatch,"
"depression," and "teenagers" subreddits. It's been organized to facilitate the detection of
suicide ideation, with posts labeled as suicide, depression, or non-suicide (normal
conversations). This labeling provides a clear pathway for training our NLP models to
distinguish between varying degrees of mental health concerns and typical adolescent
discourse. The dataset spans posts from the inception of these subreddits through early 2021,
offering a comprehensive view of the discourse over time.

Dataset URL: https://www.kaggle.com/datasets/nikhileswarkomati/suicide-watch
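
A minimal sketch of loading such a dataset and checking its label distribution is shown below. The column names "text" and "class" are an assumption about the CSV layout, and the rows are made-up placeholders held in memory rather than the real file.

```python
import csv
import io
from collections import Counter

# Made-up placeholder rows; the column names "text" and "class" are assumed.
SAMPLE_CSV = """text,class
"placeholder post one",suicide
"placeholder post two",non-suicide
"placeholder post three",depression
"placeholder post four",non-suicide
"""

def load_posts(file_obj, text_col="text", label_col="class"):
    """Read (text, label) pairs, skipping rows with a missing text or label."""
    rows = []
    for row in csv.DictReader(file_obj):
        text = (row.get(text_col) or "").strip()
        label = (row.get(label_col) or "").strip()
        if text and label:
            rows.append((text, label))
    return rows

posts = load_posts(io.StringIO(SAMPLE_CSV))
label_counts = Counter(label for _, label in posts)
print(label_counts)
```

For the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded CSV.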

Proposed Solution
Data Preprocessing and Augmentation

Preprocessing: The dataset will undergo preprocessing to clean and standardize the text data.
This includes removing irrelevant characters, standardizing text formatting, and handling
missing values.
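
The preprocessing steps described above could look roughly like the following sketch. Which characters count as "irrelevant" is an assumption made here for illustration (URLs and anything other than letters, digits, and basic punctuation); the actual rules would depend on the data.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_TEXT_RE = re.compile(r"[^a-z0-9\s'.,!?]")   # assumed set of characters to keep
SPACE_RE = re.compile(r"\s+")

def clean_text(text):
    """Lowercase, drop URLs and unusual symbols, and collapse whitespace."""
    if text is None:            # treat a missing value as an empty post
        return ""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_TEXT_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()

cleaned = clean_text("I can't   sleep... see https://example.com  \u2764")
print(cleaned)
```

Emojis are dropped in this sketch, although they can carry emotional signal; keeping or mapping them to words is a design choice worth revisiting.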

Augmentation: Given the potentially imbalanced nature of the dataset, we may employ data
augmentation techniques to ensure that our models are not biased toward the more prevalent
class. Techniques such as synonym replacement or back-translation could be used to augment
the data in a way that preserves the original sentiment and context.
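
A minimal sketch of synonym replacement follows. The synonym table is hand-written and hypothetical; in practice it might be drawn from a lexical resource such as WordNet, and each augmented post would need a check that its label still applies.

```python
import random

# Hypothetical, hand-written synonym table used only for illustration.
SYNONYMS = {
    "sad": ["unhappy", "down"],
    "tired": ["exhausted", "drained"],
    "alone": ["lonely", "isolated"],
}

def synonym_replace(text, synonyms=SYNONYMS, prob=0.5, rng=None):
    """Randomly swap known words for a listed synonym; leave other words as-is."""
    rng = rng or random.Random()
    out = []
    for word in text.split():
        options = synonyms.get(word.lower())
        if options and rng.random() < prob:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)

augmented = synonym_replace("i feel sad and alone", prob=1.0, rng=random.Random(0))
print(augmented)
```

Applying this only to posts from the under-represented classes is one way to reduce imbalance without touching the majority class.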

Model Training and Evaluation

Model Training: Utilizing the preprocessed dataset, we'll train our NLP models, focusing on
LSTM networks and BERT, to classify posts into suicide, depression, or non-suicide categories.
The nuanced understanding required to differentiate between these categories makes the LSTM
and BERT models particularly suitable for this task, given their proficiency in capturing context
and sentiment over sequences of text.
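
Before an LSTM can be trained, each post has to be turned into a fixed-length sequence of integer ids. The sketch below shows one simple way to do this with whitespace tokenization; the reserved ids, the label-to-id mapping, and the example texts are assumptions for illustration. BERT would instead use its own pretrained tokenizer.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved ids for padding and out-of-vocabulary words

def build_vocab(texts, min_count=1):
    """Map each sufficiently frequent word to an integer id."""
    counts = Counter(word for t in texts for word in t.split())
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for word, n in counts.most_common():
        if n >= min_count:
            vocab[word] = len(vocab)
    return vocab

def encode(text, vocab, max_len):
    """Convert a post to ids, truncating or padding to exactly max_len."""
    ids = [vocab.get(w, UNK) for w in text.split()][:max_len]
    return ids + [PAD] * (max_len - len(ids))

# Assumed mapping from the three dataset labels to class ids.
LABELS = {"non-suicide": 0, "depression": 1, "suicide": 2}

train_texts = ["i feel fine today", "i feel so alone"]
vocab = build_vocab(train_texts)
encoded = encode("i feel lost", vocab, max_len=5)
print(encoded)
```

The word "lost" never appears in the training texts, so it is mapped to the unknown-word id, and the sequence is padded to the requested length.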

Evaluation: The models will be evaluated based on their accuracy, recall, precision, and F1
score, with a special emphasis on minimizing false negatives for suicide and depression
categories due to the sensitive nature of the task. We will also conduct error analysis to
understand the types of posts that are more challenging for the models to classify correctly.
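
The per-class metrics named above can be computed directly from the true and predicted labels, as in the sketch below. The label lists are made-up placeholders; the false-negative count is reported separately because missed suicide or depression posts are the costliest errors for this task.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, F1, and false-negative count for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "f1": f1, "false_negatives": fn}

# Placeholder labels, not real model output.
y_true = ["suicide", "suicide", "depression", "non-suicide", "non-suicide", "suicide"]
y_pred = ["suicide", "non-suicide", "depression", "non-suicide", "suicide", "suicide"]

suicide_metrics = per_class_metrics(y_true, y_pred, "suicide")
print(suicide_metrics)
```

Here one of the three true "suicide" posts is predicted as "non-suicide", so recall for that class is two thirds; posts like this one are natural starting points for the error analysis.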

Repository: https://github.com/gocasual/mentAI

Proposed Development Platforms


Programming Language:

Python: Widely used for NLP and machine learning projects due to its readability and the
extensive ecosystem of libraries and frameworks it offers.

Libraries and Frameworks:

Natural Language Toolkit (NLTK): A Python library providing easy-to-use interfaces to over 50
corpora and lexical resources such as WordNet, along with a suite of text processing libraries
for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.

spaCy: An industrial-strength NLP library in Python that provides capabilities for named entity
recognition, part-of-speech tagging, dependency parsing, and more, with support for over 60
languages.

TensorFlow and Keras: Open-source libraries for numerical computation and machine learning
that allow for easy construction, training, and deployment of models running on GPUs and
CPUs.

PyTorch: A Python library for deep learning, originally developed by Facebook's AI research
lab (now Meta AI), that is known for its flexibility, speed, and native support for dynamic
computation graphs.

Scikit-learn: A Python library for machine learning that offers various classification, regression,
and clustering algorithms, including support vector machines, random forests, gradient boosting,
k-means, and DBSCAN, and is designed to interoperate with NumPy and SciPy.

References
[1] https://www.nature.com/articles/s41598-020-68764-y
[2] https://ejnpn.springeropen.com/articles/10.1186/s41983-023-00735-2
[3] https://link.springer.com/chapter/10.1007/978-1-4684-3680-8_1
