Leveraging N-gram Features for Stress Detection in Social-Media Using Logistic Regression

Bhushan V. Wakode

Abstract

Social media sites such as Reddit have become an important channel through which users share feelings of stress and anxiety and describe difficulties with their mental health. Detecting stress automatically from user-written text can enable timely interventions and support systems. This study presents a natural language processing (NLP) approach that combines n-gram feature extraction with TF-IDF weighting and a logistic regression classifier to detect stress in text. Experiments use the "Dreaddit" dataset, a collection of Reddit posts labeled as either stressed or not stressed. The proposed model is compared against a range of other classifiers, including Naive Bayes, Decision Tree, support vector machine (SVM), and k-nearest neighbors (KNN). A feature analysis is also conducted to identify the features that most strongly discriminate stressed from non-stressed text. In the experiments, the logistic regression model with n-gram TF-IDF features outperforms all of the other classifiers, achieving the highest F1 score of approximately 80%. The study concludes that traditional NLP methods are effective for identifying and classifying text related to mental health.
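The abstract does not give implementation details, so the following is only a minimal sketch of the general technique it names: word n-grams weighted by TF-IDF and fed to a logistic regression classifier. Everything specific here is an assumption made for illustration: the function names, the unigram-plus-bigram range, the smoothed IDF formula, the learning rate and regularization strength, and above all the six toy posts, which are invented and are not taken from Dreaddit. The last lines list the highest-weighted n-grams, a simple stand-in for the kind of feature analysis the abstract mentions.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Return word n-grams of length 1..n_max from lowercased text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

def fit_idf(docs, n_max=2):
    """Compute a smoothed inverse document frequency for every n-gram."""
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc, n_max)))  # count each n-gram once per doc
    n_docs = len(docs)
    return {g: math.log((1 + n_docs) / (1 + c)) + 1.0 for g, c in df.items()}

def tfidf(doc, idf, n_max=2):
    """Map a document to an L2-normalised sparse TF-IDF vector (a dict)."""
    counts = Counter(g for g in ngrams(doc, n_max) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(vec, weights, bias):
    """Probability that a TF-IDF vector belongs to the 'stressed' class."""
    return sigmoid(bias + sum(weights.get(g, 0.0) * v for g, v in vec.items()))

def train_logreg(vectors, labels, epochs=200, lr=0.5, l2=0.01):
    """Fit logistic regression by stochastic gradient descent with L2 penalty."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = predict_proba(vec, weights, bias) - y  # gradient of log loss
            bias -= lr * err
            for g, v in vec.items():
                w = weights.get(g, 0.0)
                weights[g] = w - lr * (err * v + l2 * w)
    return weights, bias

# Invented toy posts (NOT Dreaddit data); 1 = stressed, 0 = not stressed.
train_posts = [
    "i feel so stressed and anxious about work",
    "i cannot sleep and feel overwhelmed by deadlines",
    "so anxious i cannot stop worrying about money",
    "had a lovely walk in the park today",
    "enjoyed a great dinner with friends tonight",
    "the new movie was fun and relaxing",
]
train_labels = [1, 1, 1, 0, 0, 0]

idf = fit_idf(train_posts)
train_vectors = [tfidf(p, idf) for p in train_posts]
weights, bias = train_logreg(train_vectors, train_labels)

p_stress = predict_proba(tfidf("i feel anxious and overwhelmed", idf), weights, bias)
p_calm = predict_proba(tfidf("a fun dinner with friends", idf), weights, bias)
print(f"stressed-looking post: {p_stress:.2f}")
print(f"calm-looking post:     {p_calm:.2f}")

# The largest positive weights point at the n-grams most associated with stress.
top_stress = sorted(weights, key=weights.get, reverse=True)[:5]
print("top stress n-grams:", top_stress)
```

In practice a library implementation would normally replace the hand-written pieces above. The sketch is written out in plain Python only to make each step (n-gram extraction, TF-IDF weighting, classifier training, weight inspection) visible.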
