1 Introduction
Depression is a profound psychological disorder that affects people in different regions of the world, regardless of their gender, age, or social status [1].
It is a psychiatric condition characterized by persistent sadness, negativity, and loss of interest in daily activities. People experiencing depression often face challenges in interpersonal relationships, occupational performance, and maintaining healthy bonds, which ultimately affect their overall well-being. It is estimated that more than 280 million people worldwide struggle with depression, making it a leading cause of disability on a global scale [6].
Despite its prevalence and impact, depression frequently evades detection and remains untreated. Early detection is challenging because many affected people do not seek professional help or are unaware of their symptoms.
The presence of social media (SM) and online forums provides researchers with a unique opportunity to analyze online expressions of thoughts and emotions, potentially uncovering indications of depression.
Therefore, it is essential to develop accessible and efficient techniques that can help identify people at risk of depression, allowing them to receive the necessary support. SM platforms have become popular for communication and self-expression [8, 7, 47].
Sentiment analysis (SA) identifies and extracts subjective information from text, such as SM posts, reviews, and news articles. By analyzing the language used in SM texts [41, 42, 4, 31, 17, 2, 28], SA algorithms can determine the general sentiment of the text.
SM serves as a valuable tool for reaching people who might be susceptible to depression or who face challenges related to mental well-being. With billions of people around the world using platforms such as Facebook, Twitter, and Instagram to communicate and share information, SM has become an integral part of contemporary communication. Studies have shown that people with depression use SM as a coping mechanism, seeking support and validation from others through online interactions [3, 9, 36].
Environmental and social factors, among others, have a great influence on the development of depression. The process of analyzing large volumes of text derived from SM using natural language processing (NLP) techniques allows the identification of language patterns that can indicate depression [24, 48, 52, 49].
Machine Learning (ML) techniques are revolutionizing the field of NLP [19, 26, 43, 30, 35, 10, 31, 32, 27]. These techniques have enabled researchers to build sophisticated models that analyze and understand complex human language, including sentiment, syntax, and semantics. Such algorithms use statistical methods to discover patterns in data and apply what they learn to inform decisions.
Deep learning (DL), a subset of ML, involves training artificial neural networks with many layers to recognize patterns and make decisions. Transformers are a type of DL architecture that has become popular for its ability to process sequential data, such as text.
Fine-tuned pre-trained language models are a specific application of these DL techniques. These models are pre-trained on massive volumes of text data before being fine-tuned for specific tasks such as named entity recognition or sentiment analysis.
By fine-tuning these models on a specific task, researchers can leverage the pre-existing knowledge encoded in the model and achieve state-of-the-art performance on the task at hand.
In this paper, we performed various experiments to test the effectiveness of traditional ML and DL techniques, including fine-tuned pre-trained transformer models, for the detection of depression in social media texts.
We conducted an in-depth investigation of these models and examined their performance. Our goal is to provide information on how well these models can detect depression and highlight areas for future research and development.
The findings of our research improved our understanding of the potential use of these sentiment analysis techniques to detect depression and inform the development of targeted interventions that can reduce the burden of depression on society as a whole. Our research contributes to the existing literature as follows:
- We carried out a thorough analysis of depression, exploring the use of social media as an outlet for expressing depressive traits and how machine learning can help detect depression in social media data.
- We applied different feature representations with machine learning and deep learning algorithms for depression detection and evaluated the performance of the models using accuracy, recall, precision, and F1 scores.
- We evaluated pre-trained language models and show that they exhibit outstanding performance, consistently achieving high accuracy, precision, recall, and F1 scores.
- We found that context, feature extraction, and pre-training all had a significant impact on the models' performance for depression detection.
2 Literature Review
SA approaches have gained interest as a promising method of identifying patterns in text that can serve as indicators of depression. These approaches involve classifying the sentiment expressed in a given text to identify potential signs of depression symptoms.
In a study by Haque et al. [18], machine learning algorithms were employed to develop models capable of effectively identifying depression in children. The findings revealed that the Random Forest Classifier exhibited the highest efficiency in detecting depression.
Furthermore, the study identified 11 specific questions that can be used to detect depression in children and adolescents, aiding early diagnosis and treatment of the condition while clarifying its contributing factors.
Another study by Reece et al. [38] used machine learning techniques to analyze Instagram data to identify possible indicators of depression. The study involved evaluating more than 43,000 Instagram photos and extracting statistical features such as color analysis, metadata components, and face identification.
Interestingly, their algorithm outperformed general practitioners in diagnosing depression, highlighting the potential of computational analysis of visual social media data as a scalable approach to detecting mental illnesses. In the study conducted by Cornn [14], a combination of machine learning algorithms and neural networks was used to classify depression in social media text.
The most successful model was a CNN model, achieving an impressive accuracy of 92.5%. The one-dimensional convolutional layer played a vital role in noise reduction and was regarded as the most crucial component of the model.
Interestingly, word embeddings proved ineffective in representing the text used in that particular study. In another work, Ziwei et al. [54] developed an application to differentiate between depressive and non-depressive tweets using a classification function.
The application also provided a visualization of the user’s depression status through a web interface. The research emphasized the importance of early detection of depression and highlighted the potential of social media platforms in predicting mental and physical illnesses.
However, the application faced limitations imposed by Twitter’s API, such as the constraint of analyzing only a limited number of tweets. In a study conducted by De Choudhury et al. [15], sentiment analysis techniques were used to analyze Facebook data to detect symptoms of depression.
The findings revealed that individuals with depression symptoms tended to use a higher frequency of first-person pronouns, express negative emotions through their choice of words, and display a reduced use of terms associated with happiness in their Facebook posts, compared to individuals without symptoms.
Chen et al. [11] conducted a data analysis on Reddit data to identify people with depression. They proposed a hybrid deep learning model that combined a pre-trained sentence BERT (sBERT) with a convolutional neural network (CNN) to effectively identify individuals with depression based on their Reddit posts.
Interestingly, the model exceeded previously reported state-of-the-art results in the literature, achieving an accuracy of 0.86 and an F1 score of 0.86. The improved hybrid model was also applied to other text analysis tasks, showcasing its versatility and efficacy.
The research carried out by Wen et al. [51] used social media data to detect depression among users. Through the development of a classification model specifically designed to identify depression in tweets, the authors achieved remarkable results, with a high test accuracy of 98.94% and an F1 score of 99.04%.
The study highlights the effectiveness of analyzing the language used on social media platforms as a valuable approach for the early detection of depression among individuals. In a related study, Hosseini et al. [21] explored the integration of psychological and psychoanalytical insights to improve the identification of individuals with depression.
By combining traits observed in both depressed and non-depressed groups, the researchers created a bipolar feature vector. They successfully improved their models and achieved an impressive F1 score of 82.75% using a modified Bayesian classifier to classify social media users into depressed and non-depressed groups. In related research, Wang et al. [50] introduced an enhanced feature representation as input to a 3D CNN speech emotion recognition model, with the aim of identifying depression in its earliest stages.
The experiments carried out demonstrated that the combination of the enhanced feature and the model significantly improved the ability to detect and recognize depression.
Additionally, their study emphasized the necessity for future investigations to incorporate more detailed levels of analysis and extract additional features from speech signals to enhance detection accuracy.
Muzammel et al. [25] conducted experiments on depression detection by integrating multimodal features and selecting the optimal fusion strategy. The authors proposed two unimodal representations based on RNN and CNN networks.
These networks were utilized to acquire dynamic temporal representations of multimodal data, allowing for a comprehensive understanding of depression.
These investigations indicate that supervised learning techniques can be effective in identifying depression through the analysis of social media data. The summarized research findings related to depression detection are presented in Table 1.
Table 1 Related Studies on Detecting Depression
Model | Reference | F1 Score | Accuracy | Year |
MNB | S.G. Burdisso et al. | 0.96 | 0.96 | 2019 |
MLP | I. Fatima et al. | 0.92 | 0.92 | 2019 |
CNN | J. Kim | 0.79 | 0.75 | 2020 |
RFC | A Priya et al. | 0.77 | 0.80 | 2020 |
Char CNN | K. Cornn | 0.94 | 0.93 | 2020 |
SVM | H.S. AlSagri et al. | 0.79 | 0.83 | 2020 |
Sense Mood | C. Lin et al. | 0.94 | 0.88 | 2020 |
3D-CNN | H. Wang et al. | 0.64 | 0.77 | 2021 |
RFC | EM de Souza Filho et al. | 0.89 | 0.89 | 2021 |
LSTM | M. Muzammel et al. | 0.95 | 0.95 | 2021 |
sBERT-CNN | Z. Chen et al. | 0.86 | 0.86 | 2023 |
However, several of these methods have limitations, highlighting the need to continue developing and fine-tuning such techniques to improve their accuracy and effectiveness. Figure 1 illustrates the steps involved in our classification method.
3 Methodology
3.1 Data
The dataset used in this experiment was sourced from Kaggle [5], a widely used platform known for hosting diverse datasets and machine learning competitions for individuals and organizations. It consists of depression-related text, acquired from Reddit, a highly popular social media platform worldwide, using web scraping techniques.
The dataset includes a total of 7,731 posts. Each post is labeled with one of two sentiment classes, depression ('1') or non-depression ('0'), indicating whether the text contains expressions of depression.
Table 2 presents examples of text labeled with the two sentiment classes. The dataset was divided into a training set and a testing set of 6,539 and 1,192 text inputs, respectively, to ensure accuracy and consistency in the analysis.
Table 2 Sample Text with Sentiment Classes
Text | Label |
i ve lost everything i lost my best friend a community of people who were my only social outlet i m a failure i m i ve never been in a relationship i couldn t graduate college i m stuck working at a job which doesn t pay enough for me to afford rent so i have to live with my retirement age parent i can t find a job anywhere else i started cutting myself today never did it a a teenager but i did it now and it feel great i don t want to die but i don t see any other solution i can not afford help to me being in debt is worse than death i ve lost so much i can t go on | 1 |
I ve been feeling really depressed lately and find myself with no one to talk I have these cry spell whenever i m alone and convinced that i m worthless and not worth anyone s time it s getting harder to pick myself up from the floor bed and be productive or practice self care my friend live far away and emotionally at arm length my family understands that i m depressed but not how much it debilitates me with no one to talk to i feel trapped i m hoping finding online support can help me understand how to go on so i m kinda new to this how does this thread help you | 1 |
am i really just that awful no one want to be my friend my old friend abuse me i hate everything but especially myself when will it get better | 1 |
Our membership had expired and to renew them, we have to do a new induction which can’t happen until next Tuesday | 0 |
bored of sims for today and still thinking of a name for me and like youtube account to post our awesome new video on idea people | 0 |
hetty christ heh yeah i shakily conquered the ladder pointless job though we are too far away to receive digital signal with antenna | 0 |
Table 3 presents the statistics of the text indicating depression and non-depression in both the training and the testing sets.
Table 3 Statistics of depression and non-depression in the train and test datasets
Data | Instances | Label |
Train | 3,239 | 1 |
Train | 3,300 | 0 |
Test | 592 | 1 |
Test | 600 | 0 |
Total | 3,831 | 1 |
Total | 3,900 | 0 |
To comprehensively evaluate the effectiveness and reliability of our depression detection models, we conducted extensive experiments by combining intelligent pre-trained transformer models with traditional machine learning techniques.
By integrating diverse feature representations and transformer architectures, we obtained valuable insights into the performance and suitability of various approaches for depression classification.
The availability of this dataset on Kaggle makes it easier for other researchers to replicate this experiment and build on the work done in this research.
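To aid such replication, the sketch below shows one way to load and split the data. The file name and the column names (clean_text and is_depression) are assumptions about the Kaggle CSV schema and may need adjusting, and the fixed random seed is our own choice rather than a detail reported here.

```python
# Minimal data-preparation sketch. File and column names are assumed
# from the public Kaggle depression dataset; adjust to the actual schema.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("depression_dataset_reddit_cleaned.csv")  # 7,731 posts

# Stratified split reproducing the 6,539 / 1,192 train/test sizes
train_df, test_df = train_test_split(
    df,
    test_size=1192,
    stratify=df["is_depression"],  # keep class balance in both splits
    random_state=42,               # arbitrary fixed seed for reproducibility
)
print(len(train_df), len(test_df))  # 6539 1192
```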
3.2 Models
The traditional machine learning algorithms included Multinomial Naive Bayes (MNB) [37], Stochastic Gradient Descent (SGD) [53], Logistic Regression Classifier (LRC) [40], Decision Tree Classifier (DTC) [45], Random Forest Classifier (RFC) [33], K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and Multi-Layer Perceptron (MLP) [46].
These algorithms are commonly used in text classification tasks and are well established in the field of machine learning [34, 43, 29, 44]. Furthermore, we also used fine-tuned pre-trained language models for depression detection. The models used in the study included BERT [16], RoBERTa [23], XLM-RoBERTa [13],
DistilBERT [39], ALBERT [22], DistilRoBERTa [23] and ELECTRA [12]. These models are capable of capturing semantic and syntactic relationships between words, and the efficiency and effectiveness of these techniques make them often used for a wide range of applications, including language generation, machine translation and text classification.
4 Results
For this study, we evaluated different machine learning models and pre-trained language models to detect signs of depression. We extracted meaningful features from social media text to represent language patterns associated with depression.
The accuracy, precision, recall, and F1 evaluation metrics were used to assess the performance of the depression detection models. The features used in our experiments include bag-of-words (BoW), Word2Vec, and GloVe embeddings. By analyzing these results, we shed light on the profound influence of these distinct features on the overall performance of the models.
4.1 Experiment with Traditional Machine Learning Models and BoW
The BoW model represents text data as a collection of individual words and converts them into numerical representations that can be used by various machine learning algorithms. Machine learning models are trained on labeled datasets, where each text sample is associated with labels indicating the presence or absence of depression. The models learn to identify patterns and associations between the extracted BoW features and the corresponding labels.
The trained models are then evaluated using appropriate evaluation metrics such as accuracy, precision, recall, and F1 score. Our findings are presented in Tables 4-7, which provide a comprehensive overview of our experimental results. In this experiment, several models were evaluated using the BoW feature representation, and their performance scores were recorded.
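As a concrete illustration, the sketch below runs this BoW pipeline with scikit-learn for LRC, the best-performing BoW model in Table 4. It assumes the hypothetical train_df/test_df frames and column names from the data-preparation sketch in Section 3.1; the vectorizer settings are library defaults, as the exact configuration is not specified.

```python
# BoW sketch: vectorize posts, train a classifier from Section 3.2,
# and report the metrics used in Table 4.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

vectorizer = CountVectorizer()  # default unigram bag-of-words
X_train = vectorizer.fit_transform(train_df["clean_text"])
X_test = vectorizer.transform(test_df["clean_text"])

clf = LogisticRegression(max_iter=1000)  # LRC
clf.fit(X_train, train_df["is_depression"])

# classification_report prints accuracy plus the macro and weighted
# precision/recall/F1 averages shown in Table 4
print(classification_report(test_df["is_depression"], clf.predict(X_test)))
```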
Table 4 Result of machine learning models using the BoW feature representation
Model | Average | Accuracy | Precision | Recall | F1 |
MNB | macro avg | 0.83 | 0.87 | 0.83 | 0.83 |
weighted avg | - | 0.87 | 0.83 | 0.83 |
SGD | macro avg | 0.94 | 0.94 | 0.94 | 0.94 |
weighted avg | - | 0.94 | 0.94 | 0.94 |
LRC | macro avg | 0.96 | 0.96 | 0.96 | 0.96 |
weighted avg | - | 0.96 | 0.96 | 0.96 |
DTC | macro avg | 0.86 | 0.87 | 0.86 | 0.86 |
weighted avg | - | 0.87 | 0.86 | 0.86 |
RFC | macro avg | 0.93 | 0.93 | 0.93 | 0.93 |
weighted avg | - | 0.93 | 0.93 | 0.93 |
KNN | macro avg | 0.74 | 0.78 | 0.74 | 0.73 |
weighted avg | - | 0.78 | 0.74 | 0.73 |
SVM | macro avg | 0.95 | 0.96 | 0.95 | 0.95 |
weighted avg | - | 0.96 | 0.95 | 0.95 |
MLP | macro avg | 0.91 | 0.91 | 0.91 | 0.91 |
weighted avg | - | 0.91 | 0.91 | 0.91 |
Table 5 Result of machine learning models using the Word2Vec feature representation
Model | Average | Accuracy | Precision | Recall | F1 |
MNB | macro avg | 0.52 | 0.53 | 0.52 | 0.48 |
weighted avg | - | 0.53 | 0.52 | 0.48 |
SGD | macro avg | 0.81 | 0.83 | 0.81 | 0.80 |
weighted avg | - | 0.84 | 0.81 | 0.80 |
LRC | macro avg | 0.87 | 0.87 | 0.87 | 0.87 |
weighted avg | - | 0.87 | 0.87 | 0.87 |
DTC | macro avg | 0.82 | 0.82 | 0.82 | 0.82 |
weighted avg | - | 0.82 | 0.82 | 0.82 |
RFC | macro avg | 0.91 | 0.92 | 0.91 | 0.91 |
weighted avg | - | 0.92 | 0.91 | 0.91 |
KNN | macro avg | 0.80 | 0.83 | 0.80 | 0.80 |
weighted avg | - | 0.83 | 0.80 | 0.80 |
SVM | macro avg | 0.91 | 0.91 | 0.91 | 0.91 |
weighted avg | - | 0.91 | 0.91 | 0.91 |
MLP | macro avg | 0.94 | 0.94 | 0.94 | 0.94 |
weighted avg | - | 0.94 | 0.94 | 0.94 |
Table 6 Results of machine learning models using the GloVe feature representation
Model | Average | Accuracy | Precision | Recall | F1 |
MNB | macro avg | 0.60 | 0.64 | 0.60 | 0.58 |
weighted avg | - | 0.64 | 0.60 | 0.58 | |
SGD | macro avg | 0.94 | 0.94 | 0.94 | 0.94 |
weighted avg | - | 0.94 | 0.94 | 0.94 | |
LRC | macro avg | 0.96 | 0.96 | 0.96 | 0.96 |
weighted avg | - | 0.96 | 0.96 | 0.96 | |
DTC | macro avg | 0.86 | 0.87 | 0.86 | 0.86 |
weighted avg | - | 0.87 | 0.86 | 0.86 | |
RFC | macro avg | 0.93 | 0.93 | 0.93 | 0.93 |
weighted avg | - | 0.93 | 0.93 | 0.93 | |
KNN | macro avg | 0.74 | 0.78 | 0.74 | 0.73 |
weighted avg | - | 0.78 | 0.74 | 0.73 | |
SVM | macro avg | 0.95 | 0.96 | 0.95 | 0.95 |
weighted avg | - | 0.96 | 0.95 | 0.95 | |
MLP | macro avg | 0.91 | 0.91 | 0.91 | 0.91 |
weighted avg | - | 0.91 | 0.91 | 0.91 |
Table 7 Results of the Transformer models in the experiment
Model | Feature | Accuracy | Precision | Recall | F1 |
BERT | Transformer Embedding | 0.97 | 0.97 | 0.97 | 0.97 |
RoBERTa | Transformer Embedding | 0.99 | 0.99 | 0.99 | 0.99 |
XLM-RoBERTa | Transformer Embedding | 0.98 | 0.98 | 0.98 | 0.98 |
DistilBERT | Transformer Embedding | 0.98 | 0.98 | 0.98 | 0.98 |
ALBERT | Transformer Embedding | 0.98 | 0.98 | 0.98 | 0.98 |
DistilRoBERTa | Transformer Embedding | 0.96 | 0.97 | 0.96 | 0.96 |
ELECTRA | Transformer Embedding | 0.99 | 0.99 | 0.99 | 0.99 |
According to the results, the LRC model achieved the highest performance on all metrics, with an average accuracy, precision, recall, and F1 score of 0.96, indicating that it excelled at accurately classifying depression. The SGD and SVM models also demonstrated strong performance, with average scores of 0.94 and 0.95, respectively.
MNB, DTC, RFC, and MLP achieved good, though lower, performance, with average scores ranging from 0.83 to 0.93.
The KNN model had the lowest performance among the evaluated models, with an average score of 0.74. This suggests that the model faced challenges in accurately classifying instances related to depression compared to the other models.
4.2 Experiment with Traditional Machine Learning Models and Word2Vec
Unlike the BoW model, Word2Vec captures not only the frequency of words, but also their semantic meaning and contextual relationships. The Word2Vec model learns dense vector representations by analyzing large corpora of text data.
It represents each word in a high-dimensional vector space, where words with similar meanings or contextual usage are located closer to each other. The text data were preprocessed by tokenizing the text into words and removing any stop words or irrelevant characters.
Each word is then replaced by its corresponding Word2Vec vector representation obtained from the pre-trained model. This transforms the text data into numerical vectors, where each word is represented by a dense vector of fixed length.
The Word2Vec vectors are subsequently used as input features for machine learning models to detect depression. The models learn to identify patterns and associations between Word2Vec embeddings and the corresponding labels and are evaluated using accuracy, precision, recall, and F1 score.
Using Word2Vec word embeddings, the models effectively capture semantic and contextual information within the text data, resulting in improved accuracy and more meaningful predictions. The findings of the analysis, using Word2Vec as feature representations, are presented in Table 5.
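To illustrate, the sketch below extracts Word2Vec document features under stated assumptions: the pre-trained model (Google News vectors loaded via gensim's downloader) is our choice, since no specific pre-trained model is named, and mean pooling is used to obtain fixed-length document vectors.

```python
# Word2Vec feature-extraction sketch. The pre-trained model choice is
# an assumption; mean pooling yields one fixed-length vector per post.
import numpy as np
import gensim.downloader as api

w2v = api.load("word2vec-google-news-300")  # 300-dimensional vectors

def doc_vector(text, dim=300):
    # Average the vectors of in-vocabulary tokens; zeros if none match
    vecs = [w2v[tok] for tok in text.split() if tok in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

X_train_w2v = np.vstack([doc_vector(t) for t in train_df["clean_text"]])
X_test_w2v = np.vstack([doc_vector(t) for t in test_df["clean_text"]])
```

One practical detail: scikit-learn's MultinomialNB rejects negative feature values, so dense embeddings must be rescaled (e.g., min-max normalized) before that classifier can be applied, which may partly explain its weak scores in Table 5.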
Table 5 presents notable insights into the performance of different machine learning models using Word2Vec features. Among these models, the MLP model stands out with an impressive accuracy of 0.94. Both the RFC and SVM models consistently demonstrated moderate performance with accuracy, precision, recall, and F1 scores hovering around 0.91. The SGD, KNN, LRC, and DTC models performed adequately, albeit at a slightly lower level. The MNB model exhibited poor performance, as indicated by lower accuracy, precision, recall, and F1 scores.
4.3 Experiment with Traditional Machine Learning Models and GloVe
To conduct further analysis, we employed GloVe embedding representations to capture the semantic relationships between words. These vector representations are derived from the co-occurrence statistics of words in a corpus.
By encoding information about word meaning and context, these embeddings enable machine learning models to benefit from this knowledge. Using pre-trained GloVe embeddings, each word in the text is mapped to its corresponding vector representation.
These word vectors are then combined to create document-level representations, which are subsequently used to train the machine learning models. The results of our experiments using the GloVe feature representation are summarized in Table 6.
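For completeness, the GloVe variant can be sketched analogously. The specific GloVe release is an assumption, and the sketch mean-pools word vectors into a fixed-length document vector as one common way of combining them.

```python
# GloVe feature-extraction sketch; only the embedding source changes
# relative to the Word2Vec pipeline. The release used is an assumption.
import numpy as np
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-300")  # 300-d GloVe vectors

def glove_doc_vector(text, dim=300):
    # Mean-pool in-vocabulary token vectors into one document vector
    vecs = [glove[tok] for tok in text.split() if tok in glove]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

X_train_glove = np.vstack([glove_doc_vector(t) for t in train_df["clean_text"]])
X_test_glove = np.vstack([glove_doc_vector(t) for t in test_df["clean_text"]])
```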
The SGD, LRC, and SVM models consistently outperformed all other models, achieving high accuracy, precision, recall, and F1 scores of approximately 0.94, 0.96, and 0.95, respectively. The RFC model also exhibited strong performance, with accuracy, precision, recall, and an F1 score of around 0.93.
The KNN, DTC, and MLP models yielded moderate to good performance, with accuracy, precision, recall, and F1 scores of approximately 0.74, 0.86, and 0.91, respectively. The MNB model, on the other hand, showed relatively lower performance, with scores of approximately 0.60. These findings indicate that, when combined with the stronger models, the GloVe feature representation can be highly valuable for the analysis and classification of depression in textual data.
4.4 Experiment with Transformer Architectures
Pre-trained language models have demonstrated remarkable success in various NLP tasks [31, 20]. Initially trained on vast amounts of text data from the Internet, these models acquire a contextual understanding of words and sentences.
To apply pre-trained language models for depression detection, we fine-tuned them by training them on labeled data. Labeled data consist of text samples annotated with depression-related labels.
Through the fine-tuning process, the pre-trained language models learn to capture significant linguistic patterns and contextual cues associated with depression.
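For concreteness, a minimal fine-tuning sketch using the Hugging Face Transformers Trainer is shown below. The checkpoint, batch size, and other hyperparameters are our assumptions (only the 10-epoch budget is reported), and train_df/test_df refer to the hypothetical frames from the Section 3.1 sketch.

```python
# Fine-tuning sketch for one of the pre-trained language models.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "roberta-base"  # lighter stand-in; the best performer was RoBERTa-large
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["clean_text"], truncation=True, padding="max_length")

train_ds = Dataset.from_pandas(train_df).map(tokenize, batched=True)
test_ds = Dataset.from_pandas(test_df).map(tokenize, batched=True)
train_ds = train_ds.rename_column("is_depression", "labels")
test_ds = test_ds.rename_column("is_depression", "labels")

args = TrainingArguments(output_dir="depression-clf",
                         num_train_epochs=10,  # epoch budget reported above
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds, eval_dataset=test_ds)
trainer.train()  # attach a compute_metrics function to report accuracy/F1
```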
Using the fine-tuned models, we classify new text samples as either indicating depression or not. We evaluated the models' performance using various metrics and found that the ELECTRA and RoBERTa-large models outperformed the others, achieving the highest cumulative scores across all metrics.
Notably, these models each achieved an F1 score of 0.99 after only 10 epochs. Table 7 presents the results of this evaluation of the pre-trained language models on the depression detection task.
Transformer models showcased strong performance across all metrics evaluated. BERT achieved an accuracy, precision, recall, and F1 score of 0.97, demonstrating its effectiveness in detecting depression.
Furthermore, models such as RoBERTa, XLM-RoBERTa, DistilBERT, ALBERT and ELECTRA consistently achieved high scores, with accuracy, precision, recall, and F1 scores around or above 0.98.
These models exhibited robustness and reliability in capturing and comprehending complex language patterns. While DistilRoBERTa slightly underperformed compared to the other models, it still achieved an accuracy of 0.96. The ELECTRA and RoBERTa-large models both achieved the highest F1 score of 0.99.
This underscores their exceptional potential for accurately detecting depression in this experiment. Incorporating such models could help revolutionize the identification of depression, enabling earlier detection and treatment.
5 Discussion
Pre-trained language models can continuously improve and adapt as they encounter new data, enhancing their diagnostic accuracy and generalization capabilities.
In this research, an investigation was conducted into the effectiveness of a variety of machine learning models in detecting depression in social media data, including pre-trained language models such as BERT, RoBERTa, XLM-RoBERTa, DistilBERT, ALBERT, DistilRoBERTa, and ELECTRA.
These models were assessed based on their accuracy, precision, recall, and F1 scores. Across our evaluation, all transformer models achieved high accuracy and F1 scores, with RoBERTa and ELECTRA as the best performers.
This high performance of pre-trained transformer models suggests that they can effectively identify depression in text data.
Furthermore, these models can provide institutions responsible for the prevention of depression with a cost-effective alternative to their traditional methods of recognizing depression.
This study makes a significant advance in the use of pre-trained language models and social media data for depression detection, emphasizing their potential for depression treatment and prevention.
6 Conclusions
As pre-trained language models continue to evolve, they hold the potential to revolutionize the field of depression prevention and treatment. The key strength of pre-trained language models lies in their ability to learn from vast amounts of diverse textual data, enabling them to discern subtle linguistic cues indicative of depression across different languages. Our study demonstrates the high effectiveness of pre-trained language models in detecting depression in English text from social media sources.
In our experiments, the pre-trained language models achieved very good accuracy, precision, recall, and F1 values. However, more research is needed to determine whether these results generalize to larger, more diverse datasets and to other languages. Real-world application challenges such as model bias, interpretability, and scalability still need to be addressed.
Our findings nevertheless underscore the value of leveraging these pre-trained language models to detect and address depression at scale. Through continued development, such models can contribute significantly to early detection and improved well-being for individuals suffering from depression.
The authors thank CONACYT for the computing resources brought to them through the Plataforma de Aprendizaje Profundo para Tecnologías del Lenguaje of the Laboratorio de Supercomputo of the INAOE, Mexico, and acknowledge the support of Microsoft through the Microsoft Latin America PhD Award.