Computación y Sistemas

On-line version ISSN 2007-9737; print version ISSN 1405-5546

Comp. y Sist. vol.28 no.1 Ciudad de México Jan./Mar. 2024  Epub 10-Jun-2024

https://doi.org/10.13053/cys-28-1-4905 

Articles

From Words to Paragraphs: Modeling Sentiment Dynamics in Notes from Underground with GPT-4 by Differential Equations Via Quantile Regression Analysis

Volkan Duran 1

Iskander Akhmetov 2, 3

Elman Hazar 4

Alexander Gelbukh 5, *

Ezgi Kaya 4

1 Iğdır University, Department of Psychology, Türkiye. volkan.duran8@gmail.com.

2 Institute of Information and Computational Technologies, Almaty, Republic of Kazakhstan. i.akhmetov@kbtu.kz.

3 Kazakh-British Technical University, Almaty, Republic of Kazakhstan.

4 Iğdır University, Department of Mathematics, Türkiye. elman.hazar@igdir.edu.tr, ezgi.kaya@igdir.edu.tr.

5 Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico.


Abstract:

This study examines how sentiment values in “Underground”, the first part of Fyodor Dostoevsky’s “Notes from Underground”, change from words to sentences to paragraphs. Using the GPT-4 language model, we conducted a descriptive analysis of standardized sentiment values and calculated cumulative binned values of the sentiment trajectories over the text. We then built differential equation models of the sentiment tones using quantile regression analysis. We show that binned values can reveal a more dynamic and potentially chaotic structure when applied to the cumulative sum of sentiments at the word, sentence, and paragraph levels. The differential equations derived for these three levels via quantile regression demonstrate how the rate and acceleration of sentiment change are influenced by the current state of the sentiment and its rate of change. In conclusion, this study’s findings are important for enhancing the capabilities of AI-driven chatbots in sentiment analysis, particularly in dissecting and understanding the layered emotional landscapes of literary works.

Keywords: Sentiment analysis; differential equations; GPT-4; curve fitting; quantile regression analysis

1 Introduction

Opinion mining or sentiment analysis (SA) examines opinions in text using a blend of mathematics and linguistics [22]. It offers valuable insights for enhancing educational practices [11]. SA operates mainly at four levels: Document, Sentence, Phrase, and Aspect [20, 26].

Document level classifies the overall sentiment of a text, while Sentence level focuses on individual sentences. Phrase level mines opinion words and Aspect level analyzes the emotional components of phrases, assigning polarity to each.

Sentiment analysis is a multifaceted field involving various NLP tasks like aspect extraction and sarcasm detection [27]. It employs diverse techniques, including machine learning, lexicon-based, rule-based, and statistical models [10, 17, 23, 28]. Specialized methods like aspect-based analysis and deep learning have also been developed [12, 2, 21, 25, 29].

Moreover, multi-modal algorithms are emerging that analyze not just text but also visual data [5]. Sentiment analysis is already applied in diverse sectors like marketing, politics, and healthcare [6, 7, 9, 15]. By fusing AI-driven sentiment analysis with mathematical models, this research sets the stage for deeper exploration into sentiment dynamics, enriching its application across various fields.

Recent research suggests that keyword-based techniques may be inadequate for nuanced texts [19]. In the literature, some researchers focus on ratings and reviews, which are the most fundamental data in this area. For instance, after lemmatization, stemming, and stop-word removal, the data in a dataset can be classified using a logistic regression approach [18].

Additionally, topic-aware sentiment analysis of news articles, which subdivides the training corpus by topic (local news, sports, hi-tech, and others) and trains a separate sentiment classifier for each sub-corpus, improves classification F1 scores [1].

This article differs from most of the previous literature in that it uses the GPT-4 language model for a descriptive analysis of standardized sentiment values and calculates cumulative binned values of sentiment trajectories. It uses differential equation models and quantile regression analysis to model sentiment tones, a method that is more complex and potentially better suited to capturing the nuanced changes in sentiment in literary texts.

Although differential equations have previously been used in the social sciences [16], and some studies have built sentiment models through linear regression and curve fitting [13, 14], the contribution of this research lies in building sentiment models through quantile regression analysis.

This approach not only enhances credibility but also allows for the study of complex sentiment relationships across various textual levels. It opens up avenues for predicting sentiment behavior in different contexts.

Given the intricacies of text sentiment representation and the intersection of AI-driven sentiment analysis with mathematical models, understanding sentiment behavior in various contexts is not only crucial but also intricate.

Drawing on the principles of mathematical modeling and physics, this research takes innovative steps in employing techniques from stratified symbolic regression, genetic programming, and the finite difference method.

Such techniques have proven instrumental in extracting differential equations from data, as showcased by many researchers [3, 4, 8, 24]. By bridging the gap between AI sentiment analysis and mathematical modeling, this research promises to provide a more credible, predictive, and enriched understanding of sentiment behavior across textual forms. Therefore, research on the development of sentiment representation using AI-driven analysis combined with mathematical modeling is undeniably relevant.

2 Methodology

This study is based on a quantitative research design. We analyzed sentiments at the word, sentence, and paragraph levels (Figure 1) in the first part of the book, entitled “Underground”.

Fig. 1 The main units of the analysis 

In the first part of the study, we descriptively investigated the general characteristics of the sentiments in standardized form. We then used quantile regression models in SPSS 25 to obtain differential equations for the sentiment tones.

We obtained three equations representing the sentiment points at different levels of the text (word, sentence, and paragraph) as functions of x, where x can be interpreted as the position within the text. To label sentiment values at the word, sentence, and paragraph levels, we used GPT-4, a multimodal large language model created by OpenAI and the fourth in its GPT series.

The three main units of the analysis are words, sentences, and paragraphs (Figure 1). GPT-4 assigned a sentiment score to each unit, from a single word or phrase, through each sentence in a passage, up to an entire passage of text. Scores between -1 and 0 indicate negative sentiment, and scores between 0 and 1 indicate positive sentiment.

2.1 Analysis

This study consists of two main parts. The first part involves an analysis of binned sentiment values based on the first standard deviation. Specifically, the cumulative sentiment time series is divided into bins separated by one standard deviation. This binning allows for examining the dynamics and potential chaos in the cumulative sentiment data. The procedure for the first part is as follows:

  1. Descriptive Analysis of Sentiments: The study starts by analyzing sentiments within the text, using GPT-4 to assess the sentiment of words, sentences, and paragraphs.

  2. Visual Binning Based on Standard Deviation: We categorize sentiments into three levels based on the first standard deviation of the sentiment distribution. This approach effectively groups sentiments into categories like ’low’, ’medium’, and ’high’.

  3. Encoding Binned Variables as Integers: Each sentiment level is then encoded as an integer (e.g., -2 for very low, -1 for low, 1 for high, 2 for very high). This quantifies the sentiment levels, making them easier to analyze numerically.

In this process, we focus on the median of the binned categories. If there is an odd number of categories (n), the median category, which is at position (n+1)/2, is considered the ’neutral’ or ’base’ sentiment and is assigned a value of 0. For categories below the median, negative integers are assigned, starting from -1 and decreasing for each category moving away from the median. For categories above the median, positive integers are assigned, starting from 1 and increasing for each category moving away from the median.

If there is an even number of categories (n), the median is determined by averaging the (n/2)th and (n/2+1)th values. This average represents the ’neutral’ sentiment and is assigned a value of 0. Because negative sentiment values also correspond to low values of the binned variables, we applied this procedure to the raw data so that those values are labeled as negative. As in the odd-numbered case, categories below this median are assigned negative integers, and those above are assigned positive integers.

This method of encoding allows for a more nuanced analysis of sentiment data as it preserves the ordinal nature of the sentiment levels while converting them into a format that can be easily used in various statistical and machine learning models.

It’s especially useful when dealing with sentiment analysis where the intensity or degree of sentiment is important (Figure 2). Binning is a process of transforming continuous data into categories or bins.

Fig. 2 Transformation of binned values into sentimental range via median value 
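The binning and encoding steps can be sketched in Python. This is a minimal illustration, not the SPSS Visual Binning procedure itself: the cut points (the mean and the mean plus or minus one standard deviation, which give four bins), the function names `bin_by_sd`, `signed_codes`, and `encode`, and the sample scores are all assumptions made for the example.

```python
import numpy as np

def bin_by_sd(scores):
    """Rank each score from 1 to 4 using assumed cut points at mean - SD, mean, mean + SD."""
    scores = np.asarray(scores, dtype=float)
    mean, sd = scores.mean(), scores.std(ddof=1)
    cuts = [mean - sd, mean, mean + sd]
    # np.digitize returns 0..3 for three cut points; shift to ranks 1..4.
    return np.digitize(scores, cuts) + 1

def signed_codes(n_categories):
    """Map ordinal ranks 1..n to signed integers centred on the median category."""
    codes = {}
    half = n_categories // 2
    for rank in range(1, n_categories + 1):
        if n_categories % 2 == 1:
            # Odd n: the middle category at (n + 1) / 2 becomes 0.
            codes[rank] = rank - (n_categories + 1) // 2
        else:
            # Even n: the median falls between two categories, so no category is 0.
            codes[rank] = rank - half - 1 if rank <= half else rank - half
    return codes

def encode(scores):
    """Bin the scores by standard deviation and return the signed integer codes."""
    codes = signed_codes(4)
    return np.array([codes[int(rank)] for rank in bin_by_sd(scores)])

sample_scores = [-0.9, -0.3, 0.1, 0.8]  # hypothetical GPT-4 sentiment scores
print(encode(sample_scores))
```

With four bins the codes are -2, -1, 1, and 2, which matches the example values given in step 3; an odd number of bins would add a neutral 0 in the middle.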

If the aim is to categorize data based on whether they fall within one standard deviation of the mean, we are essentially creating a non-linear partition of the data. An analogy can be made between binning and wave-particle duality in the double-slit experiment. In quantum mechanics, many physical properties, such as energy, angular momentum, and charge, are quantized.

This means they can only take on discrete values, much like how binning categorizes continuous data into discrete bins. The act of measuring a quantum system can ’bin’ the system into one of the possible states. Before measurement, quantum systems are described by a probability distribution (wave function), which encompasses many potential outcomes.

Measurement ’collapses’ this wave function, forcing the system into one of the distinct states, akin to assigning a data point to a specific bin. While these analogies can be helpful in visualizing some aspects of quantum mechanics, it’s crucial to remember that they are metaphorical (Figure 2). One problem with such methodology in the interpretation of the sentiments of text might be whether it really depicts the actual trajectories of the sentiments, but this can be remedied by comparative studies.

Comparative studies serve as a valuable tool to validate and refine this approach, ensuring a more accurate and nuanced understanding of sentiments in textual analysis. This method preserves the ordinal nature of sentiment data. The relative ordering (from very negative to very positive) is maintained, which is important for many statistical analyses and machine learning models that can leverage this order information.

By encoding sentiments this way, you can conduct more detailed and meaningful analyses of sentiment data, capturing not just whether sentiments are positive or negative but the degree of positivity or negativity.

This is particularly useful in areas like customer feedback analysis, social media sentiment tracking, and market research, where understanding the intensity of sentiments can be as important as knowing their direction.

  4. Cumulative Sum of Encoded Values: The cumulative sum of these encoded sentiment values is computed. That is, for each point in the text (word, sentence, or paragraph), we add its sentiment score to the total of all previous scores, which yields a running total of sentiment values. This cumulative sum represents the aggregation of sentiment over the text. In a literary context, it could reflect the buildup or fluctuation of emotional tone throughout the narrative. This step transforms the sentiment trajectory into a path.

  5. Interpreting Cumulative Sum: In this phase, the cumulative sum of sentiment values is interpreted as a narrative sentiment progression. This approach considers the cumulative total as a reflection of the evolving emotional tone within the text. Each sentiment score at various textual levels, whether a word, sentence, or paragraph, contributes to an ongoing narrative sentiment trajectory.

     This trajectory, depicted as a cumulative sum, showcases how emotional tones build, shift, and fluctuate over the course of the narrative. Unlike the independent steps of a random walk, this sentiment progression is influenced by the contextual and sequential nature of the literary work, highlighting the interconnectedness and dependency of emotional expressions as the story unfolds. This interpretation provides insights into the nuanced and structured dynamics of sentiment in literature, illustrating how emotions evolve and interact throughout the narrative journey.

  6. Graphing and Curve Fitting: The cumulative sentiment scores are then graphed to show how sentiment evolves throughout the text. Curve fitting is applied to this graph to analyze the sentiment dynamics further.
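Steps 4 to 6 can be illustrated with a short Python sketch. It assumes a quadratic trend line (the Findings section refers to a quadratic model), a position index that starts at 1, and an ordinary $R^2$ as the goodness-of-fit measure; the function names and the sample codes are hypothetical.

```python
import numpy as np

def cumulative_path(codes):
    """Running total of the encoded sentiment values (step 4)."""
    return np.cumsum(codes)

def fit_quadratic(path):
    """Fit path = a*x^2 + b*x + c over positions 1..n and return (coeffs, R^2)."""
    path = np.asarray(path, dtype=float)
    x = np.arange(1, len(path) + 1)
    coeffs = np.polyfit(x, path, deg=2)      # highest power first: [a, b, c]
    fitted = np.polyval(coeffs, x)
    ss_res = np.sum((path - fitted) ** 2)    # unexplained variation
    ss_tot = np.sum((path - path.mean()) ** 2)
    return coeffs, 1.0 - ss_res / ss_tot

encoded = [-1, -2, 1, 2, -1, -1, 1, -2, -2, 1]  # hypothetical encoded sentiments
path = cumulative_path(encoded)
coeffs, r_squared = fit_quadratic(path)
print(path, coeffs, r_squared)
```

The same two functions apply unchanged to the word, sentence, and paragraph series; only the input sequence differs.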

In summary, by encoding sentiments as numerical values and accumulating these values over the course of the text, we transform the sentiment data into a format that can be analyzed, revealing insights about the sentiment dynamics in the text. The second part focuses on developing differential equation models using the raw, unbinned sentiment values. These differential equations relate the change in sentiment (first derivative) and acceleration of sentiment (second derivative) to the current sentiment values.

By modeling the derivatives, the equations aim to capture the continuous evolution of sentiment through textual data. We used a quantile regression model to obtain differential equations for the sentiment tones. Creating a differential equation model using the difference method of variables, curve fitting, and linear regression involves several steps, which we carried out in IBM SPSS 27 and Excel. The process is outlined below:

  1. Data Collection: We collected sentiment data generated by GPT-4, where the variable x can be interpreted as the position within the text.

  2. Calculation of Differences: We computed the differences between consecutive data points to approximate derivatives such as the first and second derivatives. We used the finite forward difference method to calculate these numerical derivatives, denoted as metrics.

  3. Curve Fitting: Curve fitting was performed on both the original sentiment data and the calculated differences.

  4. Quantile Regression Analysis: Quantile regression is a statistical modeling technique that examines the association between a set of explanatory variables and particular quantiles (percentiles) of a response variable. The quantile most often modeled is the median. The method has two primary advantages over Ordinary Least Squares regression: it does not rely on assumptions about the underlying distribution of the dependent variable, and it is robust to extreme observations. Quantile regression is widely used in research in fields including ecology, healthcare, and financial economics.

  5. Formulation of the Differential Equation: Based on the results of curve fitting and quantile regression, a differential equation model was formulated.

    • Coefficients from the regression were used to define the relationship between the dependent variable(s) and their derivatives in the differential equation.

  6. Final Equations: We derived three equations representing sentiment at different textual levels (word, sentence, and paragraph) as functions of the position within the text.
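The outline above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the SPSS workflow used in the study: the step size is taken as one text unit, and the median regression is approximated with iteratively reweighted least squares instead of the simplex algorithm reported in the tables. All function names and the sample scores are hypothetical.

```python
import numpy as np

def forward_differences(series):
    """Return (level, first difference, second difference), aligned to equal length."""
    series = np.asarray(series, dtype=float)
    d1 = np.diff(series, n=1)   # y[i+1] - y[i]
    d2 = np.diff(series, n=2)   # y[i+2] - 2*y[i+1] + y[i]
    n = len(d2)
    return series[:n], d1[:n], d2

def median_regression(X, target, n_iter=100, eps=1e-8):
    """Approximate least-absolute-deviation (q = 0.5) fit by reweighted least squares."""
    X = np.asarray(X, dtype=float)
    target = np.asarray(target, dtype=float)
    beta = np.linalg.lstsq(X, target, rcond=None)[0]   # ordinary least squares start
    for _ in range(n_iter):
        resid = target - X @ beta
        # Weights 1/|residual| turn squared error into absolute error.
        root_w = np.sqrt(1.0 / np.maximum(np.abs(resid), eps))
        new_beta = np.linalg.lstsq(X * root_w[:, None], target * root_w, rcond=None)[0]
        converged = np.allclose(new_beta, beta, atol=1e-10)
        beta = new_beta
        if converged:
            break
    return beta

def fit_sentiment_ode(series):
    """Regress the second difference on an intercept, the level, and the first difference."""
    level, d1, d2 = forward_differences(series)
    X = np.column_stack([np.ones_like(level), level, d1])
    return median_regression(X, d2)   # [intercept, coefficient of y, coefficient of dy]

scores = [-0.3, 0.2, -0.5, 0.1, 0.4, -0.6, -0.2, 0.3, -0.1, 0.5, -0.4, 0.0]  # hypothetical
print(fit_sentiment_ode(scores))
```

The returned coefficients play the role of the parameter estimates in step 5: they define how the second difference depends on the sentiment value and its first difference.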

2.2 Limitations

  • The main limitation of this study is that we used an English translation of the book rather than the Russian original: Notes from Underground (Vintage Classics) by Fyodor Dostoevsky, translated by Richard Pevear and Larissa Volokhonsky. Although GPT-4 works well with Russian, we assumed that it performs best in English, which matters because the main aim is to analyze sentiments.

  • The second limitation is that we used only the GPT-4 model, although many other libraries and algorithms exist for this task, so our results are restricted to the capabilities of the GPT-4 chatbot.

  • Sentiment analysis and NLP face a number of obstacles, including idiosyncrasies in writing style, sarcasm, irony, and linguistic peculiarities. Many terms in many languages have nuanced or shifting meanings based on the specific setting or field in which they are used.

  • Performing regression analysis on a variable and its numerical derivative based on the difference method might not be ideal for several reasons, such as loss of information, amplification of noise, data requirements, assumption violations, non-stationarity, causality, and interpretation issues. However, there are cases where using derivatives in a regression analysis can be beneficial. For example, if the rate of change is of interest, or if the relationship between variables is best modeled by considering rates of change, then the derivative might be appropriate.

  • The encoding of sentiments into integers (-2 to 2) may lose some granularity of sentiment data. Literature often contains more complex emotions that this range might not fully capture.

  • While cumulative sums can reveal overall trends, they might obscure local sentiment fluctuations. It is important to balance the overall trajectory with local sentiment variations.

  • The choice of curve fitting techniques and their interpretation can significantly influence the conclusions. It is vital to ensure that the chosen method accurately reflects the sentiment dynamics.

3 Findings

3.1 General Descriptive Findings of the Cumulative Binned Values of the Sentiments

Looking at the sentiments at different levels, we observed the following results:

  • Negative Sentiments: More prevalent at the sentence level (54.3%) compared to word (47.8%) and paragraph levels (44.6%). This suggests that negative sentiments are more distinctly identified or articulated in sentences.

  • Positive Sentiments: Dominant at the paragraph level (55.4%), followed by the word level (52.2%) and sentence level (45.7%). This indicates that positive sentiments are more pronounced or become clearer in larger text contexts, such as paragraphs.

  • Word Level Analysis: Shows a slightly higher occurrence of positive sentiments compared to negative sentiments.

  • Sentence Level Analysis: Negative sentiments are more prominent than positive sentiments, indicating that sentences might convey negativity more distinctly.

  • Paragraph Level Analysis: A significant tilt towards positive sentiments, suggesting that overall positivity is more likely to be perceived in longer text blocks.

We concluded that the context (word, sentence, or paragraph) influences sentiment perception. Negative sentiments are more pronounced in sentences, while positive sentiments are more likely to be identified in paragraphs. This may imply that the nuance and complexity of sentiments become more apparent in larger textual contexts. Binning is a method used in data analysis to group a range of values into bins, or intervals, which can help in identifying trends in a dataset that may not be apparent when analyzing the raw data.

In summary, while raw cumulative sums provide a direct sequential aggregation of sentiment values, binned values can uncover a more nuanced, dynamic, and sometimes chaotic structure in sentiment data, showcasing trends and patterns that may not be immediately evident in the raw cumulative sum.

3.1.1 The Descriptive Interpretation of the Sentiment Values at Word Level

The descriptive statistics of the sentiment values at the word level show a slightly negative overall sentiment (Table 2). The mean is slightly below zero, while the median is balanced at zero and the sentiment values span a wide range.

Table 1 The distribution of the percentages of the sentiments in different categories 

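| Category | Level | Word (%) | Sentence (%) | Paragraph (%) |
|---|---|---|---|---|
| Negative | Very low = -2 | 19.4 | 19.0 | 21.4 |
| Negative | Low = -1 | 28.4 | 35.3 | 23.2 |
| Negative | Total | 47.9 | 54.3 | 44.6 |
| Positive | High = 1 | 33.5 | 24.0 | 51.8 |
| Positive | Very high = 2 | 18.7 | 21.5 | 3 |
| Positive | Total | 52.1 | 45.5 | 55.4 |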

Table 2 The descriptive values of the raw values of the sentiments of the words 

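| Word level | Statistic | Std. Error |
|---|---|---|
| Mean | -0.0868 | 0.01238 |
| 95% Confidence Interval for Mean, Lower Bound | -0.1111 | |
| 95% Confidence Interval for Mean, Upper Bound | -0.0625 | |
| 5% Trimmed Mean | -0.0892 | |
| Median | 0.0000 | |
| Variance | 0.195 | |
| Standard Deviation | 0.44116 | |
| Minimum | -0.90 | |
| Maximum | 0.80 | |
| Range | 1.70 | |
| Interquartile Range | 0.80 | |
| Skewness | 0.024 | 0.069 |
| Kurtosis | -1.061 | 0.137 |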

The distribution of sentiment scores is fairly symmetrical and moderately varied, indicating a diverse set of sentiments in the words analyzed. This might reflect a dataset with a broad spectrum of emotional expressions, leaning slightly towards negative sentiment. The cumulative sentiment of the words analyzed decreases significantly over the series of data points. The high $R^2$ value indicates that the trend is strongly consistent. This could imply that the data set or time period being analyzed is characterized by an increasing prevalence of negative sentiment.

If this were a time-based analysis, one could conclude that the overall sentiment is becoming more negative over time. If this represents a sequence of events or another type of series, it would suggest a downward trend in sentiment associated with the progression of that series (Figure 4). The binned analysis with a quadratic model shows that the cumulative sentiment of words has a more complex dynamic than a simple linear decrease. It highlights periods where the sentiment becomes more positive before turning more negative again.

Fig. 3 The distribution of the percentages of the sentiments in different categories 

Fig. 4 The values of the water flow chart (cumulative sum) of the raw values of the sentiments of the words 

The presence of a curve in the trend line and the variable distribution of the data points suggest that there are underlying factors or patterns causing these shifts in sentiment over the series. This could reflect the nature of the data source, such as a text or series of texts where the sentiment fluctuates with context or events rather than showing a steady trend in one direction.

In sum, the binned and quadratic analysis provides a nuanced view of sentiment progression, emphasizing the non-linear and cyclical nature of sentiment changes within the dataset (Figure 5).

Fig. 5 The values of the water flow chart (cumulative sum) of the binned values of the sentiments of the words and the relevant equation 

3.1.2 The Descriptive Interpretation of the Sentiment Values at Sentence Level

The descriptive statistics indicate that, at the sentence level, the sentiment is generally negative, with a mean and median both in the negative range. However, there is moderate variability in sentiment across sentences, with a wide range of values and a relatively flat distribution that is not heavily skewed in any direction.

This implies a diverse sentiment across the sentences, with a slight tendency toward more negative expressions (Table 3). The graph depicts the cumulative sum of sentiment scores at the sentence level (Figure 6). A predominantly downward trend, as shown by the blue line, indicates that the cumulative sentiment becomes more negative over the sequence.

Table 3 The descriptive values of the raw values of the sentiments of the sentences 

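| Sentence level | Statistic | Std. Error |
|---|---|---|
| Mean | -0.1400 | 0.01593 |
| 95% Confidence Interval for Mean, Lower Bound | -0.1713 | |
| 95% Confidence Interval for Mean, Upper Bound | -0.1088 | |
| 5% Trimmed Mean | -0.1442 | |
| Median | -0.2000 | |
| Variance | 0.172 | |
| Standard Deviation | 0.41489 | |
| Minimum | -0.90 | |
| Maximum | 0.80 | |
| Range | 1.70 | |
| Interquartile Range | 0.70 | |
| Skewness | 0.174 | 0.069 |
| Kurtosis | -0.959 | 0.187 |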

The strong negative slope and the high $R^2$ value suggest that there is a consistent and significant negative trend in the sentiment of sentences over this data sequence. This could imply that as the sequence progresses, the sentences express increasingly negative sentiments.

If this is a time series, for example, it could suggest a worsening of sentiment over time. If it is a sequence of sentences from a text or series of texts, it could indicate a narrative arc that becomes more negative (Figure 6). The water flow chart (cumulative sum) of the binned sentiments of the sentences shows that at the beginning of the graph, there is a decline in the sentiment values (Figure 7).

Fig. 6 The values of the water flow chart (cumulative sum) of the raw values of the sentiments of the sentences 

The graph reflects the cumulative sentiment of sentences when binned, showing a complex sentiment pattern with fluctuations, including a significant downturn and subsequent recovery.

The lower $R^2$ value indicates that while the quadratic trend line captures the overall shape of the data, there is still a considerable amount of variation that it does not explain.

This suggests that the sentiments of the sentences exhibit non-linear behavior with significant variance, which may be influenced by various factors not captured by a simple quadratic trend. The binned approach smooths out some of the variations and helps identify broader patterns in the sentiment data (Figure 7).

Fig. 7 The values of the water flow chart (cumulative sum) chart of the binned values of the sentiments of the sentences and the relevant equation 

3.1.3 The Descriptive Interpretation of the Sentiment Values at Paragraph Level

The descriptive statistics for sentiment at the paragraph level indicate a negative bias, with a mean of -0.2330 and a standard error of 0.03214. The 95% confidence interval (-0.2974 to -0.1686) lies entirely below zero, so the negative mean is statistically distinguishable from zero and is unlikely to be due to random chance. The median of -0.2000 is in line with the mean, further indicating negative sentiment.

The variance and standard deviation are relatively low, suggesting sentiments across different paragraphs are not widely dispersed but are fairly consistent around the mean. The minimum and maximum values show that sentiments range from moderately negative to slightly positive. Overall, these statistics suggest that paragraphs tend to express negative sentiments more frequently than positive ones, with a relatively consistent sentiment distribution that is moderately concentrated around the mean and median values (Table 4). The graph depicts the cumulative sum of sentiment scores at the paragraph level (Figure 8). A predominantly downward trend, as shown by the blue line, indicates that the cumulative sentiment becomes more negative over the sequence.

Table 4 The descriptive values of the raw values of the sentiments of the paragraphs 

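| Paragraph level | Statistic | Std. Error |
|---|---|---|
| Mean | -0.2330 | 0.03214 |
| 95% Confidence Interval for Mean, Lower Bound | -0.2974 | |
| 95% Confidence Interval for Mean, Upper Bound | -0.1686 | |
| 5% Trimmed Mean | -0.2238 | |
| Median | -0.2000 | |
| Variance | 0.058 | |
| Standard Deviation | 0.24051 | |
| Minimum | -0.75 | |
| Maximum | 0.15 | |
| Range | 0.90 | |
| Interquartile Range | 0.40 | |
| Skewness | -0.524 | 0.319 |
| Kurtosis | -0.921 | 0.628 |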
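
As a consistency check on Table 4, the reported confidence interval can be reproduced from the mean and its standard error. The sample size below is inferred from the table rather than stated in the text, and the critical value $t_{0.975,\,55} \approx 2.004$ is taken from a standard t table; both should be read as assumptions.

```latex
n \approx \left(\frac{SD}{SE}\right)^{2}
  = \left(\frac{0.24051}{0.03214}\right)^{2} \approx 56,
\qquad
\bar{x} \pm t_{0.975,\,55}\, SE
  = -0.2330 \pm 2.004 \times 0.03214
  \approx -0.2330 \pm 0.0644
  = (-0.2974,\; -0.1686)
```

Because the whole interval lies below zero, the negative mean at the paragraph level can be distinguished from zero at the 5% level.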

The strong negative slope and the high $R^2$ value suggest that there is a consistent and significant negative trend in the sentiment of paragraphs over this data sequence. This could imply that as the sequence progresses, the paragraphs express increasingly negative sentiments. If this is a time series, for example, it could suggest a worsening of sentiment over time.

If it is a sequence of paragraphs from a text or series of texts, it could indicate a narrative arc that becomes more negative (Figure 8). The cumulative sentiment values of the paragraphs show a non-linear pattern, initially declining and then rising, which suggests variability in sentiment throughout the paragraphs.

Fig. 8 The graph represents the standardized values of the water flow chart (cumulative sum) of the raw values of the sentiments of the paragraphs 

The substantial $R^2$ value indicates a good fit for the quadratic model but also implies that there are other factors affecting sentiment that are not explained by this model alone. This could mean that the paragraphs may follow a narrative arc, with shifts in sentiment that could correspond to different stages or events in the text (Figure 9).

Fig. 9 The graph represents the standardized values of the water flow chart (cumulative sum) of the raw values of the sentiments of the paragraphs 

3.2 Modelling Differential Equations for Raw Values of the Sentiments Via Quantile Regression

3.2.1 The Differential Equations Modelling for the Words as the Main Unit of the Research

Tables 5 to 7 present the outcomes of a quantile regression analysis aimed at understanding the factors that influence the median value of the dependent variable ’d2word’, using ’word’ and ’dword’ as predictors.

The Mean Absolute Error (MAE) is a measure of the average magnitude of the errors in a set of predictions without considering their direction. An MAE of 0.3562 indicates that, on average, the model’s predictions of the median of the dependent variable deviate from the observed values by 0.3562 units.

The model explains a significant portion of the variability at the median level and provides insights with a relatively low average prediction error. In the realm of statistical analysis, the evaluation and comparison of models using metrics such as the Mean Absolute Error (MAE) and Pseudo R Squared is pivotal for understanding model performance.

A lower MAE indicates better predictive accuracy. The model’s MAE (0.3562) is substantially lower than that of the null model (0.7273), suggesting that including the predictors (’word’ and ’dword’) improves the model’s ability to accurately predict the median of ’d2word’.

The Pseudo R Squared value for the model indicates that about 51% of the variability in the median of ’d2word’ is accounted for by the model. In contrast, the null model, with a Pseudo R Squared of 0.000, explains none of the variability. This further suggests that the model provides a substantial improvement over the null model. The parameter estimates are given in Table 7.

Table 5 Observed Model Quality (q=0.5) a,b,c 

| Measure | Value |
|---|---|
| Pseudo R Squared | 0.510 |
| Mean Absolute Error (MAE) | 0.3562 |

a: Dependent Variable: d2word.

b: Model: (Intercept), word, dword.

c: Method: Simplex algorithm.

Table 6 Null Model Quality (q=0.5) a,b,c 

| Measure | Value |
|---|---|
| Pseudo R Squared | 0.000 |
| Mean Absolute Error (MAE) | 0.7273 |

a: Dependent Variable: d2word.

b: Model: (Intercept).

c: Method: Simplex algorithm.
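
The two quality measures are linked. Assuming the reported Pseudo R Squared follows the usual Koenker and Machado definition (one minus the ratio of the check-function losses of the fitted and null models), the check loss at q = 0.5 is half the absolute error, so the measure reduces to a ratio of the two MAE values in Tables 5 and 6. Here $\hat{y}_i$ denotes the fitted median and $\tilde{y}$ the intercept-only (null model) prediction; the notation is introduced only for this illustration.

```latex
R^{1}(0.5)
  = 1 - \frac{\sum_i \lvert y_i - \hat{y}_i \rvert}{\sum_i \lvert y_i - \tilde{y} \rvert}
  = 1 - \frac{\mathrm{MAE}_{\text{model}}}{\mathrm{MAE}_{\text{null}}}
  = 1 - \frac{0.3562}{0.7273}
  \approx 0.510
```

The result agrees with the Pseudo R Squared of 0.510 in Table 5, which is consistent with that definition being the one used.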

Table 7 Parameter estimates (q=0.5)a,b 

95% Confidence Interval
Parameter Coefficient Std. Error t df Sig. Lower Bound Upper Bound
(Intercept) 0.000 0.0181 […] 1265 […] -0.036 0.036
Word -0.579 0.0514 -11.245 1265 0.000 -0.679 -0.478
Dword -1.603 0.0407 -39.399 1265 0.000 -1.683 -1.523

Table 7 provides the parameter estimates for the model, from which we can construct the equation for the dependent variable d2word. The table lists the coefficients for the intercept, Word, and Dword, along with other statistical details:

$$\frac{d^{2}}{dx^{2}}\mathrm{Word} = -0.579 \times \mathrm{Word} - 1.603 \times \frac{d}{dx}\mathrm{Word}. \tag{1}$$
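How an equation of this form could be estimated from a sentiment series can be sketched as follows. The sketch makes two assumptions that are not taken from the text: the derivatives are approximated by forward differences of consecutive scores, and the median regression is fitted by iteratively reweighted least squares as a stand-in for the simplex algorithm named in the table notes. All function names are illustrative.

```python
def differences(values):
    """Forward differences of consecutive values (assumed derivative proxy)."""
    return [b - a for a, b in zip(values, values[1:])]


def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def median_regression(rows, targets, iterations=50, eps=1e-8):
    """Approximate a least-absolute-deviation fit by reweighted least squares."""
    k = len(rows[0])
    weights = [1.0] * len(targets)
    beta = [0.0] * k
    for _ in range(iterations):
        gram = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
                 for j in range(k)] for i in range(k)]
        moment = [sum(w * r[i] * y for w, r, y in zip(weights, rows, targets))
                  for i in range(k)]
        beta = solve_linear(gram, moment)
        residuals = [y - sum(b * v for b, v in zip(beta, r))
                     for r, y in zip(rows, targets)]
        # Weights 1/|residual| turn squared error into absolute error;
        # eps guards against division by zero.
        weights = [1.0 / max(abs(e), eps) for e in residuals]
    return beta


def fit_second_order(series):
    """Regress the second difference on the level and the first difference.

    Returns [intercept, coefficient of the level, coefficient of the slope].
    """
    first = differences(series)
    second = differences(first)
    # Keep only the positions where the second difference is defined.
    rows = [[1.0, level, slope] for level, slope in zip(series, first)]
    return median_regression(rows[:len(second)], second)
```

Applied to a series of word-level scores, `fit_second_order` would return three numbers playing the roles of the intercept and the Word and Dword coefficients; whether they match the reported estimates depends on how the derivatives were actually computed in the study.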

Multicollinearity refers to a situation where predictor variables in a regression model are highly correlated. The covariances between the parameter estimates (intercept, word, dword) are small in absolute terms (Table 8), but covariances depend on the scale of the estimates, so the correlations (Table 9) are more informative. The correlation of 0.633 between the estimates for ’word’ and ’dword’ points to a potential multicollinearity issue. While this level of correlation is a concern, it does not automatically invalidate the model.

3.2.2 The Differential Equations Modelling for the Sentences as the Main Unit of the Research

The regression analysis focuses on the median (0.5 quantile) of the dependent variable.

Quantile regression at the median is particularly useful for understanding the central tendency of the dependent variable, especially when the data may be skewed or contain outliers. The Pseudo R Squared value of 0.478 suggests that approximately 47.8% of the variability in the median of the dependent variable (’d2sdt2’) is explained by the model (Table 10).

In quantile regression, the Pseudo R-squared provides a measure of the model’s explanatory power, though it does not have a direct analog to the R-squared in OLS regression. A value of 0.478 indicates a moderate level of explanatory power.

The Mean Absolute Error (MAE) of 0.3297 means that the average magnitude of the errors in the model’s predictions is 0.3297 units. This metric describes the average prediction error without considering its direction. A lower MAE is generally preferable, indicating more accurate predictions.

The comparison clearly shows that the full model with the predictors ’s’ and ’dsdt’ performs substantially better than the null model. This is evident both in terms of the model’s explanatory power (Pseudo R Squared) and its predictive accuracy (MAE):

  • Pseudo R Squared: The increase from 0.000 in the null model to 0.478 in the full model indicates a substantial improvement in the explanatory power of the model. A Pseudo R Squared of 0.478 suggests that approximately 47.8% of the variability in the median of ’d2sdt2’ is explained by the full model, whereas the null model explains none.

  • Mean Absolute Error (MAE): The decrease in MAE from 0.6312 to 0.3297 is significant. This indicates that the full model, with its predictors, is much more accurate in predicting the median of ’d2sdt2’ compared to the null model, which merely uses the median of the dependent variable for prediction.
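The role of the null model can be illustrated with a toy example: among all constant predictions, the sample median gives the smallest mean absolute error, which is why an intercept-only median regression serves as the baseline. The data values below are invented for illustration only and are not taken from the study.

```python
import statistics


def mean_absolute_error(observed, prediction):
    """Average absolute deviation of the observations from one constant prediction."""
    return sum(abs(value - prediction) for value in observed) / len(observed)


# Invented, skewed toy data (not taken from the study).
toy = [1.0, 2.0, 3.0, 4.0, 100.0]

mae_at_median = mean_absolute_error(toy, statistics.median(toy))
mae_at_mean = mean_absolute_error(toy, statistics.mean(toy))
print(mae_at_median, mae_at_mean)
```

The constant prediction at the median yields the lower MAE of the two, so any reduction in MAE achieved by the full model is measured against the best possible constant prediction.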

Based on the information provided in Table 12, which includes parameter estimates for a statistical model, we can write down the equation for the dependent variable d2sdt2. This table presents the coefficients for an intercept, S and Dsdt, along with other statistical details:

$$\frac{d^{2}\mathrm{Se}}{dx^{2}} = -0.087 - 0.590 \times \mathrm{Se} - 1.627 \times \frac{d\,\mathrm{Se}}{dx}, \tag{2}$$

where Se = Sentence.

Table 8 Covariances of parameter Estimates (q=0.5)a,b 

(Intercept) Word dword
(Intercept) 0.00033 0.00023 0.00011
Word 0.00023 0.00265 0.00132
Dword 0.00011 0.00132 0.00166

a: Dependent Variable: d2word.

b: Model: (Intercept), word, dword.

Table 9 Correlations of parameter Estimates (q=0.5)a,b 

(Intercept) Word Dword
(Intercept) 1 0.247 0.156
Word 0.247 1 0.633
Dword 0.156 0.633 1

a: Dependent Variable: d2word.

b: Model: (Intercept), word, dword.

Table 10 Observed Model Quality (q=0.5)a,b,c 

Pseudo R Squared 0.478
Mean Absolute Error (MAE) 0.3297

a: Dependent Variable: d2sdt2.

b: Model: (Intercept), s, dsdt.

c: Method: Simplex algorithm.

Table 11 Null Model Quality (q=0.5)a,b,c 

Pseudo R Squared 0.000
Mean Absolute Error (MAE) 0.6312

a: Dependent Variable: d2sdt2.

b: Model: (Intercept).

c: Method: Simplex algorithm.

Table 12 Parameter estimates (q=0.5)a,b 

95% Confidence Interval
Parameter Coefficient Std. Error T Df Sig. Lower Bound Upper Bound
(Intercept) -0.087 0.0264 -3.306 674 0.001 -0.139 -0.035
S -0.590 0.0718 -8.216 674 0.000 -0.730 -0.449
Dsdt -1.627 0.0629 -25.848 674 0.000 -1.750 -1.503

The values in Table 13 represent the covariances between the estimates of the model parameters. Covariance is a measure of how much two random variables vary together. In the context of regression coefficients, it provides insight into the relationship between the precision of the estimates of different parameters:

  • (Intercept) row and column: The covariance of the intercept with itself (0.00070) is its variance. The covariances between the intercept and the predictors ’s’ and ’dsdt’ are 0.00072 and 0.00036, respectively. These values indicate how the estimate of the intercept co-varies with the estimates of the other parameters.

  • s row and column: The variance of the ’s’ coefficient is 0.00515. Its covariance with ’dsdt’ is 0.00258. These values tell us how the estimate of ’s’ changes in relation to both the intercept and ’dsdt’.

  • dsdt row and column: The variance of the ’dsdt’ coefficient is 0.00396, and its covariances with the other parameters are given in the respective cells.

The correlation of 0.571 between ’s’ and ’dsdt’ suggests there might be some level of multicollinearity. However, this level of correlation is not extremely high, so it may not be severe enough to significantly distort the regression coefficients or their standard errors. Moderate correlations can indicate potential multicollinearity, but they do not always warrant significant concern unless they are very high (close to 1 or -1) (Table 14).
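The correlations of the parameter estimates follow from their covariances by dividing each covariance by the product of the two corresponding standard errors. The short sketch below recomputes them from the rounded covariances printed in Table 13, so small differences in the third decimal are expected; the function name is illustrative.

```python
import math


def covariance_to_correlation(cov):
    """Convert a covariance matrix (nested lists) into a correlation matrix."""
    std = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (std[i] * std[j]) for j in range(len(cov))]
            for i in range(len(cov))]


# Covariances of the parameter estimates, order: (Intercept), s, dsdt.
cov_sentence = [
    [0.00070, 0.00072, 0.00036],
    [0.00072, 0.00515, 0.00258],
    [0.00036, 0.00258, 0.00396],
]

corr_sentence = covariance_to_correlation(cov_sentence)
for row in corr_sentence:
    print([round(value, 3) for value in row])
```

The off-diagonal entries come out close to 0.378, 0.215, and 0.571, and every diagonal entry equals 1 by construction. The square roots of the diagonal covariances are the standard errors of the corresponding coefficients.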

Table 13 Covariances of parameter Estimates (q=0.5)a,b 

(Intercept) S dsdt
(Intercept) 0.00070 0.00072 0.00036
S 0.00072 0.00515 0.00258
Dsdt 0.00036 0.00258 0.00396

a: Dependent Variable: d2sdt2.

b: Model: (Intercept), s, dsdt.

Table 14 Correlations of parameter Estimates (q=0.5)a,b 

(Intercept) S dsdt
(Intercept) 1 0.378 0.215
S 0.378 1 0.571
Dsdt 0.215 0.571 1

a: Dependent Variable: d2sdt2.

b: Model: (Intercept), s, dsdt.

3.2.3 The Differential Equations Modelling for the Paragraphs as the Main Unit of the Research

In the context of quantile regression, the results describe a model for the median (50th percentile) of the dependent variable ’dp2dt2’ with two predictors, ’p’ and ’dpdt’. The model’s fit is moderate, as indicated by a Pseudo R Squared value of 0.464, meaning that approximately 46.4% of the variation in the dependent variable is explained by the model.

The Mean Absolute Error (MAE) of 0.1498 suggests the predictions are reasonably accurate. The model employs the Simplex algorithm, a method commonly used for solving linear programming problems in optimization scenarios.
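For reference, the linear-programming form that makes quantile regression amenable to the simplex method can be written as follows. This is the standard textbook formulation, given here as a sketch; the symbols $u_i$ and $v_i$ (the positive and negative parts of the residuals), $x_i$ (the predictor vector including the constant), and $\beta$ are notation introduced for illustration and do not appear in the tables.

```latex
\begin{aligned}
\min_{\beta,\,u,\,v}\quad & \sum_{i=1}^{n} \bigl( q\,u_i + (1-q)\,v_i \bigr) \\
\text{subject to}\quad & y_i - x_i^{\top}\beta = u_i - v_i, \qquad u_i \ge 0,\; v_i \ge 0, \qquad i = 1,\dots,n .
\end{aligned}
```

With $q = 0.5$ both parts of the residual are weighted equally, so the objective is half the sum of absolute residuals and the fit minimizes the mean absolute error.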

This approach provides a more nuanced understanding of the data compared to traditional regression methods, especially in terms of distribution tails (Table 15). Comparing the two models in the context of quantile regression, both aimed at predicting the median of ’dp2dt2’, reveals significant differences in their performance.

Table 15 Observed model quality (q=0.5)a,b,c 

Pseudo R Squared 0.464
Mean Absolute Error (MAE) 0.1498

a: Dependent Variable: dp2dt2.

b: Model: (Intercept), p, dpdt.

c: Method: Simplex algorithm.

The null model (Table 16), which only includes an intercept, shows no explanatory power (Pseudo R Squared of 0.000) and a higher Mean Absolute Error (MAE) of 0.2796, indicating less accurate predictions. In contrast, the full model, which includes two predictors, ’p’ and ’dpdt’, along with an intercept, shows considerable improvement.

Table 16 Null model quality (q=0.5)a,b,c 

Pseudo R Squared 0.000
Mean Absolute Error (MAE) 0.2796

a: Dependent Variable: dp2dt2.

b: Model: (Intercept).

c: Method: Simplex algorithm.

Its Pseudo R Squared value of 0.464 indicates it explains about 46.4% of the variation in the dependent variable, and its lower MAE of 0.1498 suggests more accurate predictions. Based on the parameter estimates in Table 17, we can write down the equation for the dependent variable dp2dt2. The table lists the coefficients for the intercept, P, and Dpdt, along with their standard errors, t-values, degrees of freedom (df), significance levels (Sig.), and confidence intervals:

$$\frac{d^{2}\mathrm{Par}}{dx^{2}} = -0.303 \times \mathrm{Par} - 1.485 \times \frac{d\,\mathrm{Par}}{dx}, \tag{3}$$

where Par = Paragraph.

Table 17 Parameter estimates (q=0.5)a,b 

95% Confidence Interval
Parameter Coefficient Std Error t Df Sig. Lower Bound Upper Bound
(Intercept) 0.000 0.0445 0.000 51 1.000 -0.089 0.089
P -0.303 0.1429 -2.120 51 0.039 -0.590 -0.016
Dpdt -1.485 0.1513 -9.812 51 0.000 -1.789 -1.181
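The columns of Table 17 are linked by simple arithmetic: the t value is the coefficient divided by its standard error, and the confidence bounds are the coefficient plus or minus a critical value times the standard error. The sketch below assumes a two-sided 95% critical value of about 2.0076 for 51 degrees of freedom (a standard t-table value, hard-coded here as an assumption); the function names are illustrative.

```python
T_CRIT_51 = 2.0076  # approximate 97.5th percentile of Student's t with 51 df (assumed)


def t_value(coefficient, std_error):
    """Ratio of an estimate to its standard error."""
    return coefficient / std_error


def confidence_interval(coefficient, std_error, critical=T_CRIT_51):
    """Symmetric confidence interval around the estimate."""
    half_width = critical * std_error
    return coefficient - half_width, coefficient + half_width


# Coefficient and standard error of Dpdt as reported in Table 17.
print(round(t_value(-1.485, 0.1513), 3))
print(tuple(round(bound, 3) for bound in confidence_interval(-1.485, 0.1513)))
```

Because the table values are rounded, the recomputed t value can differ from the printed one in the third decimal, while the bounds agree with the reported interval to three decimals.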

The covariance matrix for the quantile regression model at the 0.5 quantile, predicting ’dp2dt2’ with predictors ’p’ and ’dpdt’, provides insights into the relationships and variability of the parameter estimates (Table 18). The diagonal elements show the variances of each parameter’s estimate, with values of 0.00198 for the Intercept, 0.02043 for ’p’, and 0.02290 for ’dpdt’, indicating the spread of each estimate.

The off-diagonal elements represent covariances between pairs of parameters, such as 0.00478 between the Intercept and ’p’, and 0.01100 between ’p’ and ’dpdt’. These covariances reveal how changes in one parameter estimate are associated with changes in another, with positive values indicating a tendency for the estimates to increase together.

This matrix is crucial for understanding the precision of estimates and identifying potential multicollinearity in the model. While correlation does not imply causation, high correlation coefficients (like 0.751 between the Intercept and ’p’) might hint at potential collinearity issues. Collinearity can make it difficult to discern the individual impact of predictors on the dependent variable, potentially leading to unreliable coefficient estimates.

The presence of significant correlations between parameters necessitates careful interpretation of the model coefficients (Table 19). There is an inherent relationship between a variable and its derivatives. The first derivative represents the rate of change of the variable, and the second derivative represents the rate of change of the first derivative. This natural linkage can lead to a high correlation among these predictors.

Table 18 Covariances of parameter estimates (q=0.5)a,b 

(Intercept) P dpdt
(Intercept) 0.00198 0.00478 0.00234
P 0.00478 0.02043 0.01100
Dpdt 0.00234 0.01100 0.02290

a: Dependent Variable: dp2dt2.

b: Model: (Intercept), p, dpdt.

Table 19 Correlation of parameter estimates (q=0.5)a,b 

(Intercept) P dpdt
(Intercept) 1 0.751 0.347
P 0.751 1 0.509
Dpdt 0.347 0.509 1

a: Dependent Variable: dp2dt2.

b: Model: (Intercept), p, dpdt.

In quantile regression, as in other regression types, multicollinearity can affect the precision of the coefficient estimates. If the model’s primary goal is prediction and it shows good predictive performance (i.e., it accurately predicts the dependent variable ’dp2dt2’), then it may still be considered valid for that purpose, even with multicollinearity.

We do not present the solutions of the differential equations here, since the primary interest lies in understanding the relationships and dynamics represented by the differential equations rather than in specific solutions. The equation itself can reveal how different factors are related and how they influence the rate of change of a variable.

This is particularly relevant in sentiment analysis, where the rate of change of sentiment is more informative than the absolute sentiment value at a specific point. Moreover, differential equations provide a generalized model of a system’s behavior. The solutions, however, are often specific to initial conditions or particular parameters.

By presenting the equations, researchers can convey the general dynamics that apply across various scenarios rather than being tied to specific instances.

4 Discussion

The 1864 novella ”Notes from Underground” by Fyodor Dostoevsky introduces the Underground Man, a cynical recluse living in St. Petersburg. In the philosophical first half, he contends that human nature is irrational, making ideal societies impossible.

Overall, the sentiments in the ”Underground” section are dark, complex, and fraught with tension. They reflect a deep sense of disillusionment with both society and the self, as well as a profound existential despair. The second half follows a more conventional format.

The opening ”Underground” section establishes a gloomy, contemplative mood through the protagonist’s cynical monologues on society, reason, and the meaning of life. He grapples with complex ideas that lead to dark, nihilistic conclusions about human nature and the pursuit of happiness.

The tone reflects his mental agony and sense of estrangement. Both the beginning and the end of the ”Underground” section are negative, but the nature of this negativity shifts. The beginning is more confrontational and critical, actively challenging societal norms and intellectual trends. The end, in contrast, is more resigned and reflective, focusing on the inescapable suffering and irrationality of the human condition.

We showed that binning the cumulative sum of sentiments can uncover more complex structures and dynamics in the data that are not evident when examining the raw, ungrouped cumulative totals. This can be particularly useful for detecting chaotic patterns and understanding the sentiment dynamics within a dataset. Binned values can reveal a more dynamic and potentially chaotic structure for several reasons:

  1. Smoothing Effect: Binning can smooth out short-term fluctuations in the data, making it easier to observe longer-term trends and patterns. This smoothing can sometimes reveal underlying structures that are obscured by noise in the raw data.

  2. Highlighting Extremes: By grouping data into bins, extreme values can have a more pronounced effect on the visual representation of the data. This can make the highs and lows of sentiment more evident, showing a more volatile or chaotic structure.

  3. Revealing Non-Linearity: When sentiment values are binned, non-linear trends may become more apparent. The raw cumulative sum might show a general trend up or down, but binned values could show cycles or patterns of sentiment that change direction or have variable intensity.

  4. Aggregating Variability: Binning combines the variability of individual values within each bin, which can highlight the range of sentiments within sections of the data. This variability can indicate a more chaotic sentiment structure, with rapid shifts from positive to negative or vice versa.

  5. Focus on Distribution: The binned cumulative sum shifts the focus from individual data points to the distribution of data within each bin. This can reveal a more complex sentiment structure that includes the frequency and intensity of sentiment scores.

Aggregating sentiment into cumulative sums makes it possible to see overall trends and patterns over time rather than just individual data points. Analyzing binned cumulative sentiment can reveal patterns, trends, and dynamics that remain hidden in the raw sentiment data.

The hypothesis that binning will uncover more complex dynamics and chaos that are hidden in the raw data is reasonable, as binning can help detect signals and patterns from noise. It summarizes the sentiment while still highlighting the complex, chaotic nature of how sentiment evolves. Overall, this technique of binning cumulative sentiment time series appears to uncover more structure and chaos in the data than may be apparent from only considering individual sentiment values.
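The idea of binning a cumulative sentiment series can be sketched as follows. The exact binning procedure is not restated in this section, so the sketch assumes equal-sized consecutive bins summarized by their mean; the bin size and the toy scores are invented for illustration.

```python
from itertools import accumulate


def cumulative_sentiment(scores):
    """Running total of the sentiment scores."""
    return list(accumulate(scores))


def bin_means(values, bin_size):
    """Mean of each consecutive block of bin_size values (last block may be shorter)."""
    return [sum(values[i:i + bin_size]) / len(values[i:i + bin_size])
            for i in range(0, len(values), bin_size)]


toy_scores = [1, -1, 2, -2, 3, -3]           # invented scores
running = cumulative_sentiment(toy_scores)    # [1, 0, 2, 0, 3, 0]
print(bin_means(running, 2))                  # [0.5, 1.0, 1.5]
```

In this toy case the raw cumulative sum keeps returning to zero, while the binned means rise steadily, which illustrates how the choice of aggregation changes the visible structure of the series.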

The analysis provides insight into the dynamic nature of cumulative sentiment. In the second part of the study, we obtain three equations, each of which is a linear second-order differential equation.

They describe how the respective functions (Word, Sentence, and Paragraph) change with respect to some variable x. The coefficients (-0.579, -1.603, etc.) modify the effect of the function and its derivatives in the equation. The presence of the first and second derivatives indicates the rate of change and the acceleration of change, respectively, for each level (word, sentence, and paragraph) with respect to x.

The equation for Word sentiments suggests that the acceleration of change in Word (represented by the second derivative) is influenced by both the current state of Word and its rate of change (first derivative).

The coefficient -0.579 affects the direct influence of Word, while -1.603 modifies the influence of its rate of change. The equation explains about 51% of the variation, indicating that approximately half of the changes in the Word data can be predicted or accounted for by this model:

$$\frac{d^{2}}{dx^{2}}\mathrm{Word} = -0.579 \times \mathrm{Word} - 1.603 \times \frac{d}{dx}\mathrm{Word}. \tag{4}$$

In the equation of sentence sentiments, the change in Sentence is not only dependent on the Sentence itself and its rate of change but also includes a constant term (-0.087).

This constant could represent a baseline change independent of the current state or rate of change of the Sentence. This equation explains about 47.8% of the variation, meaning nearly half of the variability in the Sentence data can be explained by the model:

$$\frac{d^{2}}{dx^{2}}\mathrm{Se} = -0.087 - 0.590 \times \mathrm{Se} - 1.627 \times \frac{d}{dx}\mathrm{Se}. \tag{5}$$

Similar to the Word equation, the Paragraph equation relates the acceleration of change in a Paragraph to its current state and rate of change, but with different coefficients. The fact that it explains about 46.4% of the variation indicates that less than half of the changes in the Paragraph data are accounted for by this model:

$$\frac{d^{2}}{dx^{2}}\mathrm{Par} = -0.303 \times \mathrm{Par} - 1.485 \times \frac{d}{dx}\mathrm{Par}. \tag{6}$$
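A qualitative reading of the three fitted equations is possible without solving them for particular initial conditions. Treating x as a continuous variable (an assumption, since the text units are discrete), each equation has the form y'' = c + a y + b y', whose behavior is governed by the roots of r^2 - b r - a = 0 and whose equilibrium is -c/a. The sketch below computes both from the coefficients reported in the parameter tables; the function names are illustrative.

```python
import cmath


def characteristic_roots(a, b):
    """Roots of r**2 - b*r - a = 0 for the model y'' = c + a*y + b*y'."""
    root_disc = cmath.sqrt(b * b + 4.0 * a)
    return (b + root_disc) / 2.0, (b - root_disc) / 2.0


def equilibrium(c, a):
    """Constant solution of y'' = c + a*y + b*y' (where y' = y'' = 0)."""
    return -c / a


# (intercept c, coefficient a of the level, coefficient b of the first derivative)
models = {
    "word": (0.0, -0.579, -1.603),
    "sentence": (-0.087, -0.590, -1.627),
    "paragraph": (0.0, -0.303, -1.485),
}

for name, (c, a, b) in models.items():
    roots = characteristic_roots(a, b)
    print(name, [round(r.real, 3) for r in roots], round(equilibrium(c, a), 3))
```

Under this continuous reading, both roots are real and negative for all three sets of coefficients, which corresponds to an overdamped system: deviations decay toward the equilibrium without oscillating. This describes only the fitted linear part and says nothing about the variation the models leave unexplained.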

The percentages of variation explained (51%, 47.8%, and 46.4%) refer to how much of the change in each respective level (word, sentence, paragraph) can be predicted or explained by these models.

A higher percentage indicates a better fit of the model to the data, meaning the model is more effective at explaining the changes or variations in that particular level.

These percentages also imply that there are other factors or variables not captured by these models that contribute to the changes in Words, Sentences, and Paragraphs.

These could be external or more complex internal factors not accounted for in the linear model. The potential benefits and limitations of modeling sentiment with differential equations are summarized below:

4.1 Potential Benefits

  1. Understanding Dynamic Changes: Differential equations can model the rate and acceleration of sentiment changes over time or across different text segments. This could be particularly insightful in understanding how sentiments evolve in complex narratives or dialogues.

  2. Predictive Analysis: By modeling how sentiments change, researchers can potentially predict future sentiment trends based on current and past data. This could be valuable in applications like market analysis, social media monitoring, and interactive storytelling.

  3. Refining Chatbot Responses: For AI development, understanding the dynamics of sentiment can help in refining chatbot interactions, making them more sensitive and responsive to the emotional content of user inputs.

  4. Identifying Underlying Patterns: Differential equations might reveal underlying patterns in sentiment data that are not obvious from a simple analysis. This could lead to new insights into how sentiments are expressed and perceived in language.

4.2 Limitations and Challenges

  1. Complexity of Human Sentiments: Human emotions and sentiments are complex and often non-linear, making them difficult to accurately model with differential equations. Emotions can be influenced by a myriad of factors that are challenging to quantify.

  2. Data Quality and Variability: The accuracy of sentiment ratings from chatbots can vary, and the data might be noisy. This variability can make it difficult to derive meaningful differential equations that accurately represent sentiment dynamics.

  3. Over-Simplification: Reducing the rich and nuanced field of human emotions to a set of differential equations might oversimplify reality. Emotions are not just quantitative variables that can be easily modeled; they are deeply qualitative and context-dependent.

  4. Interdisciplinary Challenges: Effectively modeling sentiments with differential equations requires an interdisciplinary approach combining linguistics, psychology, mathematics, and computer science. This complexity can be a barrier to research.

While deriving differential equations from chatbot sentiment ratings is useful for analyzing and predicting sentiment trends, it also comes with significant challenges and limitations. It is an approach that may yield valuable insights in certain applications, particularly in enhancing AI and natural language processing capabilities. However, researchers should be cautious of oversimplifying the complexity of human emotions and be mindful of the limitations of the data and the models used.

5 Conclusion

This research aims to understand the dynamics of sentiment evolution in textual units ranging from individual words to expansive paragraphs. The study’s innovative approach to analyzing sentiment in text, especially in the context of complex literary works like Fyodor Dostoevsky’s ”Notes from Underground,” reveals significant insights into the capabilities of advanced chatbots like GPT-4.

By employing a method that bins the cumulative sum of sentiments, we uncover deeper, more intricate structures and dynamics in sentiment data, transcending the limitations of traditional raw cumulative analyses. This method is particularly valuable in understanding the nuanced, often chaotic sentiment landscapes in literature, where emotions and themes are richly layered and dynamically evolving.

The differential equations derived for word, sentence, and paragraph levels further enrich our understanding. They demonstrate how the rate and acceleration of sentiment change are influenced by their current state and rate of change.

With varying percentages of variation explained at each text level (51% for Word, 47.8% for Sentence, and 46.4% for Paragraph), these models effectively illustrate the complex, dynamic nature of sentiment evolution in literary texts.

Moreover, the fact that these models do not account for all variability suggests the presence of other factors influencing sentiment changes, possibly external influences or more intricate internal dynamics. This underscores the multifaceted nature of sentiment analysis, especially in complex narrative contexts. The analysis of modeling sentiment dynamics through differential equations reveals both potential benefits and limitations. On the one hand, differential equations can provide insights into predicting sentiment trends, understanding complex narrative arcs, and refining chatbot interactions.

The approach may uncover hidden patterns and lead to new discoveries about how sentiments are expressed in language. However, accurately quantifying and modeling human emotions through mathematical equations is extremely challenging.

Sentiments are qualitative, subjective, and dependent on nuanced contextual factors that cannot be easily captured in simplistic models. While differential equation modeling of chatbot sentiment ratings offers some utility, care must be taken not to oversimplify the richness of human emotions. Further interdisciplinary research is needed to develop more sophisticated techniques that address the complexity of sentiments and their dynamics in language.

In conclusion, this approach has merit but requires caution against oversimplification of emotions. This study’s findings are pivotal for enhancing the capabilities of AI-driven chatbots in sentiment analysis, particularly in dissecting and understanding the layered emotional landscapes of literary works. It demonstrates the potential of advanced analytical techniques in extracting deeper meaning from texts, a crucial step forward in the field of natural language processing and AI-driven literary analysis.

Acknowledgments

This work was supported by the Ministry of Education and Sciences of the Republic of Kazakhstan under the grant #AP14871214 “Development of machine learning methods to increase the coherence of text in summaries produced by the Extractive Summarization Methods.” The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Akhmetov, I., Gelbukh, A., Mussabayev, R. (2022). Topic-aware sentiment analysis of news articles. Computación y Sistemas, Vol. 26, No. 1, pp. 423–439. DOI: 10.13053/cys-26-1-4179. [ Links ]

2. Alexandridis, G., Michalakis, K., Aliprantis, J., Polydoras, P., Tsantilas, P., Caridakis, G. (2020). A deep learning approach to aspect-based sentiment prediction. Artificial Intelligence Applications and Innovations, pp. 397–408. DOI: 10.1007/978-3-030-49161-1_33. [ Links ]

3. Alpar, R. (2012). Uygulamalı İstatistik ve Geçerlik Güvenirlik. Detay yayıncılık. [ Links ]

4. Belsley, D. A., Kuh, E., Welsch, R. E. (1980). Regression diagnostics: Identifying Influential data and sources of collinearity. John Wiley & Sons. DOI: 10.1002/0471725153. [ Links ]

5. Birjali, M., Kasri, M., Beni-Hssane, A. (2021). A comprehensive survey on sentiment analysis: approaches, challenges and trends. Knowl-Based Systems, Vol. 226, pp. 2–26. DOI: 10.1016/j.knosys.2021.107134. [ Links ]

6. Casillo, M., Clarizia, F., D’Aniello, G., De-Santo, M., Lombardi, M., Santaniello, D. (2020). Chat-bot: A cultural heritage aware teller-bot for supporting touristic experiences. Pattern Recognition Letters, Vol. 131, pp. 234–243. DOI: 10.1016/j.patrec.2020.01.003. [ Links ]

7. Chang, M., D’Aniello, G., Gaeta, M., Orciuoli, F., Sampson, D., Simonelli, C. (2020). Building ontology-driven tutoring models for intelligent tutoring systems using data mining. IEEE Access, Vol. 8, pp. 48151–48162. DOI: 10.1109/ACCESS.2020.2979281. [ Links ]

8. Chen, Z., Liu, Y., Sun, H. (2021). Physics-informed learning of governing equations from scarce data. Nat Commun, Vol. 12, No. 6136. DOI: 10.48550/arXiv.2005.03448. [ Links ]

9. Colace, F., de-Santo, M., Greco, L. (2014). Safe: a sentiment analysis framework for e-learning. International Journal of Emerging Technologies in Learning, Vol. 9, No. 6, pp. 37–41. DOI: 10.3991/ijet.v9i6.4110. [ Links ]

10. Collomb, A., Costea, C., Joyeux, D., Hasan, O., Brunie, L. (2014). A study and comparison of sentiment analysis methods for reputation evaluation. Rapport de Recherche RR-LIRIS-2014-002. [ Links ]

11. Dietz-Uhler, B., Hurn, E. J. (2013). Using learning analytics to predict (and improve) student success: A faculty perspective. Journal of Interactive Online Learning, Vol. 12, pp. 17–26. [ Links ]

12. Do, H. H., Prasad, P., Maag, A., Alsadoon, A. (2019). Deep learning for aspect-based sentiment analysis: A comparative review. Expert Systems with Applications, Vol. 118, pp. 272–299. DOI: 10.1016/j.eswa.2018.10.003. [ Links ]

13. Duran, V. (2022). Atatürk’ün “zabit ve kumandan ile hasb-i hâl” adlı eserinin eğitsel kavramlar açısından İncelenmesi duygu analizinin diferansiyel denklemler aracılığıyla modellenmesi. Doğumunun 141. Yılında Atatürk 2. Uluslararası Sempozyumu, pp. 48–72. [ Links ]

14. Duran, V. (2023). Modeling sentiment dynamics in terminator 3 subtitles using gpt-4 and differential equations based on fuzzy logic. 7th International Innovative Studies & Contemporary Scientific Research Congress. [ Links ]

15. D’Aniello, G., Gaeta, M., La Rocca, I. (2022). KnowMIS-ABSA: an overview and a reference model for applications of sentiment analysis and aspect-based sentiment analysis. Artificial Intelligence Review, Vol. 55, No. 3, pp. 5543–5574. DOI: 10.1007/s10462-021-10134-9. [ Links ]

16. Fan, D. P., Cook, R. D. (2003). A differential equation model for predicting public opinions and behaviors from persuasive information: Application to the index of consumer sentiment. The Journal of Mathematical Sociology, Vol. 27, No. 1, pp. 29–51. DOI: 10.1080/00222500305886. [ Links ]

17. Hemmatian, F., Sohrabi, M. K. (2019). A survey on classification techniques for opinion mining and sentiment analysis. Artificial Intelligence Review, Vol. 52, No. 3, pp. 1495–1545. DOI: 10.1007/s10462-017-9599-6. [ Links ]

18. Kelsingazin, Y., Akhmetov, I., Pak, A. (2021). Sentiment analysis of kaspi product reviews. 16th International Conference on Electronics Computer and Computation (ICECCO) Kaskelen, Kazakhstan, pp. 1–5. DOI: 10.1109/ICECCO53203.2021.9663854. [ Links ]

19. Leippold, M. (2023). Sentiment spin: Attacking financial sentiment with GPT-3. Finance Research Letters, Vol. 55, pp. 1–6. DOI: 10.1016/j.frl.2023.103957. [ Links ]

20. Liu, B. (2012). Sentiment analysis and opinion mining. Morgan Claypool Publishers, pp. 1–168. [ Links ]

21. Meškelė, D., Frasincar, F. (2020). ALDONAR: a hybrid solution for sentence-level aspect-based sentiment analysis using a lexicalized domain ontology and a regularized neural attention model. Information Processing & Management, Vol. 57, No. 3, pp. 1–9. DOI: 10.1016/j.ipm.2020.102211. [ Links ]

22. Misuraca, M., Forciniti, A., Scepi, G., Spano, M. (2020). Sentiment analysis for education with R: Potential benefits, methods and practical applications. DOI: 10.48550/arXiv.2005.12840. [ Links ]

23. Pontiki, M., Galanis, D., Papageorgiou, H., Androutsopoulos, I., Manandhar, S., AL-Smadi, M., Al-Ayyoub, M., Zhao, Y., Qin, B., De-Clercq, O., Apidianaki, M., Tannier, X., Loukachevitch, N., Kotelnikov, E., Bel, N., Jiménez-Zafra, S. M., Eryiğit, G. (2016). SemEval-2016 task 5: aspect based sentiment analysis. Proceedings of the 10th international workshop on semantic evaluation (SemEval-2016), Association for Computational Linguistics, San Diego, pp. 19–30. [ Links ]

24. Schmidt, M. D., Lipson, H. (2009). Distilling free-form natural laws from experimental data. Science, Vol. 324, pp. 81–85. DOI: 10.1126/science.1165893. [ Links ]

25. Schouten, K., Frasincar, F. (2018). Ontology-driven sentiment analysis of product and service aspects. The Semantic Web, pp. 608–623. DOI: 10.1007978-3-319-93417-4_39. [ Links ]

26. Wankhade, M., Rao, A. C. S., Kulkarni, C. (2022). A survey on sentiment analysis methods, applications, and challenges. Artif Intell Rev, Vol. 55, pp. 5731–5780. DOI: 10.1007/s10462-022-10144-1. [ Links ]

27. Xing, F. Z., Cambria, E., Welsch, R. E. (2018). Natural language based financial forecasting: a survey. Artificial Intelligence Review, Vol. 50, No. 1, pp. 49–73. DOI: 10.1007/s10462-017-9588-9. [ Links ]

28. Yadav, A., Vishwakarma, D. K. (2020). Sentiment analysis using deep learning architectures: a review. Artificial Intelligence Review, Vol. 53, No. 6, pp. 4335–4385. DOI: 10.1007/s10462-019-09794-5. [ Links ]

29. Zhang, L., Wang, S., Liu, B. (2018). Deep learning for sentiment analysis: a survey. Wiley Interdisciplinary Review, Vol. 8, No. 4, pp. 1253. DOI: 10.48550/arXiv.1801.07883. [ Links ]

Received: November 21, 2023; Accepted: January 22, 2024

* Corresponding author: Alexander Gelbukh, e-mail: gelbukh@cic.ipn.mx

This is an open-access article distributed under the terms of the Creative Commons Attribution License.