This section describes the experimental flow. First, the data for the experiment were collected. Second, preprocessing was performed to remove irrelevant textual data. Third, technical indicators were derived from the S&P 500 dataset, and sentiment scores were generated from ESG-related news data. After the processed data were combined, the scaled data were fed as input to the deep learning models to predict future prices. Finally, the mean absolute percentage error (MAPE) was used to evaluate regression performance. In addition, ablation tests were performed to evaluate the contribution of each input feature. The experimental procedure is shown in Figure 1.

Figure 1. Flow chart for predicting the S&P 500 index.
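The MAPE used as the evaluation measure can be illustrated with a minimal sketch. The function name `mape` and the sample values are hypothetical and are not taken from the study; the formula is the standard definition, expressed in percent.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes both sequences have the same non-zero length and that
    no actual value is zero (the ratio is undefined otherwise).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical index values and predictions, for illustration only.
error = mape([100.0, 200.0], [110.0, 180.0])
```

A lower value indicates predictions that are closer to the observed index in relative terms.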
Data collection
The S&P 500 Index is used to understand and monitor general stock market trends and is considered one of the indicators representing the health of the US financial markets [26]. The S&P 500 is an index of 500 large US companies and reflects the movement of the market as a whole rather than the stock prices of individual companies. In addition, the S&P 500 includes companies from a variety of sectors and industries. Therefore, a stock price prediction model built on data spanning multiple industries amounts to a general-purpose model. Additionally, whereas predictions for individual company stocks must also account for company-internal factors, the S&P 500 is driven by overall market perception [27]. Consequently, building an improved stock price prediction model by integrating ESG information with the S&P 500 can underscore the importance and impact of sustainability information across the market for relevant investors and researchers.
The experiments were conducted on two datasets spanning January 1, 2016 to July 31, 2023. Through LexisNexis, the authors collected 14,049 news articles using the search term “ESG”. Access to the LexisNexis database may require a paid subscription, such as institutional access. In addition, historical data on the S&P 500 index for the same period, containing the date, closing value, opening value, high value, low value, trading volume and volatility, were obtained from investing.com.
Feature engineering
Based on previous research, the authors used the TA-Lib module to obtain several technical indicators that have been shown to affect stock prices [28,29]. The chosen features were open price, close price, high price, low price, trading volume, RSI, SMA_5, SMA_20, EMA, MACD, signal, stochastic RSI_fastk, stochastic RSI_fastd, stochastic oscillator slowk, stochastic oscillator slowd, Williams %R, momentum and ROC. Detailed descriptions of these technical indicators are provided below.
The opening price is the price of a share at the beginning of a trading session and corresponds to the first transaction of the day. The high price is the highest price at which a stock trades in a given trading period, and the low price is the lowest. Trading volume, which reflects market activity, is the number of shares or contracts traded during a specific period.
The RSI is a momentum oscillator that measures the speed and change of price movements and helps identify overbought or oversold conditions. SMAs are average closing prices over a given number of periods. For example, SMA_5 and SMA_20 represent the 5-day and 20-day moving averages, respectively. The EMA responds more quickly to recent price changes by assigning more weight to them [30].
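The three indicators described above can be sketched in plain Python. This is an illustration under stated assumptions, not the TA-Lib implementation used in the study: the EMA is seeded with the first price, and the RSI uses simple averages of gains and losses rather than Wilder's smoothing, so values may differ from TA-Lib output. All function names are hypothetical.

```python
def sma(prices, period):
    """Simple moving average; None until `period` prices are available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < period:
            out.append(None)
        else:
            window = prices[i + 1 - period : i + 1]
            out.append(sum(window) / period)
    return out


def ema(prices, period):
    """Exponential moving average with smoothing factor 2 / (period + 1).

    Assumption: the series is seeded with the first price.
    """
    alpha = 2.0 / (period + 1)
    out = [float(prices[0])]
    for price in prices[1:]:
        out.append(alpha * price + (1 - alpha) * out[-1])
    return out


def rsi(prices, period):
    """RSI from simple averages of gains and losses over `period` changes."""
    changes = [b - a for a, b in zip(prices, prices[1:])]
    out = [None] * len(prices)
    for i in range(period, len(prices)):
        recent = changes[i - period : i]  # the `period` changes ending at price i
        avg_gain = sum(c for c in recent if c > 0) / period
        avg_loss = sum(-c for c in recent if c < 0) / period
        if avg_loss == 0:
            out[i] = 100.0  # no losses in the window: RSI is pegged at 100
        else:
            rs = avg_gain / avg_loss
            out[i] = 100.0 - 100.0 / (1.0 + rs)
    return out
```

With these definitions, `sma(closes, 5)` and `sma(closes, 20)` would correspond to the SMA_5 and SMA_20 features, where `closes` is a hypothetical list of daily closing values.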
MACD is a trend-following momentum indicator that captures the relationship between two moving averages of a security’s price. The signal line, a moving average of the MACD line, is used to generate buy and sell signals for traders and investors [31].
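A minimal sketch of the MACD and signal lines follows. The periods 12, 26 and 9 are the conventional defaults and are an assumption here, since the text does not state which periods were used; the EMA seeding (first value) is likewise an assumption, and the function names are hypothetical.

```python
def ema(values, period):
    """Exponential moving average seeded with the first value (assumption)."""
    alpha = 2.0 / (period + 1)
    out = [float(values[0])]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line).

    macd_line   = fast EMA of price minus slow EMA of price
    signal_line = EMA of the macd_line
    """
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    return macd_line, signal_line
```

In a steadily rising series the fast EMA sits above the slow EMA, so the MACD line is positive; a crossing of the MACD line over the signal line is what is usually read as a buy or sell signal.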
Stochastic RSI_fastk and stochastic RSI_fastd, which are calculated from both the RSI and the stochastic oscillator, capture potential price reversal points and improve prediction accuracy [32]. To obtain smoother signals, the slowk and slowd lines of the stochastic oscillator were also included as supplementary components.
Another integral part of the analysis was Williams %R, also written as Williams R. This momentum indicator assesses whether the market is in an overbought or oversold condition, thus contributing to a comprehensive understanding of market sentiment [33].
The next indicator is momentum, which quantifies the rate of change in stock prices. Finally, ROC, a metric similar to momentum, measures the price change over a specific period and thus indicates the extent of price fluctuations [34].
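The stochastic oscillator and Williams %R both locate the closing price within the recent high-low range, which the following sketch makes explicit. It computes the unsmoothed (fast) %K line only, not the smoothed slowk and slowd lines; the 14-day period and the handling of a flat range are assumptions, and the function names are hypothetical.

```python
def _high_low(highs, lows, i, period):
    """Highest high and lowest low over the `period` bars ending at index i."""
    window = slice(i - period + 1, i + 1)
    return max(highs[window]), min(lows[window])


def fast_k(highs, lows, closes, period=14):
    """Unsmoothed stochastic %K, ranging from 0 to 100."""
    out = [None] * len(closes)
    for i in range(period - 1, len(closes)):
        hh, ll = _high_low(highs, lows, i, period)
        # Assumption: a flat range maps to the midpoint.
        out[i] = 50.0 if hh == ll else 100.0 * (closes[i] - ll) / (hh - ll)
    return out


def williams_r(highs, lows, closes, period=14):
    """Williams %R, ranging from -100 (oversold side) to 0 (overbought side)."""
    out = [None] * len(closes)
    for i in range(period - 1, len(closes)):
        hh, ll = _high_low(highs, lows, i, period)
        out[i] = -50.0 if hh == ll else -100.0 * (hh - closes[i]) / (hh - ll)
    return out
```

For the same period, Williams %R equals the fast %K minus 100, so the two carry the same information on different scales.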
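Momentum and ROC differ only in whether the price change is expressed in absolute or relative terms, as this sketch shows. The lookback period is an assumption (the text does not state it), and the function names are hypothetical.

```python
def momentum(prices, period=10):
    """Absolute price change over `period` steps: P_t - P_(t - period)."""
    out = [None] * len(prices)
    for i in range(period, len(prices)):
        out[i] = prices[i] - prices[i - period]
    return out


def roc(prices, period=10):
    """Rate of change in percent: 100 * (P_t - P_(t - period)) / P_(t - period)."""
    out = [None] * len(prices)
    for i in range(period, len(prices)):
        prev = prices[i - period]
        out[i] = 100.0 * (prices[i] - prev) / prev
    return out
```

For example, a rise from 100 to 110 and a rise from 110 to 121 give momentum values of 10 and 11 but the same ROC of 10%.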
Sentiment index calculation using Financial Bidirectional Encoder Representations from Transformers (FinBERT)
The news data were preprocessed, including the removal of non-words and lemmatization, and then subjected to sentiment analysis using FinBERT. FinBERT is built on the BERT architecture, an effective language model for natural language processing and understanding that encodes text using bidirectional context [35]. FinBERT acquires domain knowledge by further training the pre-trained BERT model on financial data. It takes finance-related texts such as financial news, reports and web publications as input and predicts the sentiment of each text, classifying it as positive, negative or neutral.
News items were labeled 0 for negative sentiment and 1 for positive sentiment. Following Wu et al. [36], the sentiment score was calculated as the difference between the numbers of positive and negative news items on a given day, divided by their sum (Equation (1)).
$$\text{Sentiment score}=\frac{{M}_{tpos}-{M}_{tneg}}{{M}_{tpos}+{M}_{tneg}} \tag{1}$$
where \({M}_{tpos}\) represents the number of positive news items and \({M}_{tneg}\) represents the number of negative news items on day t. The sentiment index ranges from -1 to 1 [25]. A value approaching -1 suggests a negative tone in the news for that date. Conversely, a value approaching 1 indicates an overall positive tone. Before the selected features were used as input to the framework, a min-max scaler was applied to rescale their values to the range from 0 to 1.
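Equation (1) and the min-max scaling step can be sketched as follows. The function names are hypothetical, and two choices are assumptions not stated in the text: a day with no positive or negative items is scored 0, and a constant feature is scaled to 0.

```python
def sentiment_score(labels):
    """Daily sentiment score from Equation (1).

    `labels` holds one entry per news item on a given day:
    1 for positive sentiment, 0 for negative sentiment.
    """
    pos = sum(1 for label in labels if label == 1)
    neg = sum(1 for label in labels if label == 0)
    if pos + neg == 0:
        return 0.0  # assumption: no labeled news is treated as neutral
    return (pos - neg) / (pos + neg)


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: constant feature maps to 0
    return [(v - lo) / (hi - lo) for v in values]


# Three positive items and one negative item on the same day.
score = sentiment_score([1, 1, 1, 0])
```

Here `score` is (3 - 1) / (3 + 1) = 0.5, a moderately positive tone for that day.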
Window size
Subsequently, several datasets were generated, each corresponding to a different value of the window-size hyperparameter. Window size is a key concept in processing and predicting time series data for stock price forecasting [37,38]. The window size defines a fixed unit period, and the data within this window are used to predict future stock prices. Therefore, selecting an appropriate window size is critical to improving the performance of stock price prediction models. In this study, experiments were conducted using three window sizes: 3, 4, and 5 (Fig. 2). Finally, the data were split into training and test datasets in a ratio of 8:2. The validation dataset comprises 20% of the training dataset.

Figure 2. Illustration of window size.
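The windowing and splitting steps can be sketched as follows. Two points are assumptions, since the text does not specify them: each window is paired with the value immediately after it as the prediction target, and the 8:2 split is made chronologically rather than at random. The function names are hypothetical.

```python
def make_windows(series, window_size):
    """Pair each run of `window_size` consecutive values with the next value."""
    inputs, targets = [], []
    for start in range(len(series) - window_size):
        inputs.append(series[start : start + window_size])
        targets.append(series[start + window_size])
    return inputs, targets


def chronological_split(samples, train_ratio=0.8):
    """Split samples in time order: the first part trains, the rest tests."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


# Toy series with window size 3.
X, y = make_windows([1, 2, 3, 4, 5, 6], window_size=3)
# X == [[1, 2, 3], [2, 3, 4], [3, 4, 5]] and y == [4, 5, 6]
```

Applying `chronological_split` a second time to the training portion would carve out the 20% validation set described in the text.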
Deep learning models
Bidirectional recurrent neural networks (Bi-RNNs) are a type of recurrent neural network capable of considering both the preceding and the subsequent context of a sequence. This bidirectional feature allows them to capture patterns in both temporal directions [39]. Also, since short-term factors can influence stock price fluctuations, the RNN structure with recurrent layers is adept at capturing these changes, making it suitable as a time series model. In addition, the Bi-RNN has a flexible structure that can be applied to various types of time series data, making it useful for pattern processing. Bidirectional long short-term memory (Bi-LSTM) networks, in turn, are an improved variant of RNNs that incorporates LSTM cells [40]. They excel at learning long-term dependencies and are particularly effective at tasks involving sequential data, such as time series prediction [41].
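The bidirectional idea can be illustrated with a deliberately small sketch: a one-unit tanh RNN is run over the sequence once forwards and once backwards, and the two hidden states are paired at each time step. This is not the architecture trained in the study; the scalar weights are hypothetical, untrained values chosen only to make the example runnable.

```python
import math


def rnn_pass(inputs, w_x, w_h, b):
    """Run a one-unit tanh RNN over `inputs`; return the hidden state per step."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


def bi_rnn(inputs, fwd_params, bwd_params):
    """Pair forward and backward hidden states for every time step.

    The backward pass reads the sequence in reverse; its states are
    reversed again so that index t refers to the same time step in both.
    """
    forward = rnn_pass(inputs, *fwd_params)
    backward = rnn_pass(inputs[::-1], *bwd_params)[::-1]
    return list(zip(forward, backward))


# Hypothetical scaled inputs and weights (w_x, w_h, b).
states = bi_rnn([0.1, 0.2, 0.3], (0.5, 0.1, 0.0), (0.4, 0.2, 0.0))
```

At each step, the first element of the pair summarizes the inputs up to that step and the second summarizes the inputs from that step to the end of the window. A Bi-LSTM follows the same pattern with LSTM cells in place of the simple tanh unit.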