Integrating event information and multi-dimensional relationships for improved financial time series forecasting
Dataset and experimental setup
Research subject selection and data sources
This study selects Amazon.com, Inc. (NASDAQ: AMZN) as the primary research subject for three reasons. First, as a leading global technology company, Amazon’s stock price is shaped by many interacting factors, including changes in company fundamentals, industry technology developments, macroeconomic policy, and market sentiment, which makes it a demanding test case for our multimodal fusion method. Second, Amazon attracts extensive attention in capital markets, and the resulting volume of news reports, analyst research, and investor discussion provides ample raw material for constructing a high-quality event dataset. Third, as a major NASDAQ constituent, Amazon is correlated in complex ways with other technology and consumer stocks, which helps validate the effectiveness of our multi-dimensional relationship-aware module.
We constructed a comprehensive multimodal dataset spanning January 1, 2010, to May 31, 2025, a little over 15 years of market data. This span covers several important market cycles, including the technology stock recovery period of 2010-2015, the technology stock boom period of 2016-2020, the pandemic shock and recovery period of 2020-2022, and the interest rate hike cycle and artificial intelligence boom period of 2023-2025, which makes the dataset more representative and gives the model a broader basis for generalization.
Data sources cover multiple authoritative financial data providers: quantitative trading data is mainly obtained from the Yahoo Finance API and Refinitiv Eikon terminals, including daily open, high, low, and close prices, volume, and adjusted prices; news and announcement data is collected in real time through the Bloomberg Terminal News Feed, supplemented by relevant reports from mainstream financial media such as Reuters and The Wall Street Journal; fundamental data is extracted from the Refinitiv and S&P Capital IQ databases, covering financial statements, valuation indicators, and industry classification information.
Cross-asset relationship universe construction
While Amazon (AMZN) serves as our primary prediction target, the multi-dimensional relationship-aware module requires a comprehensive asset universe to construct meaningful N\(\times\)N relationship matrices. We carefully selected a universe of 50 large-cap technology and related stocks to ensure robust relationship modeling while maintaining computational feasibility.
Asset Selection Criteria:
- Market capitalization > $50 billion as of December 2020
- Primary listing on NASDAQ or NYSE
- Technology, consumer discretionary, or communication services sectors (GICS classification)
- Complete daily trading data availability from 2010–2025
- Minimum average daily trading volume > 1 million shares
Final Asset Universe (N=50):
Core Technology: AAPL, MSFT, GOOGL, META, NFLX, NVDA, ORCL, CRM, ADBE, INTC, AMD, CSCO, AVGO, QCOM, TXN
E-commerce & Consumer: AMZN, EBAY, SHOP, ETSY, WMT, TGT, COST, HD, NKE
Cloud & Software: SNOW, PLTR, ZM, WORK, DDOG, NET, OKTA, MDB, CRWD
Communication: VZ, T, TMUS, CHTR, CMCSA, DIS, NFLX
Related Technology: TSLA, UBER, LYFT, SQ, PYPL, V, MA
Temporal Data Handling:
All 50 assets follow identical temporal splits:
- Training period: 2010–2020 (all assets)
- Validation period: 2021–2022 (all assets)
- Testing period: 2023–2025 (all assets)
- Missing data: Forward-fill for gaps \(\le\) 3 days, exclude assets with >5% missing data in any period
- Delisting handling: Assets delisted during the study period are excluded from the relationship matrices after the delisting date
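The missing-data rule above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `clean_universe`, the wide price-table layout (one column per ticker), and the demo tickers are hypothetical, and the 5% check is applied to a single table here, whereas the rule in the text applies to each period separately.

```python
import numpy as np
import pandas as pd


def clean_universe(prices: pd.DataFrame, max_gap: int = 3,
                   max_missing: float = 0.05) -> pd.DataFrame:
    """Drop assets with too much missing data, then forward-fill short gaps.

    prices: rows are trading days, columns are tickers (assumed layout).
    """
    # Fraction of missing observations per asset
    missing_share = prices.isna().mean()
    kept = prices.loc[:, missing_share <= max_missing]
    # ffill(limit=k) fills at most k consecutive NaNs after a valid value.
    # Simplification: a gap longer than k is partially filled rather than
    # left untouched, which is slightly looser than the rule in the text.
    return kept.ffill(limit=max_gap)


idx = pd.bdate_range("2020-01-01", periods=100)
demo = pd.DataFrame({"AAA": np.arange(100.0), "BBB": np.arange(100.0)}, index=idx)
demo.iloc[10:12, 0] = np.nan   # 2-day gap, 2% missing: kept and filled
demo.iloc[20:30, 1] = np.nan   # 10% missing: asset is excluded
cleaned = clean_universe(demo)
```

Applying the function separately to the training, validation, and test tables would match the "in any period" wording more closely.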
Relationship Matrix Computation:
For frequency-domain and fundamental relationships, we compute pairwise similarities across all 50 assets, resulting in symmetric 50\(\times\)50 matrices. For knowledge graph relationships, we manually verified and annotated business relationships among these 50 entities using SEC 10-K filings, partnership announcements, and industry databases. The Amazon prediction task extracts the relevant row/column from these matrices corresponding to AMZN’s relationships with the other 49 assets.
This design ensures that AMZN’s relationship modeling benefits from a rich, diverse set of comparable assets while maintaining strict temporal discipline across the entire universe.
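As an illustration of the matrix construction and row extraction described above, the following sketch builds a symmetric similarity matrix and pulls out the AMZN row. The similarity measure (cosine similarity between FFT amplitude spectra of return series) is an assumed stand-in, since this section does not specify how frequency-domain similarity is computed; the four tickers and random returns are placeholders for the 50-asset universe.

```python
import numpy as np


def spectral_similarity(returns: np.ndarray) -> np.ndarray:
    """Cosine similarity between amplitude spectra.

    returns: array of shape (T, N), one column per asset.
    Result: symmetric N x N matrix with ones on the diagonal.
    """
    amp = np.abs(np.fft.rfft(returns, axis=0))            # (F, N) spectra
    amp = amp / np.linalg.norm(amp, axis=0, keepdims=True)
    return amp.T @ amp


rng = np.random.default_rng(0)
tickers = ["AMZN", "AAPL", "MSFT", "WMT"]                  # placeholder universe
rets = rng.normal(0.0, 0.02, size=(256, len(tickers)))    # synthetic returns
sim = spectral_similarity(rets)

# Row of AMZN's relationships with the other assets (self-similarity removed)
amzn = tickers.index("AMZN")
amzn_relations = np.delete(sim[amzn], amzn)
```

With the full universe, `sim` would be 50\(\times\)50 and `amzn_relations` would have 49 entries.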
Knowledge Graph Construction and Domain Mapping. Knowledge graph construction involves systematically mapping general knowledge bases to financial domain entities. We established the mappings through the following multi-step procedure:
Step 1: Entity Identification and Mapping. We constructed a mapping table linking general knowledge entities to financial assets through multiple identification mechanisms:
- Primary Mapping: Direct ticker symbol matching (e.g., DBpedia:Amazon.com \(\rightarrow\) NASDAQ:AMZN)
- Secondary Mapping: Company name disambiguation using ISIN codes and official registrations
- Tertiary Mapping: Cross-reference validation through multiple knowledge bases (DBpedia, Wikidata, OpenCorporates)
Step 2: Relationship Extraction and Validation. From the mapped entities, we extract structured relationships using SPARQL queries and natural language processing:
- Direct Business Relations: Partnership agreements, joint ventures, and strategic alliances extracted from SEC filings and corporate announcements
- Ownership Structures: Parent-subsidiary relationships from corporate registrations and 10-K filings
- Supply Chain Networks: Customer-supplier relationships from business dependency disclosures
- Industry Classifications: GICS sector mappings validated against multiple classification systems (SIC, NAICS, ICB)
Step 3: Relationship Strength Quantification. We quantify relationship strengths through empirical analysis of historical market co-movements and fundamental connections, as detailed in Table 6.
Step 4: Dynamic Relationship Updates. The knowledge graph undergoes quarterly updates to reflect corporate actions and market structure changes:
- Corporate Actions: M&A announcements, spin-offs, and restructuring events
- Partnership Changes: New strategic alliances and partnership dissolutions
- Supply Chain Evolution: Customer/supplier relationship updates from annual reports
- Regulatory Reclassifications: GICS sector changes and industry standard updates
This systematic approach keeps the knowledge graph aligned with the evolving landscape of financial market relationships while maintaining high data quality and validation standards. The mapping process achieved a 96.1% validation rate across all entity types, providing a robust foundation for relationship-aware modeling.
Dataset feature analysis
To comprehensively understand the market behavior characteristics of Amazon stock and provide a basis for model design, we first conducted a detailed analysis of the price time series.

Fig. 2. Amazon stock price time series (2010-2025).
As shown in Fig. 2, Amazon stock exhibited typical growth-stock characteristics during the study period, with a strong overall upward trend. Four phases can be distinguished. During 2010-2012, the price fluctuated at relatively low levels, mainly between \(\$5\) and \(\$25\), reflecting the company’s early market positioning and investor expectations. During 2013-2017, the price rose significantly, climbing from about \(\$25\) to nearly \(\$100\), in step with the rapid development of Amazon Web Services (AWS) and the global expansion of the e-commerce business. During 2018-2021, the price became more volatile, reaching a high of about \(\$180\) during the 2020 pandemic, followed by a significant pullback in 2022 with lows around \(\$80\) as the market reassessed technology stock valuations. Since 2023, the price has rebounded strongly, setting new highs above \(\$240\) in 2024, driven mainly by artificial intelligence breakthroughs and optimistic market expectations for AI applications.
To further understand market participation and liquidity characteristics, we analyzed the trading volume change patterns during the corresponding period, which is important for assessing the credibility of price fluctuations and market depth.

Fig. 3. Amazon stock trading volume time series (2010-2025).
Figure 3 shows the trading volume over the same period, which complements the price series as a view of market participation and price volatility. Three characteristics stand out. First, trading volume rises sharply around major events, particularly the initial outbreak of the pandemic in March 2020 and the market adjustment in the second half of 2022, when daily volumes often reached 600–800 million shares, far above the normal level of 200–400 million shares. Second, the seasonal pattern of trading volume is relatively stable, with increases during earnings seasons (January, April, July, and October each year) and around the turn of the year. Finally, overall trading volume has increased in recent years with the proliferation of algorithmic and high-frequency trading, reflecting the evolution of market microstructure.
Considering the important role of technical analysis in financial prediction, we need to examine price performance relative to key technical indicators to identify important trend signals and support/resistance levels, which will provide valuable learning targets for our temporal pattern extractor.

Fig. 4. Amazon stock price vs. moving averages comparison.
Figure 4 further demonstrates the relationship between price and technical indicators. By comparing closing prices with 50-day and 200-day moving averages, we can identify important trend reversal points and technical signals. Fig. 4 clearly shows several important technical breakthroughs: the upward breakthrough of 50-day and 200-day moving averages in early 2013, marking the establishment of a long-term uptrend; two technical adjustments in late 2018 and early 2022, where stock prices fell below short-term moving averages but ultimately received support from long-term trend lines; the strong breakthrough in the second half of 2023, where stock prices regained footing above all major moving averages. These technical characteristics provide rich learning samples for our temporal pattern extractor.
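The kind of technical breakthrough described above can be detected mechanically. The sketch below flags days on which the closing price moves above both moving averages after not being above both on the previous day. The function name `breakout_days` is hypothetical, and the toy example uses 2-day and 4-day windows instead of 50 and 200 only so that the result can be checked by hand.

```python
import pandas as pd


def breakout_days(close: pd.Series, short: int = 50, long: int = 200) -> list:
    """Index labels where price first closes above both moving averages."""
    ma_short = close.rolling(short).mean()
    ma_long = close.rolling(long).mean()
    # Comparisons against NaN are False, so warm-up days never count as "above"
    above = ((close > ma_short) & (close > ma_long)).astype(bool)
    was_above = above.shift(1, fill_value=False).astype(bool)
    # Require a valid long average on the previous day as well
    valid = ma_long.shift(1).notna()
    mask = above & ~was_above & valid
    return mask[mask].index.tolist()


# Toy series: flat, then a decline, then a recovery
close = pd.Series([10, 10, 10, 10, 10, 9, 8, 7, 6, 5, 6, 7, 8, 9, 10, 11], dtype=float)
signals = breakout_days(close, short=2, long=4)
```

In the toy series the only breakout is at position 11, the first day of the recovery on which the price exceeds both averages.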
Finally, to understand the statistical distribution characteristics of returns and validate the necessity of extreme event handling in our model design, we analyzed the probability distribution characteristics of daily returns.

Fig. 5. Amazon stock daily returns distribution histogram.
The daily returns histogram in Fig. 5 shows the “fat-tailed” distribution typical of financial time series. The distribution is centered close to zero, consistent with efficient market hypothesis expectations, but compared with a normal distribution it shows clear excess kurtosis and thicker tails. Specifically, about 68% of daily returns fall in the −2% to +2% range and about 95% fall between −6% and +6%. Under a normal distribution with the same central 68% band, about 95% of observations would fall within roughly ±4%, so the wider ±6% band points to considerably more extreme values than normality predicts. This indicates that extreme events are relatively frequent in financial markets, supporting the need to account for event-driven factors in our model design.
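Fat tails of this kind are usually quantified with excess kurtosis and the share of observations beyond a few standard deviations. The sketch below computes both on synthetic data only: a Gaussian sample and a Student-t sample with 5 degrees of freedom stand in for "normal" and "fat-tailed" returns, and none of the numbers are taken from the AMZN series.

```python
import numpy as np


def excess_kurtosis(x) -> float:
    """Fourth standardized moment minus 3 (zero for a normal distribution)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)


def tail_share(x, k: float = 3.0) -> float:
    """Fraction of observations more than k standard deviations from the mean."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(np.abs(z) > k))


rng = np.random.default_rng(42)
normal_like = rng.normal(0.0, 0.02, size=50_000)          # thin-tailed reference
fat_tailed = 0.015 * rng.standard_t(df=5, size=50_000)    # heavy-tailed stand-in

kurt_normal = excess_kurtosis(normal_like)
kurt_fat = excess_kurtosis(fat_tailed)
```

A clearly positive `excess_kurtosis` together with a `tail_share` well above the Gaussian value of roughly 0.3% at three standard deviations is the numerical counterpart of the histogram shape described above.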
Data preprocessing and partitioning strategy
To ensure experimental rigor and result reliability, we adopted a strict time series data partitioning strategy to avoid any form of future information leakage. The specific partitioning scheme is: the training set covers January 1, 2010, to December 31, 2020, totaling 11 years of historical data, providing sufficient learning samples for the model; the validation set is set from January 1, 2021, to December 31, 2022, totaling 2 years of data, used for hyperparameter tuning and model selection; the test set includes January 1, 2023, to May 31, 2025, approximately 2.5 years of the latest data, used to evaluate model generalization performance in actual application scenarios.
This partitioning design has important practical significance: the training set time span is sufficiently long, covering multiple complete market cycles, ensuring the model can learn patterns under various market conditions; the validation set corresponds to the post-pandemic market environment, including new market characteristics such as monetary policy shifts and rising inflation pressures, helping the model adapt to environmental changes; the test set is entirely in the period after model training, including unprecedented market events such as the latest interest rate hike cycle and AI technology breakthroughs, providing a severe test for evaluating the model’s true predictive capability.
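A chronological split with these boundaries can be expressed as date-range slices on a sorted `DatetimeIndex`. The dates below are the ones stated above; the `SPLITS` dictionary, the function name, and the synthetic business-day frame are illustrative assumptions.

```python
import pandas as pd

# Boundaries as stated in the text (both ends inclusive)
SPLITS = {
    "train": ("2010-01-01", "2020-12-31"),
    "val": ("2021-01-01", "2022-12-31"),
    "test": ("2023-01-01", "2025-05-31"),
}


def chronological_split(frame: pd.DataFrame) -> dict:
    """Slice a date-indexed frame into non-overlapping, time-ordered parts."""
    frame = frame.sort_index()
    return {name: frame.loc[start:end] for name, (start, end) in SPLITS.items()}


# Synthetic stand-in for the real data: one row per business day
idx = pd.bdate_range("2010-01-01", "2025-05-31")
frame = pd.DataFrame({"close": range(len(idx))}, index=idx)
parts = chronological_split(frame)
```

Because the slices are defined by dates rather than by shuffled indices, no observation from a later period can end up in an earlier split.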
In the data preprocessing stage, we applied corresponding standardization methods for different types of data: price data was transformed through log returns to eliminate the effect of price levels; fundamental data used rolling Z-score standardization, maintaining temporal characteristics while eliminating dimensional differences; event data was encoded into fixed-dimension vector representations through pre-trained BERT models; relationship data was normalized to ensure comparability across different relationship types. These preprocessing steps laid a solid foundation for subsequent model training and evaluation.
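The first two transformations can be sketched as follows. The 252-day window and the one-day lag on the rolling statistics are assumptions made for the illustration (the lag keeps the current value out of its own normalization statistics); the text does not state the window length the authors used.

```python
import numpy as np
import pandas as pd


def log_returns(close: pd.Series) -> pd.Series:
    """Daily log returns, r_t = log(P_t / P_{t-1})."""
    return np.log(close / close.shift(1)).dropna()


def rolling_zscore(x: pd.Series, window: int = 252) -> pd.Series:
    """Z-score against the mean and std of the previous `window` observations."""
    mean = x.rolling(window).mean().shift(1)   # statistics use data up to t-1
    std = x.rolling(window).std().shift(1)
    return (x - mean) / std


close = pd.Series([100.0, 110.0, 121.0])
lr = log_returns(close)                  # two returns, each log(1.1)

x = pd.Series(np.arange(10.0))
z = rolling_zscore(x, window=3)          # first three values are NaN (warm-up)
```

For the toy series, the value at position 3 is compared with the window [0, 1, 2] (mean 1, sample standard deviation 1), giving a z-score of 2.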
Baseline model comparison
Baseline model selection and configuration
To comprehensively evaluate the performance advantages of DAFF-Net, we carefully selected eight representative baseline models that cover the complete spectrum from traditional statistical methods to the latest deep learning techniques, providing thorough validation of our method’s effectiveness and advancement.
Traditional statistical models: We selected ARIMA (Autoregressive Integrated Moving Average) [33] as the representative of classical time series prediction methods. ARIMA has sound theoretical foundations and good interpretability, is widely applied in financial time series prediction, and provides a benchmark for gauging how much deep learning methods improve on classical ones.
Classical deep learning models: LSTM (Long Short-Term Memory) [7], a representative recurrent neural network, can effectively handle long sequence dependencies and is a classical method in time series prediction. TCN (Temporal Convolutional Network) [34] uses causal convolutions, supports parallel training, and models long-term dependencies well, representing the application of convolutional neural networks to time series prediction.
Attention mechanism models: Transformer [9], the pioneering attention-based architecture, directly models relationships between arbitrary positions in a sequence through global self-attention. Informer [35] builds on Transformer with sparse attention mechanisms optimized for long sequence prediction, improving computational efficiency.
Multimodal and relationship modeling methods: DUET [36], the direct foundation of our work, adopts dual clustering mechanisms that model the temporal and channel dimensions separately and represents an advanced method in current time series prediction. MM-LSTM (Multimodal LSTM) [37] fuses multiple data sources within the LSTM framework, providing a baseline reference for multimodal time series prediction.
Simple decision-focused baselines: To establish fundamental performance benchmarks, we include two essential naive baselines that represent the most basic prediction strategies. The No-Change baseline assumes that stock returns will be zero for all prediction horizons, representing the null hypothesis that prices follow a random walk without predictable drift. The Market Index baseline predicts individual stock returns based on the historical beta relationship with the NASDAQ-100 index:
$$\begin{aligned} {\hat{r}}_{AMZN,t+h} = \beta _{AMZN} \times r_{NASDAQ,t+h} \end{aligned}$$
(45)
where \(\beta _{AMZN}\) is estimated using a 252-day rolling window of historical returns prior to the prediction date. This baseline represents a simple factor model approach commonly used in practical finance. These fundamental baselines are essential for validating that sophisticated models provide meaningful improvements over the most basic prediction strategies.
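The rolling beta estimate can be sketched with pandas as the ratio of rolling covariance to rolling variance. The data here is synthetic, with a true beta of 1.5 chosen for the illustration, and the one-day shift is an assumption that implements "prior to the prediction date". How the index return \(r_{NASDAQ,t+h}\) in Eq. (45) is obtained at prediction time is not specified in this section, so the last line simply multiplies by whatever index return series is supplied.

```python
import numpy as np
import pandas as pd


def rolling_beta(stock: pd.Series, index: pd.Series, window: int = 252) -> pd.Series:
    """Beta from a rolling window of returns ending the day before each date."""
    cov = stock.rolling(window).cov(index)
    var = index.rolling(window).var()
    # Shift so the beta used on day t depends only on returns before t
    return (cov / var).shift(1)


rng = np.random.default_rng(1)
mkt = pd.Series(rng.normal(0.0, 0.01, size=600))               # synthetic index returns
stk = 1.5 * mkt + pd.Series(rng.normal(0.0, 0.001, size=600))  # synthetic stock returns

beta = rolling_beta(stk, mkt)
pred = beta * mkt    # Eq. (45), using the supplied index return series
```

The first 252 entries of `beta` are undefined because a full window of prior returns is not yet available.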
All baseline models adopt the same data partitioning strategy and evaluation metrics, ensuring experimental fairness. Hyperparameters were optimized through grid search on the validation set, with each model achieving its optimal configuration.
Evaluation metrics and experimental setup
To evaluate prediction performance across different time spans, we adopted two complementary evaluation metrics. Mean Squared Error (MSE), the primary metric, measures the average squared deviation between predicted and true values; because it penalizes large errors heavily, it is sensitive to outliers, which suits the needs of financial risk management. The coefficient of determination (\(\text {R}^{2}\)), the auxiliary metric, reflects the share of data variability that the model explains and makes relative model performance easier to compare.
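For reference, the two metrics follow the standard definitions \(\text{MSE} = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2\) and \(\text{R}^{2} = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2\). The sketch below implements them with NumPy on made-up numbers; it is not the authors' evaluation code.

```python
import numpy as np


def mse(y_true, y_pred) -> float:
    """Mean of squared prediction errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r2_score(y_true, y_pred) -> float:
    """1 - (residual sum of squares) / (total sum of squares around the mean)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
error = mse(y_true, y_pred)       # squared errors 0.01, 0.01, 0.04, 0.04
fit = r2_score(y_true, y_pred)    # 1 - 0.1 / 5
mean_fit = r2_score(y_true, [2.5] * 4)
```

Under this definition, a predictor that always outputs the sample mean of the targets scores exactly zero, which is a useful reference point when reading \(\text {R}^{2}\) values.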
We set three prediction time spans: 1-day, 5-day, and 20-day, corresponding to the practical needs of short-term trading, medium-term investment, and long-term allocation respectively. This multi-horizon setup can comprehensively evaluate model generalization capability under different prediction difficulties.
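One common way to build targets for several horizons is the cumulative log return over the next h trading days. The text does not say whether the targets are cumulative returns, prices, or something else, so the sketch below is an assumption chosen for illustration, and the price path is synthetic.

```python
import numpy as np


def forward_log_return(close, horizon: int) -> np.ndarray:
    """Cumulative log return from day t to day t + horizon, for every valid t."""
    close = np.asarray(close, dtype=float)
    return np.log(close[horizon:] / close[:-horizon])


# Synthetic price path growing 1% per day, so every h-day log return is h * log(1.01)
close = 100.0 * 1.01 ** np.arange(30)
targets = {h: forward_log_return(close, h) for h in (1, 5, 20)}
```

Each horizon loses its last h observations, because the future price needed for the target is not available there.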
To ensure result reliability, all experiments were conducted in the same hardware environment using the same random seeds and training strategies. Each model was fully trained until convergence and evaluated on the test set for final assessment.
Quantitative results analysis
Several important trends can be observed from the quantitative results in Table 7:
Overall performance ranking: DAFF-Net achieved the best performance across all evaluation metrics, supporting the effectiveness of our event-driven and multi-dimensional relationship fusion approach. Multimodal methods (MM-LSTM and DAFF-Net) significantly outperformed unimodal methods, demonstrating the importance of multi-source information fusion. Among unimodal methods, DUET, the most advanced baseline, performed best, which supports the value of the dual clustering concept.
Prediction difficulty analysis: The performance of all models declined as the prediction time span increased, which aligns with general rules of financial prediction. Interestingly, DAFF-Net’s relative advantage is more pronounced in long-term prediction (20-day), with \(\text {R}^{2}\) reaching 0.51, while the strongest baseline MM-LSTM only achieved 0.45, representing a relative improvement of 13.3%. This indicates that our method has significant advantages in handling complex long-term dependencies.
Method type comparison: Traditional statistical method ARIMA performed worst across all metrics, reflecting the limitations of linear methods in handling complex financial data. Deep learning methods generally outperformed statistical methods, with attention mechanism methods (Transformer, Informer) outperforming traditional recurrent and convolutional methods, demonstrating the advantages of attention mechanisms in capturing long-term dependencies.
Relative improvement magnitude: Compared to the strongest unimodal baseline DUET, DAFF-Net reduced MSE by 15.2% (from 0.0171 to 0.0145) and improved \(\text {R}^{2}\) by 15.5% (from 0.58 to 0.67) in 1-day prediction. Compared to the strongest multimodal baseline MM-LSTM, DAFF-Net reduced MSE by 10.5% and improved \(\text {R}^{2}\) by 9.8% in 1-day prediction, proving our technical innovation in multimodal fusion.
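The percentages in this comparison are relative changes with respect to the baseline value. A short check using only the figures quoted in the text (the function name is illustrative):

```python
def relative_change(baseline: float, model: float, lower_is_better: bool = False) -> float:
    """Improvement of `model` over `baseline`, in percent of the baseline value."""
    if lower_is_better:
        return 100.0 * (baseline - model) / baseline
    return 100.0 * (model - baseline) / baseline


# 1-day prediction, DAFF-Net vs. DUET
mse_gain_1d = relative_change(0.0171, 0.0145, lower_is_better=True)
r2_gain_1d = relative_change(0.58, 0.67)

# 20-day prediction, DAFF-Net vs. MM-LSTM
r2_gain_20d = relative_change(0.45, 0.51)
```

Rounded to one decimal place, these evaluate to 15.2, 15.5, and 13.3, matching the reported improvements.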
Fundamental baseline validation: The inclusion of naive baselines provides essential context for evaluating model sophistication. The No-Change baseline achieved \(\text {R}^{2}\) scores of 0.17, 0.12, and 0.08 across the three prediction horizons, representing the null hypothesis performance. The Market Index baseline, utilizing beta-adjusted market returns, improved to \(\text {R}^{2}\) scores of 0.21, 0.18, and 0.13, demonstrating the value of incorporating systematic market factors. Notably, even the traditional ARIMA model substantially outperformed these fundamental baselines, validating the necessity of time series modeling techniques. In 20-day prediction, DAFF-Net’s \(\text {R}^{2}\) of 0.51 is about 6.4\(\times\) that of the No-Change baseline (0.08) and about 3.9\(\times\) that of the Market Index baseline (0.13), confirming that our multimodal approach provides meaningful predictive value beyond the most basic strategies.
These quantitative results fully demonstrate the superiority of DAFF-Net in financial time series prediction tasks, particularly showcasing unique advantages in handling event-driven market dynamics and complex asset correlation relationships.
Results analysis
Overall performance comparison analysis
To more intuitively demonstrate the performance advantages of DAFF-Net relative to various baseline models and deeply analyze the performance differences of different models on multi-time horizon prediction tasks, we conducted comprehensive analysis of experimental results through various visualization methods.
First, to clearly compare the prediction errors of all models across different prediction time horizons, we used bar charts to display the comparison results of MSE metrics.

Fig. 6. MSE comparison bar chart for different models.
Several characteristics can be observed from Fig. 6. First, DAFF-Net achieved the lowest MSE values across all three time horizons: its yellow bar (1-day MSE), orange bar (5-day MSE), and red bar (20-day MSE) are all lower than those of every baseline model, supporting the superiority of our method. The fundamental baselines (No-Change and Market Index) provide essential context for evaluating model sophistication. These naive baselines achieved MSE values of 0.0312–0.0351 and 0.0298–0.0342 respectively, representing the most basic prediction strategies. Even the traditional ARIMA model substantially outperformed these fundamental approaches, validating the necessity of sophisticated modeling techniques. Second, the MSE values of all models increased significantly with the prediction horizon, consistent with the general observation in time series prediction that long-term prediction is more challenging than short-term prediction. In particular, the traditional ARIMA model achieved an MSE of 0.0348 for 20-day prediction, about 1.5 times that of DAFF-Net (0.0228), highlighting the advantages of deep learning methods in handling complex nonlinear relationships. The gaps between deep learning models are also more pronounced in long-term prediction, indicating that architectural design matters more as prediction difficulty increases.
Next, to analyze how each model’s explanatory capability changes with the prediction horizon, we plotted the \(\text {R}^{2}\) metrics across the different prediction horizons as line charts.

Fig. 7. \(\text {R}^{2}\) score variation trends for different models.
Figure 7 shows how model performance varies with the prediction horizon. All models’ \(\text {R}^{2}\) scores decline as the horizon increases, but the magnitude of the decline differs significantly. The fundamental baselines (No-Change and Market Index, represented by the bottom two lines) perform worst, with \(\text {R}^{2}\) scores declining from 0.17 and 0.21 respectively in 1-day prediction to merely 0.08 and 0.13 in 20-day prediction. DAFF-Net (red line) maintained the highest \(\text {R}^{2}\) scores across all time horizons, reaching 0.67 in 1-day prediction and surpassing all other methods. The performance gaps between model categories become increasingly evident as prediction horizons extend: fundamental baselines plateau at very low \(\text {R}^{2}\) values, the traditional statistical method (ARIMA) shows a steep decline from 0.41 to 0.22, while the deep learning methods degrade more gradually. DAFF-Net’s decline, from 0.67 to 0.51, is comparatively stable, demonstrating superior stability and generalization capability. Notably, the absolute spread between the best and worst performing models narrows from 0.50 in 1-day prediction (0.67 vs. 0.17) to 0.43 in 20-day prediction (0.51 vs. 0.08), but the ratio between them widens from about 3.9\(\times\) to about 6.4\(\times\), indicating that advanced modeling matters even more, in relative terms, for longer-horizon forecasting.
Finally, to comprehensively evaluate the performance differences between DAFF-Net and major competitors from multiple dimensions, we selected the three best-performing models for radar chart comparison analysis.

Fig. 8. Comprehensive performance radar chart of DAFF-Net vs. baseline models.
Figure 8 displays the performance of the three best models (DAFF-Net, DUET, and MM-LSTM) across six key dimensions in radar chart format. The six axes of the radar chart correspond to MSE-1d, MSE-5d, MSE-20d, \(\text {R}^{2}\)-1d, \(\text {R}^{2}\)-5d, and \(\text {R}^{2}\)-20d respectively, where the MSE metrics are transformed by taking the reciprocal so that they point in the same direction as the \(\text {R}^{2}\) metrics (larger values indicate better performance). From the chart, it is evident that DAFF-Net (red line) forms the largest coverage area, outperforming or at least equaling the other two models across all six dimensions. Particularly noteworthy is that DAFF-Net’s advantages are most significant in the dimensions related to long-term prediction (MSE-20d and \(\text {R}^{2}\)-20d), which again supports the effectiveness of our event-driven and multi-dimensional relationship fusion mechanisms in handling complex long-term dependencies. DUET (yellow line), as the direct foundation of our work, is surpassed by DAFF-Net in all dimensions, validating the value of our technical improvements. MM-LSTM (orange line), although it also adopts multimodal fusion strategies, still lags behind DAFF-Net in performance, indicating that our innovations in fusion mechanism design bring substantial improvements.
Performance improvement quantitative analysis
Based on the visualization analysis above, we further quantified DAFF-Net’s performance improvements. Fundamental baseline validation demonstrates the substantial value of sophisticated modeling: in 20-day prediction, DAFF-Net’s \(\text {R}^{2}\) is about 6.4\(\times\) that of the No-Change baseline (0.51 vs. 0.08) and about 3.9\(\times\) that of the Market Index baseline (0.51 vs. 0.13), confirming that our multimodal approach provides meaningful predictive value beyond the most basic strategies.
Compared to the strongest unimodal baseline DUET, DAFF-Net achieved significant improvements in 1-day prediction tasks with MSE reduction of 15.2% (from 0.0171 to 0.0145) and \(\text {R}^{2}\) improvement of 15.5% (from 0.58 to 0.67). In the more challenging 20-day prediction task, the improvement magnitude is even more prominent, with MSE reduction of 12.3% (from 0.0260 to 0.0228) and \(\text {R}^{2}\) improvement of 21.4% (from 0.42 to 0.51).
Compared to the strongest multimodal baseline MM-LSTM, DAFF-Net also demonstrated stable performance advantages. In 1-day prediction tasks, MSE decreased by 10.5% and \(\text {R}^{2}\) improved by 9.8%; in 5-day prediction tasks, MSE decreased by 7.4% and \(\text {R}^{2}\) improved by 7.0%; in 20-day prediction tasks, MSE decreased by 9.2% and \(\text {R}^{2}\) improved by 13.3%. These quantitative results fully demonstrate the effectiveness of our proposed event-driven temporal pattern extraction and multi-dimensional relationship-aware fusion mechanisms.
The hierarchical performance structure revealed by our baseline comparison supports the progressive value of modeling sophistication: fundamental baselines (No-Change, Market Index) establish the performance floor; traditional econometric methods (ARIMA) provide moderate improvements; deep learning approaches (LSTM, TCN, Transformer) achieve substantial gains; and multimodal fusion methods (MM-LSTM, DAFF-Net) deliver the highest performance. Crucially, DAFF-Net’s 13.3% \(\text {R}^{2}\) improvement over the strongest multimodal baseline MM-LSTM in 20-day prediction demonstrates that sophisticated fusion mechanisms provide meaningful benefits even among advanced multimodal approaches.
Particularly worth emphasizing is that DAFF-Net’s relative advantages are more pronounced in long-term prediction, which has important significance for practical financial applications. In quantitative investment and risk management, the ability to accurately predict market trends over longer time horizons is often more valuable than short-term prediction, as it provides more sufficient time windows and more stable strategic foundations for investment decisions. Our experimental results indicate that through effective integration of event information and multi-dimensional relationship information, DAFF-Net can better capture long-term patterns in financial markets, thus having greater potential value in practical applications.
Ablation study analysis
To validate the individual contributions of DAFF-Net’s core technical innovations, we conduct comprehensive ablation studies by systematically removing or replacing key components. This analysis demonstrates the necessity and effectiveness of each proposed module.
Component ablation design
We design six ablation variants to isolate the contribution of each major component:
- DAFF-Net-NoEvent: Remove the entire event-driven temporal pattern extractor, using only price time series data
- DAFF-Net-NoRelation: Remove the multi-dimensional relationship-aware module, using only temporal features
- DAFF-Net-SingleRel: Replace multi-dimensional relationships with frequency-domain relationships only
- DAFF-Net-HardCluster: Replace soft clustering with hard k-means clustering (k=8)
- DAFF-Net-NoFusion: Remove contextual factor fusion, using simple concatenation instead
- DAFF-Net-NoRouter: Replace event-aware router with simple concatenation of events and time series
Each ablation variant maintains the same training procedure and hyperparameter settings to ensure fair comparison.
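To make the HardCluster variant concrete, the sketch below contrasts a hard (k-means style) assignment with a soft one. The soft assignment here is a softmax over negative squared distances to the cluster centers, which is a generic stand-in: this section does not describe DAFF-Net's actual soft clustering mechanism, and the function names, temperature parameter, and two-cluster toy data are assumptions.

```python
import numpy as np


def squared_distances(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances, shape (n_points, n_clusters)."""
    return ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)


def soft_assign(x: np.ndarray, centroids: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Membership weights in (0, 1) that sum to 1 over clusters for each point."""
    logits = -squared_distances(x, centroids) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def hard_assign(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """One-hot membership: each point belongs only to its nearest cluster."""
    d2 = squared_distances(x, centroids)
    onehot = np.zeros_like(d2)
    onehot[np.arange(len(x)), d2.argmin(axis=1)] = 1.0
    return onehot


centroids = np.array([[0.0, 0.0], [1.0, 0.0]])
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]])   # last point is equidistant
soft = soft_assign(points, centroids)
hard = hard_assign(points, centroids)
```

The equidistant point receives weights of 0.5 and 0.5 under the soft assignment but is forced entirely into one cluster under the hard assignment, which is the loss of nuance the ablation is designed to measure.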
Ablation results and analysis
Table 8 presents the performance degradation when each component is removed or simplified.
Component contribution analysis
The ablation results reveal several important insights about the relative importance of different components:
Event-Driven Module (Highest Impact): Removing the event-driven temporal pattern extractor causes the largest performance degradation (10.1% to 15.9% across metrics), confirming that event information provides crucial predictive signals that pure price-based models cannot capture. The impact is particularly pronounced for longer prediction horizons, supporting our hypothesis that events provide valuable information for understanding market narratives.
Multi-Dimensional Relationships (Significant Impact): Complete removal of relationship modeling results in 6.1% to 9.9% performance loss, demonstrating the importance of cross-asset information. Interestingly, the relationship module’s contribution is more consistent across prediction horizons compared to the event module.
Multi-Dimensional vs. Single-Dimensional Relationships: Using only frequency-domain relationships (DAFF-Net-SingleRel) leads to 3.5% to 5.6% degradation compared to the full multi-dimensional approach. This validates our hypothesis that different relationship types capture complementary information about asset correlations.
Soft vs. Hard Clustering: The soft clustering mechanism provides consistent but moderate improvements (2.0% to 4.1%) over hard clustering. While the improvement is smaller than other components, it demonstrates the value of preserving nuanced relationship information.
Contextual Factor Fusion: The specialized fusion mechanism contributes 4.8% to 7.6% improvement over simple concatenation, highlighting the importance of carefully designed information integration.
Event-Aware Router: The sophisticated event routing provides 3.1% to 5.2% improvement over naive event concatenation, validating the value of learned event-pattern matching.
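The distinction between soft and hard clustering discussed above can be illustrated with a small sketch. This section does not specify the exact soft-assignment rule used in DAFF-Net, so the code below assumes a softmax over negative squared distances as a stand-in; the function names, the `temperature` parameter, and the toy centroids are all illustrative.

```python
import math


def _sq_dists(point, centroids):
    """Squared Euclidean distance from point to each centroid."""
    return [sum((p - c) ** 2 for p, c in zip(point, centroid)) for centroid in centroids]


def soft_assign(point, centroids, temperature=1.0):
    """Membership weights via softmax over negative squared distances (assumed rule)."""
    logits = [-d / temperature for d in _sq_dists(point, centroids)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_assign(point, centroids):
    """k-means-style assignment: one-hot vector on the nearest centroid."""
    dists = _sq_dists(point, centroids)
    nearest = min(range(len(dists)), key=dists.__getitem__)
    return [1.0 if i == nearest else 0.0 for i in range(len(centroids))]


# Toy example: the point lies almost midway between the first two centroids.
centroids = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
point = (0.9, 0.0)
print("soft:", [round(w, 3) for w in soft_assign(point, centroids)])
print("hard:", hard_assign(point, centroids))
```

In this toy case the hard assignment discards the point's near-equal affinity to the second centroid, whereas the soft weights retain it, which is the kind of nuanced relationship information the soft mechanism is meant to preserve.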
Cumulative component analysis
To understand how components interact, we examine cumulative ablation effects (Table 9).
The cumulative analysis shows that: (1) the core innovations (event processing and multi-dimensional relationships) together account for a 21.6% performance improvement; (2) removing all innovations results in a 35.3% performance degradation; (3) the final price-only configuration performs similarly to advanced unimodal baselines such as Transformer, which is consistent with our baseline comparisons.
Component interaction effects
We observe interesting interaction effects between components:
Event-Relationship Synergy: When both event and relationship modules are present, their combined effect (21.6% improvement) falls slightly short of the sum of the individual contributions (13.7% + 9.1% = 22.8%), a gap of 1.2 percentage points that suggests a slight negative interaction. This may indicate some information overlap between event signals and cross-asset relationships.
Fusion Mechanism Dependency: The fusion module’s contribution increases when both primary streams are active, indicating that sophisticated fusion becomes more valuable as input complexity increases.
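The event-relationship comparison can be checked with a short calculation using only the three percentages quoted in the text. The sketch below treats the interaction as the combined effect minus the sum of the individual effects, which is one common convention and an assumption here; a negative value means the two modules overlap rather than reinforce each other.

```python
def interaction_effect(combined, individual):
    """Combined effect minus the sum of individual effects, in percentage points."""
    return combined - sum(individual)


# Percentages quoted in the text for the event and relationship modules.
event_only, relation_only, both = 13.7, 9.1, 21.6

delta = interaction_effect(both, [event_only, relation_only])
print(f"interaction: {delta:+.1f} percentage points")
```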
This comprehensive ablation study confirms that each proposed component contributes meaningfully to DAFF-Net’s performance, with the event-driven module providing the largest individual benefit and multi-dimensional relationship modeling providing substantial complementary value.
Cross-asset validation analysis
To evaluate the generalizability of DAFF-Net beyond the primary Amazon case study, we conducted additional validation experiments on a diverse set of stocks representing different sectors and market characteristics. This cross-asset analysis addresses concerns about sector-specific overfitting and demonstrates the broader applicability of our methodological approach.
Additional asset selection
We selected four representative stocks from distinct sectors to provide comprehensive cross-validation:
-
Johnson & Johnson (JNJ): Healthcare/Pharmaceutical sector, representing defensive value stocks with stable fundamentals and lower volatility
-
JPMorgan Chase (JPM): Financial services sector, exhibiting strong sensitivity to macroeconomic events and interest rate changes
-
ExxonMobil (XOM): Energy sector, characterized by commodity price dependencies and cyclical behavior patterns
-
Tesla (TSLA): Electric vehicle/Technology sector, representing high-growth, high-volatility stocks with strong event-driven characteristics
These assets were chosen to span the spectrum of market behaviors: from stable dividend-paying value stocks (JNJ) to highly volatile growth stocks (TSLA), and from cyclical commodity exposure (XOM) to interest-rate-sensitive financials (JPM).
Cross-asset performance results
Table 10 presents the comparative performance of DAFF-Net against the three strongest baseline models (DUET, MM-LSTM, Transformer) across all five assets for 20-day prediction tasks.
Sector-specific performance analysis
The cross-asset validation reveals several important patterns:
Consistent Performance Gains: DAFF-Net achieved performance improvements across all five assets, with MSE reductions ranging from 8.5% to 9.7% and \(\text {R}^{2}\) improvements from 10.2% to 15.4% compared to the strongest baseline MM-LSTM. This consistency suggests that our methodological innovations provide robust value across different market segments.
Sector-Dependent Improvement Magnitudes: The relative improvements varied by sector characteristics. Energy stock XOM showed the largest \(\text {R}^{2}\) improvement (15.4%), potentially due to the sector’s strong sensitivity to macroeconomic events, which our event-driven approach effectively captures. Healthcare stock JNJ showed more modest improvements (10.2%), consistent with its lower event sensitivity and more stable fundamental-driven behavior.
Volatility Adaptation: High-volatility stocks (TSLA, AMZN) and low-volatility stocks (JNJ) both benefited from DAFF-Net, though through different mechanisms. For volatile stocks, the event-driven pattern extraction provided significant value; for stable stocks, the multi-dimensional relationship modeling contributed more substantially.
Event Sensitivity Correlation: Assets with higher event sensitivity (TSLA, XOM, AMZN) generally showed larger absolute performance improvements, validating the particular value of our event-aware routing mechanism for event-driven stocks.
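For readers reproducing these comparisons, the sketch below shows one way percentage changes against a baseline could be computed. It assumes the reported figures are relative changes with respect to the baseline metric, which this section does not state explicitly; the function names are illustrative, and the numbers in the example are placeholders, not values from Table 10.

```python
def relative_reduction(baseline, model):
    """Percent by which a lower-is-better metric (e.g. MSE) drops versus the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - model) / baseline


def relative_gain(baseline, model):
    """Percent by which a higher-is-better metric (e.g. R^2) rises versus the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (model - baseline) / abs(baseline)


# Placeholder values chosen for easy arithmetic, not results from the paper.
mse_drop = relative_reduction(baseline=4.0, model=3.0)
r2_rise = relative_gain(baseline=0.5, model=0.75)
print(f"MSE reduction: {mse_drop:.1f}%")
print(f"R^2 improvement: {r2_rise:.1f}%")
```

Because relative changes depend on the size of the baseline metric, comparing them across assets with very different baseline errors calls for some care; reporting the absolute differences alongside them can help.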
Limitations and scope of validation
While these cross-asset results demonstrate promising generalizability, we acknowledge several limitations: (1) the validation remains within large-cap U.S. equities and may not extend to small-cap stocks, international markets, or other asset classes; (2) the 15-year study period, while comprehensive, represents specific market regimes and may not capture all possible market conditions; (3) the computational complexity of full multi-asset deployment remains a practical consideration for broader applications.
