Preprint

This version is not peer-reviewed. A peer-reviewed article of this preprint also exists.

Transfer Learning-Enhanced Prediction of Glass Transition Temperature in Bismaleimide-Based Polyimides

Submitted: 23 May 2025
Posted: 26 May 2025
Abstract
The glass transition temperature (Tg) is a pivotal parameter governing the thermal and mechanical properties of bismaleimide-based polyimide (BMI) resins. However, the limited experimental data available for BMI systems poses significant challenges for predictive modeling. To address this gap, this study introduces a hybrid modeling framework leveraging transfer learning. Specifically, a multilayer perceptron (MLP) deep neural network is pre-trained on a large-scale polymer database and subsequently fine-tuned on a small-sample BMI dataset. Complementing this approach, six interpretable machine learning algorithms (Random Forest, Ridge Regression, K-Nearest Neighbors, Bayesian Regression, Support Vector Regression, and Extreme Gradient Boosting) are employed to construct transparent predictive models. SHapley Additive exPlanations (SHAP) analysis is further utilized to quantify the relative contributions of molecular descriptors to Tg. Results demonstrate that the transfer learning strategy achieves superior predictive accuracy in data-scarce scenarios compared to direct training on the BMI dataset. SHAP analysis identifies charge distribution inhomogeneity, molecular topology, and molecular surface area properties as the major influences on Tg. This integrated framework not only improves prediction performance but also provides actionable insights for molecular structure design, laying a solid foundation for the rational engineering of high-performance BMI resins.

1. Introduction

The thermal stability of polymer substrates in high-temperature environments represents a critical technical bottleneck limiting the advancement of advanced composite materials. The glass transition temperature (Tg), a key parameter governing the transition from the glassy to the rubbery state, directly influences the morphological stability and mechanical property retention of thermoplastic and thermosetting polymers under extreme conditions[1]. For high-performance thermosetting polymers such as bismaleimide-based polyimide (BMI) resins, Tg determines their practical application potential in critical fields such as aerospace thermal protection structures and microelectronic packaging materials[2]. However, optimizing Tg in BMI resins remains a complex challenge, as it is highly dependent on the composition and architecture of the polymer[3,4,5,6]. Traditional trial-and-error approaches are constrained by lengthy experimental cycles, prohibitive testing costs, and difficulties in elucidating microscopic mechanisms. While computational methods like density functional theory (DFT)[7] and molecular dynamics (MD) simulations offer theoretical frameworks for Tg prediction[8,9,10], the intricate cross-linking networks, multidimensional aromatization reactions, and synergistic functional group interactions in BMI resins introduce dual challenges: exponentially increasing computational resource demands and empirical force field parameter selection in atomic-scale simulations.
Recent advances in polymer informatics have introduced a data-driven paradigm for materials development, enabling inverse analysis of structure-property relationships through machine learning[11,12,13,14,15]. Notable progress includes Lei et al.'s[16] systematic benchmarking of 79 models for Tg prediction, which revealed synergistic interactions between molecular fingerprints and neural architectures. He et al.[17] demonstrated the scalability of this approach by developing a quantitative structure-property relationship (QSPR) model for 695 polyesters, achieving experimental validation errors within 17.4°C through virtual screening. Ning et al.[18] further advanced this paradigm by implementing deep learning frameworks for fluorinated polybenzoxazole systems, enabling simultaneous prediction of multiple physical properties. While these studies highlight the transformative potential of machine learning in polymer design, their success critically depends on large-scale, high-quality datasets[19,20,21,22]. In contrast, the BMI resin field is constrained by data scarcity and label noise arising from inconsistent experimental conditions, severely limiting model predictive capabilities.
To address the data scarcity challenge in BMI resin research, we employ transfer learning, a paradigm where knowledge gained from large datasets is repurposed for related tasks with limited data[23]. This approach has demonstrated efficacy in materials informatics. Yamada et al.[24] achieved high predictive performance in material property estimation using only tens of samples through their XenonPy.MDL pre-trained model library. Zhang et al.[25] developed a transfer learning framework to predict stress-strain curves of polymer composites, achieving 46.14% accuracy improvement in plastic deformation stages through optimal transport integration. Kazemi-Khasragh et al.[26] extended this concept to diverse polymer property prediction, accurately forecasting thermal and mechanical properties using datasets as small as 13 samples. Building on these foundations, we propose a hybrid framework that combines transfer learning and interpretable machine learning to overcome data limitations in BMI resin studies.
Starting from a baseline database (Data_1) consisting of 3916 Tg measurements across diverse polymer types, we build a base model that encodes a large amount of general physicochemical information. For feature extraction, we employ a multilayer perceptron (MLP) architecture inspired by the pioneering work of Chen et al.[27], who demonstrated the efficacy of neural networks in capturing the relationship between polymer structure and Tg. We extend this approach to address the data scarcity problem for BMI resins by integrating transfer learning. Experimental results show that the resulting model accommodates the experimental inconsistency in the target BMI dataset (Data_2, n = 78) and achieves a prediction performance of Root Mean Squared Error (RMSE) = 27.27°C and Coefficient of Determination (R²) = 0.44 on the Data_2 test set.
In order to elucidate the structure-property relationship of BMI resins, we systematically expanded the chemical space beyond the limited experimental data (Data_2, n = 78) by generating a structurally perturbed virtual database (Data_3, n = 1092). Using this expanded dataset, the Extreme Gradient Boosting (XGBoost)[28] model was trained on 67 molecular descriptors, achieving R² = 0.63 on the Data_3 test set. In addition, SHapley Additive exPlanations (SHAP) values were calculated to quantify the contribution of each descriptor, revealing three main Tg determinants: charge distribution inhomogeneity, molecular topology, and surface area properties. By transitioning from "black-box prediction" to "white-box analysis", our approach achieves a quantitative mapping between chemical structure and thermal properties. This work demonstrates the feasibility of data-efficient material discovery in a traditionally data-scarce field and provides a transferable paradigm for high-temperature polymer engineering.

2. Materials and Methods

2.1. Data Collection

A two-tier dataset architecture was employed. The base dataset (Data_1) comprised 3916 diverse polymers[29], to which 697 polyimides (PI)[30] were added because their five-membered imide ring structure is topologically similar to that of bismaleimide-based polyimides (BMI), improving the pre-training of BMI-specific feature extraction. The target dataset (Data_2), constructed through experimental synthesis and literature curation[4,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52], contained 78 BMI molecules, a scarcity that posed the primary challenge for model development.
In order to systematically characterize the two-tier dataset architecture, we visualized the Tg distribution of the datasets (Figure 1a) using kernel density estimation (KDE) and distributional analysis. Here, the horizontal axis represents the Tg value of each data point, while the vertical axis indicates the frequency density. The green curve denotes Data_1, and the red curve denotes Data_2. Figure 1a reveals the multimodal distribution of Data_1 (mean μ = 251.25°C), reflecting its composition of diverse polymer families, and the right-skewed distribution of Data_2 (μ = 312.87°C), suggesting rigid structural patterns specific to BMI resins. The distinct distributional differences between the two datasets underscore the uniqueness of BMI resins compared to other polymers, explaining why generalized polymer models cannot directly predict the Tg of BMI.
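The KDE comparison described above can be sketched as follows. The Tg arrays here are synthetic stand-ins (random draws with roughly the reported means) for the actual Data_1/Data_2 values, so only the workflow, not the numbers, mirrors the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Illustrative stand-ins: a multimodal "general polymer" set and a
# small, right-shifted "BMI" set (degrees C); the real values come
# from the curated Data_1/Data_2 databases.
tg_data1 = np.concatenate([rng.normal(150, 40, 2000), rng.normal(330, 50, 1900)])
tg_data2 = rng.normal(313, 35, 78)

# Fit one KDE per dataset and evaluate both on a common Tg grid
kde1, kde2 = gaussian_kde(tg_data1), gaussian_kde(tg_data2)
grid = np.linspace(0, 600, 601)
density1, density2 = kde1(grid), kde2(grid)

print(f"Data_1 mean = {tg_data1.mean():.2f} C, Data_2 mean = {tg_data2.mean():.2f} C")
```

Plotting `density1` and `density2` against `grid` reproduces the style of Figure 1a: two overlaid density curves whose offset makes the domain shift between the general polymer space and the BMI space visible.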
Molecular structures from both datasets were encoded as Morgan fingerprints (see Section 2.2 for details) and subjected to principal component analysis (PCA). This method projects high-dimensional feature relationships onto interpretable 2D scatter plots, where the horizontal and vertical axes represent the first two principal components after dimensionality reduction. As shown in Figure 1b, partial domain overlap exists between green Data_1 clusters and red Data_2 clusters, indicating transferable latent representations while preserving domain-specific characteristics. The black arrows symbolize the knowledge transfer pathway from the general polymer space to BMI-specific regions.
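The PCA projection can be reproduced mechanically as below. Random bit vectors stand in for the real Morgan fingerprints, so this sketch shows only the dimensionality-reduction step, not the domain overlap seen in Figure 1b.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Stand-ins for 2048-bit Morgan fingerprints (random bits, for illustration only)
fp_data1 = rng.integers(0, 2, size=(300, 2048)).astype(float)
fp_data2 = rng.integers(0, 2, size=(78, 2048)).astype(float)

# Fit PCA on the combined feature space, then split the 2D coordinates back
# into the two domains for a scatter plot like Figure 1b
pca = PCA(n_components=2)
coords = pca.fit_transform(np.vstack([fp_data1, fp_data2]))
coords1, coords2 = coords[: len(fp_data1)], coords[len(fp_data1):]

print(coords1.shape, coords2.shape)
```

Fitting a single PCA on the pooled fingerprints (rather than one per dataset) is what makes the two point clouds directly comparable in the same 2D coordinate system.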

2.2. Feature Engineering

The Morgan fingerprint[53] is a circular fingerprint encoding molecular substructures through hashed bit patterns. This encoding strategy preserves topological information at multiple scales while maintaining computational efficiency for neural processing. Unlike scalar molecular descriptors that aggregate global properties (e.g., molecular weight, the number of aromatic rings), Morgan fingerprints retain spatial relationships between functional groups, enabling neural networks to learn representations of the structural features that determine Tg. The data preprocessing workflow (Figure 1c) utilized RDKit (RDKit: Open-source cheminformatics; http://www.rdkit.org) to generate 2048-dimensional Morgan fingerprints from canonical SMILES[54] using the GetMorganFingerprintAsBitVect function with radius = 3 and nBits = 2048, capturing local chemical environments up to three bonds away.
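The featurization step can be sketched directly with RDKit. The SMILES below (N-phenylmaleimide) is an illustrative stand-in for the actual BMI monomers in the datasets; the fingerprint settings follow the text.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# N-phenylmaleimide: a small maleimide-bearing stand-in for a BMI monomer
smiles = "O=C1C=CC(=O)N1c1ccccc1"
mol = Chem.MolFromSmiles(smiles)

# radius=3, nBits=2048 as described in Section 2.2
fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=3, nBits=2048)
bits = list(fp)  # 0/1 vector usable as neural-network input

print(len(bits), "bits,", sum(bits), "set")
```

Because the fingerprint is a fixed-length bit vector, the same 2048-dimensional input layer serves every molecule regardless of its size, which is what makes the MLP pre-training on heterogeneous polymers in Data_1 possible.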

2.3. Training Strategy of Transfer Learning Model

To address the sample scarcity in Data_2, a two-stage transfer learning framework was devised (Figure 1c). Stage 1 involved pre-training an MLP model on Data_1, featuring a 2048-dimensional input layer followed by three fully connected layers (1024/512/256 neurons with ReLU activation) and 30% Dropout regularization. The model underwent 200 training epochs using the Adam optimizer with dynamic learning rate adjustment via ReduceLROnPlateau. Stage 2 implemented selective fine-tuning during transfer to Data_2: all parameters except those in the final five layers were frozen, so that gradient updates occurred only in those layers. This hierarchical adaptation mechanism preserved cross-domain generalizable features while significantly enhancing prediction performance on the low-resource target domain through localized parameter tuning. Evaluation metrics included Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Squared Error (MSE), and Coefficient of Determination (R²). The definitions and calculation of the evaluation metrics are described in Supporting Information I.
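A minimal PyTorch sketch of the two-stage setup, assuming the layer sizes stated above; the exact layer-counting convention, learning rate, and scheduler settings of the paper are not fully specified, so those details here are illustrative.

```python
import torch
import torch.nn as nn

# Stage 1 architecture as described in Section 2.3:
# 2048 -> 1024 -> 512 -> 256 -> 1 with ReLU and 30% Dropout
model = nn.Sequential(
    nn.Linear(2048, 1024), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(1024, 512), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(512, 256), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(256, 1),
)
# ... Stage 1 would train this model on Data_1 for 200 epochs ...

# Stage 2: freeze everything except the final five modules, then fine-tune on Data_2
for module in list(model)[:-5]:
    for p in module.parameters():
        p.requires_grad = False

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)  # lr is an assumption
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=10)

print(sum(p.numel() for p in trainable), "trainable parameters after freezing")
```

Passing only the still-trainable parameters to the optimizer makes the freezing explicit: the early layers keep the representations learned from Data_1, while only the top of the network adapts to the 78-sample BMI domain.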

2.4. Virtual Structure Generation

In order to elucidate the structure-property relationship between BMI molecules and their Tg, we categorized BMI molecules into aromatic (conjugated systems such as benzene rings) and aliphatic (linear or branched alkanes) classes based on the character of the R-groups, as shown in Figure 2a. We employed a multiscale functional group modification strategy by introducing thiol (-SH), amino and nitro (-NH₂/-NO₂), oxygen-containing (hydroxyl, carbonyl, ester), and halogen (F, Cl, Br, I) substituents. Additionally, five representative copolymer-modified architectures (ABPN, ABPA, DABPA, DABPAF, and AN), together with unmodified self-polymerized samples, were incorporated to simulate real-world modification processes[55,56,57]. The functional group type distribution is visualized in Figure 2b, comprehensively reflecting the structural diversity of BMI resin molecules. Predicting the Tg of these designed virtual structures with Model_1 yielded a virtual BMI database (Data_3, n = 1092).

2.5. Descriptor Calculation

Unlike Morgan fingerprints, molecular descriptors are better suited to analyzing and quantifying global physicochemical properties, so we use molecular descriptors[60] rather than Morgan fingerprints as feature inputs in the interpretable modeling process. We used RDKit to calculate 67 molecular descriptors (Figure 2c), including the maximum partial charge carried by any atom in the molecule (MaxPartialCharge), Balaban's topological index (BalabanJ), the sum of atomic molar refractivities (MolMR), and others. The specific descriptors and their meanings are listed in Supporting Information II. A feature selection step based on the Pearson correlation coefficient identified 23 core descriptors for quantitative structure-property relationship modeling. The results of screening the feature correlations using the Pearson correlation coefficient method are shown in Supporting Information III.
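A correlation-based screen of this kind can be sketched as below. The paper's exact selection criteria are in its Supporting Information, so the two thresholds here (relevance to the target, redundancy between descriptors) are illustrative assumptions, and the data are synthetic.

```python
import numpy as np

def select_by_pearson(X, y, names, threshold=0.1, redundancy=0.9):
    """Keep descriptors correlated with the target, dropping near-duplicates.
    Thresholds are illustrative; the paper's exact criteria are in its SI."""
    r_target = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    order = np.argsort(-r_target)  # most target-correlated first
    kept = []
    for j in order:
        if r_target[j] < threshold:
            break  # remaining descriptors are even less correlated with Tg
        # drop descriptor j if it is nearly collinear with one already kept
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < redundancy for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X[:, 1] = X[:, 0] * 0.99 + rng.normal(scale=0.01, size=200)  # redundant copy of column 0
y = 2 * X[:, 0] + X[:, 2] + rng.normal(scale=0.1, size=200)

print(select_by_pearson(X, y, ["d0", "d1", "d2", "d3", "d4"]))
```

On this toy data the screen keeps one of the collinear pair (d0/d1) plus the independently informative d2, which is the same kind of pruning that reduces the 67 descriptors to 23 in the paper.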

2.6. Interpretable Model

We evaluated six machine learning algorithms. Random Forest (RF)[58] operates as an ensemble method combining multiple decision trees via bagging and feature randomness, thereby reducing variance and improving generalization. Ridge Regression[59] extends ordinary least squares by introducing L2 regularization to penalize large coefficients, effectively mitigating multicollinearity and overfitting. K-Nearest Neighbors (KNN)[60] follows a non-parametric, instance-based learning paradigm in which predictions are derived from the weighted average of the target variable over the nearest training examples in feature space. Bayesian Regression[61] incorporates a probabilistic framework by assuming a prior distribution over model parameters, with predictions formulated as posterior distributions via Bayes' theorem. Support Vector Regression (SVR)[62] extends the principles of Support Vector Machines to regression tasks by mapping input features into a high-dimensional kernel space through nonlinear transformations. Extreme Gradient Boosting (XGBoost)[28] implements a gradient boosting framework that sequentially trains decision trees to correct residual errors, employing regularization terms and shrinkage to enhance robustness against overfitting while maintaining computational efficiency through parallel tree construction. We used 5-fold cross-validation in model training. Specific parameter settings for the model training process are given in Supporting Information IV.
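The benchmarking loop can be sketched with scikit-learn as follows. Synthetic regression data replaces the descriptor matrix, the hyperparameters are illustrative (the paper's are in its Supporting Information), and `GradientBoostingRegressor` stands in for XGBoost so the sketch needs only scikit-learn.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge, BayesianRidge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Synthetic stand-in for the 23-descriptor feature matrix and Tg targets
X, y = make_regression(n_samples=200, n_features=23, noise=10.0, random_state=0)

models = {
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "Ridge": Ridge(alpha=1.0),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "Bayesian": BayesianRidge(),
    "SVR": SVR(kernel="rbf", C=10.0),
    "GBoost": GradientBoostingRegressor(random_state=0),  # stand-in for XGBoost
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    print(f"{name:8s} mean CV R^2 = {scores.mean():.3f}")
```

Using the same `KFold` object for every model keeps the comparison fair: each algorithm sees identical train/validation splits, as in the 5-fold protocol described above.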

2.7. SHAP Analysis

SHapley Additive exPlanations (SHAP)[63], a framework rooted in cooperative game theory, was employed to decompose model predictions into feature contributions by quantifying Shapley values, i.e., the marginal impact of each molecular descriptor on the Tg prediction. By aggregating local explanations across the dataset, SHAP generated globally interpretable insights through summary plots and force diagrams, enabling visualization of both linear and non-linear descriptor relationships. The final model leverages SHAP-derived descriptor importance rankings to construct a transparent structure-property map in which each molecular descriptor's contribution to thermal transition behavior is represented.
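The quantity SHAP approximates can be computed exactly for a tiny model, which makes the definition concrete. This is a from-scratch sketch, not the `shap` library: features outside a coalition are replaced by a baseline value, a simple stand-in for SHAP's background-data averaging.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline, n_features):
    """Exact Shapley values for one prediction: average the marginal
    contribution of each feature over all coalitions of the others."""
    phi = [0.0] * n_features
    all_idx = range(n_features)
    for i in all_idx:
        others = [j for j in all_idx if j != i]
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                # inputs with only coalition S (plus optionally i) at real values
                masked = [x[j] if j in S else baseline[j] for j in all_idx]
                with_i = masked.copy()
                with_i[i] = x[i]
                weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
                phi[i] += weight * (predict(with_i) - predict(masked))
    return phi

# Toy "model": a known linear map, so the attributions can be checked by hand
predict = lambda v: 3 * v[0] + 2 * v[1] - v[2]
phi = shapley_values(predict, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0], n_features=3)
print(phi)  # for a linear model, phi_i = w_i * (x_i - baseline_i)
```

The values sum to the difference between the prediction and the baseline prediction (the "efficiency" property), which is what lets SHAP summary plots additively decompose each predicted Tg into descriptor contributions.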

3. Results and discussion

3.1. Performance Comparison of Different Neural Network Frameworks as Pre-trained Models in Transfer Learning

Given that the success of transfer learning hinges critically on the pre-trained model possessing robust generalization capabilities, we systematically compared the performance of different neural network architectures within the transfer learning framework. We selected three architectures for pre-trained model comparison: the Multilayer Perceptron (MLP), the Convolutional Neural Network (CNN), and the Deep Neural Network (DNN). The MLP, as a fundamental feedforward neural network, excels at capturing nonlinear relationships through its fully connected layers, making it particularly suitable for processing high-dimensional sparse molecular fingerprint data such as Morgan fingerprints. In this study, the MLP model, employing a three-layer hidden structure, demonstrated strong predictive capability, achieving an R² value of 0.59 on the Data_1 test set. In contrast, the CNN, renowned for its convolutional and pooling layers, excels in tasks such as image recognition; however, within the transfer learning framework of this study, the CNN architecture did not surpass the MLP in predictive accuracy or generalization performance. This discrepancy likely stems from the CNN's emphasis on local features and spatial hierarchies, which are not fully leveraged on high-dimensional sparse molecular data. The DNN enhances model representational power by increasing network depth; the DNN adopted in this study comprised five hidden layers, enabling it to learn more complex feature representations. Despite its theoretically strong fitting capability, the DNN's performance on this task still slightly lagged behind the MLP, which could be attributed to overfitting during training as well as the data scarcity and quality heterogeneity in this study.
To visually illustrate the performance disparities among these models, Figure 3 presents the parity plots for pre-training on the Data_1 dataset across all three models. In these plots, the horizontal axis represents the true values, while the vertical axis denotes the model predictions. The black dashed line signifies the x = y diagonal; points closer to this line indicate predictions closer to the true values. The blue line represents the training regression line, and the yellow line denotes the test regression line. Blue dots correspond to training dataset points, and yellow dots to test dataset points. Figure 3a depicts the parity plot for the MLP model trained on Data_1, with training and test R² values of 0.76 and 0.59, respectively. Figures 3b and 3c show the parity plots for the CNN and DNN models, with training R² values of 0.67 and 0.68, and test R² values of 0.51 and 0.57, respectively.
The effectiveness of transfer learning hinges on the pre-trained model's performance, particularly its generalization ability, as this directly impacts the subsequent fine-tuning process on Data_2. The superior performance of the MLP model in pre-training, as evidenced by its higher test R² value and closer alignment of test data points to the diagonal line in Figure 3a, underscores its advantage in handling high-dimensional sparse molecular data. Based on this finding, we selected the MLP model as the pre-trained model, and subsequent transfer learning tasks and methodological explorations were all conducted based on the MLP model.

3.2. Necessity and Technical Advantages of Transfer Learning

Confronting the dual challenges of data scarcity (Data_2, n = 78) and quality heterogeneity in predicting the Tg of BMI, our proposed transfer learning framework (Figure 1c) demonstrates significant technical advantages. As our objective focuses on predicting the Tg of BMI resins, we allocated 10% of Data_2 as the test set (designated Test_2) for comparative analysis across modeling strategies. Table 1 systematically compares the performance of three MLP-based modeling strategies evaluated on Test_2, building upon Section 3.1's conclusion that the MLP constitutes the optimal neural architecture for this task. All three strategies utilize the MLP framework but differ in training paradigm: (1) standalone training on Data_1 (the general polymer database), (2) standalone training on Data_2 (the n = 78 BMI-specific dataset), and (3) the two-stage transfer learning paradigm combining Data_1 pre-training with Data_2 fine-tuning. Specifically, the transfer strategy freezes all layers except the final five during fine-tuning, a knowledge transfer mechanism validated as optimal in Section 3.3.
However, despite the initial expectation that Data_1—a large-scale general polymer dataset—would provide robust predictive capability (as evidenced by its R² = 0.59 on internal testing in Section 3.1), the model's performance on BMI-specific Test_2 plummeted to R² = -6.19. This dramatic degradation stems from fundamental domain differences: BMI resins exhibit unique thermal behavior mechanisms distinct from conventional polymers, rendering generic structural patterns in Data_1 poorly transferable. While standalone training on Data_2 (n = 78) might seem a logical alternative, the resulting R² = -4.10 and RMSE = 82.15°C reflect inherent limitations: (1) extreme data scarcity prevents learning of meaningful representations, and (2) manual aggregation from heterogeneous literature sources introduces uncontrolled experimental noise, forcing the model to memorize spurious correlations rather than genuine structure-property relationships.
Faced with these challenges, namely Data_1's domain mismatch and Data_2's limited size and quality, the transfer learning framework strategically leverages Data_1's generalizable physical information as foundational knowledge while adapting to BMI-specific features through fine-tuning. As shown in Table 1, the transfer learning approach achieved significant improvements in all evaluation metrics. Here, MSE measures the squared differences between predictions and true values; RMSE expresses the error magnitude on the scale of the target variable; and MAE reflects the average absolute prediction deviation. All three metrics follow the "lower is better" principle. R² evaluates the model's explanatory power for data variance, with values closer to 1 indicating better performance and negative values signifying performance worse than a baseline mean prediction. On Test_2, transfer learning improves R² from -6.19 (training on Data_1 alone) to 0.44, directly demonstrating the framework's ability to correct for domain shift, while the RMSE decreases by 72.04% (from 97.53°C to 27.27°C), indicating a substantial improvement in real-world applicability. Standalone training on Data_2 yields a negative R² (-4.10) and a considerable RMSE (82.15°C), behavior that stems from the model learning spurious correlations rather than true structure-property relationships in the small Data_2 dataset. The transfer learning framework leverages the structural knowledge in Data_1 to effectively mitigate this issue, as evidenced by the positive R² (0.44) and much lower RMSE (27.27°C) on Test_2. The consistent gains across RMSE, MSE, MAE, and R² collectively validate the framework's capacity to mitigate data scarcity limitations in predicting the Tg of BMI resins.
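The four metrics discussed above have short closed forms; a minimal implementation makes the "negative R²" rows in Table 1 easy to interpret. The toy numbers below are illustrative, not taken from the paper's data.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, MSE, MAE, and R^2 as used in Table 1."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mse = np.mean((y_true - y_pred) ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(y_true - y_pred))
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot  # negative when worse than predicting the mean
    return rmse, mse, mae, r2

# A constant prediction far from the targets gives a strongly negative R^2,
# mirroring the behavior of the Data_1-only and Data_2-only rows in Table 1
rmse, mse, mae, r2 = regression_metrics([250.0, 300.0, 350.0], [150.0, 150.0, 150.0])
print(rmse, mae, r2)
```

Because R² compares residuals against the variance around the mean, any model whose errors exceed that variance scores below zero, which is exactly why the mismatched Data_1-only model can reach R² = -6.19 despite decent performance on its own domain.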

3.3. Optimizing Transfer Learning Performance Through Layer-Wise Fine-Tuning in MLP Architectures

We explored the impact of varying fine-tuning layer counts on MLP-based transfer learning performance, with a particular focus on identifying the optimal balance between preserving pre-trained knowledge and adapting to target-domain specifics. As shown in Figure 4, each subplot evaluates a critical performance metric on the vertical axis, RMSE (4a), MSE (4b), MAE (4c), and R² (4d), while the horizontal axis spans the number of fine-tuned layers (1 to 6). Given that the MLP architecture comprises 6 layers in total (excluding the input and output layers), our experiments systematically unfreeze 1 to 6 consecutive layers from the output end backward, enabling a systematic investigation of adaptation effects.
The experimental curves reveal a consistent performance evolution pattern across all metrics. Initially, as layers are progressively unfrozen (moving from 1 to 5 layers), model performance improves markedly: RMSE drops to 27.27°C, MSE decreases to 743.61, MAE reduces to 21.92°C, and R² climbs to 0.44. This improvement phase peaks at 5 fine-tuned layers, indicating optimal adaptation: the model sufficiently adjusts higher-level representations for the target domain while retaining the pre-trained feature extraction capabilities of the frozen initial layers. Beyond this optimal point, further unfreezing (6 layers) triggers performance deterioration across all metrics. This degradation suggests that excessive parameter adjustment may introduce domain-specific noise or disrupt previously learned robust features, negating the benefits of transfer learning.
These observations validate our strategy of adapting the last five layers. This configuration strikes a critical balance: the frozen layers ensure stability in handling high-dimensional molecular data, while the fine-tuned layers provide the necessary flexibility for domain-specific calibration. The resulting model, trained with this five-layer fine-tuning approach, represents the optimal intersection of transfer efficiency and adaptive capacity, achieving the highest predictive accuracy without compromising generalization capability.

3.4. Feature Interpretability Analysis

Building on Section 3.3, the layer-wise fine-tuning strategy culminated in the development of Model_1, an optimized MLP architecture incorporating transfer learning that directly enables Tg predictions for BMI resins. To comprehensively elucidate the intrinsic physicochemical relationships governing the Tg of BMI resins, we conducted an interpretable machine learning analysis comparing the experimental dataset (Data_2) and the computationally augmented dataset (Data_3). While Morgan fingerprints excel at capturing structural patterns for predictive modeling, their inherent black-box nature limits physicochemical interpretability. Molecular descriptors, by contrast, encode quantifiable physicochemical properties, enabling direct correlation analysis between specific structural attributes and Tg. This rationale motivated our selection of descriptors for interpretable model development.
Initially, we attempted to derive interpretable insights directly from Data_2, but the insufficient data quantity and methodological variability across experimental sources precluded reliable descriptor analysis. To overcome this limitation, we designed 1092 novel BMI structures and employed Model_1 to predict their Tg values, thereby establishing the Data_3 dataset. Table 2 presents the quantitative performance metrics of six regression algorithms evaluated on both datasets. Notably, all models exhibited significantly improved predictive capabilities when trained on Data_3: for instance, the XGBoost model achieved a test set R² of 0.63 with RMSE = 17.06°C on Data_3, compared to R² = -1.97 (RMSE = 48.35°C) on Data_2. This dramatic performance disparity stems from Data_2's limited sample size (n = 78) and inherent experimental heterogeneity across literature sources, which introduced confounding noise that compromised model generalization.
According to the results in Table 2, and given the superior performance of XGBoost on Data_3 (R² = 0.63, RMSE = 17.06°C), this model was selected for SHAP-based descriptor analysis. Figure 5 shows the ranked importance of the descriptors and their positive or negative correlations with Tg. The SHAP summary plots (Figure 5a, c) visualize feature contributions via color gradient encoding: red/blue tones indicate high/low feature values, with the horizontal position of each point reflecting the magnitude and direction of its predictive impact. The feature importance rankings (Figure 5b, d) further quantify the relative contribution of each descriptor; the horizontal axis represents the relative feature importance, and the vertical axis lists the descriptors. For Data_3 (Figure 5a, b), the most important descriptors include MaxPartialCharge, the maximum partial charge carried by any atom in the molecule. That this descriptor ranks among the most important indicates that inhomogeneity of the charge distribution affects Tg. MinPartialCharge, the smallest partial charge carried by any atom in the molecule, also ranks highly and acts in concert with MaxPartialCharge: a larger difference between the two represents a stronger localized concentration of charge within the molecule, which may lead to stronger electrostatic interactions between molecules. MaxPartialCharge and MinPartialCharge thus have an important effect on the Tg of BMI, although neither shows an obvious positive or negative correlation pattern. Another highly ranked descriptor is the number of rotatable bonds (NumRotatableBonds), the number of single bonds in the molecule that can rotate freely. Molecules with fewer rotatable bonds have restricted chain segment mobility, more rigid molecular conformations, and higher Tg.
Meanwhile, the molecular surface area contribution descriptor SMR_VSA7 and the surface area contribution to the octanol-water partition coefficient SlogP_VSA3 also ranked highly, indicating that molecular surface area properties likewise affect Tg, probably because they influence molecular packing and intermolecular interactions.
To verify the robustness of the analysis on the virtual dataset Data_3, we also trained the XGBoost model on Data_2 for feature importance analysis and obtained results (Figure 5c, d) similar to those for Data_3. Notably, MaxPartialCharge and MinPartialCharge, representing charge inhomogeneity, and SMR_VSA7, representing molecular surface properties, remain among the top-ranked descriptors, but the model fails to capture influences such as NumRotatableBonds, which represents molecular topological complexity. This is directly related to the small size of the dataset.
Two experimentally characterized BMI derivatives (BMI-I and BMI-II) sourced from the literature[64,65,66] were subjected to computational analysis (detailed in Supporting Information V). Model_1 predicted Tg values of 263.75°C and 323.03°C for BMI-I and BMI-II, respectively, in good agreement with the literature-reported experimental values of 260°C and 300°C (deviations of 3.75°C and at most 7.67% relative error). Calculation of the molecular descriptors revealed that, while most topological parameters were consistent between the two systems, the NumRotatableBonds metric differed markedly (22 versus 16 bonds), inversely correlating with the measured Tg. This comparison between experiment and model not only validates Model_1's predictive accuracy but also reinforces the structural complexity-Tg relationship posited by the interpretable framework: reduced molecular flexibility (lower NumRotatableBonds) corresponds directly to elevated Tg, substantiating molecular topology as a critical determinant of Tg.

4. Conclusion

The hybrid framework proposed in this study integrates transfer learning and interpretable machine learning to achieve efficient prediction of BMI resin Tg. Through SHAP analysis, this study elucidates the core mechanisms by which molecular descriptors influence Tg. Specifically, topological complexity (represented by NumRotatableBonds), charge distribution properties (represented by MaxPartialCharge and MinAbsPartialCharge), and molecular surface properties (SMR_VSA7 and SlogP_VSA3) are identified as the dominant factors influencing Tg. This success can be attributed to the effective transfer of chemical space knowledge through transfer learning and the explicit resolution of higher-order interactions through SHAP analysis. Additionally, ten candidate structures with potentially higher Tg are proposed based on the model's predictions (Supporting Information VI).
However, the study acknowledges certain limitations, particularly the exclusion of external variables such as processing parameters, which may affect the model's generalizability under varying processing conditions[64,65,66]. Future research could extend this work by incorporating processing parameters alongside molecular characteristics to comprehensively reveal the multi-scale regulation mechanisms of Tg, further advancing the design of high-temperature-resistant polymers with both predictive accuracy and mechanistic transparency.

Supplementary Materials

The following supporting information can be downloaded at the website of this paper posted on Preprints.org. The detailed datasets of these models can be found at https://doi.org/10.5281/zenodo.15481912.

Author Contributions

Conceptualization, Z.W. and J.Z.; methodology, Y.L.; software, P.K.; validation, Z.W., Z.L. and P.K.; formal analysis, Z.W.; investigation, J.Z.; resources, Y.L.; data curation, X.X.; writing—original draft preparation, Z.W.; writing—review and editing, P.K.; visualization, Z.W.; supervision, P.K.; project administration, L.Z.; funding acquisition, Y.L.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data are available from the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhao, Z.; Liu, F.; Yang, X.; Xie, Z.; Liu, L.; Chen, W. High-velocity impact and post-impact fatigue response of Bismaleimide resin composite laminates. European Journal of Mechanics - A/Solids 2025, 112, 105655.
  2. Li, X.; Huang, J.; Chen, Y.; Zhu, F.; Wang, Y.; Wei, W.; Feng, Y. Polymer-Based Electronic Packaging Molding Compounds, Specifically Thermal Performance Improvement: An Overview. ACS Applied Polymer Materials 2024, 6, 14948–14969.
  3. Xu, J.; Chen, P.; Ma, S.; Zhu, G.; Wu, L. Synthesis and thermal properties of novel bismaleimides containing cardo and oxazine structures and the thermal transition behaviors of their polymer structures. Thermochimica Acta 2023, 719, 179401.
  4. Feng, Y.; Sun, Q.; Guo, J.; Wang, C. High-Performance Bismaleimide Resin with an Ultralow Coefficient of Thermal Expansion and High Thermostability. Macromolecules 2024, 57, 1808–1818.
  5. Melissaris, A. P.; Mikroyannidis, J. A. Bismaleimides chain-extended by imidized benzophenone tetracarboxylic dianhydride and their polymerization to high temperature matrix resins. Journal of Polymer Science Part A: Polymer Chemistry 1988, 26, 1165–1178.
  6. Grenier-Loustalot, M.-F.; Da Cunha, L. Sterically hindered bismaleimide monomer: Molten state reactivity and kinetics of polymerization. European Polymer Journal 1998, 34, 95–102.
  7. Graser, J.; Kauwe, S. K.; Sparks, T. D. Machine Learning and Energy Minimization Approaches for Crystal Structure Predictions: A Review and New Horizons. Chemistry of Materials 2018, 30, 3601–3612.
  8. Radue, M. S.; Varshney, V.; Baur, J. W.; Roy, A. K.; Odegard, G. M. Molecular Modeling of Cross-Linked Polymers with Complex Cure Pathways: A Case Study of Bismaleimide Resins. Macromolecules 2018, 51, 1830–1840.
  9. Han, J.; Gee, R. H.; Boyd, R. H. Glass Transition Temperatures of Polymers from Molecular Dynamics Simulations. Macromolecules 1994, 27, 7781–7784.
  10. Buchholz, J.; Paul, W.; Varnik, F.; Binder, K. Cooling rate dependence of the glass transition temperature of polymer melts: Molecular dynamics study. The Journal of Chemical Physics 2002, 117, 7364–7372.
  11. Zhang, T.; Wang, S.; Chai, Y.; Yu, J.; Zhu, W.; Li, L.; Li, B. Prediction and Interpretability Study of the Glass Transition Temperature of Polyimide Based on Machine Learning with Quantitative Structure-Property Relationship (Tg-QSPR). The Journal of Physical Chemistry B 2024.
  12. Babbar, A.; Ragunathan, S.; Mitra, D.; Dutta, A.; Patra, T. K. Explainability and extrapolation of machine learning models for predicting the glass transition temperature of polymers. Journal of Polymer Science 2024, 62, 1175–1186.
  13. Kang, S.; Cho, K. Conditional Molecular Design with Deep Generative Models. Journal of Chemical Information and Modeling 2019, 59, 43–52.
  14. Preuer, K.; Renz, P.; Unterthiner, T.; Hochreiter, S.; Klambauer, G. Fréchet ChemNet Distance: A Metric for Generative Models for Molecules in Drug Discovery. Journal of Chemical Information and Modeling 2018, 58, 1736–1741.
  15. Arús-Pous, J.; Blaschke, T.; Ulander, S.; Reymond, J. L.; Chen, H.; Engkvist, O. Exploring the GDB-13 chemical space using deep generative models. Journal of Cheminformatics 2019, 11, 20.
  16. Tao, L.; Varshney, V.; Li, Y. Benchmarking Machine Learning Models for Polymer Informatics: An Example of Glass Transition Temperature. Journal of Chemical Information and Modeling 2021, 61, 5395–5413.
  17. He, X.; Yu, M.; Han, J.-P.; Jiang, J.; Jia, Q.; Wang, Q.; Luo, Z.-H.; Yan, F.; Zhou, Y.-N. Leveraging data-driven strategy for accelerating the discovery of polyesters with targeted glass transition temperatures. AIChE Journal 2024, 70, e18409.
  18. Ning, L. Artificial neural network prediction of glass transition temperature of fluorine-containing polybenzoxazoles. Journal of Materials Science 2009, 44, 3156–3164.
  19. Oviedo, F.; Ferres, J. L.; Buonassisi, T.; Butler, K. T. Interpretable and Explainable Machine Learning for Materials Science and Chemistry. Accounts of Materials Research 2022, 3, 597–607.
  20. Nguyen, T.; Bavarian, M. A Machine Learning Framework for Predicting the Glass Transition Temperature of Homopolymers. Industrial & Engineering Chemistry Research 2022, 61, 12690–12698.
  21. Pilania, G.; Iverson, C. N.; Lookman, T.; Marrone, B. L. Machine-Learning-Based Predictive Modeling of Glass Transition Temperatures: A Case of Polyhydroxyalkanoate Homopolymers and Copolymers. Journal of Chemical Information and Modeling 2019, 59, 5013–5025.
  22. Alcobaça, E.; Mastelini, S. M.; Botari, T.; Pimentel, B. A.; Cassar, D. R.; de Carvalho, A. C. P. d. L. F.; Zanotto, E. D. Explainable Machine Learning Algorithms For Predicting Glass Transition Temperatures. Acta Materialia 2020, 188, 92–100.
  23. King-Smith, E. Transfer learning for a foundational chemistry model. Chemical Science 2024, 15, 5143–5151.
  24. Yamada, H.; Liu, C.; Wu, S.; Koyama, Y.; Ju, S.; Shiomi, J.; Morikawa, J.; Yoshida, R. Predicting Materials Properties with Little Data Using Shotgun Transfer Learning. ACS Central Science 2019, 5, 1717–1730.
  25. Zhang, Z.; Liu, Q.; Wu, D. Predicting stress–strain curves using transfer learning: Knowledge transfer across polymer composites. Materials & Design 2022, 218, 110700.
  26. Kazemi-Khasragh, E.; González, C.; Haranczyk, M. Toward diverse polymer property prediction using transfer learning. Computational Materials Science 2024, 244, 113206.
  27. Chen, X.; Sztandera, L.; Cartwright, H. M. A neural network approach to prediction of glass transition temperature of polymers. Int. J. Intell. Syst. 2008, 23, 22–32.
  28. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 2016; pp 785–794.
  29. Otsuka, S.; Kuwajima, I.; Hosoya, J.; Xu, Y.; Yamazaki, M. PoLyInfo: Polymer Database for Polymeric Materials Design. In 2011 International Conference on Emerging Intelligent Data and Web Technologies, 7–9 Sept. 2011; pp 22–29.
  30. Zhang, H.; Li, H.; Xin, H.; Zhang, J. Property Prediction and Structural Feature Extraction of Polyimide Materials Based on Machine Learning. Journal of Chemical Information and Modeling 2023, 63, 5473–5483.
  31. Zhu, J.; Xia, Y.; Liu, L.; Yan, S.; Zeng, Y.; Zhang, R.; Zhang, X.; Sheng, Y. Comparative study of the kinetic behaviors and properties of aromatic and aliphatic bismaleimides. Thermochimica Acta 2024, 737, 179768.
  32. Lyu, J.; Tang, J.; Ji, B.; Wu, N.; Liao, W.; Yin, C.; Bai, S.; Xing, S. Fluorinated polyetherimide as the modifier for synergistically enhancing the mechanical, thermal and dielectric properties of bismaleimide resin and its composites. Composites Communications 2024, 51, 102035.
  33. Chen, F.; Zhang, H.; Li, S.; Chen, Y.; Liang, M.; Heng, Z.; Zou, H. Design of high-performance resin by tuning cross-linked network topology to improve CF/bismaleimide composite compressive properties. Composites Science and Technology 2023, 242, 110170.
  34. Hsiao, C.-C.; Lee, J.-J.; Liu, Y.-L. Meldrum's acid-functionalized bismaleimide, polyaspartimide and their thermally crosslinked resins: Synthesis and properties. Reactive and Functional Polymers 2024, 202, 105988.
  35. Peng, H.; Wang, Y.; Zhan, Y.; Lei, F.; Wang, P.; Li, K.; Li, Y.; Yang, X. Hierarchical curing mechanism in epoxy/bismaleimide composites: Enhancing mechanical properties without compromising thermal stabilities. European Polymer Journal 2025, 222, 113604.
  36. Liu, B.; Yuan, Z.; Liu, C.; Sun, M.; Zhang, X.; Derradji, M.; Zhang, B.; Li, J.; Zhao, M.; Song, C.; et al. Synthesis, curing kinetics and processability of a low melting point aliphatic silicon-containing bismaleimide. Materials Today Communications 2024, 41, 110845.
  37. Zhang, Y.; Wang, L.; Yuan, Q.; Zheng, Q.; Wan, L.; Huang, F. Bismaleimide resin modified by a propargyl substituted aromatic amine with ultrahigh glass transition temperature, thermomechanical stability and intrinsic flame retardancy. Reactive and Functional Polymers 2023, 193, 105740.
  38. Zhou, Y.; Liu, W.; Ye, W.; Chu, F.; Hu, W.; Song, L.; Hu, Y. Design of reactive linear polyphosphazene to improve the dielectric properties and fire safety of bismaleimide composites. Chemical Engineering Journal 2024, 482, 148867.
  39. Chen, S.; Yu, L.; Zhang, S.; Sun, X.; Qu, B.; Wang, R.; Zheng, Y.; Liu, X.; Li, W.; Gao, J.; et al. Synergistic strengthening and toughening of 3D printing photosensitive resin by bismaleimide and acrylic liquid-crystal resin. Journal of Science: Advanced Materials and Devices 2023, 8, 100565.
  40. Ning, Y.; Li, D.-s.; Jiang, L. Thermally stable and deformation-reversible eugenol-derived bismaleimide resin: Synthesis and structure-property relationships. Reactive and Functional Polymers 2022, 173, 105236.
  41. Sheng, X.; Yun, S.; Wang, S.; Gao, Y.; Zuo, X.; Miao, X.; Shi, X.; Qin, J.; Ma, Z.; Zhang, G. Highly heat-resistant and mechanically strong co-crosslinked polyimide/bismaleimide rigid foams with superior thermal insulation and flame resistance. Materials Today Physics 2023, 36, 101154.
  42. Ge, M.; Liang, G.; Gu, A. A facile strategy and mechanism to achieve biobased bismaleimide resins with high thermal-resistance and strength through copolymerizing with unique propargyl ether-functionalized allyl compound. Reactive and Functional Polymers 2023, 186, 105570.
  43. Wu, T.; Jiang, P.; Zhang, X.; Guo, Y.; Ji, Z.; Jia, X.; Wang, X.; Zhou, F.; Liu, W. Additively manufacturing high-performance bismaleimide architectures with ultraviolet-assisted direct ink writing. Materials & Design 2019, 180, 107947.
  44. Xiong, X.; Ma, X.; Chen, P.; Zhou, L.; Ren, R.; Liu, S. New chain-extended bismaleimides with aryl-ether-imide and phthalide cardo skeleton (I): Synthesis, characterization and properties. Reactive and Functional Polymers 2018, 129, 29–37.
  45. Li, X.; Zhou, Y.; Bao, Y.; Wei, W.; Fei, X.; Li, X.; Liu, X. Bismaleimide/Phenolic/Epoxy Ternary Resin System for Molding Compounds in High-Temperature Electronic Packaging Applications. Industrial & Engineering Chemistry Research 2022, 61, 4191–4201.
  46. Ning, L.; Yuan, L.; Liang, G.; Gu, A. Thermally resistant and strong remoldable triple-shape memory thermosets based on bismaleimide with transesterification. Journal of Materials Science 2021, 56, 3623–3637.
  47. Pu, Z.; Wu, F.; Wang, X.; Zhong, J.; Liu, X.; Pan, Y.; Wang, Y.; Jiang, D.; Ning, Z. Strategy to achieve low-dielectric-constant for benzoxazine-phthalonitriles: introduction of 2,2'-bis[4-(4-Maleimidephen-oxy)phenyl)]propane by in-situ polymerization. Journal of Polymer Research 2024, 31, 140.
  48. Xing, Y.; Zhang, Y.; He, X. Design of acetylene-modified bio-based tri-functional benzoxazine and its copolymerization with bismaleimide for performance enhancement. Polymer Bulletin 2023, 80, 12065–12077.
  49. Yang, R.; Zhang, K. Strategies for improving the performance of diallyl bisphenol A-based benzoxazine resin: Chemical modification via acetylene and physical blending with bismaleimide. Reactive and Functional Polymers 2021, 165, 104958.
  50. Yu, P.; Zhang, Y.-l.; Yang, X.; Pan, L.-j.; Dai, Z.-y.; Xue, M.-z.; Liu, Y.-g.; Wang, W. Synthesis and characterization of asymmetric bismaleimide oligomers with improved processability and thermal/mechanical properties. Polymer Engineering & Science 2019, 59, 2265–2272.
  51. Liu, C.; Qiao, Y.; Li, N.; Hu, F.; Chen, Y.; Du, G.; Wang, J.; Jian, X. Toughened of bismaleimide resin with improved thermal properties using amino-terminated Poly(phthalazinone ether nitrile sulfone)s. Polymer 2020, 206, 122887.
  52. Xue, K.; Zhang, P.; Song, Z.; Guo, F.; Hua, Z.; You, T.; Li, S.; Cui, C.; Liu, L. Preparation of eugenol-based flame retardant epoxy resin with an ultrahigh glass transition temperature via a dual-curing mechanism. Polymer Degradation and Stability 2025, 231, 111092.
  53. Ma, R.; Liu, Z.; Zhang, Q.; Liu, Z.; Luo, T. Evaluating Polymer Representations via Quantifying Structure–Property Relationships. Journal of Chemical Information and Modeling 2019, 59, 3110–3119.
  54. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences 1988, 28, 31–36.
  55. Lyu, J.; Ji, B.; Wu, N.; Liao, W.; Yin, C.; Bai, S.; Xing, S. The effect of substituent group in allyl benzoxazine on the thermal, mechanical and dielectric properties of modified bismaleimide. Reactive and Functional Polymers 2023, 191, 105673.
  56. Srinivasan, S.; Saravanamuthu, S. K. S.; Syed Mohammed, S. R.; Jeyaraj Pandian, D.; Chinnaswamy Thangavel, V. Low-temperature processable glass fiber reinforced aromatic diamine chain extended bismaleimide composites with improved mechanical properties. Polymer Composites 2022, 43, 6987–6997.
  57. Gao, H.; Ding, L.; Li, W.; Ma, G.; Bai, H.; Li, L. Hyper-Cross-Linked Organic Microporous Polymers Based on Alternating Copolymerization of Bismaleimide. ACS Macro Letters 2016, 5, 377–381.
  58. Breiman, L. Random Forests. Machine Learning 2001, 45, 5–32.
  59. Hoerl, A. E.; Kennard, R. W. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 1970, 12, 55–67.
  60. Taunk, K.; De, S.; Verma, S.; Swetapadma, A. A Brief Review of Nearest Neighbor Algorithm for Learning and Classification. In 2019 International Conference on Intelligent Computing and Control Systems (ICCS), 15–17 May 2019; pp 1255–1260.
  61. MacKay, D. J. C. Bayesian interpolation. Neural Comput. 1992, 4, 415–447.
  62. Cherkassky, V. The Nature of Statistical Learning Theory. IEEE Transactions on Neural Networks 1997, 8, 1564.
  63. Lundberg, S. M.; Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, California, USA, 2017.
  64. Iredale, R. J.; Ward, C.; Hamerton, I. Modern advances in bismaleimide resin technology: A 21st century perspective on the chemistry of addition polyimides. Progress in Polymer Science 2017, 69, 1–21.
  65. Ohtsuka, K.; Nakao, S.; Hatanaka, Y. Toughening of bismaleimide and benzoxazine alloy with allyl group by incorporation of polyrotaxane. Polymer 2025, 320, 127979.
  66. Wang, Y.; Yuan, L.; Liang, G.; Gu, A. Achieving ultrahigh glass transition temperature, halogen-free and phosphorus-free intrinsic flame retardancy for bismaleimide resin through building network with diallyloxydiphenyldisulfide. Polymer 2020, 203, 122769.
Figure 1. Data Characterization and Methodological Framework: (a) Comparative KDE profiles: Data_1 (green) shows multimodal distribution patterns, whereas Data_2 (red) exhibits elevated mean values attributed to rigid structural motifs; (b) PCA dimensionality reduction: Green/red regions denote Data_1/Data_2 distributions; arrows indicate transfer learning pathways; (c) Technical Roadmap: Molecular structure → RDKit fingerprint generation → Deep neural modeling → Transfer learning adaptation, forming a framework for cross-domain knowledge migration.
Figure 2. BMI molecular design (a) classification framework based on R-group features (aromatic/aliphatic) and systematic functional group modification strategies; (b) modifier types showing functional group compositions in five copolymer structures and self-polymerized samples; (c) process of calculating descriptors.
Figure 3. Pre-training performance comparison of neural network frameworks on Data_1: (a) MLP parity plot (training R² = 0.76, test R² = 0.59); (b) CNN parity plot (training R² = 0.67, test R² = 0.51); (c) DNN parity plot (training R² = 0.68, test R² = 0.57). Diagonal line indicates ideal prediction (x = y), with training/test data points and regression lines shown in blue/yellow.
Figure 4. Layer-wise fine-tuning analysis for MLP-based transfer learning: (a) RMSE, (b) MSE, (c) MAE, and (d) R² performance trends across 1–6 fine-tuned layers. The MLP architecture comprises 6 layers (excluding input/output layers), with experiments systematically unfreezing 1–6 consecutive layers from the output end. Horizontal axis indicates the number of fine-tuned layers; vertical axes show error metrics (°C units for RMSE/MAE) and coefficient of determination.
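The layer-wise unfreezing experiment in Figure 4 can be sketched as follows. This PyTorch snippet uses a hypothetical 6-hidden-layer MLP with assumed layer widths (including the 208-dimensional input), since the exact architecture is not reproduced here:

```python
import torch
import torch.nn as nn

# Hypothetical MLP with 6 hidden layers; all widths are assumptions.
dims = [208, 256, 256, 128, 128, 64, 64, 1]
layers = []
for i in range(len(dims) - 2):
    layers += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU()]
layers += [nn.Linear(dims[-2], dims[-1])]
model = nn.Sequential(*layers)

def unfreeze_last_k(model: nn.Sequential, k: int) -> None:
    """Freeze every parameter, then unfreeze the last k Linear layers
    (counted from the output end, as in the fine-tuning sweep)."""
    linears = [m for m in model if isinstance(m, nn.Linear)]
    for p in model.parameters():
        p.requires_grad = False
    for m in linears[-k:]:
        for p in m.parameters():
            p.requires_grad = True

unfreeze_last_k(model, 2)  # fine-tune only the last two layers
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```

Repeating this for k = 1…6 and retraining on the small BMI dataset reproduces the sweep whose RMSE/MSE/MAE/R² trends are plotted in Figure 4.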
Figure 5. Feature role analysis (a) SHAP summary plot for training interpretable models on Data_3, color shades indicate the degree of feature contribution, and red/blue represents the size of feature values; (b) feature importance ranking for training interpretable models on Data_3, horizontal axis is the relative value of feature importance, vertical axis is the feature; (c) SHAP summary plot for training interpretable models on Data_2, color shades indicate the degree of feature contribution, and red/blue represents the size of feature values; (d) feature importance ranking for training interpretable models on Data_2, horizontal axis is the relative value of feature importance, vertical axis is the feature.
Table 1. Performance Comparison of MLP-Based Strategies on Test_2.
| Metric | Data_1 Standalone | Data_2 Standalone | Transfer Learning (from Data_1 to Data_2) |
|---|---|---|---|
| RMSE (°C) | 97.53 | 82.15 | 27.27 |
| MSE (°C²) | 9512.74 | 6747.92 | 743.61 |
| MAE (°C) | 89.49 | 79.47 | 21.92 |
| R² | -6.19 | -4.10 | 0.44 |
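The metrics in Tables 1 and 2 follow their standard definitions. A short NumPy sketch (with toy numbers, not the paper's data) also shows why R² turns negative when a model performs worse than simply predicting the mean of the test set, as the standalone models do:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE/MSE/MAE/R² as reported in Tables 1 and 2 (temperatures in °C)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "RMSE": mse ** 0.5,
        "MSE": mse,
        "MAE": float(np.mean(np.abs(err))),
        "R2": 1.0 - ss_res / ss_tot,  # < 0 when worse than the mean predictor
    }

# Toy illustration: a constant prediction far from the data yields R² < 0.
m = regression_metrics([250.0, 300.0, 350.0], [200.0, 200.0, 200.0])
print(round(m["R2"], 2))  # -6.0
```

With ss_res = 35000 and ss_tot = 5000, R² = 1 − 7 = −6.0, mirroring the strongly negative R² of the standalone baselines in Table 1.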
Table 2. Performance of six interpretable machine learning models on the test datasets from Data_2 and Data_3.
| Model | Dataset | RMSE (°C) | MSE (°C²) | MAE (°C) | R² |
|---|---|---|---|---|---|
| RF | Data_2 | 43.18 | 1864.35 | 36.42 | -0.81 |
| RF | Data_3 | 17.32 | 299.97 | 12.27 | 0.62 |
| Ridge | Data_2 | 37.39 | 1398.15 | 30.72 | -0.36 |
| Ridge | Data_3 | 21.75 | 472.98 | 15.33 | 0.40 |
| KNN | Data_2 | 64.09 | 4107.56 | 48.96 | -2.98 |
| KNN | Data_3 | 20.45 | 418.29 | 13.65 | 0.47 |
| Bayesian | Data_2 | 37.39 | 1397.86 | 30.33 | -0.36 |
| Bayesian | Data_3 | 21.60 | 466.51 | 15.18 | 0.41 |
| SVR | Data_2 | 37.02 | 1370.61 | 30.39 | -0.33 |
| SVR | Data_3 | 19.07 | 363.58 | 13.38 | 0.54 |
| XGBoost | Data_2 | 48.35 | 2337.33 | 39.91 | -1.97 |
| XGBoost | Data_3 | 17.06 | 290.98 | 12.09 | 0.63 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.