Preprint
Review

This version is not peer-reviewed.

A Review of Machine Learning in Organic Solar Cells

A peer-reviewed article of this preprint also exists.

Submitted:

09 December 2024

Posted:

09 December 2024

Abstract

Organic solar cells (OSCs) are among the most promising candidates for the future commercialization of renewable energy sources, offering low-cost, flexible devices for a range of everyday applications. This goal can be reached more quickly by designing novel compounds and forecasting their efficiency and stability before extensive experimental investigation, thereby narrowing the number of prospective targets. Data-driven machine learning (ML) algorithms can predict the energy levels, absorption response, stability, and efficiency of OSC materials, which helps in the development of novel high-performance materials. Nonetheless, the data-driven molecular design of organic solar cell materials continues to pose significant challenges. The primary issue lies in the complexity and variability of organic materials, which necessitates extensive, high-quality datasets for training robust machine learning models. Additionally, integrating these models into a coherent and efficient workflow that can be adopted by the scientific community remains an obstacle. This review article examines the use of machine learning methods in organic solar cell research. The fundamentals of machine learning and the key procedures for applying these techniques to organic solar cells are explained. A brief introduction to different classes of machine learning algorithms, as well as related software and tools, is provided. By addressing the challenges and leveraging the power of machine learning, we aim to pave the way for the accelerated discovery and optimization of organic solar cell materials, ultimately contributing to their commercialization and widespread adoption.

1. Introduction

Organic semiconductors offer a promising route for the development of organic solar cells (OSCs). This is due to the unique properties of these materials, such as high synthetic flexibility, which permits remarkable control over the bandgap, energy levels, and carrier mobility of the active layer of OSC devices. The active layers are commonly made of electron donor and acceptor materials. Notably, recent advances in the synthesis of non-fullerene acceptors have led to significant improvements in power conversion efficiencies (PCEs), with some devices achieving over 19% (Zhou, Tang et al. 2019). This success is the result of progress in device architecture, the design of active layer materials, and processing methods (Mahmood and Wang 2021). Light weight, flexibility, and semitransparency are just a few of the distinct benefits that OSCs provide compared to their inorganic counterparts (Du, Heumueller et al. 2019, Wan, Li et al. 2020).
Despite these advancements, substantial improvement in PCE is still necessary for OSCs to compete with inorganic devices and reach the market. Current methodologies are hindered by the long timescales, tedious purification stages, and demanding synthesis procedures that plague trial-and-error experimental routines (Wan, Li et al. 2020). Additionally, predicting PCE from the material components of an organic photovoltaic (OPV) device is challenging due to several obstacles, such as the complex donor/acceptor (D/A) interface morphology, strong electron-phonon couplings, and strong electron-electron interactions. These features necessitate state-of-the-art theoretical approaches from quantum chemistry, statistical mechanics, and quantum dynamics for precise OPV simulations.
Theoretical understanding of the microscopic processes behind exciton dissociation and charge transport has progressed significantly in recent years, yet the expansive space of organic compounds makes discovering suitable materials labor-intensive and time-consuming (Wang, Feng et al. 2023). The inherent degrees of freedom in organic molecules create an almost infinite number of potential candidate materials, and the current selection process relies heavily on trial and error. This approach is inefficient because of the complexity of organic solar cell operation, where predicting a material's performance from its chemical structure is difficult. Light-induced processes such as exciton production, migration, and dissociation, along with charge transfer to the appropriate electrodes, further complicate the development of a predictive rule (Wadsworth, Moser et al. 2019).
Scharber's model was established to forecast the photovoltaic efficiency of organic solar cells. It uses a limited number of electronic characteristics of the active layer materials (donor and acceptor) for forecasting, and its scope is constrained by the underlying assumptions. Numerous challenges impede its extension, particularly the difficulty of incorporating additional descriptors such as structural, topological, and thermodynamic factors. As a result, the predictive performance of this model was found to be low (Padula, Simpson et al. 2019).
The field of materials science has seen encouraging outcomes from data-driven research, which has initiated a new paradigm. Data-driven studies give insight into the underlying principles that control the functionality of materials in particular contexts, and the paradigm is efficient and effective in leveraging pertinent information for material discovery (Gu, Noh et al. 2019). The systematic approach to this is machine learning, which derives insights from historical data to help evaluate candidate materials before they are studied in the laboratory. The identification of superior candidate materials for organic solar cells can be made faster and more cost-effective through multidimensional design that combines machine learning (ML), DFT calculations, and surveys of available experimental data, as can be seen in Figure 1.
Applying machine learning (ML) to the complex domain of organic solar cells (OSCs) has not yet yielded particularly impressive results, despite a growing body of literature on the topic. The performance of OSCs is influenced by numerous factors such as solvent additives, crystallinity, molecular orientation, and processing solvents. Morphological features are crucial for charge separation at the donor-acceptor interface. More research is required to make effective use of ML with photovoltaic materials.
Recent research highlights significant challenges in achieving stability for organic solar cells (OSCs), which is critical for their commercial viability. Stability issues arise primarily from degradation under environmental stressors such as oxygen, moisture, light, and heat, as well as from intrinsic factors such as material phase separation and morphological changes over time. Despite advancements, these factors limit device lifetimes and performance consistency. Three issues stand out. First, material degradation: organic materials, particularly non-fullerene acceptors (NFAs) and polymer donors, exhibit chemical instability when exposed to air or light, which alters the molecular structure and reduces power conversion efficiencies (PCEs) over time. Second, thermal and mechanical instability: the flexible nature of OSCs makes them susceptible to mechanical deformation and thermal stress, which can disrupt the active layer morphology and electrode interfaces. Third, interface and morphological challenges: the delicate balance at donor-acceptor interfaces often destabilizes under operational conditions, causing electron traps and recombination losses. Recent strategies to address these issues include molecular engineering of more robust materials, optimizing device architectures to enhance encapsulation, and developing predictive models for degradation pathways. These approaches aim to achieve stable, high-performing OSCs suitable for large-scale applications in building integration and wearable electronics (Wang, Luke et al. 2023, Ding, Yang et al. 2024).
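To make concrete how few inputs such a model uses, the following minimal sketch estimates the open-circuit voltage from two frontier orbital energies and converts it into a PCE. It assumes the commonly quoted form of the Scharber relation (an empirical 0.3 V loss and a fixed fill factor of 0.65), which is not spelled out in this review; the function names, the example energy levels, and the Jsc value are hypothetical and chosen only for illustration.

```python
def scharber_voc(homo_donor_ev, lumo_acceptor_ev, loss_v=0.3):
    """Estimate Voc (V) from donor HOMO and acceptor LUMO energies (eV).

    Assumed relation: Voc = |E_HOMO(donor)| - |E_LUMO(acceptor)| - loss,
    with an empirical loss of 0.3 V.
    """
    return abs(homo_donor_ev) - abs(lumo_acceptor_ev) - loss_v


def pce_percent(voc_v, jsc_ma_cm2, fill_factor=0.65, p_in_mw_cm2=100.0):
    """PCE (%) = Voc * Jsc * FF / P_in, with P_in = 100 mW/cm^2 assumed."""
    return 100.0 * voc_v * jsc_ma_cm2 * fill_factor / p_in_mw_cm2


# Hypothetical donor/acceptor pair and short-circuit current density.
voc = scharber_voc(homo_donor_ev=-5.4, lumo_acceptor_ev=-4.0)
pce = pce_percent(voc, jsc_ma_cm2=12.0)
print(f"Voc = {voc:.2f} V, PCE = {pce:.2f} %")
```

Only electronic energy levels enter the estimate; morphology, processing conditions, and structural descriptors have no place in it, which is the limitation discussed above.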
ML has emerged as a powerful tool to predict essential OSC parameters. By analyzing molecular descriptors and device architectures, ML models can forecast key properties such as HOMO and LUMO energy levels, UV/visible absorption maxima, and PCE. These models can significantly reduce the trial-and-error efforts traditionally associated with OSC material discovery. Machine learning (ML) in organic solar cell (OSC) research faces challenges due to the complex interplay of factors such as exciton dissociation, charge transport, and D/A interface morphology, which are not fully understood. Existing models often rely on limited or overly specific descriptors, while datasets remain small, biased, or incomplete, especially for non-fullerene acceptors (NFAs). Traditional models like the Scharber model are inadequate for modern OSCs, failing to incorporate essential parameters like morphology and excited-state properties. Additionally, the high computational cost of generating accurate descriptors limits scalability, and many ML models act as "black boxes," providing little interpretability to guide experimentalists. These issues underscore the need for richer datasets, improved descriptors balancing accuracy and computational efficiency, and generalized, explainable ML frameworks to enhance predictive power and foster rapid material discovery and device optimization.
This review explores the role of machine learning (ML) in advancing the field of organic solar cells (OSCs). Its primary aim is to demonstrate how ML can accelerate the discovery and optimization of OSC materials by predicting key properties such as power conversion efficiency (PCE), energy levels, and absorption characteristics. It also addresses the current challenges in applying ML to OSC research and highlights potential solutions to improve efficiency, data integration, and predictive accuracy.

2. The Use of Machine Learning in Organic Solar Cells

Machine learning (ML) is a branch of artificial intelligence that focuses on building models capable of predicting outcomes and discovering new materials based on large quantities of reliable data. These datasets, derived from experiments or computations, describe the materials’ behaviors, qualities, and applications. By analyzing these data, machine learning techniques can uncover patterns and relationships that might not be evident through traditional methods.
In the context of organic solar cells (OSCs), machine learning involves applying statistical models and computational methods to analyze, predict, and optimize device performance. This process includes conducting experiments with OSCs, running simulations, and employing theoretical models, followed by the application of ML algorithms to interpret the complex relationships within the datasets. The integration of machine learning, high-performance computing (HPC), and adequate data now presents an opportunity to streamline materials discovery (Mahmood et al., 2022b).
The remarkable achievements of machine learning in fields such as image identification and translation have aroused the interest of materials scientists (Dong et al., 2021). These advancements demonstrate the potential of ML to gain insights into the fundamental principles governing material behavior, offering significant time savings compared to traditional quantum chemistry computations and experimental methods.
For instance, in a case study involving the design and testing of materials for OSCs, molecular dynamics simulations combined with machine learning techniques led to the identification of promising new material candidates. The process began with collecting data on various molecular configurations and their properties. Machine learning models were then trained to predict the performance of these configurations, significantly narrowing down the list of potential candidates for experimental testing. This approach not only saved time but also reduced the cost associated with trial-and-error experimentation (Dong et al., 2021).
However, the success of machine learning in this domain depends heavily on the quality, size, and form of the dataset. Organic solar cells, like many areas of materials science, have scattered and heterogeneous data because of the complexity of their working principles (Mahmood et al., 2022a). Effective data collection strategies, preprocessing techniques, and feature selection are crucial to developing robust ML models that can make accurate predictions.
For example, recent research has employed machine learning algorithms to predict the power conversion efficiency (PCE) of OSCs based on molecular descriptors and device architecture parameters. By analyzing vast datasets, these models identified key factors influencing PCE and provided insights into optimizing material properties and device configurations for enhanced performance.
In conclusion, while machine learning delves into data patterns that humans might overlook, its application in organic solar cells requires careful consideration of data quality and analysis methods. By leveraging ML techniques, researchers can accelerate the discovery of high-performance materials and optimize OSC designs, paving the way for more efficient and cost-effective solar energy solutions. Figure 2 provides a high-level overview of the several machine learning paradigms. There are three main categories of machine learning methods: supervised, unsupervised, and reinforcement learning (Mahmood and Wang 2021).
Choosing the right ML algorithm is crucial since it has a major impact on how well the predictions turn out. There are several machine learning methods, as shown in Figure 3, so it is not possible for one algorithm to always give the best prediction in every situation. Choosing the right algorithm, which is often done by trial and error, is vital to produce a highly successful model. Based on the data type and the topic at hand, a variety of learning techniques may be utilized. There is not enough room to cover every algorithm in this review article. Nevertheless, the following review articles might provide readers with a more thorough grasp of certain algorithms (Cova and Pais 2019, Schleder, Padilha et al. 2019, Zhou, Song et al. 2019).
The integration of ML with high-performance computing (HPC) offers a streamlined approach to materials discovery, substantially diminishing the time and expenses linked to conventional trial-and-error experimentation.

3. Steps of Machine Learning Applications

Machine learning analysis consists of four fundamental steps, each of which presents unique challenges and requires specific strategies for effective implementation.

3.1. Sample Collection

The initial stage involves gathering data, which can originate from theoretical models and hands-on experiments. Data cleaning or modification may be necessary to eliminate inconsistencies and noise. The way the data are split into training and testing sets can significantly impact model performance. Common ratios include 60:40, 70:30, 80:20, and even 90:10. The simplest approach is to use non-overlapping data sets while maintaining record order, such as using the first 70% of the records for training and the remaining 30% for testing. However, this may lead to issues if the response is not uniformly distributed across the records. Random sampling makes it more likely that the response values in both sets span the whole range from lowest to highest, reducing the risk of bias.
One major challenge is dealing with small datasets, particularly in materials science where obtaining high-quality data can be difficult. A good rule of thumb is to have a minimum of 50 data points for a decent ML model, but some models, like neural networks, require much larger datasets. Addressing this challenge may involve augmenting data through simulations or utilizing data from scholarly journals and databases (Pyzer-Knapp, Pitera et al. 2022, Taffese and Espinosa-Leal 2024).
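The difference between an order-preserving split and a random split can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library; the PCE values are hypothetical and are deliberately stored in ascending order to expose the problem described above.

```python
import random


def ordered_split(records, train_fraction=0.7):
    """Keep record order: the first part trains, the remainder tests."""
    cut = round(len(records) * train_fraction)
    return records[:cut], records[cut:]


def random_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records before cutting, so both sets sample the whole range."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical PCE values (%), sorted from lowest to highest.
pce_values = [round(0.5 * i, 1) for i in range(1, 21)]

ordered_train, ordered_test = ordered_split(pce_values)
random_train, random_test = random_split(pce_values)

print("ordered test set range:", min(ordered_test), "to", max(ordered_test))
print("random test set range: ", min(random_test), "to", max(random_test))
```

With sorted records, the ordered test set contains only the highest-efficiency devices, so the model is evaluated on a region it never saw during training. The fixed `seed` is an illustrative choice that makes the random split reproducible.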

3.2. Data Preparation and Processing

Fresh data can reveal previously unseen patterns to a machine learning model, but this requires thorough data cleaning to handle missing data and outliers, which in turn improves model accuracy. Normalizing the scales of the various descriptors is crucial for consistent analysis and for using them together within a single method. Dimensionality reduction techniques, such as principal component analysis (PCA), linear discriminant analysis (LDA), and independent component analysis (ICA), are essential when there are more features (descriptors) than observations or when features are strongly correlated. These techniques reduce the size of the feature space, helping to identify the most important characteristics and improve visualization.
A common pitfall is overlooking the importance of feature engineering, which can significantly impact model performance. Techniques like feature selection and creation of new features based on domain knowledge can enhance the model’s predictive power (Mahmood and Wang 2021).
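The normalization and dimensionality-reduction steps can be sketched as follows. This is a minimal illustration that assumes NumPy is available and computes PCA through a singular value decomposition; the descriptor matrix is synthetic, and the column names (`homo`, `lumo`, `mol_weight`, `n_rings`) are hypothetical examples of descriptors on very different scales.

```python
import numpy as np


def standardize(X):
    """Scale each descriptor (column) to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (X - mean) / std


def pca(X_centered, n_components):
    """Project centered data onto its leading principal components via SVD."""
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    scores = X_centered @ Vt[:n_components].T
    return scores, explained[:n_components]


# Synthetic descriptor matrix: 30 hypothetical molecules, 4 descriptors.
rng = np.random.default_rng(0)
homo = rng.normal(-5.3, 0.2, 30)                 # eV
lumo = homo + rng.normal(1.8, 0.05, 30)          # eV, strongly correlated with HOMO
mol_weight = rng.normal(900.0, 150.0, 30)        # g/mol
n_rings = rng.integers(3, 10, 30).astype(float)  # count
X = np.column_stack([homo, lumo, mol_weight, n_rings])

X_std = standardize(X)
scores, explained = pca(X_std, n_components=2)
print("explained variance ratio of the first two components:", np.round(explained, 3))
```

Without the standardization step, the molecular weight column would dominate the principal components simply because its numerical values are the largest.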

3.3. Model Building

In the context of organic solar cells, the relationship between performance and parameters is complex. The choice of algorithm significantly impacts the model’s accuracy and generalizability, and each algorithm has unique benefits and drawbacks. Common machine learning tasks in materials science include classification, clustering, regression, and probability estimation. Classification and regression are typically used to predict material properties, while clustering helps group similar materials, and probability estimation aids in discovering new materials.
Choosing the right algorithm involves understanding the nature of the data and the specific problem. For example, regression models are preferred for predicting continuous quantities, while classification models are suitable for categorical outcomes. It is crucial to experiment with multiple algorithms and hyperparameters to identify the best-performing model (Mahmood and Wang 2021).
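As an illustration of experimenting with hyperparameters, the sketch below compares several settings of a k-nearest-neighbour regressor on a held-out validation set and keeps the one with the lowest error. The regressor is chosen only because it is short to write out; the data are synthetic and the candidate values of `k` are arbitrary assumptions, not recommendations from the literature reviewed here.

```python
import math
import random


def knn_predict(train_X, train_y, query, k):
    """Average the targets of the k training points closest to `query`."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return sum(train_y[i] for i in order[:k]) / k


def rmse(predicted, target):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target))


# Synthetic data: two scaled descriptors and a noisy, non-linear response.
rng = random.Random(1)
X = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(60)]
y = [5 + 4 * a - 3 * b**2 + rng.gauss(0, 0.3) for a, b in X]

# The records were generated in random order, so a simple cut is a random split.
train_X, train_y = X[:40], y[:40]
val_X, val_y = X[40:], y[40:]

candidate_k = [1, 3, 5, 9]
val_rmse = {}
for k in candidate_k:
    predictions = [knn_predict(train_X, train_y, query, k) for query in val_X]
    val_rmse[k] = rmse(predictions, val_y)

best_k = min(val_rmse, key=val_rmse.get)
for k in candidate_k:
    print(f"k={k}: validation RMSE = {val_rmse[k]:.3f}")
print("selected k:", best_k)
```

The same loop structure applies when the candidates are different algorithms rather than different values of one hyperparameter.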

3.4. Model Evaluation

An effective model has strong performance on both training and testing datasets. Statistical analysis techniques, including root mean squared error (RMSE) and coefficient of determination (R²), are employed to assess model efficacy (Mahmood and Wang 2021).
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - y_i\right)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad R^2 = 1 - \frac{\mathrm{MSE}}{\mathrm{Var}(y)}$$
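where $x_i$ is the predicted value, $y_i$ is the target value, $m$ is the number of samples, and Var(y) is the variance of the target values. R² values closer to 1 indicate higher prediction accuracy. R² typically lies between 0 and 1, but it can become negative on test data when the model predicts worse than simply using the mean of the targets.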
One challenge is ensuring the model’s generalizability to new, unseen data. Techniques such as cross-validation and bootstrapping can help assess model stability and robustness, ensuring that the model performs well beyond the initial test set.
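The metrics and the cross-validation idea can be sketched directly from their definitions. The example below is a minimal standard-library illustration; the measured and predicted PCE values are hypothetical, and the "model" evaluated in the cross-validation loop is a mean-value baseline chosen only to keep the code short.

```python
import math


def mse(predicted, target):
    """Mean squared error over m samples."""
    return sum((x - y) ** 2 for x, y in zip(predicted, target)) / len(target)


def rmse(predicted, target):
    return math.sqrt(mse(predicted, target))


def r_squared(predicted, target):
    """R^2 = 1 - MSE / Var(y), using the population variance of the targets."""
    mean_y = sum(target) / len(target)
    var_y = sum((y - mean_y) ** 2 for y in target) / len(target)
    return 1.0 - mse(predicted, target) / var_y


def k_fold_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices); every sample is tested exactly once."""
    folds = [list(range(start, n_samples, n_folds)) for start in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical measured and predicted PCE values (%).
measured = [8.0, 10.0, 12.0, 14.0]
predicted = [9.0, 10.0, 11.0, 14.0]
print("MSE :", mse(predicted, measured))
print("RMSE:", round(rmse(predicted, measured), 4))
print("R^2 :", round(r_squared(predicted, measured), 4))

# Two-fold cross-validation of a mean-value baseline.
fold_rmse = []
for train_idx, test_idx in k_fold_splits(len(measured), n_folds=2):
    baseline = sum(measured[j] for j in train_idx) / len(train_idx)
    fold_rmse.append(rmse([baseline] * len(test_idx), [measured[j] for j in test_idx]))
print("per-fold RMSE of the baseline:", [round(v, 3) for v in fold_rmse])
```

Reporting the spread of the per-fold errors, rather than a single test-set number, gives a first impression of how stable the model is with respect to the choice of training data.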
By addressing these practical challenges and implementing detailed strategies at each step, machine learning can be effectively utilized to advance research in organic solar cells, leading to more efficient and innovative solutions.

4. Types of Machine Learning Algorithms

The design and synthesis of materials with beneficial, innovative properties is a highly dynamic field in modern science, fostering considerable research in biomaterials, cell and tissue engineering, organic photovoltaics (OPVs), light-emitting materials, and nanomaterials for various medical and non-medical applications. These advancements involve interdisciplinary efforts from fields including engineering, biology, physics, and chemistry. Although theoretical and computational science is making some headway, experimental science remains the primary focus. Materials designers would greatly benefit from understanding how to anticipate the characteristics of new materials before synthesis and how the macroscopic features of materials relate to the microscopic properties of molecular components.
Machine learning (ML) offers powerful tools to achieve these goals, especially in materials science, where the relationships between structure, properties, and performance are often complex and non-linear. Here, we discuss the types of machine learning algorithms commonly used in materials science, focusing on their applications and benefits in the context of organic solar cells (OSCs).
  • Supervised Learning: Supervised learning algorithms are trained on labeled data, meaning each training example is paired with an output label. These algorithms learn to map inputs to outputs, which is critical for predicting the properties of new materials (Breiman 2001, Alvarez-Gonzaga and Rodriguez 2024).
    Classification Algorithms: These are used to categorize data into predefined classes. For example, in organic solar cells, classification algorithms can predict whether a new material will act as a donor or acceptor based on its molecular structure (Chen and Tang 2024).
    • Support Vector Machines (SVM): SVMs are effective in classifying materials based on their electronic properties. For instance, they can help determine which molecular structures are likely to result in high-efficiency donor or acceptor materials for OSCs.
    • Decision Trees and Random Forests: These algorithms identify critical structural features that determine material performance. They can be used to analyze various molecular descriptors and pinpoint which attributes are most influential in achieving high PCE.
    Regression Algorithms: These predict continuous values, such as the power conversion efficiency (PCE) of organic solar cells.
    • Linear Regression: Often used to model the relationship between molecular descriptors and PCE. For example, linear regression can help establish how changes in molecular structure affect the efficiency of OSCs (Rosenblatt 1958).
    • Neural Networks: Neural networks can capture more complex, non-linear relationships between structure and efficiency. They are particularly useful in modeling the intricate dependencies between various molecular features and the overall performance of OSCs (Goodfellow 2016).
  • Unsupervised Learning: Unsupervised learning algorithms deal with data without labeled responses. They are useful for discovering hidden patterns or intrinsic structures in the data (Ain 2010).
    Clustering Algorithms: Clustering algorithms, such as k-means, can group materials with similar properties, aiding in the identification of promising material families. For instance, clustering can reveal which sets of molecular structures consistently yield high-efficiency OSCs (Sahu, Yang et al. 2019).
    Dimensionality Reduction Techniques: Techniques like principal component analysis (PCA) reduce the complexity of data while retaining essential patterns, which is crucial when dealing with high-dimensional datasets in materials science. PCA can help identify the most influential factors in determining OSC performance, streamlining the design process (Padula, Simpson et al. 2019).
  • Reinforcement Learning: Reinforcement learning involves training models through trial and error, using feedback from their actions. This approach can optimize material synthesis processes or experimental procedures to maximize efficiency or yield.
    Q-Learning and Deep Q-Networks (DQN): These techniques can optimize the sequence of synthesis steps to produce materials with desired properties efficiently. For example, reinforcement learning can help refine the fabrication process of OSCs to enhance their stability and efficiency (Padula, Simpson et al. 2019).
  • Hybrid and Multiscale Modeling: These approaches integrate different modeling techniques to provide a comprehensive understanding of material behavior across various scales (Padula, Simpson et al. 2019).
    Atomistic or Molecular-Level Models: These models focus on the interactions at the molecular level, which are crucial for understanding the fundamental properties of materials. For instance, molecular dynamics simulations can reveal how molecular vibrations and rotations affect the electronic properties of OSCs (Frenkel and Smit 2023).
    Continuum or Device-Level Models: These models help in understanding how molecular-level properties translate to macroscopic device performance. For example, continuum models can simulate the charge transport properties in OSCs, providing insights into how molecular arrangements affect overall efficiency.
Combining these models helps in linking the detailed molecular structure with the overall performance of organic solar cells, leading to better optimization strategies. For instance, hybrid modeling can combine molecular dynamics simulations with device-level models to predict how changes at the molecular scale impact device performance.
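The clustering idea mentioned in the list above can be sketched with a compact k-means implementation. This is a minimal standard-library illustration, not a production algorithm: it uses a deterministic farthest-first initialization (an assumption made here so that the result is reproducible), and the (HOMO, LUMO) pairs are hypothetical values in eV.

```python
import math


def farthest_first_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]


def kmeans(points, k, max_iter=100):
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


# Hypothetical (HOMO, LUMO) energies in eV for eight molecules.
molecules = [
    (-5.1, -3.0), (-5.2, -3.1), (-5.3, -2.9), (-5.2, -3.0),
    (-5.7, -4.0), (-5.6, -4.1), (-5.8, -3.9), (-5.7, -4.1),
]
labels, centroids = kmeans(molecules, k=2)
print("cluster labels:", labels)
print("cluster centroids:", [tuple(round(v, 3) for v in c) for c in centroids])
```

No labels are supplied to the algorithm; the two groups emerge from the energy levels alone, which is what distinguishes this unsupervised approach from the classification methods listed earlier.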
  • Performance Prediction and Optimization: Performance prediction and optimization involve using computational models, statistical methods, or machine learning techniques to forecast and improve the performance of a system, device, or process.
    Performance Prediction: In the context of organic solar cells, performance prediction involves using models or algorithms to estimate and forecast the characteristics and efficiency of the solar cell based on various factors. This prediction may encompass the expected power conversion efficiency (PCE), short-circuit current density (Jsc), open-circuit voltage (Voc), fill factor (FF), or other key metrics that quantify the effectiveness of the solar cell in converting sunlight into electricity. For example, machine learning models can predict how different material compositions and device architectures will perform under specific operating conditions (Afzal and Hachmann 2020).
    Optimization Strategies: Optimization involves adjusting parameters such as material composition, device architecture, layer thicknesses, interfaces, or manufacturing processes to maximize efficiency, increase stability, or enhance other desirable characteristics. Machine learning algorithms can be used to identify the optimal combinations of these parameters, significantly reducing the need for extensive trial-and-error experimentation. For instance, genetic algorithms can be employed to explore a vast parameter space and find the best configuration for high-efficiency OSCs (Padula, Simpson et al. 2019).
  • Materials Discovery and Design: Materials discovery and design involve the systematic search, identification, and development of new materials or the optimization of existing materials with desired properties for specific applications.
    Property Prediction and Screening: Machine learning models can predict the properties of potential materials, allowing researchers to screen large databases and identify promising candidates quickly. For example, predictive models can estimate the electronic properties of new organic molecules, aiding in the discovery of high-performance materials for OSCs (Butler, Davies et al. 2018, Sahu, Yang et al. 2019).
    Database Mining and High-Throughput Screening: ML algorithms can mine existing databases of materials to identify patterns and correlations that may not be apparent through traditional analysis. High-throughput screening techniques can rapidly evaluate a vast number of materials, accelerating the discovery process (Jain, Ong et al. 2013).
    Structure-Property Relationships: Understanding the relationships between molecular structure and material properties is crucial for designing new materials. Machine learning can help elucidate these relationships, guiding the rational design of materials with desired characteristics.
    Design and Synthesis: Once promising materials are identified, machine learning can aid in optimizing the synthesis processes to ensure reproducibility and scalability. For example, ML models can suggest optimal reaction conditions to synthesize high-purity materials efficiently.
  • Process and Manufacturing Optimization: Process and manufacturing optimization in the context of organic solar cells involves improving and refining the procedures, techniques, and production methods used in fabricating these photovoltaic devices (Alvarez-Gonzaga and Rodriguez 2024).
    Process Control and Standardization: Machine learning can be used to develop standardized protocols that ensure consistent quality and performance of OSCs. For example, ML algorithms can monitor production processes in real-time, adjusting parameters to maintain optimal conditions.
    Yield Improvement: By analyzing production data, machine learning can identify factors that influence yield and suggest modifications to improve it. This can lead to higher efficiency and lower costs in OSC manufacturing.
    Scaling Production and Cost Reduction: ML techniques can optimize manufacturing processes to make them more scalable and cost-effective. For instance, predictive models can help in planning resource allocation and minimizing waste.
    Robustness and Reliability: Machine learning can enhance the robustness and reliability of OSCs by identifying and mitigating factors that lead to device degradation. This can result in longer-lasting and more stable solar cells.
  • Pattern Recognition and Data Analysis: Pattern recognition and data analysis involve the systematic process of identifying meaningful patterns, structures, or relationships within datasets, enabling the extraction of valuable insights or information (Sahu, Yang et al. 2019).
    Data Collection and Preprocessing: Efficient data collection and preprocessing are crucial for ensuring high-quality inputs for ML models. This includes cleaning data, handling missing values, and normalizing data to make it suitable for analysis.
    Exploratory Data Analysis (EDA): EDA techniques help in understanding the underlying patterns and distributions in the data. Visualization tools can provide insights into how different variables interact and influence OSC performance.
    Feature Extraction and Selection: Identifying the most relevant features or descriptors is essential for building accurate ML models. Techniques like PCA can reduce the dimensionality of the data, focusing on the most significant variables.
    Clustering and Classification: Clustering algorithms can group similar data points, helping to identify patterns in material properties. Classification algorithms can categorize materials based on their predicted performance.
    Regression and Prediction: Regression techniques can model the relationships between variables, providing predictions for new data points. These predictions can guide the development of new materials and the optimization of OSCs.
    Anomaly Detection and Outlier Analysis: Identifying anomalies and outliers in the data can reveal potential issues or novel phenomena that warrant further investigation. This can lead to new discoveries and improvements in OSC technology.
    Correlation and Relationship Analysis: Understanding the correlations and relationships between different variables helps in identifying key factors that influence OSC performance. This knowledge can inform the design and optimization of new materials (Jain, Ong et al. 2013).
Taken together, these examples and applications from materials science, and from organic photovoltaics in particular, give a practical picture of how machine learning algorithms can be leveraged in the field. They clarify the theoretical aspects and also demonstrate the practical utility of ML in advancing materials discovery and optimization.
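The genetic-algorithm search mentioned under optimization strategies can be sketched as follows. This is a minimal illustration under stated assumptions: `surrogate_pce` is a hypothetical stand-in for a trained ML model, its functional form and optimum are invented for the example, and the two tuned parameters (active-layer thickness and donor fraction) and their bounds are likewise illustrative.

```python
import random

# Hypothetical search bounds: active-layer thickness (nm) and donor weight fraction.
BOUNDS = [(50.0, 300.0), (0.1, 0.9)]


def surrogate_pce(thickness_nm, donor_fraction):
    """Invented smooth objective with its maximum (15 %) at 100 nm and 0.45."""
    return 15.0 - 0.001 * (thickness_nm - 100.0) ** 2 - 40.0 * (donor_fraction - 0.45) ** 2


def random_individual(rng):
    return [rng.uniform(low, high) for low, high in BOUNDS]


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each parameter is inherited from one parent at random."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def mutate(individual, rng, rate=0.3):
    """Perturb each parameter with probability `rate`, then clip to the bounds."""
    child = []
    for value, (low, high) in zip(individual, BOUNDS):
        if rng.random() < rate:
            value += rng.gauss(0.0, 0.1 * (high - low))
        child.append(min(high, max(low, value)))
    return child


def genetic_search(n_generations=40, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(n_generations):
        population.sort(key=lambda ind: surrogate_pce(*ind), reverse=True)
        history.append(surrogate_pce(*population[0]))
        parents = population[: pop_size // 2]  # elitism: the best half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=lambda ind: surrogate_pce(*ind))
    return best, history


best, history = genetic_search()
print(f"best thickness = {best[0]:.1f} nm, donor fraction = {best[1]:.2f}")
print(f"predicted PCE = {surrogate_pce(*best):.2f} %")
```

Because the best half of each generation is carried over unchanged, the best predicted value never decreases from one generation to the next. In practice the surrogate would be replaced by a model trained on experimental or computed data, and the result is only as reliable as that model.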

5. Machine Learning Analysis of Organic Solar Cells

The application of machine learning (ML) analysis significantly improves the effective screening of potential candidates for organic solar cells. Understanding the relationship between molecular attributes and power conversion efficiency (PCE) is crucial. It is essential to examine the relationship between specific device performance metrics and molecular characteristics to meet the requirements of diverse applications, such as high open-circuit voltage of solar cells for energy conversion, elevated short-circuit current VOC, and solar window applications, as well as JSC.
Because ML can forecast performance from molecular parameters, it is widely used in organic solar cell research. However, the choice of descriptors strongly affects how accurately an ML model can predict. Descriptors act as a translator between researchers and the database and are therefore central to accurate predictions. When the target property is not well defined, choosing candidate descriptors becomes a substantial task. In general, several factors affect a material's properties, so selecting appropriate descriptors for a given property is an important step before applying machine learning. This is particularly true for microscopic descriptors, which can be costly to determine both computationally and experimentally.
An effective material descriptor must satisfy at least three criteria: (i) it should provide a unique characterization of the material, (ii) it should be sensitive to the target property, and (iii) it should be easy to calculate. When the target property is ambiguous, meeting these criteria becomes challenging, which can set back the development of accurate and trustworthy ML models for organic solar cells. A clear definition of the target property is therefore needed to select relevant and effective descriptors and, ultimately, to improve the success of ML-driven screening in organic solar cell research.

5.1. Molecular Descriptors

Molecular descriptors, which describe a molecule's physical and chemical characteristics, are derived from the molecular structure of a compound. They vary in complexity from basic properties, such as the number of a particular atom, to more intricate ones, such as charge distribution. Thousands of molecular descriptors exist, ranging from zero-dimensional (0D) to three-dimensional (3D) ones (Vo, Van Vleet et al. 2019).
Atomic number, atom type, and molecular weight are examples of molecular information captured by 0D descriptors, which carry no information about topology or atom connectivity. 1D descriptors provide counts and types of chemical fragments. Topological and topo-chemical molecular properties are defined by 2D descriptors. Lastly, geometrical information is captured by 3D descriptors, which also contain conformational information such as partial surface charges and molecular volume. An ideal representation should capture most of the molecule's properties while omitting unnecessary detail.
Different representations of the same molecule can capture a wide range of chemical details, often at varying levels of complexity. Figure 4 showcases some of these different forms. Molecular descriptors, which are straightforward and quick to compute, enable the rapid assessment of a large number of materials.
In 2011, Aspuru-Guzik's research team utilized machine learning to identify potential donor materials for organic photovoltaic (OPV) applications (Olivares-Amaya, Amador-Bedolla et al. 2011). The team modeled the current–voltage characteristics of 2.6 million molecular structures using linear regression. Based on their predictions, they pinpointed benzothiadiazole, pyridinethiadiazole, and thienopyrrole as the most promising candidates.
Zhang et al. created a dataset of 111,000 molecules and trained a machine learning model using the random forest (RF) method (Pereira, Xiao et al. 2017). With this model, they predicted the HOMO and LUMO energies with an error of less than 0.16 eV, without employing any DFT computations. This can accelerate high-throughput screening of organic semiconductors for solar cells.
Su et al. created a series of new acceptors derived from multi-conformational bistricyclic aromatic ene (BAE) compounds (Sui, Yang et al. 2019). They predicted the PCE of these acceptors with a machine learning model built from experimental data using a cascaded support vector machine (CasSVM). The CasSVM model is a two-tier network (Fig. 5). The first tier comprises three SVM sub-models that output JSC, VOC, and FF. The second tier maps these first-tier outputs to the final endpoint, PCE. The best CasSVM model predicted the PCE of OPVs with a mean absolute error (MAE) of 0.35 percentage points, which is roughly 10% of the average PCE (3.89%). The R² value was 0.96. This methodology can help experimental chemists evaluate probable candidates prior to synthesis.
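The two-tier idea can be illustrated with a minimal Python sketch. It is not the CasSVM model: the second tier in that work is a learned SVM, whereas here the standard device relation PCE = JSC × VOC × FF / Pin is used as a stand-in. The irradiance of 100 mW/cm² and all numbers below are assumptions made only for illustration.

```python
# Standard test irradiance (AM1.5G, 100 mW/cm^2); an assumption, not stated in the text.
P_IN_MW_CM2 = 100.0


def pce_percent(jsc_ma_cm2, voc_v, ff):
    """Return PCE in percent from JSC (mA/cm^2), VOC (V) and FF (0..1)."""
    return 100.0 * jsc_ma_cm2 * voc_v * ff / P_IN_MW_CM2


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical first-tier outputs (JSC, VOC, FF) for three devices.
first_tier = [(8.0, 0.90, 0.55), (10.5, 0.82, 0.60), (6.2, 0.95, 0.50)]
predicted = [pce_percent(*row) for row in first_tier]

# Made-up experimental PCEs, for illustration only.
measured = [4.1, 5.0, 3.2]
print(predicted)
print(mean_absolute_error(measured, predicted))
```

Dividing the MAE by the mean measured PCE gives the relative error quoted for such models.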

5.2. Molecular Fingerprints

Molecular fingerprints are digital representations of chemical structures that omit precise structural features such as coordinates. They are used to query databases and to identify similarities among compounds. Several methods are available for transforming a molecular structure into a fingerprint, such as key-based fingerprints, circular fingerprints, and topological or path-based fingerprints, each with further subtypes. Readers are referred to the literature for more comprehensive information (Cereto-Massagué, Ojeda et al. 2015, Pattanaik and Coley 2020).
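A similarity search over fingerprints can be sketched as follows. The sketch assumes each fingerprint is stored as the set of its "on" bit positions and uses the Tanimoto coefficient, a common choice that the text above does not prescribe. The molecule names and bit positions are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def most_similar(query_fp, database, top_k=3):
    """Return the top_k (name, similarity) pairs, most similar first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical fingerprints; real ones would come from a cheminformatics toolkit.
database = {
    "mol_A": {1, 4, 7, 9},
    "mol_B": {1, 4, 8},
    "mol_C": {2, 3, 5},
}
query = {1, 4, 7}
print(most_similar(query, database, top_k=2))
```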
In recent years, organic photovoltaics have seen widespread use of non-fullerene acceptors (Mahmood, Hu et al. 2018, Liu, Xu et al. 2019, Mahmood, Tang et al. 2019, Zhang, Song et al. 2020). In 2017, Aspuru-Guzik and his team compiled a dataset of over 51,000 non-fullerene acceptors. These acceptors were based on various compounds, including benzothiadiazole (BT), diketopyrrolopyrroles (DPPs), perylene diimides (PDIs), tetraazabenzodifluoranthenes (BFIs), and fluoranthene-fused imides, sourced from the Harvard Clean Energy Project (HCEP) (Lopez, Sanchez-Lengeling et al. 2017).
To calibrate the DFT-calculated HOMO and LUMO values of new non-fullerene acceptors, a dataset of 94 experimentally reported molecules was used. Instead of the commonly used linear regression, they opted for Gaussian process regression because the data showed no linear trend. They applied the Scharber model to estimate the power conversion efficiency (PCE) of organic solar cells combining the non-fullerene acceptors with the standard electron-donor material poly[N-9′-heptadecanyl-2,7-carbazole-alt-5,5-(4′,7′-di-2-thienyl-2′,1′,3′-benzothiadiazole)] (PCDTBT). The DFT-calculated HOMO and LUMO values of the acceptors, along with the experimentally reported values for PCDTBT, were the inputs to the Scharber model. To validate the PCE predictions of the Scharber model, they compared them with 49 experimentally reported values and found only a weak correlation (r = 0.43 and R² = 0.11) (Lopez, Sanchez-Lengeling et al. 2017).
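For orientation, the Scharber estimate can be sketched in a few lines of Python. The sketch uses the model's commonly quoted assumptions (a 0.3 V empirical voltage loss, a fill factor of 0.65, and an external quantum efficiency of 65%). The function names and the `jsc_ideal_ma_cm2` input, meaning the current at 100% quantum efficiency obtained by integrating the solar photon flux above the absorber's gap, are hypothetical, and that integration is not shown.

```python
# Commonly quoted Scharber-model constants (assumptions of this sketch).
FILL_FACTOR = 0.65
EQE = 0.65
VOC_LOSS_V = 0.3
P_IN_MW_CM2 = 100.0  # AM1.5G irradiance


def scharber_voc(donor_homo_ev, acceptor_lumo_ev):
    """VOC (V) from the donor HOMO and acceptor LUMO energies (eV)."""
    return abs(donor_homo_ev) - abs(acceptor_lumo_ev) - VOC_LOSS_V


def scharber_pce(donor_homo_ev, acceptor_lumo_ev, jsc_ideal_ma_cm2):
    """PCE (%) given the ideal (100% EQE) current density in mA/cm^2."""
    voc = scharber_voc(donor_homo_ev, acceptor_lumo_ev)
    if voc <= 0.0:
        return 0.0  # no usable photovoltage for this energy-level alignment
    jsc = EQE * jsc_ideal_ma_cm2
    return 100.0 * voc * jsc * FILL_FACTOR / P_IN_MW_CM2


# Hypothetical energy levels and ideal current, for illustration only.
print(scharber_voc(-5.4, -4.0))
print(scharber_pce(-5.4, -4.0, 20.0))
```

Because the model fixes the fill factor and quantum efficiency for every material pair, it cannot reflect morphology or transport, which is consistent with the weak correlation reported above.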
Predicting the power conversion efficiency (PCE) and specific device properties is crucial. To enhance a particular property, it is essential to understand the relationship between that property and the molecular descriptors. This connection helps identify which molecular features influence the property, allowing for targeted improvements (Xie, Wang et al. 2019). For instance, most high-performing organic solar cell (OSC) devices exhibit relatively low open-circuit voltages (VOC). In bulk heterojunction (BHJ) OSCs, charge separation is generally associated with considerable voltage losses because additional energy is needed to dissociate excitons into free carriers.
This voltage loss in high-performance OSCs is generally around 0.6 V, which is approximately 0.2–0.3 V higher than the losses observed in crystalline silicon (c-Si) and gallium arsenide (GaAs) solar cells (Linderl, Zechel et al. 2017). Non-fullerene acceptors exhibiting extended thin-film absorption and appropriate energy levels can facilitate an optimal balance between VOC and JSC (Zhang, Liu et al. 2019). Their structural adaptability enables significant modulation of absorption and molecular energy levels. Machine learning can markedly expedite the identification of appropriate materials.
By predicting specific device parameters, machine learning can further help improve the power conversion efficiency (PCE). Aspuru-Guzik and his team calibrated the open-circuit voltage (VOC) and short-circuit current density (JSC) values calculated with the Scharber model against available experimental data, based on structural similarity. They derived information from the molecular graph using extended-connectivity fingerprints and employed a Gaussian process. This calibration technique reduced the dependence of the computed properties on the choice of DFT functional, enabling high-throughput virtual screening.
In 2019, Sun et al. collected a dataset of 1719 donor materials (Sun, Zheng et al. 2019). The researchers experimented with various inputs, including seven types of molecular fingerprints, two types of descriptors, ASCII strings, and images. They classified donor materials into two categories based on their power conversion efficiency (PCE): "low" and "high." The models developed using fingerprints performed best, achieving an accuracy of 86.76% in predicting the PCE class. To validate the machine learning results, they synthesized 10 donor materials, and the model correctly classified eight of them. However, the practical value of this study is limited because sorting PCE into just two broad categories (0–2.9% and 3–14.6%) is much simpler than predicting the PCE of individual semiconductors with precision.
In 2018, Saeki et al. extracted 2.3 million molecules from the Harvard Clean Energy Project database (Nagasawa, Al-Naamani et al. 2018). From this dataset, 1,000 molecules were initially chosen based on their calculated power conversion efficiency (PCE). The researchers used MACCS fingerprints and the extended connectivity fingerprint (ECFP6) key to train their machine learning model. Through random forest (RF) screening, they narrowed the selection to 149 molecules. However, the accuracy of the RF method in predicting PCE was only 48%. They ultimately selected one polymer for its synthetic feasibility, but the solar cell device made from this polymer had a PCE of 0.53%, significantly lower than the RF prediction of 5.0–5.8%.
Figure 6. Scheme of polymer design by combining RF screening and manual screening/modification. Reproduced with permission from (Nagasawa, Al-Naamani et al. 2018).
This disparity can be attributed to two primary factors. First, the RF model was trained on PCE values derived from the Scharber model, which itself performs poorly. Second, the structures of polymer donors documented in the literature are more intricate than those of the semiconductors in the HCEP database. Even allowing for these factors, the predictive accuracy of the RF model for PCE remains inadequate. Consequently, the accuracy of the machine learning model must be improved, and a wider range of materials should be synthesized for experimental validation.
Schmidt and colleagues assembled a dataset of 3,989 monomers and developed a model based on a grammar variational autoencoder (GVA) (Jørgensen, Mesta et al. 2018). Even without knowing the precise locations of individual atoms, the trained model can predict the LUMO and lowest optical transition energies. Furthermore, structures with the required LUMO and optical gap energies can be generated with this approach. Deep neural network (DNN) predictions were more accurate than GVA predictions; however, predicting the LUMO with the DNN still requires density functional theory (DFT) calculations to find the atomic positions. Therefore, DFT calculations cannot be bypassed when using the DNN model.
When compared to neural networks trained on molecular fingerprints, SMILES, Chemception, and Molecular Graph, their suggested models performed better.
Peng and Zhao utilized convolutional neural networks (CNNs) to develop models for generating non-fullerene acceptors (NFAs) and predicting their properties (Peng and Zhao 2019). They used various molecular representations, including extended-connectivity fingerprints, Coulomb matrices, molecular graphs, bag-of-bonds, and SMILES strings, to construct their models. The depth of the convolutional layers in their CNNs influenced the diversity of the generated NFAs. They used quantum chemistry computations to confirm the predicted compounds. In their prediction model, they employed an attention mechanism to interpret the features extracted by dilated convolution layers. They concluded that graph-based representations of molecules were more effective than string-based representations.
In most experimental studies, donor and acceptor materials for organic solar cells are optimized separately. However, optimizing only one component at a time limits the exploration of potential combinations. Padula and Troisi used machine learning to investigate whether these components should be optimized individually or whether simultaneous optimization would yield better results (Padula and Troisi 2019). They took molecular fingerprints as their starting point and searched the literature for combinations of 262 donors (D) and 76 acceptors (A). Despite the small dataset, they predicted the PCE of BHJ solar cells based on these donor-acceptor combinations with good accuracy (r = 0.78). The most promising combination was recommended for experimental testing.
A notable study was conducted by Min et al., who used 565 donor/acceptor pairs from the literature to train five machine learning models, including linear regression (LR), boosted regression trees (BRT), random forest (RF), and artificial neural networks (ANN). They confirmed that there is a connection between donor-acceptor pairs and PCE predictions for polymer-NFA OSC devices. The BRT and RF models gave the highest prediction accuracies, 0.71 and 0.70, respectively. These models were then used to predict the PCE of 432 million donor-acceptor pairings. Six pairings were selected and fabricated into OSC devices, and the experimental PCEs were close to the predictions. All synthesized non-fullerene acceptors were from the high-performing Y6 series. The workflow of the entire study is illustrated in Figure 7.

5.3. Images

Machine learning has made significant strides in image recognition by identifying features within complex backgrounds and associating them with specific outputs. To put this skill to use, Sun and colleagues trained a deep neural network to detect and automatically categorize chemical structures; this allowed them to estimate the PCE of organic solar cells (Sun, Li et al. 2019). The researchers used unaltered images of chemical structures for their model, which was both fast and low in computational cost, making it feasible to run on a personal computer. This approach achieved an accuracy of 91.02% in predicting the PCE of donor materials. The workflow of this study is illustrated in Figure 8.
However, the study has several limitations. Firstly, the machine learning model was trained using data from the Harvard Clean Energy Project (HCEP), but the molecules reported in the literature are generally more complex than those in the HCEP database. Secondly, the Scharber model's PCE estimates were based on energy levels calculated using DFT, which are not always accurate. The performance of organic solar cells is influenced by many factors, including the materials in the active layer, solubility, solvent additives, crystallinity, and molecular orientation. Using only images of chemical structures as input therefore does not provide realistic results. Molecular descriptors, which carry more detailed information about the molecules, are a better option than pictures of the structures alone.

5.4. Microscopic Properties

Optical gap, charge-carrier mobility, ionization potential, electron affinity, and hole-electron binding energy are some of the microscopic features of organic materials that determine the efficiency of organic solar cells (OSCs). When contrasted with more basic topological descriptors, these microscopic descriptors offer a more grounded view of solar cell applications. However, computing or experimentally determining these microscopic properties can be costly and time-consuming.
Ma and colleagues used 13 microscopic properties as descriptors to train a model for predicting power conversion efficiency (PCE), with a dataset of 270 small molecules. This approach aims to enhance the accuracy of PCE predictions by incorporating detailed microscopic properties, despite the higher computational and experimental costs involved (Sahu, Rao et al. 2018). PCE was predicted using several methods, including artificial neural networks, gradient boosting, and random forest. The gradient boosting model performed best, achieving an r value of 0.79. However, these models rely on computationally expensive descriptors such as excited-state properties and polarizability, and this cost hinders large-scale, high-throughput virtual screening of candidate compounds.
Ma and colleagues utilized Random Forest (RF) and Gradient Boosting Regression Tree (GBRT) algorithms to predict key device characteristics, namely open-circuit voltage (VOC), short-circuit current density (JSC), and fill factor (FF), based on microscopic properties. JSC (r = 0.78) and FF (r = 0.73) correlated strongly with PCE, indicating that these factors are reliable predictors of efficiency. However, VOC showed a very weak correlation with PCE (r = 0.15), which aligns with findings from recent studies (Nagasawa, Al-Naamani et al. 2018). JSC and FF were only weakly correlated with each other (r = 0.33), and there was almost no correlation between VOC and JSC (r = -0.18) or between VOC and FF (r = -0.09).
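Assuming the r values above are Pearson correlation coefficients, they can be computed as in the following minimal sketch. The device data are made up for illustration and do not come from the cited study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical device data: JSC (mA/cm^2), VOC (V) and PCE (%).
jsc = [6.1, 8.4, 10.2, 12.5, 9.0]
voc = [0.95, 0.80, 0.88, 0.78, 0.91]
pce = [3.0, 4.1, 5.6, 6.2, 4.9]
print(pearson_r(jsc, pce), pearson_r(voc, pce))
```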
Troisi and colleagues studied the impact of various descriptors on the predictive ability of machine learning models. Using 566 donor/acceptor pairs retrieved from the literature, they trained k-Nearest Neighbors (k-NN), Kernel Ridge Regression (KRR), and Support Vector Regression (SVR) models. Their investigation sought to identify the most effective descriptors for improving the accuracy of these ML models in predicting the performance of organic solar cells (Zhao, del Cueto et al. 2020). The research used both structural (topological) and physical descriptors, including energy levels, molecule size, light absorption, and mixing characteristics. The machine learning models benefited greatly from the structural descriptors. Some physical parameters did correlate strongly with power conversion efficiency (PCE), but they did not improve the models' predictive capability because the structural descriptors already contained this information.
When developing organic semiconductors, a number of building blocks are used to construct push-pull conjugated systems. These building blocks include electron-deficient, electron-rich, and π-spacer units. Ma and colleagues used machine learning algorithms to screen 10,000 compounds made from 32 distinct building blocks. Their research set out to determine how the type and arrangement of these building blocks affect the properties of the molecules. The descriptors were calculated from the ground and excited states of the molecules. Using ANN and Gradient Boosting Regression Trees (GBRT) models, they found 126 candidates with predicted efficiencies above 8%. This method was effective in finding organic solar cell candidates through screening.
With a Pearson's coefficient (r) of 0.68, the ML model trained by Troisi et al. to forecast device parameters outperformed the Scharber model (Padula, Simpson et al. 2019).
In organic solar cells, the thermodynamics of mixing the materials in the active layer dictates how the film morphology evolves. Charge transfer and light harvesting are both affected by this evolution, which in turn affects the stability and performance of the device (Duong, Walker et al. 2012, Ye, Zhao et al. 2017). Investigating the connection between molecular interactions and the phase behavior of thin films is therefore crucial. To this end, Perea et al. investigated the phase evolution of polymer-fullerene blends using an ANN model in conjunction with Flory-Huggins solution theory (Perea, Langner et al. 2017). Solubility parameters were predicted from the surface charge distribution using the ANN model. A figure of merit, combined with the solubility parameters, was developed to characterize the stability of polymer-fullerene blends (Fig. 9).
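As background, the textbook Flory-Huggins free energy of mixing for a binary blend can be written down in a few lines. This is a generic sketch, not the figure of merit of the cited work. The symbols are assumptions of the sketch: `phi` is the volume fraction of component 1, `chi` the interaction parameter, and `n1`, `n2` the relative chain lengths.

```python
import math


def fh_free_energy(phi, chi, n1=1.0, n2=1.0):
    """Flory-Huggins free energy of mixing per lattice site, in units of kT."""
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between 0 and 1")
    entropy = (phi / n1) * math.log(phi) + ((1.0 - phi) / n2) * math.log(1.0 - phi)
    enthalpy = chi * phi * (1.0 - phi)
    return entropy + enthalpy


def chi_critical(n1, n2):
    """Interaction parameter above which the blend can phase separate."""
    return 0.5 * (1.0 / math.sqrt(n1) + 1.0 / math.sqrt(n2)) ** 2


# Hypothetical blend: a long polymer (n1 = 100) with a small molecule (n2 = 1).
print(chi_critical(100.0, 1.0))
print(fh_free_energy(0.5, chi=1.0, n1=100.0, n2=1.0))
```

A blend whose interaction parameter exceeds `chi_critical` has a miscibility gap, so long chains (large `n1`) tolerate much weaker unfavorable interactions before demixing.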

5.5. Energy Levels

The performance of organic solar cells (OSCs) is significantly influenced by the energy levels of the donor and acceptor materials. A mismatch in these energy levels can lead to substantial energy loss due to radiative recombination, which in turn reduces the power conversion efficiency (PCE) of the OSCs (Yuan, Zhang et al. 2020).
In 2014, Aspuru-Guzik’s group investigated millions of molecular motifs using 150 million DFT calculations (Hachmann, Olivares-Amaya et al. 2014). PCE was predicted using Scharber’s model (Scharber, Mühlbacher et al. 2006), with the calculated energy levels as input. Candidates with a PCE of more than 10% were identified.
In 2017, Imamura et al. reported the automatic generation of thiophene-based polymers from donor and acceptor units, the calculation of orbital levels using Hückel-based models, and the evaluation of photovoltaic characteristics (Imamura, Tashiro et al. 2017). PCE was calculated using Scharber’s model, whose predictive performance is very poor (Pyzer-Knapp, Simm et al. 2016, Lopez, Sanchez-Lengeling et al. 2017, Padula, Simpson et al. 2019). Molecular descriptors and microscopic properties of the semiconductors were ignored entirely.
With a training set R² of 0.85 and a testing set R² of 0.80, Min-Hsuan Lee demonstrated excellent prediction accuracy using Random Forest (RF) modeling on a database of 4100 bulk heterojunction solar cells (Lee 2020).
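For reference, the R² score is usually defined as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that definition; the function name and the sample values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


# Made-up measured and predicted PCE values (%), for illustration only.
measured = [3.2, 5.1, 7.4, 9.0]
predicted = [3.6, 4.8, 7.9, 8.5]
print(r_squared(measured, predicted))
```

Comparing the score on held-out test data with the training score, as in the study above, indicates how well the model generalizes.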
The machine learning examples discussed so far concern binary solar cells. However, ternary organic solar cells (OSCs) generally perform better than binary ones. One of the main issues with binary OSCs is their limited light harvesting, which stems from the narrow absorption range of organic semiconductors. Ternary OSCs include a third component, which can be either a donor or an acceptor. This additional component not only enhances photon harvesting by serving as an extra absorber but also helps achieve a more favorable morphology (Yue, Liu et al. 2020). The operation of ternary solar cells is more intricate than that of binary solar cells, which makes identifying the optimal third component difficult (Gao, Gao et al. 2020, Liu, Ma et al. 2020). Min-Hsuan Lee developed a machine learning model for ternary solar cells using Random Forest, Gradient Boosting, k-Nearest Neighbors (k-NN), Linear Regression, and Support Vector Regression. The LUMO value of the donor (D1) showed a notable linear correlation with PCE (r = -0.55), whereas the correlations of the other descriptors with PCE were minor (Lee 2019). VOC correlated with the donor's HOMO (r = -0.54) and LUMO (r = -0.54), indicating that the donor's energy levels require further examination to elucidate the origin of VOC in ternary OSCs. The Random Forest model achieved the highest R² score (0.77 on the test set) of all the machine learning approaches. In a separate work, he developed a machine learning model to predict the open-circuit voltage (VOC) characteristics of fullerene derivative-based ternary organic solar cells, using the same descriptors as in the prior study (Lee 2020). The Random Forest model achieved an R² score of 0.77.
Both investigations used only the energy levels of organic semiconductors as descriptors, neglecting other chemical descriptors and the influence of thin-film morphology. Enhancing the efficiency of organic solar cells requires a hybrid modeling framework that integrates thin-film features, including the optimal ratio of the three components, and fabrication parameters, such as annealing temperature and solvent additives. Controlling these variables may improve charge generation and minimize voltage loss, thereby increasing overall device efficiency (Zhou, Xu et al. 2018). Theoretical analysis of the morphology of three components is much more complex than that of two components.
Tandem organic solar cells are known for their superior power conversion efficiency (PCE). These cells consist of two sub-cells designed to extend the range of photon response and minimize both transmission and thermalization losses. This architecture captures a broader spectrum of light and reduces the energy losses that typically occur in single-junction solar cells (Liu, Jia et al. 2019). Correlating efficiency with the physical properties of active-layer materials is particularly challenging because the vast diversity of organic materials yields a multitude of candidate materials. To address this issue, Min-Hsuan Lee employed machine learning algorithms to predict the efficiency of tandem organic solar cells and identify optimal bandgap combinations for these devices (Lee 2020). Random Forest regression was used to predict the efficiency with energy levels as input data. The findings suggest that optimizing the energy offset between the lowest unoccupied molecular orbital (LUMO) levels of the donor and acceptor materials can significantly enhance electron transfer and overall device performance.

5.6. Simulated Properties

The efficiency of organic solar cells is largely governed by the morphology of the film. To further enhance power conversion efficiency (PCE), it is crucial to have a thorough understanding of this film morphology. In addition to experimental methods, mathematical simulations can also be employed to explore the film's structure and analyze how various parameters affect it. This combination of experimental and computational approaches provides a comprehensive understanding that can lead to significant improvements in solar cell performance.
These simulations generally consist of two primary phases: the representation phase and the mapping phase. During the representation phase, a mathematical foundation is established to produce microstructures. In the mapping phase, the generated microstructures are correlated with a desired property. The application of graph theory to the analysis of the microstructure of organic solar cells (OSCs) is gaining traction, since it offers a reliable way of elucidating the correlation between microstructure and device performance (Du, Zebrowski et al. 2018, Pfeifer, Pokuri et al. 2018, Noruzi, Ghadai et al. 2020). For example, Ganapathysubramanian and colleagues used a graph-based approach to study morphology descriptors in OSCs. They analyzed multiple mechanisms, including photon absorption, exciton diffusion, charge separation, and charge transport, offering a comprehensive assessment of how these elements affect the efficiency and performance of OSCs (Wodo, Tirthapura et al. 2012). The results of the graph-based technique agreed closely with those of the computationally demanding method. In a separate investigation, they employed a CNN to correlate film morphology with short-circuit current (JSC) (Pokuri, Ghosal et al. 2019).
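One simple graph-based morphology descriptor is the fraction of a phase that has a continuous path to its electrode. The sketch below is illustrative only and is not the cited authors' code. It assumes the morphology is a small 2D grid of 'D' (donor) and 'A' (acceptor) cells, with the anode along the top row and the cathode along the bottom row.

```python
from collections import deque


def connected_fraction(grid, phase, electrode_row):
    """Fraction of `phase` cells connected to the given electrode row."""
    rows, cols = len(grid), len(grid[0])
    total = sum(cell == phase for row in grid for cell in row)
    if total == 0:
        return 0.0
    # Seed the breadth-first search with phase cells touching the electrode.
    seeds = [(electrode_row, c) for c in range(cols) if grid[electrode_row][c] == phase]
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in seen and grid[nr][nc] == phase:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen) / total


# Hypothetical 3 x 4 morphology; the donor cell at row 2, column 2 is an island.
morphology = ["DADA", "DAAA", "AADA"]
print(connected_fraction(morphology, "D", electrode_row=0))
print(connected_fraction(morphology, "A", electrode_row=2))
```

Isolated islands such as the donor cell in the last row cannot deliver charge to the electrode, which is why connectivity descriptors of this kind relate morphology to charge transport.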
They solved the thermodynamically consistent Cahn-Hilliard equation for binary phase separation with an in-house finite element library (Wodo and Ganapathysubramanian 2012). Approximately 65,000 morphologies were generated. JSC was evaluated for each morphology using the excitonic drift-diffusion equation (Kodali and Ganapathysubramanian 2012). A CNN using morphologies as input and JSC as output showed a classification accuracy of .80% (Kodali and Ganapathysubramanian 2012) (Fig. 10).
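For readers unfamiliar with it, the Cahn-Hilliard equation in its generic form is given below. The notation is an assumption of this sketch rather than that of the cited work: $\phi$ is the local volume fraction of one component, $M$ the mobility, $f(\phi)$ the bulk free-energy density of mixing, $\epsilon$ the interfacial energy coefficient, and $\mu$ the chemical potential.

```latex
\begin{aligned}
F[\phi] &= \int_{\Omega} \left( f(\phi) + \frac{\epsilon^{2}}{2}\,\lvert \nabla \phi \rvert^{2} \right) \mathrm{d}V, \\
\mu &= \frac{\delta F}{\delta \phi} = \frac{\partial f}{\partial \phi} - \epsilon^{2}\,\nabla^{2}\phi, \\
\frac{\partial \phi}{\partial t} &= \nabla \cdot \left( M\,\nabla \mu \right).
\end{aligned}
```

The first line is the total free energy, the second the chemical potential derived from it, and the third the conservation law that drives phase separation while keeping the overall composition fixed.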
Figure 10. (a) Simple sketch of CNN architecture and (b) Confusion matrix. Reproduced with permission from (Pokuri, Ghosal et al. 2019).
MacKenzie et al. used a Shockley–Read–Hall-based drift-diffusion model to simulate current–voltage (J–V) curves (Majeed, Saladina et al. 2020). A total of 20,000 devices were produced, and electrical parameters including carrier trapping rates, energy disorder, trap densities, recombination time constants, and parasitic resistances were computed. The simulated data were used to train a neural network.
The trained model was then applied to study charge carrier dynamics in several well-known OSC devices and to determine the effect of surfactant choice and annealing temperature.
The solubility of the materials in the active layer of an organic solar cell in a specific solvent plays a crucial role in determining the film morphology, which in turn affects the device's performance. Risko and colleagues tackled this by calculating the free energy of mixing using molecular dynamics (MD) simulations. They also employed Bayesian statistics to further refine these calculations. This method provides a quick and efficient way to study a wide variety of solvents and solvent additives, helping to optimize the performance of the solar cells.

6. Problems and Future Prospects

6.1. Data Infrastructure

ML models for screening OPV compounds are often trained using data from the Harvard Clean Energy Project (HCEP). However, the molecules reported in the literature are usually far more complex than those in the HCEP, and this disparity can lead to inaccurate ML predictions. Training an ML model effectively requires a vast amount of data. In areas like image recognition, data availability is not an issue, with millions of input samples available. In contrast, datasets for organic solar cells contain only hundreds or thousands of entries. The accuracy of ML models reportedly improves as the number of data points (molecules) increases (Pyzer-Knapp, Li et al. 2015, Sun, Li et al. 2019, Lee 2020).
Assembling massive amounts of data for ML models trained on descriptors of the power conversion process is challenging because DFT computations are computationally intensive. A potential solution for small datasets is meta-learning, which involves learning both within and across problems. Another viable approach for dealing with sparse data is a Bayesian framework. Striking a balance between data availability and model predictive power requires a two-pronged approach. The degrees of freedom (DoF) of the model can mediate the influence of data size on model precision, potentially leading to a relationship between precision and DoF. This concept is theoretically grounded in the statistical bias-variance trade-off (Zhang and Ling 2018).
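The bias-variance trade-off mentioned above can be stated compactly. The decomposition below is the standard textbook result for squared error, not a formula from the cited reference. It assumes data generated as $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^{2}$, and a model $\hat{f}$ fitted on a random training set, with the expectation taken over training sets and noise.

```latex
\mathbb{E}\!\left[ \bigl( y - \hat{f}(x) \bigr)^{2} \right]
= \underbrace{\bigl( \mathbb{E}[\hat{f}(x)] - f(x) \bigr)^{2}}_{\text{bias}^{2}}
+ \underbrace{\mathbb{E}\!\left[ \bigl( \hat{f}(x) - \mathbb{E}[\hat{f}(x)] \bigr)^{2} \right]}_{\text{variance}}
+ \underbrace{\sigma^{2}}_{\text{noise}}
```

A model with many degrees of freedom has low bias but high variance when data are scarce, which is why small OSC datasets tend to favor simpler models or strong prior information.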
A significant drawback is the lack of extensive, high-quality datasets specifically tailored for OSCs. This shortage impedes the development of highly accurate ML models, as the limited and less complex data often fail to capture the intricacies of real-world OSC materials.

6.2. Descriptor Selection

A crucial step in ML modeling is the selection of molecular descriptors. While fingerprints and simple molecular descriptors are quick to compute, they are not ideal for modeling organic solar cells. Understanding photovoltaic processes requires precise quantum-chemical calculations at the molecular scale, which are prohibitively costly for rapid virtual screening on a broad scale. An appropriate compromise between accuracy and speed is needed. A new generation of descriptors developed specifically for organic semiconductors is critically needed, along with accurate and easily accessible fingerprints.

6.3. Multidimensional Design

Many models predict power conversion efficiency (PCE) from chemical structure alone but fail to consider miscibility and film morphology. Incorporating Flory–Huggins theory might enhance ML approaches. Prediction accuracy could also be improved by including data from grazing-incidence small-angle X-ray scattering (GISAXS), atomic force microscopy (AFM), transmission electron microscopy (TEM), and grazing-incidence wide-angle X-ray scattering (GIWAXS) (Mahmood and Wang 2020).
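As a rough illustration of how Flory–Huggins theory could supply a miscibility feature, the sketch below evaluates the standard Flory–Huggins free energy of mixing per lattice site, ΔG_mix/kT = (φ1/N1) ln φ1 + (φ2/N2) ln φ2 + χ φ1 φ2, together with the critical interaction parameter χ_c = ½(1/√N1 + 1/√N2)². The chain lengths and χ values in the example are hypothetical and not taken from any of the studies cited here.

```python
import math


def fh_free_energy_of_mixing(phi1, n1, n2, chi):
    """Flory-Huggins free energy of mixing per lattice site, in units of kT.

    phi1: volume fraction of component 1 (0 < phi1 < 1)
    n1, n2: number of lattice segments per molecule of each component
    chi: Flory-Huggins interaction parameter
    """
    if not 0.0 < phi1 < 1.0:
        raise ValueError("phi1 must lie strictly between 0 and 1")
    phi2 = 1.0 - phi1
    entropy_term = (phi1 / n1) * math.log(phi1) + (phi2 / n2) * math.log(phi2)
    enthalpy_term = chi * phi1 * phi2
    return entropy_term + enthalpy_term


def critical_chi(n1, n2):
    """Interaction parameter above which the blend can phase-separate."""
    return 0.5 * (1.0 / math.sqrt(n1) + 1.0 / math.sqrt(n2)) ** 2


# Hypothetical polymer donor (100 segments) blended with a small-molecule acceptor.
n_donor, n_acceptor = 100, 1
chi_c = critical_chi(n_donor, n_acceptor)
for chi in (0.2, chi_c, 1.5):
    g = fh_free_energy_of_mixing(0.5, n_donor, n_acceptor, chi)
    print(f"chi={chi:.3f}  dG_mix/kT at phi=0.5: {g:+.4f}")
```

A quantity such as χ − χ_c, or the free energy at the blend composition, could then be appended to the structural descriptors of a donor-acceptor pair. Whether this improves a given model would need to be tested.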
Although machine learning handles everyday image analysis well, its results on such characterization images vary significantly. ML analysis of images generated by the aforementioned techniques is exceedingly difficult because, unlike everyday photos, they have distinctive characteristics. Microscope images are often affected by high levels of noise and aberrations (Jones and Nellist 2013). Experimental images also depend strongly on the physicochemical characteristics of the materials, the mixing conditions, and the experimental settings, which adds complexity. Additionally, the physical meaning of images captured with different techniques varies, necessitating different analytical approaches.
Because a single compound may have multiple images recorded under varying experimental conditions, accumulating a comprehensive dataset, the first stage of such a workflow, is laborious. Automatic image extraction and sorting is difficult to implement, so human intervention is required. The second stage is the analysis and specification of the task, such as deciding on data labels or selecting the target property. The fill factor (FF) is heavily affected by the active layer's morphology (Zawodzki, Resel et al. 2015). Thus, choosing FF as the target instead of PCE might be more practical. The relationship between FF and the other device parameters can then be established through PCE. Another consideration is whether to use classification or regression. Classification may be more appropriate for smaller datasets, and regression for larger ones. The third stage is training the model and extracting patterns to produce a prediction. Experimental validation is the final stage. Linking images with performance is an uphill battle but ultimately rewarding (Pokuri, Stimes et al. 2019).
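The link between FF and PCE mentioned above follows from the standard device relation PCE = (J_SC × V_OC × FF) / P_in. The sketch below shows the relation and its inversion, so that FF can be recovered from a reported PCE when J_SC and V_OC are known. The default incident power of 100 mW/cm² (standard AM1.5G illumination) and the example device values are assumptions, not figures from the studies discussed here.

```python
def power_conversion_efficiency(jsc_ma_cm2, voc_v, ff, pin_mw_cm2=100.0):
    """PCE in percent from short-circuit current density (mA/cm^2),
    open-circuit voltage (V), and fill factor (fraction between 0 and 1)."""
    if not 0.0 < ff <= 1.0:
        raise ValueError("fill factor must be a fraction in (0, 1]")
    # mA/cm^2 * V = mW/cm^2, so the ratio to pin_mw_cm2 is dimensionless.
    return 100.0 * jsc_ma_cm2 * voc_v * ff / pin_mw_cm2


def fill_factor_from_pce(pce_percent, jsc_ma_cm2, voc_v, pin_mw_cm2=100.0):
    """Invert the relation to recover FF from a reported PCE."""
    return pce_percent * pin_mw_cm2 / (100.0 * jsc_ma_cm2 * voc_v)


# Hypothetical device: Jsc = 20 mA/cm^2, Voc = 0.8 V, FF = 0.70.
pce = power_conversion_efficiency(20.0, 0.8, 0.70)
ff = fill_factor_from_pce(pce, 20.0, 0.8)
print(f"PCE = {pce:.2f} %, recovered FF = {ff:.2f}")
```

Because PCE is a product of three measured quantities, a model trained on FF alone isolates the morphology-sensitive factor, and its predictions can still be converted to PCE when J_SC and V_OC are available.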

6.4. Experimental Validation

The use of machine learning in OSC research is increasing, as noted in the literature, and high-throughput screening is expected to continue progressing. Typically, materials are screened using heuristic rules, but these rules do not guarantee that the materials can be synthesized, as their synthetic routes are not always known. Collaboration with experimentalists is essential to enhance the accuracy of machine predictions. Once candidates are identified by ML, manual examination of their synthetic feasibility is recommended, followed by experimental validation. However, the number of cases where experiments have validated ML predictions is relatively small. Sun et al. tested their ML findings by synthesizing ten donor materials, eight of which were correctly categorized by the model (Sun, Zheng et al. 2019). Saeki et al. synthesized a donor and fabricated OSC devices, obtaining a PCE of 0.53%, significantly lower than the random forest (RF) prediction of 5.0-5.8% (Nagasawa, Al-Naamani et al. 2018). Min et al. fabricated devices from six donor-acceptor pairs, most of which exhibited a PCE close to the predicted values (Wu, Guo et al. 2020).

6.5. Development of Better Software

Most current ML tools require programming skills, limiting their use to individuals with extensive knowledge of data science and computer programming. However, these individuals often lack a deep understanding of the fundamental physical processes involved, and this gap occasionally leads to misinterpretation of results. Conversely, while organic solar cells are a hot topic among experimental scientists, these scientists typically lack training in ML. To address this issue, user-friendly software with intuitive graphical user interfaces for materials specialists would be beneficial. Experts could then harness the full potential of data-driven research without worrying about complex syntax or esoteric tuning settings.

7. Conclusions

Machine learning (ML) models have demonstrated substantial potential in predicting critical parameters for organic solar cells (OSCs), such as energy levels (HOMO and LUMO), UV/visible absorption maxima in both solution and film states, and power conversion efficiency (PCE). These models utilize a diverse array of inputs, including images, microscopic properties, energy levels, molecular fingerprints, biochemical descriptors, and simulated features. The efficacy of these ML models is profoundly influenced by the nature of the inputs and the methodologies employed. ML is pivotal in surmounting the challenges associated with the rapid identification of effective organic semiconductors for OSCs.
Despite these advancements, several formidable challenges persist. A primary issue is the quality and heterogeneity of the available data. The complex operational principles of OSCs and the scattered nature of the data present significant obstacles to training robust ML models. Moreover, the predictive accuracy of ML models is highly contingent upon the quality of the input data and the chosen algorithms, often necessitating extensive trial and error to achieve optimization.
The increasing focus on ML in scholarly research underscores its growing significance in OSC studies. While notable successes have been achieved, numerous challenges remain, including the need for enhanced data integration, more precise predictive models, and effective strategies for managing the diverse and complex datasets inherent to OSC research.
This review has highlighted several challenges and proposed potential solutions to address them. As technology advances, the application of ML in the study of OSCs is expected to grow, eventually surpassing traditional experimental approaches that rely on trial and error. Thus, promoting the use of ML in OSC research is crucial.
The transformative potential of ML in materials discovery is bolstered by the availability of open-source tools and data sharing initiatives. Although the current state of ML application in OSC research falls short of its full potential, ongoing advancements in technology and methodology are bridging this gap. Similar to current computational and experimental methods, ML is poised to become an indispensable tool in OSC research.
In conclusion, while ML offers considerable advantages for the discovery and optimization of materials for organic solar cells, it also presents several challenges that must be addressed. Enhancing data quality, refining algorithms, and developing robust models are essential steps to fully realize the potential of ML in this field. By overcoming these challenges, researchers can expedite the discovery of high-performance materials, ultimately leading to more efficient and cost-effective solar energy solutions.

References

  1. Lopez, S. A. , et al. (2017). "Design principles and top non-fullerene acceptor candidates for organic photovoltaics." Joule 1(4): 857-870. [CrossRef]
  2. Ding, P. , et al. (2024). "Stability of organic solar cells: toward commercial applications." Chemical Society Reviews 53(5): 2350-2387. [CrossRef]
  3. Wang, Y. , et al. (2023). "The critical role of the donor polymer in the stability of high-performance non-fullerene acceptor organic solar cells." Joule 7(4): 810-829. [CrossRef]
  4. Afzal, M. A. F. and J. Hachmann (2020). High-throughput computational studies in catalysis and materials research, and their impact on rational design. Handbook on Big Data and Machine Learning in the Physical Sciences: Volume 1. Big Data Methods in Experimental Materials Discovery, World Scientific: 1-44. [CrossRef]
  5. Jain, A. K. (2010). "Data clustering: 50 years beyond K-means." Pattern Recognition Letters 31(8): 651-666. [CrossRef]
  6. Alvarez-Gonzaga, O. A. and J. I. Rodriguez (2024). "Machine learning models with different cheminformatics data sets to forecast the power conversion efficiency of organic solar cells." arXiv preprint. arXiv:2410.23444.
  7. Breiman, L. (2001). "Random forests." Machine learning 45: 5-32. [CrossRef]
  8. Butler, K. T. , et al. (2018). "Machine learning for molecular and materials science." Nature 559(7715): 547-555. [CrossRef]
  9. Chen, G. and D.-M. Tang (2024). "Machine Learning as a “Catalyst” for Advancements in Carbon Nanotube Research." Nanomaterials 14(21): 1688. [CrossRef]
  10. Frenkel, D. and B. Smit (2023). Understanding molecular simulation: from algorithms to applications, Elsevier.
  11. Goodfellow, I. , et al. (2016). Deep learning, MIT Press. [CrossRef]
  12. Jain, A. , et al. (2013). "Commentary: The Materials Project: A materials genome approach to accelerating materials innovation." APL materials 1(1). [CrossRef]
  13. Padula, D. , et al. (2019). "Combining electronic and structural features in machine learning models to predict organic solar cells properties." Materials Horizons 6(2): 343-349. [CrossRef]
  14. Rosenblatt, F. (1958). "The perceptron: a probabilistic model for information storage and organization in the brain." Psychological review 65(6): 386. [CrossRef]
  15. Sahu, H. , et al. (2019). "Designing promising molecules for organic solar cells via machine learning assisted virtual screening." Journal of Materials Chemistry A 7(29): 17480-17488. [CrossRef]
  16. Cereto-Massagué, A. , et al. (2015). "Molecular fingerprint similarity search in virtual screening." Methods 71: 58-63. [CrossRef]
  17. Cova, T. and A. Pais (2019). "Deep learning for deep chemistry: optimizing the prediction of chemical patterns." Frontiers in Chemistry 7: 809. [CrossRef]
  18. Du, P. , et al. (2018). "Microstructure design using graphs." npj Computational Materials 4(1): 50. [CrossRef]
  19. Du, X. , et al. (2019). "Efficient polymer solar cells based on non-fullerene acceptors with potential device lifetime approaching 10 years." Joule 3(1): 215-226. [CrossRef]
  20. Duong, D. T. , et al. (2012). "Molecular solubility and hansen solubility parameters for the analysis of phase separation in bulk heterojunctions." Journal of Polymer Science Part B: Polymer Physics 50(20): 1405-1413. [CrossRef]
  21. Gao, J. , et al. (2020). "Over 14.5% efficiency and 71.6% fill factor of ternary organic solar cells with 300 nm thick active layers." Energy & environmental science 13(3): 958-967. [CrossRef]
  22. Gu, G. H. , et al. (2019). "Machine learning for renewable energy materials." Journal of Materials Chemistry A 7(29): 17096-17117. [CrossRef]
  23. Hachmann, J. , et al. (2014). "Lead candidates for high-performance organic photovoltaics from high-throughput quantum chemistry–the Harvard Clean Energy Project." Energy & environmental science 7(2): 698-704. [CrossRef]
  24. Imamura, Y. , et al. (2017). "Automatic high-throughput screening scheme for organic photovoltaics: estimating the orbital energies of polymers from oligomers and evaluating the photovoltaic characteristics." The Journal of Physical Chemistry C 121(51): 28275-28286. [CrossRef]
  25. Jones, L. and P. D. Nellist (2013). "Identifying and correcting scan noise and drift in the scanning transmission electron microscope." Microscopy and Microanalysis 19(4): 1050-1060. [CrossRef]
  26. Jørgensen, P. B. , et al. (2018). "Machine learning-based screening of complex molecules for polymer solar cells." The Journal of chemical physics 148(24). [CrossRef]
  27. Kodali, H. K. and B. Ganapathysubramanian (2012). "Computer simulation of heterogeneous polymer photovoltaic devices." Modelling and Simulation in Materials Science and Engineering 20(3): 035015. [CrossRef]
  28. Lee, M.-H. (2020). "A Machine Learning–Based Design Rule for Improved Open-Circuit Voltage in Ternary Organic Solar Cells." Advanced Intelligent Systems 2(1): 1900108. [CrossRef]
  29. Lee, M.-H. (2020). "Performance and matching band structure analysis of tandem organic solar cells using machine learning approaches." Energy Technology 8(3): 1900974. [CrossRef]
  30. Lee, M.-H. (2020). "Robust random forest based non-fullerene organic solar cells efficiency prediction." Organic Electronics 76: 105465. [CrossRef]
  31. Lee, M. H. (2019). "Insights from machine learning techniques for predicting the efficiency of fullerene derivatives-based ternary organic solar cells at ternary blend design." Advanced Energy Materials 9(26): 1900891. [CrossRef]
  32. Linderl, T. , et al. (2017). "Energy Losses in Small-Molecule Organic Photovoltaics." Advanced Energy Materials 7(16): 1700237. [CrossRef]
  33. Liu, G. , et al. (2019). "15% efficiency tandem organic solar cell based on a novel highly efficient wide-bandgap nonfullerene acceptor with low energy loss." Advanced Energy Materials 9(11): 1803657. [CrossRef]
  34. Liu, K.-K. , et al. (2019). "Achieving high-performance non-halogenated nonfullerene acceptor-based organic solar cells with 13.7% efficiency via a synergistic strategy of an indacenodithieno [3, 2-b] selenophene core unit and non-halogenated thiophene-based terminal group." Journal of Materials Chemistry A 7(42): 24389-24399. [CrossRef]
  35. Liu, T. , et al. (2020). "Concurrent improvement in J sc and V oc in high-efficiency ternary organic solar cells enabled by a red-absorbing small-molecule acceptor with a high LUMO level." Energy & environmental science 13(7): 2115-2123. doi:10.1039/D0EE00662A.
  36. Lopez, S. A. , et al. (2017). "Design principles and top non-fullerene acceptor candidates for organic photovoltaics." Joule 1(4): 857-870. [CrossRef]
  37. Majeed, N. , et al. (2020). "Using deep machine learning to understand the physical performance bottlenecks in novel thin-film solar cells." Advanced Functional Materials 30(7): 1907259. doi:10.1002/adfm.201907259.
  38. Mahmood, A. , et al. (2018). "Recent progress in porphyrin-based materials for organic solar cells." Journal of Materials Chemistry A 6(35): 16769-16797. [CrossRef]
  39. Mahmood, A. , et al. (2019). "First-principles theoretical designing of planar non-fullerene small molecular acceptors for organic solar cells: manipulation of noncovalent interactions." Physical Chemistry Chemical Physics 21(4): 2128-2139. [CrossRef]
  40. Mahmood, A. and J.-L. Wang (2021). "Machine learning for high performance organic solar cells: current scenario and future prospects." Energy & environmental science 14(1): 90-105. [CrossRef]
  41. Mahmood, A. and J. L. Wang (2020). "A review of grazing incidence small-and wide-angle x-ray scattering techniques for exploring the film morphology of organic solar cells." Solar RRL 4(10): 2000337. [CrossRef]
  42. Nagasawa, S. , et al. (2018). "Computer-aided screening of conjugated polymers for organic solar cell: classification by random forest." The Journal of Physical Chemistry Letters 9(10): 2639-2646. [CrossRef]
  43. Noruzi, R. , et al. (2020). "NURBS-based microstructure design for organic photovoltaics." Computer-Aided Design 118: 102771. [CrossRef]
  44. Olivares-Amaya, R. , et al. (2011). "Accelerated computational discovery of high-performance materials for organic photovoltaics by means of cheminformatics." Energy & environmental science 4(12): 4849-4861. [CrossRef]
  45. Padula, D. , et al. (2019). "Combining electronic and structural features in machine learning models to predict organic solar cells properties." Materials Horizons 6(2): 343-349. [CrossRef]
  46. Padula, D. and A. Troisi (2019). "Concurrent optimization of organic donor–acceptor pairs through machine learning." Advanced Energy Materials 9(40): 1902463. [CrossRef]
  47. Pattanaik, L. and C. W. Coley (2020). "Molecular representation: going long on fingerprints." Chem 6(6): 1204-1207. [CrossRef]
  48. Peng, S.-P. and Y. Zhao (2019). "Convolutional neural networks for the design and analysis of non-fullerene acceptors." Journal of chemical information and modeling 59(12): 4993-5001. [CrossRef]
  49. Perea, J. D. , et al. (2017). "Introducing a new potential figure of merit for evaluating microstructure stability in photovoltaic polymer-fullerene blends." The Journal of Physical Chemistry C 121(33): 18153-18161. [CrossRef]
  50. Pereira, F. , et al. (2017). "Machine learning methods to predict density functional theory B3LYP energies of HOMO and LUMO orbitals." Journal of chemical information and modeling 57(1): 11-21. [CrossRef]
  51. Pfeifer, S. , et al. (2018). "Process optimization for microstructure-dependent properties in thin film organic electronics." Materials Discovery 11: 6-13. [CrossRef]
  52. Pokuri, B. S. S. , et al. (2019). "Interpretable deep learning for guided microstructure-property explorations in photovoltaics." npj Computational Materials 5(1): 95. [CrossRef]
  53. Pokuri, B. S. S. , et al. (2019). "GRATE: A framework and software for GRaph based Analysis of Transmission Electron Microscopy images of polymer films." Computational Materials Science 163: 1-10. [CrossRef]
  54. Pyzer-Knapp, E. O. , et al. (2022). "Accelerating materials discovery using artificial intelligence, high performance computing and robotics." npj Computational Materials 8(1): 84. [CrossRef]
  55. Pyzer-Knapp, E. O. , et al. (2016). "A Bayesian approach to calibrating high-throughput virtual screening results and application to organic photovoltaic materials." Materials Horizons 3(3): 226-233. [CrossRef]
  56. Pyzer-Knapp, E. O. , et al. (2015). "Learning from the harvard clean energy project: The use of neural networks to accelerate materials discovery." Advanced Functional Materials 25(41): 6495-6502. [CrossRef]
  57. Sahu, H. , et al. (2018). "Toward predicting efficiency of organic solar cells via machine learning and improved descriptors." Advanced Energy Materials 8(24): 1801032. [CrossRef]
  58. Sanchez-Lengeling, B. and A. Aspuru-Guzik (2018). "Inverse molecular design using machine learning: Generative models for matter engineering." Science 361(6400): 360-365. [CrossRef]
  59. Scharber, M. C. , et al. (2006). "Design rules for donors in bulk-heterojunction solar cells—Towards 10% energy-conversion efficiency." Advanced materials 18(6): 789-794. [CrossRef]
  60. Schleder, G. R. , et al. (2019). "From DFT to machine learning: recent approaches to materials science–a review." Journal of Physics: Materials 2(3): 032001. [CrossRef]
  61. Sui, M.-Y. , et al. (2019). "Nonfullerene acceptors for organic photovoltaics: from conformation effect to power conversion efficiencies prediction." Solar RRL 3(11): 1900258. [CrossRef]
  62. Sun, W. , et al. (2019). "The use of deep learning to fast evaluate organic photovoltaic materials." Advanced Theory and Simulations 2(1): 1800116. [CrossRef]
  63. Sun, W. , et al. (2019). Machine learning-assisted molecular design and efficiency prediction for high-performance organic photo-voltaic materials. Sci Adv 5 (11): eaay4275. [CrossRef]
  64. Sun, W. , et al. (2019). "Machine learning–assisted molecular design and efficiency prediction for high-performance organic photovoltaic materials." Science advances 5(11): eaay4275. [CrossRef]
  65. Taffese, W. Z. and L. Espinosa-Leal (2024). "Unveiling non-steady chloride migration insights through explainable machine learning." Journal of Building Engineering 82: 108370. [CrossRef]
  66. Vo, A. H. , et al. (2019). "An overview of machine learning and big data for drug toxicity evaluation." Chemical research in toxicology 33(1): 20-37. [CrossRef]
  67. Wadsworth, A. , et al. (2019). "Critical review of the molecular design progress in non-fullerene electron acceptors towards commercially viable organic solar cells." Chemical Society Reviews 48(6): 1596-1625. [CrossRef]
  68. Wan, X. , et al. (2020). "Acceptor–donor–acceptor type molecules for high performance organic photovoltaics–chemistry and mechanism." Chemical Society Reviews 49(9): 2828-2842. [CrossRef]
  69. Wang, H. , et al. (2023). "Efficient screening framework for organic solar cells with deep learning and ensemble learning." npj Computational Materials 9(1): 200. [CrossRef]
  70. Wodo, O. and B. Ganapathysubramanian (2012). "Modeling morphology evolution during solvent-based fabrication of organic solar cells." Computational Materials Science 55: 113-126. [CrossRef]
  71. Wodo, O. , et al. (2012). "A graph-based formulation for computational characterization of bulk heterojunction morphology." Organic Electronics 13(6): 1105-1113. [CrossRef]
  72. Wu, Y. , et al. (2020). "Machine learning for accelerating the discovery of high-performance donor/acceptor pairs in non-fullerene organic solar cells." npj Computational Materials 6(1): 120. [CrossRef]
  73. Xie, Y. , et al. (2019). "Assessing the energy offset at the electron donor/acceptor interface in organic solar cells through radiative efficiency measurements." Energy & environmental science 12(12): 3556-3566. [CrossRef]
  74. Ye, L. , et al. (2017). "High-efficiency nonfullerene organic solar cells: critical factors that affect complex multi-length scale morphology and device performance." Advanced Energy Materials 7(7): 1602000. [CrossRef]
  75. Yuan, J. , et al. (2020). "Reducing voltage losses in the A-DA′ DA acceptor-based organic solar cells." Chem 6(9): 2147-2161. [CrossRef]
  76. Yue, Q. , et al. (2020). "n-Type molecular photovoltaic materials: design strategies and device applications." Journal of the American Chemical Society 142(27): 11613-11628. [CrossRef]
  77. Zawodzki, M. , et al. (2015). "Interfacial morphology and effects on device performance of organic bilayer heterojunction solar cells." ACS applied materials & interfaces 7(30): 16161-16168. [CrossRef]
  78. Zhang, C. , et al. (2020). "Electron-Deficient and Quinoid Central Unit Engineering for Unfused Ring-Based A1–D–A2–D–A1-Type Acceptor Enables High Performance Nonfullerene Polymer Solar Cells with High Voc and PCE Simultaneously." Small 16(22): 1907681. [CrossRef]
  79. Zhang, J. , et al. (2019). "Revealing the critical role of the HOMO alignment on maximizing current extraction and suppressing energy loss in organic solar cells." IScience 19: 883-893. [CrossRef]
  80. Zhang, Y. and C. Ling (2018). "A strategy to apply machine learning to small datasets in materials science." npj Computational Materials 4(1): 25. [CrossRef]
  81. Zhao, Z.-W. , et al. (2020). "Effect of increasing the descriptor set on machine learning prediction of small molecule-based organic solar cells." Chemistry of Materials 32(18): 7777-7787. [CrossRef]
  82. Zhou, T. , et al. (2019). "Big data creates new opportunities for materials research: a review on methods and applications of machine learning for materials design." Engineering 5(6): 1017-1026. [CrossRef]
  83. Zhou, X. , et al. (2019). "Enhanced light-harvesting of benzodithiophene conjugated porphyrin electron donors in organic solar cells." Journal of Materials Chemistry C 7(2): 380-386. [CrossRef]
  84. Zhou, Z. , et al. (2018). "High-efficiency small-molecule ternary solar cells with a hierarchical morphology enabled by synergizing fullerene and non-fullerene acceptors." Nature Energy 3(11): 952-959. [CrossRef]
Figure 1. Computer assisted design and screening of materials for organic solar cells. Reproduced from (Mahmood and Wang 2021).
Figure 2. Types of machine learning methods.
Figure 3. Application forms of machine learning with their respective algorithms.
Figure 4. Different types of molecular representations applied to one molecule, AQDS, which is used in the construction of organic redox flow batteries. Clockwise from top: (1) A fingerprint vector that quantifies presence or absence of molecular environments; (2) SMILES strings that use simplified text encodings to describe the structure of a chemical species; (3) potential energy functions that could model interactions or symmetries; (4) a graph with atom and bond weights; (5) Coulomb matrix; (6) bag of bonds and bag of fragments; (7) 3D geometry with associated atomic charges; and (8) the electronic density. Reproduced with permission from (Sanchez-Lengeling and Aspuru-Guzik 2018).
Figure 5. The structure of the cascaded SVM QSAR model. Sub-1–3 are the input descriptors for JSC, VOC, and FF, respectively. SVM1–4 are the subset SVM models used for the prediction of JSC, VOC, FF, and PCE, respectively. Reproduced with permission from (Sui, Yang et al. 2019).
Figure 7. Workflow of building, application, and evaluation of machine learning methods. (a) Scheme of collecting experimental data and converting chemical structure to digitized data. (b) Scheme of machine training, predicting, and method evaluation. Reproduced with permission from (Wu, Guo et al. 2020).
Figure 8. Structure of the convolutional neural network (CNN). Reproduced with permission from (Sun, Li et al. 2019).
Figure 9. Computational flowchart describing the routine for determining the relative stability capable of describing the microstructure of polymer:fullerene blends. (i) Creation of the σ-profile from the conductor-like screening model (COSMO); (ii) σ-moments as extracted from COSMO are fed into an artificial neural network (ANN) to determine Hansen solubility parameters (HSPs); (iii) HSPs are used to calculate the qualitative Flory–Huggins interaction parameters (χ1,2); (iv) implementation of moiety-monomer-structure properties (reduced molar volumes/weights); (v) spinodal demixing diagrams resulting from polymer blend theory; and (vi) figure of merit (FoM), defined as the ratio of the Flory–Huggins intermolecular parameter and the spinodal diagram, forms the basis of a relative stability metric. Reproduced with permission from (Perea, Langner et al. 2017).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.