4. Computational Experiments
This section presents a series of computational experiments aimed at predicting symmetry information and lattice parameters from diffraction patterns. The objective is to explore the inherent complexity of the problem and evaluate different approaches to identify the most robust and principled method.
Figure 1 provides an overview of the entire setups and their respective methodologies.
4.1. NTREOR
We begin by using the NTREOR indexing algorithm [
7] as used in the EXPO2014 software package, an updated version of EXPO2013 [
17]. NTREOR is a heuristic, well-known, highly tested and trustworthy optimization method employing a trial-and-error approach to search for solutions in the index space by varying the Miller indices. It requires the user to accurately identify the peak positions in the diffraction pattern, which are then used to predict the crystal class, the extinction symbol, and its lattice parameters. NTREOR can propose multiple candidate solutions, each accompanied by the cell volume and a figure of merit to help identify the most likely solution. In particular, we use
, the de Wolff figure of merit, to evaluate the quality of each candidate in the following.
We apply this method to index the 82 experimental diffraction patterns detailed in
Section 3. For each pattern, the radiation wavelength is specified, and peak positions are manually identified. The algorithm is then executed, and the best candidate solution is selected based on the highest figure of merit. In some cases where multiple solutions have the same figure of merit, we pick the solution with the smallest cell volume. This approach serves as a baseline for comparison with the data-driven methods that will be introduced in the subsequent sections.
As regards the statistical interpretation of the results of the NTREOR algorithm applied on those manually identified reflections, NTREOR achieves an accuracy of 49% in predicting the crystal system, correctly classifying 40 out of 82 XRD patterns. Only for these 40 correctly classified patterns, we calculate the root-mean-squared error (RMSE) of the lattice parameters. The mean RMSE for the cell vectors is 1.38 Å, with a standard deviation (σ) of 2.46 Å, while the mean RMSE for the cell angles is 0.81°, with σ = 1.507°; 22 of the 40 correctly classified patterns exhibit an RMSE of the cell vectors below 0.1 Å, and 24 of the 40 patterns have an RMSE of the cell angles below 0.1°. Notably, this rather simple strategy does not cover those cases where the NTREOR solution corresponds to one out of many supercells of the correct unit cell. Likewise, it will not count an almost correct sub-cell of the correct unit cell as a successful case.
4.2. HAND
The first data-driven model we train, HAND, is inspired by the methods proposed by Suzuki et al. [
8]. This model uses handcrafted selected features of the diffraction patterns as inputs, specifically the first ten peak positions in the low 2θ range and the total number of peaks within the 2θ = 0–90° range for constant copper radiation at
= 1.5418 Å, in compliance with the copper wavelength used by ICSD. Compared to models that process the entire diffraction pattern, HAND is less complex, employing a simpler architecture based on Randomized Decision Trees. The model is designed to predict the crystal system (CS), the extinction group (EG), and the space group (SG) separately. The crystallographic data from the ICSD is randomly split, with 90% used for training and the remaining 10% reserved for testing. By testing on a distinct set of crystal structures, we aim to evaluate the model’s equivariance to variations caused by changes in crystal structure. Our most-developed, i.e., best-fitted model achieves an accuracy of 94% for classifying the crystal system, 91% for classifying the extinction group, and 87% for classifying the space group. We notice that in 3.5% of the test instances, the predicted space group did not belong to the correctly predicted crystal system. This highlights the contribution of spurious correlations in the data (the handcrafted features) toward the model’s prediction. Notably, the model performance does not drop significantly with perturbations to the peak position, suggesting a better degree of robustness compared to NTREOR. When testing the model on the experimental test set and using the manually identified peaks as in the previous experiment with NTREOR, we observe an accuracy of 55.5%, 32%, and 39.5% for the crystal system, extinction group, and space group prediction tasks respectively. The crystal system classification accuracy shows an improvement over that of the NTREOR on the same experimental test set. It is worth noting, however, that the space-group classification accuracy is
larger than that of the extinction-group classification. Further investigation shows that in 26% of the experimental test instances, the predicted space group did not belong to the correctly predicted crystal system. This further validates our claim that the model relies on spurious correlations.
4.3. CNN Static – Supervised
As motivated in the introduction, our main goal is to design models that require minimum manual/human involvement and can be scaled in the future for more complicated structure solution tasks. To that end in this computational experiment, we aim to address the issues when utilizing the full profile of the diffraction pattern. The data is generated using simulated Bragg reflections (as described in
Section 3) while incorporating practical experimental and sample effects as signal processing operations. These operations are carefully designed to capture the typical characteristics of “noise” and the measurement specifics of a standard laboratory constant-wavelength X-ray diffractometer. We emphasize that this approach is a crucial aspect of our work and will be applied in subsequent experiments, as we explore the feasibility of training our models entirely on simulated XRD data.
To begin with, we employ a convolutional neural network (CNN) to process the input diffraction pattern, treating it as a one-dimensional image. The architecture is based on ResNet [
18] by He et al., a popular deep-learning framework designed to mitigate the vanishing gradient problem in deep neural networks. The model incorporates multiple residual blocks, which feature skip connections to facilitate the flow of gradients during backpropagation. These skip connections allow the network to learn residual mappings instead of direct mappings, improving convergence and enabling the training of deeper architectures.
Furthermore, we design the model to simultaneously predict the crystal system (CS), extinction group (EG), and space group (SG). The model employs a multi-prediction head architecture: the CS head predicts the crystal system (7 classes), the EG head predicts the extinction group (101 classes), and the SG head predicts the space group (230 classes in total), utilizing the output from the CS head’s prediction. The mapping from CS to SG is injective and follows the crystallographic structure, that is, triclinic: SG, monoclinic: SG, orthorhombic: SG, tetragonal: SG, rhombohedral/trigonal: SG, hexagonal: SG, and cubic: SG.
In total, the model features 8 prediction heads, of which 7 are for the CS and SG, and 1 for the EG. This design introduces an “inductive bias,” aligning the model’s architecture and its inherent structure of the task, as commonly discussed in machine learning literature. Each classification head outputs a set of probabilities
pCS,
pEG,
pSG over the respective number of classes
, by utilizing a softmax function.
The loss function at each classification head uses a cross-entropy (
) loss which compares the predicted probabilities
to the true labels
.
The final loss function is a weighted average of . The weights are treated as hyperparameters and tuned independently of each other during training.
The XRD data used for training and testing are generated through a simulation pipeline beginning with pre-computed ideal Bragg peak positions and calculated intensities for a copper radiation wavelength of = 1.5418 Å, similar to the previous experiment. These values are then utilized to construct the full profile XRD pattern through a series of parameterized signal processing operations, which we will now outline. Most of these operations are inspired by instrument effects, while some are intended to facilitate and validate a more thorough generalizability.
We begin with a simple zero shift, shifting the entire pattern along the axis by a small (maximally ) amount, simulating errors commonly introduced by improper calibration of the instrument.
Next, we add a very small amount of random-like x-axis white noise to each peak position ().
Peak cropping and padding: The peaks at the tails (high and the low ) are randomly cropped, i.e., the are replaced with zeros.
Noise to the integrated intensities is also introduced, simulating all sorts of effects, e.g., of non-ideal detectors. Operations that simultaneously vary the position and integrated intensity effectively model variations due to different experimental effects. For example, changes in integrated intensities can at least partly represent the impact of micro-strain anisotropies, preferred orientations in the powder sample, wavelength fluctuations, as well as Lorentz and polarization factors.
Binning is performed over the = 0–120 range using a fixed bin width and a copper radiation wavelength of . While this might seem trivial, it requires careful consideration. The peak positions depend on the radiation wavelength via Bragg’s law, and the choice of binning affects intensity values. To enable the model to generalize across different radiation wavelengths, one could use the -spacing for peak positions. Due to the non-linear relationship between -spacing and , however, variations in the binned intensities become highly pronounced and cannot be sufficiently accounted for by adding noise to the integrated intensities before binning. Our investigation shows that the variation over the binned intensities is more reasonable when binning over for a specific being equivalent to the wavelength used for training. For inputs with a different radiation wavelength , the measured value can be easily converted to the corresponding as of the training wavelength via the d-spacing, namely . Please note that this might truncate high values for the case < .
Small impurity peaks: small amounts of random like impurity peaks whose intensities are smaller or comparable to the smallest Bragg reflection, but higher than the noise level in a diffraction pattern, are added. This acts as a type of compositional noise in the XRD profile.
Convolving a peak profile (): a
pseudo-Voigt profile with peak asymmetry is convolved across the binned diffraction pattern. This is inspired by the profile used by the CW-XRD refinement program in GSAS-II [
16], although here we are simply interested in a function form that offers reasonable variations of the profile and not its precise fitting capabilities. The full-width-at-half-maximum (FWHM) is inspired by the Caglioti [
10] functional form and is presented in
Table 1; here, the parameters
are sampled using Latin-Hypercube sampling [
19]; the
pseudo-Voigt profile is a linear combination of a Gaussian and a Lorentzian using the same (FWHM) weighted by the mixing parameter
η; asymmetry is introduced by an error function applied over the
pseudo-Voigt profile.
Overall noise: a background noise is added to the XRD profile. This contains a combination of white noise and intensity-dependent noise.
Cropping and Padding: Finally, the edges of the XRD profile are cropped randomly and padded with zeros. This is to facilitate generalizability over cases where the edges of the XRD pattern need to be cropped due to extreme background radiation.
Figure 2 shows the effects of some of these operations on the diffraction pattern. Notably, we consider the aforementioned operations to be parameterized by random variables. For most cases, these random variables are sampled from a uniform distribution with reasonable ranges and are detailed in
Table 1. The simulated
training set samples these parameters from a distribution that does not overlap with those used in the
test set. This allows us to test the model’s invariance over the effects of the aforementioned operations. We call this the
invariance test set. We carry out this pipeline with 90% of the ICSD crystal structures (≈ 180,000 structures), along with our parameter sampling strategy to simulate over 1 million XRD patterns for the
training set and about 360,000 XRD patterns for the
invariance test set. We also have the usual
equivariance test set that simply uses XRD patterns simulated from 10% of the ICSD structures (≈ 20,000 structures) that are kept separate from those used in training. We then simulate ≈ 40,000 XRD patterns by sampling using the
training set simulation parameters.
Together this follows our robust testing strategy discussed earlier in
Section 3. The
equivariance test set is also considered a validation set, which is used to track the training progress and stop training when this accuracy starts to decrease to prevent the model from overfitting.
For the equivariance test set, the classification accuracy of the CNNstatic model for the crystal system (CS), extinction group (EG), and space group (SG) is 89%, 82%, and 79%, respectively. However, for the invariance test set, the classification accuracy drops significantly to 40% for CS, 33% for EG, and 24% for SG. These results indicate that the model struggles to generalize effectively when faced with variations introduced by the aforementioned experimental effects.
Additionally, we also test the model in our experimental test set, containing 82 diffraction patterns whose wide variety of background radiation is described manually for each pattern. It is prudent to mention here that even after the background subtraction, there is often a significant amount of background noise. The testing done here can therefore validate the effectiveness of our simulation pipeline in modelling effects due to significant background noise. However, the classification accuracy for CS, EG, and SG on the experimental test set is only 22%, 15%, and 13%, respectively.
These results demonstrate that despite extensive efforts to model the signal processing details in the simulation pipeline and to train a CNN model accurately, the performance remains unsatisfactory. Any further attempts to tune the model would likely lead to an overfitting to the specific training data distribution.
4.4. CNN with Augmentations
This computational experiment approaches the challenge of designing a data-driven powder XRD indexing algorithm from a novel perspective. To begin, we draw on established ontology in the field of machine learning (ML), particularly concerning the relationship between inputs and outputs in such models.
In traditional ML domains such as computer vision and natural language processing, models are categorized based on the nature of the dataset:
Supervised Learning: Both inputs and outputs are fully labeled for all instances in the dataset.
Unsupervised Learning: Outputs are entirely unlabeled, and the model discovers patterns or structures in the data without explicit guidance.
Semi-Supervised Learning: A portion of the data is labeled, while the rest remains unlabeled.
Self-Supervised Learning: Partial relationships between inputs and outputs are leveraged during different stages of the training pipeline. Self-supervised models generate pseudo-labels or pretext tasks to aid in learning meaningful representations.
In the context of indexing XRD measurements—or structure determination more broadly—we encounter a unique challenge, namely, the lack of sufficient real experimental data for training a reliable ML model, as alluded to already. This necessitates the use of simulated data derived from known crystal structures. Unlike conventional ML applications, this problem involves a peculiar inversion where the “output” (crystal structure information) is known, but the “input” (the XRD pattern) must be generated or simulated from the output.
Furthermore, certain aspects of the input—such as noise, background variation, and experimental inconsistencies in XRD patterns—cannot be reliably simulated from the crystallographic information file alone. This can be pictured using an analogy: training an ML model on synthetically simulated diffraction patterns and expecting it to perform robustly on real experimental data is akin to training a model to distinguish between trees with and without leaves using only simplistic, childlike drawings of trees.
This analogy underscores the inherent challenges and complexity of the task at hand, driving the exploration of self-supervised learning techniques to bridge the gap between synthetic training data and real-world experimental data. In this section, we start to address this gap by designing a training pipeline that incorporates self-supervised learning principles to enhance the model’s robustness and generalizability.
We build upon the model presented in the previous experiment section (CNNstatic) by making targeted modifications while retaining the core model architecture, which features multiple classification heads and employs the same loss function. The primary change lies in how the data-generation pipeline is integrated into the training process.
While the Bragg peaks (both positions and integrated intensities) are still preprocessed before training, the signal-processing operations generating the full-profile diffraction patterns are now incorporated directly into the training loop as data augmentations. These operations, previously treated as fixed preprocessing steps, are dynamically applied during training. By embedding these augmentations into the training process, we simulate the variability and imperfections present in real-world diffraction patterns, allowing the model to better generalize across diverse experimental conditions. This approach emphasizes the importance of maintaining flexibility in the simulated data pipeline while aligning with the principles of self-supervised learning to enhance the model’s robustness against natural perturbations in the input data. We call this model CNNaug.
Nonetheless, we use the same testing strategies as in the previous model (CNNstatic). For the CNNaug model, the classification accuracy of the crystal system (CS), extinction group (EG), and space group (SG) on the equivariance test set is 90%, 83%, and 81%, respectively. The accuracy on the invariance test set is significantly lower, however, with 45% for CS, 35% for EG, and 28% for SG. Similarly, the results on the experimental test set remain low, with classification accuracies of 23% for CS, 16% for EG, and 13% for SG.
The results of these experiments show similar trends compared to the supervised learning model, CNNstatic, with only slight improvements in mitigating overfitting. However, this experiment establishes the groundwork for the final self-supervised learning model proposed in this paper, paving the way for a more robust approach to addressing the challenges highlighted so far.
4.5. Self-Supervised Contrastive Representation Learning
In this section, we introduce a methodology that incorporates the principles of representation learning within the previously described framework of self-supervised learning and finally present our model
DIFCON. To achieve this, we modify the model architecture to include a representation-learning head (RH) positioned before the CS, EG, and SG classification heads. The RH is designed to optimize a contrastive learning objective, enabling the model to learn more robust and generalizable representations of the diffraction patterns. The fundamental idea behind using a contrastive learning [
20,
21] approach is to learn meaningful representations that model the previously discussed invariances and equivariances, by distinguishing between similar (positive) and dissimilar (negative) data samples. The central idea is to map similar inputs closer together in the learned feature space while pushing dissimilar inputs farther apart. For this application, as illustrated in
Figure 3, a positive pair consists of two simulated diffraction patterns originating from the same crystal structure but different augmentation parameters, whereas a negative pair consists of diffraction patterns corresponding to different crystal structures.
The contrastive objective function encourages the model to focus on structural features being invariant to noise, sample variations, and experimental perturbations, while simultaneously maximizing the separability of different crystal structures. This design aligns with the overarching goal of achieving both equivariance to structural differences and invariance to experimental effects, as discussed in
Section 3.
By leveraging contrastive representation learning, the RH generates embeddings that capture essential features of the input diffraction patterns, which are subsequently passed to the downstream classification heads for CS, EG, and SG predictions. This approach not only strengthens the model’s ability to generalize across simulated and real-world data but also facilitates more accurate indexing by learning a feature space that mirrors the underlying crystallographic distinctions.
We consider two distinct contrastive learning approaches for
DIFCON:
SimCLR [
20] and
Barlow Twins [
21]. Both approaches aim to learn robust representations but differ significantly in their objectives and optimization strategies.
SimCLR relies on a contrastive loss function called NT-Xent (Normalized Temperature-scaled Cross Entropy Loss). It uses positive pairs (augmented views of the same sample) and negative pairs (views of different samples) to define the loss. In the context of XRD, we adapt the SimCLR approach by using diffraction-specific augmentations, such as noise injection, random peak shifting, and impurity peak addition, to create positive pairs. Negative pairs are generated using diffraction patterns from different crystal structures.
The Barlow Twins method, in contrast, eliminates the need for explicit negative pairs. It introduces a redundancy-reduction loss that aligns positive pairs while discouraging redundancy in the feature space. Specifically, the method aims to make the cross-correlation matrix of embeddings from positive pairs as close to the identity matrix as possible. By reducing redundancy, Barlow Twins ensures that each dimension of the learned representation captures unique information. A notable advantage of this method is its computational efficiency, as it does not rely on large batch sizes or negative samples. It does, however, require a much higher dimensional feature vector.
Using the
SimCLR approach to train the RH we observe a nice pattern when looking at the cosine similarities of the learned feature representation for positive and negative pairs.
Figure 4 shows the cross-correlation matrix between the feature representations of 100 randomly selected crystal structures from the ICSD test phases, each of which was simulated using sampling strategies of the
invariance test set, to produce eight different experimental effects—such as zero-shift,
x &
y axis noise, impurity peaks, and peak profile variation— and arranged consecutively. Ideally, this should follow a block diagonal structure. The figure compares this matrix to the corresponding matrix for
CNNaug using cosine similarities of the internal features learned in its penultimate layer.
This motivates our approach of using a contrastive learning objective to train the Representation Head (RH). In practice, however, training with the SimCLR approach requires an extremely large batch size to generate a sufficiently diverse set of negative pairs. This results in significantly longer training times for the RH to converge to an acceptable level of performance, particularly when trained jointly with the classification heads for the crystal system (CS), extinction group (EG), and space group (SG).
In contrast, we observed that the Barlow Twins approach, which does not rely on a large batch size of negative pairs, enables faster convergence of the RH and classification heads when trained end-to-end. Moreover, the Barlow Twins method demonstrates better classification performance on the invariance test set, likely because it can leverage more positive pairs within each training batch. Based on these observations, we adopted the Barlow Twins approach to train the final model presented in this paper, DIFCON.
For the equivariance test set, DIFCON achieves classification accuracies of 88%, 80%, and 78% for the CS, EG, and SG tasks, respectively. On the invariance test set, DIFCON shows a marked improvement, achieving classification accuracies of 79%, 71%, and 66% for CS, EG, and SG, respectively. These results highlight a significant improvement in the model’s robustness and invariance to experimental effects simulated by our augmentation pipeline.
When tested on the experimental test set, DIFCON also shows notable advancements. The model correctly classifies the crystal system in 61 out of 82 instances (74%), which represents a significant improvement over both the earlier data-driven models and the results reported with NTREOR, too. Additionally, DIFCON achieves classification accuracies of 48% (39 out of 82) and 41% (34 out of 82) for the EG and SG tasks, respectively.