Benchmark Synthetic Training Data for Artificial Intelligence-based Li-ion Diagnosis and Prognosis

Abstract: Accurate lithium battery diagnosis and prognosis are critical to increasing the penetration of electric vehicles and grid-tied storage systems. Both are complex because of the intricate, nonlinear, and path-dependent nature of battery degradation. Data-driven models are anticipated to play a significant role in the behavioral prediction of dynamical systems such as batteries. However, they are often limited by the amount of training data available. In this work, we generated the first comprehensive big-data synthetic datasets to train diagnosis and prognosis algorithms. The proof-of-concept datasets are over three orders of magnitude larger than what is currently available in the literature. With benchmark datasets, results from different studies could be easily equated, and the performance of different algorithms can be compared, enhanced, and analyzed extensively. This will expand critical capabilities of current AI algorithms, tools, and techniques to predict scientific data.


In recent years, artificial intelligence (AI) has attracted a lot of attention for energy applications [1–4]. For the diagnosis and prognosis of lithium-ion batteries (LiBs), one bottleneck is training data that are neither plentiful enough nor representative of the projected sporadic usage [3,5]. This can hinder performance, as models can only be as good as the data they were trained with. Most studies had training datasets below 20 samples. Among the studies with more [5–12], the study by Severson et al. [5,11] stands out with 124 different conditions tested, although only charging conditions were varied. Even though online databases [5,13–18] are a step in the right direction, they remain vastly insufficient because LiB degradation is path-dependent and small changes in conditions were shown to lead to drastic differences in durability [19]. This path dependence is an essential aspect to consider for the validation of online diagnosis and prognosis tools [20,21]. With a limited set of training data, the universality of a diagnosis and prognosis tool cannot be proven.

The need for high-throughput computational data generation was highlighted in recent reviews [1,2], although no effort towards computational cycling training data has been reported. In this work, we report the first synthetic benchmark training datasets that encompass the entire degradation spectrum. This is done using computer-assisted voltage curve generation to remove the need for lengthy and costly experimental campaigns. The datasets are computed using the mechanistic modeling approach we pioneered, along with other groups, in the mid-2000s [22–25]. The approach has been well validated [26–28] and has been used intensively in recent years with the rise in popularity of electrochemical voltage spectroscopies (EVS) [29,30]. The use of the mechanistic approach will enable the creation of training datasets several orders of magnitude larger than the current ones, encompassing all possible degradation scenarios. This is especially important [...]

[...] modeling approach where the input is the degradation and the output is the cell's voltage and capacity. This makes the method well suited to generating training data and more efficient than electrochemical models because there is no need to find actual physical phenomena that could lead to a given degradation. The initial setup is also simpler: no complex parameterization is needed, and the only prerequisite is half-cell data versus a reference electrode for both electrodes. In 2017, we proposed using this approach to perform sensitivity analyses by simulating a wide range of hypothetical degradation scenarios [20]. This work builds on our previous effort to enable the creation of comprehensive training data.

In this work, we focus on diagnosis and prognosis, two applications that require different sets of training data. To illustrate the difference between the two requirements, one can think of two time-dependent processes, one linear and one exponential, that intersect. At the intersection, the diagnosis will be the same, but the prognosis needs to be different. In addition, we also address another hurdle for AI algorithms: the quest for meaningful learnable parameters. Although some studies seem to fit the full constant-current voltage data [33], most studies favor the use of features of interest (FOI) and only focus on a specific part of the electrochemical response. This could be capacity and resistance evolution [10,34–37], curvature [38,39], sections of the voltage response [8,40–42], electrochemical impedance spectroscopy [43,44], variance [5,11], or EVS [33,45–48]. The latter has attracted a lot of attention in recent years since the early work on the technique [22,49,50].
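The linear-versus-exponential illustration above can be made concrete with a minimal numerical sketch. All numbers here (the diagnosis cycle, the shared 5% capacity loss, and the exponential rate `k`) are hypothetical values chosen for illustration and are not taken from the datasets.

```python
import math

EOL_LOSS = 0.20    # end-of-life criterion: 20% capacity loss
DIAG_CYCLE = 500   # hypothetical cycle at which both paths are diagnosed
DIAG_LOSS = 0.05   # hypothetical capacity loss shared at that cycle

# Linear path: loss(n) = rate * n, forced through the diagnosis point.
rate = DIAG_LOSS / DIAG_CYCLE

# Exponential path: loss(n) = amp * (exp(k * n) - 1), with a hypothetical k,
# forced through the same diagnosis point.
k = 0.004
amp = DIAG_LOSS / math.expm1(k * DIAG_CYCLE)


def linear_loss(n: float) -> float:
    """Capacity loss of the linear path at cycle n."""
    return rate * n


def exponential_loss(n: float) -> float:
    """Capacity loss of the exponential path at cycle n."""
    return amp * math.expm1(k * n)


# Cycle at which each path reaches the end-of-life criterion (closed form).
eol_linear = EOL_LOSS / rate
eol_exponential = math.log1p(EOL_LOSS / amp) / k

print(f"loss at cycle {DIAG_CYCLE}: "
      f"linear={linear_loss(DIAG_CYCLE):.4f}, "
      f"exponential={exponential_loss(DIAG_CYCLE):.4f}")
print(f"cycles to 20% loss: linear={eol_linear:.0f}, "
      f"exponential={eol_exponential:.0f}")
```

Both paths return the same capacity loss at the diagnosis cycle, yet under these assumed parameters the exponential path reaches 20% capacity loss much earlier than the linear one. A single diagnosis snapshot therefore cannot separate the two; the training data for prognosis must carry information about the trajectory.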

Still, the correlation of the variations with degradation is not trivial and requires significant sensitivity analysis to derive universal parameters [20]. Some FOIs proposed in the literature were proven not to be applicable outside of the tested data [21], which illustrates the need for more extensive training sets and proper sensitivity analysis of the learnable parameters. With controlled and complete training, the theoretical framework to establish relationships between data and models can be explored in much more detail, and results from different studies could be easily equated.

The purpose of this publication is to introduce a big data methodology and showcase its possibilities. The optimization of the technique to reduce computation time and its integration with AI algorithms are outside the scope of this work.

The mechanistic approach conceptually follows the one described by Christensen and Newman to simulate cell degradation in the early 2000s [31,32]. Instead of computationally intensive electrochemical models to simulate half-cell electrode behavior, the mechanistic approach [22–25] uses experimental half-cell data for each electrode.

While the active materials are considered stable with aging, their quantity, as well as the amount of lithium reacting, will change upon degradation. Degradation will then not affect the electrode OCV curves, but it will impact their matching. If less active material is available, the loading ratio between the electrodes will change. If some reactant is lost, the synchronicity of the electrodes will change. These matching changes can be rendered in the mechanistic approach via some scaling of the electrode curves and a translation of one electrode compared to the other, in other words by changing LR and OFS in Figure 1(a). Figure 2(a) presents the impact of the variation of LR and OFS on cell capacity. It can be seen that there are infinitely many combinations of LR and OFS that can lead to any given capacity loss; this is the path dependence of the degradation. This also explains why measuring capacity will never be a good enough diagnosis of true degradation. Changes in the amount of active material are referred to as loss of active material (LAM). LAM can occur at the PE and NE. Change in the amount of lithium reacting is referred to as loss of lithium inventory (LLI).
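The non-uniqueness of capacity with respect to LR and OFS can be illustrated with a deliberately crude sketch. It is not the mechanistic model itself: it ignores the electrode OCV curves and the voltage cutoffs, and simply takes the cell capacity as the overlap between the two electrode capacity windows. The function name, the normalization to the PE capacity, and the sign convention for the offset are all assumptions made for this example and may differ from those of Figure 1(a).

```python
def overlap_capacity(lr: float, ofs: float) -> float:
    """Toy cell capacity, in units of the PE capacity.

    Assumed convention (hypothetical): the PE spans [0, 1] on a shared
    capacity axis and the NE spans [ofs, ofs + lr], where lr is the
    NE/PE loading ratio and ofs the offset between the electrodes.
    The usable capacity is the length of the overlap of both windows.
    """
    upper = min(1.0, ofs + lr)
    lower = max(0.0, ofs)
    return max(0.0, upper - lower)


# Hypothetical pristine matching: the NE is oversized and fully covers the PE.
pristine = overlap_capacity(lr=1.10, ofs=-0.05)

# Three different (LR, OFS) pairs that give the same 10% capacity loss.
scenarios = {
    "smaller LR, no offset": overlap_capacity(lr=0.90, ofs=0.00),
    "same LR, negative shift": overlap_capacity(lr=1.10, ofs=-0.20),
    "same LR, positive shift": overlap_capacity(lr=1.10, ofs=0.10),
}

for name, capacity in scenarios.items():
    print(f"{name}: capacity loss = {1 - capacity / pristine:.2%}")
```

Even in this simplified picture, distinct electrode matchings are indistinguishable from capacity alone, which is why the voltage response has to be used for diagnosis.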

The thermodynamic degradation of Li-ion batteries can be decomposed into these three main degradation modes. LLI induces translation and the LAMs scaling, Figure 2

The validity of the predicted impact of LAM and LLI on the electrochemical behavior of

For datasets aimed at prognosis, the evolution of each degradation mode needs to be calculated [...] Figure 5(b). In this example, path 2 (green) was simulated with linear variations for the three components of the triplet; its path is, therefore, vertical as the ratio between the degradation modes is constant. Path 1 (blue) had some exponential component for LLI and LAMPE and, therefore, the path is more complicated. Nonetheless, based on the triplets and the diagnosis dataset, the capacity loss, Figure 5(c), and the voltage variations, Figure 5(d), can be deciphered. This approach for calculating a prognosis dataset will be faster than simulating each degradation path one by one as it avoids repeating the same simulations over and over.

It must be noted that the description above was simplified for narrative purposes and was

As a proof of concept, we generated a diagnosis dataset containing more than 500,000 individual voltage vs. capacity curves and a prognosis dataset with more than 130,000 individual degradation paths for a commercial graphite//LFP battery. These datasets are more than three orders of magnitude larger than those currently available in the literature and they could be extended at [...]

The diagnosis training dataset was compiled with a resolution of 0.01 for the triplets and C/25 charges. This accounts for more than 5,000 different paths at the base of the triangle in Figure 4. Each path was simulated with 0.85% increases for each degradation up to 85%. This accounts for 100 simulations per path. The training dataset, therefore, contains more than 500,000 voltage vs. capacity curves and took around 12 h to compute on a standard laptop. The 500 MB dataset is available to download (see Data availability section for access information).
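The dataset size quoted above can be checked with a short counting sketch. It assumes, as an interpretation of the text rather than a description of the actual generation code, that each triplet is a set of three non-negative fractions on a 0.01 grid that sum to 1, and that each path is sampled in 0.85% steps up to 85%.

```python
from itertools import product


def count_triplets(steps: int = 100) -> int:
    """Count triplets (a, b, c) of non-negative multiples of 1/steps with a + b + c = 1.

    Choosing the first two components i/steps and j/steps with i + j <= steps
    fixes the third one, so counting the valid (i, j) pairs is sufficient.
    """
    return sum(1 for i, j in product(range(steps + 1), repeat=2) if i + j <= steps)


n_paths = count_triplets(100)   # resolution of 0.01 for the triplets
n_levels = round(85 / 0.85)     # 0.85% increments up to 85%
n_curves = n_paths * n_levels   # voltage vs. capacity curves in the dataset

print(n_paths, n_levels, n_curves)
```

Under these assumptions the count is 5,151 paths and 515,100 curves, consistent with the "more than 5,000" paths and "more than 500,000" curves stated above.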

The prognosis dataset was harder to define as there are no limits on how the three degradation modes can evolve. For this proof-of-concept work, we considered eight parameters to scan. For each degradation mode, degradation was chosen to follow equation (1).

Considering the three degradation modes, this accounts for six parameters to scan. In addition, two other parameters were added: a delay for the exponential factor for LLI and a parameter for the reversibility of lithium plating. The delay was introduced to reflect degradation paths where plating [...] Table S1. Figure S1(a,b) presents the evolution of parameters p1 to p7. At worst, the cells endured 100% of one of the degradation modes in around 1,500 cycles. Minimal LLI was chosen to be 20% after 3,000 cycles. This guarantees at least 20% capacity loss for all the simulations. For the LAMs, conditions were less restrictive, and, after 3,000 cycles, the lowest degradation is 3%. The reversibility factor p8 was calculated with equation (2)

The prognosis dataset can be used to test the validity of different FOIs proposed in the literature for early prognosis. Figure S4(a,b) presents the evolution of the variance between cycle 1 and cycles 100 and 500, respectively, as a function of cycle life, defined as the cycle at which 20% capacity loss is reached. When considering a multitude of degradation paths, the variance approach seems less effective than proposed in the literature [5,11], with correlation coefficients of -0.56 and -0.43 when calculated after 100 and 500 cycles, respectively. Looking at the capacity loss after 100 cycles, Figure S4(c), the correlation is better (ρ = 0.63). Still, the error could be huge for low capacity loss, as paths with less than 1% capacity loss after 100 cycles were found to induce 20% capacity loss after just 200 cycles. Finally, as predicted [20,21], the evolution of the first LFP plateau capacity (i.e., the area under peak ① in the Gr//LFP IC curves) is not a good indicator, with a Pearson coefficient of 0.51, Figure S4(d). The dataset could be analyzed further by tracking whether specific conditions exist where the FOI is accurate, e.g., the first LFP plateau capacity likely works well when there is little LAMNE [20,21]. This is outside the scope of this publication and will be investigated in future work.
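The kind of FOI screening described above can be sketched as follows. The degradation paths in this example are randomly generated placeholders, not the prognosis dataset, and the functional form and parameter ranges are assumptions chosen only so that every path reaches the end-of-life criterion; the resulting coefficient therefore has no bearing on the values reported here.

```python
import numpy as np


def cycle_life(capacity_loss, threshold=0.20):
    """Return the first cycle index at which capacity loss reaches the threshold.

    Returns -1 if the threshold is never reached within the trajectory.
    """
    reached = np.asarray(capacity_loss) >= threshold
    return int(np.argmax(reached)) if reached.any() else -1


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    return float(np.corrcoef(x, y)[0, 1])


# Placeholder paths: a linear plus an exponential term with random weights.
rng = np.random.default_rng(0)
cycles = np.arange(3001)                      # index equals cycle number
rates = rng.uniform(1e-4, 5e-4, size=200)     # hypothetical linear rates
gains = rng.uniform(1e-4, 1e-2, size=200)     # hypothetical exponential gains
paths = rates[:, None] * cycles + gains[:, None] * np.expm1(cycles / 1000.0)
paths = np.minimum(paths, 1.0)                # capacity loss cannot exceed 100%

# FOI under test: capacity loss after 100 cycles, compared with cycle life.
early_loss = paths[:, 100]
lives = np.array([cycle_life(p) for p in paths])
rho = pearson(early_loss, lives)

print(f"Pearson coefficient between early loss and cycle life: {rho:.2f}")
```

Replacing `early_loss` with another candidate FOI (for example, a variance-based feature computed from the voltage curves) would allow the same comparison across features, provided the feature is computed for every path.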

The prognosis dataset can also be used to validate diagnosis techniques. For example, the method proposed in [20] can be tested. As shown in Figure S2, individual FOI information is not enough to reach a proper diagnosis, but their combined variations might be enough. Figure S5 presents the diagnosis results based on the combined variations of 3 FOIs for diagnosis after 100, 200, 500, and 1,000 cycles for the 130,000 computed duty cycles. Out of the more than 500,000 points tested (4 × 130,000), the average errors for estimation of LLI, LAMPE, LAMNE, and capacity loss were -0.32%, -2.73%, -0.46%, and -0.33%, respectively, with standard deviations below 1% for all but LAMPE (6%) for diagnosis up to the 500th cycle. The distribution of the results is presented in Figure S5 and the different statistics are summarized in Table S2. For all but LAMPE, most of the errors lie between -1 and 1%, with the minimum average error recorded for the diagnosis after 500 cycles and the maximum for the diagnosis after 1,000 cycles. The spread for LAMPE is larger, between -2 and 10%, which was expected since LAMPE is notoriously hard to quantify for LFP cells because of the single voltage plateau [25]. The average LAMPE error decreased with cycle number to 0.74% after 1,000 cycles. Looking in more detail, the maximum and minimum errors were always recorded for the diagnosis at 1,000 cycles, with maximum and minimum errors mostly below 10% for all the other cycles (except for LAMPE). Future work will investigate the specific combinations that led to large errors, although our previous work [20] showed that it was for degradations unlikely to

The results from Figure