Where Fault Detection and Diagnosis Meets MPC Performance Assessment: Review and Case Study of an Integrated Framework

Elizabeth V. Melo; Argimiro R. Secchi; Maurício B. de Souza Jr.

doi:10.20944/preprints202605.1971.v1

Submitted: 27 May 2026

Posted: 28 May 2026

Abstract

Various methodologies have been developed over the years for assessing model predictive controller (MPC) performance. However, few are applied in industry, and they remain inefficient in providing a rapid indication of the causes of deteriorated control behavior. This article aims to review methodologies available in the literature addressing these challenges. Additionally, it proposes a structure in which the MPC performance assessment is carried out within a fault detection and diagnosis (FDD) framework. The integrated approach employs cascaded modules of machine learning (ML) binary classifiers arranged in a sequence that mimics the decision-making logic of an operator. To illustrate the integrated strategy both conceptually and operationally, the van de Vusse reactor, controlled by a nonlinear model predictive controller (NMPC), was adopted as a case study. The ML models used were Random Forest, Multilayer Perceptron, and Recurrent Neural Networks. The results showed that the models can correctly distinguish the cause of abnormality, even in the presence of noise in the measurements. Different ML models performed best for different diagnostic tasks, highlighting the flexibility of arranging models according to their most suitable application. The investigation indicated that the proposed ML-based FDD framework, which embeds control performance assessment, is competitive for control-aware diagnosis of MPC-controlled processes.

Keywords: machine learning; model predictive control; control performance assessment; control-aware FDD

Subject: Engineering - Chemical Engineering

1. Introduction

Model Predictive Control (MPC) is a prominent control strategy in industry due to its ability to handle system constraints, anticipate process behavior, and operate on multivariable systems. The use of a process model also makes the control system more robust to uncertainties [1,2].

However, this generation of controllers is susceptible to performance issues, just like any other kind of controller. The main challenge in monitoring control loops is distinguishing between control-related causes of poor performance and those associated with disturbances, instrumentation limitations, or the process itself. Among the control-related causes, poor tuning is one of the most common problems in control loops [3,4]. Although the expected influence of each tuning parameter in the MPC formulation is known, it is not easy to find the combination of parameter values that achieves a desired controller performance [5].

Moreover, a successful implementation of MPC, since it uses models to predict process behavior, also depends on the accuracy of process models [4,5,6]. When a model predictive controller starts to perform poorly due to model-plant mismatch, it may be difficult to determine which submodel(s) need to be updated. Often, a whole model identification and updating is unfeasible, since it is one of the most time-consuming activities during the implementation of predictive controllers, and it may cause unnecessary disturbances to the process. Most industrial MPC packages include model identification tools that can help the user to develop models needed for MPC based on plant data. However, identification software can suffer from poor workflow, requiring many steps, which increases opportunities for user mistakes [5,7].

To help operators and engineers identify whether a control loop is performing poorly, the field of Control Performance Assessment (CPA) has been developed to study metrics and methodologies for monitoring control loops. Different metrics have been proposed over the years, aiming to monitor different types of controllers. For linear MPC, one of the main contributions was the Harris index, which compares the variance of the process with the variance predicted by Minimum Variance Control theory [8]. However, that metric has some limitations. According to Bauer et al. [3], minimum variance-based indices, such as the Harris index, do not have the same impact in the industrial community as they have in the academic control community, because industrial data often contain non-stationary signals.

One of the main conclusions of a survey conducted by Bauer et al. [3] is that the simpler the method, the greater the number of respondents who find it useful. This means that CPA techniques need to be as simple as possible to guarantee good acceptance. Furthermore, even though MPC is becoming popular in the industry, there is still some resistance by operators to use it, due to its complexity. Operators need to have confidence to decide whether the MPC is underperforming [5]. The existence of an automated system that detects and diagnoses the controller’s performance may increase the acceptance of this kind of controller.

For this reason, Machine Learning (ML) techniques become interesting in the CPA context. ML can both assess and diagnose the control performance at once [9]. Moreover, the size and complexity of industrial data have increased significantly over the years, especially considering variable interactions in a multivariable process. Some studies have already been conducted on the use of ML techniques applied to CPA. However, the number of publications remains small given the development of machine learning, especially in the context of MPC.

This article aims to elucidate the different methodologies available for MPC performance assessment. The classical turning point for MPC performance monitoring, together with statistical methods used both for Fault Detection and Diagnosis (FDD) and CPA, is explained. Since model mismatch identification is one of the main challenges in MPC applications, methodologies focused on this issue are also elucidated. Finally, studies conducted using ML algorithms for MPC performance monitoring and assessment are presented. Moreover, a control-aware FDD methodology using ML models to detect and diagnose the closed-loop health of model predictive controllers is proposed. The specific objectives include evaluating the capability of the methodology to classify precisely whether the closed loop is healthy or affected by instrumentation and equipment issues, poor controller tuning, or plant-model mismatch.

This article is organized as follows: Section 2 presents a qualitative bibliometric analysis of the development of CPA techniques, especially regarding ML tools, together with a review of the methodologies developed over the years for control performance assessment and their integration with FDD; Section 3 describes the methodology used to obtain the results presented in Section 4; finally, Section 5 presents the main conclusions of this study.

2. Methodologies for Control Performance Assessment: Review

2.1. Bibliometric Analysis

This analysis aims to identify the thematic relevance over the years in the literature, based on the number of publications and their distribution per year. It is essential to note that the results presented in this section are for qualitative analysis only. The database used was the Web of Science, and different queries were used in the advanced search to plot the results. This database was chosen for its selectivity and data quality, since the Web of Science makes use of metrics to filter out articles with low impact [10]. Since the focus is on qualitative analysis, the macro trends are expected to be the same for other databases. The analysis presented in this section refers to the values found on March 27th, 2026.

To analyze the interest in control performance monitoring and assessment, for all kinds of controllers, the query ALL=("control* performance monitoring" OR "control* performance assessment" OR "control loop performance assessment") was used, returning 495 results and the distribution presented in Figure 1. The results indicate an increase in publication activity from the 1990s to the 2010s, with the latter representing the period of highest average output. This reflects the growing interest in control performance monitoring. In the current decade, the average number of publications remains higher than in the early 2000s, but it is lower than the peak observed in the 2010s. However, the current decade is still incomplete.

By adding the clause AND ("model predictive control*" OR "MPC" OR "NMPC") to the previous query to restrict the results to model predictive controllers, the results drop to 79 publications, as can be observed in Figure 2. A similar decade-wise trend is observed when restricting the analysis to MPC-based approaches, with an increase in publication activity from the 2000s to the 2010s. However, the overall number of publications is significantly lower, reflecting the more specific scope of MPC within the broader field. Thus, there is still room for further studies on the performance assessment of this kind of controller, and the low count may reflect the challenge in finding methodologies for monitoring model predictive controllers.

To analyze the presence of machine learning (ML) techniques in research about control performance monitoring, the query ALL=(("machine learning" OR "artificial intelligence" OR "neural network") AND ("control* performance monitoring" OR "control* performance assessment" OR "control loop performance assessment")) was used. Just 33 publications were found, with the distribution presented in Figure 3. From those publications found, only 4 include "model predictive control*" OR "advanced control*" OR "MPC" OR "NMPC" in their text.

However, when the number of publications for machine learning applied to fault detection and diagnosis is analyzed, by using the query ALL=(("machine learning" OR "artificial intelligence" OR "neural network") AND ("fault detection and diagnosis" OR "fault detection" OR "fault diagnosis")), the results show a total of 25,717 publications. Figure 4 shows an increasing trend in the number of publications in this field, reflecting its popularity and the interest in using ML tools to detect and diagnose faults in different applications.

The integration of control performance assessment and FDD is also uncommon in the literature. By using the query ALL = (("fault detection and diagnosis" OR "fault detection" OR "fault diagnosis") AND ("control* performance monitoring" OR "control* performance assessment")), only 38 articles were found, with the distribution presented in Figure 5. From those results, only 10 include "predictive control*" in their text. This shows that there is potential to explore FDD methodologies for MPC performance assessment, which is still underexplored.

The results presented so far show that, even though there is an increased interest in studying predictive controllers and ML applied to FDD, there are still few studies applied to the performance assessment of predictive controllers, especially using ML algorithms. According to Özdemir and Yıldırım [11], many ML algorithms have been used for decades without referring to ML explicitly. Venkatasubramanian et al. [12], for instance, discussed the use of artificial neural networks for fault detection and diagnosis without referring to it as a machine learning methodology.

Hence, variations in the terminology used to describe machine learning methods could potentially influence the results of the bibliometric analysis. However, when a broader query was used (ALL=(("machine learning" OR "artificial intelligence" OR "neural network" OR "support vector machine" OR "random forest" OR "boosting") AND ("control* performance monitoring" OR "control* loop performance monitoring" OR "closed*loop performance monitoring" OR "control* performance assessment" OR "control* loop performance assessment" OR "closed*loop performance assessment"))), the number of publications increased only from 33 to 36.

This indicates a lack of investigation into the technical feasibility of using ML algorithms applied to the performance assessment of predictive controllers. There may be several reasons why there are so few studies about using ML algorithms to monitor and assess predictive controllers. Firstly, there is the challenge of labeling and distinguishing the "normal" dataset from the "abnormal" one. An instrumentation problem is more evident and easier to correct than an inappropriate tuning setting in an MPC, given the complexity of the controller.

Moreover, when a loss of performance in an MPC is detected, it may be more interesting for the industry to correct it right away instead of identifying the issue. This is reflected in the number of publications mentioning adaptive control, which exceeded 220 results in 2025 alone (ALL=("model predictive control*" AND "adaptive control*")).

Even though there is a high interest in adaptive controllers, it is still important to evaluate the technical feasibility of using ML algorithms applied to the performance assessment of predictive controllers. For this reason, this work was developed to fill some of those gaps in the literature, both by treating MPC performance degradation as a type of fault and by using ML models for the detection and diagnosis of root causes. The next sections present a more detailed analysis of the methodologies available for monitoring and assessing predictive controllers.

2.2. Methodologies for MPC Monitoring and Detecting Performance Degradation

The model predictive controller, since its creation, has also been a subject of concern regarding its performance. Throughout the years, some methodologies have been developed to detect and diagnose the underperformance of this kind of controller. Many control performance monitoring methods originate from Fault Detection and Diagnosis (FDD), and most data-driven control performance monitoring methods rely on the use of normal process data to build normal-case process models.

2.2.1. Minimum Variance Control Based Metrics

The Harris Index, first introduced by Harris [8] and later improved by Desborough and Harris [13], was the turning point for MPC performance monitoring. It is based on the Minimum Variance Control (MVC) benchmark. Åström [14] proposed the MVC as a control strategy and benchmark that should be achieved. Its principle is based on the fact that, once the best achievable performance is reached, the variance of the controlled variables cannot be reduced by changing the tuning or the type of control strategy. The Harris Index uses MVC theory as a lower-bound benchmark for assessing the performance of control loops. The index is defined according to Equation 1, in which $\sigma_{MVC}^{2}$ is the theoretical minimum variance of the process, $\sigma_{Process}^{2}$ is the actual variance observed in the process, and $\mu_{Process}$ is the mean absolute error from the setpoint.

$$I_{Harris} = 1 - \frac{\sigma_{MVC}^{2}}{\sigma_{Process}^{2} + \mu_{Process}} \tag{1}$$

To estimate the theoretical minimum variance of the process $\sigma_{MVC}^{2}$ from process data, Harris [8] used an auto-regressive moving average (ARMA) model [15]. The reason for using the ARMA model is to separate the predictable from the unpredictable variance in data, without the need for a physical model of the plant. Under minimum variance control, the controlled variable $y$ can be written based on its past values and on its moving average time series model, according to Equation 2. The parameters $\varphi$ and $\theta$ are the auto-regressive (AR) and the moving average (MA) parameters of the model, respectively.

$$y(z) = \varepsilon(z) + \sum_{i=1}^{p} \varphi_{i}\, y(z - i) + \sum_{i=1}^{q} \theta_{i}\, \varepsilon(z - i) \tag{2}$$

The variable $\varepsilon$ represents the error between the ARMA model and the process data, considered to be the white noise from the process. The parameter $p$ is the AR order of the model, while the parameter $q$ is the MA order. Harris [8] suggests using the autocorrelation function of $y$ to choose the order of the ARMA model. The parameter $p$ should be chosen as the lag beyond which the autocorrelation function is zero. The order $q$, on the other hand, should be equal to $p - 1$. The minimum variance of the process $\sigma_{MVC}^{2}$ is then defined according to Equation 3, in which $\sigma_{\varepsilon}^{2}$ is the noise variance and $d$ is the dead time of the system.

$$\sigma_{MVC}^{2} = \sigma_{\varepsilon}^{2} \left(1 + \sum_{i=1}^{d-1} \varphi_{i}^{2}\right) \tag{3}$$
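As an illustration of how such an index might be estimated from routine operating data, the following is a minimal sketch, not the procedure of Harris [8]. It makes several assumptions that are not in the original text: a high-order AR model fitted by least squares stands in for the ARMA model, the data are expressed as deviations from the setpoint with zero mean (so only the variance term remains in the denominator of Equation 1), and the sum runs over the first $d$ impulse-response coefficients of the fitted model, with the leading coefficient equal to 1. The function names `fit_ar`, `impulse_response`, and `harris_index` are hypothetical.

```python
import numpy as np


def fit_ar(y, order):
    """Least-squares fit of y(t) = sum_i phi_i * y(t - i) + eps(t)."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    # Column i holds the signal delayed by i + 1 samples.
    lagged = np.column_stack(
        [y[order - i - 1 : len(y) - i - 1] for i in range(order)]
    )
    target = y[order:]
    phi, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    residual = target - lagged @ phi
    return phi, residual.var()


def impulse_response(phi, n_terms):
    """First n_terms impulse-response coefficients of the AR model (psi_0 = 1)."""
    psi = [1.0]
    for k in range(1, n_terms):
        psi.append(sum(phi[i] * psi[k - i - 1] for i in range(min(k, len(phi)))))
    return np.array(psi)


def harris_index(y, dead_time, order=10):
    """1 - (estimated minimum variance) / (observed variance), zero-mean case."""
    phi, noise_var = fit_ar(y, order)
    psi = impulse_response(phi, dead_time)
    mv_var = noise_var * np.sum(psi**2)  # variance that no feedback can remove
    return 1.0 - mv_var / np.var(y)
```

With this convention, a value close to 0 suggests that the loop is already near the minimum variance benchmark, while a value close to 1 suggests that a large share of the observed variance could in principle be removed. The AR order and the dead time must be supplied by the user, and a poor choice of either changes the result.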

For the performance assessment of multivariable controllers, Harris et al. [16] extended the approach by proposing a global performance index, as well as individual indices for each controlled variable. This approach has to take into consideration the coupling between variables. Since each controlled variable has a different dead time with respect to each manipulated variable, the interactor matrix is necessary to calculate these indices.

Even though the Harris Index was a turning point for MPC monitoring, it still meets some resistance in the industry. According to Bauer et al. [3], it is not as relevant for real applications as it is for the academic community. The main criticism of the approach is that it is based on an unrealistic benchmark. MVC is often an aggressive controller and may not be economically feasible. Another disadvantage of the Harris Index is that it considers only stochastic disturbances, which limits its applicability to this scenario. Also, the methodology assumes stationary data, and it is meaningless if the data contain non-stationary signals [3].

Although the Harris Index for a MIMO system is not as practical as its form for a SISO system, the idea of considering variance as a metric to be monitored is implemented by commercial controller software such as Aspen APC and the EcoStruxure Control Expert. Those controllers include metrics such as the standard deviation of each variable as information about the closed-loop performance, especially for regulatory control.

Furthermore, the studies conducted to develop the indices based on MVC served as a basis for the emergence of other metrics. Numerous metrics have been developed and proposed in the literature. Huang et al. [17] proposed the method based on filtering and correlation (FCOR) for MIMO systems, also described by Huang and Shah [18]. It is also based on the MVC theory and requires a previous knowledge of the interactor matrix for MIMO systems, as it was done by Harris et al. [16].

2.2.2. Other Benchmarks for MPC Performance Monitoring

Other metrics were also developed, considering different benchmarks. Tyler and Morari [19] used Generalized Likelihood Ratio (GLR) tests instead of a single value based on a benchmark for comparison. This is a hypothesis testing problem to verify whether the control system is performing satisfactorily or whether the controller has deteriorated. The approach is more realistic than those based on a benchmark such as MVC. The index is calculated based on a ratio between the Maximum Likelihood of the model with constraints in the impulse response coefficients and of the model without the constraints. However, this methodology is also more computationally expensive, since it requires an optimization procedure, and may not lead to a precise diagnosis.

Huang and Shah [18] proposed using a Linear Quadratic Gaussian (LQG), instead of an MVC, as a benchmark. It takes into consideration the control efforts ($\mathrm{var}(u)$, or variance of $u$) and the objective is to minimize both $\mathrm{var}(y)$ and $\mathrm{var}(u)$, considering the trade-off curve that represents the variability of each group of variables. This approach is more realistic than the MVC benchmark and may give some diagnostic insights. However, it requires the process model and is more computationally expensive than MVC for online applications.

Uduehi et al. [20] conducted a study on using Generalized Predictive Control as a benchmark for monitoring the performance of controllers in a SISO system, which is a benchmark less aggressive and more robust than MVC and includes the control efforts in its equations. However, the index based on GPC also requires more computational efforts than the MVC theory, since the calculation is not as simple and requires a successive approximation algorithm.

2.2.3. Statistical Techniques

Qin [21] reviewed the use of data-driven methods for process monitoring. The study includes control performance monitoring and assessment as a type of process monitoring, and the degradation of control performance due to changes in process or disturbance characteristics. Among the reviewed methods, Principal Component Analysis (PCA) and Partial Least Squares (PLS) are the main methodologies employed. The main purpose is to characterize normal variations and detect abnormal changes [22].

PCA is a statistical method based on singular value decomposition (SVD), as described in Equation 4, in which $X$ represents the data matrix, $T$ is a score matrix (principal components, PC), $P$ is a loading matrix (the contribution of each variable to each PC), and $\tilde{X}$ is the residual matrix.

$$X = T P^{T} + \tilde{X} \tag{4}$$

PLS, as well as PCA, can be used for fault detection and diagnosis. Unlike PCA, PLS is a supervised methodology to define whether the system is under normal conditions. It takes into consideration not just the process variables $X$, but also the process output $Y$, which might be quality variables. PLS is modelled according to Equation 5, where $l$ is the number of latent variables, $T = [t_{1}, \ldots, t_{l}]$ are the latent score vectors, $P = [p_{1}, \ldots, p_{l}]$ and $Q = [q_{1}, \ldots, q_{l}]$ are the loadings, while $E$ and $F$ are the residuals for $X$ and $Y$, respectively.

$$\begin{cases} X = \sum_{i=1}^{l} t_{i} p_{i}^{T} + E = T P^{T} + E \\ Y = \sum_{i=1}^{l} t_{i} q_{i}^{T} + F = T Q^{T} + F \end{cases} \tag{5}$$
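The decomposition in Equation 5 can be made concrete with a short sketch. The code below assumes a single response variable (so $Q$ reduces to a vector) and uses the NIPALS algorithm with deflation, which is one common way of computing PLS and is not necessarily the variant used in the works reviewed here. The function name `pls1_nipals` is hypothetical.

```python
import numpy as np


def pls1_nipals(X, y, n_latent):
    """NIPALS PLS for one response: X = T P^T + E and y = T q + f (centered data)."""
    E = X - X.mean(axis=0)
    f = y - y.mean()
    scores, loadings, q = [], [], []
    for _ in range(n_latent):
        w = E.T @ f
        w = w / np.linalg.norm(w)  # weight vector: direction of max covariance
        t = E @ w                  # latent score vector t_i
        tt = t @ t
        p = E.T @ t / tt           # X loading p_i
        q_i = f @ t / tt           # y loading q_i
        E = E - np.outer(t, p)     # deflate X, leaving the residual E
        f = f - q_i * t            # deflate y, leaving the residual f
        scores.append(t)
        loadings.append(p)
        q.append(q_i)
    return np.column_stack(scores), np.column_stack(loadings), np.array(q), E, f
```

The returned residuals correspond to $E$ and $F$ in Equation 5 after mean-centering, so monitoring statistics can be computed on the scores and on the residuals in the same way as for PCA.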

For the detection procedure, Hotelling’s $T^{2}$ statistic and the Q-statistic, also known as the squared prediction error (SPE), can be used. Hotelling’s $T^{2}$ statistic is used to measure the variation inside the model and to detect unusual and statistically unlikely events in the dataset. It is calculated according to Equation 6, where $x$ represents the sample vector from $X$ and $\Lambda$ is a diagonal matrix with the eigenvalues of the covariance matrix of the data. The detection procedure is performed through the definition of a confidence limit (usually 95% or 99%). Faulty data samples, according to Hotelling’s $T^{2}$ statistic, indicate that the correlation structure is preserved, but the variables’ values have drifted too far from the average.

$$T^{2} = x^{T} P \Lambda^{-1} P^{T} x = \sum_{i=1}^{k} \frac{t_{i}^{2}}{\lambda_{i}} \tag{6}$$

SPE, on the other hand, deals with the variation in the residual subspace. It is calculated according to Equation 7, where $\hat{x}$ is the fitted data in the model. Since SPE relies on residuals, it evaluates how well the data is fitted in the model. Just like Hotelling’s $T^{2}$ statistic, SPE also relies on a confidence limit for detection. Due to the complementary nature of these two indices (SPE and Hotelling’s $T^{2}$), combined indices are recommended for fault detection and diagnosis [21].

$$SPE = \| x - \hat{x} \|^{2} = \| (I - P P^{T})\, x \|^{2} \tag{7}$$
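To show how Equations 6 and 7 fit together in practice, the sketch below builds a PCA monitoring model from normal-operation data and evaluates both statistics for a new sample. Two choices are assumptions made for illustration only: the variables are autoscaled before the decomposition, and the control limits are taken as empirical quantiles of the training statistics instead of the distribution-based limits usually found in the literature. All function names are hypothetical.

```python
import numpy as np


def fit_pca_monitor(X_normal, n_components):
    """Build a PCA monitoring model from normal-operation data (rows = samples)."""
    mean = X_normal.mean(axis=0)
    std = X_normal.std(axis=0, ddof=1)
    Z = (X_normal - mean) / std
    # SVD of the scaled data: the rows of Vt are the loading vectors.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                         # loading matrix P
    eigvals = s[:n_components] ** 2 / (len(Z) - 1)  # retained eigenvalues
    return {"mean": mean, "std": std, "P": P, "eigvals": eigvals}


def t2_and_spe(model, x):
    """Hotelling's T2 (Equation 6) and SPE (Equation 7) for one sample."""
    z = (np.asarray(x, dtype=float) - model["mean"]) / model["std"]
    t = model["P"].T @ z                           # scores of the sample
    t2 = float(np.sum(t**2 / model["eigvals"]))
    residual = z - model["P"] @ t                  # (I - P P^T) z
    return t2, float(residual @ residual)


def empirical_limits(model, X_normal, confidence=0.99):
    """Quantile-based limits [T2 limit, SPE limit] from the training data."""
    stats = np.array([t2_and_spe(model, x) for x in X_normal])
    return np.quantile(stats, confidence, axis=0)
```

A sample that follows the usual correlation structure but lies far from the mean raises $T^{2}$, whereas a sample that breaks the correlation structure raises SPE, which is why the two statistics are typically monitored together.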

Once a fault has been detected, the diagnosis is possible through the contribution plots. As Qin [21] has mentioned, those techniques can also be used for performance monitoring of controllers. Zhang and Li [23] already used $T^{2}$ statistics and PCA for MPC monitoring. However, diagnosing the process variables that lead to the quality problems remains a difficult task, since both PCA and PLS have limitations. Both models assume a linear relationship between variables, which does not represent the reality in many chemical processes. They are also static, ignoring autocorrelation in lagged data. In addition, there is a lack of causality during the diagnosis procedure, due to the high correlation between variables, and the inability to give a precise diagnosis, since different faults may lead to the same contribution plot [21].

2.3. Methodologies for Identification and Diagnosis

Although the methodologies described so far help determine whether the control system is healthy or underperforming, they usually do not provide the root cause of performance degradation. For this reason, some metrics were developed to provide a diagnosis of the process conditions. The main issue with diagnosis is the lack of a unique characteristic that determines whether the control system is under unmeasured and unpredictable disturbance, with a process fault, an instrumentation fault, a plant-model mismatch (PMM), or inappropriate tuning settings.

Even though instrumentation problems also cause MPC degradation, the key challenge in this type of controller is identifying issues in its internal configuration. The model is the crucial aspect of a predictive controller and affects its prediction accuracy. For the sake of computational efficiency, linear models are preferred in industrial applications, even though chemical processes are inherently nonlinear. However, changes in operational regions or process dynamics may lead to loss of model representability and, consequently, to degradation of MPC performance.

Once the model is degraded, it is required to update the model by performing a plant re-identification. However, this procedure requires intrusive plant tests, which have economic repercussions. Therefore, it is desired to identify only the part or the subsystem of the plant where a significant mismatch occurs. In addition, it is preferred to perform the re-identification when no other procedure is possible to improve MPC performance, since other solutions may limit the intervention required in the process. The most relevant issue in model assessment is determining whether the bad performance comes from a PMM or an unmeasured disturbance. Both cause similar effects in the process output. Nevertheless, the corrective action for an unmeasured disturbance might be less invasive than for a PMM. Thus, methodologies focused on identifying PMM were developed.

Badwe et al. [24] propose a methodology based on the analysis of partial correlations between the model residuals and manipulated variables for the detection and isolation of PMM. The main purpose of this approach is to evaluate the correlation between the manipulated variables and the prediction error, isolating the effect of the closed-loop. In an open-loop scenario, the manipulated variables (MVs) are not affected by unmeasured disturbances, so they should not be correlated. A correlation between them when closed-loop effects are removed means that the prediction error is caused by the MVs, and, consequently, there is a PMM. Equation 8 shows the relationship between the prediction error $e$ and the uncorrelated manipulated variable $u$, where $y$ and $\hat{y}$ are the plant and the model controlled variables, respectively, while $\Delta\, u(k)$ and $v$ are the PMM and the Gaussian disturbances acting on the process.

$$e(k) = y(k) - \hat{y}(k) = \Delta\, u(k) + v(k) \tag{8}$$

However, even though the Badwe et al. [24] methodology uses only closed-loop operating data for the analysis and does not require intrusive tests on the plant, the system data must be excited. In the absence of target changes, it is not possible to determine the extent of mismatch, which is the main limitation of this methodology.
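The idea of a partial correlation can be illustrated with a generic sketch. It is not the procedure of Badwe et al. [24], which also removes closed-loop effects; it only shows how the correlation between a model residual and one MV can be computed after the linear influence of the other MVs has been removed from both signals. The function names and the choice of conditioning variables are assumptions made for illustration.

```python
import numpy as np


def residualize(target, others):
    """Remove from `target` the part linearly explained by the columns of `others`."""
    A = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef


def partial_correlation(e, u_i, u_others):
    """Correlation between residual e and MV u_i with the other MVs regressed out."""
    e_free = residualize(e, u_others)
    u_free = residualize(u_i, u_others)
    return float(np.corrcoef(e_free, u_free)[0, 1])
```

When two MVs move together, the plain correlation between the residual and either MV can be high even if only one channel is mismatched. The partial correlation is intended to stay close to zero for the channel without mismatch, which is what allows the mismatch to be isolated.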

Botelho et al. [25] developed another methodology to distinguish PMM from unmeasured disturbances. It is a statistical approach based on a nominal output estimation. Equation 9 represents how the nominal controlled variable $y_{0}$ is estimated from the controlled variable $y$, where $S_{0}$ is the nominal output sensitivity transfer matrix and $y_{sim}$ is the simulated controlled variable of the nominal model.

$$y_{0} = S_{0}\, (y_{sim} - y) + y \tag{9}$$

The main purpose of Equation 9 is also to decouple PMM from controller configuration, since the effect of the PMM in the controlled system depends on the controller tuning. The relation between the nominal outputs and the nominal prediction errors $e_{0}$ is then evaluated through the Pearson correlation between their statistical distribution coefficients $Z$ in a moving window, according to Equation 10. The statistical distribution coefficients considered are the skewness and kurtosis coefficients.

$$co_{Z} = \frac{\mathrm{cov}(Z_{e_{0}}, Z_{y_{0}})}{\sqrt{\mathrm{var}(Z_{e_{0}}) \times \mathrm{var}(Z_{y_{0}})}} \tag{10}$$

A high correlation between both variables indicates the presence of PMM. Otherwise, unmeasured disturbances are the cause of prediction errors. The main challenge in implementing this methodology is its dependency on the nominal output sensitivity transfer matrix $S_{0}$, which industrial MPCs do not give explicitly.
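Assuming that the nominal output $y_{0}$ and the nominal prediction error $e_{0}$ are already available, the correlation in Equation 10 can be sketched as follows. The window length, the step between windows, and the moment-based definitions of skewness and kurtosis are assumptions made for illustration and are not taken from Botelho et al. [25]. The function names are hypothetical.

```python
import numpy as np


def skewness(x):
    """Moment-based skewness coefficient of one window."""
    d = x - x.mean()
    return float(np.mean(d**3) / np.mean(d**2) ** 1.5)


def kurtosis(x):
    """Moment-based kurtosis coefficient of one window."""
    d = x - x.mean()
    return float(np.mean(d**4) / np.mean(d**2) ** 2)


def windowed_stat(signal, stat, window, step=1):
    """Evaluate `stat` on a moving window of the signal."""
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - window + 1, step)
    return np.array([stat(signal[s : s + window]) for s in starts])


def co_z(e0, y0, stat, window, step=1):
    """Equation 10: Pearson correlation between the windowed coefficients."""
    z_e = windowed_stat(e0, stat, window, step)
    z_y = windowed_stat(y0, stat, window, step)
    cov = np.mean((z_e - z_e.mean()) * (z_y - z_y.mean()))
    return float(cov / np.sqrt(z_e.var() * z_y.var()))
```

A call such as `co_z(e0, y0, skewness, window=100)` returns a value in $[-1, 1]$. In the interpretation given above, values of large magnitude point to PMM, and values near zero point to unmeasured disturbances.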

Giraldo et al. [26] analyzed the methodologies of Botelho et al. [25] and Badwe et al. [24]. Their criticism focuses on the limitations of those methods for real-time applications, given their relatively high computational cost. Furthermore, the assumptions underlying those methods, which rely on the feasibility of achieving an ideal model, limit their applicability in practical scenarios. Giraldo et al. [26] proposed an algorithm to distinguish PMM from unmeasured disturbances, trying to mitigate the limitations of the previous methods. However, the method focuses on the Filtered Smith Predictor and is also based on the sensitivity transfer function $S_{0}$. Moreover, the methodologies for identifying PMM are focused on distinguishing it from an unmeasured disturbance, requiring other methods to discard other sources of performance loss.

2.4. Industrial Methodologies

Even though the methods discussed so far are very important, since they give evidence of the possibility of detecting and diagnosing poor performance in MPC, they are still not being used in industries. In this section, the methods that Aspen Technology, an industrial software company, implements to monitor advanced controllers are explored.

Aspen Watch Performance Monitor is Aspen Tech’s software for monitoring control loops. Its metrics are grouped into libraries. In version 14.2, the Basic library has 57 metrics, while the Extended library has 86. There is also a library focused on crude oil distillation and a library for quality analysis.

The Extended library has three main groups of metrics: the General’s, the Manipulated Variables’, and the Controlled Variables’ metrics. The General’s metrics evaluate the control system as a whole. This group includes metrics such as the percentage of time the control is on, the percentage of active MVs and CVs, and the percentage of CV constraint violations. The groups of MV and CV metrics include metrics about target deviation, the variable’s standard deviation, constraint violations, oscillation indices, the percentage each variable is on, prediction error, which is the difference between the model prediction and the variable’s measurement, and so on.
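To give a sense of what metrics of this kind look like when computed from logged data, the sketch below evaluates a few of them for a single controlled variable. The definitions, the function name `loop_metrics`, and the dictionary keys are assumptions made for illustration; they are not the formulas used by Aspen Watch.

```python
from statistics import mean, pstdev


def loop_metrics(cv, cv_pred, low, high, controller_on):
    """Basic monitoring metrics for one CV over a logged period."""
    n = len(cv)
    violations = sum(1 for value in cv if value < low or value > high)
    return {
        # Share of samples with the controller switched on.
        "pct_on": 100.0 * sum(controller_on) / n,
        # Share of samples with the CV outside its limits.
        "pct_cv_violation": 100.0 * violations / n,
        # Variability of the CV over the period.
        "cv_std": pstdev(cv),
        # Average gap between the measurement and the model prediction.
        "mean_abs_pred_error": mean(abs(m - p) for m, p in zip(cv, cv_pred)),
    }
```

For example, `loop_metrics([1, 2, 3, 4], [1, 2, 2, 5], 1.5, 3.5, [True, True, False, True])` reports the controller on for 75% of the samples, the CV outside its limits for 50% of them, and a mean absolute prediction error of 0.5.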

Many of the metrics implemented have a literature background. Regarding the variable oscillation indication, the Aspen documentation does not provide the equation for this metric, stating only that the more the variable oscillates, the closer the metric is to 1. In the literature, some metrics have been developed to detect oscillation in closed-loop systems. Thornhill and Hägglund [27] have proposed an oscillation detector based on the IAE of the system, whose value also lies in the range $[0, 1]$. Thus, it is likely that Aspen Watch follows a similar approach. The variable’s standard deviation, on the other hand, especially regarding the dependent variable, is aligned with the Harris Index. Even though the software does not use the Harris Index directly to monitor the control loop, nor use MVC theory as a benchmark, it does use the variability of the variables as metrics to be evaluated.

A single metric rarely provides a precise diagnosis. Consequently, Aspen Watch has an extensive range of metrics to draw information from the process data. While these metrics offer clues regarding the potential causes of diverging values, the number of possibilities they present means they do not eliminate the need for a deep investigation into the control loop.

2.5. Machine Learning Methodologies

Many researchers have already applied ML algorithms in the chemical engineering context, not only on abnormality detection, but also on signal processing and process modeling [11]. Most applications in the chemical engineering context already focus on modeling, optimization, control, and monitoring [28]. As mentioned before, control performance monitoring may also be part of the FDD spectrum. Thus, ML techniques already have a solid base in the academic context. As more of this kind of methodology is studied, it becomes more reliable and feasible for use in industrial applications.

Early detection and diagnosis of process faults while the plant is still operating in a controllable region can help avoid abnormal event progression and reduce productivity loss. Venkatasubramanian et al. [12,29,30], in their review about FDD, already considered neural networks as a non-statistical method for feature extraction.

One of the conclusions of the survey conducted by Bauer et al. [3] is that, for control-loop monitoring, the simpler the method, the more respondents find it useful. The respondents also asked for better guidance on corrective actions to be taken. de Campos et al. [31] affirm that the classic metrics used by Petrobras are not suitable for diagnosing the root cause of poor performance. Multiple techniques must be combined, together with a specialized team, to evaluate and correct control performance issues, and even so, many challenges remain in evaluating and diagnosing multivariable model predictive controller performance effectively.

Under this scenario, ML techniques have great potential to diagnose control performance. As shown in Section 2.1, research on ML has become popular in recent years. Different methods and models have been developed to address different types of applications, especially in chemical engineering [11,28,32]. This evidence shows that ML models have great potential and are already under development in the PSE area. However, the use of ML tools specifically for control-loop performance assessment is still underexplored. As shown in Section 2.1, only a few studies have applied ML tools to control performance monitoring, despite their potential. By using the classification task approach, it is possible to assess and diagnose control performance at once, instead of using combined methodologies such as the ones shown in the previous sections.

Most studies on the use of ML techniques to monitor controller performance have considered classic and SISO controllers. Pillay and Govender [33] analyzed the use of a Multi-Class Support Vector Machine (SVM) classifier to detect and diagnose tuning issues in a proportional-integral (PI) controller. Grelewicz et al. [34,35] focused on developing an ML model that can distinguish between acceptable and poor PID performance, independent of the process, and without any additional learning stage.

In studies aimed at MPC, the key object of this study, neural networks have been used to identify the performance index [36], based on the Minimum Variance Control benchmark. The approach was to estimate the minimum variance of the process for the Harris index. The models evaluated were a radial basis function network and a Laguerre network instead of using the ARMA model proposed by the original methodology [8]. Even though this approach showed a great capacity to estimate the Harris Index of closed-loops, it did not explore the NN capacity to identify the degradation of the MPC performance.

Wang et al. [37] also used neural networks, but directly as a classification problem. The approach was to identify MPC performance degradation due to noise variance change, model mismatch, controlled variable constraint saturation, and manipulated variable constraint saturation. The process evaluated was a two-tank liquid level system, and the neural network used was an MLP, which was compared to an SVM model. The inputs used were the manipulated and controlled variables, the control effort, and the control errors. Xu et al. [38] evaluated the use of SVM as well, but with a Mahalanobis distance performance index as input. The proposal was to eliminate the correlation between process variables before the classification procedure. The evaluated systems were the Wood-Berry distillation column process and the two-tank liquid level process, with the same classification task performed by Wang et al. [37]. Both studies explored the capacity of MLP to identify the loss of performance of an MPC, but the classes used, especially regarding constraint saturation, could be easily extracted from process data, without the need for an ML model. Furthermore, the classes explored do not give a precise diagnosis of the closed-loop condition, except for the model mismatch.

Loquasto and Seborg [39] conducted a study that might be one of the most relevant on this topic. They also employed the Wood-Berry process as a case study of a methodology using neural networks to monitor the performance of MPC systems. In this approach, independent MLP classifiers were used in cascade to detect and diagnose the MPC condition. The first MLP is responsible for classifying whether the process is in a normal or abnormal condition. If an abnormality is detected, the second MLP is responsible for detecting the presence of an abnormal disturbance. Finally, the third model is used for diagnosing PMM. The single four-class classifier (normal, disturbance, plant change, and both disturbance and plant change) methodology was also evaluated. Once a model-plant mismatch is detected, another MLP classifier is used to diagnose the specific sub-models that are inaccurate, also considering binary classifications. The MLP inputs are the manipulated and controlled variables, the control effort, the control errors, and the one-step prediction errors. The results presented showed that, in general, all models performed well. The disadvantage of using the multi-class classification is that each sample can belong to only one of the classes determined, while the multi-binary models can be arranged to allow the classification of two problems simultaneously. This study explored different arrangements of ML models and diagnoses focused on PMM. However, it used only one type of ML model and did not explore the possibility of the abnormality originating from process instrumentation, which may cause similar effects in the closed loop, or from the MPC tuning, which might require retuning, especially after operational changes. Furthermore, the process studied, the Wood-Berry process, is linear and has relatively few variables.

Some applications are more specific to each MPC degradation reason. Lu et al. [40] have focused their study only on PMM. The ML tool evaluated was the SVM to discriminate PMM from noise model change. The problem was a binary classification, and the case study was a SISO system. The classification is based on the Finite Impulse Response (FIR) coefficients, which requires an identification procedure before the classification task.

Dambros et al. [41], on the other hand, focused on oscillation detection. The ML algorithm evaluated was a deep feedforward network, and data in the frequency domain was used as input. The ML tool was trained and validated with artificial data before being tested with industrial data. Rabba et al. [42] also proposed the use of an ML algorithm to detect oscillation in a closed-loop process. The ML model used was an XGBoost model, and the process was the Tennessee Eastman process. A previous step of feature extraction is performed to select the features with a high degree of importance and correlation to be used as inputs. Still focusing on oscillation detection, Akavalappil et al. [43] proposed the use of a CNN using the process variable (PV) and the controller output (OP) signals as input.

The main contribution of all those studies together is their indication that ML tools have the potential to be applied to predictive controller performance monitoring. However, there is a lack of studies that evaluate whether ML can be applied to diagnose the closed-loop condition as a whole, considering different possibilities of the source of fault. Also, there is a lack of studies that expand the evaluation to nonlinear systems or explore their use in complex systems for more direct diagnosis, including process and controller source of degradation. Furthermore, other gaps include the lack of complex processes as case studies, the application of commercial controllers, and the evaluation of deep neural networks.

3. Methodology

3.1. Models in Cascade

Models in cascade have already been used for FDD issues. Vasudevan et al. [44] used a sequence of binary classification models for the detection of intrusion failure in the context of network attacks. Krishna and Kumar [45] employed Decision Tree, Naive Bayes, and KNN algorithms to detect and diagnose faults in a microgrid. Loquasto and Seborg [39] used independent neural networks to monitor and diagnose performance issues in MPC in the Wood-Berry process. However, this methodology has not been widely explored, especially for more complex processes and comparing different machine learning models. Unlike what was previously done, this study compared different ML models, such as Random Forest, MLP, and RNNs, in sequence to detect and diagnose the MPC closed-loop conditions. Also, expanding what was done by Loquasto and Seborg [39], more fault classes were used in order to identify how precise the diagnoses of ML models can be.

In this work, the general scheme presented in Figure 6 is adopted. The AI Monitoring System assumes a sequence of binary classification ML models, organized in cascade, named here Cascade Machine Learning Models (CMLM), used for detecting and diagnosing predictive controllers’ performance issues. This methodology is compared with a Single Machine Learning Model (SMLM) for a multiclass classification, in which detection and diagnosis occur with the same model. The multiclass approach is used in other studies for MPC status classification [37,38].

Figure 7 illustrates the general arrangement of the CMLM and its equivalent in the SMLM approach. In general, the CMLM was designed to discriminate between the control configuration problem and instrumentation and process problems. The first ML model is responsible for detecting whether the closed-loop performance is abnormal. Once abnormalities are detected, the process data goes to the second ML model, which is responsible for diagnosing the first type of fault (Fault 1). If Fault 1 is not detected, the data flows to the next ML model in order to identify the second type of fault (Fault 2). This logic continues until all possible faults are covered by the cascade. The specific adaptation of the CMLM from the general arrangement presented in the figure is based on the physical knowledge of the process and on the main typical source of degradation. Process particularities must be taken into consideration to design the cascade arrangement.

The main advantage of using CMLM is that its structure is based on process knowledge and mimics the decision-making logic of an operator. Moreover, this approach is easier to maintain and adapt. Chemical processes with a large number of variables and controllers may suffer from a huge number of possible issues in the closed loop. If a diagnosis needs to be updated or a new one included, the SMLM approach requires the whole model to be retrained and rearranged, while the CMLM makes it possible to update just one of the models or to add a new model for the new diagnosis. Besides, CMLM allows different ML algorithms to be combined, as one technique can be better than others for a specific diagnosis. Furthermore, the technical team can also contribute to a precise diagnosis by removing models from the cascade. For example, if it is certain that the process is not under normal conditions, the operators and engineers can use only the diagnosis models, skipping the detection model. Moreover, for each diagnosis, different features are expected to be affected. The SMLM requires only one hyperparameter optimization and training procedure, but all features that might contribute must be used, resulting in a complex ML model. The CMLM requires more models to be optimized and trained, but different features can be used for each model in the cascade, allowing the construction of simpler models.

Firstly, simulations were run to generate a dataset used for training and testing the ML models. The presence of measurement noise in the dataset was considered. Secondly, ML models were trained following the SMLM approach and the CMLM approach to compare the best type of arrangements. Finally, each of those ML algorithms was compared for each class of diagnosis to elucidate the best ML algorithm for each class. The comparison was carried out by evaluating metric values and the implementation of those ML models.

Each ML model was trained with the same inputs:

The process MVs, with lagged data with up to 2 time-units for non-recurrent models.
The Integral Time-Weighted Absolute Error (ITAE), calculated considering the last 5 samples, of each CV.
The prediction error of each CV.

CVs and MVs are considered to capture information regarding the closed-loop. Lagged MVs are used to capture process dynamics and improve ML predictions. ITAE is used to capture setpoint deviation persistent over time. The prediction error, on the other hand, is used to help ML models identify possible model mismatch. To evaluate the performance of each architecture, the van de Vusse reactor was used as a case study, due to its non-linearity. Details about the simulation are presented in Section 3.5.
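A minimal sketch of how such an input vector could be assembled for the non-recurrent models is given below. It assumes each signal is stored as a plain list sampled at a fixed interval. The function names, the data layout, the sampling time `dt`, and the choice of restarting the time weight at the beginning of each 5-sample window are illustrative assumptions, not details taken from the implementation used in this work.

```python
def windowed_itae(errors, k, n_last=5, dt=1.0):
    """ITAE over the last `n_last` samples ending at index k.

    Assumption: the time weight restarts at the start of the window,
    so the oldest sample is weighted by 1*dt and the newest by n_last*dt.
    """
    start = max(0, k - n_last + 1)
    window = errors[start:k + 1]
    return sum((j + 1) * dt * abs(e) * dt for j, e in enumerate(window))


def build_features(mvs, setpoints, cvs, predictions, k, max_lag=2):
    """Assemble one feature row at sample k (requires k >= max_lag).

    mvs, setpoints, cvs, predictions: dicts mapping a variable name to
    its list of samples (hypothetical data layout).
    """
    row = []
    # Current and lagged manipulated variables (lags 0..max_lag).
    for name in sorted(mvs):
        for lag in range(max_lag + 1):
            row.append(mvs[name][k - lag])
    for name in sorted(cvs):
        errors = [sp - y for sp, y in zip(setpoints[name], cvs[name])]
        row.append(windowed_itae(errors, k))
        # Prediction error: model prediction minus measurement.
        row.append(predictions[name][k] - cvs[name][k])
    return row
```

For the recurrent models, the lagged MV entries would be dropped (`max_lag=0`), since the window of past samples already carries the dynamics.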

The following sections elucidate the ML architectures built. All ML algorithms were implemented in Python through the libraries Scikit-learn and TensorFlow. For each ML algorithm, as part of the preprocessing data procedure, the dataset was scaled by using the StandardScaler function from Scikit-learn. The reason to use this scaler is due to its sensitivity to outliers, which may happen due to the presence of faults. Also, according to LeCun et al. [46], the gradients of back-propagation learning usually converge faster when inputs have a zero mean.

Two metrics were used to compare ML algorithms: Accuracy (Equation 11) and F1-score (Equation 12), in which TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives. For the F1-score calculation, the macro average was taken into account, since all classes are equally important.

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(11)

F1\text{-}Score = \frac{2 \times TP}{2 \times TP + FP + FN}

(12)

For each optimized architecture, to confirm the results obtained, the training procedure was executed 10 times without fixed random seeds for the data split and the parameter initialization, and the metrics were calculated for each of those independent training runs. This procedure aims to evaluate the model’s dependency on the data used for training and testing, and also on the training initialization.
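As an illustration of Equations 11 and 12, the sketch below computes the accuracy and the macro-averaged F1-score from lists of true and predicted labels. The function names are illustrative; in practice, library routines such as those in Scikit-learn would normally be used instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """Equation 12, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)
```

Because the macro average weights every class equally, a class with few samples influences the score as much as a class with many.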

3.2. Random Forest

The algorithm was implemented in Python through the library Scikit-learn, using the function RandomForestClassifier. The dataset was split into training and test sets, using shuffle to choose samples randomly. The proportions chosen were 80% for training and 20% for testing; validation relied on the cross-validation described below. Optuna was used to optimize the Random Forest hyperparameters [47]. The parameters considered for optimization for both case studies were:

Number of estimators between 10 and 100
Maximum depth between 5 and 15
Maximum features among "None", "sqrt", and "log2"
Minimum samples at each leaf between 5 and 10
Criterion among "gini", "entropy", and "log_loss"

The objective of the optimization was to maximize the mean accuracy of the cross-validation, using stratified K-fold with 5 folds. To make the optimization less computationally expensive, MedianPruner was used to stop a trial if its best intermediate result is worse than the median of the intermediate results of previous trials at the same step. A total of 100 trials were performed to find the best hyperparameters.
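The median stopping rule can be sketched in a few lines. The function below is a simplified illustration for a maximization objective, not Optuna's implementation: it omits the pruner's configuration options, such as startup trials and warm-up steps, and the function name and data layout are assumptions.

```python
from statistics import median


def should_prune(current_values, previous_trials, step):
    """Return True when the running trial should be stopped at `step`.

    current_values:  intermediate scores of the running trial, by step
    previous_trials: intermediate-score lists of earlier trials
    Assumes a maximization objective (e.g., cross-validation accuracy).
    """
    others = [t[step] for t in previous_trials if len(t) > step]
    if not others:
        return False  # nothing to compare against yet
    best_so_far = max(current_values[:step + 1])
    return best_so_far < median(others)
```

With three earlier trials scoring 0.70, 0.60, and 0.65 at the first step, a new trial scoring 0.62 at that step would be stopped, since the median is 0.65.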

3.3. Multilayer Perceptron - MLP Architecture

For all MLPs, a batch size of 64 was considered, since it is an intermediate value for a batch size. A batch size too high leads to unstable learning, while a size too small increases learning time significantly. The dataset was split into training, validation, and test sets, using shuffle to choose samples randomly. The proportions chosen were 64% for training, 16% for validation, and 20% for testing.

The encoding used was One-Hot Encoding for multiclass classification, since, for this type of architecture and problem, it assures orthogonality among classes. For the CMLM, label encoding was used, as each model in the cascade corresponds to a binary classification problem. This choice is due to its simplicity when compared to One-Hot Encoding.

The architecture considered for MLP models used Optuna for hyperparameter optimization [47]. The objective of the optimization was to maximize the validation accuracy at the last epoch (considering a maximum of 50 epochs during optimization). MedianPruner was also used to reduce computational cost. The hyperparameters optimized and the boundaries used for research were:

Number of layers between 2 and 4
Number of neurons at each layer between 8 and 64
The activation function among ReLU, LeakyReLU, and ELU
Learning rate between $10^{- 4}$ and $10^{- 2}$
Batch normalization between each layer: yes or no
Dropout rate between 0.1 and 0.3

The batch size and the $L_{2}$ values were kept constant and equal to 64 and 0.0001, respectively, to avoid a too large search space for the optimization. The learning rate depends mainly on the batch size [48], as well as the $L_{2}$ regularization, which directly affects the weight values and, consequently, the number of parameters in the model during optimization.

3.4. Recurrent Neural Network

Since the process data is a time series, GRU was evaluated in diagnosing the controller’s conditions. A previous study was conducted to evaluate the three types of RNN models: Simple RNN, for being the simplest RNN algorithm; LSTM, for having superior performance for long memory; and GRU, for having a lower computational cost than LSTM. The study has shown that GRU has a good performance and a lower computational cost [49]. The dataset was split into training, validation, and test sets, using shuffle to choose samples randomly. The proportions chosen were 64% for training, 16% for validation, and 20% for testing.

The data length to be analyzed by GRU was half the settling time, which is 20 samples for the van de Vusse reactor, equivalent to 3 minutes of simulation. Based on the window size defined, the data was reshaped into a 3D tensor. The features used were the same as for MLP and RF: the manipulated variables, the ITAE from the last 5 points, and the prediction error of each controlled variable. Since this type of algorithm already captures process dynamics, lagged data was not considered.
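The reshaping step can be illustrated as follows. This is a minimal sketch using nested lists in place of a tensor. The use of overlapping windows with a stride of one sample and the labeling of each window by its last sample are assumptions made for illustration.

```python
def make_windows(rows, labels, window=20):
    """Stack consecutive feature rows into overlapping windows.

    rows:   per-sample feature lists, in time order
    labels: one class label per sample
    Returns (X, y), where X has shape (n_windows, window, n_features)
    as nested lists and y holds the label of each window's last sample.
    """
    X, y = [], []
    for end in range(window, len(rows) + 1):
        X.append([list(r) for r in rows[end - window:end]])
        y.append(labels[end - 1])
    return X, y
```

A sequence of 25 samples with 2 features, for example, yields 6 windows of shape (20, 2).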

For the Optuna optimization, HyperbandPruner was used for pruning, since it is more efficient in reducing computational cost. The following hyperparameters were considered for optimization:

Number of layers between 1 and 2
Number of neurons at each layer between 8 and 64
The activation function between ReLU and tanh
Learning rate between $10^{- 5}$ and $10^{- 2}$
Layer normalization between each layer: yes or no
Dropout rate, between 0.1 and 0.5

3.5. Case Study: van de Vusse Reactor Controlled by NMPC

The system under study is the van de Vusse reactor, presented in Figure 8, which was first proposed by van de Vusse [50] and has been used as a benchmark for nonlinear control problems. The reactor modeling and design have been developed throughout the years, especially by Kantor [51] and Engell and Klatt [52,53]. The reaction produces cyclopentenol (B) from cyclopentadiene (A), with the generation of cyclopentanediol (C) and dicyclopentadiene (D) as byproducts. The reaction scheme is given by [50]:

\begin{matrix} A \overset{K_{1}}{\to} B \overset{K_{2}}{\to} C \\ 2 A \overset{K_{3}}{\to} D \end{matrix}

(13)

The feed flow, with a concentration $C_{A,in}$ of reagent A, enters a CSTR with a temperature $T_{in}$ and a flow rate $F$. The control objective is to control the production of B ($C_{B}$) and, consequently, the reaction temperature ($T$) by using the heat exchanger duty $\dot{Q}$ and the ratio of flow rate to reactor volume ($u_{1} = \frac{F}{V}$) as manipulated variables. The equations that describe the van de Vusse reactor are given by:

\begin{matrix} \frac{d C_{A}}{d t} = & \frac{F}{V} (C_{A,in} - C_{A}) - K_{1}(T) C_{A} - K_{3}(T) C_{A}^{2} \\ \frac{d C_{B}}{d t} = & - \frac{F}{V} C_{B} + K_{1}(T) C_{A} - K_{2}(T) C_{B} \\ \frac{d T_{k}}{d t} = & \frac{\dot{Q} + K_{w} A_{r} (T - T_{k})}{m_{k} C_{p,k}} \\ \frac{d T}{d t} = & \frac{K_{1}(T) C_{A} (- \Delta H_{AB}) + K_{2}(T) C_{B} (- \Delta H_{BC}) + K_{3}(T) C_{A}^{2} (- \Delta H_{AD})}{\rho C_{P}} + \frac{F}{V} (T_{in} - T) + \frac{K_{w} A_{r}}{\rho C_{P} V} (T_{k} - T) \end{matrix}

(14)

in which $T_{k}$, $m_{k}$, and $C_{p,k}$ are the outlet temperature, the mass, and the heat capacity of the coolant; $A_{r}$ and $K_{w}$ are the heat exchange area and the heat transfer coefficient; $\rho$ and $C_{P}$ are the reaction mixture’s specific mass and heat capacity. $\Delta H_{i}$ and $K_{i}(T)$ are the molar enthalpy of reaction and the reaction rate constant of reaction $i$. The latter depends on the reactor’s temperature according to the Arrhenius law (Equation 15):

K_{i}(T) = k_{0,i} \exp \left( - \frac{E_{A,i}}{R (T + 273.15)} \right)

(15)

The values for each parameter considered for the simulation were the same explored by Engell and Klatt [52]. The hypotheses considered for the process equation are:

Perfect mixing;
Constant volume;
Constant heat capacity;
Incompressible fluid;
Elementary kinetics for all reactions;
Reaction rate constants described by the Arrhenius Law;
No inlet flow of B, C and D.
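Under these hypotheses, the right-hand side of the model can be sketched as a plain Python function. The parameter dictionary keys are hypothetical names, and no numerical values are provided here; they would have to be taken from [52]. Two modeling readings are assumed: B leaves with the outlet stream and is absent from the feed, so its dilution term is negative, and the coolant balance is driven by the duty plus the heat exchanged with the reactor. The activation energy is lumped with the gas constant into a single parameter $E_{A,i}/R$.

```python
import math


def arrhenius(k0, ea_over_r, temp_c):
    """Equation 15 with E_A/R lumped into one parameter; T in deg C."""
    return k0 * math.exp(-ea_over_r / (temp_c + 273.15))


def van_de_vusse_rhs(state, f_over_v, q_dot, p):
    """Return [dCA/dt, dCB/dt, dT/dt, dTk/dt] for state [CA, CB, T, Tk].

    p: dict of model parameters (hypothetical key names; values must
    be supplied by the user).
    """
    ca, cb, t, tk = state
    k1 = arrhenius(p["k01"], p["e1_r"], t)
    k2 = arrhenius(p["k02"], p["e2_r"], t)
    k3 = arrhenius(p["k03"], p["e3_r"], t)

    dca = f_over_v * (p["ca_in"] - ca) - k1 * ca - k3 * ca ** 2
    # Assumption: no B in the feed, so dilution only removes B.
    dcb = -f_over_v * cb + k1 * ca - k2 * cb

    # Heat released by the three reactions.
    heat = (k1 * ca * (-p["dh_ab"]) + k2 * cb * (-p["dh_bc"])
            + k3 * ca ** 2 * (-p["dh_ad"]))
    rho_cp = p["rho"] * p["cp"]
    dt = (heat / rho_cp
          + f_over_v * (p["t_in"] - t)
          + p["kw"] * p["ar"] / (rho_cp * p["v"]) * (tk - t))
    # Assumption: coolant balance = duty + heat exchanged with the reactor.
    dtk = (q_dot + p["kw"] * p["ar"] * (t - tk)) / (p["mk"] * p["cpk"])
    return [dca, dcb, dt, dtk]
```

A function of this form can be passed to any ODE integrator to simulate the open-loop reactor.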

3.5.1. Control System

The reactor is controlled by a nonlinear MPC (NMPC) following the implementation of Lima et al. [54], with the presence of a Constrained Extended Kalman Filter (CEKF). All simulations were run considering the same steady state as the initial process condition. The simulations were carried out using the same structure adopted by Lima et al. [54], considering the measurement vector $z = {[C_{A}, C_{B}, T, T_{k}]}^{\top}$, the input vector $u = {[F / V, \dot{Q} / (K_{w} A_{r})]}^{\top}$, $C_{A,in}$ as an unmeasured disturbance, and $y = {[C_{B}, T]}^{\top}$ as the controlled variables. The upper and lower bounds of $\tilde{x}(k) = {[C_{A}, C_{B}, T, T_{k}, C_{A,in}]}^{\top}$ used in the NMPC and the CEKF are given by Equation 16:

[\begin{matrix} 0 \\ 0 \\ 50 \\ 50 \\ 0.1 \end{matrix}] \leq \tilde{x} (k) \leq [\begin{matrix} 4 \\ 3 \\ 200 \\ 200 \\ 6.1 \end{matrix}]

(16)

The manipulation variability constraints assumed are given by Equation 17:

[\begin{matrix} - 50 \\ - 0.15 \end{matrix}] \leq [\begin{matrix} Δ (F / V) \\ Δ (\dot{Q} / (K_{w} A_{r})) \end{matrix}] \leq [\begin{matrix} 50 \\ 0.15 \end{matrix}]

(17)

A timestep of $2.5 \times 10^{-3}$ h was employed. The weights and horizons considered for the NMPC are given in Table 1. The initial conditions used for each scenario are given in Table 2.

The CEKF was implemented as a state estimator, for which the parameter values are presented in Equations 18-20. The CEKF estimates $\tilde{x}(k)$ together with $k_{0,1}$ and $C_{P}$ by considering the measurements of $\tilde{x}(k)$. Lima et al. [54] employed the CEKF to evaluate its performance under some scenarios, including PMM in $k_{0,1}$ and in $C_{P}$. In this work, the implementation was kept to filter out noise and estimate unmeasured states, as well as to estimate some parameters to update the model.

Initial Covariance = [10^{3}, 10^{3}, 10^{3}, 10^{3}, 10^{3}, 10^{3}, 10^{3}]

(18)

Model Uncertainties = [3 \times 10^{- 6}, 5 \times 10^{- 6}, 10^{- 3}, 10^{- 3}, 10^{- 3}, 10^{- 2}, 10^{- 2}]

(19)

Measurement Uncertainties = [3 \times 10^{- 2}, 5 \times 10^{- 3}, 3 \times 10^{- 1}, 8 \times 10^{- 1}]

(20)

3.5.2. Scenarios Conditions

To evaluate the capability of ML tools to predict closed-loop issues, the following scenarios were simulated in the van de Vusse reactor: normal conditions, valve stiction, wrong tuning settings, and plant model mismatch.

The normal operating conditions of the process consist of data generated under typical regulatory control actuation, representing fault-free operation in which disturbances in $C_{A,in}$ occur. These disturbances occur randomly every 0.2 hour (12 min) during the simulation, following a normal (Gaussian) distribution, as defined in Equation 21. A total of 10,000 samples were generated in this condition.

C_{A,in} = 5.1 \times [1 + 0.02 \times \mathrm{random}(\mathrm{mean} = 0, \mathrm{st.dev.} = 1)]

(21)
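A possible way to generate this disturbance sequence is sketched below. With the timestep of $2.5 \times 10^{-3}$ h used in this work, a change every 0.2 h corresponds to one new draw every 80 samples. Holding the value constant between draws, drawing a value at the first sample, and the function and argument names are assumptions made for illustration.

```python
import random


def feed_concentration_profile(n_samples, dt_h=2.5e-3, period_h=0.2,
                               nominal=5.1, rel_std=0.02, seed=None):
    """Piecewise-constant C_A,in profile following Equation 21."""
    rng = random.Random(seed)
    steps_per_change = round(period_h / dt_h)  # 80 for the default values
    profile = []
    value = nominal
    for k in range(n_samples):
        if k % steps_per_change == 0:
            # New Gaussian draw, scaled to 2% of the nominal value.
            value = nominal * (1.0 + rel_std * rng.gauss(0.0, 1.0))
        profile.append(value)
    return profile
```

Passing a fixed `seed` makes the sequence reproducible, which is convenient when comparing scenarios under the same disturbance realization.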

For the valve stiction simulation, a simple stiction model, employed by Hägglund [55] and presented in Equation 22, was considered. The dead band, represented by $d$, is the percentage of the valve operational range in which the valve does not change its position. $u_{s}$ corresponds to the $F / V$ value in the process, and $u$ corresponds to the control effort found by the NMPC. In this model, the inlet flow is stuck if the difference between the controller output and the variable value is within the dead band. When this difference is greater than the band $d$, the inlet flow slips to the value given by the controller. Table 3 shows the number of samples generated considering different dead band values. The total number of samples under this condition is 10,000.

u_{s}(k) = \begin{cases} u_{s}(k - 1) & \text{if } | u(k) - u_{s}(k - 1) | < d \\ u(k) & \text{otherwise} \end{cases}

(22)
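Equation 22 translates directly into a short loop. In the sketch below, the dead band `d` is expressed in the same units as the controller output, and the initial valve position `u_s0` is passed explicitly; both choices, as well as the function name, are assumptions made for illustration.

```python
def apply_stiction(u, d, u_s0):
    """Apply the dead-band stiction model of Equation 22.

    u:    sequence of controller outputs u(k)
    d:    dead band, in the same units as u (assumption)
    u_s0: valve position before the first sample
    """
    u_s = []
    prev = u_s0
    for u_k in u:
        if abs(u_k - prev) >= d:
            prev = u_k  # valve slips to the commanded value
        u_s.append(prev)  # otherwise it stays stuck at the previous value
    return u_s
```

For example, with `d = 1.0` and the valve initially at 10.0, the commands 10.0, 10.4, 10.9, 11.2, 11.1 produce the positions 10.0, 10.0, 10.0, 11.2, 11.2.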

In the simulations of Incorrect Tuning Settings, the prediction and control horizons were reduced relative to those used in the normal scenario to intentionally degrade controller performance. Table 4 shows the different combinations used in the simulation to generate the dataset. For each combination, 1,000 samples were generated, to sum up a total of 10,000 samples under this condition.

For the scenario with PMM, constant changes in the parameter $C_{P}$ and the pre-exponential constant $k_{0,1}$ are considered. From Equation 14, it is possible to observe that $K_{1}$ has a direct effect on concentrations and temperatures, while $C_{P}$ seems to have an inverse effect on temperature and a more nonlinear effect on concentrations. Lima et al. [54] used a difference of 20% in the value of the specific heat of the mixture ($C_{P}$) and 5% in the value of the pre-exponential constant of $K_{1}$ ($k_{0,1}$) to evaluate how the CEKF handles PMM.

A previous analysis was made to evaluate the impact that different changes in those parameters could have on the controller. The results showed that a model mismatch of up to 5% for $k_{0,1}$ and 20% for $C_{P}$ is handled by the controller. Based on this analysis, different percentages of plant-model mismatch were used for each parameter individually. Percentages of 25%, 30%, 50%, 75%, 80%, 120%, 125%, 200%, 300%, and 400% of the $C_{P}$ nominal value were used. A similar approach was used to generate the dataset with plant-model mismatch in the parameter $k_{0,1}$, in which percentages of 50%, 60%, 70%, 90%, 95%, 105%, 110%, 150%, 175%, and 200% of the $k_{0,1}$ nominal value were used. For each condition, 1,000 samples were generated, for a total of 10,000 samples under each parameter mismatch.

3.5.3. Monitoring Strategies: Proposed CMLM and SMLM Approaches

Figure 9 illustrates how the CMLM and the SMLM models are organized for the van de Vusse case study. The CMLM was designed to segregate the instrumentation and process problem from the control configuration problem. The first ML model is responsible for detecting whether the closed-loop performance is abnormal. Once abnormalities are detected, the process data goes to the second ML model, which is responsible for diagnosing whether the issue is due to the process actuator, such as valve stiction, or internal control configuration. Once an internal control configuration issue is detected, the third ML model is used for diagnosing poor control tuning or PMM. Finally, if a PMM is identified, the final model is responsible for diagnosing which of the two process model parameters should be evaluated.
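The routing logic of this cascade can be sketched as a short function. Each argument after `x` is a callable returning `True` or `False` and stands in for one trained binary classifier; this interface, the function name, and the returned label strings are hypothetical and chosen only for illustration.

```python
def cascade_diagnosis(x, is_abnormal, is_valve_stiction, is_pmm,
                      is_cp_mismatch):
    """Route one sample through the four binary models of the cascade."""
    # Model 1: detection of abnormal closed-loop behavior.
    if not is_abnormal(x):
        return "normal"
    # Model 2: actuator problem versus internal control configuration.
    if is_valve_stiction(x):
        return "valve stiction"
    # Model 3: poor tuning versus plant-model mismatch.
    if not is_pmm(x):
        return "poor tuning"
    # Model 4: which model parameter is mismatched.
    return "PMM in C_P" if is_cp_mismatch(x) else "PMM in k_0,1"
```

Because each stage is an independent callable, a single model can be replaced, or the detection stage skipped, without touching the others.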

3.6. Standard Methodology

One traditional methodology used in industry is to use the standard deviation of variables to capture variability changes. This analysis is based on the minimum variance control theory and the Harris Index [8] to monitor control performance. The evaluation of standard deviation changes over time was performed to explore the potential of ML models to detect and diagnose sources of performance degradation against traditional methods. For this analysis, the standard deviation time history of each MV ($F / V$ and $\dot{Q}$) and CV ($C_{B}$ and $T$) was plotted using a moving window of 100 samples (15 min of simulation).
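The moving-window calculation can be sketched as follows. The use of the sample standard deviation and of fully overlapping windows advanced one sample at a time are assumptions, as is the function name.

```python
from statistics import stdev


def moving_std(signal, window=100):
    """Sample standard deviation over a sliding window.

    Returns one value per full window, so the output has
    len(signal) - window + 1 entries.
    """
    return [stdev(signal[end - window:end])
            for end in range(window, len(signal) + 1)]
```

A sustained rise in the resulting series for a CV or MV indicates increased variability, but it does not by itself indicate the cause.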

4. Results and Discussion

The noisy data was generated to evaluate the ML models’ robustness under measurement uncertainty. The noise follows a Gaussian distribution and was applied to the measurements of $C_{B}$ and $T$, with the percentages of the random values of 5% and 2%. The Gaussian distribution has a mean of 0 and a standard deviation of 1, computed as presented in Equation 23.

y_{meas,noise} = y_{meas} \times [1 + \mathrm{percentage} \times \mathrm{random}(\mathrm{mean} = 0, \mathrm{st.dev.} = 1)]

(23)
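Equation 23 describes multiplicative noise, so its magnitude scales with the measured value. A minimal sketch is given below; the function name and the optional `seed` argument are assumptions, and `percentage` is passed as a fraction (for instance, 0.05 for 5%).

```python
import random


def add_measurement_noise(y_meas, percentage, seed=None):
    """Apply the multiplicative Gaussian noise of Equation 23."""
    rng = random.Random(seed)
    return [y * (1.0 + percentage * rng.gauss(0.0, 1.0)) for y in y_meas]
```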

For a fair evaluation of the ML models’ robustness, they were employed as if they were being implemented offline in a real process, a procedure called here the Realistic Implementation. It was executed by generating a new dataset in the same scenarios as the training and test datasets. However, since the disturbance and the noise are randomly applied to the process, this new set of data will evaluate the generalization capabilities of each ML model and arrangement.

4.1. Random Forest

Table 5 shows the results obtained for the test dataset for each model in the CMLM and for the SMLM. All models in the CMLM have better metric values than the SMLM. Model 1, responsible for detecting abnormalities, had the lowest performance among the models in CMLM.

The results from multiple training sessions (Table 6) show once again that the results are relatively stable and do not depend heavily on the split or the weights initialization. They also reinforce the conclusions about the performance of Random Forest for noisy data.

After the overall implementation, the normalized confusion matrices presented in Figure 10 are obtained. For Random Forest, the CMLM approach seems to perform worse than the single-model approach. The weakest model of the CMLM is the detection model (Model 1): many samples are classified as normal when they are actually abnormal. The class most often confused with normal operation is valve stiction. This might happen due to the use of Accuracy as the optimization objective during ML model training. Since splitting the problem into binary models in the CMLM creates an artificial imbalance in the dataset, especially for the first model, the high Accuracy might simply reflect the correct classification of the majority class.
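The effect of class imbalance on Accuracy can be illustrated with a small example. The sketch below uses made-up labels (90 majority-class and 10 minority-class samples) and a degenerate classifier that always predicts the majority class; the function name and the choice of the binary F1-Score for the minority class are assumptions for illustration, and the averaging used for the F1-Score in the tables may differ.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 of the `positive` class) for two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical imbalanced set: 10 minority (1) and 90 majority (0) samples.
y_true = [1] * 10 + [0] * 90
y_pred = [0] * 100  # classifier that always predicts the majority class

print(accuracy_and_f1(y_true, y_pred))  # (0.9, 0.0)
```

The classifier reaches an Accuracy of 0.9 without identifying a single minority-class sample, which is the mechanism suspected above for Model 1.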

4.2. Multilayer Perceptron

Table 7 shows the results obtained for MLP models. By comparing the CMLM models against the SMLM, Model 1 was the only model with a lower performance than the SMLM. It also has the lowest performance among all models in the CMLM, similar to what was observed by using Random Forest. This leads to the conclusion that MLP also struggles to detect abnormalities in the van de Vusse data with noise.

Table 8 shows the results after 10 independent training runs of each model. The highest variability is observed for the SMLM, whereas the standard deviations of the cascade models are considerably smaller, which indicates that their results are stable.

Figure 11 shows the results obtained for the MLPs. The diagnosis of normal data was the most affected one, with a lower performance for CMLM. In contrast, other diagnoses, especially with valve stiction, were mislabeled as normal for the SMLM. This suggests that, in this case, the SMLM approach yields false negatives in abnormality detection. The CMLM approach, on the other hand, seems to give more false positives, since some normal data was misclassified as abnormal. The valve stiction diagnosis was better for the CMLM, while the diagnosis of mismatch in the parameter $C_{P}$ was also slightly better for the CMLM approach. The diagnosis of a mismatch in the parameter $k_{0, 1}$ was quite similar for both models.
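The false negatives and false positives discussed here can be read directly from a normalized confusion matrix. The sketch below builds one from made-up labels; the function name, the two-class example, and the normalization by true class (each row sums to 1) are assumptions about how such matrices are typically constructed, not a description of how the figures were generated.

```python
def normalized_confusion(y_true, y_pred, labels):
    """Confusion matrix with rows = true class, normalized so rows sum to 1."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


labels = ["normal", "stiction"]
y_true = ["normal"] * 4 + ["stiction"] * 4
y_pred = ["normal", "normal", "normal", "stiction",
          "normal", "normal", "stiction", "stiction"]

print(normalized_confusion(y_true, y_pred, labels))
# [[0.75, 0.25], [0.5, 0.5]]
```

In this toy case, the entry 0.25 in the first row is the false-positive rate (normal data flagged as abnormal), while the entry 0.5 in the first column of the second row is the false-negative rate (valve stiction labeled as normal).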

4.3. Gated Recurrent Unit - GRU

Melo [49] conducted an analysis showing that a window size of 20 samples is enough to capture correlations in the dataset. Table 9 shows the results for the GRU models. Except for Model 2, responsible for diagnosing valve stiction, all models have metric values higher than 0.95.
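Recurrent models such as the GRU consume sequences rather than single samples, so the time series has to be cut into windows before training. A minimal sketch of this preprocessing step is shown below; the function name, the stride of one sample, and the choice of labeling each window with the class of its last sample are assumptions for illustration, with only the window length of 20 taken from the text.

```python
def make_windows(series, labels, window=20):
    """Cut a multivariate series into overlapping windows.

    series: list of per-sample feature lists, ordered in time
    labels: list with one class label per sample
    Returns (X, y), where X[i] holds `window` consecutive samples and
    y[i] is the label of the last sample in that window (assumption).
    """
    X, y = [], []
    for end in range(window, len(series) + 1):
        X.append(series[end - window:end])
        y.append(labels[end - 1])
    return X, y


# Hypothetical data: 25 samples with 4 features each.
series = [[float(i)] * 4 for i in range(25)]
labels = [i % 2 for i in range(25)]
X, y = make_windows(series, labels, window=20)
print(len(X), len(X[0]), len(X[0][0]))  # 6 20 4
```

The resulting nested list has the layout (windows, time steps, features), which is the input arrangement commonly expected by recurrent layers.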

When the results of multiple independent training runs are evaluated (Table 10), the patterns observed remain the same. Model 4 of the cascade achieved 1.0000 for all 10 trainings. The remaining models also have stable results, especially Models 2 and 3.

Figure 12 shows the results for the overall implementation. SMLM and CMLM have similar performance, except for valve stiction, where the SMLM approach seems to perform better. However, contrary to what was observed for the RF and MLP algorithms, the valve stiction data were not misclassified as normal; they were confused with other control performance problems, namely tuning and model-plant mismatch. Since the recurrent models use a larger window than the one given to MLP or RF, the difference between the normal and valve stiction classes is more evident for this kind of model. The underperformance of the CMLM might still be due to the imbalance of the dataset.

4.4. Implementation - Realistic

In the realistic implementation, the results follow the pattern observed in the overall implementation: the CMLM approach seems to perform similarly to the single-model approach, except when detecting whether the process is abnormal.

Comparing the SMLM with the CMLM approach (Figure 13), and considering first the RF model, the classification of normal data was better for the CMLM, while the valve stiction and tuning classes were better classified by the SMLM approach. The model-plant mismatch diagnosis was similar for both approaches. The same pattern of underperformance in the CMLM might be due to the imbalance in the dataset.

Evaluating the MLPs (Figure 14), the performance was similar for model-plant mismatch classes. The classification of valve stiction was better for the CMLM, while normal and tuning classifications were better for the SMLM approach. Again, the underperformance of the CMLM is due to Model 1, which is detecting the tuning-issues data incorrectly as normal, while normal data is being classified as abnormal. This might be happening due to the imbalance in the dataset.

Finally, considering the GRU algorithm, Figure 15 shows the results obtained. The first noticeable difference when compared to previous algorithms is that this type of ML model did not perform well for model-plant mismatch in $k_{0, 1}$. As an advantage, GRU in CMLM was better at classifying data in normal conditions and with valve stiction, which the previous algorithms in CMLM could hardly do. This suggests the possibility of using different ML algorithms for closed-loop diagnosis, since different models perform differently depending on the issue.

The CMLM proves to be better than the SMLM at classifying model-plant mismatch in $k_{0, 1}$, while the results for normal data and model-plant mismatch in $C_{P}$ were quite similar. When compared to the other recurrent ML models in CMLM, this algorithm has better results for classifying mismatch in $k_{0, 1}$, even though the Accuracy is lower than that obtained from RF and MLP.

4.5. Standard Methodology

To evaluate the potential of CMLM against traditional methodologies, the evaluation of the standard deviation changes through time was performed. Figure 16 shows the standard deviation history for each MV and CV of the van de Vusse reactor. Visually, both the normal (blue area) and the valve stiction (green area) have the same range of standard deviation values, except for some samples equal to 0 for $F / V$.

Poor NMPC tuning and mismatches in its internal model parameters cause the standard deviation range of some variables to differ from that observed in normal conditions, especially for the tuning changes. However, the standard deviation gives information about a specific variable and provides no evidence about the root cause of performance degradation. A more accurate diagnosis would require an exhaustive data investigation.

5. Conclusions

In this work, a review of the methodologies available for monitoring the performance of predictive controllers was presented. Most of them address linear MPCs and rely on assumptions that limit their online application in industry. Commercial controllers implement their own metrics to monitor the control system, many of which are based on the literature. However, they are inefficient in identifying the root cause of degradation. An FDD methodology based on cascaded binary classification using machine learning models and incorporating predictive controller performance monitoring was proposed and investigated. The proposed methodology was compared with single multiclass models for different types of ML algorithms. To provide a solid validation case, the van de Vusse reactor, a classical and highly nonlinear benchmark system, was investigated. Although based on data-driven (ML-based) models, the methodology also incorporates knowledge-based reasoning through the proposed cascade architecture, which mimics the decision-making logic of an operator (i.e., “normal operation”, “instrumentation OK”, “process OK”, and “controller performance OK”). In this sense, the approach can be regarded as hybrid in nature, providing advantages such as more direct interpretability.

The main findings of this study include that there is no universal ML model that is better for controller performance monitoring. The best approach depends on the process and the evaluated diagnosis. Random Forest and MLP require smaller datasets for training and have relatively low computational cost. Random Forest is particularly effective when the diagnosis involves evaluating whether variables cross predefined thresholds. MLP, on the other hand, is well-suited to capturing nonlinear relationships between variables. However, when the diagnosis depends on a time-dependent characteristic, RNNs are better suited to make a diagnosis based on a window of data, even though they require a larger dataset for training. Models in cascade give the possibility of combining the best ML models for each case. Even though the SMLM approach outperformed the CMLM in many architectures, it limits the methodology to a single ML algorithm. As observed in the realistic implementation, the diagnosis of some root causes may be compromised by the SMLM approach. In the realistic implementation, it was possible to observe that the RNNs were better at distinguishing between normal conditions and valve stiction conditions, while the remaining diagnoses were best performed with RF and MLP.

For future work, it is recommended to further investigate the integrated framework using different ML models and more industrial applications as case studies, while assessing computational cost limitations in industrial applications. Additional efforts should consider multi-objective optimization (balancing accuracy and model complexity), cascade structures with heterogeneous ML techniques, and the combined use of single and cascade models.

Funding

This study was financed in part by CAPES - Finance Code 001 and by Petrobras S.A. (Cooperation term no. 0050.0125244.23.9). Professors Maurício B. de Souza Jr. and Argimiro R. Secchi are grateful for the financial support from CNPq (Grants No. 304190/2025-0 and 300744/2025-0), and Prof. Maurício B. de Souza Jr. is also thankful to FAPERJ (Grant No. E-26/200.532/2026).

Abbreviations

The following abbreviations are used in this manuscript:

CEKF	Constrained Extended Kalman Filter
CMLM	Cascade Machine Learning Models
CPA	Control Performance Assessment
CV	Controlled Variable
FDD	Fault Detection and Diagnosis
GRU	Gated Recurrent Unit
ITAE	Integral Time-weighted Absolute Error
LSTM	Long Short-Term Memory
MLP	Multilayer Perceptron
ML	Machine Learning
MPC	Model Predictive Control
MVC	Minimum Variance Control
MV	Manipulated Variable
NMPC	Nonlinear Model Predictive Control
NN	Neural Network
PMM	Plant-Model Mismatch
RF	Random Forest
RNN	Recurrent Neural Networks
SMLM	Single Machine Learning Model

References

Schwenzer, M.; Ay, M.; Bergs, T.; Abel, D. Review on model predictive control: an engineering perspective. Int. J. Adv. Manuf. Technol. 2021, 117, 1327–1349.
García, C.E.; Prett, D.M.; Morari, M. Model predictive control: Theory and practice—A survey. Automatica 1989, 25, 335–348.
Bauer, M.; Horch, A.; Xie, L.; Jelali, M.; Thornhill, N. The current state of control loop performance monitoring – A survey of application in industry. J. Process Control 2016, 38, 1–10.
Carelli, A.C.; de Souza, M.B. Stochastic and Deterministic Performance Assessment of PID and MPC Controllers: Application to a Hydrotreater Reactor. In 10th International Symposium on Process Systems Engineering: Part A; de Brito Alves, R.M., do Nascimento, C.A.O., Biscaia, E.C., Eds.; Elsevier: Computer Aided Chemical Engineering, 2009; Vol. 27, pp. 1635–1640.
Forbes, M.G.; Patwardhan, R.S.; Hamadah, H.; Gopaluni, R.B. Model Predictive Control in Industry: Challenges and Opportunities. In IFAC-PapersOnLine; 9th IFAC Symposium on Advanced Control of Chemical Processes (ADCHEM 2015), 2015; Volume 48, pp. 531–538.
Rawlings, J.B. Tutorial Overview of Model Predictive Control. IEEE Control Syst. Mag. 2000, 20, 38–52.
Henson, M.A. Nonlinear model predictive control: current status and future directions. Comput. Chem. Eng. 1998, 23, 187–202.
Harris, T.J. Assessment of Control Loop Performance. Can. J. Chem. Eng. 1989, 67, 856–861.
Yağcı, M.; Forsman, K.; Böling, J.M. A Machine Learning Classifier for Detection of Performance Issues in Industrial Closed-Loop PID Controllers. Proceedings of the 2024 10th International Conference on Control, Decision and Information Technologies (CoDIT) 2024, 1231–1236.
Birkle, C.; Pendlebury, D.A.; Schnell, J.; Adams, J. Web of Science as a data source for research on scientific and scholarly activity. Quant. Sci. Stud. 2020, 1, 363–376.
Özdemir, P.; Yıldırım, R. ML@ChemE: Past, Present, and Future of Machine Learning in Chemical Engineering. ChemBioEng Rev. 2025, 12, e70012. Available online: https://onlinelibrary.wiley.com/doi/pdf/10.1002/cben.70012.
Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S.N.; Yin, K. A review of process fault detection and diagnosis: Part III: Process history based methods. Comput. Chem. Eng. 2003, 27, 327–346.
Desborough, L.; Harris, T. Performance assessment measures for univariate feedback control. Can. J. Chem. Eng. 1992, 70, 1186–1197. Available online: https://onlinelibrary.wiley.com/doi/pdf/10.1002/cjce.
Åström, K.J. Computer Control of a Paper Machine—an Application of Linear Stochastic Control Theory. IBM J. Res. Dev. 1967, 11, 389–405.
Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C. Time Series Analysis: Forecasting and Control, 5th ed.; John Wiley and Sons Inc.: Hoboken, New Jersey, 2015.
Harris, T.J.; Boudreau, F.; MacGregor, J.F. Performance assessment of multivariable feedback controllers. Automatica 1996, 32, 1505–1518.
Huang, B.; Shah, S.; Kwok, E. Good, bad or optimal? Performance assessment of multivariable processes. Automatica 1997, 33, 1175–1183.
Huang, B.; Shah, S.L. Performance Assessment of Control Loops; Springer London: London, 1999.
Tyler, M.L.; Morari, M. Performance monitoring of control systems using likelihood methods. Automatica 1996, 32, 1145–1162.
Uduehi, D.; Ordys, A.; Grimble, M. A generalized predictive control benchmark index for SISO systems. In Proceedings of the 41st IEEE Conference on Decision and Control, 2002; Vol. 2, pp. 1796–1801.
Qin, S.J. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control 2012, 36, 220–234.
Kresta, J.V.; Macgregor, J.F.; Marlin, T.E. Multivariate statistical monitoring of process operating performance. Can. J. Chem. Eng. 1991, 69, 35–47. Available online: https://onlinelibrary.wiley.com/doi/pdf/10.1002/cjce.5450690105.
Zhang, Q.; Li, S. Performance Monitoring and Diagnosis of Multivariable Model Predictive Control Using Statistical Analysis. Chin. J. Chem. Eng. 2006, 14, 207–215.
Badwe, A.S.; Gudi, R.D.; Patwardhan, R.S.; Shah, S.L.; Patwardhan, S.C. Detection of model-plant mismatch in MPC applications. J. Process Control 2009, 19, 1305–1313.
Botelho, V.R.; Trierweiler, J.O.; Farenzena, M. Diagnosis of Poor Performance in Model Predictive Controllers: Unmeasured Disturbance versus Model–Plant Mismatch. Ind. Eng. Chem. Res. 2016, 55, 11566–11582.
Giraldo, S.A.; Melo, P.A.; Secchi, A.R. Enhanced control in time-delay processes: Diagnostic, monitoring, and self-tuning strategies for the filtered Smith predictor in response to model-plant mismatch and abrupt load disturbances. Control Eng. Pract. 2024, 145, 105869.
Thornhill, N.; Hägglund, T. Detection and diagnosis of oscillation in control loops. Control Eng. Pract. 1997, 5, 1343–1354.
Trinh, C.; Meimaroglou, D.; Hoppe, S. Machine Learning in Chemical Product Engineering: The State of the Art and a Guide for Newcomers. Processes 2021, 9.
Venkatasubramanian, V.; Rengaswamy, R.; Yin, K.; Kavuri, S.N. A review of process fault detection and diagnosis: Part I: Quantitative model-based methods. Comput. Chem. Eng. 2003, 27, 293–311.
Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S.N. A review of process fault detection and diagnosis: Part II: Qualitative models and search strategies. Comput. Chem. Eng. 2003, 27, 313–326.
de Campos, M.C.M.M.; de Carvalho Gomes, M.V.; Perez, J.M.G.T. Controle Avançado e Otimização na Indústria do Petróleo; Editora Interciência: Rio de Janeiro, 2013.
Schweidtmann, A.M.; Esche, E.; Fischer, A.; Kloft, M.; Repke, J.U.; Sager, S.; Mitsos, A. Machine Learning in Chemical Engineering: A Perspective. Chem. Ing. Tech. 2021, 93, 2029–2039. Available online: https://onlinelibrary.wiley.com/doi/pdf/10.1002/cite.202100083.
Pillay, N.; Govender, P. Multi-class SVMs for automatic performance classification of closed loop controllers. Control Eng. Appl. Inform. 2017, 19, 3–12.
Grelewicz, P.; Nowak, P.; Khuat, T.T.; Czeczot, J.; Klopot, T.; Gabrys, B. Practical implementation of computationally-efficient machine learning-based control performance assessment system for a class of closed loop systems. Appl. Soft Comput. 2023, 146, 110690.
Grelewicz, P.; Khuat, T.T.; Czeczot, J.; Nowak, P.; Klopot, T.; Gabrys, B. Application of Machine Learning to Performance Assessment for a Class of PID-Based Control Systems. IEEE Trans. Syst. Man. Cybern. Syst. 2023, 53, 4226–4238.
Zhou, Y.; Wan, F. A neural network approach to control performance assessment. Int. J. Intell. Comput. Cybern. 2008, 1, 617–633.
Wang, L.; Li, N.; Li, S.; Li, K. Neural network based model predictive control performance monitoring-data-driven approach. In Proceedings of the 2013 9th Asian Control Conference (ASCC), 2013; pp. 1–6.
Xu, Y.; Zhang, G.; Li, N.; Zhang, J.; Li, S.; Wang, L. Data-Driven Performance Monitoring for Model Predictive Control Using a Mahalanobis distance based overall index. Asian J. Control 2019, 21, 891–907. Available online: https://onlinelibrary.wiley.com/doi/pdf/10.1002/asjc.1782.
Loquasto, F.; Seborg, D.E. Monitoring Model Predictive Control Systems Using Pattern Classification and Neural Networks. Ind. Eng. Chem. Res. 2003, 42, 4689–4701.
Lu, Q.; Gopaluni, R.B.; Forbes, M.G.; Loewen, P.D.; Backström, J.U.; Dumont, G.A. Model-Plant Mismatch Detection with Support Vector Machines. In IFAC-PapersOnLine; 20th IFAC World Congress, 2017; Volume 50, pp. 7993–7998.
Dambros, J.W.V.; Trierweiler, J.O.; Farenzena, M.; Kloft, M. Oscillation Detection in Process Industries by a Machine Learning-Based Approach. Ind. Eng. Chem. Res. 2019, 58, 14180–14192.
Rabba, D.F.; Wardana, A.N.I.; Effendy, N. Intermittent Oscillation Diagnosis in a Control Loop Using Extreme Gradient Boosting. J. Nas. Tek. Elektro 2022, 11.
Akavalappil, V.; Radhakrishnan, T.K.; Kathari, S. Detection and Classification of Oscillations in Process Control Loops Using Deep Learning Techniques. Adv. Control Appl. 2025, 7, e70030.
Vasudevan, S.; Chokshi, I.; Ranganathan, R.; Sundaram, N. Sequential Binary Classification for Intrusion Detection. arXiv 2025, arXiv:cs.
Krishna, A.T.S.; Kumar, H.R. Fault Detection and Classification for DC Microgrid using Binary Classification Models. In Proceedings of the 2023 International Conference on Control, Communication and Computing (ICCC), 2023; pp. 1–6.
LeCun, Y.A.; Bottou, L.; Orr, G.B.; Müller, K.R. Efficient BackProp. In Neural Networks: Tricks of the Trade: Second Edition; Springer Berlin Heidelberg: Berlin, Heidelberg, 2012; pp. 9–48.
Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-generation Hyperparameter Optimization Framework, 2019. arXiv arXiv:cs.
Géron, A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd ed.; O’Reilly Media, Inc.: Sebastopol, CA, 2019.
Melo, E.V. Machine learning for performance diagnosis of predictive controllers. Master’s thesis, Chemical Engineering Program, COPPE, Universidade Federal do Rio de Janeiro, 2026.
van de Vusse, J. Plug-flow type reactor versus tank reactor. Chem. Eng. Sci. 1964, 19, 994–996.
Kantor, J.C. Stability of State Feedback Transformations for Nonlinear Systems - Some Practical Considerations. In Proceedings of the 1986 American Control Conference, 1986; pp. 1014–1016.
Engell, S.; Klatt, K.U. Gain scheduling control of a non-minimum-phase CSTR. In Proceedings of the 2nd European Control Conference, 1993; pp. 2323–2328.
Engell, S.; Klatt, K.U. Nonlinear control of a nonminimum-phase CSTR. In Proceedings of the 1993 American Control Conference, 1993; pp. 2041–2045.
Lima, F.A.R.D.; Faria, R.d.R.; Curvelo, R.; Cadorini, M.C.F.; Echeverry, C.A.G.; de Souza, M.B.; Secchi, A.R. Influence of Estimators and Numerical Approaches on the Implementation of NMPCs. Processes 2023, 11.
Hägglund, T. A shape-analysis approach for diagnosis of stiction in control valves. Control Eng. Pract. 2011, 19, 782–789.

Figure 1. Number of publications per year mentioning the controller performance monitoring.

Figure 2. Number of publications per year mentioning ‘controller performance monitoring’ and ‘model predictive controller’.

Figure 3. Number of publications per year mentioning ‘artificial intelligence’ and ‘control performance monitoring’.

Figure 4. Number of publications per year mentioning ‘machine learning’ and ‘fault detection and diagnosis’.

Figure 5. Number of publications per year mentioning ‘fault detection and diagnosis’ and ‘control performance monitoring’.

Figure 6. Controlled process integrated with the AI-based monitoring system.

Figure 7. General arrangement for the ML models for the CMLM (a) and the SMLM (b).

Figure 8. Van de Vusse reactor representation.

Figure 9. CMLM (a) and SMLM (b) applied to the van de Vusse reactor case study.

Figure 10. Normalized confusion matrices for the SMLM (a) and the CMLM (b) using Random Forest models after the overall implementation.

Figure 11. Normalized confusion matrices for the SMLM (a) and the CMLM (b) using MLP models after the overall implementation.

Figure 12. Normalized confusion matrices for the SMLM (a) and the CMLM (b) using GRU models after the overall implementation.

Figure 13. Normalized confusion matrices for the SMLM (a) and the CMLM (b) using Random Forest models after the realistic implementation.

Figure 14. Normalized confusion matrices for the SMLM (a) and the CMLM (b) using MLP models after the realistic implementation.

Figure 15. Normalized confusion matrices for the SMLM (a) and the CMLM (b) using GRU models after the realistic implementation.

Figure 16. Standard deviation for each variable of the van de Vusse reactor.

Table 1. NMPC parameter values.

Parameters	Value
Prediction Horizon	40
Control Horizon	10
CV weight ( $p_{k}$ ) of $C_{B}$	5
CV weight ( $p_{k}$ ) of T	5 × $10^{- 2}$
$Δ u$ weight ( $q_{k}$ ) of $F / V$	1 × $10^{- 5}$
$Δ u$ weight ( $q_{k}$ ) of $\dot{Q} / (K_{w} A_{r})$	9 × $10^{- 3}$

Table 2. Initial conditions used in the van de Vusse simulation.

$C_{A}$	$C_{B}$	T	$T_{K}$	$F / V$	$\frac{\dot{Q}}{K_{w} A_{r}}$	$C_{A, i n}$	$T_{i n}$
2.4 mol/L	1.1 mol/L	140 °C	140 °C	85 h⁻¹	$- 0.04$ K	5.1 mol/L	130 °C

Table 3. Number of samples generated for each valve stiction scenario.

Dead Band	Number of Samples
0.2%	4,000
1%	2,000
5%	4,000

Table 4. NMPC parameter values.

Prediction Horizon	Control Horizon
10	3
10	8
10	10
11	10
13	10
15	10
15	13
20	5
20	10
20	15

Table 5. Results for Random Forest on the test dataset.

Model	Accuracy	F1-Score
SMLM	0.8065	0.8105
Model 1: Detection	0.8423	0.8398
Model 2: Valve Stiction	0.9082	0.8872
Model 3: NMPC Tuning	0.9763	0.9734
Model 4: PMM parameter	0.9652	0.9652

Table 6. Results for RF on multiple training sessions.

Model	Accuracy	F1-Score
SMLM	0.8087 ± 0.0037	0.8128 ± 0.0038
Model 1: Detection	0.8379 ± 0.0023	0.8351 ± 0.0024
Model 2: Valve Stiction	0.9104 ± 0.0029	0.8885 ± 0.0038
Model 3: NMPC Tuning	0.9772 ± 0.0020	0.9746 ± 0.0022
Model 4: PMM parameter	0.9679 ± 0.0019	0.9679 ± 0.0019

Table 7. Results of MLP for the test dataset.

Model	Accuracy	F1-Score
SMLM	0.8704	0.8721
Model 1: Detection	0.8686	0.7878
Model 2: Valve Stiction	0.9569	0.9419
Model 3: NMPC Tuning	0.9943	0.9936
Model 4: PMM parameter	0.9549	0.9549

Table 8. Results of MLP for 10 independent training runs.

Model	Accuracy	F1-Score
SMLM	0.8491 ± 0.0367	0.8502 ± 0.0397
Model 1: Detection	0.8789 ± 0.0047	0.8073 ± 0.0108
Model 2: Valve Stiction	0.9442 ± 0.0059	0.9244 ± 0.0078
Model 3: NMPC Tuning	0.9918 ± 0.0026	0.9908 ± 0.0029
Model 4: PMM parameter	0.9580 ± 0.0025	0.9579 ± 0.0025

Table 9. Results of GRU for the test dataset.

Model	Accuracy	F1-Score
SMLM	0.9517	0.9515
Model 1: Detection	0.9965	0.9965
Model 2: Valve Stiction	0.9075	0.8724
Model 3: NMPC Tuning	0.9674	0.9623
Model 4: PMM parameter	1.0000	1.0000

Table 10. Results of GRU for 10 independent training runs.

Model	Accuracy	F1-Score
SMLM	0.9599 ± 0.0111	0.9598 ± 0.0109
Model 1: Detection	0.9919 ± 0.0056	0.9916 ± 0.0056
Model 2: Valve Stiction	0.9109 ± 0.0061	0.8750 ± 0.0092
Model 3: NMPC Tuning	0.9792 ± 0.0068	0.9761 ± 0.0079
Model 4: PMM parameter	1.0000 ± 0.0000	1.0000 ± 0.0000


© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
