In this section, we will explain the proposed ASISO method in detail. ASISO introduces two algorithms: KSpace and KMatch. To clearly illustrate the proposed method, we will first provide an overview of ASISO and then introduce the specific algorithms involved.
3.1. Overview
ASISO is mainly based on linear interpolation and aims to increase the size and improve the quality of the original dataset. The idea is to divide the original feature space into several subspaces with an equal number of samples and then perform linear interpolation between samples in adjacent subspaces. The method requires two hyperparameters, k and $\eta $, specified in advance: k is the number of samples in each feature subspace, and $\eta $ is the number of equidistant nodes interpolated per unit distance in the linear interpolation between samples. The proposed method is illustrated in Figure 1.
The dataset $D={\{({\mathit{x}}_{i},{y}_{i})\}}_{i=1}^{n}$ is given and assumed to be contaminated with unknown noise, where ${\mathit{x}}_{i}\in \mathcal{X}={R}^{p}$ and ${y}_{i}\in \mathcal{Y}=R$. Assume ${\dot{y}}_{i}=f\left({\dot{\mathit{x}}}_{i}\right)$, where ${\dot{\mathit{x}}}_{i}$ is the actual value of ${\mathit{x}}_{i}$, ${\dot{y}}_{i}$ is the actual value of ${y}_{i}$, and $f(\cdot)$ is a continuous function representing the real-world relationship between ${\mathit{x}}_{i}$ and ${y}_{i}$. Consider the model:
where ${\epsilon}_{i,y}$ is the noise in ${\dot{y}}_{i}$, ${\epsilon}_{i,x}$ is the noise in ${\dot{\mathit{x}}}_{i}$, and ${\epsilon}_{i}$ represents the error term. Expression (1) can be rewritten as:
Let ${\mathit{x}}_{0}=\{{x}_{0}^{1},\dots ,{x}_{0}^{p}\}$, where ${x}_{0}^{j}=\inf {\left\{{x}_{i}^{j}\right\}}_{i=1}^{n}$ for all $j=1,\dots ,p$; we call ${\mathit{x}}_{0}$ the sample minimum point.
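As a concrete illustration, the sample minimum point is just the coordinate-wise minimum over the feature matrix. A minimal NumPy sketch (the array values are hypothetical):

```python
import numpy as np

# Hypothetical feature matrix: n = 5 samples, p = 2 features.
X = np.array([[2.0, 7.0],
              [5.0, 1.0],
              [3.0, 4.0],
              [8.0, 6.0],
              [1.0, 9.0]])

# Sample minimum point x0: coordinate-wise minimum over all samples,
# i.e. x0^j = min_i x_i^j for each feature j.
x0 = X.min(axis=0)
print(x0)  # [1. 1.]
```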
Given the hyperparameter k, we provide an unsupervised clustering method called KSpace. As shown in Figure 1(b), the space can be partitioned into $n/k$ subspaces, each containing k samples, i.e., $\mathcal{X}={\cup}_{s=1}^{n/k}{\mathcal{X}}_{s}$, with ${\mathcal{X}}_{i}\cap {\mathcal{X}}_{j}=\varnothing $ for $i,j=1,2,\dots ,\frac{n}{k}$, $i\ne j$. The datasets corresponding to the subspaces are $D={\cup}_{s=1}^{n/k}{D}_{s}$, where ${D}_{s}={\left\{({\mathit{x}}_{i}^{s},{y}_{i}^{s})\right\}}_{i=1}^{k}$ and ${\mathit{x}}_{i}^{s}\in {\mathcal{X}}_{s}$. For two adjacent subspaces, since $f(\cdot)$ is a continuous function, we assume that it can be approximated by a linear function $g(\cdot)$, and (2) can then be transformed into:
where ${\epsilon}_{i}^{\prime}$ is the linear fitting error term. As the distance between two adjacent subspaces approaches zero and the measures of the subspaces tend to zero, we obtain ${\epsilon}_{i}^{\prime}\to 0$. Next, we perform sample interpolation between adjacent subspaces.
We need to calculate the center of each cluster, as follows:
To make ${\epsilon}_{i}^{\prime}\to 0$, we need to ensure that interpolation is performed between clusters that are close to each other. Among ${\left\{{D}_{s}\right\}}_{s=1}^{n/k}$, we define ${D}_{\left(1\right)}$ as the subset whose cluster center has the minimum distance to the sample minimum point ${\mathit{x}}_{0}$, and ${D}_{\left(d\right)}$ as the subset whose cluster center has the minimum distance to the center of ${D}_{(d-1)}$, with ${D}_{\left(d\right)}\ne {D}_{\left(1\right)},\dots ,{D}_{(d-1)}$ and $d>1$.
where ${\overline{\mathit{x}}}^{(d-1)}$ is the center of ${D}_{(d-1)}$. Interpolation is performed in ${\left\{{D}_{\left(d\right)}\right\}}_{d=1}^{n/k}$ sequentially in order of d, and only between adjacent subspaces (i.e., between ${D}_{\left(1\right)}$ and ${D}_{\left(2\right)}$, between ${D}_{\left(2\right)}$ and ${D}_{\left(3\right)}$, and so on).
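The ordering above is a greedy nearest-center chain. A small sketch of it, assuming NumPy and Euclidean distance (the function name `order_subspaces` and the example centers are ours, not the paper's):

```python
import numpy as np

def order_subspaces(centers, x0):
    """Greedy ordering of cluster centers: D_(1) is the cluster whose
    center is nearest the sample minimum point x0; D_(d) is the nearest
    not-yet-chosen center to the center of D_(d-1)."""
    centers = np.asarray(centers, dtype=float)
    remaining = list(range(len(centers)))
    order = []
    ref = np.asarray(x0, dtype=float)           # current reference point
    while remaining:
        dists = [np.linalg.norm(centers[s] - ref) for s in remaining]
        s = remaining.pop(int(np.argmin(dists)))
        order.append(s)
        ref = centers[s]                        # next distances measured from this center
    return order

# Hypothetical 1-D centers and minimum point.
print(order_subspaces([[4.0], [1.0], [9.0]], [0.0]))  # [1, 0, 2]
```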
When performing linear interpolation between adjacent subspaces, we should pair the k samples from the first subspace with an equal number of samples from the second subspace. The interpolation rules between adjacent subspaces are as follows:
1. Linear interpolation can only be performed between two samples belonging to different, adjacent subspace sets.
2. Every sample must participate in an interpolation.
3. Each sample point participates in exactly one interpolation.
The number of possible matching schemes is $k!$. As shown in Figure 1(c), we provide a matching method called KMatch. Assuming ${\epsilon}_{i}^{\prime}\to 0$, this method selects a well-performing matching scheme ${\{({\mathit{x}}_{i}^{\left(d\right)},{y}_{i}^{\left(d\right)}),({\mathit{x}}_{i}^{(d+1)},{y}_{i}^{(d+1)})\}}_{i=1}^{k}$ from the $k!$ candidates.
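To see why there are $k!$ candidate schemes: under the three rules, a matching scheme is a bijection between the k samples of ${D}_{\left(d\right)}$ and the k samples of ${D}_{(d+1)}$, i.e., a permutation of k items. A quick check for k = 3:

```python
import math
from itertools import permutations

k = 3
# Each valid matching pairs every sample of D_(d) with exactly one
# distinct sample of D_(d+1): a permutation of k items.
schemes = list(permutations(range(k)))
print(len(schemes), math.factorial(k))  # 6 6
```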
Assume that $\mathit{x}$ and $y$ are continuous variables. Given the other hyperparameter $\eta $, the number of samples inserted by linear interpolation between ${D}_{\left(d\right)}$ and ${D}_{(d+1)}$ is $\sum _{i=1}^{k}\lfloor \eta \cdot dist({\mathit{x}}_{i}^{\left(d\right)},{\mathit{x}}_{i}^{(d+1)})\rfloor $. Taking $({\mathit{x}}_{i}^{\left(d\right)},{y}_{i}^{\left(d\right)})\in {D}_{\left(d\right)}$ and $({\mathit{x}}_{i}^{(d+1)},{y}_{i}^{(d+1)})\in {D}_{(d+1)}$ as an example, $\{({\mathit{x}}_{(d,d+1)}^{(m,i)},{y}_{(d,d+1)}^{(m,i)})\}_{m=1}^{\lfloor \eta \cdot dist({\mathit{x}}_{i}^{\left(d\right)},{\mathit{x}}_{i}^{(d+1)})\rfloor}$ is the set of inserted samples, and the linear interpolation formula is defined as:
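A plausible reading of this rule, consistent with $\eta $ being the number of equidistant nodes per unit distance, places the $\lfloor \eta \cdot dist(\cdot,\cdot)\rfloor $ inserted points at equally spaced interior positions on the segment between a matched pair. A hedged NumPy sketch (the function name and example values are ours):

```python
import numpy as np

def interpolate_pair(x_a, y_a, x_b, y_b, eta):
    """Insert floor(eta * dist(x_a, x_b)) equidistant samples on the
    segment between two matched samples (our reading of the ASISO rule)."""
    x_a, x_b = np.asarray(x_a, float), np.asarray(x_b, float)
    m_max = int(np.floor(eta * np.linalg.norm(x_a - x_b)))
    samples = []
    for m in range(1, m_max + 1):
        t = m / (m_max + 1)                  # equidistant interior nodes
        samples.append(((1 - t) * x_a + t * x_b, (1 - t) * y_a + t * y_b))
    return samples

# Hypothetical pair at distance 4 with eta = 1 -> 4 inserted samples.
new = interpolate_pair([0.0], 0.0, [4.0], 4.0, eta=1.0)
print(len(new))  # 4
```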
After ASISO processing, the original dataset will be optimized. The main steps of the ASISO algorithm are summarized in Algorithm 1.
Algorithm 1: ASISO 
Input: Data set $D={\left\{({\mathit{x}}_{i},{y}_{i})\right\}}_{i=1}^{n}$; hyperparameters k and $\eta $
Output: Optimized data set ${D}^{\prime}$

The assumptions of ASISO are as follows:
1. $f(\cdot)$ is a continuous function.
2. The linear fitting error ${\epsilon}_{i}^{\prime}\to 0$.
3. $\mathit{x}$ and $y$ are continuous variables.
3.2. KSpace
The implementation of ASISO requires an unsupervised clustering method to partition the feature space into multiple subspaces, each containing k samples. Based on this, we propose the KSpace clustering method, which has the following properties:
1.Each subspace contains an equal number of samples, i.e., $D={\cup}_{s=1}^{n/k}{D}_{s},{D}_{s}={\left\{({\mathit{x}}_{i}^{s},{y}_{i}^{s})\right\}}_{i=1}^{k}$;
2.Each sample belongs to only one subset, i.e., ${D}_{i}\cap {D}_{j}=\varnothing ,i\ne j$.
Maintaining continuity and similarity between adjacent subspaces is essential for synthesizing data via multiple linear interpolations in ASISO. Our objective is to minimize the linear fitting error ${\u03f5}_{i}^{\prime}$, which helps to satisfy ASISO assumption 2 as much as possible.
To determine the sample set
${D}_{s}$ for subspace
${\mathcal{X}}_{s}$, it is necessary to determine the first sample
${\mathit{x}}_{1}^{s}$ in
${D}_{s}$.
where $s=1,\dots ,\frac{n}{k}$, ${\overline{\mathit{x}}}^{s-1}$ is the cluster center of ${D}_{s-1}$, and ${\overline{\mathit{x}}}^{0}={\mathit{x}}_{0}$. We initialize ${D}_{s}=\left\{{\mathit{x}}_{1}^{s}\right\}$ and determine ${\mathit{x}}_{d}^{s}$ as follows:
where $d=2,\dots ,k$. We obtain ${\mathit{x}}_{d}^{s}$ and update ${D}_{s}\leftarrow {D}_{s}\cup \left\{{\mathit{x}}_{d}^{s}\right\}$.
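Putting the seeding and update steps together, KSpace can be sketched as follows. This is our reading of the procedure: the display equations are not reproduced in the text, so we stand in a nearest-to-center selection rule for both the seed and the update, in NumPy:

```python
import numpy as np

def kspace(X, k):
    """KSpace sketch: seed each cluster with the unassigned sample nearest
    the previous cluster center (x0 for the first cluster), then greedily
    add the k-1 unassigned samples nearest the running cluster center.
    The selection rules are our assumption, not the paper's exact formulas."""
    X = np.asarray(X, dtype=float)
    unassigned = set(range(len(X)))
    ref = X.min(axis=0)                      # sample minimum point x0
    clusters = []
    while len(unassigned) >= k:
        # first sample of the cluster: nearest unassigned point to ref
        idx = sorted(unassigned)
        first = idx[int(np.argmin([np.linalg.norm(X[i] - ref) for i in idx]))]
        members = [first]
        unassigned.remove(first)
        while len(members) < k:
            center = X[members].mean(axis=0)
            idx = sorted(unassigned)
            nxt = idx[int(np.argmin([np.linalg.norm(X[i] - center) for i in idx]))]
            members.append(nxt)
            unassigned.remove(nxt)
        clusters.append(members)
        ref = X[members].mean(axis=0)        # next seed measured from this center
    return clusters

# Hypothetical 1-D data, n = 6 and k = 2 -> 3 clusters of 2 samples each.
cl = kspace([[0.0], [1.0], [4.0], [5.0], [9.0], [10.0]], k=2)
print(cl)  # [[0, 1], [2, 3], [4, 5]]
```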
The main steps of the KSpace algorithm are summarized in Algorithm 2.
Algorithm 2: KSpace 
Input: Data set $D={\left\{({\mathit{x}}_{i},{y}_{i})\right\}}_{i=1}^{n}$; hyperparameter k
Output: ${\left\{{D}_{s}\right\}}_{s=1}^{n/k}$

3.3. KMatch
We can calculate the total error of a matching scheme to measure its quality. For simplicity, let $\mathcal{X}=R$; the total error is defined as follows:
where
${L}_{i}\left(x\right)$ is the linear expression passing through the points
$({x}_{i}^{\left(d\right)},{y}_{i}^{\left(d\right)})$ and
$({x}_{i}^{(d+1)},{y}_{i}^{(d+1)})$.
Theorem 1. Let ${\mathcal{X}}_{\left(d\right)}$ and ${\mathcal{X}}_{(d+1)}$ be two adjacent subspaces with corresponding datasets ${D}_{\left(d\right)},{D}_{(d+1)}$, and $({x}_{i}^{\left(d\right)},{y}_{i}^{\left(d\right)})\in {D}_{\left(d\right)},({x}_{i}^{(d+1)},{y}_{i}^{(d+1)})\in {D}_{(d+1)}$. Consider the model ${y}_{i}=f\left({x}_{i}\right)+{\epsilon}_{i}$ and let ${\epsilon}_{i}^{\left(d\right)}={y}_{i}^{\left(d\right)}-f\left({x}_{i}^{\left(d\right)}\right)$. For all $i=1,2,\dots ,k$, suppose that ${\epsilon}_{i}^{\prime}\to 0$; then $E\left(\frac{S({x}_{i}^{\left(d\right)},{x}_{i}^{(d+1)})}{|{x}_{i}^{\left(d\right)}-{x}_{i}^{(d+1)}|}\right)<E\left(\frac{|{\epsilon}_{i}^{\left(d\right)}|+|{\epsilon}_{i}^{(d+1)}|}{2}\right)$.
Proof of Theorem 1. Since ${\epsilon}_{i}^{\prime}\to 0$ and according to (3), the model can be transformed into:
where $g(\cdot)$ is a linear function. According to (11), it follows that:
When ${\epsilon}_{i}^{\left(d\right)}\cdot {\epsilon}_{i}^{(d+1)}<0$, let $({x}^{\prime},{y}^{\prime})$ be the intersection point of $y={L}_{i}\left(x\right)$ and $y=g\left(x\right)$. We can simplify $S({x}_{i}^{\left(d\right)},{x}_{i}^{(d+1)})$ using basic geometric area calculations and the Law of Iterated Expectations (LIE):
where ${h}_{1}=\frac{|{x}^{\prime}-{x}_{i}^{\left(d\right)}|}{|{x}_{i}^{\left(d\right)}-{x}_{i}^{(d+1)}|}$ and ${h}_{2}=\frac{|{x}^{\prime}-{x}_{i}^{(d+1)}|}{|{x}_{i}^{\left(d\right)}-{x}_{i}^{(d+1)}|}$. Since $P({\epsilon}_{i}^{\left(d\right)}\cdot {\epsilon}_{i}^{(d+1)}\ge 0)+P({\epsilon}_{i}^{\left(d\right)}\cdot {\epsilon}_{i}^{(d+1)}<0)=1$ and ${h}_{1}+{h}_{2}=1$, it follows that $E\left(\frac{S({x}_{i}^{\left(d\right)},{x}_{i}^{(d+1)})}{|{x}_{i}^{\left(d\right)}-{x}_{i}^{(d+1)}|}\right)<E\left(\frac{|{\epsilon}_{i}^{\left(d\right)}|+|{\epsilon}_{i}^{(d+1)}|}{2}\right)$.
□
If we were to randomly select a matching scheme, the validity of the method would still be guaranteed by Theorem 1. However, random selection does not guarantee a unique result, nor does it guarantee that a well-performing matching scheme is chosen. We find that, for ${x}_{i}^{\left(d\right)}$ and ${x}_{i}^{(d+1)}$, the interpolation effect is better when ${\epsilon}_{i}^{\left(d\right)}\cdot {\epsilon}_{i}^{(d+1)}<0$.
Theorem 2. Let ${y}_{i}^{\left(d\right)}=f\left({x}_{i}^{\left(d\right)}\right)+{\epsilon}_{i}^{\left(d\right)}$ and ${y}_{i}^{(d+1)}=f\left({x}_{i}^{(d+1)}\right)+{\epsilon}_{i}^{(d+1)}$. Suppose that ${\epsilon}_{i}^{\prime}\to 0$; then $E\left(S({x}_{i}^{\left(d\right)},{x}_{i}^{(d+1)})\mid {\epsilon}_{i}^{\left(d\right)}\cdot {\epsilon}_{i}^{(d+1)}<0\right)<E\left(S({x}_{i}^{\left(d\right)},{x}_{i}^{(d+1)})\mid {\epsilon}_{i}^{\left(d\right)}\cdot {\epsilon}_{i}^{(d+1)}\ge 0\right)$.
Proof of Theorem 2. Since ${\epsilon}_{i}^{\prime}\to 0$, based on the proof of Theorem 1, we have:
□
According to Theorem 2, matching samples with opposite signs of ${\epsilon}_{i}$ achieves a good data synthesis effect. Therefore, the core idea of KMatch is to judge the sign of ${\epsilon}_{i}$ for each sample and then interpolate between samples with opposite signs as much as possible.
In KMatch, we need to choose an appropriate linear regression method to fit the dataset ${D}_{\left(d\right)}\cup {D}_{(d+1)}$ based on the behavior of the noise. For example, Lasso regression, Locally Weighted Linear Regression (LWLR) [23], and other methods [24,25] can be used. In our experiments, we use OLS or SVR to fit and obtain $\widehat{g}(\cdot)$; specifically, the SVR kernel is linear. According to (3), and supposing the linear fitting error ${\epsilon}_{i}^{\prime}\to 0$, for the dataset ${D}_{\left(d\right)}\cup {D}_{(d+1)}$ we have:
Then, we sort the samples in ${D}_{\left(d\right)}$ in ascending order of ${\epsilon}_{i}$, obtaining ${\left\{({\mathit{x}}_{i}^{\left(d\right)},{y}_{i}^{\left(d\right)})\right\}}_{i=1}^{k}$, and sort the samples in ${D}_{(d+1)}$ in descending order, obtaining ${\left\{({\mathit{x}}_{i}^{(d+1)},{y}_{i}^{(d+1)})\right\}}_{i=1}^{k}$. As shown in Figure 1(d), the sorted datasets ${D}_{\left(d\right)}$ and ${D}_{(d+1)}$ are combined into the matching scheme ${\{({\mathit{x}}_{i}^{\left(d\right)},{y}_{i}^{\left(d\right)}),({\mathit{x}}_{i}^{(d+1)},{y}_{i}^{(d+1)})\}}_{i=1}^{k}$.
Algorithm 3: KMatch 
Input: Subsets ${D}_{\left(d\right)}={\left\{({\mathit{x}}_{i}^{\left(d\right)},{y}_{i}^{\left(d\right)})\right\}}_{i=1}^{k}$, ${D}_{(d+1)}={\left\{({\mathit{x}}_{i}^{(d+1)},{y}_{i}^{(d+1)})\right\}}_{i=1}^{k}$
Output: Matching scheme ${\{({\mathit{x}}_{i}^{\left(d\right)},{y}_{i}^{\left(d\right)}),({\mathit{x}}_{i}^{(d+1)},{y}_{i}^{(d+1)})\}}_{i=1}^{k}$
1 Fit the dataset ${D}_{\left(d\right)}\cup {D}_{(d+1)}$ to obtain $\widehat{g}(\cdot)$
2 Obtain $\left\{{\epsilon}_{i}\right\}$ using (10)
3 Sort the samples in ${D}_{\left(d\right)}$ and ${D}_{(d+1)}$ according to the value of ${\epsilon}_{i}$, obtaining ${D}_{\left(d\right)}^{\prime}$ and ${D}_{(d+1)}^{\prime}$
4 Combine ${D}_{\left(d\right)}^{\prime}$ and ${D}_{(d+1)}^{\prime}$ into ${\{({\mathit{x}}_{i}^{\left(d\right)},{y}_{i}^{\left(d\right)}),({\mathit{x}}_{i}^{(d+1)},{y}_{i}^{(d+1)})\}}_{i=1}^{k}$
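These steps can be sketched as follows for $\mathcal{X}=R$, with ordinary least squares (via `np.polyfit`) standing in for the paper's choice of linear fitter; the function name and toy data are ours:

```python
import numpy as np

def kmatch(D_d, D_d1):
    """KMatch sketch for X = R: fit a line to the pooled samples, take the
    residuals as epsilon_i, sort D_(d) ascending and D_(d+1) descending by
    residual, and pair positionally so opposite-signed residuals meet."""
    D_d, D_d1 = [np.asarray(D, float) for D in (D_d, D_d1)]
    pooled = np.vstack([D_d, D_d1])
    slope, intercept = np.polyfit(pooled[:, 0], pooled[:, 1], 1)  # fit g_hat
    res = lambda D: D[:, 1] - (slope * D[:, 0] + intercept)       # epsilon_i
    a = D_d[np.argsort(res(D_d))]            # ascending residuals
    b = D_d1[np.argsort(res(D_d1))[::-1]]    # descending residuals
    return list(zip(a.tolist(), b.tolist()))

# Hypothetical adjacent subsets near the line y = x with +/-1 noise.
pairs = kmatch([[0.0, 1.0], [1.0, 0.0]], [[2.0, 3.0], [3.0, 2.0]])
for p in pairs:
    print(p)
```

In this toy run the sample with residual below the fitted line in ${D}_{\left(d\right)}$ is paired with the sample above it in ${D}_{(d+1)}$, and vice versa, which is exactly the opposite-sign pairing Theorem 2 motivates.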

3.4. Supplements
The proposed method can effectively expand the size of the dataset and adjust its structure, reducing the proportion of samples that deviate significantly from the actual distribution and thereby improving model generalization; see Figure 2.
Additional remarks on ASISO are given below:
1. The choice of the hyperparameter k is crucial, as different datasets require different values of k. In contrast, the hyperparameter $\eta $ tends to exhibit better performance as its value increases, as illustrated in the experimental results below.
2. It is necessary to normalize the data if there is a significant difference in scale between features. This avoids generating an excessive number of samples.
3. In most cases, $n/k$ is not an integer, and we usually have two ways of handling the excess samples. The first is to use the LOF algorithm [26] to filter out the excess samples so that they do not participate in ASISO, as shown in Figure 2(c). The second is to treat the excess samples as the dataset ${D}^{\prime}$ of a subspace, ${D}^{\prime}={\left\{({\mathit{x}}_{i},{y}_{i})\right\}}_{i=1}^{n\phantom{\rule{0.277778em}{0ex}}mod\phantom{\rule{0.277778em}{0ex}}k}$. When interpolating between ${D}^{\prime}$ and another subspace ${D}^{\prime \prime}$, choose an appropriate linear regression method to fit the dataset ${D}^{\prime}\cup {D}^{\prime \prime}$ and obtain $\widehat{g}(\cdot)$. Then, sort ${D}^{\prime}$ and ${D}^{\prime \prime}$ using the same method. Only $n\phantom{\rule{0.277778em}{0ex}}mod\phantom{\rule{0.277778em}{0ex}}k$ interpolations are performed: every sample in ${D}^{\prime}$ is interpolated, while only $n\phantom{\rule{0.277778em}{0ex}}mod\phantom{\rule{0.277778em}{0ex}}k$ samples in ${D}^{\prime \prime}$ take part. Moreover, interpolate samples with opposite signs of ${\epsilon}_{i}$ as much as possible, as shown in Figure 3.
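The second remainder-handling option can be sketched like so (the function name and toy data are ours, and OLS again stands in for the chosen linear fitter): the $r=n\ mod\ k$ excess samples form their own subset, and only $r$ pairings are produced.

```python
import numpy as np

def match_remainder(D_rem, D_full):
    """Pair each of the r = n mod k excess samples in D' with one sample of
    another subset D'': fit a line to D' union D'', sort D' ascending and
    D'' descending by residual, and keep only the first r pairs so every
    excess sample is used once and only r samples of D'' take part."""
    D_rem, D_full = np.asarray(D_rem, float), np.asarray(D_full, float)
    pooled = np.vstack([D_rem, D_full])
    slope, intercept = np.polyfit(pooled[:, 0], pooled[:, 1], 1)
    res = lambda D: D[:, 1] - (slope * D[:, 0] + intercept)
    a = D_rem[np.argsort(res(D_rem))]              # ascending residuals
    b = D_full[np.argsort(res(D_full))[::-1]]      # descending residuals
    return list(zip(a.tolist(), b[:len(a)].tolist()))  # only r pairs

# Hypothetical remainder of r = 1 sample matched into a k = 3 subset.
pairs = match_remainder([[0.0, 1.0]], [[1.0, 1.0], [2.0, 3.0], [3.0, 3.0]])
print(len(pairs))  # 1
```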