3. Materials and Methods
This study outlines the creation of a specialized MIMO radar sensor designed to identify and analyze hand motions reliably.
Figure 1 depicts a diagram of the proposed architecture. The radar captures comprehensive information from individuals performing hand gestures through its twelve virtual antennas. After the data capture, the gathered information is passed on for further computation.
The classification of hand gestures is systematically partitioned into four distinct stages. Initially, the acquired data undergoes rigorous processing involving range and velocity computations facilitated by a two-dimensional Fast Fourier Transform (FFT). Subsequent stages encompass the application of a refined Constant False Alarm Rate mechanism and target detection algorithms, precisely delineating the pertinent variables associated with the spatial coordinates of the palm. The angle variable is determined using the Multiple Signal Classification (MUSIC) technique. After obtaining these parameters, feature extraction is performed and the obtained features are input into the Random Forest method for further analysis.
3.1. Range and Velocity Analysis
The combination of MIMO radar with FMCW technology exploits the spatial diversity of the MIMO system and the frequency modulation performance of the FMCW system, leading to an adaptive, precise, and efficient radar system. The details are as follows. First, the radar transmits the chirp-modulated waveform through a transmitting antenna. It can be represented as
$$s_T(t) = \exp\left(j2\pi\left(f_c t + \frac{B}{2T_c}t^2\right)\right), \quad 0 \le t \le T_c, \tag{1}$$
where $f_c$ is the carrier frequency, $B$ is the sweep bandwidth of the chirp, and $T_c$ is the chirp duration. On the other hand, the received signal is the signal echoed from a target, which is a scaled and delayed version of the transmitted signal,
$$s_R(t) = \alpha\, s_T(t-\tau), \tag{2}$$
where $\alpha$ is a scale factor and $\tau$ is the time delay. After that, the received signal is mixed with the transmitted signal to create an intermediate frequency (IF) signal:
$$s_{IF}(t) = s_T(t)\, s_R^{*}(t) = \alpha \exp\left(j2\pi\left(f_b t + f_c\tau - \frac{B}{2T_c}\tau^2\right)\right), \tag{3}$$
where the beat frequency after the receive mixer is represented as $f_b = B\tau/T_c$. For static objects, the beat frequency is proportional to the distance, so the range is obtained by taking the fast Fourier transform (FFT) of the received IF signal. For moving objects, the velocity is determined from the phase change across multiple chirps, because the phase and the frequency of the received signal change with the velocity of the moving object. A second FFT is then applied across these chirps to extract the phase variation and hence the velocities. This process yields a 2D Range-Doppler map that describes the spatial distribution and velocity characteristics of detected objects.
Unlike a traditional radar system with a single transmitting antenna, MIMO radar uses multiple antennas for transmission, which can create multidimensional arrays for spatial diversity. In our scenario, the physical antenna configuration with three transmitting antennas and four receivers can be extended to 12 virtual antennas. The raw radar data is represented by a frame with three dimensions: ADC samples, chirps, and antennas, as seen in Figure 2. The FFT applied along the ADC sampling dimension extracts ranges and is known as the fast-time FFT or range-FFT. The FFT applied along the chirp dimension extracts velocity information and is called the slow-time FFT or Doppler-FFT. As depicted in Figure 2, the two-dimensional FFT generates the Range-Doppler Map (RDM), which provides the target range and velocity data. The hand motion can be represented by the variations in range and velocity observed across the frames of the RDM. Therefore, to analyze the hand gesture, it is essential to determine the exact location of the hand before carrying out any further computations.
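The range-FFT and Doppler-FFT steps can be sketched as follows. This is a minimal illustration on synthetic data for a single virtual antenna, not the processing chain used in this study; the function name `range_doppler_map`, the Hann windows, and all signal parameters are assumptions made for the example.

```python
import numpy as np

def range_doppler_map(frame):
    """frame: complex array of shape (n_chirps, n_samples) for one antenna."""
    # Range FFT along fast time (ADC samples); the window reduces sidelobes.
    win_r = np.hanning(frame.shape[1])
    range_fft = np.fft.fft(frame * win_r, axis=1)
    # Doppler FFT along slow time (chirps); fftshift centres zero velocity.
    win_d = np.hanning(frame.shape[0])[:, None]
    rdm = np.fft.fftshift(np.fft.fft(range_fft * win_d, axis=0), axes=0)
    return np.abs(rdm)

# Synthetic single target in normalised units (hypothetical values).
n_chirps, n_samples = 64, 128
range_bin, doppler_bin = 20, 5
t = np.arange(n_samples) / n_samples   # fast-time axis
c = np.arange(n_chirps) / n_chirps     # slow-time axis
frame = np.exp(2j * np.pi * (range_bin * t[None, :] + doppler_bin * c[:, None]))

rdm = range_doppler_map(frame)
peak = np.unravel_index(np.argmax(rdm), rdm.shape)  # (Doppler row, range column)
```

In this sketch the rows of `rdm` index velocity and the columns index range, so the peak row is offset by `n_chirps // 2` because of the shift.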
3.2. CFAR and Target Detection
When detecting the target, background noise and other non-target signals make detection more difficult. Therefore, to achieve reliable and consistent detection accuracy, a sensor needs to maintain a constant false alarm rate (CFAR) under varying interference. CFAR uses statistical information about the local background noise and adjusts the detection threshold accordingly to ensure a consistent false alarm rate across different conditions. This paper applies a cell-averaging (CA)-CFAR method, which identifies a set of reference cells around the cell of interest and computes the average power of the selected reference cells. The detection threshold is set to a fixed multiple of this average, and a cell whose power exceeds the threshold is considered a target.
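The cell-averaging idea can be illustrated with a short sketch. The window sizes, the threshold factor, and the function name `ca_cfar_2d` are assumptions for illustration only; the parameters used in this study are not stated here.

```python
import numpy as np

def ca_cfar_2d(rdm, train=4, guard=2, factor=5.0):
    """Keep cells whose power exceeds factor * mean power of the training cells.

    Cells closer than train + guard to the border are skipped and left at zero.
    """
    n_rows, n_cols = rdm.shape
    out = np.zeros_like(rdm)
    half = train + guard
    for i in range(half, n_rows - half):
        for j in range(half, n_cols - half):
            window = rdm[i - half:i + half + 1, j - half:j + half + 1]
            guard_block = rdm[i - guard:i + guard + 1, j - guard:j + guard + 1]
            # Training cells are the window minus the guard block (which holds the cell under test).
            n_train = window.size - guard_block.size
            noise = (window.sum() - guard_block.sum()) / n_train
            if rdm[i, j] > factor * noise:
                out[i, j] = rdm[i, j]
    return out

rng = np.random.default_rng(0)
power = rng.exponential(1.0, size=(64, 64))  # synthetic noise floor
power[30, 40] = 200.0                        # one strong synthetic target
detections = ca_cfar_2d(power)
```

A larger `factor` lowers the false alarm rate at the cost of missing weaker targets, which is the trade-off the threshold multiplier controls.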
After applying the CA-CFAR technique to the RDM, the frame has several peaks, since the gesture response includes reflections from the hand, arm, and body. Our hand gesture recordings ensure that the hand is always in front of the body, which means the hand is the closest object to the radar and moves with the highest velocity. Moreover, to acquire more accurate hand gesture information from the radar, the following problems are addressed in the proposed system:
- 1. Zero-velocity disturbances.
When analyzing the RDM, zero-velocity disturbances can complicate the interpretation of the data. One common source is clutter, that is, unwanted reflections from stationary objects in the environment, which remains after the CFAR. The clutter may be mistaken for real target movement. Thus, the proposed method incorporates an additional two-dimensional filter to differentiate between stationary and moving objects, thereby reducing these disturbances and improving the precision of target identification and tracking.
- 2. Peaks caused by other body parts.
Because the CA-CFAR threshold is determined by a fixed multiplier of the average power, the movement of other body parts may affect the result of hand motion detection. Given this limitation, the proposed method incorporates an additional step to address the error caused by body movement: it identifies the cell with the highest velocity and the cell with the greatest energy in the range closest to the radar. Focusing on these two measures enables us to identify the coordinates that correspond to the motion of the palm and improves the ability of the radar system to detect complex motions.
- 3. The target velocity wrapping.
The target velocity is difficult to determine when it exceeds the predetermined velocity range, because it wraps (folds) to the opposite end of the velocity spectrum. To tackle this problem, we utilize the mirroring technique that is commonly used in image processing. More precisely, we replicate the data in the first three and last three rows, which provides continuous variation of the speed without excessive computational complexity. Assume that the RDM is $\mathbf{Z}$. Then the result after applying CA-CFAR is
$$D(i,j) = \begin{cases} Z(i,j), & Z(i,j) > \kappa\,\mu(i,j), \\ 0, & \text{otherwise}, \end{cases} \tag{4}$$
where $\kappa$ is the threshold factor and $\mu(i,j)$ is the average reflectivity within a specific region around cell $(i,j)$. After that, we mirror the first and last three rows of $\mathbf{D}$, which can be represented as
$$\tilde{\mathbf{D}} = \begin{bmatrix} \mathbf{D}_{N-2:N,\,:} \\ \mathbf{D} \\ \mathbf{D}_{1:3,\,:} \end{bmatrix}, \tag{5}$$
where $N$ is the number of rows in the matrix.
By implementing this mirroring technique, we address the issue of target velocities beyond the specified range. The technique preserves the target distance and the velocity trajectory while keeping the speed continuous.
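The replication of three rows at each end of the velocity axis can be sketched with `numpy.pad`. This sketch assumes that rows index velocity and that the rows are copied to the opposite end (wrap-around), which matches the folding behaviour described above; if a true reflection is intended, `mode="reflect"` or `mode="symmetric"` would be used instead. The function name `pad_velocity_axis` is hypothetical.

```python
import numpy as np

def pad_velocity_axis(cfar_map, n_pad=3):
    # mode="wrap" places the last n_pad rows above the first row
    # and the first n_pad rows below the last row; columns are untouched.
    return np.pad(cfar_map, ((n_pad, n_pad), (0, 0)), mode="wrap")

cfar_map = np.arange(8 * 4, dtype=float).reshape(8, 4)  # toy 8 x 4 map
padded = pad_velocity_axis(cfar_map)
```

After padding, a trajectory that leaves one end of the velocity axis continues into the copied rows instead of jumping to the other end.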
3.3. Angle of Arrival
In addition to determining the range and the velocity, the MIMO radar can also determine the angle between the radar and the target. By estimating the direction of arrival (DoA), one can identify the angles at which different targets are positioned.
The MIMO radar system provides a large number of virtual antenna elements, which can reduce hardware requirements. If there are $N_T$ transmitters and $N_R$ receivers, and $d$ and $\lambda$ represent the distance between antennas and the wavelength, respectively, then the angle resolution $\theta_{res}$ can be determined from
$$\theta_{res} = \frac{\lambda}{N_T N_R\, d}. \tag{6}$$
Equation (6) shows that the hardware structure still restricts the angle resolution. Therefore, we employ a DoA estimation approach to enhance the angle resolution.
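DoA estimation determines the angles at which the signals reflected from targets are received. DoA estimation can be classified into four main categories: beamforming, maximum likelihood, subspace-based methods, and compressive sensing. This paper utilizes the well-known MUSIC method [20], which is a subspace-based method. It is based on the eigenstructure of the data covariance matrix. Assume that the matrix of the received signals is $\mathbf{X}$ of size $M \times L$, with one row per antenna and one column per snapshot. Also suppose that the transmission signals are $\mathbf{s}(t)$ and the noises are $\mathbf{n}(t)$. Then, the received signals at time $t$ can be written as
$$\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t) + \mathbf{n}(t), \tag{7}$$
where $\mathbf{A}$ is the steering matrix.
Since signals and noises are uncorrelated, the covariance matrix of $\mathbf{x}(t)$ is
$$\mathbf{R} = E\left[\mathbf{x}(t)\,\mathbf{x}^{H}(t)\right] = \mathbf{A}\,\mathbf{R}_s\,\mathbf{A}^{H} + \sigma^2\mathbf{I}, \tag{8}$$
where $\mathbf{R}_s$ is the signal correlation matrix, $\sigma^2$ is the noise variance, and $\mathbf{I}$ is the $M \times M$ identity matrix. Next, suppose that $K < M$ targets are present and that the eigenvalues of $\mathbf{R}$ are ordered as $\gamma_1 \ge \gamma_2 \ge \dots \ge \gamma_M$. Let $\gamma_1, \dots, \gamma_K$ and $\gamma_{K+1}, \dots, \gamma_M$ be the eigenvalues corresponding to the signal and noise eigenvectors, respectively. By the assumption in MUSIC, the steering vector of the signal $\mathbf{a}(\theta_k)$ is orthogonal to the subspace $\mathbf{U}_n$ of noise, which can be written as
$$\mathbf{U}_n^{H}\,\mathbf{a}(\theta_k) = \mathbf{0}, \tag{9}$$
where $\mathbf{a}(\theta_k)$ is a column of matrix $\mathbf{A}$ with $k = 1, \dots, K$. Then (8) can be represented as
$$\mathbf{R} = \mathbf{U}_s\,\boldsymbol{\Gamma}_s\,\mathbf{U}_s^{H} + \mathbf{U}_n\,\boldsymbol{\Gamma}_n\,\mathbf{U}_n^{H}, \tag{10}$$
with $\boldsymbol{\Gamma}_s = \mathrm{diag}(\gamma_1, \dots, \gamma_K)$ and $\boldsymbol{\Gamma}_n = \mathrm{diag}(\gamma_{K+1}, \dots, \gamma_M)$. Then, the MUSIC spectrum is defined as
$$P_{MUSIC}(\theta) = \frac{1}{\mathbf{a}^{H}(\theta)\,\mathbf{U}_n\,\mathbf{U}_n^{H}\,\mathbf{a}(\theta)}. \tag{11}$$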
Meanwhile, by increasing the number of snapshots used for spectrum estimation, the ability of MUSIC to separate multiple targets can be further improved.
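The MUSIC steps described above (covariance estimate, eigendecomposition, noise subspace, pseudo-spectrum) can be sketched as follows. The example assumes a uniform linear array with half-wavelength spacing and a single synthetic source; the function names `steering` and `music_spectrum` and all parameter values are hypothetical and are not taken from this study.

```python
import numpy as np

def steering(theta_deg, n_ant, d=0.5):
    """Steering vector of a uniform linear array; d is the spacing in wavelengths."""
    k = np.arange(n_ant)
    return np.exp(2j * np.pi * d * k * np.sin(np.deg2rad(theta_deg)))

def music_spectrum(x, n_src, grid_deg, d=0.5):
    """x: complex snapshots of shape (n_ant, n_snapshots)."""
    n_ant, n_snap = x.shape
    r = x @ x.conj().T / n_snap            # sample covariance matrix
    _, vecs = np.linalg.eigh(r)            # eigenvalues in ascending order
    u_n = vecs[:, :n_ant - n_src]          # eigenvectors of the smallest eigenvalues
    spec = []
    for th in grid_deg:
        a = steering(th, n_ant, d)
        # 1 / (a^H U_n U_n^H a): large where a is orthogonal to the noise subspace
        spec.append(1.0 / np.linalg.norm(u_n.conj().T @ a) ** 2)
    return np.array(spec)

rng = np.random.default_rng(1)
n_ant, n_snap, true_deg = 8, 200, 20.0
s = rng.standard_normal(n_snap) + 1j * rng.standard_normal(n_snap)
noise = 0.1 * (rng.standard_normal((n_ant, n_snap))
               + 1j * rng.standard_normal((n_ant, n_snap)))
x = np.outer(steering(true_deg, n_ant), s) + noise

grid = np.arange(-90.0, 90.5, 0.5)
est_deg = grid[np.argmax(music_spectrum(x, 1, grid))]
```

The number of sources `n_src` has to be supplied, since it fixes how many eigenvectors are assigned to the noise subspace.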
3.4. Elevation Calculation
To determine elevation angles reliably, a method suited to the IWR6843ISK radar system is proposed. The radar is designed with two antennas positioned at the same azimuth angle. As indicated by (6), this configuration results in an angle resolution of 45 degrees.
The structure of the radar system allows for the collection of four elevation angles using four antenna pairs. The target is located either on the upper side or on the opposite side of the radar. To improve the accuracy, a voting mechanism is implemented over the four estimates, and the voting system selects the most probable elevation angle. Combining several estimates in this way reduces the impact of the hardware limitation and makes the elevation angle estimation more dependable, although the achievable resolution remains limited by the hardware.
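A majority vote over the four antenna-pair estimates could look like the sketch below. The labels, the tie-breaking rule, and the function name `vote_elevation` are assumptions for illustration; the exact voting rule of the proposed system is not specified in this section.

```python
from collections import Counter

def vote_elevation(pair_estimates):
    """Return the most frequent estimate; ties go to the estimate seen first."""
    return Counter(pair_estimates).most_common(1)[0][0]

# Hypothetical coarse estimates from the four antenna pairs.
votes = ["up", "up", "down", "up"]
elevation = vote_elevation(votes)
```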
3.5. Data Smoothing and Feature Extraction
The hand position in each RDM was recorded and then used to determine the angle of hand motion. The MUSIC method was used to calculate the angle using the hand position data collected from eight antennas in the RDM. The elevation is determined by the method mentioned above. As a result, we obtained the measurements for the range, the velocity, the azimuth, and the elevation angle of the dynamic hand movements in each frame.
Moreover, a data smoothing approach was applied because the detected range, velocity, and angle values fluctuate. The moving average technique with a window size of three was used to reduce these variations, which improves the robustness to noise and enables a more consistent interpretation of hand movements. First, the detected range, velocity, and angle values are stored in a $4 \times n$ matrix $T$, where $n$ is the number of frames and each row holds one type of measurement:
$$T = \begin{bmatrix} r_1 & r_2 & \cdots & r_n \\ v_1 & v_2 & \cdots & v_n \\ \theta_1 & \theta_2 & \cdots & \theta_n \\ \phi_1 & \phi_2 & \cdots & \phi_n \end{bmatrix}. \tag{12}$$
Then, the moving average of $T$ is computed:
$$\bar{T}_{i,j} = \frac{1}{3}\sum_{k=j-1}^{j+1} T_{i,k}, \quad j = 2, \dots, n-1. \tag{13}$$
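The window-three moving average can be sketched row by row with a convolution. The handling of the first and last frames (dropped here through `mode="valid"`) is an assumption, as is the function name `moving_average`; the toy matrix is made up for illustration.

```python
import numpy as np

def moving_average(t, window=3):
    """Smooth each row of t; only positions where the full window fits are kept."""
    kernel = np.ones(window) / window
    return np.vstack([np.convolve(row, kernel, mode="valid") for row in t])

# Toy 4 x 5 matrix: rows stand for range, velocity, azimuth, elevation.
t = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [0.0, 3.0, 0.0, 3.0, 0.0],
              [10.0, 10.0, 10.0, 10.0, 10.0],
              [5.0, 4.0, 3.0, 2.0, 1.0]])
smoothed = moving_average(t)
```

With `n` input frames and a window of three, the output has `n - 2` columns.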
Next, we created a set of features to improve the classification of different movements using different parts of the recorded data.
During feature extraction, we employ two methods to segment the data. One approach analyzes the data over the total duration of the object movement, while the other divides the data into eight equal segments that are analyzed individually. We extract a wide variety of parameters from the collected data, including the maximum and minimum values of the velocity and the azimuth angle, their ranges and distributions, and the differences between the initial and final values of each parameter. We also determine the differences between these parameters and sum the elevation angle values within each segment. By incorporating these features, we aim to provide a comprehensive representation of the dynamic aspects of hand motions, which helps to improve the accuracy of the later classification process.
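The two segmentation schemes can be sketched for a single measurement track. The sketch covers only a subset of the statistics listed above (maximum, minimum, range, and final-minus-initial value); the function names and the toy velocity track are hypothetical.

```python
import numpy as np

def segment_features(x):
    """Max, min, range, and final-minus-initial value of a 1-D series."""
    return [x.max(), x.min(), x.max() - x.min(), x[-1] - x[0]]

def extract_features(series, n_segments=8):
    # Assumes the series has at least n_segments samples, so no segment is empty.
    feats = segment_features(series)                 # whole movement
    for seg in np.array_split(series, n_segments):   # eight nearly equal parts
        feats.extend(segment_features(seg))
    return np.array(feats)

velocity = np.linspace(-1.0, 1.0, 16)   # hypothetical smoothed velocity track
features = extract_features(velocity)
```

Each track contributes 4 features for the whole movement plus 4 per segment, so 36 values in this sketch.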
3.6. Classification
In this work, the Random Forest algorithm is used as the classification method. Although decision tree algorithms such as C4.5, the classification and regression tree, and ID3 are easy to interpret visually, they are sensitive to noise and minor differences in the data, which increases the chance of overfitting [22]. Random Forest is an ensemble machine-learning technique that explicitly tackles these issues. During the training process, a large number of decision trees are created, and each of them is built from a randomly selected subset of features. The predictions of these trees are then combined by voting. The variation in feature and sample selection improves the flexibility of the model, reduces the likelihood of overfitting, and enhances its generalization ability.
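As an illustration of the classification step, the sketch below trains a Random Forest on synthetic feature vectors with scikit-learn. The data, the number of trees, and the other hyperparameters are assumptions for the example and do not reflect the configuration or results of this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Two hypothetical gesture classes, each with 100 synthetic 36-dimensional feature vectors.
x0 = rng.normal(0.0, 1.0, size=(100, 36))
x1 = rng.normal(3.0, 1.0, size=(100, 36))
X = np.vstack([x0, x1])
y = np.array([0] * 100 + [1] * 100)

# Simple random split into training and test samples.
idx = rng.permutation(len(y))
train, test = idx[:150], idx[150:]

# Each tree sees a bootstrap sample and a random feature subset at every split;
# the forest predicts by aggregating the trees.
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
clf.fit(X[train], y[train])
accuracy = clf.score(X[test], y[test])
```

Fixing `random_state` makes the run repeatable, which is convenient when comparing feature sets.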