Detection of 2D and 3D Video Transitions Based on EEG Power

Although 3D technology has a long and extensive history, it has recently attracted renewed attention from researchers. The technology has become a center of interest for young people because of the realistic sensations it creates. People perceive their environment in 3D because of the structure of their eyes. In this study, it is hypothesized that people lose their perception of depth during sleepy moments and that there is a sudden transition from 3D vision to 2D vision. To examine these transitions, EEG signal analysis was used for a comprehensive analysis of 2D and 3D brain signals. A single-stream anaglyph video of random 2D and 3D segments was prepared. The EEG recordings obtained from viewing this single video were considered for two different analyses: the part involving the critical transition (transition-state) and the analysis of only the 2D versus 3D or 3D versus 2D parts (steady-state). The main objective of this study is to observe the behavioral changes of brain signals in 2D and 3D transitions. To clarify how the power spectral density (PSD) of the human brain changes in 2D-to-3D (2D_3D) and 3D-to-2D (3D_2D) transitions of anaglyph video, 9 individuals with healthy vision were tested in this pioneering study. Spectrogram graphs based on the short-time Fourier transform (STFT) were used to evaluate the power spectrum in each EEG channel for the transition and steady-state. In this way, the important channels, EEG frequency bands, and brain lobes in 2D and 3D transition scenarios were identified. To classify the 2D and 3D transitions, the dominant bands and time intervals representing the maximum difference of PSD were selected. Afterward, effective features were extracted by applying statistical measures such as standard deviation (SD), maximum (max), and Hjorth parameters to epochs indicating transition intervals.
Finally, k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), and Linear Discriminant Analysis (LDA) algorithms were applied to classify 2D_3D and 3D_2D transitions. The frontal, temporal, and partially parietal lobes reflect 2D_3D and 3D_2D transitions with a good classification success rate. Overall, the combination of Hjorth parameters and the LDA algorithm achieved classification success rates of 71.11% and 77.78% for the transition and steady-state, respectively.

The human brain controls the central nervous system, manages the peripheral nervous system, and regulates almost all functions of the human body through the cranial nerves and spinal cord [1]. Ionic voltage fluctuations in brain neurons play a key role in these processes. These fluctuations give rise to currents and, in turn, to electric fields. The measurement of these fluctuations is referred to as electroencephalography (EEG) [2].
The eye is one of the most important sensory organs in the human body. The eye and brain form a unit called the "visual system", which develops through use. In human vision, the areas visible to the right and left eyes overlap to a certain extent. Most of the visual field is seen with two eyes, i.e., in a binocular fashion [3], [4]. Because the eyes are about 6 cm apart, the left and right eyes capture two slightly different images. Binocular vision is therefore the ability to see the same scene from two similar but slightly different viewpoints.
Stereo vision is a normal human vision with some amazing features [5]. The relationship between such vision and brain function has been the subject of many studies [6], [7], [8], [9].
In addition to the perception of depth, this vision is the ability to distinguish relationships between objects and ultimately the appearance of a 3D image.
Three-dimensional technology emerged with the beginning of photography. The history of this fascinating technology dates back to the 18th century [10]. Display technologies have come a long way from 2D to 3D. Although 2D technology has remained on the market for a long time, technological advances have improved 3D technology to the extent that it is used in multiple applications such as the 3D printing industry, entertainment [11], healthcare [12], [13], defense [14], aerospace [15], industry, manufacturing, and architecture [16]. Entertainment has been the dominant application in the market and is expected to continue to lead it over the forecast period.
Reviewing the literature shows the abundance of studies on 2D, 3D, and EEG applications.
These applications can be classified mainly in the analysis of brain signals of 2D and 3D game watching [17], [18], in 2D and 3D learning content of the education area [19], biomedical research [13], and eye fatigue analysis [20], [21].
The main objective of the present study is the comprehensive EEG analysis of the transition moment in a single hybrid video consisting of random 2D and 3D segments. Because human vision is three-dimensional, analyzing this transition moment motivates a hypothesis about human fatigue. When people fall asleep, their perception of depth is lost. Under such a condition, there may be a sudden transition from 3D to 2D. To test this hypothesis, epochs containing the full critical transition, as well as steady-state epochs that exclude the critical transition, were taken from the video with random 2D and 3D transitions. EEG behavior was analyzed in channels representing different brain lobes using STFT-based spectrogram graphs of the epochs reflecting these two conditions. This time-frequency visual representation provides insight into the PSD differences of 2D_3D and 3D_2D transitions, the important EEG frequency bands, and the dominant time intervals. Finally, guided by these graphs, we focus on effective feature extraction to achieve good classification performance by considering the EEG bands and dominant time intervals that reveal the power difference in each channel.

Participants
Nine subjects participated in this test. The participants included in the analysis were adults (4 females and 5 males) with a mean age of 34.12 ± 2.072 years. They had normal vision and were free from any neurological or mental disorder that might affect the results. All stages of the test were explained to the individuals in detail. They were asked to minimize unnecessary blinking and body movements. The experiments were conducted under registration number 24237859-806 of the Institutional Ethics Committee.

Hybrid video preparation
To design a video with random 2D_3D and 3D_2D transitions, the 3D version of the Saw video [22] was converted to 2D with Xilisoft 3D Video Converter [23], and the 3D version was converted to 3D anaglyph form using the IQ mango 3D converter [24] program. In the next step, 8-second segments were combined with Idoo video editor pro [25]. Finally, a 135-second video of random eight-second 2D and 3D parts was prepared for the test.

EEG recording and dataset
The EEG recordings were obtained in the EEG laboratory of Trabzon Karadeniz Technical University. All electrodes were placed on the scalp according to the international 10-20 system.
In this study, 21 EEG electrodes (Fp1, Fpz, Fp2, F3, F4, F7, F8, C3, C4, Fz, P3, P4, Pz, O1, O2, T3, T4, T5, T6, Oz, and Cz) were used. The Cz electrode was selected as the reference. The configuration pattern of the electrodes is shown in Fig. 1. Individuals were asked to sit on a comfortable chair about 85 cm from the TV (LG 32 inch) stand. The features of this television are detailed in [5]. Each EEG recording lasted approximately 135 s. Each test was repeated 5 times. To watch 2D and 3D video segments in a single video, the selected glasses must be independent of any display system. Anaglyph glasses were therefore found suitable for this scenario because of their working principle. EEG data were sampled at 512 Hz and the skin impedance was below 10 kΩ.

Video details of hybrid scenario
The hybrid video consisting of random 2D and 3D segments, together with its transitions, is presented in Fig. 2. Orange and green arrows represent 2D_3D and 3D_2D, respectively. In the critical transition analysis, 5-s epochs were created to analyze and classify the 2D_3D and 3D_2D transition moments. This epoch range is presented in Table 1. Since there were 5 transitions from 2D to 3D and 4 transitions from 3D to 2D, and a total of 5 recordings were taken from each individual, 45 epochs were analyzed per individual. In addition, five intervals from 2D to 3D and five intervals from 3D to 2D were considered for the steady-state, giving 50 epochs for this state. The 4-s intervals are shown in Table 2.
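The epoch bookkeeping described above can be illustrated with a short sketch. It is not the authors' code: the function name `extract_epochs` is hypothetical, and the onset times are placeholders, since the actual epoch ranges are those listed in Table 1 and Table 2.

```python
import numpy as np

FS = 512  # sampling rate in Hz, as reported for the recordings


def extract_epochs(signal, onsets_s, length_s, fs=FS):
    """Cut fixed-length epochs from a 1-D signal, one per onset time (seconds)."""
    n = int(round(length_s * fs))
    starts = [int(round(t * fs)) for t in onsets_s]
    return np.stack([signal[s:s + n] for s in starts])


# Placeholder recording: 135 s of noise standing in for one EEG channel.
rng = np.random.default_rng(0)
recording = rng.standard_normal(135 * FS)

# Placeholder onsets (NOT the values from Table 1): 9 transitions per recording.
transition_onsets = [6, 14, 22, 30, 38, 46, 54, 62, 70]
transition_epochs = extract_epochs(recording, transition_onsets, length_s=5)

# Epoch counts per individual follow directly from the text.
n_recordings = 5
n_transition = (5 + 4) * n_recordings  # five 2D_3D and four 3D_2D transitions
n_steady = (5 + 5) * n_recordings      # five plus five steady-state intervals
```

Each 5-s epoch at 512 Hz contains 2560 samples; a 4-s steady-state epoch would contain 2048.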

Data analysis
The general block diagram for EEG data analysis is presented in Fig. 3. Each block that follows data collection is described in detail in the following sections.

Preprocessing
Preprocessing is an important step in EEG signal processing [26]. Preprocessing techniques help remove unwanted artifacts from the EEG signal and therefore improve the signal-to-noise ratio. One reason preprocessing is necessary is that the signals collected from the scalp may not precisely represent signals from the brain, as spatial information is lost. In addition, EEG data contain high amounts of noise that can hide weak EEG signals. Artifacts such as blinking or muscle movement [27] may contaminate and distort the data. Finally, it is desirable to separate the relevant neural signals from the random neural activity that occurs during EEG recordings.
In addition to the band-pass and notch filters built into the EEG device, averaging, filtering, and normalization were used as preprocessing techniques. In the first step, trials were averaged to reduce the noise level in each channel. A 50-Hz notch filter was then applied to suppress line noise. Finally, a third-order Butterworth [28] band-pass filter with a 1-55 Hz passband was used to remove out-of-band noise.
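The filtering steps above can be sketched with SciPy as follows. This is a minimal illustration rather than the authors' implementation: the notch quality factor (`Q=30`) and the use of zero-phase filtering (`filtfilt`, `sosfiltfilt`) are assumptions that are not stated in the text, and the function name `preprocess` is hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt

FS = 512.0  # sampling rate in Hz


def preprocess(x, fs=FS):
    """Apply a 50 Hz notch, then a third-order 1-55 Hz Butterworth band-pass."""
    # Notch at 50 Hz; the quality factor Q=30 is an assumed value.
    b_notch, a_notch = iirnotch(w0=50.0, Q=30.0, fs=fs)
    x = filtfilt(b_notch, a_notch, x)
    # Third-order Butterworth band-pass, in second-order sections for stability.
    sos = butter(N=3, Wn=[1.0, 55.0], btype="bandpass", fs=fs, output="sos")
    # Zero-phase filtering (an assumption) avoids shifting events in time.
    return sosfiltfilt(sos, x)
```

Zero-phase filtering applies each filter forward and backward, so the effective attenuation is squared; a single causal pass would be closer to a plain third-order response.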
Signals should be normalized before comparing EEG activity across individuals or across channels [29]. Furthermore, the amplitude of the signals can directly affect classification performance. Therefore, epochs were normalized to obtain comparable conditions and to reduce the effect of amplitude differences. In this study, Eq. (1) [30] was used as the normalization technique for each epoch [31]:
$$x_{\mathrm{norm}} = \frac{x - \bar{x}}{\sigma_x} \tag{1}$$
where $x$, $\bar{x}$, $\sigma_x$, and $x_{\mathrm{norm}}$ refer to the original epoch, the mean and SD of the original epoch, and the normalized epoch, respectively.
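Assuming Eq. (1) is the usual z-score (mean removal followed by division by the SD, as the symbol definitions suggest), the per-epoch normalization can be sketched as below. The function name is hypothetical, and the population SD (NumPy's default, `ddof=0`) is an assumption.

```python
import numpy as np


def normalize_epoch(x):
    """Z-score one epoch: subtract its mean and divide by its standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


z = normalize_epoch([1.0, 2.0, 3.0, 4.0, 5.0])
```

After this step every epoch has zero mean and unit SD, so amplitude differences between subjects or channels no longer dominate the features.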

STFT-Spectrogram
A spectrogram is a two-dimensional image generated by calculating the short-time Fourier transform (STFT) using a sliding window. This transformation provides important information given the non-stationary nature of EEG data. The time resolution of the resulting spectrum is set by the width of the window. Narrower windows provide better time resolution but lower frequency resolution, while wider windows provide the opposite. The absence of undesirable cross-terms [32] and the simplicity of calculation [33] are the main reasons for the widespread use of the STFT in practice. Among its many important features, the STFT has a basic property that facilitates the interpretation of the resulting distribution: magnitude-wise shift-invariance in both time and frequency [34].
The spectrogram, defined as the squared magnitude of the STFT, gives the local spectral energy density of the windowed signal. The power spectral density of the EEG signal was calculated with the STFT spectrogram method. Because EEG signals are non-stationary, and to minimize spectral leakage, a smooth Hanning window was selected. A window length of 512 samples was chosen to achieve an acceptable frequency resolution (1 Hz at the 512-Hz sampling rate). The window overlap was set to the window size minus one sample.
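The spectrogram settings described above (Hanning window, 512-sample length, overlap of window size minus one) can be reproduced with `scipy.signal.spectrogram`. The sketch below runs on a synthetic 3 Hz test signal, not on study data, and the function name `stft_psd` is hypothetical.

```python
import numpy as np
from scipy.signal import spectrogram

FS = 512  # sampling rate in Hz


def stft_psd(epoch, fs=FS, nperseg=512):
    """Return frequencies, segment times, and PSD of a Hann-windowed spectrogram."""
    f, t, sxx = spectrogram(
        epoch,
        fs=fs,
        window="hann",
        nperseg=nperseg,
        noverlap=nperseg - 1,  # slide the window one sample at a time
        scaling="density",
        mode="psd",
    )
    return f, t, sxx


# Synthetic 5 s epoch: a 3 Hz sinusoid plus a little noise (illustrative only).
rng = np.random.default_rng(0)
time = np.arange(5 * FS) / FS
epoch = np.sin(2 * np.pi * 3 * time) + 0.1 * rng.standard_normal(time.size)

f, t, sxx = stft_psd(epoch)
peak_hz = f[np.argmax(sxx.mean(axis=1))]  # frequency bin with the highest mean PSD
```

With a 1-s window the frequency bins are 1 Hz apart, so the delta-band frequencies 2, 3, and 4 Hz each fall on their own bin. The time axis `t` holds window centers, which run from 0.5 s to 4.5 s for a 5-s epoch.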
In this section, the pre-analysis is divided into EEG amplitude-time and power-frequency preliminary analyses. The flowchart of this scenario, summarizing the preprocessing and the preliminary EEG data analysis, is shown in

Time-frequency signal analysis
In signal processing, time-frequency analysis is a set of techniques used to characterize and modify transient signals and signals whose statistics change over time. One of the major benefits of applying a time-frequency transform to a signal is that it reveals patterns of frequency change that clarify the structure of the signal. Another important use of time-frequency analysis is to reduce random noise in noisy signals. In this study, the STFT time-frequency analysis method was used following the EEG preliminary analysis and based on the results of delta band selection.

Feature extraction
In machine learning, pattern recognition, and image processing, feature extraction starts from an initial set of measured data and derives informative, non-redundant values (features) that facilitate the subsequent learning and generalization steps. Feature extraction is also a dimensionality reduction process in which the initial set of raw variables is reduced to more manageable groups (features) that still describe the original data set accurately.
For the transition and steady-state, feature extraction was guided by the STFT-based spectrogram graphs. According to these graphs, the difference between 2D_3D and 3D_2D for both the transition and steady-state was largest in the delta band. Feature extraction was performed using the SD and max statistical functions over the dominant time intervals and frequency band in the 5-s (transition) and 4-s (steady-state) epochs. Based on the spectrogram analysis, the 1-1.5 s and 1.5-3 s time intervals at the 2, 3, and 4 Hz frequencies of the delta band were selected because they showed the PSD difference between 2D_3D and 3D_2D most clearly. As the second feature extraction method, Hjorth parameters were applied to epochs in the dominant time and frequency ranges. There are three Hjorth parameters, which describe the statistical characteristics of a signal in the time domain, as listed in Table 3 [35].
The activity parameter is the variance (var) of the time-domain signal and corresponds to the surface of the power spectrum in the frequency domain.
The mobility parameter is defined as the square root of the ratio of the variance of the first derivative of the signal to the variance of the signal. This parameter is proportional to the standard deviation of the power spectrum.
The complexity parameter indicates how similar the shape of a signal is to a pure sine wave.
The feature extraction method is summarized in Fig. 5 for the critical transition and steady-state.
Table 3. Hjorth parameters [35]
Parameter     Notation
Activity      $\mathrm{var}(x(t))$
Mobility      $\sqrt{\mathrm{var}(\dot{x}(t)) / \mathrm{var}(x(t))}$
Complexity    $\mathrm{Mobility}(\dot{x}(t)) / \mathrm{Mobility}(x(t))$
In the table, $\dot{x}(t)$ and $\ddot{x}(t)$ are the first and second derivatives of the signal $x(t)$, respectively; the second derivative enters through $\mathrm{Mobility}(\dot{x}(t))$.
While these three parameters contain information on the frequency spectrum of a signal, they are computed in the time domain, which also keeps their computational complexity low.
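The two feature sets described in this section can be sketched as follows. The function names are hypothetical, and how the SD and max statistics were applied is an assumption: here they are taken over the PSD values at 2, 3, and 4 Hz inside one time interval of a spectrogram. Derivatives in the Hjorth parameters are approximated by finite differences.

```python
import numpy as np


def sd_max_features(sxx, f, t, freqs_hz=(2.0, 3.0, 4.0), t_range=(1.5, 3.0)):
    """SD and max of PSD values in the chosen frequency rows and time window."""
    f_mask = np.isclose(f[:, None], np.asarray(freqs_hz)[None, :]).any(axis=1)
    t_mask = (t >= t_range[0]) & (t <= t_range[1])
    patch = sxx[np.ix_(f_mask, t_mask)]
    return np.array([patch.std(), patch.max()])


def hjorth_parameters(x, fs=1.0):
    """Return Hjorth activity, mobility, and complexity of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x) * fs    # first derivative (finite difference)
    ddx = np.diff(dx) * fs  # second derivative
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity


# A pure 3 Hz sine wave should give a complexity close to 1.
fs = 512
time = np.arange(5 * fs) / fs
activity, mobility, complexity = hjorth_parameters(np.sin(2 * np.pi * 3 * time), fs=fs)
```

For a sinusoid, mobility divided by 2π approximates its frequency in Hz, which is why mobility is often read as a mean-frequency estimate.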

Classification techniques
A classifier attempts to estimate the class to which a sample belongs, using the values of its features as input [36]. In general, a trained classifier models the relationship between classes and their corresponding features and can identify new samples in an unseen test dataset. To evaluate the performance of the classification methods, sensitivity and specificity [10] were used in addition to classification accuracy. Since this is a two-class classification problem, the chance level is 50%. Three classifiers, k-NN, SVM, and LDA, were used to show the effectiveness of the technique proposed in this study.
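The three performance measures can be computed from the confusion counts as in the sketch below. The label coding (1 for 2D_3D as the positive class, 2 for 3D_2D as the negative class) follows the convention used later in the paper; the function name and the example predictions are illustrative only.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # fraction of 2D_3D epochs correctly detected
    specificity = tn / (tn + fp)  # fraction of 3D_2D epochs correctly detected
    return accuracy, sensitivity, specificity


# Made-up labels: 1 = 2D_3D (positive), 2 = 3D_2D (negative).
acc, sens, spec = binary_metrics([1, 1, 1, 2, 2], [1, 1, 2, 2, 1])
```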
The k-NN is a supervised learning algorithm that assigns a test sample to the class held by the majority of its k nearest training samples. It is a simple, easy-to-implement machine learning algorithm that can be used to solve both classification and regression problems [37]. Despite its simplicity, k-NN can outperform more powerful classifiers and is used in a variety of applications, such as economic prediction, data compression, and genetics.
The performance of the k-NN classifier depends on the distance function and on the neighborhood parameter k, which plays a very important role. If k is too small, the result may be sensitive to noise; if k is too large, the neighborhood may include too many points from other classes [38]. Although the literature offers no clear rule for the choice of k, setting k = 1 or selecting k through cross-validation are the most popular approaches [39].
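The majority-vote rule behind k-NN can be written in a few lines. The sketch below assumes Euclidean distance and uses made-up two-dimensional points; neither the distance function nor the function name comes from the paper.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Label each test sample by majority vote among its k nearest training samples."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for x in np.asarray(X_test, dtype=float):
        distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean (assumed)
        nearest = y_train[np.argsort(distances)[:k]]
        labels, counts = np.unique(nearest, return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)


# Made-up training points for two classes (1 = 2D_3D, 2 = 3D_2D).
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [1, 1, 1, 2, 2, 2]
pred = knn_predict(X_train, y_train, [[0.2, 0.2], [5.5, 5.5]], k=3)
```

With two classes, an odd k avoids tied votes.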
SVM is a classification method based on statistical learning theory. SVM shows good generalization performance for high-dimensional data because it is formulated as a convex optimization problem [40], [41]. SVM separates classes using a discriminating hyperplane. For a linearly separable two-class problem, the SVM attempts to find the hyperplane that separates the input space with the maximum margin [42]. Non-linear decision boundaries can be created by using the kernel trick. In EEG studies, the Gaussian or Radial Basis Function (RBF) kernel is often used with very good results [43]. In this study, the RBF kernel was used for the SVM method.
LDA maximizes the ratio of between-class variance to within-class variance in a given data set, thus providing maximum separability [44], [45]. LDA is used to model differences between groups by separating two or more classes and projecting the features from a high-dimensional space onto a lower-dimensional space.
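For two classes, the LDA projection reduces to Fisher's discriminant, which can be sketched as below. This is an illustration on synthetic data, assuming equal class priors (midpoint threshold); the function names are hypothetical and this is not the implementation used in the study.

```python
import numpy as np


def fit_lda(X, y):
    """Fit a two-class Fisher discriminant: direction w and decision threshold."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)  # expects exactly two labels
    X0, X1 = X[y == classes[0]], X[y == classes[1]]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(Sw, mu1 - mu0)  # maximizes between/within variance ratio
    threshold = w @ (mu0 + mu1) / 2     # midpoint, i.e. equal priors assumed
    return w, threshold, classes


def predict_lda(model, X):
    """Project samples onto w and compare with the threshold."""
    w, threshold, classes = model
    scores = np.asarray(X, dtype=float) @ w
    return np.where(scores > threshold, classes[1], classes[0])


# Synthetic, well-separated features (1 = 2D_3D, 2 = 3D_2D).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(4.0, 1.0, (20, 2))])
y = np.array([1] * 20 + [2] * 20)
train_accuracy = np.mean(predict_lda(fit_lda(X, y), X) == y)
```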
In this study, K-fold cross-validation with K = 10 was applied to validate the results of the classification algorithms. The σ value of the SVM and the k value of k-NN were selected through this cross-validation. Class 2D_3D was defined as the positive class and class 3D_2D as the negative class. To prepare a dataset for classification, the epochs of each class were divided into two groups, from which the training and test sets were prepared. The hybrid scenario classification flowchart is presented in Fig. 6. The processes in this flow were repeated 20 times.
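A 10-fold cross-validation of the three classifiers can be sketched with scikit-learn as follows. The data are synthetic stand-ins for the features of one channel (25 epochs of 2D_3D and 20 of 3D_2D, matching the 45 transition epochs per individual), and the hyperparameters (`n_neighbors=3`, `gamma="scale"`) are assumptions rather than the tuned values from the study. The sketch covers only the cross-validation step, not the train/test split or the 20 repetitions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic features: label 1 = 2D_3D (positive), label 2 = 3D_2D (negative).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (25, 3)), rng.normal(3.0, 1.0, (20, 3))])
y = np.array([1] * 25 + [2] * 20)

classifiers = {
    "k-NN": KNeighborsClassifier(n_neighbors=3),     # k is an assumed value
    "SVM (RBF)": SVC(kernel="rbf", gamma="scale"),   # kernel width is assumed
    "LDA": LinearDiscriminantAnalysis(),
}

# Stratified folds keep the class proportions similar in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
mean_accuracy = {
    name: cross_val_score(clf, X, y, cv=cv).mean()
    for name, clf in classifiers.items()
}
```

In practice the same loop could be wrapped in a grid search (for example over `gamma` and `n_neighbors`) to select the SVM and k-NN parameters by cross-validation, as the text describes.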

Results
To analyze and classify 2D_3D and 3D_2D separately, window lengths of 5 s and 4 s were selected for the critical transition and steady-state analyses, respectively. The average classification accuracy of SVM, k-NN, and LDA over the 9 subjects was calculated for each channel in the critical transition and steady-state. The average results of the classification algorithms in the critical transition and steady-state for the training and test datasets are presented in Figs. 7, 8, 9, and 10. The comparison of the two feature extraction methods is shown in Figs. 11 and 12. As expected, the graphs show higher success on the training data than on the test data. In general, the SVM and LDA classifiers are more successful than k-NN. In addition, the frontal, temporal, and parietal lobes of the brain seem to be more effective in the classification of 2D and 3D transitions. In Fig. 7, T3 was the best channel, with 66.67% success using the SD and max feature extraction method and the LDA classification algorithm. The second most effective channel, Pz, was obtained with the Hjorth method and the LDA algorithm, with 66.25% success. As shown in Fig. 8, in the steady-state the F8 and T6 channels were selected as effective channels, with success rates of 66.67% and 66.2%, respectively, using the Hjorth and LDA methods.
Since this is a two-class problem with a chance level of 50%, the average classification results are low. It is therefore necessary to improve these results by applying another method, which is attempted in the following section.

Improving the classification results
After the classification results of the subjects were obtained with the two feature extraction methods, the results were improved by label voting among the three best channels. For this voting process, the three channels with the best average classification results were selected. The result for each epoch was then determined according to a decision mechanism [46], which is summarized in Table 4.
In this table, the final decision is called "estimated label based on criteria". If the estimated epoch label is 1 (representing "2D_3D") for two or three of the channels, the final decision takes the value 1. Likewise, if the estimated epoch label is 2 (representing "3D_2D") for two or three of the channels, the final decision takes the value 2. The three best channels based on average accuracy results are presented in Tables 5 and 6.
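The decision mechanism described above is a majority vote over three channel-level predictions. A minimal sketch, with made-up predictions and a hypothetical function name, is:

```python
from collections import Counter


def vote(labels):
    """Return the label predicted by the majority of channels."""
    return Counter(labels).most_common(1)[0][0]


# Each tuple holds the labels predicted for one test epoch by the three best
# channels (1 = 2D_3D, 2 = 3D_2D). The values are illustrative only.
channel_predictions = [(1, 1, 2), (2, 2, 2), (1, 2, 2), (1, 1, 1)]
voted_labels = [vote(p) for p in channel_predictions]
```

Because there are three voters and two labels, a tie cannot occur.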
Voting was then carried out among the three channels for each test epoch, and the labels of the test epochs were determined from the outcome. The flow chart of the voting process is shown in Fig. 13. The classification results for the critical transition and steady-state of the 2D_3D and 3D_2D transitions based on label voting among the three best channels are presented in Tables 7 and 8, respectively. In the tables of the three best channels by average classification result, the frontal, temporal, and parietal lobes are the most prominent. In addition to identifying the effective brain lobes, the reduction in the number of channels indicates that this approach is suitable for application in biomedical fields. The voting method therefore both reduced the number of channels and improved classification accuracy. Moreover, the classification performance measures are consistent with each other. Of the feature extraction methods used, the Hjorth method seems to be more successful, and of the classification methods, the LDA algorithm gives better results overall.
In Table 7, the best average classification result (71.11%) was obtained with Hjorth and LDA in the label voting of the Pz, T3, and F7 channels.
This result increased by approximately 6.48% compared to the pre-voting result. Similarly, as seen in Table 8, the success rate in the label voting of the F8, T6, and F4 channels increased by 11.57% after voting.

Discussion
In the present study, datasets of EEG recordings related to 2D_3D and 3D_2D transitions were obtained with a predetermined scenario. Among the studies on 2D and 3D technology, no detailed quantitative research was found that uses channels representing the five lobes of the brain and all EEG bands [47], [8], [9]. The effect of watching 2D and 3D TV on brain signals has been examined qualitatively in some studies [12], [48]. In [12], the researchers claimed that the behavior of brain signals did not change during 2D/3D TV watching. In contrast to that finding, brain dynamics have been shown to behave differently across the brain lobes and EEG frequency bands [49], [48], [5], [10]. Brain signals recorded while watching 2D and 3D videos individually were analyzed extensively in our previous studies [5], [10].
The classification of EEG signals during and after 2D and 3D movie watching in the dominant bands distinguishes these studies from others in this field.
Here, both the critical transition moment and the steady-state were considered while subjects watched a single video with random 2D_3D and 3D_2D transitions. To our knowledge, no previous study in this field has attempted to capture the transition moment in 2D_3D and 3D_2D transitions. In our hypothesis, the effort to capture this transition moment is based on the anatomy of the human eye. People see their surroundings in 3D because of their eye structure. Thus, the perception of depth may be lost when they are sleepy and tired. For this reason, it is important to analyze and classify 2D_3D and 3D_2D transitions in both the critical transition and steady-state situations.
2D_3D and 3D_2D transitions were analyzed separately in the critical transition and steady-state situations. Based on the STFT-based PSD spectrogram graphs, the 2, 3, and 4 Hz frequencies of the delta band were chosen as the dominant frequencies in these transitions. Features for classifying the critical transition and steady-state were extracted at these frequencies using SD, max, and Hjorth parameters. Both situations were analyzed using the SVM, k-NN, and LDA classification techniques. In this two-class study, the best results show that Hjorth parameters combined with LDA classification capture the contribution of channels from the frontal and temporal lobes to 2D and 3D transitions. The temporal and frontal channels that represent 2D and 3D transitions overlap with areas that are sensitive to depth perception and binocular vision [7], [50], [8].

Conclusion
The main goal of this study is to capture the transition moment in a single video consisting of 2D and 3D parts. This goal is motivated by the anatomy of the human eye and is intended to lay the foundation for fatigue studies. Since binocular vision may lose its sense of depth under fatigue, the transitions from 2D to 3D and from 3D to 2D are important. In this study, the power spectral density differences of EEG signals in 2D_3D and 3D_2D transitions were analyzed in the form of spectrogram graphs. Interpreting these graphs gave insight into the EEG frequency bands and dominant time intervals. The power difference of brain signals in transitions is evident at 2, 3, and 4 Hz of the delta band. As a result, basic steps were taken toward good classification performance by extracting suitable features from the dominant band, i.e., the delta band. It is noteworthy that Hjorth parameters and the LDA classification technique generally provided promising results in the temporal and frontal lobes.
Transition analysis in a single hybrid video consisting of 2D and 3D parts can add a new perspective to driver fatigue studies. This study provides a good basis for our future work. In the near future, transition analysis using a more professionally produced combination of 2D and 3D video in the brain signal analysis of tired individuals could provide a promising basis for early detection of the transition to fatigue. In addition, more specific frequency ranges, power spectra, and statistical methods can be investigated. Future studies might also decrease the number of channels and increase the classification accuracy using different feature extraction and classification methods. A flexible and interactive graphical interface can be used to process dynamic 2D/3D brain data using EEGLAB. The training and classification stages can be improved by using deep learning algorithms.