Assessing Visual Science Literacy Using Functional Near-Infrared Spectroscopy with an Artificial Neural Network

The primary barrier to understanding visual and abstract information in STEM fields is a lack of representational competence: the ability to generate, transform, analyze, and explain representations. Although a strong relationship is known to exist between foundational visual literacy and domain-specific science literacy, how science literacy develops as a function of science learning remains poorly understood despite investigation across many fields. To support the improvement of students' representational competence and promote learning in science, identification of visualization skills is necessary. This project details the development of an artificial neural network (ANN) capable of measuring and modeling visual science literacy (VSL) from neurological measurements collected using functional near-infrared spectroscopy (fNIRS). The developed model can classify levels of scientific visual literacy, allowing educators and curriculum designers to create more targeted and immersive classroom resources, such as virtual reality, that strengthen the fundamental visual tools of science.


Introduction
Improving a person's level of scientific literacy is beneficial for the individual as well as for society in general. An individual's visual science literacy (VSL) is a vital component of science literacy, especially for novice learners who rely on images to illustrate abstract concepts, such as the structure of an atom or the directionality of ocean currents, which can be difficult to visualize without graphical representation. The ability to problem solve and to observe patterns as they occur in natural phenomena helps people make informed choices based upon presented evidence [1]. For example, radar images of an impending hurricane can signal to local officials to start preparing for the storm and possible flooding. Scientific literacy is the combination of skills that promotes a person's ability to evaluate claims and evidence using visual information, describe and evaluate scientific investigations, and interpret data and visualizations, allowing one to draw reasoned conclusions [2].
The centrality of scientific literacy as a means to promote a functioning society is a key consideration in science education. Therefore, a primary goal for science educators is to cultivate students' scientific habits of mind and to promote the cognitive skills, affective dispositions, and behaviors that develop a person's capability to engage in scientific inquiry. Distilled to its essence, scientific literacy focuses on the promotion of reasoning in a scientific context to develop knowledge [3].

Visual Science Literacy
Visual literacy and spatial ability are essential skills in modern society [4] and for learning in science, because complex and real-world phenomena are often simplified into visual representations for both teaching and explanation purposes [5]. These representations may be abstract and symbolic, including text, drawings, or diagrams, or concrete, such as authentic, physically manipulable resources [6], and they are subject to manipulations and transformations of the image to illustrate "dynamic processes" such as conformational changes in chemistry [5] (p. 312).
The primary barrier to developing this visual literacy in science, and more broadly skills such as spatial thinking in STEM fields, is a lack of representational competence resting on scientific visual literacy [6,7]. Although there is a strong relationship between foundational visual literacy and domain-specific science literacy skills [1], how science literacy is attained through the process of science learning is still not well understood. To understand more accurately how science learning influences science literacy and vice versa, a reliable measure of the underlying cognitive processes is necessary. Reliable measures will aid in the design of resources that enhance the fundamental learning processes behind visual scientific literacy. This is particularly salient because many high school science students struggle with the interpretation and evaluation of scientific visuals, such as diagrams of cell structures and molecular structures.
A critical skill within the context of scientific literacy is the ability to interpret visual representations of data, models, and conclusions in a scientific context. For example, a student may be asked to determine the name of a molecule from a three-dimensional model, a skill that requires a basic ability to visualize science phenomena. Interpretation of visual representations results from critical cognitive attributes associated with executive function and working memory [8]. Important cognitive attributes in the context of science education are spatial visualization (the ability to see in three dimensions), mental rotation (the ability to rotate mental representations of two- and three-dimensional objects), and visual attention (the ability to sustain focus on a stimulus). These critical cognitive attributes are central to a subset of scientific literacy known as visual science literacy. Visual science literacy can be defined as the ability to read and interpret scientific information presented in pictorial or graphical form [9] and to communicate science concepts through diagrams, charts, and graphs [4]. For example, a student may be asked to interpret the state of matter of a substance by looking at particle diagrams or to determine the directionality of ocean currents. Spatial ability includes a variety of skills used to recognize spatial relationships, such as those exemplified by diagrammatic representations in chemistry. These diagrammatic relationships are relevant to a variety of STEM discipline phenomena, such as molecule polarity and the interpretation of reaction progress curves.
There are five components associated with the cognitive attributes associated with a person's spatial ability. The five components are spatial perception, spatial visualization, mental rotation, spatial relations, and spatial orientation [10] (Figure 1). Together, these factors allow students to predict the effects of transformations (e.g., recognizing that rotated objects are the same as the original object), locate objects in space, and create representations that encode position and orientation information into long term memory for future use [5].
Although we know the skills required for the development of visual literacy in science, a measurement tool that can assess an individual's level of visual scientific literacy using multiple disparate forms of data has yet to be developed. In a survey of 94 biochemistry instructors from 4-year colleges and universities across the US, a majority of instructors did not directly assess students' visual science literacy, indicating that their visual literacy on science assessments was assumed and not explicitly tested [11]. However, in instances where visual literacy has been assessed in science, a series of probes was developed to assess the visual literacy skills necessary to process and construct meaning from representations [12].

Relationship of Representational Competence, Spatial Ability and Visual Science Literacy
Representational competence, which includes the ability to generate, transform, analyze, and explain representations [7], is a main component of an individual's visual scientific literacy, along with spatial ability and visual literacy, which must be transferred to interpret new visual representations because science content knowledge is embedded in visuals (Figure 2).
Graphical representations can dramatically enhance science learning by offering important information such as the directionality of processes and systems. However, graphical representations add complexity to the comprehension process [13]. A study by Hannus and Hyönä [14] illustrates that the challenge of reading illustrations in scientific text stems from: 1) comprehending concepts from text and images, 2) deciding on the order in which images and text are to be studied, 3) judging the relevant and superfluous information contained in texts and images, 4) determining which text- and graphics-related information belong together, and 5) incorporating the related information. A further challenge in using visual textbooks to model real-life science illustrations for novices is that the graphics in scientific trade books for young students do not reflect the types used in academic science textbooks and journals [13]. The study of the physical processes involved in visual perception has both facilitated and strengthened the advocacy for visual literacy in science. Research shows that seeing is not merely a system of passive processing of stimuli, but also an active construction of meaning from the presented visuals [15].
Mental rotation has been studied extensively for over five decades in the assessment of an individual's spatial ability, establishing that the cognitive processes involved with mentally and physically rotating an actual object are analogous. Although a precise mechanism and the potential relationship to VSL has not been fully established, results of mental rotation tasks are considered predictive of achievement in STEM disciplines [7,16] and show a positive linear relationship between angular disparity and response time; whereby the farther objects are rotated from one another, the longer the rotation takes to mentally visualize and simulate [7]. This linear relationship between angular disparity and response time enables its use in assessing the visual component of science literacy, or visual science literacy.
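The linear relationship between angular disparity and response time can be illustrated with a small sketch that fits an ordinary least-squares line to the two variables. The numbers below are invented for illustration and are not measurements from this study or the cited literature.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Hypothetical angular disparity (degrees) vs. mean response time (ms):
# the farther the rotation, the longer the mental simulation takes.
disparity = [0, 45, 90, 135, 180]
response_time = [1000, 1450, 1900, 2350, 2800]

intercept, slope = fit_line(disparity, response_time)
print(f"RT ~ {intercept:.0f} + {slope:.1f} * disparity")
```

A positive fitted slope is the signature of the rotation effect; in an assessment context, an individual's slope can serve as a behavioral index of mental rotation efficiency.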

Visual Science Literacy in Chemistry
Chemistry has been identified as one science domain where mental rotation is strongly predictive of achievement on a variety of assessments from P-20 [7]. In many cases these assessments consist of rotated molecular structure diagrams. Therefore, in this study, molecular structures were used for the mental rotation tasks, because mental rotation has been found to be a primary process engaged by novice students when making identity judgments comparing molecular structures. Performing mental rotation tasks requires students to reason about given spatial information and to perform mental spatial operations to make a successful comparative identity judgment [7,17]. Mental manipulations and transformations of images are a recurrent theme in the reports of imagery by scientists, especially in chemistry [7], and psychological research indicates that visual-spatial images, such as images of biochemical molecules, are easily susceptible to transformations in the mind, externally via concrete models, or on paper. Further, images can hold powerful metaphorical connotations that suggest relations and concepts extending beyond their concrete physical form [5]. However, assessing that ability has always required self-report measures, which are subject to participant bias.

Assessment of Visual Science Literacy
A primary goal of this study is to use neurocognitive measurements, which are automatic and not in danger of participant bias, to determine an individual's ability to interpret visual diagrams and transformations, indicating an ability to interpret scientific visuals. The key theoretical challenge in educational neuroscience and cognitive science is to bridge levels of analysis, linking brain processes (functions) and behavior in a mutually explanatory manner [18]. Traditionally, assessment of science literacy has involved the use of multiple-choice tests, self-report surveys, and content probes. Student learning evaluation is a multibillion-dollar industry with many stakeholders within and outside the education system [19]. The variety of uses of assessment and evaluation makes assessment one of the most contentious issues in education, with educators and policymakers often lining up on opposite sides as to the role of assessment in the school system. Educators suggest that such assessments are not meaningful measures of student learning, do not account for key student gains, and fail to integrate multiple forms of data. Policymakers, on the other hand, demand that the primary role of testing be educator accountability. The latent and delayed nature of learning leads to a disconnect between these stakeholders: teachers are able to identify changes in student affect, behavior, and cognition to which assessments are not sensitive. This disconnect between educators and policymakers is a stimulus for business and government to seek more appropriate and authentic measures of student learning to drive decision-making processes for curriculum and learning [16]. The call by businesses, governments, educators, and policymakers for more authentic and realistic assessment has driven much of the assessment innovation in recent years [19].
A possible way to bridge this disconnect is through the use of multiple integrated data sources mixing traditional content and neurological data.
Existing psychometric test development techniques are largely empirical, arising from a history of test development dominated by correlational and model-fitting methods. These methods have led to a heavy emphasis on the description of tests by factor analytic techniques or on the examination of predictive characteristics. Factor analytic studies have resulted in clearer descriptions of the nature of test content and of relationships among items within tests, but often without information about predictive capacity. Predictive validity studies provide an estimate of a test's value in predicting some external criterion but do not elucidate those relationships. Neither perspective, however, provides information leading to clearer descriptions of the specific profiles of human behaviors underlying successful test performance in addition to prediction and clarification of relationships. The approach outlined in this study allows all three components to be present.
As cognitive psychology has expanded its contributions to issues close to those traditionally deemed psychometric, increasing demands have been placed upon the testing movement to develop instruments that assess more complex levels of knowledge and performance. Researchers since the beginning of the testing movement have argued that a subject's performance on test items (tasks) depends on specific cognitive aspects called attributes, which can be modeled using an artificial neural network (ANN) [16,17], a machine learning methodology resembling the biological neural circuits in the brain that uses algorithms to analyze complex datasets.

Mental Rotation for Visual Literacy Assessment
Visual literacy tasks of average difficulty [12], such as mental rotation, assess our ability to determine whether objects have the same shape despite differences in orientation or size. This form of assessment is a classical visual science literacy task in which participants have to imagine one of the figures rotated into the same orientation and identify the matching figure from a series of choices [12]. Since achievement on mental rotation tasks requires both representational competence and spatial ability, mental rotation is an optimal task for assessing an individual's level of visual science literacy: it calls for visual review and the spatial ability to visualize the rotation of an object in three-dimensional space [12]. These cognitive processes have been found to be a core component of scientific reasoning, specifically VSL [10]. Results of mental rotation tasks are considered predictive of achievement in STEM disciplines [7,17].

Mental Rotation Tasks
The stereochemistry task used in this study was a Matlab application that consisted of two blocks of 80 trials. One of the blocks contained only 2D models of molecules and the other block included only 3D models of molecules to compare. Each trial presented participants with a pair of molecular models and the participants were asked to determine whether the two models represented the same molecule or different molecules. Participants used the marked left and right keys on the keyboard to make their "Same" or "Different" selections. The duration of each trial was limited to 10 seconds, and if a participant did not make a selection, the trial was marked as incorrect and the next trial was shown. A brief instructional video introduced the task. All models were grayscale to control for the potential effects of color. The presentation of each block was counterbalanced.
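The trial-scoring rule described above, in which a response counts as correct only if the choice matches the ground truth and arrives within the 10-second limit, can be sketched as follows. The original task was a Matlab application, so this Python function and its names are illustrative assumptions, not the study's code.

```python
# Assumed time limit per trial, per the task description above.
TIME_LIMIT_S = 10.0

def score_trial(response, correct_answer, response_time_s):
    """A trial is correct only if answered correctly within the time limit;
    a missing response (timeout) is marked incorrect."""
    if response is None or response_time_s > TIME_LIMIT_S:
        return False
    return response == correct_answer

# (participant response, ground truth, response time in seconds)
trials = [
    ("same", "same", 3.2),        # correct choice, in time
    ("different", "same", 2.1),   # wrong choice
    (None, "different", 10.0),    # no response -> marked incorrect
]
scores = [score_trial(*t) for t in trials]
print(scores)  # [True, False, False]
```

Treating timeouts as incorrect, as the task did, keeps the accuracy denominator fixed at 80 trials per block regardless of how many trials were answered.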

Cognitive Diagnostics with fNIRS
Most learners have a set of skills embedded in their system of perception, such as critical thinking and memory [20]. These cognitive processes evolve into a set of dynamic procedures that are used in parallel with the processing of environmental information [21]. These functional aspects of cognition have been defined as cognitive attributes [17], the smallest units of cognition that retain specific features of individual cognitive processes. These skills include the ability to understand, generate, and evaluate charts, graphs, and diagrams with the purpose of explaining the interaction of complex variables like those found in science. Critical thinking, the ability to retrieve information from memory, and the ability to measure and predict outcomes for simple and complex problems are other cognitive qualities of interest in science [22].

Process of Learning in Science
Measurement of specific cognitive dynamics in individual students is necessary in order to understand how brain function and cognition relate to the process of learning in science. This has led to the emergence of the field of Educational Neuroscience, bringing together members of multiple communities, all interested in understanding how learning occurs in the science classroom [23]. Functional Near-Infrared Spectroscopy (fNIRS) and Functional Magnetic Resonance Imaging (fMRI) are semi-non-invasive and non-invasive imaging methods, respectively, which provide basic data about cognitive processing in real time [24].
However, the field of education is far behind the neurosciences in the adoption of these technologies. Even though fMRI has been the standard for noninvasive neurological imaging in the neurosciences since its invention in the early 1990s, the excessive cost, lack of expertise, and a lack of recognition by the field of these tools' usefulness in clarifying questions and testing hypotheses in classrooms have delayed adoption. In part to address the shortcomings of fMRI, researchers have developed protocols and methods for the use of fNIRS [23]. fNIRS is better suited to classroom settings and less expensive than fMRI; it offers a portable and affordable method to examine cerebral blood flow (hemodynamic responses) and enables functional evaluation of otherwise inaccessible structures as students interact with different instructional modes [23].
An important consideration in the translation from basic research to applied research is how the language of each discipline is interpreted. The translation of basic cognitive science to educational practice requires transdisciplinary educators to provide linkages between behavior and cognitive processes. This divide arises because behavior is measured in terms of choice and response time, whereas neural activity is measured by spiking activity or the blood oxygen-level-dependent (BOLD) signal in fNIRS [25].

Neurological Data from Mental Rotation Tasks as Inputs to the ANN
Psychophysiology is the study of the relationship between psychological manipulations (in this case, the mental rotation tasks) and the resulting physiological responses, typically measured via autonomic nervous system responses, to promote understanding of the relation between mental and bodily processes [26]. Psychophysiological responses are therefore automatic, non-voluntary responses from the autonomic nervous system. These autonomic responses are useful because, being involuntary, they are not subject to participant bias as self-report measures are [26]. Although behavior is the result of ongoing mental processes, observed behavior is not the equivalent of mental activity, because these activities are not always translated into motor acts. These mental activities, although not directly observable, are themselves behaviors hidden from observation. Therefore, utilizing autonomic psychophysiological measurements bypasses observed behavior and offers a more direct mode of learning about the underlying cognitive processes [26]. These psychophysiological measurements are numerical, since they measure the magnitude or frequency of a physiological state (i.e., blood oxygenation level), and, as in the case of fNIRS, robust, since many optodes are used concurrently, all recording data about blood oxygenation level. This is important because the abundant data recorded from each optode together constitute a 'big data' set while requiring fewer experimental participants than quantitative probes or qualitative analysis would. Traditionally, neuroscience has involved many small models that encompass limited data sets and are more descriptive than explanatory [27]. Since each data point is treated as a subject in the analysis, such comprehensive, quantitative data can be used as input to an ANN for analysis and forecasting.
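The construction of such a 'big data' set, in which every recorded sample rather than every participant is treated as a subject, can be sketched by flattening a multi-optode recording into one row per (optode, time, value) sample. The optode count and the values below are assumptions for illustration only.

```python
# Toy recording: optode number -> oxygenation time series. Sixteen optodes
# and four samples each are illustrative; real recordings contain thousands
# of time points per optode.
n_optodes = 16
samples_per_optode = 4
recording = {o: [0.1 * o + 0.01 * t for t in range(samples_per_optode)]
             for o in range(1, n_optodes + 1)}

# Long format: one row per (optode, time index, value) sample, so each
# sample can serve as a separate case in the downstream analysis.
rows = [(optode, t, value)
        for optode, series in recording.items()
        for t, value in enumerate(series)]

print(len(rows))  # 16 optodes x 4 samples = 64 rows
```

Even this toy example shows how the row count multiplies: the number of analysis cases is the product of optodes and time points, not the number of participants.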
One possible bridge between behavior and brain-system is through the use of cognitive computational models in the form of ANNs to integrate multiple forms of data.

Cognitive Computational Models
Cognitive computational models are mathematical formalisms that embody psychological principles, often evaluated by their ability to account for behavioral data. The mechanisms in these models can be related to both behavior and to neural measures, thus providing a possible bridge between the abstract cognitive data and practical outcomes the teacher can make use of in the classroom and in the development of supplemental resources. These models are very loosely based on neurocognitive processes with a natural proclivity for storing and using experiential knowledge to solve problems through the interaction of various processing elements [16].
ANNs serve as models for biological systems, including intelligence, as adaptive processors in real time, and, as shown in this study, as methods for data analysis [28]. A collection of interconnected nodes (neurons) forms the artificial neural network, which explains the interdependence between cognitive abilities and the successful completion of tasks. The neurons establish three distinct ANN layers: input, hidden, and output. A fundamental feature of an ANN is its hidden nodes, which connect the input and output elements; they are therefore based on a level of abstraction, a cognitive function unique to higher-level organisms such as the neural systems the technology is modeled on [16].
The input layer performs no computational function; instead, it distributes the data into the neural network. The mental rotation tasks serve as the input for the purposes of this model. The hidden layer represents the cognitive attributes assigned to the tasks, and the output layer consists of the probabilities of success and failure. Figure 3 provides a general illustration of the ANN used in this analysis.
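The three-layer structure described above can be sketched as a minimal forward pass: an input layer that only distributes task features, a hidden layer standing in for cognitive attributes, and an output layer normalized to probabilities of success and failure. The weights and dimensions here are arbitrary illustrative values, not those of the trained model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, w_output):
    # Hidden layer: one node per hypothesized cognitive attribute.
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)))
              for ws in w_hidden]
    # Output layer: softmax over the success/failure nodes, so the two
    # outputs can be read as probabilities.
    raw = [sum(w * h for w, h in zip(ws, hidden)) for ws in w_output]
    exps = [math.exp(r) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]

# Toy network: 3 task features -> 2 hidden attributes -> 2 outcomes.
w_hidden = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
w_output = [[1.0, -1.0], [-1.0, 1.0]]
probs = forward([1.0, 0.5, 0.2], w_hidden, w_output)
print(probs)  # two probabilities summing to 1
```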

Developing the Artificial Neural Network
The artificial neural network was developed using the Gradient Boosted Trees (GBT) algorithm. GBT is a learning procedure that consecutively fits new models to provide a more accurate estimate of the response variable [29]. The GBT algorithm was chosen for its highly accurate predictions: it computes successively more accurate (gradient-based) weights for predictors by using the weights from the previous decision tree [29]. Training of the artificial neural network in RapidMiner 9.8 (www.rapidminer.com), as well as the confirmatory model developed in SPSS, used a random stratified 1/4n split data approach with two training sets and two testing sets. The first training set (trainingset1) was used for feature selection and for the relative weights of predictors, while the second training set (trainingset2) was used to train the model.
For each of the proposed attributes, the artificial neural network derives propagation weights, initially assigned at random, from the test set data (1/4n). As the signal travels from node to node within the network, the weights represent the strength of signal propagation.
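The random stratified 1/4n split described above can be sketched as follows: each outcome class is shuffled separately and one quarter of it is held out, so class proportions are preserved across the partitions. The function name, seed, and toy data are illustrative assumptions, not the RapidMiner or SPSS implementation.

```python
import random
from collections import defaultdict

def stratified_quarter_split(records, label_of, seed=0):
    """Hold out ~1/4 of each class; return (train, test) partitions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for r in records:
        by_class[label_of(r)].append(r)
    train, test = [], []
    for _, group in sorted(by_class.items()):
        rng.shuffle(group)
        cut = len(group) // 4   # one quarter of this class goes to the test set
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test

# Toy records: (trial id, outcome label), 40 of each class.
data = [(i, "success" if i % 2 else "failure") for i in range(80)]
train, test = stratified_quarter_split(data, label_of=lambda r: r[1])
print(len(train), len(test))  # 60 20
```

Because each class is split separately, a rare outcome cannot be accidentally concentrated in either partition, which is the point of stratifying rather than sampling at random from the pooled data.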
For information about the learning tasks used in this study and the data preprocessing for the ANN, please see Appendix A.

Results
In answering the first research question, 'What is the relationship between representational competence, spatial ability, and visual literacy in science?', the results of this study showed a theoretical connection between an individual's performance on a series of mental rotation tasks and their visual science literacy level. To answer the second question, 'How can the cognitive processes behind visual science literacy be accurately assessed using an artificial neural network (ANN)?', a predictive artificial neural network was developed as a neurocognitive visual science literacy assessment, which found that the individual visual components of each molecular structure were the greatest predictors in the network.

Artificial Neural Network
The GBT model found that the biggest determining factors, or predictors, of successful mental rotation are the individual problem number of the 160 mental rotation items performed by each participant (80 Ball and 80 Dash), the response time, and fNIR optode #16, located along the right prefrontal cortex, which plays a vital role in processing visuospatial working memory (VSWM) [30] and episodic memory retrieval [31]. Table 1 displays the relative predictor weights for the artificial neural network.
Once the network was configured using the random weighting technique, it was trained by providing the ANN with examples from the 1/4n data set (1/4n = 1,533,643) to illustrate how the ANN should behave. Analysis of the results of the trained ANN with the calibration set indicates a precise behavioral prediction of the subject's performance outcomes based on the given cognitive attributes. Table 2 offers key model fit statistics. The model shows an accuracy of 93.9% and an F1 score of 95.7% after running trainingset2; since the F1 measure considers both precision and recall, it provides a more accurate view of the true model fit.
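The distinction between accuracy and the F1 score can be illustrated with a small worked example: F1 is the harmonic mean of precision and recall, so it can diverge from accuracy when classes are imbalanced. The confusion-matrix counts below are invented for illustration and are not this study's results.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative imbalanced counts: true/false positives, false/true negatives.
tp, fp, fn, tn = 8, 4, 2, 86
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(round(accuracy, 3), round(f1_score(tp, fp, fn), 3))  # 0.94 0.727
```

Here accuracy looks excellent (0.94) because the abundant negatives are easy, while F1 (about 0.73) reveals the weaker performance on the minority class; reporting both, as Table 2 does, guards against this effect.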
The developed GBT ANN consisted of 140 trees with a maximum depth of 7 branches. Each branch on a decision tree represents a choice or variable (i.e., the type of task, task complexity, or signal frequency reaching threshold), and the model uses dynamic boosting, in which the trees become progressively more accurate because they learn from the preceding trees. As indicated by the high performance level, the data fit the model; and since the individual visual task is the model's biggest determining factor, there appears to be a hierarchical nature to visual literacy, to be further examined with a Rasch analysis in a future study.

Discussion
This research has provided a rationale for integrating and expanding multiple distinct research areas relating to subject learning. The first is to use cognitive diagnostic methods to evaluate the learning of subjects as it applies to the cognitive attributes used during science processing. The second area is an analysis and modeling of the connection between processing factors that contribute to visual science literacy. The third is the use of an artificial neural network to predict cognitive diagnostics. The literature discussed incorporates work from multiple subject areas, such as science education, the psychology of education, measurement, and computational psychology.
To address the research question, 'What are the cognitive processes behind visual science literacy and can they be accurately assessed using an artificial neural network (ANN)?', we used the developed GBT ANN to predict the effectiveness of mental rotation of molecular models.
We developed a GBT ANN using fNIRS (neurocognitive) data which accurately predicted mental rotation performance. The findings demonstrate a successful ANN model of students' visual literacy in science, which provides useful data to educational researchers. The theoretical model gives an overview of the dynamics of classroom-based science learning processes associated with visual texts. This RapidMiner model was validated with confirmatory results by designing a Multilayer Perceptron (MLP) in SPSS using the same training data (Supplementary Material).
Mental rotation is a visual literacy task in science, and since the ANN had a high degree of accuracy in predicting mental rotation performance, with the largest predictor being the individual problem number and its specific visual elements, the results of this study indicate that visual science literacy can be tested using an artificial neural network. In this way, the aspects of visualizations that are more challenging for students, and that therefore require more targeted interpreting resources, can be identified. Fewer objects and characteristics need to be evaluated in neural network analysis than in traditional cognitive diagnostic tests. The use of an ANN as a statistical processor means that probabilistic assessment of the data is included, especially at the input and output nodes: the input nodes can be recast as input patterns, and the output nodes can be reworked to produce higher-density samples and randomized probability estimates. This connection to inferential statistics enables researchers to connect ANNs to functional representations of real-world problems such as cognitive characteristics, probabilities of completion, and the development of hierarchical relationships. In this context, an ANN offers answers to a variety of problems through its sophisticated statistical modeling, with a focus on versatility.
The ANN model shows a good fit and approximates human learning related to the perception of visual texts composed of scientific material. Model fit was determined by the number of iterations that resulted in convergence. Good model fit suggests a computational-cognitive model that describes the cognitive attributes underlying the visual sciences (H0, R2 = 0). Upon adding tasks as input nodes and attributes as hidden nodes, a sophisticated model of cognition relating to science processing becomes possible. Given the subtle, complex, and poorly understood cognitive evaluation of science processes, an incremental approach like this one produces high-quality results.

Limitations
Despite these findings, some limitations to the study should be considered. The inherent flexibility of ANNs sometimes leads to overfitting, which increases as the number of variables increases, resulting in the randomization of components. This can lead to lower performance on future data or new test data. To control for this, it is important that the data have a similar or higher level of complexity; in addition, the boosting procedure used in GBT reduces the number and type of overfitting errors [32]. Another limitation of this study is that fNIRS tends to overrepresent blood flow in the brain, especially in a high-stress environment such as testing [33]. In this study, a signal analysis was performed to correct for this.
Other limitations of the study were due to the use of secondary data. Although the ability to extract new information from previously recorded data may provide many benefits, including unobtrusive speed of analysis and the ability to triangulate and reveal serendipitous relationships among variables, it is not without its negative aspects. Challenges presented by the secondary data in this study included the packaging of the stimulus images, the decentralized storage of the data, and the nonstandardization of dataset file types.
Secondary data science is an emerging field, and one that is not without growing pains. As secondary data becomes more widely used in mainstream research, global standardization of data formats will be necessary. For example, accessing the EEG and eye-tracking data stored in the repository requires the secondary data scientist to have the software used to generate the data. Exporting all data in standardized formats such as CSV, or in multiple applicable formats, would allow the data to be reevaluated by others to find meaning in new patterns.

Future Implications
This research follows a new area of neuroscience, the study of the physiological and cognitive systems involved in visual perception [15], illustrating the numerous dependent variables that can be obtained in real time when working with autonomic systems and that can therefore be used to model these dynamic processes. Although the rapid development of the cognitive sciences has outpaced traditional educational research's ability to verify, develop, and translate new ideas from cognitive science to education, the main conceptual challenge in educational neuroscience and cognitive science is to link levels of analysis, connecting brain processes (functions) and behavior (actions) in a mutually coherent way [20].

Implications for Science Educators
This neurocognitive computational model would be ideal for the personalization of immersive educational resources such as VR and AR, which can educate students on topics that are too abstract or impossible to create or manipulate in real life [34]. In fact, the use of neurocognitive measurements to guide VR programming and assess cognitive processes during learning is an emerging field. A future direction would be to take corresponding eye-tracking and EEG data and incorporate these streams of neurocognitive data as additional levels of the ANN in order to develop a more robust model of student visual science literacy, which could then be embedded within the backend processing of an immersive educative resource for a personalized learning experience. Studies using recent brain imaging techniques have shown that VR encounters activate the same brain areas as those activated in the corresponding real situation [35]. Although fMRI has been considered the gold standard for dynamic brain imaging, the huge machinery, disturbing noise, and electromagnetic interference with other instrumentation, together with the horizontal and unnatural position of the participant during scan acquisition, are the most limiting factors in using the technology with VR paradigms. However, fNIRS non-invasive headband sensors can be worn with a virtual reality head-mounted display, making fNIRS an ideal measurement to supply an immersive educative experience with real-time neurocognitive data for the personalization of individual student learning. This will, however, require a new form of professional development for educators and curriculum designers: while the immersive educative experiences will ultimately be coded by engineers, many resources are being created to build a global immersive educative network to be shared with educators around the globe.
Educators will now need to understand the data recorded by neurocognitive devices, or at least understand enough to provide interventions and support. Specifically, although the developed ANN can indicate a relative visual science literacy level, it is currently up to the educator to determine what the student should do to target those latent traits. If this ANN is used within an immersive learning environment, the data will guide the technology to the next steps that target the necessary skills and will automatically guide the student there, but it is then the educator's job to know the available immersive educative resources and how each could help the individual student succeed. This shifts the educator into more of a mentor role than a content resource. This study found that individual visual characteristics were the biggest determiner of mental rotation success, which indicates that as visuals become more intricate, novice learners need more support in mentally visualizing the molecules. Since explicit instruction in diagrammatic reasoning has been shown to improve students' representational competence [4], by being able to explain and identify the interpretable aspects of a visual in a memorable way, an individual faced with that representation or similar ones would be able to transfer that previous knowledge to the new visual information. Therefore, visual science literacy could be supported in the science classroom with regular, explicit instruction in interpreting visuals within the embedded science content, such as the Identify and Interpret (I2) strategy developed by the non-profit Biological Sciences Curriculum Study (BSCS), which emphasizes a three-step approach to analyzing visual information: (1) identify discrete elements of the visual, (2) interpret the meaning of each individual element, and (3) caption what the full visual means by connecting those individual interpretations [36].

Conclusions
Through combining traditional educational research with cognitive science, this study confirms research by Lamb, Annetta, Vallett, & Sadler [20] on computational modeling with neurocognitive data and by Mnguni et al. [12] and Stieff [7] on the assessment of science literacy. This research adds to the literature by using a tool of artificial intelligence, in the form of an ANN, to predict behavior based on psychophysiological measurements. Until now, there has been limited research on the quantitative analysis of visual science literacy. This work addresses fundamental questions regarding how psychophysiological measurement can help to describe and explain behavioral outcomes (i.e., reading and writing of science texts) and the basic processes that underlie unique individual human abilities, and how it can be used in the future to develop digital immersive educative tools that support the processes required for scientific literacy. The primary purpose of this study was to develop an ANN as a computational model for visual science literacy, modeling the complex dynamic systems associated with this construct so that pedagogy can be developed to support its development. The computational cognitive model in the form of an ANN assists in obtaining information related to the science-based curriculum and offers additional data related to student learning. The model shows good data fit and approximates human learning in the completion of scientific visual literacy tasks, providing a way to connect biological, physiological, cognitive, and behavioral data. Analysis of the ANN weightings suggests a hierarchical relationship among the cognitive functions underlying this form of literacy, and future cognitive attribute models can incorporate these relationships in order to test this view of the cognitive data channel.
By improving the population's average visual science literacy, this research supports the proposition that individuals would be able to more accurately interpret scientific visual texts, which would be beneficial both personally and to society as a whole.

A.1 Developing the Artificial Neural Network in RapidMiner
RapidMiner Studio 9.8 (www.rapidminer.com), formerly YALE (Yet Another Learning Environment), is a platform for machine learning, data mining, text mining, predictive analytics, and business analytics. Used in science, education, training, rapid prototyping, application development, and industrial applications, RapidMiner has an AGPL open-source license and is a data analytics tool used in real projects. RapidMiner offers a graphical user interface (GUI) for the creation of an analytical workflow (reading data from the source, transformations, applying algorithms). All GUI modifications are stored in an XML (eXtensible Markup Language) file, which RapidMiner reads to run the analyses. RapidMiner provides data transformation tools for data processing, such as conversion operators (i.e., Numerical to Binomial, Numerical to Polynomial, and Nominal to Numerical), data modeling tools such as classification and prediction, and visualization of results. RapidMiner also connects to a wide range of data sources, such as Oracle, Microsoft SQL Server, MySQL, Excel, and Access, as well as many other data formats, enabling deployment of models at scale. Correlation-based filtering was used to remove features that had a very low correlation (r < .04) with the predicted affect and behavior constructs from the initial training set. This method involved calculating Pearson's correlation coefficient between the frequency data of each of the optodes and the target 'task success' output. Features with correlation coefficients below this threshold were then removed from the overall feature set of 25 variables. A total of six features were removed (Optode 10, Time, Average of all optode frequencies, Standard Deviation of all optode frequencies, ID, and Trial Order), leaving 18 features that were used in the development of the seven prediction models and 1 target feature (score).
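The correlation-based filtering described above was performed inside RapidMiner, but the underlying idea can be sketched in a few lines of plain Python. The feature names, sample values, and the toy dataset below are hypothetical; only the thresholding logic (drop features whose |r| with the target falls below the cutoff) reflects the method described in the text.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def correlation_filter(features, target, threshold=0.04):
    """Keep only features whose |r| with the target meets the threshold."""
    return {name: values for name, values in features.items()
            if abs(pearson_r(values, target)) >= threshold}

# Hypothetical optode frequency data against a 0/1 task-success target:
# 'optode_7' is uncorrelated with the target and is dropped.
features = {
    "optode_1": [0.1, 0.2, 0.7, 0.9],
    "optode_7": [1.0, 2.0, 2.0, 1.0],
}
target = [0, 0, 1, 1]
kept = correlation_filter(features, target)   # only 'optode_1' survives
```

In the study this same test, applied to 25 variables, removed six features and left 18 predictors plus the target.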
The feature selection for each detector was then performed using forward selection within the RapidMiner platform, where each feature is evaluated individually and the performance of a model is tested on examples containing progressively more attributes. Forward selection invokes its inner operators, in this case cross-validation. The network was initially trained on neurocognitive data sets to evaluate each factor's cognitive process. Subsequent runs of the model centered on using novel data sets to provide an ANN skill test for completing tasks. There was no overlap of tasks, owing to the essentially unidimensional nature of the factors, but individual attributes did overlap across tasks, and that overlap helps to explain the non-linear outcomes associated with learning. The model completed tasks correctly a significant portion of the time, thus validating the model and yielding a predictive model of subject learning. System outcomes can be controlled by increasing and decreasing attributes as a function of interventions, with probabilities of task performance as the outcome; by manipulating the distribution of cognitive attributes, one can experiment with the role each attribute plays in visual science literacy. The models were validated using k-fold participant-level batch cross-validation (k=10). In k-fold cross-validation, the data are randomly divided into k subsets, in this case 10 subsets of approximately equal size. Each time the model is run, one of the k subsets is used as the test set and the other k-1 subsets together form the training set. The error estimate is then averaged over all trials to obtain the total effectiveness of the model. Every data point is in a validation set exactly once and in a training set k-1 times, which reduces both bias and variance.
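The k-fold scheme described above can be illustrated with a minimal fold-construction sketch. This is not the RapidMiner operator itself, just the partitioning logic it implements; the index set and seed are arbitrary.

```python
import random

def k_fold_splits(indices, k=10, seed=0):
    """Randomly partition indices into k roughly equal folds and yield
    (train, test) index lists so each fold serves as the test set once."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Example: 20 participants split into 10 folds; every participant appears
# in exactly one test fold and in the training set for the other 9 runs.
splits = list(k_fold_splits(range(20), k=10))
```

Averaging a model's error over the 10 (train, test) pairs gives the cross-validated estimate used in the study.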
The accuracy of this output was evaluated by repeating the selection process on each fold of the participant-level cross-validation to determine how well models generated using this selection method perform on new and unseen test data. The final models were obtained by applying the feature selection to the entire dataset. The feature (called an attribute in the RapidMiner software) that gives the best performance is retained, and the process is then repeated using two attributes: the best from the first run paired with each of the remaining 17 attributes. The best pair of attributes is then retained, and the process is repeated with three attributes using each of the remaining 16. This continues until the stopping conditions are met, all attributes are weighted based on their relative predictive value, and the model does not improve further with the addition of another feature. Using cross-validation as the forward selection inner operator ensures that the performance is an estimate of what the performance would be on unseen data, increasing confidence that the cognitive and behavior predictors will be more accurate for new students (testing data).
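The greedy grow-by-one loop described above can be sketched as follows. The scoring function here is a toy stand-in for the cross-validated performance RapidMiner would compute; the feature names and the `useful` weights are invented for illustration.

```python
def forward_selection(features, evaluate, min_gain=1e-6):
    """Greedy forward selection: starting from the empty set, repeatedly add
    the single feature that most improves the score returned by `evaluate`
    (e.g. cross-validated accuracy); stop when no addition improves it."""
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining:
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, best_f = max(scored)
        if score <= best_score + min_gain:
            break                      # stopping condition: no improvement
        selected.append(best_f)
        remaining.remove(best_f)
        best_score = score
    return selected, best_score

# Toy scorer: pretend only 'optode_3' and 'task' carry signal, with a small
# penalty per feature so useless additions hurt the score.
useful = {"optode_3": 0.3, "task": 0.2}
score = lambda subset: sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)
chosen, final = forward_selection(["optode_1", "optode_3", "task"], score)
# chosen == ["optode_3", "task"]; 'optode_1' never improves the model
```

In the study, `evaluate` was the cross-validation inner operator, so the greedy choice at each step was based on estimated performance on unseen data.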

A.2 Missing Data
In order to address missing data, several data imputation methods were evaluated with each machine learning algorithm to optimize model performance. This step was taken for all seven algorithms (Naïve Bayes, Generalized Linear Model, Logistic Regression, Deep Learning, Decision Tree, Random Forest, and Gradient Boosted Trees), particularly since the step regression algorithm could not be used in the RapidMiner Studio 9.8 platform (www.rapidminer.com) with missing data. Each algorithm was tested with data imputed using zero, the average value, or no imputation at all. With average imputation, missing values within the dataset are replaced with the average of all values within the dataset, whereas with zero imputation missing values are replaced with '0'. Average imputation was chosen for all algorithms since it is the most commonly used in machine learning models and is effective with all seven algorithms. The data were then separated into four slices using stratified sampling into two training and two testing sets (N = 1,533,642), i.e., a k-fold holdback approach [37], saved as CSV files, and then imported into RapidMiner's repository for the development of the artificial neural network.
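The two imputation strategies compared above differ only in the fill value. A minimal sketch, with an invented five-reading column where `None` marks a missing value:

```python
def impute(column, strategy):
    """Fill missing entries (None) with 0, the column mean, or leave as-is."""
    observed = [v for v in column if v is not None]
    if strategy == "zero":
        fill = 0.0
    elif strategy == "mean":
        fill = sum(observed) / len(observed)
    elif strategy == "none":
        return list(column)
    else:
        raise ValueError(strategy)
    return [fill if v is None else v for v in column]

readings = [1.0, None, 2.0, 3.0, None]
mean_filled = impute(readings, "mean")   # -> [1.0, 2.0, 2.0, 3.0, 2.0]
zero_filled = impute(readings, "zero")   # -> [1.0, 0.0, 2.0, 3.0, 0.0]
```

Mean imputation preserves the column's average, which is one reason it is the common default; zero imputation shifts the distribution whenever the feature's true mean is far from zero.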

A.3 Cleaning the Data
Hemodynamic data from the fNIRS sensor (optode) were exported as an Excel file (N = 131,390) and separated by participant and by task. Problem complexity data, also organized by participant and task, were kept in a separate Excel workbook. Mental rotation tasks were coded as 0 for the 2D Wedge and Dash models and 1 for the 3D Stick and Ball models. Strings (i.e., words) from both spreadsheets were changed into number codes (i.e., low complexity = 1, medium complexity = 2, advanced complexity = 3) for analysis before merging, or combining, the spreadsheets using the ID number and problem number as join keys (outer join).
To mitigate the effects of missing data, the imputation methods described in Appendix A.2 (zero, the average value, or no imputation) were evaluated to optimize model performance. The data from one participant were removed entirely, since only data from the 3D Stick and Ball models had been collected for that participant. The data were then exported to a CSV file (N = 131,390) and separated by participant and task to preserve visibility of the fNIRS response at the individual level. Problem complexity data were also organized by participant and task. Machine learning algorithms use numerical inputs rather than strings (words), so categorical data from both spreadsheets were changed into number codes: mental rotation tasks were coded as 0 for the 2D Wedge and Dash models and 1 for the 3D Stick and Ball models, and task complexity was coded on a scale from 1 (low complexity) to 3 (advanced complexity) before merging the spreadsheets using the ID number and problem number as outer-join keys.
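The outer join on ID and problem number can be sketched with row dictionaries. The column names and values below are hypothetical; the point is the join behavior, where rows from either spreadsheet survive even without a match on the other side (the study's merge was performed in spreadsheet/RapidMiner tooling, not this code).

```python
def outer_join(left, right, keys):
    """Outer-join two lists of row dicts on the given key fields, keeping
    rows from either side even when no match exists on the other.
    Assumes the join keys are unique within `right` (enough for a sketch)."""
    def key_of(row):
        return tuple(row[k] for k in keys)
    right_by_key = {key_of(r): r for r in right}
    merged, matched = [], set()
    for row in left:
        k = key_of(row)
        matched.add(k)
        merged.append({**row, **right_by_key.get(k, {})})
    for r in right:
        if key_of(r) not in matched:
            merged.append(dict(r))     # right-only row, no hemodynamic data
    return merged

# Hypothetical rows keyed on participant ID and problem number.
hemodynamic = [{"ID": 1, "problem": 1, "optode_1": 0.42}]
complexity  = [{"ID": 1, "problem": 1, "complexity": 2},
               {"ID": 1, "problem": 2, "complexity": 3}]
rows = outer_join(hemodynamic, complexity, keys=("ID", "problem"))
# -> one merged row plus one complexity-only row
```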

A.4 RapidMiner Model Comparisons
Once the mental rotation task data (i.e., Task, Complexity, Score) were outer merged with the hemodynamic response data from the fNIRS outputs, models for each construct were built in the RapidMiner 9.8 data mining software using common classification algorithms previously shown to be successful in building cognitive models: Naïve Bayes, Generalized Linear Model, Logistic Regression, Deep Learning, Decision Tree, Random Forest, and Gradient Boosted Trees. Using the first set of training data, each algorithm was systematically compared to identify the most accurate predictive model with the lowest classification error for the data (Table 3). Evaluation of the Area Under the Curve (AUC) compares the specificity of the model to its sensitivity by comparing true positives to false positives. With a range of 0 to 1, and a greater value indicating a better-performing model, the GBT's AUC of 0.989 is further evidence that it is superior to Naïve Bayes, GLM, Logistic Regression, Deep Learning, Decision Tree, and Random Forest, whose AUCs range from 0.751 for Naïve Bayes up to 0.888 for Deep Learning. The AUC performance metric was computed on the original, non-replicated datasets. In our model performance measurements, the AUC was used as the primary measure of model goodness, as this metric is recommended as particularly suitable for skewed data [38]. A model with an AUC of 0.5 performs at random, and a model with an AUC of 1.0 performs perfectly. It is worth noting that the AUC takes model confidence into account. A combination of features was also selected through the forward selection process in each of the affect and behavioral models, providing some insight into the types of student interactions that predict a particular affective state.
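What the AUC measures can be made concrete with a short sketch: it equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case (the Mann-Whitney formulation), which is why 0.5 corresponds to chance and 1.0 to perfect ranking. The labels and scores below are invented, not the study's data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the fraction of positive/negative pairs where the positive example is
    scored higher, with ties counting half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
perfect = roc_auc(labels, [0.1, 0.2, 0.8, 0.9])   # 1.0: positives always rank higher
random_ = roc_auc(labels, [0.5, 0.5, 0.5, 0.5])   # 0.5: all ties, chance level
```

Because the metric depends only on the ranking of scores, not on a fixed threshold, it is insensitive to class imbalance in a way raw accuracy is not, which matches its use here on skewed data.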

A.5 Developing a Confirmatory Model in SPSS
The Multilayer Perceptron (MLP) was chosen as a confirmatory model since it is one of the two algorithms available in SPSS for classification with a neural network, SPSS being more commonly used for data analysis in educational research than RapidMiner, and since the MLP is often used in cognitive science and neuroscience for the modeling of cognitive functions [39]. An MLP network is a function of predictors (also known as inputs or independent variables) that minimizes the prediction error of the target variables (also known as outputs). It generates a predictive model for one or more target (dependent) variables based on the values of the available predictor variables, organized in a layer-wise structure where each successive layer stores more abstract representations of the data than the previous layer [40].
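The layer-wise structure described above can be sketched as a forward pass: each layer applies a weighted sum plus bias followed by a nonlinearity, so each layer's output is a transformed, more abstract representation of the previous one. The weights below are arbitrary, the network is far smaller than the one fitted in SPSS, and a logistic activation is assumed throughout purely for illustration.

```python
import math

def mlp_forward(x, layers):
    """Forward pass of a multilayer perceptron: each layer is a pair
    (weights, biases); every unit computes a weighted sum of the previous
    layer's outputs plus a bias, passed through a logistic activation."""
    h = x
    for W, b in layers:
        z = [sum(wi * hi for wi, hi in zip(row, h)) + bi
             for row, bi in zip(W, b)]
        h = [1.0 / (1.0 + math.exp(-v)) for v in z]   # logistic activation
    return h

# Tiny illustrative network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),   # hidden layer
    ([[2.0, 2.0]], [-2.0]),                     # output layer
]
prob = mlp_forward([0.5, 0.3], layers)[0]       # a value in (0, 1)
```

Training would adjust the weights and biases to minimize prediction error on the targets (e.g. via backpropagation); only the prediction step is shown here.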