4. Analysis of the Participants’ Responses
In this section, all experiments and their results are analyzed and discussed in detail. The objective is to understand how each usability factor affects the overall acceptance of the system. In this analysis, the usability factors are the usability dimensions, the System Usability Scale (SUS), and the NASA Task Load Index (TLX).
The first experiment investigated the usability of the VR system in a cross-cultural context, involving junior engineers from two countries. The objective of this evaluation was to examine how cultural differences between teams influence the usability of the VR system for both inspectors and actively involved users. A total of 23 engineers participated in the usability test, conducted in collaboration with Ostfalia University of Applied Sciences (Germany) and Tshwane University of Technology (South Africa).
The overall usability average across both groups was 51.7%, with a notably low score in the learnability dimension, indicating challenges in system understanding. The SUS scores were comparable between universities, averaging 3.17 and 3.05, suggesting a consistent perception of usability across cultural groups. These findings indicate that, although the system was generally usable, users experienced difficulties in quickly understanding and learning how to interact with the software effectively.
During the inspection, both participant groups provided largely consistent responses, indicating a high degree of objectivity in the evaluation process. The analysis confirmed that the VR system provided the core functionalities required to perform the assigned design review task, including object grouping, model scaling for inspection, and the selection of individual components within the virtual environment.
Despite this, the inspection phase also revealed several limitations. Specifically, supporting features such as error feedback and instructions for fixing these errors were lacking. These missing elements were recognized as potential areas for improvement.
In contrast to the inspection results, the empirical survey revealed noticeable differences between the two participant groups. User responses showed variations in user interactions, particularly regarding task completeness. While participants from one group generally considered the available functions sufficient for the assigned tasks, participants from the other group indicated the need for additional features. These differences may be partially attributed to variations in prior experience with VR technologies, which in turn influences user expectations and evaluation criteria.
The TLX results further highlighted differences in user opinions, particularly with regard to general satisfaction. While some participants reported satisfaction with their performance, others expressed lower levels of satisfaction due to challenges in system interaction.
In summary, the results of experiment I indicate that the VR system used provides the essential functionality required for design review tasks and is generally perceived as usable across different cultural contexts. However, limitations related to system learnability, task appropriateness, and self-descriptiveness of the software were identified. These findings highlight the need for improvements in interface design and user support mechanisms to enhance usability and ensure a more consistent user experience across various user groups.
The second usability experiment focused on how users interacted with the VR system during their first encounter with a set of predefined, product-related tasks. The main aim was to examine the initial user experience, with particular emphasis on usability, interaction behavior, and perceived workload, rather than on task efficiency or complete functional performance. The study involved students enrolled in an ergonomics course, who completed structured tasks in a virtual train model. These tasks included object inspection, taking measurements, navigating the environment, and using basic interaction functions.
Overall, the usability evaluation indicated a higher level of usability than in the first experiment, with an average score of 62.3%. The SUS resulted in a mean value of 3.18 on a five-point scale, suggesting a generally acceptable usability perception among participants. While users were able to complete the assigned tasks successfully, several usability aspects revealed opportunities for improvement, particularly regarding system learnability and the clarity of certain interaction mechanisms.
Analysis of the responses showed that several core interaction functions were clearly recognized by the participants. Object selection and manipulation were identified as available and simple, suggesting that the software supports the required interaction tasks. In addition, the system response was perceived positively, as users experienced immediate feedback following their actions. However, other functional aspects revealed noticeable uncertainty among users. Features such as error handling or object grouping were not clearly recognized by many participants, and a number of users reported difficulties in evaluating these features. This suggests that such features were either not sufficiently visible within the interface or were not required during the assigned tasks. Similarly, system status information, such as battery status, was not noticed, indicating limitations in interface transparency.
The evaluation of usability dimensions based on established principles revealed mixed results across all categories. Task suitability was generally perceived positively, particularly with regard to the availability of relevant functions for completing the assigned tasks. However, supporting elements such as error messages or contextual help were considered less effective, indicating a need for improved user guidance during task execution. In terms of expectation conformity, the interface design was largely perceived as unintuitive. Menu structures and visual elements did not align with user expectations, and several graphical representations were difficult to understand.
Learnability emerged as one of the weaker aspects of the system. Many participants experienced difficulties identifying features related to system guidance or preview functions. Visual orientation within the interface was not consistently clear, indicating that first-time users may require additional instructional support or guided interaction mechanisms. Similarly, error tolerance and system controllability were not clearly perceived by users. Functions such as ‘undo’ or alternative input methods were not widely recognized, suggesting that these features were either insufficiently communicated or not encountered during the experimental tasks.
Despite these limitations, several usability aspects received positive feedback. The system was generally perceived as self-descriptive, with users reporting a clear sense of control during interaction and an adequate understanding of icons. Furthermore, user engagement was notably high, as participants expressed a positive initial impression of the software.
The results of the NASA TLX indicated a manageable level of workload during task execution. Participants described the tasks as moderately demanding, primarily due to the novelty of the VR environment and unfamiliar interaction techniques. Physical workload was perceived as low, and time pressure was considered appropriate for the experimental setup. Emotional responses varied, with some users reporting satisfaction and a sense of accomplishment, while others experienced temporary uncertainty, particularly when interacting with unfamiliar system features.
In summary, the results of experiment II indicate that the VR system provides a generally positive first user experience with moderate usability and manageable workload. Core interaction functions performed effectively and supported task completion. However, several usability aspects, including learnability, expectation conformity, and error tolerance, require further optimization to improve overall usability and reduce uncertainty for users with limited practical experience.
The third experiment investigated the usability of the VR system within a real industrial context, focusing on a multi-user design review of a cutting machine. The evaluation was conducted with experienced engineers and emphasized collaborative interaction, technical inspection, and ergonomic assessment, such as reachability, within a virtual environment. The targeted outcomes of this experiment focus on evaluating the usability of the VR system from the perspective of active users only.
Overall, the findings suggest that the VR system is generally usable, with a usability score of 50.3%, and accepted in a design review, although several usability limitations remain. The SUS results indicate a moderate usability score of 2.77. The system was perceived as relatively simple to operate, with users indicating that most functions could be learned quickly. At the same time, the willingness to use the system regularly was rated higher, suggesting that further improvements are required to achieve long-term adoption.
The empirical evaluation identified several missing or insufficiently implemented features, particularly in the areas of learnability and error tolerance. Key shortcomings included the absence of flexible error message handling, lack of visible system status indicators (e.g., controller status), missing diagnostic tools, and the absence of ‘undo’ functionality. In addition, the system did not provide clear previews of actions or sufficient visual cues to indicate menu hierarchy levels. These limitations negatively affect the transparency of the system and increase the cognitive effort required for task execution.
The empirical questionnaire results were limited due to the small number of valid responses. As a result, no comprehensive quantitative conclusions could be drawn for most usability dimensions. However, a positive tendency in self-descriptiveness and user engagement was observed, indicating that participants recognized the potential value of the VR system for collaborative engineering tasks. In addition, the CEO of the company stated clearly that they are planning to implement VR in their design-review process with clients, because it provides more clarity and allows clients to familiarize themselves with the machine, especially those who may not be able to understand CAD designs.
The NASA TLX results indicate that the perceived workload during task execution was generally low to moderate. Task complexity was rated between simple and moderately complex, primarily due to limited prior experience with VR systems and insufficient preparation. Physical workload was perceived as low, confirming that interaction with the VR system did not impose significant physical strain. Time pressure was not considered an issue by any participant, with the task pace described as appropriate or even slow.
User satisfaction and perceived performance varied among participants. While some users reported that tasks were easy and understandable, one participant found them more challenging and indicated that their performance could be improved with additional training. Overall effort was rated as low, although one participant reported difficulty in reaching their desired performance level. Emotional responses were mostly positive, with participants generally feeling relaxed; however, minor stress and frustration were reported in relation to technical issues such as audio communication problems and occasional system instability during the multi-user session.
Despite these limitations, participants demonstrated active engagement with the VR system and were able to complete the assigned collaborative tasks. The multi-user functionality enabled effective communication and joint inspection of the virtual model, highlighting the potential of VR for distributed design reviews. At the same time, the identified usability issues, such as system feedback, learnability, and technical reliability, indicate that further refinement is necessary to ensure consistent performance in professional environments.
In summary, experiment III demonstrates that the VR system is functionally applicable and positively perceived in an industrial multi-user design review scenario. However, improvements in system robustness, user guidance, and feature transparency are required to enhance usability. These aspects will be taken into account in the next version of the software.
The fourth experiment evaluated the application of the VR system within another real industrial design review scenario, focusing on the analysis of cable routing on the roof of a regional train. The objective of this evaluation was to assess system usability in a professional engineering context, with special focus on interaction quality, task support, and user acceptance in comparison to traditional CAD software. The targeted outcomes of this evaluation involve a detailed analysis of the seven usability dimensions, as the experiment was conducted with an experienced industrial team.
Overall, the results indicate a moderate level of usability, with an assessed usability score of 54.5%. The SUS yielded a value of 2.9 on a five-point scale, reflecting a rather critical perception of the system among professional users. Although participants were able to complete the assigned tasks, the results highlight several usability limitations that negatively affected efficiency, intuitiveness, and overall acceptance.
From a functional perspective, the system demonstrated strong capabilities in visualization and object interaction. However, ideal precision was not achievable with the available headset at the time. The participants were development engineers who conducted a design review in VR, following their usual review practices. Their feedback was strongly influenced by comparisons between VR and the CAD software they typically used. Many participants initially resisted the VR technology, citing the need for additional training before adoption. One notable comment from the team leader was:
“If I have to invest more money and time to prepare the workforce and adapt the process to implement a new software that only supports one phase of the process, while I can already perform all tasks with the current software, then I do not need it.”
Participants evaluated system responsiveness and the ability to manipulate and inspect complex geometries within the virtual environment negatively. Users reported insufficient flexibility in object selection and difficulties related to controller input, which reduced interaction efficiency.
The analysis of responses revealed deficiencies in system self-descriptiveness. While basic feedback mechanisms were present, important system states, such as controller status or active interaction modes, were not continuously visible. This lack of transparency led to uncertainty during task execution, particularly when switching between different tools or interaction modes. In addition, inconsistencies in interaction logic were identified, as users were required to manually deactivate functions before activating new ones, which does not align with typical user expectations.
Evaluation of usability dimensions showed a mixed performance across categories. Task suitability was generally rated as adequate, as the system provided the core functions required for the design review tasks. However, the lack of supporting features, such as advanced measurement tools and precise representation of cable radii, limited the effectiveness of the system for detailed engineering analysis. This technical limitation had a direct negative impact on user trust and perceived reliability of the VR model.
Expectation conformity was only partially fulfilled. While some interface elements, such as menu structures and visual design, were considered understandable, the overall interaction concept was perceived as non-intuitive. Participants indicated that additional training would be required before the system could be effectively integrated into existing workflows.
The learnability of the system was identified as a critical weakness. Although some visual cues, such as color coding and icons, supported user orientation, these were not sufficiently clear or consistent. The absence of preview functions and limited guidance mechanisms made it difficult for users to anticipate the outcome of actions, increasing cognitive effort during task execution.
Similarly, error tolerance and controllability were limited. The system lacked essential features such as undo functionality and diagnostic feedback, restricting users’ ability to recover from mistakes. While most participants were eventually able to perform the required interactions, the process was often inefficient and required additional effort.
User satisfaction results reflect these usability challenges. While participants acknowledged the high potential of VR for immersive visualization and collaborative design reviews, they also emphasized that the system is currently less efficient than conventional CAD tools. Resistance to adoption was observed, particularly from a managerial perspective, where the additional effort required for training and process integration was perceived as a barrier.
The NASA TLX results indicate a moderate workload. Physical demand was generally low, confirming that VR interaction does not impose significant physical strain. However, cognitive load and frustration levels were elevated in some cases, mainly due to interaction difficulties and system limitations. Time pressure was not considered a significant issue.
In summary, the results of experiment IV demonstrate that the VR system offers strong advantages in terms of visualization and spatial understanding, particularly for large and complex models. However, limitations in usability, interaction design, and technical accuracy significantly affect user efficiency and acceptance in a professional engineering context. To enable successful integration into industrial workflows, improvements are required in system intuitiveness, feature completeness, and reliability, as well as in reducing the gap between VR and established CAD-based processes.
The fifth experiment aimed to evaluate the VR system in a broader and more diverse user context, with a particular focus on identifying missing functionalities and collecting user-driven recommendations for improving the system. Due to the relatively large number of participants and their varied professional backgrounds across different engineering domains, this experiment emphasized qualitative insights into user needs alongside the assessment of usability across seven dimensions.
A total of 40 participants, junior engineers with practical experience in various industrial departments, took part in the evaluation. This heterogeneous background enabled a comprehensive assessment of the system from multiple professional perspectives, particularly regarding its applicability in real-world engineering tasks.
Overall, the usability evaluation yielded a score of 68.7%, representing the highest usability rating among all conducted experiments. The SUS resulted in a mean value of 3.0 on a five-point scale, indicating a generally positive perception of the system. Participants were able to complete the assigned tasks effectively, and the system demonstrated improved performance compared to earlier versions. Nevertheless, the primary outcome of this experiment lies in the identification of missing features and improvement potential.
A key result of this study is the identification of 18 missing functions required by users to effectively perform their tasks. These functions were derived from participants’ direct interaction with the system and reflect practical requirements from different engineering domains. In addition, participants proposed 23 recommendations aimed at improving system usability, functionality, and integration into existing workflows. The feedback was notably detailed and critical, reflecting the participants’ technical background and professional experience.
Analysis of the usability dimensions revealed a generally positive performance across most categories. In terms of task suitability, participants confirmed that the system provides the core functionalities required for design evaluation and spatial analysis. However, the absence of several advanced features limited the completeness and efficiency of task execution.
Regarding self-descriptiveness, the system was perceived as understandable, with users generally able to interpret system behavior and interaction outcomes. Nevertheless, some participants indicated that additional guidance and clearer system feedback would further improve usability, particularly for more complex tasks.
The expectation conformity dimension was evaluated positively overall. Interface elements, such as menus and visual structures, were largely consistent with user expectations. However, certain interaction mechanisms still deviated from conventional engineering software workflows, requiring adaptation by the users.
In terms of learnability, the system showed noticeable improvement compared to earlier experiments. Participants were generally able to familiarize themselves with the system within a short period. However, given the complexity of some tasks, additional onboarding support and training features were still considered beneficial.
The evaluation of controllability indicated that users were able to interact with the system and perform the required operations successfully. Interaction with objects and navigation within the virtual environment were generally perceived as manageable, although some users reported minor inefficiencies in control precision.
The error tolerance dimension remained an area with improvement potential. Participants noted the absence of certain features, such as undo functions and error handling mechanisms, which are essential for efficient and confident task execution in professional environments.
Finally, user engagement was rated highly. Participants expressed strong interest in the VR system and recognized its potential for supporting engineering tasks, particularly in visualization and interdisciplinary collaboration. The immersive nature of the system contributed positively to user motivation and acceptance.
The TLX results further support these findings, indicating low perceived workload across cognitive, physical, and temporal dimensions. Participants reported high levels of satisfaction and relatively low effort during task execution, suggesting that the system provides a comfortable and efficient interaction environment despite existing limitations.
In summary, the results of experiment V demonstrate that the VR system achieves a high level of usability and user acceptance in a diverse engineering context. The large and varied participant group enabled the identification of a substantial number of missing functions and practical improvement recommendations, which are critical for further system development. While the system performs well across most usability dimensions, targeted enhancements, particularly in feature completeness and error tolerance, are necessary to fully support professional engineering workflows.
The sixth experiment investigated the usability of the VR system under conditions of increased task and time pressure. In this experiment, participants were required to complete predefined tasks within a limited time frame as part of a graded academic activity. The objective of this evaluation was to analyze how time pressure and performance requirements influence usability across the seven defined dimensions, as well as their impact on perceived workload, user satisfaction, and suggested improvements.
A total of ten participants, all bachelor’s students in mechanical engineering, took part in the experiment. Compared to previous experiments, the participants reported a higher level of experience with digital tools and VR systems. This provides a suitable basis to evaluate the system under more demanding conditions. Overall, the results indicate a moderate to good level of usability. Despite the imposed time constraints, participants were partially able to complete the assigned tasks, which shows that the system supports task execution even under pressure.
The analysis of the seven usability dimensions shows generally positive results, although some limitations become more visible under time pressure. Task appropriateness was generally not rated positively. Although participants confirmed that the system provides the necessary functions to complete the tasks, some users indicated that certain functions were missing, which affected the completeness of task execution. Expectation conformity is evaluated positively, since the interface structure, including menus and icons, was generally perceived as clear and understandable. Nevertheless, some participants reported that object manipulation was not fully intuitive, indicating differences between expected and actual interaction behavior.
Self-descriptiveness was predominantly evaluated negatively, as participants reported that they did not feel in control of the interaction and were unable to understand the system’s behavior. In addition, not all users were able to clearly identify the next steps during task execution, which indicates that system guidance is still limited in more complex situations.
Learnability represents one of the weaker dimensions. Participants reported difficulties in understanding system functions, especially in relation to error messages and predictable system responses. This issue becomes more critical under time pressure, where the lack of guidance increases uncertainty.
Controllability is generally sufficient, as users were able to select and manipulate objects within the virtual environment. However, some inconsistencies in interaction precision were observed.
Error tolerance is identified as a weak aspect, since participants reported issues such as missing recovery functions and limited ability to correct mistakes. These limitations negatively influence user confidence, especially in time-constrained scenarios.
User commitment is generally positive, as most participants reported a good first impression and did not perceive the system as overly demanding. However, the perceived efficiency varies, indicating that time pressure influences the interaction performance.
In addition to the usability evaluation, participants provided several suggestions for system improvement. Frequently mentioned aspects include the integration of alternative interaction methods such as hand tracking, as well as the implementation of a tutorial or guided onboarding. Furthermore, improvements in object interaction, such as snapping functions and more interactive elements, were suggested. Participants also highlighted the need for better system adaptability, for example through adjustable user height or automatic detection. Additional features such as object scaling and coloring were also identified as relevant improvements. These suggestions indicate the need for a more intuitive, flexible, and user-adapted system.
The NASA TLX results show that the overall workload is high under time pressure. Cognitive demand is perceived as high due to the need to understand the system during task execution. Physical demand is low, and participants reported no significant physical strain. Time pressure is perceived as manageable, and the effort required to complete the tasks remains relatively low. Some participants reported dissatisfaction with their performance, and some experienced frustration due to unclear interaction elements.
The results of the SUS indicate a generally moderate score of 2.9 on a five-point scale. Participants reported that the system is not particularly easy to use and that its functions are not fully integrated. The system was also considered moderately learnable within a reasonable amount of time. However, some users reported a certain level of complexity and occasional inconsistencies in system behavior. The need for technical support was not dominant, but it was still present in more complex interaction scenarios.
In summary, the results of experiment VI show that the VR system maintains a stable level of usability under time pressure. Since the participants were students enrolled in the course, and the experiment was part of their assessment, it appears that some of them attempted to attribute unfavorable course evaluation outcomes primarily to the software. This interpretation is supported by the discrepancy between the responses provided in the open-text fields and those recorded in the scoring fields.
In Table 3, a summary of all experiment setups and findings is presented.
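As a cross-check of the figures quoted in this section, the reported scores can be collected and compared programmatically. The following is a minimal sketch, not part of the study's tooling: the dictionary layout and the function names `rank_by_usability` and `sus_range` are hypothetical, the numbers are copied from the text above, and no usability percentage is entered for experiment VI because this section does not report one.

```python
# Scores as quoted in this section. None marks a value the text does not report.
# Experiment I lists two SUS means, one per university.
reported = {
    "I":   {"usability_pct": 51.7, "sus_means": [3.17, 3.05]},
    "II":  {"usability_pct": 62.3, "sus_means": [3.18]},
    "III": {"usability_pct": 50.3, "sus_means": [2.77]},
    "IV":  {"usability_pct": 54.5, "sus_means": [2.9]},
    "V":   {"usability_pct": 68.7, "sus_means": [3.0]},
    "VI":  {"usability_pct": None, "sus_means": [2.9]},
}


def rank_by_usability(scores):
    """Return experiment labels that have a usability score, highest first."""
    rated = {
        label: entry["usability_pct"]
        for label, entry in scores.items()
        if entry["usability_pct"] is not None
    }
    return sorted(rated, key=rated.get, reverse=True)


def sus_range(scores):
    """Return the lowest and highest SUS mean across all experiments."""
    all_means = [m for entry in scores.values() for m in entry["sus_means"]]
    return min(all_means), max(all_means)


ranking = rank_by_usability(reported)
low, high = sus_range(reported)
print("Usability ranking:", ranking)
print("SUS means range from", low, "to", high)
```

Under these assumptions, the ranking places experiment V first, which agrees with the statement that it received the highest usability rating, while the SUS means stay within a narrow band around the midpoint of the five-point scale.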