Submitted: 27 March 2025
Posted: 28 March 2025
Abstract
Keywords:
1. Introduction
- RQ1: How does the use of intelligent speech interfaces affect usability metrics (ease of control, comfort, accuracy of commands, satisfaction in response, finding controls, learning and adapting, recovery from mistakes, and naturalness) and cognitive load metrics (mental demand, physical demand, temporal demand, performance, effort, and frustration)?
- RQ2: What are the advantages, limitations, general opinions of, and expectations for speech interfaces in VR for medical purposes?
2. Background
3. Materials and Methods
3.1. System Architecture
3.2. NLP Development
3.3. VR Development
3.4. Participants
3.5. Experiment Process
3.6. Data Collection
4. Results
4.1. Quantitative Results
4.1.1. Usability Metrics
4.1.2. Cognitive Load Metrics
4.1.3. Significance Test
4.2. Qualitative Results
5. Discussion
5.1. Future Work
6. Conclusion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| AR | Augmented reality |
| BERT | Bidirectional Encoder Representations from Transformers |
| DNN | Deep neural networks |
| LLM | Large language model |
| LUIS | Language Understanding Intelligent Service (Microsoft Azure) |
| NLP | Natural language processing |
| NLU | Natural language understanding |
| QA | Quality assurance |
| STT | Speech-to-text |
| TLX | Task load index |
| TTS | Text-to-speech |
| VR | Virtual reality |
| VSP | Virtual standardized patient |
| XR | Extended reality |







| Mode | Usability Total | Usability Mean | Usability Std Dev | Cognitive Load Total | Cognitive Load Mean | Cognitive Load Std Dev |
|---|---|---|---|---|---|---|
| Button | 911 | 5.69 | 1.21 | 288 | 2.40 | 1.56 |
| Speech | 931 | 5.82 | 1.02 | 301 | 2.50 | 1.37 |

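A minimal sketch of how summary statistics like those in the table above (total, mean, sample standard deviation) are derived from per-participant Likert ratings. The ratings below are hypothetical placeholders, not the study's raw data.

```python
# Aggregate hypothetical 7-point Likert ratings into total, mean, and
# sample standard deviation, mirroring the columns of the summary table.
from statistics import mean, stdev

ratings = [6, 5, 7, 5, 6, 6, 4, 7]  # hypothetical per-participant ratings

total = sum(ratings)
print(total, round(mean(ratings), 2), round(stdev(ratings), 2))  # → 46 5.75 1.04
```

Note that `statistics.stdev` computes the sample standard deviation (n − 1 denominator), the usual choice when the participants are treated as a sample of the user population.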

| Metric | Button (Mean ± SD) | Speech (Mean ± SD) |
|---|---|---|
| Ease of Control | 5.70 ± 1.08 | 5.80 ± 1.01 |
| Comfort | 5.75 ± 1.12 | 6.15 ± 0.93 |
| Accuracy of Commands | 5.50 ± 1.24 | 5.45 ± 0.83 |
| Satisfaction in Response | 5.75 ± 1.25 | 5.60 ± 1.05 |
| Finding Controls | 5.55 ± 1.43 | 6.05 ± 1.05 |
| Learn and Adapt | 5.95 ± 1.05 | 6.05 ± 0.89 |
| Recover from Mistakes | 5.90 ± 1.12 | 5.80 ± 1.06 |
| Natural and Intuitive | 5.45 ± 1.43 | 5.65 ± 1.31 |

| Metric | Button (Mean ± SD) | Speech (Mean ± SD) |
|---|---|---|
| Mental Demand | 2.95 ± 1.73 | 3.40 ± 1.70 |
| Physical Demand | 3.25 ± 2.27 | 2.45 ± 1.36 |
| Temporal Demand | 2.60 ± 1.64 | 2.35 ± 1.14 |
| Performance | 1.60 ± 0.50 | 2.20 ± 1.01 |
| Effort | 2.35 ± 1.09 | 2.60 ± 1.47 |
| Frustration | 1.65 ± 0.88 | 2.05 ± 1.19 |
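An overall raw NASA-TLX (RTLX) score is commonly computed as the unweighted mean of the six subscale ratings. As a worked example, the sketch below averages the button-mode means from the table above; treating column means this way is illustrative only, since RTLX is normally computed per participant first.

```python
# Raw NASA-TLX (RTLX): unweighted mean of the six subscale ratings.
# Values are the button-mode means from the cognitive load table.
subscales = {
    "Mental Demand": 2.95, "Physical Demand": 3.25, "Temporal Demand": 2.60,
    "Performance": 1.60, "Effort": 2.35, "Frustration": 1.65,
}

rtlx = sum(subscales.values()) / len(subscales)
print(round(rtlx, 2))  # → 2.4
```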

| Metric Category | Metric | W-statistic | p-value |
|---|---|---|---|
| Usability Metrics | Ease of Control | 63.0 | 0.7879 |
| | Comfort | 43.5 | 0.3368 |
| | Accuracy of Commands | 30.0 | 0.7850 |
| | Satisfaction in Response | 35.0 | 0.4389 |
| | Finding Controls | 20.0 | 0.0638 |
| | Learn and Adapt | 20.0 | 0.7551 |
| | Recover from Mistakes | 40.5 | 0.7121 |
| | Natural and Intuitive | 68.0 | 0.6798 |
| Cognitive Load Metrics | Mental Demand | 29.5 | 0.2545 |
| | Physical Demand | 11.5 | 0.0541 |
| | Temporal Demand | 5.0 | 0.2356 |
| | Performance | 9.0 | 0.0146 |
| | Effort | 27.5 | 0.6094 |
| | Frustration | 6.0 | 0.0845 |

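The W-statistics above come from paired Wilcoxon signed-rank tests (button vs. speech, per metric). In practice such tests are run with a statistics package; the pure-Python sketch below only shows how the W statistic itself is formed (rank the nonzero |differences| with mid-ranks for ties, then take the smaller signed rank sum). The sample ratings are hypothetical placeholders, not the study's data.

```python
# Minimal sketch of the Wilcoxon signed-rank W statistic for paired ratings.
def wilcoxon_w(x, y):
    """W = smaller of the positive/negative rank sums, zero differences
    discarded, mid-ranks assigned to ties in |difference|."""
    diffs = [b - a for a, b in zip(x, y) if b - a != 0]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied absolute differences
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mid = (i + j) / 2 + 1  # mid-rank of the tied run (ranks are 1-based)
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    w_pos = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_neg = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_pos, w_neg)

button = [5, 6, 5, 7, 6, 5, 6, 4]  # hypothetical per-participant ratings
speech = [6, 6, 7, 7, 5, 6, 7, 5]
print(wilcoxon_w(button, speech))  # → 3.0
```

A library implementation would additionally convert W to a p-value (exact for small samples, normal approximation otherwise), which is what the p-value column reports.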
| ID | Quotes | Codes | Themes |
|---|---|---|---|
| Qp1 | "It is very responsive and its ability to understand instructions in many ways makes it handy and accessible." | Responsive, Flexible Instructions | Ease of Use, Reduced Latency |
| Qp2 | "The speech interface simplified tasks significantly since I didn’t have to press buttons some of which were difficult to reach." | Avoids Button Pressing, Simplifies Tasks | Task Simplification, Less Physical |
| Qp3 | "Great, even though I have used VR before it was hard for me to use the panel but the speech felt more natural." | Natural Feeling, Panel Interaction Difficulties | Natural Interaction, Usability |
| Qp4 | "At first it was challenging, but after a few tasks, I got a good grasp and it felt like a compact, effective tool." | Initial Difficulty, Improved Over Time | Learning Curve, Adaptability |
| Qp5 | "Easy to learn and work. It can be really useful for professionals." | Easy to Learn, Professional Usefulness | Learning Curve, Adaptability |
| Qp6 | "Easier to learn and adapt to. Made the tasks easier. I didn’t have to press the buttons and some of the buttons are not easy to press." | Avoids Button Pressing, Panel Interaction Difficulties | Learning Curve, Adaptability |
| Qp7 | "The VR speech assistant was able to understand my questions at least 90% so that’s a plus." | Accurate Understanding of Questions | Speech Recognition Accuracy |
| Qp8 | "Overall good, may improve speech recognition. I was able to visualize but still struggled to find handles, there were a lot of buttons in the panel displayed." | Difficult to Find Buttons, Visualization Clarity | Finding Controls and Commands |
| Qp9 | "It’s very responsive and its ability to understand the instruction in many ways. It’s handy in many ways." | Responsive, Flexible Instructions | Reduced Latency, Ease of Use |
| Qp10 | "I could stay focused on the model without needing to stop and press buttons, which is crucial in a workflow setting." | Workflow Continuity | Workflow Efficiency |
| Qp11 | "I think it is easier to stick with the flow using speech while working. I felt effortless and it might be very interesting for the dentists to play around with efficiency. I feel this speech interface might be an artificial assistant." | Task Flow Continuity, Artificial Assistant Potential | Natural Interaction, Task Flow |

| ID | Quotes | Codes | Themes |
|---|---|---|---|
| Qn1 | "My accent was not understood clearly. Sometimes I had to speak slowly so that it understands entirely." | Struggles with Accent | Speech Recognition Accuracy |
| Qn2 | "It interpreted few words wrong maybe due to lower voice and accent." | Misinterpreted Commands | Speech Recognition Accuracy |
| Qn3 | "It was difficult for the system to take in long sentences." | Struggles with Long Commands | Speech Recognition Accuracy |
| Qn4 | "Some commands are not interpreted correctly, limitation in the amount of tasks. The system responded identically to ’Hide implants’ and ’Show implants,’ showing a need for command differentiation." | Incorrect Commands, Limited Tasks | Overall Accuracy, Task Limitation |
| Qn5 | "Automating the speech activation, similar to Siri or Alexa, could reduce the need for physical input, making the interface more hands-free." | Automation Suggestion, Reduced Physical Input | Automation, Hands-Free Design |
| Qn6 | "The speech was good in total, but in some scenarios, manual intervention was required. So for completing the task by using speech interface alone was not achieved." | Manual Intervention Needed | Task Limitation |
| Qn7 | "Including more command variations and adding common questions would make it feel more intuitive." | Suggests Expanded Commands, Common Questions | Task Limitation, Command Variation |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).