Results
Available open-source surgical virtual reality software systems are highly diverse in their simulation focus, technical stacks, licensing, and intended architectural roles. Across the audited set (Figure 2), we observed at least three distinct “center-of-gravity” archetypes (Table 1). Surgical Gym targets GPU-accelerated reinforcement learning and robotics-centric interaction loops; it is implemented on NVIDIA’s Omniverse Isaac Sim stack and explicitly built on top of Isaac Sim’s omni.isaac.core and omni.isaac.gym frameworks (Python-first orchestration with simulator-coupled extensions). In contrast, iMSTK positions itself as a C++ toolkit for interactive, multi-modal surgical simulation scenarios, emphasizing modular real-time simulation components and integration with haptics and rendering backends. SOFA occupies a broader multi-physics engine role, using an open-core, modular architecture and prioritizing real-time continuum mechanics and deformable simulation. These differences were also reflected in our hands-on audit workflow, where each system demanded a different execution substrate (simulator runtime versus native toolchain compilation) and a different notion of “success” (policy rollouts versus interactive scenes).
Licensing and distribution models further reinforced this heterogeneity. Surgical Gym is distributed as a research codebase layered over an external proprietary simulator runtime (Isaac Sim), whereas iMSTK is released under Apache 2.0 according to Kitware’s public descriptions, and the SOFA core is licensed under the LGPL with carve-outs for specific subdirectories. Visualization-oriented VR in the medical imaging ecosystem offers yet another pattern: SlicerVirtualReality is an extension to 3D Slicer with its own release and platform constraints, and community reports indicate Windows-only availability in at least some recent configurations, with breakage across Slicer major versions. This breadth supports the interpretation that open-source virtual surgery frameworks are not a single interchangeable category, but a spectrum spanning robotics training, physics engines, and clinician-facing visualization.
From a reproducibility perspective, the diversity is consequential because each archetype couples to a different dependency surface. Simulator-bound systems concentrate risk in vendor release cadence and extension availability, while C++ toolkits concentrate risk in compiler, dependency discovery, and transitive third-party source availability. In our audit, these differences translated into qualitatively different failure modes and remediation costs, which motivates reporting results per framework rather than aggregating into a single “ran or did not run” outcome.
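As an illustration of per-framework reporting, the following minimal Python sketch records where each reproduction attempt stopped instead of a single pass/fail flag. The class names, field names, and stage labels are hypothetical and introduced only for this example; the three entries paraphrase the outcomes reported in this section.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """Hypothetical coarse stages at which a reproduction attempt can stop."""
    DEPENDENCY_PROVISIONING = "dependency provisioning"
    BUILD = "build"
    RUNTIME_IMPORT = "runtime import"
    COMPLETED = "completed"


@dataclass(frozen=True)
class AuditOutcome:
    framework: str
    stopped_at: Stage        # last stage reached before the attempt ended
    proximate_cause: str     # short free-text description of the blocker

    @property
    def ran(self) -> bool:
        # An attempt counts as "ran" only if it reached the final stage.
        return self.stopped_at is Stage.COMPLETED


# Entries paraphrase the outcomes described in this section.
OUTCOMES = [
    AuditOutcome("Surgical Gym", Stage.RUNTIME_IMPORT,
                 "omni.isaac.gym absent from the installed Isaac Sim distribution"),
    AuditOutcome("iMSTK", Stage.BUILD,
                 "build failed at the VegaFEM dependency step; project archived"),
    AuditOutcome("SOFA", Stage.DEPENDENCY_PROVISIONING,
                 "distribution-specific system dependency stack"),
]


def summarize(outcomes):
    """Map each framework to (ran, stage label) so failure modes stay visible."""
    return {o.framework: (o.ran, o.stopped_at.value) for o in outcomes}
```

Keeping the stage and the proximate cause as separate fields preserves the distinction between failure modes that a single boolean would discard.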
This diversity is not only technical but also architectural (Table 2): the audited systems occupy distinct strata in a virtual-surgery software stack, from reinforcement-learning backends for robotic policy training (Surgical Gym) to mid-level C++ toolkits for assembling interactive procedure simulators (iMSTK) and general-purpose deformable physics engines (SOFA). The audited set also includes clinician-facing VR front ends centered on Unity-based interaction and medical data ingestion (IMHOTEP) and VR extensions that deliberately reuse existing clinical imaging infrastructure rather than re-implement it (SlicerVirtualReality within 3D Slicer). Collectively, these frameworks are largely complementary rather than interchangeable, so reproducibility must be interpreted relative to the intended architectural role of each system.
This taxonomy also motivated deliberate deprioritization of some candidates as primary physics-based surgical simulation baselines. IMHOTEP is presented primarily as a Unity-based VR framework for visualizing and interacting with multimodal patient data in support of planning and education, rather than as a validated tissue–tool interaction simulator with haptics-centric design goals. In addition, its public-facing descriptions emphasize surgeon workspaces and medical data ingestion, with no explicit positioning as a robotics integration framework (for example, ROS interfaces or closed-loop surgical robot control), so robot control integration is appropriately rated low unless substantial external integration is added. SlicerVirtualReality is similarly valuable, but its maintainers explicitly document that the extension currently functions on Windows, with Linux support described as experimental and macOS lacking backend support. In addition, users sometimes report that the extension is broken for current stable or preview versions of Slicer and that it is unavailable for installation on specific Slicer releases. Finally, users report rendering instability, including blank or black VR scenes, interaction problems, and lighting regressions across backends.[17]
Surgical Gym did not run because it relies on a deprecated Isaac Gym integration that is no longer included in currently maintained Isaac Sim distributions. Note that Isaac Sim and Isaac Gym occupy different positions in NVIDIA’s robotics ecosystem. Isaac Sim is an Omniverse-based robotics simulation reference application for developing and testing AI-driven robots in physically based virtual environments, and it is distributed as an Omniverse application with an extension system and a bundled Python runtime. Isaac Gym, by contrast, was NVIDIA’s physics simulation environment oriented toward reinforcement learning research workflows; it is now explicitly labeled “Now Deprecated” and described as legacy software that is no longer supported.[18] NVIDIA’s recommended successor pathway is Isaac Lab, an open-source robot learning framework built on the Isaac Sim platform, and the Isaac Lab documentation states that it replaces earlier Isaac Gym frameworks and OmniIsaacGymEnvs.
In our reproduction attempt, we first had to reconcile two incompatible execution contexts: (i) Isaac Sim’s bundled Python environment, which is necessary for importing Omniverse modules (for example omni.isaac), and (ii) a conventional virtual environment used to install missing Python packages required by Surgical Gym’s scripts. Practically, we created a venv using Isaac Sim’s launcher Python (via python.sh -m venv ...) and then sourced Isaac Sim’s environment setup script so that the venv’s Python interpreter could resolve Omniverse modules. This step was necessary because, without Isaac Sim’s environment variables, standard Python invocations could not import omni at all, whereas after sourcing the setup script, import omni.isaac succeeded. A second class of issues arose earlier in the stack, where the provided random_policy.py script failed on missing Hydra and OmegaConf dependencies; we attempted to address these by installing the missing packages directly into the venv and by rewriting the entry script to bypass Hydra-based configuration.
After resolving these surface issues, execution consistently failed at a deeper integration boundary: Surgical Gym’s vectorized RL environment depends on omni.isaac.gym.vec_env (for example VecEnvBase), and importing this module raised ModuleNotFoundError: No module named 'omni.isaac.gym' even when omni.isaac itself imported correctly. This indicates that the required Isaac Sim extension or package namespace was absent from the installed Isaac Sim distribution, not merely misconfigured in Python path resolution. This observation is consistent with NVIDIA’s broader platform transition, since Isaac Lab is positioned as the unified robot learning framework for Isaac Sim, and Isaac Sim has also undergone extension namespace and API evolution (for example the ongoing renaming away from omni.isaac.* prefixes in newer releases). Taken together, our attempts suggest that Surgical Gym’s failure is not a conventional “missing pip dependency” problem, but a structural compatibility break caused by reliance on an RL interface layer that has been deprecated or removed from current Isaac Sim distributions. The net result is that Surgical Gym’s reproducibility is dominated by external platform compatibility rather than by conventional package installation or configuration issues.
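The distinction drawn above, between a parent namespace that resolves and a child package that is absent, can be checked with a short diagnostic. The sketch below is a minimal example using only the Python standard library; the helper name probe_modules is hypothetical, and the sketch assumes it is run with an interpreter whose environment already exposes the Omniverse modules (as after sourcing the setup script), since it only reports what that interpreter can locate.

```python
import importlib.util


def probe_modules(names):
    """Return {module_name: bool} indicating whether each module can be located.

    importlib.util.find_spec imports parent packages of a dotted name but
    does not import the target module itself.
    """
    found = {}
    for name in names:
        try:
            found[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A missing parent package raises ModuleNotFoundError, which is
            # a subclass of ImportError; treat it as "not found".
            found[name] = False
    return found


if __name__ == "__main__":
    # Probe the namespace level by level to see where resolution stops.
    targets = ["omni", "omni.isaac", "omni.isaac.gym"]
    for name, ok in probe_modules(targets).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

A result in which omni and omni.isaac are found but omni.isaac.gym is missing would match the failure pattern described above, pointing to an absent package rather than a path misconfiguration.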
iMSTK did not run because the build failed around the VegaFEM dependency and the project is archived, making upstream fixes unlikely. In the audit environment, iMSTK’s build process progressed through initial toolchain setup but failed at the dependency stage involving VegaFEM, where the build system attempted to resolve or fetch third-party components and encountered a non-recoverable error. This outcome is consistent with iMSTK’s design choice to automate dependency acquisition and compilation via CMake-driven external projects, which increases convenience when URLs and upstream repositories remain stable but creates a single point of failure when external artifacts move, become rate-limited, or vanish. The original version of VegaFEM has not been updated since 2018, and iMSTK’s installation depends on an unofficial fork maintained by another group,[19] which is an indicator of ecosystem fragmentation that can exacerbate dependency rot in scripted download steps. The second factor, project lifecycle status, materially reduces the likelihood of resolution through upstream maintenance. Kitware’s GitLab page states that support and development for iMSTK have been discontinued as of May 2, 2025. Additionally, the GitHub mirror indicates that the repository was archived on September 29, 2025 and is read-only.[10] Together, these status signals imply that even if the specific Vega-related failure were diagnosable, a durable fix would most likely require local patching or community forking rather than routine upstream issue resolution. Consequently, iMSTK’s non-execution in this audit is best interpreted as a reproducibility failure driven by a combination of (i) transitive dependency acquisition assumptions embedded in the build and (ii) an inactive maintenance state. This combination is particularly problematic for scientific reproducibility, because it undermines the expectation that a third party can rebuild the system at a later date using documented procedures and canonical sources.
SOFA Framework did not run because it relies on a large, distribution-specific system dependency stack and legacy setup assumptions, indicating limited portability. The SOFA reproduction attempt was dominated by dependency provisioning and build-environment alignment rather than by defects in SOFA’s source code. In the audit workflow, translating recommended installation steps to the target environment required substantial manual mapping between package managers and library variants, and the accumulation of prerequisites created repeated opportunities for version conflicts and missing packages. This profile reflects SOFA’s nature as a large C++ framework with optional modules and GUI-related dependencies (for example Boost and Qt), where build success depends on consistent discovery of a broad toolchain and library surface. The official documentation further supports the observation that SOFA’s build instructions are closely coupled to specific host distributions and contemporary toolchain baselines. The Linux build page states an explicit policy of supporting only the latest Ubuntu LTS and prescribes Ubuntu-centric package installation commands, alongside compiler and CMake requirements. Such guidance is reasonable for a project optimizing for a narrow CI target, but it reduces portability when the audit environment diverges from Ubuntu, because dependency names, versions, and default ABI choices differ across distributions. Notably, the SOFA community itself highlights the need for dependency isolation strategies, including using conda packages to avoid mixing system-wide libraries and to centralize dependency resolution. In the audit, the practical implication was that “run from source” reproduction required either (i) adopting the documented, distribution-specific assumptions (which were misaligned with the target environment) or (ii) introducing an additional environment manager layer (conda), which shifts complexity but does not eliminate it.
The observed non-execution therefore substantiates the conclusion that SOFA’s reproducibility in heterogeneous environments is constrained primarily by dependency-stack portability rather than by algorithmic correctness.
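The mismatch between a documented host distribution and the actual audit environment can be surfaced before a build is attempted. The following minimal Python sketch parses os-release style text and checks whether named tools are on the PATH. The function names, the default expected_id of "ubuntu", and any tool list passed in are assumptions made for illustration; they are not taken from SOFA’s documentation and do not represent its actual prerequisite list.

```python
import shutil


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        info[key] = value.strip().strip('"').strip("'")
    return info


def missing_tools(tools):
    """Return the subset of tool names that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def preflight(os_release_text, tools, expected_id="ubuntu"):
    """Summarize how far the host diverges from the documented baseline.

    expected_id is an illustrative default, not an official requirement.
    """
    info = parse_os_release(os_release_text)
    return {
        "distro_id": info.get("ID", "unknown"),
        "distro_version": info.get("VERSION_ID", "unknown"),
        "matches_documented_distro": info.get("ID") == expected_id,
        "missing_tools": missing_tools(tools),
    }
```

On a Linux host, the text would typically come from reading /etc/os-release, and the tool list might include entries such as cmake and a C++ compiler. A report with matches_documented_distro set to False signals in advance that the documented package commands will need manual translation.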