Memory-Efficient 3D LiDAR Graph SLAM for Ballast Water Tank Inspection Robots Using Robust Hierarchical Bundle Adjustment and a Kaczmarz Backend

Sanghyun Cha; Wonchul Yoo; Tae-Wan Kim

doi:10.20944/preprints202606.2001.v1

Submitted: 26 June 2026

Posted: 26 June 2026


Abstract

Autonomous inspection of ballast water tanks requires three-dimensional (3D) LiDAR-based simultaneous localization and mapping (SLAM) in Global Positioning System (GPS)-denied, geometrically repetitive interiors, where sensing, mapping, and control modules share a limited onboard memory budget. Graph SLAM backends that rely on sparse factorization can incur fill-in, increasing peak memory and limiting deployment on edge computers. The proposed architecture couples a robust hierarchical bundle adjustment frontend with a factorization-free Kaczmarz backend. The frontend combines residual-adaptive weighting, damped and bounded pose updates, soft fallback, local-map compression, and memory-aware keyframe control. The backend stores the whitened Jacobian in compressed sparse row (CSR) format and performs row-wise projections without explicitly forming the normal equations, a Cholesky factor, or a transpose cache. Evaluation used Norwegian University of Science and Technology (NTNU) Ballast Water Tank missions 1–3, containing 851, 1202, and 1084 LiDAR frames, respectively. Following robust local bundle adjustment and verified similarity alignment, translational root-mean-square errors were 0.080, 0.110, and 0.127 m, corresponding to 0.137%, 0.144%, and 0.122% of the reference path lengths; archived baseline ratios ranged from 0.281% to 0.372%. The results support a numerical architecture that combines frontend stabilization, row-wise optimization, and memory-aware policies for resource-constrained marine inspection robots.

Keywords: ballast water tank inspection; 3D LiDAR SLAM; hierarchical bundle adjustment; Kaczmarz method; memory-efficient SLAM

Subject: Engineering - Marine Engineering

1. Introduction

Ship ballast water tanks, double-bottom compartments, cofferdams, and similar enclosed structures are among the most demanding locations for structural inspection. They are typically dark, Global Positioning System (GPS)-denied, geometrically repetitive, and accessible only through narrow hatches or manholes. Corrosion products, deteriorated coatings, poor ventilation, limited communication, and slippery or contaminated surfaces further increase the risk and duration of manual inspection. Autonomous or remotely supervised robots can reduce human exposure while producing spatially registered point clouds, images, and condition-monitoring records for engineering assessment [1,2,3].

Reliable navigation in these environments cannot rely solely on nominal design information. Even when computer-aided design drawings and construction records are available, the as-built or in-service geometry may differ because of fabrication tolerances, welding distortion, thermal deformation, coating accumulation or loss, corrosion, repair, retrofitting, temporary equipment, and undocumented obstacles. The robot must therefore update its spatial representation from onboard observations rather than treat the design model as an exact navigation map. Three-dimensional (3D) LiDAR-based simultaneous localization and mapping (SLAM) is well suited to this task because it jointly estimates the sensor trajectory and the surrounding structural geometry [4,5].

Modern LiDAR SLAM systems commonly separate high-rate odometry or scan registration from lower-rate mapping and graph optimization. LiDAR Odometry and Mapping (LOAM) established an influential odometry-and-mapping decomposition [6], while LiDAR–inertial odometry via smoothing and mapping (LIO-SAM) incorporated inertial preintegration, keyframes, and loop-closure factors within a smoothing framework [7]. Hierarchical LiDAR bundle adjustment improves large-scale consistency by decomposing the mapping problem into local optimization windows and higher-level pose constraints [8]. Ballast water tanks, however, present failure modes that are less prominent in general environments: long planar surfaces weakly constrain motion in particular directions, repetitive stiffeners create ambiguous correspondences, narrow passages cause abrupt changes in visibility, and vibration or contact can introduce transient motion distortion. A single erroneous local registration can therefore produce a large pose update that contaminates later keyframes and global constraints.

Graph SLAM backends formulate poses and measurements as a nonlinear least-squares problem. Square Root SAM, iSAM, iSAM2, and g2o show that sparse factorization and incremental graphical inference can solve these problems efficiently [9,10,11,12]. Elimination-based solvers must nevertheless retain factorization structures. As graph connectivity and loop-closure density increase, elimination can introduce fill-in: entries that are zero in the original sparse Jacobian become nonzero in the resulting factors [13,14,15]. Reordering can reduce fill-in, but it does not eliminate the need to construct and store factorization-related data.

This memory growth is operationally significant on compact inspection robots. The NVIDIA Jetson Orin Nano Super Developer Kit, for example, provides 8 GB of low-power double data rate 5 (LPDDR5) memory and a nominal 7–25 W power range [16]. The same physical memory is shared by the operating system, robotics middleware, LiDAR packet buffers, sensor drivers, point-cloud filtering, scan registration, local-map storage, inspection perception, logging, and graphics processing unit (GPU) workloads. A backend that is practical on a workstation can therefore become the peak-memory bottleneck in a sealed, thermally constrained robotic platform.

Iterative sparse solvers provide alternatives to direct factorization. Conjugate-gradient and preconditioned conjugate-gradient (PCG) methods can be efficient when the linear system and preconditioner are favorable, whereas least-squares QR (LSQR) and least-squares minimal-residual (LSMR) methods operate directly on sparse least-squares systems [17,18,19,20]. Kaczmarz methods form a distinct row-action family in which each iteration projects the current estimate toward the hyperplane defined by a selected row [21]. Randomized row selection offers expected convergence guarantees for suitable consistent systems, and weighted or block variants extend the design space [22,23,24]. The present study does not assume that Kaczmarz is universally faster or more accurate than Cholesky, PCG, LSQR, or LSMR. Its relevance here is the row-wise access pattern: an update can be executed from a compressed sparse row (CSR) representation without explicitly forming the normal equations, a Cholesky factor, or a transpose cache.

Accordingly, this study develops a system-level SLAM architecture for resource-constrained marine inspection rather than a new mathematical variant of the Kaczmarz method. The architecture combines robust local hierarchical bundle adjustment (HBA), compressed local maps, row-wise global optimization, and memory-aware operating policies. An 8 GB-class edge computer is used as the deployment reference; however, the reported experiments establish architectural feasibility rather than a completed real-time Jetson implementation. The principal contributions are as follows:

A robust local HBA frontend is developed for repetitive and locally degenerate ship-interior geometry by combining residual-adaptive weighting, pose-step clamping, damping, and soft fallback.
A factorization-free global backend is formulated directly on the whitened sparse least-squares system and solved through CSR-based row projections without explicitly forming the approximate Hessian or a Cholesky factor.
Local-map compression, keyframe regulation, and optimization-range control are co-designed with the numerical backend to bound the problem presented to the solver.
The complete architecture is evaluated on NTNU Ballast Water Tank missions 1–3 using absolute and path-normalized trajectory errors.

2. Materials and Methods

2.1. System Architecture and Onboard Design Objective

The pipeline comprises six stages: (i) LiDAR acquisition and geometric feature extraction; (ii) initial LiDAR odometry; (iii) robust local HBA over overlapping windows; (iv) keyframe and local-map compression; (v) factor-graph assembly and CSR linearization; and (vi) row-wise Kaczmarz optimization under memory-aware control. The frontend produces an initial pose sequence and geometric correspondences. Consecutive scans are grouped into overlapping local windows, and their poses are refined jointly. The optimized local poses and compressed submaps then provide odometry, submap, and loop-closure constraints for the global graph. Before each global solve, the memory controller evaluates the keyframe and factor counts, the number of nonzero Jacobian entries, point-cloud storage, and current process memory. The backend subsequently executes a prescribed budget of Kaczmarz row projections. Figure 1 summarizes the complete architecture.

The design does not assume that the full physical memory of the edge computer is available to graph optimization. A safety margin must remain after accounting for the operating system, middleware, sensor and frontend buffers, inspection and perception modules, map storage, and temporary allocations. Memory efficiency is therefore treated as an end-to-end operating constraint rather than solely as a solver-level metric.

2.2. Robust Hierarchical Bundle Adjustment Frontend

2.2.1. Local Window Formulation

Let the ordered keyframe poses belong to the special Euclidean group $\mathrm{SE}(3)$:

$$T = \{T_{1}, T_{2}, \dots, T_{N}\}, \qquad T_{i} \in \mathrm{SE}(3). \tag{1}$$

The trajectory is divided into overlapping local windows

$$W_{q} = \{i_{q}, i_{q} + 1, \dots, i_{q} + n_{w} - 1\}, \tag{2}$$

where $n_{w}$ is the local window size. The overlap preserves geometric continuity between adjacent submaps and enables constraints to propagate through the hierarchy.
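The window construction in Eq. (2) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `local_windows`, the 0-based indexing, and the fixed `stride` between window starts are assumptions, since the text specifies only the window size $n_{w}$ and the existence of an overlap.

```python
def local_windows(n_poses, n_w, stride):
    """Return overlapping index windows of length n_w (hypothetical helper).

    stride is the assumed offset between consecutive window starts; it must
    be smaller than n_w so that adjacent windows share poses.
    """
    if not 0 < stride < n_w:
        raise ValueError("stride must satisfy 0 < stride < n_w so windows overlap")
    last_start = max(n_poses - n_w, 0)
    starts = list(range(0, last_start + 1, stride))
    if starts[-1] != last_start:        # make sure the trailing poses are covered
        starts.append(last_start)
    return [list(range(s, min(s + n_w, n_poses))) for s in starts]


windows = local_windows(n_poses=10, n_w=4, stride=2)
```

With these illustrative settings, consecutive windows share two poses, which is the overlap through which constraints propagate between adjacent submaps.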

For a geometric correspondence $k$ in window $W_{q}$, let $r_{k}$ denote a point-to-plane, point-to-line, or feature residual, and let $\Omega_{k}$ be its information matrix. The robust local objective is

$$\underset{\{T_{i}\}_{i \in W_{q}}}{\arg\min} \sum_{k \in C_{q}} w_{k}\, r_{k}^{\top} \Omega_{k} r_{k}, \tag{3}$$

where $C_{q}$ is the correspondence set and $w_{k} \in [w_{\min}, 1]$ is a dynamic reliability weight.

2.2.2. Residual-Dependent Weight Reduction

Ballast-tank scans can contain large residuals because of repeated stiffeners, partially observed surfaces, abrupt visibility changes, or motion distortion. Rejecting every large residual outright, however, can discard temporarily informative constraints. The frontend therefore reduces residual influence progressively. Let

$$s_{k} = \sqrt{r_{k}^{\top} \Omega_{k} r_{k}} \tag{4}$$

be the normalized residual magnitude. A robust threshold is obtained from the residual median and median absolute deviation (MAD):

$$\tau_{1} = \operatorname{median}(\{s_{k}\}) + \kappa\, \operatorname{MAD}(\{s_{k}\}), \qquad \tau_{2} = \gamma \tau_{1}, \quad \gamma > 1. \tag{5}$$

The residual weight is defined as

$$w_{k} = \begin{cases} 1, & s_{k} \leq \tau_{1}, \\ \dfrac{\tau_{1}}{s_{k}}, & \tau_{1} < s_{k} \leq \tau_{2}, \\ w_{\min}, & s_{k} > \tau_{2}. \end{cases} \tag{6}$$

Thus, nominal constraints retain full influence, intermediate residuals are smoothly down-weighted, and strong outliers retain only a small bounded influence. After linearization, the weighted local system becomes

$$\min_{\delta\xi} \left\| W^{1/2} (J\, \delta\xi + r) \right\|_{2}^{2}, \tag{7}$$

where $\delta\xi$ stacks the local pose increments.
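The thresholds of Eq. (5) and the piecewise weight of Eq. (6) can be sketched in a few lines of NumPy. The default values of `kappa`, `gamma`, and `w_min` below are placeholders chosen for illustration only; the text does not state the values used in the experiments.

```python
import numpy as np


def residual_weights(s, kappa=3.0, gamma=2.0, w_min=0.05):
    """Piecewise weights of Eq. (6) from normalized residual magnitudes s_k.

    kappa, gamma, and w_min are hypothetical defaults, not values from the paper.
    """
    s = np.asarray(s, dtype=float)
    med = np.median(s)
    mad = np.median(np.abs(s - med))       # median absolute deviation
    tau1 = med + kappa * mad               # Eq. (5)
    tau2 = gamma * tau1
    w = np.ones_like(s)                    # s_k <= tau1: full influence
    mid = (s > tau1) & (s <= tau2)
    w[mid] = tau1 / s[mid]                 # intermediate residuals: tau1 / s_k
    w[s > tau2] = w_min                    # strong outliers: small bounded weight
    return w, tau1, tau2


s = np.array([0.9, 1.0, 1.1, 1.0, 1.2, 2.0, 9.0])
w, tau1, tau2 = residual_weights(s)
```

In this toy input the median is 1.1 and the MAD is about 0.1, so $\tau_{1} \approx 1.4$ and $\tau_{2} \approx 2.8$: the residual 2.0 receives weight 0.7, and the residual 9.0 is reduced to `w_min`. Choosing `w_min` no larger than $1/\gamma$ keeps the weight non-increasing across $\tau_{2}$.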

2.2.3. Damping and Pose-Step Clamping

Repeated planar geometry can leave some motion directions weakly observable and make the local system ill-conditioned. A damped update is therefore computed as

$$(J^{\top} W J + \mu I)\, \delta\xi = -J^{\top} W r, \tag{8}$$

where $\mu > 0$ is increased when the objective fails to decrease or when an excessive update is detected.

For each pose increment

$$\delta\xi_{i} = \begin{bmatrix} \delta t_{i}^{\top} & \delta\theta_{i}^{\top} \end{bmatrix}^{\top}, \tag{9}$$

the translational and rotational components are clamped independently:

$$\delta t_{i}^{c} = \delta t_{i} \min\left(1, \frac{t_{\max}}{\|\delta t_{i}\|_{2} + \epsilon}\right), \tag{10}$$

$$\delta\theta_{i}^{c} = \delta\theta_{i} \min\left(1, \frac{\theta_{\max}}{\|\delta\theta_{i}\|_{2} + \epsilon}\right). \tag{11}$$

This clamping limits the influence of an unstable linearization and keeps each update within a scale compatible with physically plausible inter-frame motion.
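A compact sketch of Eqs. (8)-(11) is given below. It assumes, for brevity, that the weight matrix $W$ is diagonal and passed as a vector `W_diag`, and that the local system is small enough to solve densely; the function names and the numerical values of `t_max` and `theta_max` are illustrative assumptions.

```python
import numpy as np


def damped_step(J, W_diag, r, mu):
    """Solve (J^T W J + mu I) dxi = -J^T W r for a diagonal W (Eq. 8)."""
    JW = J.T * W_diag                          # J^T W via column-wise scaling
    H = JW @ J + mu * np.eye(J.shape[1])
    return np.linalg.solve(H, -JW @ r)


def clamp_pose_steps(dxi, t_max, theta_max, eps=1e-12):
    """Clamp each 6-vector [dt, dtheta] independently (Eqs. 10-11)."""
    d = dxi.reshape(-1, 6).copy()
    for row in d:                              # rows are views, so edits are in place
        row[:3] *= min(1.0, t_max / (np.linalg.norm(row[:3]) + eps))
        row[3:] *= min(1.0, theta_max / (np.linalg.norm(row[3:]) + eps))
    return d.reshape(-1)


# One pose whose translation step (norm 5) exceeds the bound, rotation within it.
step = np.array([3.0, 4.0, 0.0, 0.0, 0.0, 0.01])
clamped = clamp_pose_steps(step, t_max=0.5, theta_max=0.1)
```

The clamp rescales the translation to length `t_max` while preserving its direction and leaves the small rotation untouched, which is the behavior Eqs. (10) and (11) describe.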

2.2.4. Soft Fallback

A local solution is flagged as unreliable when the objective increases, the inlier ratio falls below a threshold, scan overlap becomes insufficient, or repeated clamping is required. An abrupt return to the pre-optimization pose can create a trajectory discontinuity. The soft fallback instead interpolates between the predicted pose $T_{\mathrm{pred}}$ and the locally optimized pose $T_{\mathrm{opt}}$:

$$T_{\mathrm{app}} = T_{\mathrm{pred}}\, \mathrm{Exp}\left(\alpha\, \mathrm{Log}\left(T_{\mathrm{pred}}^{-1} T_{\mathrm{opt}}\right)\right), \qquad 0 \leq \alpha \leq 1. \tag{12}$$

A high-confidence solution uses $\alpha \approx 1$, whereas a low-confidence solution is applied conservatively using a smaller value of $\alpha$.
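The interpolation in Eq. (12) can be illustrated with a self-contained NumPy sketch of the $\mathrm{SE}(3)$ exponential and logarithm. The tangent-vector ordering (translation first, then rotation) follows Eq. (9); the helper names are hypothetical, and the logarithm below is intended only for relative rotations well below $\pi$, which is an assumption about typical inter-frame motion rather than a statement from the text.

```python
import numpy as np


def hat(w):
    """Map a 3-vector to its skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def _rot_and_v(phi):
    """Rodrigues rotation R and left Jacobian V for rotation vector phi."""
    th = np.linalg.norm(phi)
    K = hat(phi)
    if th < 1e-8:                              # first-order small-angle fallback
        return np.eye(3) + K, np.eye(3) + 0.5 * K
    a = np.sin(th) / th
    b = (1.0 - np.cos(th)) / th**2
    c = (th - np.sin(th)) / th**3
    return np.eye(3) + a * K + b * K @ K, np.eye(3) + b * K + c * K @ K


def se3_exp(xi):
    """xi = [rho, phi] -> 4x4 homogeneous transform."""
    R, V = _rot_and_v(xi[3:])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ xi[:3]
    return T


def se3_log(T):
    """4x4 homogeneous transform -> xi = [rho, phi] (rotation angle < pi)."""
    R, t = T[:3, :3], T[:3, 3]
    th = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    phi = 0.5 * v if th < 1e-8 else th / (2.0 * np.sin(th)) * v
    _, V = _rot_and_v(phi)
    return np.concatenate([np.linalg.solve(V, t), phi])


def soft_fallback(T_pred, T_opt, alpha):
    """Eq. (12): move a fraction alpha of the way from T_pred to T_opt."""
    delta = se3_log(np.linalg.inv(T_pred) @ T_opt)
    return T_pred @ se3_exp(alpha * delta)


T_pred = se3_exp(np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.1]))
T_opt = T_pred @ se3_exp(np.array([0.2, 0.0, 0.0, 0.0, 0.0, 0.3]))
T_half = soft_fallback(T_pred, T_opt, alpha=0.5)
```

Setting `alpha=1.0` reproduces `T_opt` and `alpha=0.0` reproduces `T_pred`; intermediate values follow the geodesic between the two poses, so the applied correction shrinks smoothly instead of switching abruptly.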

2.2.5. Local-Map Compression

After local optimization, the point clouds in window $W_{q}$ are transformed into a common coordinate frame:

$$M_{q} = \bigcup_{i \in W_{q}} T_{i} P_{i}, \tag{13}$$

where $P_{i}$ is the point cloud associated with pose $i$. A compression operator $C(\cdot)$ produces

$$\tilde{M}_{q} = C(M_{q};\, v_{q}, n_{\max}, F_{q}), \tag{14}$$

where $v_{q}$ is the voxel resolution, $n_{\max}$ is the point cap, and $F_{q}$ denotes geometry-preserving selection criteria. Broad planar regions are downsampled, whereas edges, stiffener boundaries, brackets, and other geometrically informative structures are preferentially retained.
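One possible reading of the compression operator in Eq. (14) is a voxel-centroid filter followed by a point cap. The sketch below is an assumption-laden stand-in: the function `compress_map` is hypothetical, and the optional per-point `score` array only approximates the geometry-preserving criteria $F_{q}$, whose actual definition is not given in the text.

```python
import numpy as np


def compress_map(points, voxel, n_max, score=None):
    """Voxel-centroid downsampling with a point cap (illustrative only).

    score is a hypothetical per-point saliency value (e.g. edge strength);
    when the cap binds, voxels with the highest score are kept first.
    """
    pts = np.asarray(points, dtype=float)
    keys = np.floor(pts / voxel).astype(np.int64)           # integer voxel index
    _, inv, counts = np.unique(keys, axis=0, return_inverse=True,
                               return_counts=True)
    inv = inv.reshape(-1)
    n_vox = counts.size
    cent = np.zeros((n_vox, 3))
    np.add.at(cent, inv, pts)                                # sum points per voxel
    cent /= counts[:, None]                                  # centroid per voxel
    if n_vox <= n_max:
        return cent
    if score is None:                                        # uniform thinning
        keep = np.linspace(0, n_vox - 1, n_max).astype(int)
    else:                                                    # keep salient voxels
        vscore = np.full(n_vox, -np.inf)
        np.maximum.at(vscore, inv, np.asarray(score, dtype=float))
        keep = np.argsort(-vscore)[:n_max]
    return cent[keep]


cloud = np.array([[0.01, 0.01, 0.01], [0.03, 0.03, 0.03], [1.05, 1.05, 1.05]])
compressed = compress_map(cloud, voxel=0.1, n_max=10)
```

Here the first two points fall into the same 0.1 m voxel and are merged into one centroid, so three input points become two. Passing a saliency `score` together with a tight `n_max` would favor voxels containing edge-like points over broad planar regions.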

2.3. Global Graph SLAM Formulation

Let the stacked global pose state be

$$x = \begin{bmatrix} \xi_{1}^{\top}, \xi_{2}^{\top}, \dots, \xi_{N}^{\top} \end{bmatrix}^{\top}. \tag{15}$$

For a relative-pose measurement $Z_{ij}$ between poses $T_{i}$ and $T_{j}$, the residual is defined on $\mathrm{SE}(3)$ as

$$r_{ij}(x) = \mathrm{Log}\left(Z_{ij}^{-1} \left(T_{i}^{-1} T_{j}\right)\right). \tag{16}$$

The nonlinear graph objective is

$$\min_{x} \sum_{(i,j) \in E} \rho\left(r_{ij}^{\top} \Omega_{ij} r_{ij}\right) + \sum_{p \in P} \rho\left(r_{p}^{\top} \Omega_{p} r_{p}\right), \tag{17}$$

where $\rho(\cdot)$ denotes a robust loss function, $E$ is the set of relative-pose factors, and $P$ contains prior factors.

At the current estimate, each factor is linearized:

$$r_{k}(x \oplus \Delta x) \approx r_{k}(x) + J_{k} \Delta x. \tag{18}$$

Let $\Lambda_{k}$ satisfy

$$\Lambda_{k}^{\top} \Lambda_{k} = \Omega_{k}. \tag{19}$$

With robust factor weight $w_{k}$, the whitened block row and right-hand side are

$$A_{k} = \sqrt{w_{k}}\, \Lambda_{k} J_{k}, \qquad b_{k} = -\sqrt{w_{k}}\, \Lambda_{k} r_{k}. \tag{20}$$

Stacking the factor rows yields

$$\min_{\Delta x} \left\| A \Delta x - b \right\|_{2}^{2}. \tag{21}$$
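The whitening step of Eqs. (19) and (20) can be checked numerically. In the sketch below, $\Lambda_{k}$ is taken as the transpose of the lower Cholesky factor of $\Omega_{k}$, which is one valid choice satisfying Eq. (19) and not necessarily the one used by the authors; the function name and the random test data are illustrative.

```python
import numpy as np


def whiten_factor(J, r, Omega, w=1.0):
    """Return the whitened block row A_k and right-hand side b_k of Eq. (20)."""
    L = np.linalg.cholesky(Omega)          # Omega = L L^T, L lower triangular
    Lam = L.T                              # so Lam^T Lam = Omega, as in Eq. (19)
    A = np.sqrt(w) * Lam @ J
    b = -np.sqrt(w) * Lam @ r
    return A, b


rng = np.random.default_rng(0)
J = rng.normal(size=(6, 12))               # one relative-pose factor, two 6-DoF poses
r = rng.normal(size=6)
M = rng.normal(size=(6, 6))
Omega = M @ M.T + 6.0 * np.eye(6)          # symmetric positive-definite information
dx = rng.normal(size=12)
w_k = 0.4

A_k, b_k = whiten_factor(J, r, Omega, w_k)
whitened_cost = np.sum((A_k @ dx - b_k) ** 2)
e = J @ dx + r
weighted_cost = w_k * e @ Omega @ e
```

The two costs agree to floating-point precision, confirming that $\|A_{k} \Delta x - b_{k}\|_{2}^{2} = w_{k} (J_{k} \Delta x + r_{k})^{\top} \Omega_{k} (J_{k} \Delta x + r_{k})$, so the stacked problem in Eq. (21) is the weighted linearized objective.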

A conventional normal-equation backend forms

$$H \Delta x = g, \qquad H = A^{\top} A, \qquad g = A^{\top} b, \tag{22}$$

and then factorizes $H$. The proposed backend instead operates directly on the rows of $A$. Figure 2 traces a relative-pose factor through linearization, whitening, sparse row assembly, and a subsequent row-wise Kaczmarz update.

2.4. CSR-Based Kaczmarz Backend

Let $a_{i}^{\top}$ denote row $i$ of $A$, and let $b_{i}$ be the corresponding right-hand-side element. Given the current increment estimate $\Delta x^{(\ell)}$, a relaxed Kaczmarz update with a numerical safeguard is

$$\Delta x^{(\ell+1)} = \Delta x^{(\ell)} + \lambda\, \frac{b_{i} - a_{i}^{\top} \Delta x^{(\ell)}}{\|a_{i}\|_{2}^{2} + \eta}\, a_{i}, \tag{23}$$

where $\lambda$ is the relaxation factor and $\eta > 0$ prevents excessive updates when a selected row has a very small norm.

For row-norm sampling,

$$p_{i} = \frac{\|a_{i}\|_{2}^{2}}{\|A\|_{F}^{2}}. \tag{24}$$

When frontend confidence scores or additional robust weights are retained separately from the whitening in $A$, the sampling probability can be generalized to

$$p_{i} = \frac{q_{i} \|a_{i}\|_{2}^{2}}{\sum_{j} q_{j} \|a_{j}\|_{2}^{2}}, \qquad 0 \leq q_{i} \leq 1. \tag{25}$$

This rule reduces the selection frequency of low-confidence rows while retaining their bounded contribution to the solution.

The matrix is stored in CSR arrays containing nonzero values, column indices, and row pointers. Figure 3 shows a representative Jacobian pattern, dominated by local-factor support with limited off-diagonal blocks introduced by nonlocal constraints. When row $i$ is selected, only the entries between row_ptr[i] and row_ptr[i+1] are accessed. If $N(i)$ denotes the corresponding set of nonzero columns, the sparse update is

$$\Delta x_{j}^{(\ell+1)} = \Delta x_{j}^{(\ell)} + \lambda\, \frac{b_{i} - \sum_{h \in N(i)} a_{ih} \Delta x_{h}^{(\ell)}}{\sum_{h \in N(i)} a_{ih}^{2} + \eta}\, a_{ij}, \qquad j \in N(i). \tag{26}$$

Following the inner row projections, the increment associated with pose $i$ is applied on the manifold:

$$T_{i}^{+} = T_{i}\, \mathrm{Exp}\left(\delta\xi_{i}^{\wedge}\right). \tag{27}$$

The linear solve terminates when either the normalized residual or the successive-increment norm satisfies

$$\frac{\|A \Delta x^{(\ell)} - b\|_{2}}{\|b\|_{2} + \epsilon} < \varepsilon_{r}, \quad \text{or} \quad \|\Delta x^{(\ell)} - \Delta x^{(\ell-1)}\|_{2} < \varepsilon_{x}, \tag{28}$$

or when the prescribed row-projection budget is exhausted. Because practical SLAM systems are generally inconsistent owing to sensor noise, linearization error, and residual outliers, the row projections are used as bounded residual-reduction steps within each nonlinear iteration; they are not assumed to satisfy every factor row exactly or to recover the exact least-squares solution.
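The row-wise update of Eq. (26), the sampling rules of Eqs. (24) and (25), the residual test of Eq. (28), and the projection budget can be combined in a short NumPy sketch operating directly on raw CSR arrays. This is an illustrative implementation under stated assumptions rather than the authors' code: the function names are hypothetical, the residual is checked only once per batch of $m$ projections to limit full-matrix passes, the successive-increment test is omitted, and each row is assumed to have unique column indices.

```python
import numpy as np


def csr_residual_norm(data, indices, indptr, b, x):
    """||A x - b||_2 computed row by row from the CSR arrays."""
    r = [data[indptr[i]:indptr[i + 1]] @ x[indices[indptr[i]:indptr[i + 1]]] - b[i]
         for i in range(len(b))]
    return np.linalg.norm(r)


def kaczmarz_csr(data, indices, indptr, b, n, budget, lam=1.0, eta=1e-12,
                 eps_r=1e-8, q=None, seed=0):
    """Randomized relaxed Kaczmarz on CSR rows without forming A^T or A^T A."""
    rng = np.random.default_rng(seed)
    m = len(indptr) - 1
    row_sq = np.array([np.dot(data[indptr[i]:indptr[i + 1]],
                              data[indptr[i]:indptr[i + 1]]) for i in range(m)])
    weights = row_sq if q is None else np.asarray(q) * row_sq   # Eq. (24) or (25)
    p = weights / weights.sum()
    x = np.zeros(n)
    done = 0
    while done < budget:                       # prescribed row-projection budget
        batch = min(m, budget - done)
        for i in rng.choice(m, size=batch, p=p):
            lo, hi = indptr[i], indptr[i + 1]  # only row i's entries are touched
            cols, vals = indices[lo:hi], data[lo:hi]
            resid = b[i] - vals @ x[cols]
            x[cols] += lam * resid / (row_sq[i] + eta) * vals   # Eq. (26)
        done += batch
        if csr_residual_norm(data, indices, indptr, b, x) < eps_r * (np.linalg.norm(b) + 1e-12):
            break                              # normalized-residual test, Eq. (28)
    return x


# Toy consistent system A = [[2, 0], [0, 3], [1, 1]], solution [1, 2].
data = np.array([2.0, 3.0, 1.0, 1.0])
indices = np.array([0, 1, 0, 1])
indptr = np.array([0, 1, 2, 4])
b = np.array([2.0, 6.0, 3.0])
x = kaczmarz_csr(data, indices, indptr, b, n=2, budget=600)
```

The toy system is consistent, so the iterate approaches the exact solution; for the inconsistent systems that arise in practice, the same loop acts as the bounded residual-reduction step described above and stops at the budget. Passing a confidence vector `q` switches the sampling from Eq. (24) to Eq. (25).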

2.5. Structural Memory Model and Memory Guard

The principal solver-side storage terms are the CSR representation of $A$, the vector $b$, the pose increment, the row norms, and the sampling data structure. Their approximate memory requirement is

$$M_{\mathrm{CSR}} \approx \operatorname{nnz}(A)\,(s_{\mathrm{val}} + s_{\mathrm{col}}) + (m + 1)\, s_{\mathrm{ptr}} + m\, s_{b} + n\, s_{x} + m\, s_{p}, \tag{29}$$

where $m$ and $n$ are the row and column counts, and the $s$ terms denote the bytes required by the corresponding data types.
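Eq. (29) can be evaluated directly once data types are fixed. The sketch below assumes 8-byte floating-point values and 4-byte integer indices, and the graph dimensions are hypothetical numbers chosen only to show the arithmetic; they are not taken from the experiments.

```python
def csr_solver_bytes(nnz, m, n, s_val=8, s_col=4, s_ptr=4, s_b=8, s_x=8, s_p=8):
    """Structural memory of Eq. (29) in bytes (type sizes are assumptions)."""
    return nnz * (s_val + s_col) + (m + 1) * s_ptr + m * s_b + n * s_x + m * s_p


# Hypothetical graph: 1000 poses (6 DoF each) and 1200 relative-pose factors,
# each contributing 6 rows with 12 nonzeros per row.
n = 1000 * 6
m = 1200 * 6
nnz = m * 12
total = csr_solver_bytes(nnz, m, n)   # 1228804 bytes, about 1.2 MB
```

In this example the value and column-index arrays account for 1,036,800 of the 1,228,804 bytes, which illustrates why $\operatorname{nnz}(A)$ is one of the quantities watched by the memory guard.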

The full-pipeline memory estimate is

$$M_{\mathrm{total}} = M_{\mathrm{system}} + M_{\mathrm{sensor}} + M_{\mathrm{frontend}} + M_{\mathrm{map}} + M_{\mathrm{graph}} + M_{\mathrm{solver}} + M_{\mathrm{margin}}. \tag{30}$$

The memory guard monitors keyframe count, point count, factor count, residual-row count, $\operatorname{nnz}(A)$, and process resident set size. Three operating regions are used. In the normal region, nominal keyframe and map-resolution settings are retained. In the caution region, keyframe thresholds and voxel size are increased, point caps are tightened, and weak factors are suppressed. In the critical region, new keyframes may be deferred, inactive point clouds are compressed into submaps or descriptors, and the global optimization range may be bounded or skipped.
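The three operating regions can be sketched as a simple classifier with an associated policy table. Everything numerical here is an assumption for illustration: the text does not give the region boundaries or the scale factors, and this sketch uses only a single memory-usage fraction, whereas the guard described above also monitors keyframe, point, factor, and nonzero counts.

```python
def memory_region(used_bytes, budget_bytes, caution=0.70, critical=0.85):
    """Classify the operating region from the used fraction of the budget.

    The 0.70 and 0.85 boundaries are hypothetical placeholders.
    """
    frac = used_bytes / budget_bytes
    if frac >= critical:
        return "critical"
    return "caution" if frac >= caution else "normal"


# Hypothetical policy table mirroring the qualitative description in the text.
POLICY = {
    "normal":   {"voxel_scale": 1.0, "keyframe_scale": 1.0,
                 "accept_keyframes": True, "global_solve": True},
    "caution":  {"voxel_scale": 1.5, "keyframe_scale": 1.5,
                 "accept_keyframes": True, "global_solve": True},
    "critical": {"voxel_scale": 2.0, "keyframe_scale": 2.0,
                 "accept_keyframes": False, "global_solve": False},
}

region = memory_region(used_bytes=7.5e9, budget_bytes=10e9)
settings = POLICY[region]
```

A table of this kind keeps the policy explicit and easy to log, which matters when the region boundaries are later tuned against measured resident set size on the target platform.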

A new keyframe is accepted when at least one of the following conditions is met:

$$\|t_{t} - t_{\mathrm{last}}\|_{2} > \tau_{t}, \tag{31}$$

$$\left\|\mathrm{Log}\left(R_{\mathrm{last}}^{\top} R_{t}\right)\right\|_{2} > \tau_{R}, \tag{32}$$

or

$$\operatorname{overlap}(P_{t}, \tilde{M}_{\mathrm{local}}) < \tau_{o}. \tag{33}$$

The thresholds are adapted conservatively as the estimated memory margin decreases.
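The acceptance test of Eqs. (31)-(33) can be written as a single predicate. In this sketch the overlap ratio is assumed to be computed elsewhere and passed in as a number in $[0, 1]$, and the threshold values in the usage example are placeholders; neither is specified in the text.

```python
import numpy as np


def rotation_angle(R_last, R_t):
    """Angle of R_last^T R_t in radians, equal to ||Log(R_last^T R_t)||_2."""
    R = R_last.T @ R_t
    c = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.arccos(c))


def accept_keyframe(t_last, R_last, t_t, R_t, overlap, tau_t, tau_R, tau_o):
    """True when at least one of Eqs. (31)-(33) holds."""
    moved = np.linalg.norm(np.asarray(t_t) - np.asarray(t_last)) > tau_t
    turned = rotation_angle(R_last, R_t) > tau_R
    new_view = overlap < tau_o
    return bool(moved or turned or new_view)


I3 = np.eye(3)
# 0.6 m of translation exceeds the hypothetical 0.5 m threshold.
accepted = accept_keyframe([0, 0, 0], I3, [0.6, 0, 0], I3, overlap=0.9,
                           tau_t=0.5, tau_R=0.2, tau_o=0.6)
```

Raising `tau_t` and `tau_R` or lowering `tau_o` reduces the keyframe rate, which is the direction in which the thresholds would be adapted as the memory margin shrinks.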

2.6. Experimental Data and Evaluation Protocol

Marine-domain validation used missions 1–3 from the public Norwegian University of Science and Technology (NTNU) Ballast Water Tank dataset. The inspection platform and representative field deployments are described by Dharmadhikari et al. [1], while the data analyzed here are distributed through the NTNU Autonomous Robots Laboratory dataset repository [25]. The three evaluated sequences contained 851, 1202, and 1084 point-cloud frames, with reference path lengths of 58.48, 76.52, and 103.96 m, respectively. Raw point counts were obtained from point-cloud headers and archived run logs. Intermediate quantities that were not recorded, including the exact point count after each voxel-filtering and local-map-compression stage, were not reconstructed retrospectively.

For a trajectory containing $N$ paired positions, the translational root-mean-square error (RMSE) was computed as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\| p_{i} - p_{i}^{\mathrm{ref}} \right\|_{2}^{2}}. \tag{34}$$

The path-normalized error was

$$e_{\mathrm{path}} = 100\, \frac{\mathrm{RMSE}}{L_{\mathrm{ref}}}, \tag{35}$$

where $L_{\mathrm{ref}}$ is the reference path length. The environment-scale metric was

$$e_{\mathrm{bbox}} = 100\, \frac{\mathrm{RMSE}}{D_{\mathrm{bbox}}}, \tag{36}$$

where $D_{\mathrm{bbox}}$ is the diagonal of the reference trajectory bounding box.
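The three metrics of Eqs. (34)-(36) are straightforward to compute from paired positions. The sketch below assumes that the trajectories are already aligned and index-paired, that the path length is the sum of consecutive reference segment lengths, and that the bounding box is axis-aligned in 3D; the last two are assumptions, since the text does not define them further.

```python
import numpy as np


def translational_rmse(p, p_ref):
    """Eq. (34) for N paired positions given as (N, 3) arrays."""
    d = np.asarray(p, dtype=float) - np.asarray(p_ref, dtype=float)
    return float(np.sqrt(np.mean(np.sum(d ** 2, axis=1))))


def path_length(p_ref):
    """Sum of consecutive segment lengths of the reference trajectory."""
    seg = np.diff(np.asarray(p_ref, dtype=float), axis=0)
    return float(np.sum(np.linalg.norm(seg, axis=1)))


def bbox_diagonal(p_ref):
    """Diagonal of the axis-aligned bounding box of the reference trajectory."""
    p_ref = np.asarray(p_ref, dtype=float)
    return float(np.linalg.norm(p_ref.max(axis=0) - p_ref.min(axis=0)))


def normalized_errors(rmse, L_ref, D_bbox):
    """Return (e_path, e_bbox) in percent, Eqs. (35)-(36)."""
    return 100.0 * rmse / L_ref, 100.0 * rmse / D_bbox


p_ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
p_est = p_ref + np.array([0.1, 0.0, 0.0])          # constant 0.1 m offset
rmse = translational_rmse(p_est, p_ref)
e_path, e_bbox = normalized_errors(rmse, path_length(p_ref), bbox_diagonal(p_ref))
```

For this toy trajectory the RMSE is 0.1 m over a 2 m path, giving a path-normalized error of 5%. Applying Eq. (35) to the mission 1 values quoted in this paper, $100 \times 0.080 / 58.48$, gives approximately 0.137%.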

The archived NTNU evaluation paired trajectory samples using the index-based procedure recorded with the experiments. Final robust local bundle adjustment (BA) trajectories were evaluated after a verified similarity-transformation ($\mathrm{Sim}(3)$) alignment. The top-down panels were taken from the corresponding validated evaluation exports. Because the archive did not retain the raw trajectory coordinates or plotting scripts, the plotted trajectory region was not redrawn, smoothed, or otherwise altered. The baseline and robust values should therefore be interpreted as trajectory-level validation, not as a solver-isolated ablation performed under identical alignment conditions.

A separate controlled benchmark compared direct, Krylov, least-squares, and Kaczmarz methods on identical $(A, b)$ systems. Solver-side structural-memory estimates were kept distinct from end-to-end peak resident set size (RSS), which also includes point-cloud input/output, local maps, graph assembly, runtime libraries, and trajectory storage.

2.7. Use of Generative Artificial Intelligence in Manuscript Preparation

OpenAI ChatGPT (GPT-5.5 Pro) was used to assist with English-language revision, structural editing, reference-format consistency, and LaTeX preparation. It was not used to generate or alter experimental data. The authors reviewed all generated text and independently verified the technical statements, numerical results, citations, and figures; they take full responsibility for the published content.

3. Results

3.1. NTNU Ballast Water Tank Mission Scale

Table 1 summarizes the input scale of the three marine validation missions. Each frame contained approximately 32,800 raw LiDAR points, yielding tens of millions of points per mission.

The missions differed in path length and spatial extent but had similar per-frame LiDAR density. Absolute RMSE was therefore complemented by path- and bounding-box-normalized metrics for cross-mission comparison.

3.2. Trajectory Accuracy of the Baseline and Robust Local BA Configurations

Table 2 reports the archived trajectory-level results for the baseline and robust local BA configurations. Because the two archived states were not evaluated under identical alignment conditions, the comparison is descriptive rather than a controlled single-variable ablation.

The baseline trajectories had RMSE values of 0.211–0.292 m, whereas the robust local BA trajectories had values of 0.080–0.127 m. The descriptive reductions in the archived RMSE values were 62.1%, 61.3%, and 56.5% for missions 1, 2, and 3, respectively. Maximum errors decreased from 0.699–1.663 m to 0.163–0.184 m.

The baseline path-normalized RMSE values were 0.361%, 0.372%, and 0.281%; all were below 0.4% of the respective path length. The robust local BA values were 0.137%, 0.144%, and 0.122%. Their spread was 0.022 percentage points, indicating comparable relative accuracy across the three mission lengths.
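As a consistency check on the figures quoted above, the mission 1 baseline RMSE can be recovered from its path-normalized value and the reference path length given in Section 2.6, and the reported reduction follows from it. This is a worked illustration using the rounded values in the text, not an additional result:

$$\mathrm{RMSE}_{\mathrm{base}} \approx \frac{0.361}{100} \times 58.48\ \mathrm{m} \approx 0.211\ \mathrm{m}, \qquad 1 - \frac{0.080}{0.211} \approx 0.621, \qquad 100 \times \frac{0.080}{58.48} \approx 0.137\%.$$

The first value matches the lower end of the reported baseline range, the second matches the 62.1% reduction, and the third matches the robust path-normalized error for mission 1.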

The bounding-box-normalized RMSE showed the same trend: 2.225% to 0.843% for mission 1, 2.706% to 1.047% for mission 2, and 1.895% to 0.824% for mission 3. This metric accounts for the different spatial extents of the three trajectories.

Figure 4 shows the archived top-down trajectory comparisons. In all three missions, the robust local BA trajectory follows the stored comparison trajectory without an evident route-scale discontinuity. The color-coded point errors are spatially localized and agree qualitatively with the aggregate RMSE and maximum-error values in Table 2. Because the robust trajectories were similarity-aligned, these plots complement rather than replace the alignment-dependent quantitative evaluation.

Figure 5 summarizes the path-normalized errors for the two archived evaluation configurations.

3.3. Structural Behavior of the Kaczmarz Backend

The controlled row-access comparison showed that Kaczmarz did not yield the lowest linear residual. Its implementation nevertheless required only CSR row access, the right-hand side, the solution vector, row statistics, and sampling state. Table 3 reports a representative comparison, and Figure 6 shows the corresponding solver-side structural-memory estimates.

Direct and Krylov-family methods achieved lower residuals and, in several controlled cases, shorter runtimes. Kaczmarz should therefore not be interpreted as a universal replacement for other graph solvers. In the representative case, its structural-memory estimate was 0.075 MB, compared with 0.149 MB for LSQR with a stored transpose cache, 0.177 MB for PCG with explicit $H$, and 0.346 MB for the explicit-$H$ Cholesky/$L D L^{\top}$ configuration.

In the memory-budgeted streaming experiment, generating one Kaczmarz row at a time reduced the estimated working set to 0.044 MB, compared with 0.061 MB for chunked rows and 0.268 MB when the full CSR matrix was retained. These values quantify a solver-level working-set trade-off; they do not predict full-pipeline peak RSS.

The broader solver-only benchmark similarly showed that LSQR/LSMR and Krylov methods were highly competitive in residual and runtime. Row-norm Kaczmarz produced a mean residual of 0.273 and a mean relative residual of 0.218, whereas the direct and Krylov-family methods produced mean residuals of approximately 0.0067 in the corresponding aggregate benchmark. The result positions Kaczmarz as a memory-aware row-action option, not as a universally superior numerical solver.

3.4. System-Level Memory Operating Points

System-level controls were evaluated separately on the Park dataset, for which the archived runs covered multiple keyframe, point-cap, and global-optimization settings. Table 4 summarizes three representative operating policies.

Peak RSS increased from 165.1 MB in the memory-priority configuration to 411.4 MB in the quality-priority configuration. The main contributors were map resolution, selected point count, keyframe density, and the permitted global-optimization range. In the balanced and quality-priority cases, the range guard skipped global BA because the pose count exceeded the configured limit, while the remaining pipeline still produced a trajectory. These cases are therefore range-controlled pipeline executions, not successful high-load global BA solves.

These results distinguish two complementary forms of memory control. The row-wise backend limits solver-side factorization storage, whereas the keyframe, map, and optimization-range policies limit the size of the problem presented to the solver. Both are required in a complete onboard architecture.

4. Discussion

4.1. Significance for Marine Inspection Robotics

The NTNU results show that trajectory error remained a small fraction of traveled distance in the ballast-tank missions. The robust local BA trajectories had path-normalized RMSE values of 0.122–0.144%. Such normalized measures are more informative than absolute RMSE alone when mission lengths differ substantially.

For ship inspection, localization quality supports more than autonomous mobility. A consistent trajectory allows observations of corrosion, coating degradation, cracks, anodes, brackets, stiffeners, and weld regions to be registered within a three-dimensional structural reference frame. It also supports comparison across inspection campaigns and between observed geometry and design information. Large local discontinuities can otherwise duplicate structures, distort clearance estimates, or associate inspection records with incorrect locations.

The robust frontend targets failure modes that are characteristic of ship interiors. Residual weighting limits the influence of ambiguous matches; damping and step clamping bound updates in locally degenerate directions; and soft fallback preserves continuity when a local window lacks sufficient geometric information. Local-map compression additionally reduces redundant samples from broad planar surfaces while retaining informative structural boundaries.

4.2. Numerical Architecture and Operating-Policy Co-Design

The principal contribution is the co-design of numerical representation, solver access, frontend stabilization, and operating policy. Replacing a Cholesky solve with Kaczmarz while retaining unlimited keyframes and uncompressed point clouds would not resolve the end-to-end memory problem. Conversely, point-cloud compression alone would not prevent factorization storage from increasing as the pose graph becomes more densely connected.

The Kaczmarz backend offers three relevant system properties. First, it operates directly on whitened Jacobian rows. Second, its update does not require explicit storage of $A^{\top}$, $H = A^{\top} A$, or a Cholesky factor. Third, factor rows can be streamed, prioritized, down-weighted, clipped, or deferred. These properties support memory budgeting, incremental factor arrival, and robust row-selection policies.

The experiments also quantify the cost of this flexibility. Kaczmarz did not attain the residuals achieved by LSQR/LSMR, PCG, or direct factorization in the controlled benchmark. It is therefore most appropriate when row-level controllability and bounded structural storage are more important than obtaining the smallest linear residual in the shortest time. When sufficient memory is available, an adaptive backend may instead select LSQR, LSMR, PCG, or a direct method.

This interpretation is consistent with recent resource-aware SLAM research, which treats algorithm and hardware constraints as a coupled design problem rather than optimizing accuracy in isolation [26].

4.3. Interpretation of Onboard Feasibility

The 8 GB Jetson Orin Nano represents a relevant memory and power class for a compact marine inspection robot. The reported experiments, however, were not an end-to-end deployment under a hardware-enforced 8 GB cap. The system-level RSS values should therefore be interpreted as measured pipeline operating points and design margins, not as evidence of real-time performance on every Jetson configuration.

The distinction between solver-side structural memory and full-pipeline peak RSS is essential. Solver-side memory isolates the data structures associated with $(A, b)$ and the selected numerical method. Full-pipeline RSS also includes point-cloud loading, feature extraction, local mapping, graph construction, visualization or output buffers, memory allocators, and software libraries. Consequently, a lower solver-side estimate does not imply an equal reduction in process peak RSS when point-cloud storage dominates.

The results nevertheless support the architectural feasibility of onboard operation. The backend avoids factorization-related storage, while the surrounding policies regulate the number and size of keyframes, maps, and active factors. These mechanisms provide explicit control variables for trading trajectory quality against memory margin, runtime, power, and thermal limits.

4.4. Limitations and Future Work

Several limitations define the scope of the conclusions. First, the archived baseline and robust NTNU results were not generated under identical alignment conditions; the robust trajectories used verified $\mathrm{Sim}(3)$ alignment. Their numerical differences provide trajectory-level evidence but do not constitute a strictly controlled single-variable ablation.

Second, the trajectory results validate localization output but do not establish real-time closed-loop navigation, collision avoidance, or inspection coverage. A deployment study should additionally measure update rate, latency distribution, thermal throttling, dropped sensor packets, and mission success on the target robot.

Third, Kaczmarz can converge slowly for ill-conditioned or strongly inconsistent systems, and its residual floor can exceed that of LSQR/LSMR or well-preconditioned Krylov methods. The proposed architecture does not remove this limitation.

Fourth, some intermediate point counts and compression ratios were not retained in the archived logs. The study therefore reports raw points, selected inputs, final pose counts, solver-side structural memory, and full-pipeline RSS separately rather than estimating unrecorded quantities.

Fifth, the full-pipeline operating points were not measured under cgroup, container, or hardware-level memory enforcement. Future tests should impose reproducible 4, 8, and 16 GB limits and distinguish confirmed out-of-memory events from generic process termination.
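One way to separate confirmed out-of-memory events from generic termination is to read the kernel's own counter rather than infer the cause from the exit status. The sketch below assumes a Linux cgroup v2 group whose `memory.max` has already been set to the cap under test, and it assumes the caller captures the text of that group's `memory.events` file before and after the run. The function names and outcome labels are hypothetical; the return-code convention (negative value for a signal) follows Python's `subprocess` module.

```python
def parse_memory_events(text):
    """Parse the 'key value' lines of a cgroup v2 memory.events file."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def classify_termination(returncode, events_before, events_after):
    """Label a run using the change in the oom_kill counter and the exit status."""
    before = parse_memory_events(events_before).get("oom_kill", 0)
    after = parse_memory_events(events_after).get("oom_kill", 0)
    if after > before:
        return "confirmed_oom"  # the kernel killed a process in this group
    if returncode == 0:
        return "completed"
    if returncode < 0:
        return f"killed_by_signal_{-returncode}"  # cause not established
    return "nonzero_exit"


# Illustrative file contents (not measured data).
before = "low 0\nhigh 0\nmax 3\noom 0\noom_kill 0\n"
after = "low 0\nhigh 0\nmax 41\noom 1\noom_kill 1\n"

print(classify_termination(-9, before, after))   # confirmed_oom
print(classify_termination(-9, before, before))  # killed_by_signal_9
```

A run that ends with SIGKILL but shows no change in `oom_kill` is reported as a generic signal termination, which keeps the two cases apart in the benchmark logs.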

Future work should develop adaptive solver selection based on graph topology, available memory, accuracy requirements, and time budget. LSQR or LSMR may be preferable when a low residual is required and transpose operations are affordable; PCG may be appropriate when an effective preconditioner is available; and Kaczmarz may be selected when row streaming or a strict working-set limit dominates. Factor-aligned block and mini-batch Kaczmarz variants should also be evaluated. Finally, frontend confidence weights should be logged and propagated explicitly to global row selection, and hardware-capped experiments should record GPU usage, temperature, power consumption, and mission-completion statistics.
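The selection criteria listed above can be sketched as a simple rule table. This is only an illustration of the proposed future direction: the field names, the backend labels, the priority order of the rules, and the default are assumptions, and a practical selector would also need graph-topology and time-budget inputs.

```python
from dataclasses import dataclass


@dataclass
class SolveRequest:
    needs_low_residual: bool
    transpose_affordable: bool
    has_effective_preconditioner: bool
    row_streaming_or_strict_working_set: bool


def select_backend(req: SolveRequest) -> str:
    """Return a backend label from the qualitative criteria in the text."""
    # Assumed priority: a hard memory or streaming constraint overrides accuracy.
    if req.row_streaming_or_strict_working_set:
        return "kaczmarz"
    if req.needs_low_residual and req.transpose_affordable:
        return "lsqr_or_lsmr"
    if req.has_effective_preconditioner:
        return "pcg"
    # Assumed default: the factorization-free row-wise backend.
    return "kaczmarz"


onboard = SolveRequest(True, True, False, True)
offline = SolveRequest(True, True, False, False)
print(select_backend(onboard))  # kaczmarz
print(select_backend(offline))  # lsqr_or_lsmr
```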

5. Conclusions

This study presented a memory-efficient graph SLAM architecture for autonomous inspection of ballast water tanks and other resource-constrained ship interiors. The architecture combines robust local hierarchical bundle adjustment with a CSR-based Kaczmarz backend, local-map compression, keyframe regulation, and global-optimization range control.

The frontend uses residual-adaptive weighting, damping, bounded translation and rotation updates, and soft fallback to suppress local divergence in repetitive or weakly constrained geometry. The backend operates directly on whitened factor rows and avoids explicit construction of the approximate Hessian, a Cholesky factor, and a transpose cache.

On NTNU Ballast Water Tank missions 1–3, baseline path-normalized RMSE values were 0.281–0.372%. The robust local BA trajectories yielded RMSE values of 0.080, 0.110, and 0.127 m, corresponding to 0.137%, 0.144%, and 0.122% of the respective path lengths.

The solver comparisons showed that Kaczmarz was not residual- or runtime-optimal relative to LSQR/LSMR, Krylov methods, or direct factorization. Its practical value lies in row-wise controllability, factorization-free storage, and compatibility with streaming and memory-budgeted operation. The contribution is therefore a numerical-architecture and operating-policy co-design for onboard marine robotics, not a new Kaczmarz variant or a claim of universal solver superiority.

Author Contributions

Conceptualization, S.C. and T.-W.K.; methodology, S.C. and W.Y.; software, S.C.; validation, S.C. and W.Y.; formal analysis, S.C.; investigation, S.C.; data curation, S.C.; writing—original draft preparation, S.C.; writing—review and editing, S.C., W.Y. and T.-W.K.; visualization, S.C.; resources, T.-W.K.; supervision, T.-W.K.; project administration, T.-W.K.; funding acquisition, T.-W.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Global Basic Research Laboratory (Advanced) Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (MSIT), grant number 00406127 (Mathematical-based theoretical research and technology development of vision models for generative AI).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The third-party NTNU Ballast Water Tank dataset analyzed in this study is openly available from the NTNU Autonomous Robots Laboratory through Hugging Face Datasets [25]. Repository documentation and mission metadata are available at https://github.com/ntnu-arl/ballast_water_tank_dataset (accessed on 22 June 2026). The associated inspection platform and representative field deployments are described by Dharmadhikari et al. [1]. Retained processed trajectory figures, solver configurations, and benchmark logs are available from the corresponding author upon reasonable request. Raw trajectory coordinates and plotting scripts for the archived top-down panels were not retained.

Acknowledgments

The authors acknowledge the NTNU Autonomous Robots Laboratory for making the ballast water tank dataset publicly available.

Conflicts of Interest

W.Y. is employed by Avikus Co., Ltd. The remaining authors declare no conflicts of interest. The funder had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Dharmadhikari, M.; De Petris, P.; Kulkarni, M.; Khedekar, N.; Nguyen, H.; Stene, A.E.; Sjøvold, E.; Solheim, K.; Gussiaas, B.; Alexis, K. Autonomous Exploration and General Visual Inspection of Ship Ballast Water Tanks Using Aerial Robots. In Proceedings of the 2023 21st International Conference on Advanced Robotics, Abu Dhabi, United Arab Emirates, 5–8 December 2023; pp. 409–416.
2. Poggi, L.; Gaggero, T.; Gaiotti, M.; Ravina, E.; Rizzo, C.M. Recent Developments in Remote Inspections of Ship Structures. Int. J. Nav. Archit. Ocean Eng. 2020, 12, 881–891.
3. Christensen, L.; Fischer, N.; Kroffke, S.; Lemburg, J.; Ahlers, R. Cost-Effective Autonomous Robots for Ballast Water Tank Inspection. J. Ship Prod. Des. 2011, 27, 127–136.
4. Cadena, C.; Carlone, L.; Carrillo, H.; Latif, Y.; Scaramuzza, D.; Neira, J.; Reid, I.; Leonard, J.J. Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age. IEEE Trans. Robot. 2016, 32, 1309–1332.
5. Durrant-Whyte, H.; Bailey, T. Simultaneous Localization and Mapping: Part I. IEEE Robot. Autom. Mag. 2006, 13, 99–110.
6. Zhang, J.; Singh, S. LOAM: Lidar Odometry and Mapping in Real-time. In Proceedings of Robotics: Science and Systems, Berkeley, CA, USA, 12–16 July 2014.
7. Shan, T.; Englot, B.; Meyers, D.; Wang, W.; Ratti, C.; Rus, D. LIO-SAM: Tightly-Coupled Lidar Inertial Odometry via Smoothing and Mapping. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Virtual Conference, 24 October 2020–24 January 2021; pp. 5135–5142.
8. Liu, X.; Liu, Z.; Kong, F.; Zhang, F. Large-Scale LiDAR Consistent Mapping Using Hierarchical LiDAR Bundle Adjustment. IEEE Robot. Autom. Lett. 2023, 8, 1523–1530.
9. Dellaert, F.; Kaess, M. Square Root SAM: Simultaneous Localization and Mapping via Square Root Information Smoothing. Int. J. Robot. Res. 2006, 25, 1181–1203.
10. Kaess, M.; Ranganathan, A.; Dellaert, F. iSAM: Incremental Smoothing and Mapping. IEEE Trans. Robot. 2008, 24, 1365–1378.
11. Kaess, M.; Johannsson, H.; Roberts, R.; Ila, V.; Leonard, J.J.; Dellaert, F. iSAM2: Incremental Smoothing and Mapping Using the Bayes Tree. Int. J. Robot. Res. 2012, 31, 216–235.
12. Kümmerle, R.; Grisetti, G.; Strasdat, H.; Konolige, K.; Burgard, W. g2o: A General Framework for Graph Optimization. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 3607–3613.
13. Davis, T.A. Direct Methods for Sparse Linear Systems; SIAM: Philadelphia, PA, USA, 2006.
14. Chen, Y.; Davis, T.A.; Hager, W.W.; Rajamanickam, S. Algorithm 887: CHOLMOD, Supernodal Sparse Cholesky Factorization and Update/Downdate. ACM Trans. Math. Softw. 2008, 35, Article 22, 14 pp.
15. George, A.; Liu, J.W.H. Computer Solution of Large Sparse Positive Definite Systems; Prentice-Hall: Englewood Cliffs, NJ, USA, 1981; ISBN 978-0-13-165274-3.
16. NVIDIA Corporation. Jetson Orin Nano Super Developer Kit: Technical Specifications. Available online: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/nano-super-developer-kit/ (accessed on 22 June 2026).
17. Björck, Å. Numerical Methods for Least Squares Problems; SIAM: Philadelphia, PA, USA, 1996.
18. Paige, C.C.; Saunders, M.A. LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares. ACM Trans. Math. Softw. 1982, 8, 43–71.
19. Fong, D.C.-L.; Saunders, M.A. LSMR: An Iterative Algorithm for Sparse Least-Squares Problems. SIAM J. Sci. Comput. 2011, 33, 2950–2971.
20. Saad, Y. Iterative Methods for Sparse Linear Systems, 2nd ed.; SIAM: Philadelphia, PA, USA, 2003.
21. Kaczmarz, S. Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Int. Acad. Pol. Sci. Lett. Cl. Sci. Math. Nat. Sér. A 1937, 355–357.
22. Strohmer, T.; Vershynin, R. A Randomized Kaczmarz Algorithm with Exponential Convergence. J. Fourier Anal. Appl. 2009, 15, 262–278.
23. Needell, D.; Srebro, N.; Ward, R. Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz Algorithm. Math. Program. 2016, 155, 549–573.
24. Needell, D.; Tropp, J.A. Paved with Good Intentions: Analysis of a Randomized Block Kaczmarz Method. Linear Algebra Appl. 2014, 441, 199–221.
25. NTNU Autonomous Robots Laboratory. Ballast Water Tank Dataset. Hugging Face Datasets. Available online: https://huggingface.co/datasets/ntnu-arl/ballast_water_tank_dataset (accessed on 22 June 2026).
26. Kim, S.; Hsiao, R.; Nikolić, B.; Demmel, J.; Shao, Y.S. SuperNoVA: Algorithm–Hardware Co-Design for Resource-Aware SLAM. In Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Rotterdam, The Netherlands, 30 March–3 April 2025; pp. 1035–1051.

Figure 1. Proposed marine-inspection simultaneous localization and mapping (SLAM) architecture. The frontend combines light detection and ranging (LiDAR) odometry, robust local hierarchical bundle adjustment (HBA), soft fallback, and local-map compression. The backend assembles weighted factor rows in compressed sparse row (CSR) format and applies Kaczmarz projections subject to keyframe, point-cap, and optimization-range controls.

Figure 2. Transformation of a relative-pose factor into a row-wise Kaczmarz update. Linearization and whitening of factor k produce a sparse block row whose nonzero column blocks correspond only to the connected pose increments. Stacking the factor rows yields the least-squares system $(A, b)$; each Kaczmarz step then selects one scalar row $a_{\ell}^{\top}$ and its right-hand-side element $b_{\ell}$.

Figure 3. Sparsity pattern of the assembled Jacobian $A$ for the controlled 100-keyframe Park pose graph. The dominant diagonal structure is produced by local relative-pose factors, whereas the lower off-diagonal blocks represent nonlocal constraints.

Figure 4. Top-down trajectory comparison for NTNU Ballast Water Tank (a) mission 1, (b) mission 2, and (c) mission 3. The black solid curve is the stored comparison trajectory (/laser_mapping_path); the orange dashed curve is the final robust local bundle adjustment trajectory after verified $\mathrm{Sim}(3)$ alignment. Start and end markers are shown, and the colored keyframe markers encode paired position error. Axes are in metres.

Figure 5. Path-normalized root-mean-square error (RMSE) for the baseline and robust local bundle adjustment configurations on NTNU Ballast Water Tank missions 1–3. The dashed reference indicates 0.4% of the corresponding reference path length.

Figure 6. Solver-side structural-memory estimates for the Kaczmarz, LSQR, preconditioned conjugate-gradient (PCG), and Cholesky/$L D L^{\top}$ backends in the controlled row-access comparison. These estimates are not equivalent to full-pipeline peak resident set size (RSS).

Table 1. Norwegian University of Science and Technology (NTNU) Ballast Water Tank mission scale.

Dataset	Frames	Raw points	Mean points per frame	Path length (m)	Bounding-box diagonal (m)
NTNU mission 1	851	27.9 million	32.8 thousand	58.48	9.486
NTNU mission 2	1202	39.4 million	32.8 thousand	76.52	10.509
NTNU mission 3	1084	35.5 million	32.8 thousand	103.96	15.411

Table 2. Archived NTNU trajectory errors for the baseline and robust local bundle adjustment (BA) configurations. RMSE denotes root-mean-square error, and BBox denotes the reference-trajectory bounding-box diagonal.

Dataset	Evaluation state	RMSE (m)	Maximum error (m)	RMSE/path (%)	RMSE/BBox (%)
Mission 1	Baseline verified trajectory	0.211	0.699	0.361	2.225
Mission 1	Robust local BA, verified $\mathrm{Sim}(3)$	0.080	0.163	0.137	0.843
Mission 2	Baseline verified trajectory	0.284	1.663	0.372	2.706
Mission 2	Robust local BA, verified $\mathrm{Sim}(3)$	0.110	0.181	0.144	1.047
Mission 3	Baseline verified trajectory	0.292	1.161	0.281	1.895
Mission 3	Robust local BA, verified $\mathrm{Sim}(3)$	0.127	0.184	0.122	0.824
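The two normalized columns follow from dividing the RMSE by the reference path length and by the bounding-box diagonal listed in Table 1. The short check below recomputes them from the rounded table entries; the variable and function names are illustrative, and small last-digit differences from the reported ratios are expected because the inputs are rounded to three decimals.

```python
# Reference scales from Table 1: (path length in m, bounding-box diagonal in m).
SCALES = {1: (58.48, 9.486), 2: (76.52, 10.509), 3: (103.96, 15.411)}

# Rows of Table 2: (mission, state, RMSE in m, reported RMSE/path %, reported RMSE/BBox %).
ROWS = [
    (1, "baseline", 0.211, 0.361, 2.225),
    (1, "robust", 0.080, 0.137, 0.843),
    (2, "baseline", 0.284, 0.372, 2.706),
    (2, "robust", 0.110, 0.144, 1.047),
    (3, "baseline", 0.292, 0.281, 1.895),
    (3, "robust", 0.127, 0.122, 0.824),
]


def normalized_percent(rmse_m, length_m):
    """RMSE expressed as a percentage of a reference length."""
    return 100.0 * rmse_m / length_m


for mission, state, rmse, _path_pct, _bbox_pct in ROWS:
    path_m, bbox_m = SCALES[mission]
    print(
        f"mission {mission} {state:8s} "
        f"RMSE/path = {normalized_percent(rmse, path_m):.3f}%  "
        f"RMSE/BBox = {normalized_percent(rmse, bbox_m):.3f}%"
    )
```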

Table 3. Selected results from the controlled backend-structure comparison. CSR and CSC denote compressed sparse row and compressed sparse column, respectively; LSQR denotes least-squares QR; PCG denotes preconditioned conjugate gradient; and LDLT denotes an $L D L^{\top}$ factorization.

Backend	Storage mode	Mean residual	Mean runtime (s)	Structural memory (MB)	Working set (MB)
Kaczmarz	CSR row-only	0.158	0.1210	0.075	0.062
LSQR	CSR and CSC transpose cache	0.011	0.0190	0.149	0.118
PCG	Explicit sparse $H = A^{\top} A$	0.011	0.0047	0.177	0.094
Cholesky/LDLT	Explicit $H$ and sparse factorization	0.011	0.0015	0.346	0.056

Table 4. Representative full-pipeline operating policies obtained from the Park dataset. BA denotes bundle adjustment, and RSS denotes resident set size.

Operating point	Principal settings	Global BA state	Output poses	Peak RSS (MB)	Runtime (s)
Memory-priority	Stride 7; leaf 0.8; 50,000-point cap	Disabled	57	165.1	6.585
Balanced	Stride 5; leaf 0.8; 50,000-point cap; 20-pose limit	Skipped by range guard	135	178.5	13.56
Quality-priority	Stride 5; leaf 0.2; 100,000-point cap; 100-pose limit	Skipped by range guard	135	411.4	20.58
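The operating points in Table 4 can serve as a lookup for a memory-aware policy. The sketch below picks the most memory-demanding point whose measured peak RSS still fits under a budget after a safety margin. The selection rule, the 20% default margin, and the function name are assumptions; the RSS and runtime values are those measured on the Park dataset without enforced memory limits, so they should be re-measured on the target hardware before being used as thresholds.

```python
# (name, peak RSS in MB, runtime in s) from Table 4, ordered by increasing memory.
OPERATING_POINTS = [
    ("Memory-priority", 165.1, 6.585),
    ("Balanced", 178.5, 13.56),
    ("Quality-priority", 411.4, 20.58),
]


def select_operating_point(budget_mb, margin=0.2):
    """Return the last listed point that fits in budget_mb * (1 - margin), or None."""
    usable_mb = budget_mb * (1.0 - margin)
    chosen = None
    for name, peak_rss_mb, _runtime_s in OPERATING_POINTS:
        if peak_rss_mb <= usable_mb:
            chosen = name
    return chosen


print(select_operating_point(250))  # Balanced
print(select_operating_point(600))  # Quality-priority
print(select_operating_point(100))  # None
```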

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
