diffct: Differentiable CT Operators from Circular Orbits to Arbitrary Trajectories

Yipeng Sun; Linda-Sophie Schneider; Chengze Ye; Andreas Maier

doi:10.20944/preprints202605.1446.v1

Submitted: 21 May 2026

Posted: 21 May 2026


Abstract

diffct is a CUDA-accelerated computed tomography library that exposes differentiable forward operators and their exact discrete adjoints for 2D parallel beam, 2D fan beam, and 3D cone beam imaging. The main branch provides the stable circular-orbit lineage released on PyPI, including Siddon and separable-footprint (SF) projector families, while the dev branch extends the Siddon-based projector/backprojector interface to arbitrary per-view trajectories through explicit source and detector arrays. This report rewrites the project description directly from the current source code, examples, tests, and the related CT literature. We formalize the geometry parameterization used by the implementation, derive the differentiable Siddon-style projector and its exact discrete adjoint, explain how gradients are transported through torch.autograd.Function wrappers backed by Numba CUDA kernels, document the analytical filtered backprojection and Feldkamp–Davis–Kress pipelines implemented on main and ported to the dev branch, and record how the circular-orbit SF algorithms from main fit into the broader architecture.

Keywords: computed tomography; differentiable programming; CUDA; PyTorch; arbitrary trajectory CT

Subject: Computer Science and Mathematics - Software

1. Introduction

Computed tomography (CT) reconstruction still relies on three closely related operator types that must agree with each other numerically: a forward model that maps an attenuation field to projection data, a pure adjoint backprojection that propagates information back to image space, and reconstruction operators that approximate inversion under additional geometry assumptions [1]. Modern learning-based pipelines add a differentiability requirement so that image-domain variables, regularization parameters, and surrounding network parameters can be optimized end to end. diffct addresses this intersection by combining CUDA kernels, PyTorch autograd wrappers, and analytical reconstruction helpers in one small library [2,3].

The current repository has two public branch lineages. The main branch is the stable circular-orbit version published on PyPI. The dev branch keeps the same three geometry families and the same Siddon projector/backprojector class names, while replacing closed-form circular geometry parameters by explicit per-view trajectory tensors. In practice this means that circular, spiral, sinusoidal, saddle, random, and user-defined trajectories all flow through the same projector kernels without changing the CUDA implementation. The trajectory-array analytical helpers on dev track the overhaul merged into main 1.2.10/1.2.11; the current public main branch has since advanced to the 1.3.4 release line.

The report focuses on four contributions. First, it gives a single notation for the main circular-orbit parameterization and the dev arbitrary-trajectory generalization. Second, it records the differentiable projector/backprojector pair implemented in the source, following the discrete ray-tracing lineage of [4]. Third, it derives the analytical reconstruction helpers actually used in the examples and tests, including cosine pre-weighting, Parker weighting, ramp filtering, angular integration weights, and weighted voxel-driven backprojection [5,6]. Fourth, it summarizes the repository-level validation logic encoded in adjoint, gradcheck, reconstruction-accuracy, offset, smoke, and benchmark tests, positioning diffct relative to differentiable CT libraries such as PYRO-NN and recent geometry-learning work [7,8].

Code and availability.

The source repository is hosted at https://github.com/sypsyp97/diffct, and stable releases are distributed through PyPI at https://pypi.org/project/diffct/ (pip install diffct).

2. Related Work

diffct sits at the intersection of four established CT lines of work. The first is classical analytical reconstruction. Parallel-beam FBP, fan-beam short-scan weighting, and cone-beam FDK remain the baseline reference formulas for the circular-orbit setting [1,5,6]. The second is discrete projector design, where Siddon’s plane-intersection traversal remains one of the most widely used ray-driven baselines [4]. The third is footprint-based projector design, especially the separable-footprint family in [9], which showed how voxel footprints can be approximated efficiently by separable trapezoidal or rectangular detector-domain kernels. The fourth is differentiable CT software, where libraries such as PYRO-NN and LEAP established the value of pairing GPU projectors with modern deep-learning workflows [7,10,11].

The dev branch also benefits from a broader arbitrary-trajectory literature. Exact cone-beam inversion is governed by completeness conditions and data sufficiency constraints, classically associated with Tuy and Smith [12,13]. Recent work such as DRACO revisits these ideas in a differentiable setting and makes arbitrary-orbit reconstruction a first-class optimization target [14]. diffct occupies a narrower but useful engineering position in this landscape: it exposes explicit per-view geometry tensors, exact discrete adjoints for Siddon-style projectors, and a practical analytical reconstruction path that shares geometry conventions with PyTorch-based reconstruction workflows [3,8].

3. Branch Lineage and Scope

3.1. Why the Report Is Centered on `dev`

The dev branch is the most complete mathematical target for a full report because it captures the generalized Siddon-based interface for circular and non-circular trajectories. The main branch remains the reference line for the circular-orbit SF family. The public API on dev re-exports the same high-level Siddon operator families:

ParallelProjectorFunction / ParallelBackprojectorFunction,
FanProjectorFunction / FanBackprojectorFunction,
ConeProjectorFunction / ConeBackprojectorFunction,
geometry generators for parallel, fan, and cone beam scanning,
analytical helpers for FBP/FDK pipelines adapted to trajectory arrays.

The report therefore uses the dev Siddon/trajectory-array interface as the primary mathematical model and treats the main branch as the circular-orbit specialization and SF reference line.

3.2. Main Versus Dev

Table 1 summarizes the operational distinction between the two branches.

The unresolved gap is equally important. The main branch contains separable-footprint backends that depend on closed-form circular geometry and therefore remain branch-local. In 2D fan beam, backend="sf" denotes a separable trapezoidal detector footprint. In 3D cone beam, backend="sf_tr" uses a transaxial trapezoid with an axial rectangle, while backend="sf_tt" uses trapezoids in both detector directions, following the separable-footprint taxonomy in [9]. The dev branch keeps the same autograd structure and trajectory-array analytical helpers but omits these SF kernels because their footprint construction is still specialized to circular geometry.

The branch split also creates explicit compatibility costs. Existing code written against scalar angles/sid/sdd/backend signatures must now construct per-view geometry tensors before calling the projector classes. The separable-footprint backends available on main are absent on dev. The detector-grid convention on the dev analytical helpers uses $(k - N / 2) Δ u$, whereas the current main circular closed-form helpers still center detector bins with $(k - (N - 1) / 2) Δ u$, so copying parameters across branches without adjustment introduces a half-cell shift for even detector sizes. The report therefore uses the dev equations as the global interface and treats the main SF family as an important circular-orbit specialization that still needs to be documented explicitly.

4. Unified Geometry Model

4.1. Discrete Detector Coordinates

The trajectory-array analytical helpers on dev use the detector-cell convention

u_{k} = (k - \frac{N_{u}}{2}) Δ u + u_{0}, v_{ℓ} = (ℓ - \frac{N_{v}}{2}) Δ v + v_{0},

(1)

where $N_{u}$ and $N_{v}$ denote detector sizes, $Δ u$ and $Δ v$ denote detector spacing, and $u_{0}, v_{0}$ are optional offsets. This section follows the dev analytical-helper convention with the factor $N_{u} / 2$ rather than $(N_{u} - 1) / 2$; the current main circular closed-form helpers still use $(N_{u} - 1) / 2$.
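The difference between the two centering conventions can be made concrete with a short NumPy sketch. The function name `detector_coords` and its `convention` argument are hypothetical and are not part of the diffct API; the block only transcribes Equation (1) and the main-branch variant described above.

```python
import numpy as np

def detector_coords(n_cells, spacing, offset=0.0, convention="dev"):
    """Detector-cell coordinates under one of two centering conventions.

    "dev"  : (k - N/2) * spacing + offset        (Equation (1))
    "main" : (k - (N-1)/2) * spacing + offset    (circular closed-form helpers)
    The function name and signature are illustrative, not diffct API.
    """
    k = np.arange(n_cells, dtype=np.float64)
    if convention == "dev":
        center = n_cells / 2.0
    elif convention == "main":
        center = (n_cells - 1) / 2.0
    else:
        raise ValueError(f"unknown convention: {convention!r}")
    return (k - center) * spacing + offset

# Four detector cells with spacing 1.5 under both conventions.
u_dev = detector_coords(4, 1.5, convention="dev")
u_main = detector_coords(4, 1.5, convention="main")
shift = u_main - u_dev  # constant offset between the two grids
```

In this example the two grids differ by `spacing / 2` in every cell, which is the half-cell shift that has to be absorbed into $u_{0}$ when parameters are moved between branches.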

4.2. Parallel-Beam Parameterization

For a view index n, the dev branch represents 2D parallel geometry by a unit ray direction $d_{n} \in R^{2}$, a detector origin $o_{n} \in R^{2}$, and a detector axis $e_{u, n} \in R^{2}$ perpendicular to $d_{n}$. Detector sample k is anchored at

r_{n, k} (0) = o_{n} + u_{k} e_{u, n}, r_{n, k} (t) = r_{n, k} (0) + t d_{n}, {∥ d_{n} ∥}_{2} = 1 .

(2)

The circular-orbit helper used as the main specialization is

d (θ) = [\begin{matrix} cos θ \\ sin θ \end{matrix}], o (θ) = [\begin{matrix} - d_{\det} sin θ \\ d_{\det} cos θ \end{matrix}], e_{u} (θ) = [\begin{matrix} - sin θ \\ cos θ \end{matrix}] .

(3)

Here $θ \in [0, π)$ is the non-redundant ray-direction angle used by the implementation, and $d_{\det}$ is the detector-to-isocenter distance. The angle $θ$ differs from the classical Radon-transform normal angle by a fixed $π / 2$ reparameterization.

4.3. Fan-Beam Parameterization

For fan beam CT the implementation stores a source position $S_{n} \in R^{2}$, a detector center $C_{n} \in R^{2}$, and a detector axis $e_{u, n} \in R^{2}$. The detector point associated with cell k is

D_{n, k} = C_{n} + u_{k} e_{u, n}, {\hat{d}}_{n, k} = \frac{D_{n, k} - S_{n}}{∥ D_{n, k} - S_{n} ∥_{2}} .

(4)

The circular-orbit specialization used on the stable branch is

S (β) = [\begin{matrix} - sid sin β \\ sid cos β \end{matrix}], C (β) = [\begin{matrix} (sdd - sid) sin β \\ - (sdd - sid) cos β \end{matrix}], e_{u} (β) = [\begin{matrix} cos β \\ sin β \end{matrix}] .

(5)

Here $β \in [0, 2 π)$ is the view angle, $sid$ denotes the source-to-isocenter distance, and $sdd$ denotes the source-to-detector distance. Per-view quantities reuse the subscript n (e.g. ${sid}_{n}$ below).

4.4. Cone-Beam Parameterization

For cone beam CT each view uses a source $S_{n} \in R^{3}$, detector center $C_{n} \in R^{3}$, and orthonormal detector basis vectors $e_{u, n}, e_{v, n} \in R^{3}$. Detector coordinate $(k, ℓ)$ maps to

D_{n, k, ℓ} = C_{n} + u_{k} e_{u, n} + v_{ℓ} e_{v, n}, {\hat{d}}_{n, k, ℓ} = \frac{D_{n, k, ℓ} - S_{n}}{∥ D_{n, k, ℓ} - S_{n} ∥_{2}} .

(6)

The circular-orbit 3D helper is

S (β) = [\begin{matrix} - sid sin β \\ sid cos β \\ 0 \end{matrix}], C (β) = [\begin{matrix} (sdd - sid) sin β \\ - (sdd - sid) cos β \\ 0 \end{matrix}], e_{u} (β) = [\begin{matrix} cos β \\ sin β \\ 0 \end{matrix}], e_{v} = [\begin{matrix} 0 \\ 0 \\ 1 \end{matrix}] .

(7)
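As a minimal sketch, Equation (7) can be transcribed into NumPy arrays with one row per view. The function name `circular_cone_geometry` and the numeric values are assumptions chosen for illustration; the actual diffct.geometry generators may use different names, argument orders, and tensor types.

```python
import numpy as np

def circular_cone_geometry(betas, sid, sdd):
    """Per-view source, detector center, and detector basis from Equation (7).

    Hypothetical helper for illustration; returns arrays of shape (n_views, 3).
    """
    betas = np.asarray(betas, dtype=np.float64)
    s, c = np.sin(betas), np.cos(betas)
    zeros, ones = np.zeros_like(betas), np.ones_like(betas)
    src = np.stack([-sid * s, sid * c, zeros], axis=-1)
    det_center = np.stack([(sdd - sid) * s, -(sdd - sid) * c, zeros], axis=-1)
    det_u = np.stack([c, s, zeros], axis=-1)
    det_v = np.stack([zeros, zeros, ones], axis=-1)
    return src, det_center, det_u, det_v

betas = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
src, det_center, det_u, det_v = circular_cone_geometry(betas, sid=500.0, sdd=800.0)

# Sanity quantities: source radius, source-to-detector distance, and the
# projection of the central ray onto the detector axis (should vanish).
src_radius = np.linalg.norm(src, axis=-1)
central_ray = det_center - src
central_len = np.linalg.norm(central_ray, axis=-1)
central_dot_u = np.sum(central_ray * det_u, axis=-1)
```

The three derived arrays confirm that the source stays at distance sid from the isocenter, the detector center lies at distance sdd from the source, and the central ray is orthogonal to $e_{u}$.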

4.5. Non-Circular Trajectories

The arbitrary-trajectory generators in diffct.geometry instantiate several useful families. For example, the spiral cone-beam trajectory uses

β_{n} = β_{0} + \frac{2 π N_{turns}}{N_{views}} n, z_{n} \in [- \frac{z_{max}}{2}, \frac{z_{max}}{2}],

(8)

where $N_{views}$ is the number of views in the scan, $N_{turns}$ is the number of full helical turns, $β_{0}$ is the start angle, and $z_{n}$ is the axial source offset of view n bounded by $z_{max}$. The corresponding source and detector-center positions are

S_{n}^{spiral} = [\begin{matrix} - sid sin β_{n} \\ sid cos β_{n} \\ z_{n} \end{matrix}], C_{n}^{spiral} = [\begin{matrix} (sdd - sid) sin β_{n} \\ - (sdd - sid) cos β_{n} \\ z_{n} \end{matrix}] .

(9)

The sinusoidal and saddle variants modulate the source-to-isocenter distance and axial position,

{sid}_{n} = sid + A sin (ω β_{n}),

(10)

z_{n} = A_{z} cos (2 β_{n}), {sid}_{n} = sid + A_{r} sin (2 β_{n}),

(11)

where $A, A_{r}$ are radial modulation amplitudes, $A_{z}$ is the axial modulation amplitude, and $ω \in Z^{+}$ is the angular frequency of the sinusoidal wobble. The custom generators additionally accept user-defined source paths or ray fields. The mathematical consequence is simple: every downstream operator only sees the per-view geometry tensors, so the projector definition is trajectory agnostic.
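The spiral family of Equations (8) and (9) can be sketched in a few lines of NumPy. Equation (8) only bounds $z_{n}$, so the linear spacing of $z_{n}$ used here is an assumption, as are the function name `spiral_cone_geometry` and the example values.

```python
import numpy as np

def spiral_cone_geometry(n_views, n_turns, sid, sdd, z_max, beta0=0.0):
    """Spiral source and detector-center positions, Equations (8) and (9).

    Hypothetical helper; z_n is ASSUMED to advance linearly from -z_max/2
    to +z_max/2, which the text does not state explicitly.
    """
    n = np.arange(n_views, dtype=np.float64)
    betas = beta0 + 2.0 * np.pi * n_turns * n / n_views
    z = np.linspace(-z_max / 2.0, z_max / 2.0, n_views)  # assumed linear pitch
    s, c = np.sin(betas), np.cos(betas)
    src = np.stack([-sid * s, sid * c, z], axis=-1)
    det_center = np.stack([(sdd - sid) * s, -(sdd - sid) * c, z], axis=-1)
    return betas, src, det_center

betas, src, det_center = spiral_cone_geometry(
    n_views=360, n_turns=2, sid=500.0, sdd=800.0, z_max=40.0
)
```

Because the downstream operators only consume the resulting per-view arrays, swapping this generator for a circular, sinusoidal, or saddle one changes nothing in the projector call.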

This abstraction is stronger than the analytical theory that supports it. The forward and adjoint Siddon operators only require valid ray definitions, so they transfer immediately from circular to non-circular orbits. Exact cone-beam inversion, by contrast, depends on data-completeness conditions of the Tuy–Smith type [12,13]. diffct therefore treats arbitrary trajectories as a first-class capability for projection, adjoint backprojection, and iterative optimization, while its analytical fan/cone reconstruction helpers should be interpreted as geometry-aware practical reconstructions whose exactness is guaranteed in the classical circular settings and whose usefulness outside that regime is empirical.

Figure 1. Unified geometry parameterization used by diffct. The parallel-beam panel shows the ray direction $d (θ)$, detector origin $o (θ)$, and detector axis $e_{u}$; the fan- and cone-beam panels highlight the source $S (β)$, detector plane, and basis vectors $e_{u}, e_{v}$; the fourth panel previews circular, sinusoidal, spiral, and saddle trajectories available through the diffct.geometry generators.


5. Differentiable Forward and Adjoint Operators

5.1. Continuous Forward Model

Let $f : R^{d} \to R$ denote the attenuation field, with $d = 2$ for the parallel/fan-beam operators and $d = 3$ for the cone-beam operator, and let $t \in R$ be the ray arc-length parameter. The projector implemented in diffct computes line integrals along rays determined by the geometry tensors. For parallel beam,

P_{par} [f] (n, k) = \int_{R} f (r_{n, k} (t)) d t,

(12)

with $r_{n, k} (t)$ from Equation (2). For fan beam,

P_{fan} [f] (n, k) = \int_{0}^{\infty} f (S_{n} + t {\hat{d}}_{n, k}) d t,

(13)

and for cone beam,

P_{cone} [f] (n, k, ℓ) = \int_{0}^{\infty} f (S_{n} + t {\hat{d}}_{n, k, ℓ}) d t .

(14)

These equations are branch-independent. The difference between main and dev lies only in how the trajectory tensors are produced.

5.2. Siddon Traversal with Cell-Constant Integration

The CUDA kernels implement a Siddon-style voxel traversal [4]. Consider a ray $r (t) = o + t d$ passing through an axis-aligned voxel grid. Intersections with grid planes satisfy

t_{m}^{(q)} = \frac{b_{m}^{(q)} - o_{m}}{d_{m}}, m \in {x, y, z},

(15)

where $o$ and $d$ are the ray origin and direction, $o_{m}$ and $d_{m}$ are their m-th components, q indexes the grid planes along axis m, and $b_{m}^{(q)}$ is the corresponding grid-plane coordinate. Sorting all valid intersection parameters yields segment boundaries $t_{0} < t_{1} < \dots < t_{M}$ inside the volume, where M is the number of segments the ray cuts across the grid. On each segment the attenuation field is taken to be constant at the value of the single pixel (2D) or voxel (3D) that the segment traverses, so the continuous line integral is approximated as

P [f] (ray) \approx \sum_{m = 0}^{M - 1} Δ t_{m} f_{cell (m)}, Δ t_{m} = t_{m + 1} - t_{m},

(16)

where $cell (m)$ denotes the discrete grid index of the cell traversed by segment m (here m counts segments, not the coordinate axes of Equation (15)) and $Δ t_{m}$ is the exact chord length of the segment inside that cell. This matches the inner loop of the diffct projector kernels, which step from one grid-plane crossing to the next and accumulate d_image[iy, ix] * seg_len (2D) or d_vol[iz, iy, ix] * seg_len (3D) without any sub-cell interpolation. The forward/adjoint kernel pair is therefore matched by construction: the adjoint scatters the incoming ray-domain gradient into the same cell with the same segment length, so the discrete transpose identity holds up to floating-point tolerance (verified by test_adjoint_inner_product.py).
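The cell-constant traversal of Equations (15) and (16) can be illustrated by a CPU sketch in NumPy. This is not the diffct kernel: it sorts all plane crossings up front instead of stepping incrementally, handles a single finite ray segment, and uses hypothetical function names (`siddon_2d`, `siddon_2d_adjoint`). It does, however, follow the same gather/scatter pairing described above.

```python
import numpy as np

def siddon_2d(image, p0, p1, origin=(0.0, 0.0), spacing=1.0):
    """Cell-constant line integral of `image` along the segment p0 -> p1.

    Returns (integral, segments) where segments lists (iy, ix, seg_len).
    Illustrative CPU version; points are given as (x, y).
    """
    image = np.asarray(image, dtype=np.float64)
    ny, nx = image.shape
    p0 = np.asarray(p0, dtype=np.float64)
    p1 = np.asarray(p1, dtype=np.float64)
    delta = p1 - p0
    length = float(np.hypot(delta[0], delta[1]))
    planes = (origin[0] + spacing * np.arange(nx + 1),
              origin[1] + spacing * np.arange(ny + 1))
    # Parametric crossings with every grid plane, Equation (15).
    alphas = [np.array([0.0, 1.0])]
    for axis in (0, 1):
        if delta[axis] != 0.0:
            alphas.append((planes[axis] - p0[axis]) / delta[axis])
    alphas = np.unique(np.concatenate(alphas))
    alphas = alphas[(alphas >= 0.0) & (alphas <= 1.0)]
    total, segments = 0.0, []
    for a0, a1 in zip(alphas[:-1], alphas[1:]):
        mid = p0 + 0.5 * (a0 + a1) * delta  # midpoint identifies the cell
        ix = int(np.floor((mid[0] - origin[0]) / spacing))
        iy = int(np.floor((mid[1] - origin[1]) / spacing))
        if 0 <= ix < nx and 0 <= iy < ny:
            seg_len = (a1 - a0) * length
            segments.append((iy, ix, seg_len))
            total += image[iy, ix] * seg_len  # Equation (16)
    return total, segments

def siddon_2d_adjoint(shape, segments, ray_grad):
    """Scatter one ray-domain value into the cells visited by the forward pass."""
    grad = np.zeros(shape, dtype=np.float64)
    for iy, ix, seg_len in segments:
        grad[iy, ix] += ray_grad * seg_len  # serial stand-in for an atomic add
    return grad

image = np.ones((4, 4))
image[1, :] = 2.0
total_h, segments_h = siddon_2d(image, (-1.0, 1.5), (5.0, 1.5))  # horizontal ray
total_d, segments_d = siddon_2d(image, (0.0, 0.0), (4.0, 4.0))   # diagonal ray
back_d = siddon_2d_adjoint(image.shape, segments_d, 3.0)
```

The horizontal ray crosses four cells of value 2 with unit chord length, and the diagonal ray crosses four cells with chord length $\sqrt{2}$ each. Scattering a ray value back with the same segment lengths reproduces the single-ray form of the transpose identity.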

5.3. Adjoint Identity and Autograd

diffct exposes the projector and backprojector as paired torch.autograd.Function classes. The design target is the linear-adjoint relation

〈 P_{Θ} x, y 〉 = 〈 x, P_{Θ}^{⊤} y 〉,

(17)

where $Θ$ collects all geometry tensors. The test suite checks Equation (17) for parallel, fan, and cone beam projectors. The backward pass of a forward projector therefore executes the pure adjoint backprojection kernel rather than an analytical FBP/FDK kernel.

This distinction is essential. The analytical reconstruction operators introduced later are not the same as the autograd adjoints because analytical FBP/FDK require geometry-dependent weighting and scaling. By contrast, the autograd backward uses the exact discrete transpose of the corresponding forward kernel. If

y = P_{Θ} x, z = P_{Θ}^{⊤} s,

(18)

then the implemented backward rules are

\begin{matrix} \frac{\partial L}{\partial x} & = P_{Θ}^{⊤} \frac{\partial L}{\partial y}, \frac{\partial L}{\partial Θ} = 0 for the current wrappers, \end{matrix}

(19)

\begin{matrix} \frac{\partial L}{\partial s} & = P_{Θ} \frac{\partial L}{\partial z}, \frac{\partial L}{\partial Θ} = 0 for the current wrappers . \end{matrix}

(20)

The zero geometry gradient in Equations (19) and (20) is an implementation fact, not a mathematical impossibility: the source and detector tensors are saved in the autograd context and reused during backward, but the backward methods return None for geometry arguments. In other words, diffct currently differentiates with respect to image and sinogram variables while treating geometry as fixed runtime input. This is consistent with the repository’s present optimization examples, which solve for reconstructions while holding trajectories fixed.

For a least-squares inverse problem

\hat{f} = arg min_{f} {∥ P_{Θ} f - p ∥}_{2}^{2} + λ R (f),

(21)

the gradient of the data term, taken with the conventional factor $\frac{1}{2}$, is

\nabla_{f} \frac{1}{2} {∥ P_{Θ} f - p ∥}_{2}^{2} = P_{Θ}^{⊤} (P_{Θ} f - p) .

(22)

This is exactly the quantity propagated by PyTorch through the diffct projector classes. At kernel level, the forward path accumulates line integrals ray by ray, whereas the backward path revisits the same traversed segments and scatters each incoming gradient into the single pixel or voxel that the segment traverses, weighted by the segment length $Δ t_{m}$ of Equation (16). Because many rays contribute to the same image cell, the backward kernels use cuda.atomic.add to preserve the discrete transpose relation.
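The gradient formula in Equation (22) can be checked numerically without a GPU. The sketch below replaces $P_{Θ}$ by a small random dense matrix, which is purely a stand-in assumption; it compares the closed-form gradient with central finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((6, 4))  # stand-in system matrix: 6 rays, 4 image cells
f = rng.random(4)       # current image estimate
p = rng.random(6)       # measured projections

def data_term(f_vec):
    """Half squared residual 0.5 * ||P f - p||^2."""
    r = P @ f_vec - p
    return 0.5 * float(r @ r)

# Closed-form gradient from Equation (22): adjoint applied to the residual.
grad_analytic = P.T @ (P @ f - p)

# Central finite differences along each coordinate direction.
eps = 1e-6
grad_fd = np.array([
    (data_term(f + eps * e) - data_term(f - eps * e)) / (2.0 * eps)
    for e in np.eye(4)
])
max_abs_err = float(np.max(np.abs(grad_analytic - grad_fd)))
```

The same comparison is what a gradcheck-style test performs on the real projector classes, with the matrix-vector products replaced by the forward and adjoint kernels.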

Algorithm 1 Autograd contract of the diffct projector wrappers

Require: image or volume $x$, geometry tensors $Θ$, output sinogram shape
Ensure: sinogram $y = P_{Θ} x$; on Backward, image gradient $g_{x} = P_{Θ}^{⊤} g_{y}$
1: procedure Forward($x, Θ$)
2: $x \leftarrow float32 (x) . contiguous () . cuda ()$
3: $\tilde{x}, \tilde{Θ} \leftarrow cuda_as_array (x), cuda_as_array (Θ)$ ▹ zero-copy Numba views
4: allocate $y \leftarrow 0$
5: KernelForward($\tilde{x}, \tilde{Θ}, y$) ▹ Siddon gather, current torch stream
6: $ctx . save (Θ)$
7: return $y$
8: end procedure
9: procedure Backward($g_{y}$) ▹ $g_{y} = \partial L / \partial y$
10: $Θ \leftarrow ctx . load ()$
11: allocate $g_{x} \leftarrow 0$
12: KernelAdjoint($g_{y}, \tilde{Θ}, g_{x}$) ▹ cuda.atomic.add scatter
13: return $(g_{x}, None, \dots, None)$ ▹ no gradient w.r.t. $Θ$
14: end procedure

Figure 2. Ray-driven forward gather and atomic-scatter adjoint. The forward kernel accumulates cell-constant voxel contributions along each ray (each traversed cell value weighted by its exact chord length) and produces the sinogram entry y. The adjoint kernel revisits the same traversed cells and scatters the incoming gradient back with cuda.atomic.add, which realizes the discrete transpose relation $〈 P_{Θ} x, y 〉 = 〈 x, P_{Θ}^{⊤} y 〉$ tested by test_adjoint_inner_product.py. The third panel sketches how the paired kernels plug into the PyTorch autograd contract for $x \to y$ and $\partial L / \partial y \to \partial L / \partial x$.


6. Analytical Reconstruction in `diffct`

6.1. Detector Coordinates and Angular Weights

The analytical module builds FBP/FDK pipelines out of small helpers. Detector coordinates reuse the dev convention in Equation (1). For open angle lists and short scans, angular integration weights are trapezoidal:

Δ β_{1} = \frac{β_{2} - β_{1}}{2}, Δ β_{n} = \frac{(β_{n} - β_{n - 1}) + (β_{n + 1} - β_{n})}{2}, Δ β_{N_{views}} = \frac{β_{N_{views}} - β_{N_{views} - 1}}{2} .

(23)

Here $\{β_{n}\}_{n = 1}^{N_{views}}$ are the per-view angles and $Δ β_{n}$ is the trapezoidal angular integration weight attached to view n. For periodic full circular fan or cone scans, the implementation closes the angle loop before applying the extra redundancy factor $1 / 2$. For a short scan with Parker weighting, the open-interval trapezoidal weights in Equation (23) are kept and redundancy is handled by the Parker window itself.
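The open-interval weights of Equation (23) can be written compactly with `np.diff`. The function name `trapezoidal_angular_weights` is hypothetical, and the sketch covers only the open-interval case, not the periodic closure used for full circular scans.

```python
import numpy as np

def trapezoidal_angular_weights(betas):
    """Open-interval trapezoidal weights of Equation (23) for sorted angles."""
    betas = np.asarray(betas, dtype=np.float64)
    if betas.size < 2:
        raise ValueError("need at least two view angles")
    gaps = np.diff(betas)                 # beta_{n+1} - beta_n
    weights = np.empty_like(betas)
    weights[0] = gaps[0] / 2.0            # first view: half of the first gap
    weights[-1] = gaps[-1] / 2.0          # last view: half of the last gap
    weights[1:-1] = (gaps[:-1] + gaps[1:]) / 2.0  # interior: mean of neighbors
    return weights

betas = np.linspace(0.0, np.pi, 5)        # five views over a half turn
weights = trapezoidal_angular_weights(betas)
```

For any sorted angle list the weights sum to the covered angular range, which is a convenient check when views are non-uniformly spaced.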

6.2. Ramp Filter and Apodization

Given a weighted sinogram $p_{w}$, the filtered projection is computed by a 1D FFT-based ramp filter along the detector axis:

q = F^{- 1} \{| 2 π ξ | W (ξ) F [p_{w}]\} \frac{1}{Δ s},

(24)

where $F$ and $F^{- 1}$ are the 1-D Fourier transform and its inverse along the detector axis, $ξ$ is the physical Fourier frequency obtained from the normalized torch.fft.fftfreq grid after division by the detector spacing, $W (ξ)$ is an apodization window, and $Δ s$ is the detector spacing along the filtered axis (either $Δ u$ or $Δ v$ depending on the pipeline). The explicit factor $1 / Δ s$ is the implementation’s conversion from sample frequency to physical units. The apodization window $W (ξ)$ supports Ram–Lak, Hann, Hamming, cosine, and Shepp–Logan choices. Using the normalized frequency $ν = 2 | ξ | \in [0, 1]$, the implemented window functions are

\begin{matrix} W_{Ram - Lak} (ν) & = 1, \end{matrix}

(25)

\begin{matrix} W_{Hann} (ν) & = \frac{1}{2} (1 + cos (π ν)), \end{matrix}

(26)

\begin{matrix} W_{Hamming} (ν) & = 0.54 + 0.46 cos (π ν), \end{matrix}

(27)

\begin{matrix} W_{Cosine} (ν) & = cos (\frac{π ν}{2}), \end{matrix}

(28)

\begin{matrix} W_{Shepp - Logan} (ν) & = \frac{sin (π ν / 2)}{π ν / 2} . \end{matrix}

(29)
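Equations (25) to (29) translate directly into NumPy. In this sketch the function name `apodization_window` and the string keys are hypothetical, and $ν$ is assumed to be built from the normalized cycles-per-sample grid returned by `np.fft.fftfreq`, so that $ν = 1$ at the Nyquist frequency.

```python
import numpy as np

def apodization_window(nu, name="ram-lak"):
    """Evaluate the windows of Equations (25)-(29) at normalized frequency nu."""
    nu = np.asarray(nu, dtype=np.float64)
    if name == "ram-lak":
        return np.ones_like(nu)
    if name == "hann":
        return 0.5 * (1.0 + np.cos(np.pi * nu))
    if name == "hamming":
        return 0.54 + 0.46 * np.cos(np.pi * nu)
    if name == "cosine":
        return np.cos(np.pi * nu / 2.0)
    if name == "shepp-logan":
        # np.sinc(x) = sin(pi x) / (pi x), so sinc(nu / 2) matches Equation (29)
        # and is well defined at nu = 0.
        return np.sinc(nu / 2.0)
    raise ValueError(f"unknown window: {name!r}")

# Normalized frequency grid for an 8-sample detector row (assumed convention).
nu = 2.0 * np.abs(np.fft.fftfreq(8))
hann = apodization_window(nu, "hann")
```

All five windows equal 1 at $ν = 0$, so they leave the low-frequency ramp response unchanged and differ only in how strongly they attenuate frequencies near Nyquist.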

6.3. Parallel-Beam FBP

Parallel beam requires no source-dependent distance weighting. Let $u_{n} (x)$ denote the detector coordinate obtained by projecting voxel $x$ onto the detector axis:

u_{n} (x) = (x - o_{n}) \cdot e_{u, n} .

(30)

After angular weighting and ramp filtering, diffct reconstructs via

B_{par} [q] (x) = \frac{1}{2 π} \sum_{n = 1}^{N_{views}} q_{n} (u_{n} (x)) .

(31)

The factor $1 / (2 π)$ is applied inside parallel_weighted_backproject, so the wrapper already returns an amplitude-calibrated reconstruction.

6.4. Fan-Beam FBP

For fan beam geometry diffct computes a per-view detector normal $n_{n}$ by rotating $e_{u, n}$ by $90^{\circ}$ and orienting it so that $(C_{n} - S_{n}) \cdot n_{n} > 0$. It then defines

{sdd}_{n} = (C_{n} - S_{n}) \cdot n_{n}, {sid}_{n} = (- S_{n}) \cdot n_{n}, U_{n} (x) = (x - S_{n}) \cdot n_{n} .

(32)

Here ${sdd}_{n}$ and ${sid}_{n}$ are the signed source-to-detector and source-to-isocenter distances along $n_{n}$, and $U_{n} (x)$ is the signed distance from the source to the plane orthogonal to $n_{n}$ through $x$; it reduces to ${sid}_{n}$ when $x$ is at the isocenter. For the classical circular short-scan specialization with constant $sdd$, the fan angle and cosine weight are

γ (u) = arctan (\frac{u}{sdd}), w_{cos}^{fan} (u) = \frac{sdd}{\sqrt{{sdd}^{2} + u^{2}}} = cos γ (u) .

(33)

For a short scan the Parker window used by the code is

w_{P} (β, γ) = \{\begin{matrix} {sin}^{2} (\frac{π}{4} \frac{β}{γ_{max} - γ}), & 0 \leq β < 2 (γ_{max} - γ), \\ 1, & 2 (γ_{max} - γ) \leq β \leq π - 2 γ, \\ {sin}^{2} (\frac{π}{4} \frac{π + 2 γ_{max} - β}{γ_{max} + γ}), & π - 2 γ < β \leq π + 2 γ_{max}, \\ 0, & otherwise, \end{matrix}

(34)

with $γ_{max} = {max}_{u} | γ (u) |$. This Parker window is the circular short-scan helper used by the code; the general trajectory-array fan pipeline on dev reuses it without elevating Parker weighting to a generic arbitrary-trajectory claim. A voxel $x$ is projected onto the detector plane through the source:

H_{n} (x) = S_{n} + \frac{{sdd}_{n}}{U_{n} (x)} (x - S_{n}), u_{n} (x) = (H_{n} (x) - C_{n}) \cdot e_{u, n} .

(35)
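Returning to the short-scan weighting, the Parker window of Equation (34) can be transcribed as a scalar function. The name `parker_weight` is hypothetical. The sketch also evaluates the conjugate ray $(β + π + 2 γ, - γ)$, whose weight should complement the original one so that each line is counted once.

```python
import math

def parker_weight(beta, gamma, gamma_max):
    """Parker short-scan weight, transcribed branch by branch from Equation (34)."""
    if 0.0 <= beta < 2.0 * (gamma_max - gamma):
        return math.sin(0.25 * math.pi * beta / (gamma_max - gamma)) ** 2
    if 2.0 * (gamma_max - gamma) <= beta <= math.pi - 2.0 * gamma:
        return 1.0
    if math.pi - 2.0 * gamma < beta <= math.pi + 2.0 * gamma_max:
        arg = (math.pi + 2.0 * gamma_max - beta) / (gamma_max + gamma)
        return math.sin(0.25 * math.pi * arg) ** 2
    return 0.0

gamma_max = 0.3
# A ray in the ramp-up region and its conjugate in the ramp-down region.
w_ray = parker_weight(0.2, 0.1, gamma_max)
w_conjugate = parker_weight(0.2 + math.pi + 2.0 * 0.1, -0.1, gamma_max)
# A ray in the plateau and a view beyond the short-scan range.
w_plateau = parker_weight(1.0, 0.0, gamma_max)
w_outside = parker_weight(4.0, 0.0, gamma_max)
```

The branch conditions guard the two divisions: the first branch is empty when $γ = γ_{max}$ and the third is empty when $γ = - γ_{max}$, so neither denominator can vanish inside its branch.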

diffct then applies the weighted FBP gather

B_{fan} [q] (x) = \frac{\bar{{sdd}_{n}}}{2 π \bar{{sid}_{n}}} \sum_{n = 1}^{N_{views}} {(\frac{{sid}_{n}}{U_{n} (x)})}^{2} q_{n} (u_{n} (x)),

(36)

where the overline denotes the mean over views. For a circular orbit the quantities $U_{n} (x)$ and $u_{n} (x)$ entering Equation (36) reduce to the textbook expressions

U (β; x, y) = sid + x sin β - y cos β, u (β; x, y) = sdd \frac{x cos β + y sin β}{U (β; x, y)} .

(37)
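The reduction from the trajectory-array expressions to Equation (37) can be verified numerically. The sketch below evaluates Equations (32) and (35) for a single view built from the circular parameterization of Equation (5) and compares the result with the closed form; both function names and all numeric values are assumptions for illustration.

```python
import numpy as np

def fan_project_generic(x, src, det_center, det_u):
    """U_n(x) and u_n(x) from the trajectory-array form, Equations (32) and (35)."""
    normal = np.array([-det_u[1], det_u[0]])   # e_u rotated by 90 degrees
    if np.dot(det_center - src, normal) < 0.0:
        normal = -normal                       # orient from source to detector
    sdd_n = np.dot(det_center - src, normal)
    big_u = np.dot(x - src, normal)
    hit = src + (sdd_n / big_u) * (x - src)    # intersection with detector line
    return big_u, np.dot(hit - det_center, det_u)

def fan_project_circular(x, beta, sid, sdd):
    """Closed-form circular-orbit expressions of Equation (37)."""
    big_u = sid + x[0] * np.sin(beta) - x[1] * np.cos(beta)
    u = sdd * (x[0] * np.cos(beta) + x[1] * np.sin(beta)) / big_u
    return big_u, u

beta, sid, sdd = 0.7, 500.0, 800.0
src = np.array([-sid * np.sin(beta), sid * np.cos(beta)])
det_center = np.array([(sdd - sid) * np.sin(beta), -(sdd - sid) * np.cos(beta)])
det_u = np.array([np.cos(beta), np.sin(beta)])
x = np.array([12.0, -30.0])

generic = fan_project_generic(x, src, det_center, det_u)
circular = fan_project_circular(x, beta, sid, sdd)
```

Agreement of the two tuples shows that the per-view normal construction recovers the usual circular magnification geometry, which is why the same gather can serve both branches.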

6.5. Cone-Beam FDK

The cone-beam extension follows the same structure with detector normal

n_{n} = σ_{n} \frac{e_{u, n} \times e_{v, n}}{∥ e_{u, n} \times e_{v, n} ∥_{2}},

(38)

where the sign $σ_{n} \in \{- 1, + 1\}$ is chosen so that $(C_{n} - S_{n}) \cdot n_{n} > 0$. The detector-normal distances are

{sdd}_{n} = (C_{n} - S_{n}) \cdot n_{n}, {sid}_{n} = (- S_{n}) \cdot n_{n}, U_{n} (x) = (x - S_{n}) \cdot n_{n} .

(39)

The cosine pre-weight is

w_{cos}^{cone} (u, v) = \frac{{sdd}_{n}}{\sqrt{{sdd}_{n}^{2} + u^{2} + v^{2}}} .

(40)

The detector intersection of a voxel $x$ is

H_{n} (x) = S_{n} + \frac{{sdd}_{n}}{U_{n} (x)} (x - S_{n}),

(41)

with

u_{n} (x) = (H_{n} (x) - C_{n}) \cdot e_{u, n}, v_{n} (x) = (H_{n} (x) - C_{n}) \cdot e_{v, n} .

(42)

The weighted FDK gather used by the code is

B_{cone} [q] (x) = \frac{\bar{{sdd}_{n}}}{2 π \bar{{sid}_{n}}} \sum_{n = 1}^{N_{views}} {(\frac{{sid}_{n}}{U_{n} (x)})}^{2} q_{n} (u_{n} (x), v_{n} (x)) .

(43)

For a circular orbit, with $U (β; x, y)$ as in Equation (37), the detector coordinates of Equation (42) become

u (β; x, y) = sdd \frac{x cos β + y sin β}{U (β; x, y)}, v (β; x, y, z) = sdd \frac{z}{U (β; x, y)} .

(44)

diffct applies the ramp filter row-wise along the u direction, matching the standard FDK construction. For non-circular trajectories, Equation (43) is a principled implementation-level extension rather than an exact closed-form inversion.

6.6. Main-Branch Separable-Footprint Family

The main branch adds a second circular-orbit backprojection family alongside the Siddon-based weighted gather path. These SF options are exposed through backend selectors on the stable fan- and cone-beam APIs, remain restricted to circular geometry, and inherit detector-footprint logic from [9] together with the matched-adjoint implementation style used in LEAP [10,11]. For a 2D fan-beam pixel $x$, let the four projected pixel corners induce the ordered detector abscissae

u_{min} \leq u_{lo} \leq u_{hi} \leq u_{max} .

(45)

The detector footprint is then approximated by the trapezoid

h_{n, x}^{fan} (u) = \{\begin{matrix} \frac{u - u_{min}}{u_{lo} - u_{min}}, & u_{min} \leq u < u_{lo}, \\ 1, & u_{lo} \leq u \leq u_{hi}, \\ \frac{u_{max} - u}{u_{max} - u_{hi}}, & u_{hi} < u \leq u_{max}, \\ 0, & otherwise . \end{matrix}

(46)

The SF fan-beam gather used in main can therefore be summarized as

B_{fan}^{SF} [q] (x) = c_{fan}^{SF} \sum_{n} γ_{n, x}^{fan} \int_{R} h_{n, x}^{fan} (u) q_{n} (u) d u,

(47)

where the implementation uses a LEAP-inspired chord-weighted factor

γ_{n, x}^{fan} = \frac{{sid}_{n}}{U_{n} (x)} ℓ_{ϕ, n} (x),

(48)

with $ℓ_{ϕ, n} (x)$ denoting the in-plane chord through the unit pixel along the source-to-pixel direction. Relative to the Siddon analytical gather in Equation (36), the distinguishing feature is the explicit integration over detector footprint support rather than point sampling at $u_{n} (x)$.
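The trapezoidal footprint of Equation (46) can be evaluated from the four projected corner abscissae as follows. The function name `trapezoid_footprint` is hypothetical, and the sketch covers only the footprint shape, not the chord-weighted factor of Equation (48) or the detector-cell integration performed by the main-branch kernels.

```python
import numpy as np

def trapezoid_footprint(u, corners):
    """Evaluate the trapezoid of Equation (46) at detector positions u.

    `corners` holds the four projected pixel-corner abscissae in any order.
    """
    u_min, u_lo, u_hi, u_max = np.sort(np.asarray(corners, dtype=np.float64))
    u = np.atleast_1d(np.asarray(u, dtype=np.float64))
    h = np.zeros_like(u)
    rise = (u >= u_min) & (u < u_lo)
    flat = (u >= u_lo) & (u <= u_hi)
    fall = (u > u_hi) & (u <= u_max)
    if u_lo > u_min:                       # skip a degenerate rising edge
        h[rise] = (u[rise] - u_min) / (u_lo - u_min)
    h[flat] = 1.0
    if u_max > u_hi:                       # skip a degenerate falling edge
        h[fall] = (u_max - u[fall]) / (u_max - u_hi)
    return h

samples = np.array([-1.0, 0.5, 2.0, 3.5, 5.0])
footprint = trapezoid_footprint(samples, corners=[3.0, 0.0, 4.0, 1.0])
box = trapezoid_footprint(samples, corners=[0.0, 0.0, 4.0, 4.0])
```

When the two inner and the two outer abscissae coincide, the trapezoid degenerates to a rectangle, which is the shape used for the axial factor in the sf_tr cone-beam backend.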

For 3D cone beam, the SF family stays separable:

h_{n, x}^{cone} (u, v) = h_{n, x}^{u} (u) h_{n, x}^{v} (v) .

(49)

The transaxial factor $h_{n, x}^{u}$ is trapezoidal in both sf_tr and sf_tt. The axial factor $h_{n, x}^{v}$ is rectangular for sf_tr and trapezoidal for sf_tt. The corresponding gather is

B_{cone}^{SF} [q] (x) = c_{cone}^{SF} \sum_{n} γ_{n, x}^{cone} \int_{R^{2}} h_{n, x}^{cone} (u, v) q_{n} (u, v) d u d v,

(50)

with the chord-weighted per-view factor implemented in the code base summarized by

γ_{n, x}^{cone} = ℓ_{ϕ, n} (x) \sqrt{1 + {(\frac{v_{n} (x)}{{sdd}_{n}})}^{2}} .

(51)

This is the mathematical reason the SF backends in main require dedicated kernels and smaller launch blocks: they integrate filtered detector values against voxel footprints, whereas the Siddon analytical path samples a single detector coordinate per voxel and view.

7. Software Architecture and CUDA Realization

The library is deliberately modular. diffct.projectors contains the PyTorch-facing autograd functions, diffct.geometry generates trajectory tensors, diffct.analytical builds analytical FBP/FDK pipelines, diffct.kernels stores the Numba CUDA kernels, diffct.utils handles device and stream management, and diffct.constants exposes low-level launch parameters and numerical constants. This separation is one of the clearest architectural improvements of the dev branch over the older monolithic diffct.differentiable interface.

Algorithm 2 Shared analytical preprocessing and the branch-specific backprojection stage used in diffct

Require: sinogram $p$, geometry tensors $Θ$, short-scan flag $s \in \{0, 1\}$, backprojection selector b, with $b = siddon$ on dev and $b \in \{siddon, sf\}$ on circular main fan/cone paths
Ensure: reconstruction $\hat{f}$
1: compute detector coordinates $u_{k}, v_{ℓ}$ ▹ Equation (1)
2: $\{Δ β_{n}\}_{n = 1}^{N_{views}} \leftarrow$ TrapezoidalWeights($\{β_{n}\}$) ▹ Equation (23)
3: if geometry $\in \{fan, cone\}$ then
4: $p_{n, k (, ℓ)} \leftarrow w_{cos} (u_{k} (, v_{ℓ})) \cdot p_{n, k (, ℓ)}$ ▹ Equations (33) and (40)
5: end if
6: if $s = 1$ then
7: $p_{n, k} \leftarrow w_{P} (β_{n}, γ (u_{k})) \cdot p_{n, k}$ ▹ Parker window, Equation (34)
8: end if
9: $q_{n, \cdot} \leftarrow F^{- 1} \{| 2 π ξ | W (ξ) F [p_{n, \cdot}]\} / Δ u$ ▹ Equation (24)
10: $q_{n, \cdot} \leftarrow Δ β_{n} \cdot q_{n, \cdot}$ ▹ angular integration weight
11: $\hat{f} \leftarrow 0$
12: for each voxel $x_{i}$ do
13: if $b = siddon$ then
14: ${\hat{f}}_{i} \leftarrow \sum_{n = 1}^{N_{views}} w_{n, i} q_{n} (u_{n} (x_{i}), v_{n} (x_{i}))$ ▹ point-sampled gather
15: else
16: ${\hat{f}}_{i} \leftarrow \sum_{n = 1}^{N_{views}} \sum_{k, ℓ} φ_{n, i} (u_{k}, v_{ℓ}) q_{n, k, ℓ}$ ▹ separable-footprint gather, Equations (47) and (50)
17: end if
18: end for
19: $\hat{f} \leftarrow C_{geom} \cdot \hat{f}$ ▹ geometry-dependent global scale
20: return $\hat{f}$

7.1. PyTorch and Numba Execution Model

diffct combines PyTorch’s tensor/autograd ecosystem [3] with Numba’s CUDA JIT compilation model [15]. The forward wrappers first normalize all inputs to contiguous float32 tensors on the active CUDA device. They then call TorchCUDABridge.tensor_to_cuda_array, which exposes zero-copy Numba views through cuda.as_cuda_array. This detail matters because the projector kernels operate directly on PyTorch-owned GPU memory rather than on copied buffers.

Stream semantics are equally important. diffct retrieves the active PyTorch stream and wraps it with a cached numba.cuda.external_stream handle. As a result, Numba kernel launches respect the same dependency order as surrounding PyTorch code. The implementation therefore avoids an entire class of subtle synchronization bugs in which PyTorch and Numba would otherwise enqueue work on unrelated streams.

7.2. Kernel Families and Decorators

The code base uses two explicit numerical kernel families.

Siddon projector and pure adjoint kernels are decorated with cuda.jit(cache=True, fastmath=True) to maximize throughput for ray traversal and the cell-constant segment accumulation.
Analytical FBP/FDK gather kernels are decorated with cuda.jit(cache=True, fastmath=False) because the Fourier-convention constants and the geometry-dependent magnification weights amplify round-off errors more strongly than the projector kernels do.

This design explains why the code distinguishes between “autograd backprojection” and “analytical backprojection” even when both appear to map projection data back into image space. The former is the exact discrete transpose used by backward; the latter is a weighted reconstruction operator intended to approximate inversion.

The default thread-block sizes are $(16, 16)$ for 2D kernels and $(8, 8, 8)$ for 3D kernels. The circular-orbit SF cone-beam kernels on main use smaller launch blocks than the default 3D Siddon kernels because the footprint integrals are more register-intensive. Cone-beam tensors are also permuted internally from $(D, H, W)$ to $(W, H, D)$ for memory-coalesced access in the 3D gather path and then permuted back before returning to the user.
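The relation between the quoted block sizes and a launch grid can be sketched in plain Python. This is a minimal illustration, not diffct's launch code: the helper names `launch_grid` and `permute_dhw_to_whd`, the (views, detectors) axis order, and the example shapes are assumptions made for the sketch.

```python
import math

BLOCK_2D = (16, 16)   # default 2D thread-block size quoted in the text
BLOCK_3D = (8, 8, 8)  # default 3D thread-block size quoted in the text


def launch_grid(shape, block):
    """Blocks per axis (ceiling division) so every element is covered by a thread."""
    if len(shape) != len(block):
        raise ValueError("shape and block must have the same rank")
    return tuple(math.ceil(n / b) for n, b in zip(shape, block))


def permute_dhw_to_whd(shape):
    """Shape after reversing the axis order: (D, H, W) -> (W, H, D)."""
    d, h, w = shape
    return (w, h, d)


# Hypothetical problem sizes, chosen only to show the rounding behaviour.
grid_2d = launch_grid((360, 512), BLOCK_2D)
grid_3d = launch_grid(permute_dhw_to_whd((128, 100, 60)), BLOCK_3D)
print(grid_2d, grid_3d)
```

Ceiling division launches more threads than elements whenever an axis is not a multiple of the block size, so a kernel launched this way needs a bounds check at its top.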

7.3. Gradient Transport and Atomic Accumulation

The gradient path follows the same hardware design as the forward path. Each projector forward method stores geometry tensors with ctx.save_for_backward and keeps output-shape metadata in the autograd context. During backward, diffct allocates a zero-filled gradient image or volume, converts the incoming PyTorch gradient tensor into a Numba CUDA view, and launches the matched adjoint kernel on the same CUDA stream.

The backward kernel mirrors the cell-constant forward in Equation (16): forward kernels accumulate the traversed cell value weighted by the chord length $Δ t_{m}$ along each ray, while backward kernels scatter the incoming ray-domain gradient back into the same cell with the same $Δ t_{m}$. Because multiple rays update the same pixel or voxel, the scatter step uses cuda.atomic.add. This mechanism is what makes the discrete adjoint relation in Equation (17) hold up to the expected floating-point tolerance; the order of atomic additions is not fixed, so agreement is to rounding error rather than bit-exact.
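The matched gather/scatter structure can be illustrated on the CPU. The sketch below assumes a 2D grid of unit pixels and a simplified parametric traversal written for this example (it is not the library's CUDA kernel); `ray_segments`, `forward`, and `adjoint` are hypothetical names. The forward pass gathers chord-weighted cell values, the adjoint scatters with the same chord lengths, and the two inner products agree.

```python
import math


def ray_segments(p0, p1, nx, ny):
    """Yield (flat_cell_index, chord_length) for a ray crossing an nx-by-ny unit-pixel grid."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    ts = {0.0, 1.0}
    if dx != 0.0:  # parameters where the ray crosses vertical grid lines
        ts.update((i - x0) / dx for i in range(nx + 1))
    if dy != 0.0:  # parameters where the ray crosses horizontal grid lines
        ts.update((j - y0) / dy for j in range(ny + 1))
    ts = sorted(t for t in ts if 0.0 <= t <= 1.0)
    for ta, tb in zip(ts, ts[1:]):
        tm = 0.5 * (ta + tb)  # the segment midpoint identifies the traversed cell
        ix, iy = math.floor(x0 + tm * dx), math.floor(y0 + tm * dy)
        if tb > ta and 0 <= ix < nx and 0 <= iy < ny:
            yield iy * nx + ix, (tb - ta) * length


def forward(image, rays, nx, ny):
    """Gather: one line integral per ray, cell value times chord length."""
    return [sum(dt * image[c] for c, dt in ray_segments(p0, p1, nx, ny))
            for p0, p1 in rays]


def adjoint(sino, rays, nx, ny):
    """Scatter: push each ray value back into the cells it crossed."""
    grad = [0.0] * (nx * ny)
    for g, (p0, p1) in zip(sino, rays):
        for c, dt in ray_segments(p0, p1, nx, ny):
            grad[c] += dt * g  # a GPU kernel would need an atomic add here
    return grad


NX = NY = 4
RAYS = [((-1.0, 1.5), (5.0, 1.5)), ((-1.0, -0.5), (5.0, 4.2)), ((2.3, -1.0), (0.7, 5.0))]
x = [0.1 * (i + 1) for i in range(NX * NY)]
y = [1.0, -2.0, 0.5]
lhs = sum(a * b for a, b in zip(forward(x, RAYS, NX, NY), y))  # <P x, y>
rhs = sum(a * b for a, b in zip(x, adjoint(y, RAYS, NX, NY)))  # <x, P^T y>
print(abs(lhs - rhs))
```

Both sides sum the same products in a different order, which is why the identity holds only to rounding error.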

8. Experimental Protocol and Repository Validation

8.1. Reference Phantoms and Example Setups

The analytical accuracy tests use Shepp–Logan phantoms in 2D and 3D. The public main branch additionally ships synthetic real-data pipelines and a walnut CBCT example. The circular phantom examples documented in this report follow three reference configurations:

parallel-beam FBP on a $256 \times 256$ phantom with 360 views and 512 detectors,
fan-beam FBP on a $256 \times 256$ phantom with 360 views, 600 detectors, $sid = 500$ , and $sdd = 800$ ,
cone-beam FDK on a $128^{3}$ phantom with 360 views, a $256 \times 256$ detector, $sid = 600$ , and $sdd = 900$ .
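For readers who want to reuse these settings, the three reference configurations can be written down as plain data. This is a sketch only: the dictionary keys, the helper names, and the (views, detector) sinogram layout are assumptions, not the argument names of the example scripts.

```python
REFERENCE_SETUPS = {
    "parallel_fbp": {"volume": (256, 256), "views": 360, "detector": (512,)},
    "fan_fbp": {"volume": (256, 256), "views": 360, "detector": (600,),
                "sid": 500.0, "sdd": 800.0},
    "cone_fdk": {"volume": (128, 128, 128), "views": 360, "detector": (256, 256),
                 "sid": 600.0, "sdd": 900.0},
}


def magnification(setup):
    """Geometric magnification sdd / sid at the isocenter; 1.0 for parallel beam."""
    if "sid" not in setup:
        return 1.0
    return setup["sdd"] / setup["sid"]


def sinogram_shape(setup):
    """Assumed projection layout: the view axis first, detector axes after it."""
    return (setup["views"],) + setup["detector"]


for name, setup in REFERENCE_SETUPS.items():
    print(name, sinogram_shape(setup), magnification(setup))
```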

The non-circular iterative examples then replace circular trajectories by sinusoidal, wobbling, elliptical, spiral, saddle, and custom figure-eight variants while reusing the same projector classes.

8.2. Circular Analytical Reconstructions

The trajectory-array analytical helpers on dev inherit the amplitude correction introduced on main in versions 1.2.10 and 1.2.11: the filtered sinogram is routed through the dedicated weighted gather kernels. The change log records the corresponding raw reconstruction errors, which are listed in Table 2.

8.3. Iterative and Arbitrary-Trajectory Reconstructions

The iterative examples solve Equation (21) with a learned correction variable and an AdamW optimizer. Parallel-beam experiments compare sinusoidal and wobbling trajectories. Fan-beam experiments compare sinusoidal and elliptical source paths. Cone-beam experiments compare spiral, sinusoidal, saddle, and custom figure-eight trajectories. The dev branch ships generator scripts and plotting utilities for all of these families, so reproducing the iterative experiments requires no additional modelling effort beyond swapping the trajectory factory.
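What "swapping the trajectory factory" amounts to can be sketched with per-view source and detector arrays. Everything here is hypothetical: the function names, the angle convention (source at angle β on a circle of radius sid), the sinusoidal z offset, and the amplitude and cycle values are assumptions for illustration and do not reproduce the signatures of the dev-branch generators.

```python
import math


def view_angles(n_views):
    """Uniform angles over a full turn."""
    return [2.0 * math.pi * n / n_views for n in range(n_views)]


def sinusoidal_sources(n_views, sid, amplitude=0.0, cycles=1):
    """Per-view source positions; amplitude = 0 gives a plain circular orbit."""
    return [(sid * math.cos(b), sid * math.sin(b), amplitude * math.sin(cycles * b))
            for b in view_angles(n_views)]


def opposite_detector_centers(sources, sid, sdd):
    """Detector centres across the rotation axis, in-plane distance sdd from the source."""
    scale = 1.0 - sdd / sid
    return [(sx * scale, sy * scale, sz) for sx, sy, sz in sources]


circular = sinusoidal_sources(360, sid=600.0)
wobbly = sinusoidal_sources(360, sid=600.0, amplitude=40.0, cycles=2)
detectors = opposite_detector_centers(wobbly, sid=600.0, sdd=900.0)
print(wobbly[45], detectors[45])
```

Only the arrays change between the two orbits; a projector that consumes explicit per-view positions does not need to know which factory produced them.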

8.4. Correctness Tests

The repository carries a broad test suite for a compact research library. On the public dev branch, the default test suite contains 62 tests; the current public main branch contains 71 tests, while an additional 27 benchmark cases are kept opt-in. The test categories summarized in Table 3 describe the shared validation themes across the two lines.

One representative test threshold is the cone-beam analytical reconstruction guard: the 3D Shepp–Logan FDK test requires a raw RMSE below $0.1$, a center-slice RMSE below $0.1$, bounded axial-profile deviation, finite values everywhere, and a reconstruction maximum below $1.5$. This combination protects against the classes of bugs that commonly appear in analytical CT implementations: wrong Fourier constants, omitted cosine weights, omitted ${(sid / U)}^{2}$ factors, or detector-axis scaling mistakes.
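A guard of this kind can be sketched as a small checker. The helper names and the flat-list data layout are assumptions for illustration; the axial-profile bound is left out because the text does not state its threshold.

```python
import math


def rmse(a, b):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def fdk_guard(recon, phantom, center, rmse_max=0.1, peak_max=1.5):
    """Return one pass/fail flag per check; `center` is a slice selecting the central slice."""
    return {
        "finite": all(math.isfinite(v) for v in recon),
        "raw_rmse": rmse(recon, phantom) < rmse_max,
        "center_rmse": rmse(recon[center], phantom[center]) < rmse_max,
        "peak": max(recon) < peak_max,
    }


phantom = [0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
good = [v + 0.02 for v in phantom]   # small uniform bias, still acceptable
bad = [2.0 * v for v in phantom]     # amplitude error, e.g. a missing scale factor
print(fdk_guard(good, phantom, slice(2, 6)))
print(fdk_guard(bad, phantom, slice(2, 6)))
```

Keeping the checks separate makes a failure easier to read: an amplitude bug trips the RMSE and peak flags while the finiteness flag stays true.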

8.5. Performance Benchmarking

The benchmark suite measures three code paths for each geometry: the Siddon forward projector, the pure adjoint backprojector, and the full analytical pipeline. The analytical benchmarks include the complete sequence of cosine weighting, ramp filtering, angular weighting, and weighted gather backprojection because this is the user-visible reconstruction cost. The SF backends are part of the stable main API but are not benchmarked separately in the published benchmark files.

9. Discussion

diffct now occupies a clean position in the CT software landscape. It is small enough to audit mathematically, but broad enough to support analytical reconstruction, differentiable inversion, and arbitrary-trajectory experimentation inside a single code base. The current dev line is especially attractive for research because it preserves the classical circular-orbit formulas as special cases while making trajectory tensors first-class objects, even though the public main branch has already moved beyond the 1.2.11 analytical-sync point.

The main limitation is equally clear. The weighted fan and cone analytical operators implemented in dev are a geometry-aware extension of FBP/FDK based on detector-normal distances and per-view mean scaling. For circular orbits they collapse to the familiar closed forms. For non-circular orbits they provide a practical and principled reconstruction heuristic, but they do not claim an exact inversion formula beyond the classical setting. A second limitation is branch asymmetry: the main branch still owns the SF backends, so the highest-fidelity detector-footprint model is currently available only for circular trajectories. A third limitation is gradient scope. The present autograd wrappers propagate gradients through images and sinograms, while geometry tensors are runtime inputs with no returned gradients. A fourth limitation is hardware: the entire library assumes CUDA-capable execution and uses PyTorch plus Numba as its software substrate.

The branch split also reveals a natural roadmap. The main branch proves that separable-footprint projectors are useful in the circular setting. The dev branch proves that the generic per-view tensor API is the right abstraction for arbitrary trajectories. Bridging those two lines of work will require a deeper reformulation of footprint models for detector planes that are no longer aligned with a closed-form circular geometry.

10. Conclusions

This report rebuilds the description of diffct from the code upward. The stable main branch contributes the circular-orbit baseline, while the dev branch turns the Siddon-based projector/backprojector interface into a general arbitrary-trajectory CT framework. The resulting mathematical picture is compact: projector kernels implement Siddon traversal with cell-constant segment integration; autograd uses exact discrete adjoints with fixed geometry tensors; analytical reconstruction uses explicit cosine weighting, Parker short-scan weighting in the circular fan-beam specialization, FFT-based ramp filtering, angular integration weights, and dedicated weighted gather backprojectors. With this structure in place, diffct is already well positioned for deep-learning integration, gradient-based reconstruction, and arbitrary-trajectory CT research.

Appendix A. Implementation Formula Reference

For quick reference, the dev-line geometry and analytical-helper equations used throughout the report can be grouped as follows.

Detector coordinates.

u_{k} = (k - \frac{N_{u}}{2}) Δ u + u_{0}, v_{ℓ} = (ℓ - \frac{N_{v}}{2}) Δ v + v_{0} .

(A1)
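Equation (A1) translates directly into a one-line helper. The sketch below is a literal transcription in plain Python; the function name is hypothetical and is not the signature of the library helper listed in Table A1.

```python
def detector_coords_sketch(n, spacing, offset=0.0):
    """u_k = (k - n / 2) * spacing + offset for k = 0, ..., n - 1, as in Equation (A1)."""
    return [(k - n / 2) * spacing + offset for k in range(n)]


print(detector_coords_sketch(4, 1.0))        # sample k = n / 2 lands on the offset
print(detector_coords_sketch(4, 0.5, 0.25))
```

With this convention and an even number of cells, sample $k = N_{u} / 2$ sits exactly at $u_{0}$, so the sample positions are not symmetric about $u_{0}$.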

Forward operators.

P_{par} [f] (n, k) = \int_{\mathbb{R}} f (o_{n} + u_{k} e_{u, n} + t d_{n}) \, d t,

(53)

P_{fan} [f] (n, k) = \int_{0}^{\infty} f (S_{n} + t {\hat{d}}_{n, k}) \, d t,

(54)

P_{cone} [f] (n, k, ℓ) = \int_{0}^{\infty} f (S_{n} + t {\hat{d}}_{n, k, ℓ}) \, d t .

(55)

Siddon discretization.

P [f] (ray) \approx \sum_{m = 0}^{M - 1} Δ t_{m} f_{cell (m)} .

(A2)

Adjoint relation and inverse problem.

\langle P_{Θ} x, y \rangle = \langle x, P_{Θ}^{⊤} y \rangle, \quad \hat{f} = \arg\min_{f} {\| P_{Θ} f - p \|}_{2}^{2} + λ R (f), \quad \nabla_{f} L (f) = 2 P_{Θ}^{⊤} (P_{Θ} f - p) + λ \nabla R (f) .

(A3)

\frac{\partial L}{\partial x} = P_{Θ}^{⊤} \frac{\partial L}{\partial y}, \quad \frac{\partial L}{\partial s} = P_{Θ} \frac{\partial L}{\partial z}, \quad \frac{\partial L}{\partial Θ} = 0 \ \text{in the current implementation} .

(A4)
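The gradient formula in Equation (A3) can be checked numerically on a tiny dense operator. This is a sketch under stated assumptions: the matrix `P` stands in for the projector, the regularizer is taken to be $R (f) = \frac{1}{2} {\| f \|}_{2}^{2}$ (so $\nabla R (f) = f$), and all names and values are illustrative.

```python
def matvec(P, f):
    """Forward operator: y = P f."""
    return [sum(a * b for a, b in zip(row, f)) for row in P]


def rmatvec(P, r):
    """Adjoint operator: x = P^T r, the role played by the backward kernel."""
    return [sum(P[i][j] * r[i] for i in range(len(P))) for j in range(len(P[0]))]


def loss(P, f, p, lam):
    res = [a - b for a, b in zip(matvec(P, f), p)]
    return sum(r * r for r in res) + lam * 0.5 * sum(v * v for v in f)


def grad(P, f, p, lam):
    """2 P^T (P f - p) + lam * f, i.e. Equation (A3) with the assumed regularizer."""
    res = [a - b for a, b in zip(matvec(P, f), p)]
    return [2.0 * g + lam * v for g, v in zip(rmatvec(P, res), f)]


def finite_difference(P, f, p, lam, eps=1e-6):
    """Central differences of the loss, one coordinate at a time."""
    out = []
    for j in range(len(f)):
        up = f[:j] + [f[j] + eps] + f[j + 1:]
        dn = f[:j] + [f[j] - eps] + f[j + 1:]
        out.append((loss(P, up, p, lam) - loss(P, dn, p, lam)) / (2.0 * eps))
    return out


P = [[1.0, 0.5, 0.0], [0.0, 1.2, 0.7]]
f = [0.3, -0.2, 0.9]
p = [1.0, 0.4]
analytic = grad(P, f, p, lam=0.1)
numeric = finite_difference(P, f, p, lam=0.1)
max_gap = max(abs(a - b) for a, b in zip(analytic, numeric))
print(analytic, max_gap)
```

The comparison has the same structure as a gradient check against finite-difference Jacobians: the analytic path uses only the adjoint, and the numeric path uses only the forward operator.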

Angular weights and ramp filter.

Δ β_{1} = \frac{β_{2} - β_{1}}{2}, Δ β_{n} = \frac{(β_{n} - β_{n - 1}) + (β_{n + 1} - β_{n})}{2}, Δ β_{N} = \frac{β_{N} - β_{N - 1}}{2},

(A5)

q = F^{- 1} \{| 2 π ξ | W (ξ) F [p_{w}]\} \frac{1}{Δ s} .

(A6)
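Equations (A5) and (A6) can be exercised with a short plain-Python sketch. It assumes a rectangular window ($W (ξ) = 1$), no zero padding, and a naive $O (N^{2})$ discrete Fourier transform in place of an FFT; the function names are hypothetical and the constants follow Equation (A6) as printed.

```python
import cmath
import math


def angular_weights(betas):
    """Trapezoidal weights of Equation (A5) for a sorted list of view angles."""
    n = len(betas)
    if n < 2:
        raise ValueError("need at least two view angles")
    w = [0.0] * n
    w[0] = (betas[1] - betas[0]) / 2.0
    w[-1] = (betas[-1] - betas[-2]) / 2.0
    for i in range(1, n - 1):
        w[i] = ((betas[i] - betas[i - 1]) + (betas[i + 1] - betas[i])) / 2.0
    return w


def ramp_filter(row, spacing):
    """q = F^{-1}{ |2 pi xi| F[p] } / spacing for one detector row."""
    n = len(row)
    spec = [sum(row[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]
    # Signed DFT frequencies in cycles per unit length.
    freqs = [(k if k < (n + 1) // 2 else k - n) / (n * spacing) for k in range(n)]
    spec = [abs(2.0 * math.pi * xi) * s for xi, s in zip(freqs, spec)]
    out = [sum(spec[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n
           for j in range(n)]
    return [v.real / spacing for v in out]


weights = angular_weights([0.0, 1.0, 3.0, 4.0])
impulse_response = ramp_filter([0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0], 0.5)
print(weights, impulse_response)
```

Two properties are easy to verify by hand: the weights sum to $β_{N} - β_{1}$, and the filter has zero gain at DC, so a constant detector row is mapped to zero.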

Cosine weights.

w_{cos}^{fan} (u) = \frac{sdd}{\sqrt{{sdd}^{2} + u^{2}}}, w_{cos}^{cone} (u, v) = \frac{sdd}{\sqrt{{sdd}^{2} + u^{2} + v^{2}}} .

(A7)
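A direct transcription of Equation (A7) is shown below as a sketch. The function names are hypothetical stand-ins, not the signatures of the helpers listed in Table A1, and the numeric arguments are illustrative.

```python
import math


def fan_cos_w(u, sdd):
    """Fan-beam cosine weight sdd / sqrt(sdd^2 + u^2)."""
    return sdd / math.sqrt(sdd * sdd + u * u)


def cone_cos_w(u, v, sdd):
    """Cone-beam cosine weight sdd / sqrt(sdd^2 + u^2 + v^2)."""
    return sdd / math.sqrt(sdd * sdd + u * u + v * v)


print(fan_cos_w(0.0, 800.0), fan_cos_w(800.0, 800.0), cone_cos_w(30.0, 40.0, 900.0))
```

The weight equals one on the central ray and decreases with distance from the detector centre; on the central detector row ($v = 0$) the cone-beam weight reduces to the fan-beam weight.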

Weighted analytical backprojection.

B_{par} [q] (x) = \frac{1}{2 π} \sum_{n} q_{n} (u_{n} (x)),

(62)

B_{fan} [q] (x) = \frac{\overline{{sdd}_{n}}}{2 π \, \overline{{sid}_{n}}} \sum_{n} {(\frac{{sid}_{n}}{U_{n} (x)})}^{2} q_{n} (u_{n} (x)),

(63)

B_{cone} [q] (x) = \frac{\overline{{sdd}_{n}}}{2 π \, \overline{{sid}_{n}}} \sum_{n} {(\frac{{sid}_{n}}{U_{n} (x)})}^{2} q_{n} (u_{n} (x), v_{n} (x)) .

(64)

Separable-footprint backprojection on main.

h_{n, x}^{fan} (u) = \begin{cases} \frac{u - u_{min}}{u_{lo} - u_{min}}, & u_{min} \leq u < u_{lo}, \\ 1, & u_{lo} \leq u \leq u_{hi}, \\ \frac{u_{max} - u}{u_{max} - u_{hi}}, & u_{hi} < u \leq u_{max}, \\ 0, & \text{otherwise}, \end{cases}

(A8)

B_{fan}^{SF} [q] (x) = c_{fan}^{SF} \sum_{n} γ_{n, x}^{fan} \int h_{n, x}^{fan} (u) q_{n} (u) \, d u,

(66)

B_{cone}^{SF} [q] (x) = c_{cone}^{SF} \sum_{n} γ_{n, x}^{cone} \int \int h_{n, x}^{u} (u) h_{n, x}^{v} (v) q_{n} (u, v) \, d u \, d v .

(67)
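The trapezoid in Equation (A8) and the inner integral of Equation (66) can be sketched as follows. The midpoint-rule quadrature and the function names are choices made for this illustration; they are not the SF kernels on main, which are not reproduced here.

```python
def trapezoid_footprint(u, u_min, u_lo, u_hi, u_max):
    """Piecewise-linear footprint h(u) of Equation (A8)."""
    if u_min <= u < u_lo:
        return (u - u_min) / (u_lo - u_min)
    if u_lo <= u <= u_hi:
        return 1.0
    if u_hi < u <= u_max:
        return (u_max - u) / (u_max - u_hi)
    return 0.0


def footprint_integral(q, u_min, u_lo, u_hi, u_max, samples=300):
    """Midpoint-rule approximation of the integral of h(u) q(u) over [u_min, u_max]."""
    du = (u_max - u_min) / samples
    total = 0.0
    for i in range(samples):
        u = u_min + (i + 0.5) * du
        total += trapezoid_footprint(u, u_min, u_lo, u_hi, u_max) * q(u)
    return total * du


# Hypothetical break points, chosen so the result is easy to check by hand.
area = footprint_integral(lambda u: 1.0, 0.0, 1.0, 2.0, 3.0)
print(trapezoid_footprint(0.5, 0.0, 1.0, 2.0, 3.0), area)
```

For a constant $q$ the integral is the trapezoid area, $\frac{1}{2} ((u_{max} - u_{min}) + (u_{hi} - u_{lo}))$, which gives a quick sanity check on the break points.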

Table A1. Crosswalk from the report’s mathematical notation to the current public diffct API, including projector classes, trajectory generators, analytical helpers, and the main-only SF backend names.

Report object	Code object
$P_{par}, P_{fan}, P_{cone}$	`ParallelProjectorFunction`, `FanProjectorFunction`, `ConeProjectorFunction`
$P_{par}^{⊤}, P_{fan}^{⊤}, P_{cone}^{⊤}$	`ParallelBackprojectorFunction`, `FanBackprojectorFunction`, `ConeBackprojectorFunction`
Circular trajectories	`circular_trajectory_2d_parallel`, `circular_trajectory_2d_fan`, `circular_trajectory_3d`
Non-circular trajectories	`spiral_trajectory_3d`, `sinusoidal_trajectory_`, `saddle_trajectory_3d`, `random_trajectory_3d`, `custom_trajectory_`
$u_{k}, v_{ℓ}$ detector coordinates	`detector_coordinates_1d`
$Δ β_{n}$ angular weights	`angular_integration_weights`
$w_{cos}^{fan}, w_{cos}^{cone}$	`fan_cosine_weights`, `cone_cosine_weights`
$w_{P}$ Parker window	`parker_weights`
Ramp filter q	`ramp_filter_1d`
$B_{par}, B_{fan}, B_{cone}$	`parallel_weighted_backproject`, `fan_weighted_backproject`, `cone_weighted_backproject`
Main-branch SF backend selectors	`backend="sf"` for fan beam, `backend="sf_tr"` and `backend="sf_tt"` for cone beam

References

1. Kak, A.C.; Slaney, M. Principles of Computerized Tomographic Imaging; Society for Industrial and Applied Mathematics: Philadelphia, PA, 2001.
2. Sun, Y. diffct: Differentiable Computed Tomography Reconstruction with CUDA. https://doi.org/10.5281/zenodo.14999333, 2025. Zenodo software release.
3. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adv. Neural Inf. Process. Syst. 2019, 32, 8024–8035.
4. Siddon, R.L. Fast Calculation of the Exact Radiological Path for a Three-Dimensional CT Array. Med. Phys. 1985, 12, 252–255.
5. Feldkamp, L.A.; Davis, L.C.; Kress, J.W. Practical Cone-Beam Algorithm. J. Opt. Soc. Am. A 1984, 1, 612–619.
6. Parker, D.L. Optimal Short Scan Convolution Reconstruction for Fanbeam CT. Med. Phys. 1982, 9, 254–257.
7. Syben, C.; Michen, M.; Stimpel, B.; Seitz, S.; Ploner, S.; Maier, A.K. Technical Note: PYRO-NN: Python Reconstruction Operators in Neural Networks. Med. Phys. 2019, 46, 5110–5115.
8. Thies, M.; Wagner, F.; Maul, N.; Folle, L.; Meier, M.; Rohleder, M.; Schneider, L.S.; Pfaff, L.; Gu, M.; Utz, J.; et al. Gradient-Based Geometry Learning for Fan-Beam CT Reconstruction. Phys. Med. Biol. 2023, 68, 205004.
9. Long, Y.; Fessler, J.A.; Balter, J.M. 3D Forward and Back-Projection for X-Ray CT Using Separable Footprints. IEEE Trans. Med. Imaging 2010, 29, 1839–1850.
10. Kim, H.; Champley, K. Differentiable Forward Projector for X-ray Computed Tomography, 2023, [arXiv:eess.IV/2307.05801].
11. Champley, K.; Kim, H. LivermorE AI Projector for Computed Tomography Tasks (LEAP). 2023. Software release.
12. Tuy, H.K. An Inversion Formula for Cone-Beam Reconstruction. SIAM J. Appl. Math. 1983, 43, 546–552.
13. Smith, B.D. Image Reconstruction from Cone-Beam Projections: Necessary and Sufficient Conditions and Reconstruction Methods. IEEE Trans. Med. Imaging 1985, 4, 14–25.
14. Ye, C.; Schneider, L.S.; Sun, Y.; Thies, M.; Mei, S.; Maier, A.K. DRACO: Differentiable Reconstruction for Arbitrary CBCT Orbits. Phys. Med. Biol. 2025, 70, 075005.
15. Lam, S.K.; Pitrou, A.; Seibert, S. Numba: A LLVM-Based Python JIT Compiler. In Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, 2015; pp. 1–6.

Table 1. Repository lineage used by this report.

Aspect	`main` branch	`dev` branch
Release status	Stable, versioned, published on PyPI	Development branch, source install only
Geometry API	Circular-orbit closed-form parameters	Explicit per-view arrays for arbitrary trajectories
Supported scan families	Parallel, fan, cone on circular orbits	Parallel, fan, cone on circular and non-circular trajectories
Projector backends	Siddon plus SF variants (`sf`, `sf_tr`, `sf_tt`)	Siddon family generalized to arbitrary trajectories
Analytical pipeline	Current stable circular pipeline on the 1.3.4 release line	Trajectory-array adaptation of the `main` 1.2.10/1.2.11 analytical overhaul
Testing focus	Circular-orbit reference behavior	Circular plus arbitrary-trajectory correctness and parity
Separable-footprint backends	Available	Not yet generalized to arbitrary trajectories (deferred)

Table 2. Circular analytical reconstructions documented in the synchronized code base.

Example	Raw MSE	Reported reconstruction range
`circular_trajectory/fbp_parallel.py`	$\approx 0.00540$	$[- 0.03, 1.01]$
`circular_trajectory/fbp_fan.py`	$\approx 0.00509$	$[- 0.11, 1.03]$
`circular_trajectory/fdk_cone.py`	$\approx 0.00737$	$[- 0.13, 1.11]$

Table 3. Validation layers encoded in the repository.

Test group	What it protects
`test_adjoint_inner_product.py`	Verifies $\langle A x, y \rangle = \langle x, A^{⊤} y \rangle$ for parallel, fan, and cone operators
`test_gradcheck.py`	Confirms that CUDA backward kernels match finite-difference Jacobians for all projector classes
`test_cuda_smoke.py`	Checks end-to-end execution of projector, adjoint, and analytical helper paths
`test_weights.py`	Verifies detector coordinates, angular weights, cosine weights, and Parker weights
`test_ramp_filter_windows.py`	Verifies ramp-filter windows, DC behavior, FFT parity, and detector-spacing scaling
`test_fbp_*_accuracy.py`, `test_fdk_cone_accuracy.py`	Bound RMSE and amplitude for analytical circular reconstructions
`test_fbp_fan_offsets.py`, `test_fdk_cone_offsets.py`	Checks that shifted trajectory arrays are reconstructed correctly
`test_cone_projector_autograd.py`	Guards cone-projector gradient finiteness and consistency under circular and spiral trajectories

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
