Preprint
Technical Note

This version is not peer-reviewed.

diffct: Differentiable CT Operators from Circular Orbits to Arbitrary Trajectories

Submitted:

21 May 2026

Posted:

21 May 2026

You are already at the latest version

Abstract
diffct is a CUDA-accelerated computed tomography library that exposes differentiable forward operators and their exact discrete adjoints for 2D parallel beam, 2D fan beam, and 3D cone beam imaging. The main branch provides the stable circular-orbit lineage released on PyPI, including Siddon and separable-footprint (SF) projector families, while the dev branch extends the Siddon-based projector/backprojector interface to arbitrary per-view trajectories through explicit source and detector arrays. This report rewrites the project description directly from the current source code, examples, tests, and the related CT literature. We formalize the geometry parameterization used by the implementation, derive the differentiable Siddon-style projector and its exact discrete adjoint, explain how gradients are transported through torch.autograd.Function wrappers backed by Numba CUDA kernels, document the analytical filtered backprojection and Feldkamp–Davis–Kress pipelines implemented on main and ported into the dev, and record how the circular-orbit SF algorithms from main fit into the broader architecture.
Keywords: 
;  ;  ;  ;  

1. Introduction

Computed tomography (CT) reconstruction still relies on three closely related operator types that must agree with each other numerically: a forward model that maps an attenuation field to projection data, a pure adjoint backprojection that propagates information back to image space, and reconstruction operators that approximate inversion under additional geometry assumptions [1]. Modern learning-based pipelines add a differentiability requirement so that image-domain variables, regularization parameters, and surrounding network parameters can be optimized end to end. diffct addresses this intersection by combining CUDA kernels, PyTorch autograd wrappers, and analytical reconstruction helpers in one small library [2,3].
The current repository has two public branch lineages. The main branch is the stable circular-orbit version published on PyPI. The dev branch keeps the same three geometry families and the same Siddon projector/backprojector class names, while replacing closed-form circular geometry parameters by explicit per-view trajectory tensors. In practice this means that circular, spiral, sinusoidal, saddle, random, and user-defined trajectories all flow through the same projector kernels without changing the CUDA implementation. The trajectory-array analytical helpers on dev track the overhaul merged into main 1.2.10/1.2.11; the current public main branch has since advanced to the 1.3.4 release line.
The report focuses on four contributions. First, it gives a single notation for the main circular-orbit parameterization and the dev arbitrary-trajectory generalization. Second, it records the differentiable projector/backprojector pair implemented in the source, following the discrete ray-tracing lineage of [4]. Third, it derives the analytical reconstruction helpers actually used in the examples and tests, including cosine pre-weighting, Parker weighting, ramp filtering, angular integration weights, and weighted voxel-driven backprojection [5,6]. Fourth, it summarizes the repository-level validation logic encoded in adjoint, gradcheck, reconstruction-accuracy, offset, smoke, and benchmark tests, positioning diffct relative to differentiable CT libraries such as PYRO-NN and recent geometry-learning work [7,8].

Code and availability.

The source repository is hosted at https://github.com/sypsyp97/diffct, and stable releases are distributed through PyPI at https://pypi.org/project/diffct/ (pip install diffct).

3. Branch Lineage and Scope

3.1. Why the Report Is Centered on dev

The dev branch is the most complete mathematical target for a full report because it captures the generalized Siddon-based interface for circular and non-circular trajectories. The main branch remains the reference line for the circular-orbit SF family. The public API on dev re-exports the same high-level Siddon operator families:
  • ParallelProjectorFunction / ParallelBackprojectorFunction,
  • FanProjectorFunction / FanBackprojectorFunction,
  • ConeProjectorFunction / ConeBackprojectorFunction,
  • geometry generators for parallel, fan, and cone beam scanning,
  • analytical helpers for FBP/FDK pipelines adapted to trajectory arrays.
The report therefore uses the dev Siddon/trajectory-array interface as the primary mathematical model and treats the main branch as the circular-orbit specialization and SF reference line.

3.2. Main Versus Dev

Table 1 summarizes the operational distinction between the two branches.
The unresolved gap is equally important. The main branch contains separable-footprint backends that depend on closed-form circular geometry and therefore remain branch-local. In 2D fan beam, backend="sf" denotes a separable trapezoidal detector footprint. In 3D cone beam, backend="sf_tr" uses a transaxial trapezoid with an axial rectangle, while backend="sf_tt" uses trapezoids in both detector directions, following the separable-footprint taxonomy in [9]. The dev branch keeps the same autograd structure and trajectory-array analytical helpers but omits these SF kernels because their footprint construction is still specialized to circular geometry.
The branch split also creates explicit compatibility costs. Existing code written against scalar angles/sid/sdd/backend signatures must now construct per-view geometry tensors before calling the projector classes. The separable-footprint backends available on main are absent on dev. The detector-grid convention on the dev analytical helpers uses ( k N / 2 ) Δ u , whereas the current main circular closed-form helpers still center detector bins with ( k ( N 1 ) / 2 ) Δ u , so copying parameters across branches without adjustment introduces a half-cell shift for even detector sizes. The report therefore uses the dev equations as the global interface and treats the main SF family as an important circular-orbit specialization that still needs to be documented explicitly.

4. Unified Geometry Model

4.1. Discrete Detector Coordinates

The trajectory-array analytical helpers on dev use the detector-cell convention
u k = k N u 2 Δ u + u 0 , v = N v 2 Δ v + v 0 ,
where N u and N v denote detector sizes, Δ u and Δ v denote detector spacing, and u 0 , v 0 are optional offsets. This section follows the dev analytical-helper convention with the factor N u / 2 rather than ( N u 1 ) / 2 ; the current main circular closed-form helpers still use ( N u 1 ) / 2 .

4.2. Parallel-Beam Parameterization

For a view index n, the dev branch represents 2D parallel geometry by a unit ray direction d n R 2 , a detector origin o n R 2 , and a detector axis e u , n R 2 perpendicular to d n . Detector sample k is anchored at
r n , k ( 0 ) = o n + u k e u , n , r n , k ( t ) = r n , k ( 0 ) + t d n , d n 2 = 1 .
The circular-orbit helper used as the main specialization is
d ( θ ) = cos θ sin θ , o ( θ ) = d det sin θ d det cos θ , e u ( θ ) = sin θ cos θ .
Here θ [ 0 , π ) is the non-redundant ray-direction angle used by the implementation, and d det is the detector-to-isocenter distance. θ differs from the classical Radon-transform normal angle by a fixed π / 2 reparameterization.

4.3. Fan-Beam Parameterization

For fan beam CT the implementation stores a source position S n R 2 , a detector center C n R 2 , and a detector axis e u , n R 2 . The detector point associated with cell k is
D n , k = C n + u k e u , n , d ^ n , k = D n , k S n D n , k S n 2 .
The circular-orbit specialization used on the stable branch is
S ( β ) = sid sin β sid cos β , C ( β ) = ( sdd sid ) sin β ( sdd sid ) cos β , e u ( β ) = cos β sin β .
Here β [ 0 , 2 π ) is the view angle, sid denotes the source-to-isocenter distance, and sdd denotes the source-to-detector distance. Per-view quantities reuse the subscript n (e.g. sid n below).

4.4. Cone-Beam Parameterization

For cone beam CT each view uses a source S n R 3 , detector center C n R 3 , and orthonormal detector basis vectors e u , n , e v , n R 3 . Detector coordinate ( k , ) maps to
D n , k , = C n + u k e u , n + v e v , n , d ^ n , k , = D n , k , S n D n , k , S n 2 .
The circular-orbit 3D helper is
S ( β ) = sid sin β sid cos β 0 , C ( β ) = ( sdd sid ) sin β ( sdd sid ) cos β 0 , e u ( β ) = cos β sin β 0 , e v = 0 0 1 .

4.5. Non-Circular Trajectories

The arbitrary-trajectory generators in diffct.geometry instantiate several useful families. For example, the spiral cone-beam trajectory uses
β n = β 0 + 2 π N turns N views n , z n z max 2 , z max 2 ,
where N views is the number of views in the scan, N turns is the number of full helical turns, β 0 is the start angle, and z n is the axial source offset of view n bounded by z max .
S n spiral = sid sin β n sid cos β n z n , C n spiral = ( sdd sid ) sin β n ( sdd sid ) cos β n z n .
The sinusoidal and saddle variants modulate the source-to-isocenter distance and axial position,
sid n = sid + A sin ( ω β n ) ,
z n = A z cos ( 2 β n ) , sid n = sid + A r sin ( 2 β n ) ,
where A , A r are radial modulation amplitudes, A z is the axial modulation amplitude, and ω Z + is the angular frequency of the sinusoidal wobble. and the custom generators accept user-defined source paths or ray fields. The mathematical consequence is simple: every downstream operator only sees the per-view geometry tensors, so the projector definition is trajectory agnostic.
This abstraction is stronger than the analytical theory that supports it. The forward and adjoint Siddon operators only require valid ray definitions, so they transfer immediately from circular to non-circular orbits. Exact cone-beam inversion, by contrast, depends on data-completeness conditions of the Tuy–Smith type [12,13]. diffct therefore treats arbitrary trajectories as a first-class capability for projection, adjoint backprojection, and iterative optimization, while its analytical fan/cone reconstruction helpers should be interpreted as geometry-aware practical reconstructions whose exactness is guaranteed in the classical circular settings and whose usefulness outside that regime is empirical.
Figure 1. Unified geometry parameterization used by diffct. The parallel-beam panel shows the ray direction d ( θ ) , detector origin o ( θ ) , and detector axis e u ; the fan- and cone-beam panels highlight the source S ( β ) , detector plane, and basis vectors e u , e v ; the fourth panel previews circular, sinusoidal, spiral, and saddle trajectories available through the diffct.geometry generators.
Figure 1. Unified geometry parameterization used by diffct. The parallel-beam panel shows the ray direction d ( θ ) , detector origin o ( θ ) , and detector axis e u ; the fan- and cone-beam panels highlight the source S ( β ) , detector plane, and basis vectors e u , e v ; the fourth panel previews circular, sinusoidal, spiral, and saddle trajectories available through the diffct.geometry generators.
Preprints 214637 g001

5. Differentiable Forward and Adjoint Operators

5.1. Continuous Forward Model

Let f : R d R denote the attenuation field, with d = 2 for the parallel/fan-beam operators and d = 3 for the cone-beam operator, and let t R be the ray arc-length parameter. The projector implemented in diffct computes line integrals along rays determined by the geometry tensors. For parallel beam,
P par [ f ] ( n , k ) = R f r n , k ( t ) d t ,
with r n , k ( t ) from Equation (2). For fan beam,
P fan [ f ] ( n , k ) = 0 f S n + t d ^ n , k d t ,
and for cone beam,
P cone [ f ] ( n , k , ) = 0 f S n + t d ^ n , k , d t .
These equations are branch-independent. The difference between main and dev lies only in how the trajectory tensors are produced.

5.2. Siddon Traversal with Cell-Constant Integration

The CUDA kernels implement a Siddon-style voxel traversal [4]. Consider a ray r ( t ) = o + t d passing through an axis-aligned voxel grid. Intersections with grid planes satisfy
t m ( q ) = b m ( q ) o m d m , m { x , y , z } ,
where o and d are the ray origin and direction, o m and d m are their m-th components, q indexes the grid planes along axis m, and b m ( q ) is the corresponding grid-plane coordinate. Sorting all valid intersection parameters yields segment boundaries t 0 < t 1 < < t M inside the volume, where M is the number of segments the ray cuts across the grid. On each segment the attenuation field is taken to be constant at the value of the single pixel (2D) or voxel (3D) that the segment traverses, so the continuous line integral is approximated as
P [ f ] ( ray ) m = 0 M 1 Δ t m f cell ( m ) , Δ t m = t m + 1 t m ,
where cell ( m ) denotes the discrete grid index of the cell traversed by segment m and Δ t m is the exact chord length of the segment inside that cell. This matches the inner loop of the diffct projector kernels, which step from one grid-plane crossing to the next and accumulate d_image[iy, ix] * seg_len (2D) or d_vol[iz, iy, ix] * seg_len (3D) without any sub-cell interpolation. The forward/adjoint kernel pair is therefore matched by construction: the adjoint scatters the incoming ray-domain gradient into the same cell with the same segment length, so the discrete transpose identity holds up to floating-point tolerance (verified by test_adjoint_inner_product.py).

5.3. Adjoint Identity and Autograd

diffct exposes the projector and backprojector as paired torch.autograd.Function classes. The design target is the linear-adjoint relation
P Θ x , y = x , P Θ y ,
where Θ collects all geometry tensors. The test suite checks Equation (17) for parallel, fan, and cone beam projectors. The backward pass of a forward projector therefore executes the pure adjoint backprojection kernel rather than an analytical FBP/FDK kernel.
This distinction is essential. The analytical reconstruction operators introduced later are not the same as the autograd adjoints because analytical FBP/FDK require geometry-dependent weighting and scaling. By contrast, the autograd backward uses the exact discrete transpose of the corresponding forward kernel. If
y = P Θ x , z = P Θ s ,
then the implemented backward rules are
L x = P Θ L y , L Θ = 0 for the current wrappers ,
L s = P Θ L z , L Θ = 0 for the current wrappers .
The zero geometry gradient in is an implementation fact, not a mathematical impossibility: the source and detector tensors are saved in the autograd context and reused during backward, but the backward methods return None for geometry arguments. In other words, diffct currently differentiates with respect to image and sinogram variables while treating geometry as fixed runtime input. This is consistent with the repository’s present optimization examples, which solve for reconstructions while holding trajectories fixed.
For a least-squares inverse problem
f ^ = arg min f P Θ f p 2 2 + λ R ( f ) ,
the data-term gradient is
f 1 2 P Θ f p 2 2 = P Θ ( P Θ f p ) .
This is exactly the quantity propagated by PyTorch through the diffct projector classes. At kernel level, the forward path accumulates line integrals ray by ray, whereas the backward path revisits the same traversed segments and scatters each incoming gradient into the single pixel or voxel that the segment traverses, weighted by the segment length Δ t m of Equation (16). Because many rays contribute to the same image cell, the backward kernels use cuda.atomic.add to preserve the discrete transpose relation.
Algorithm 1 Autograd contract of the diffct projector wrappers
Require:  
image or volume x , geometry tensors Θ , output sinogram shape
Ensure:  
sinogram y = P Θ x ; on Backward, image gradient g x = P Θ g y
1:
procedureForward( x , Θ )
2:
     x float32 ( x ) . contiguous ( ) . cuda ( )
3:
     x ˜ , Θ ˜ cuda _ as _ array ( x ) , cuda _ as _ array ( Θ )     ▹ zero-copy Numba views
4:
    allocate y 0
5:
    KernelForward( x ˜ , Θ ˜ , y )     ▹ Siddon gather, current torch stream
6:
     ctx . save ( Θ )
7:
    return  y
8:
end procedure
9:
procedureBackward( g y )     ▹ g y = L / y
10:
     Θ ctx . load ( )
11:
    allocate g x 0
12:
    KernelAdjoint( g y , Θ ˜ , g x )     ▹ cuda.atomic.add scatter
13:
    return  ( g x , None , , None )     ▹ no gradient w.r.t. Θ
14:
end procedure
Figure 2. Ray-driven forward gather and atomic-scatter adjoint. The forward kernel accumulates cell-constant voxel contributions along each ray (each traversed cell value weighted by its exact chord length) and produces the sinogram entry y. The adjoint kernel revisits the same traversed cells and scatters the incoming gradient back with cuda.atomic.add, which realizes the discrete transpose relation P Θ x , y = x , P Θ y tested by test_adjoint_inner_product.py. The third panel sketches how the paired kernels plug into the PyTorch autograd contract for x y and L / y L / x .
Figure 2. Ray-driven forward gather and atomic-scatter adjoint. The forward kernel accumulates cell-constant voxel contributions along each ray (each traversed cell value weighted by its exact chord length) and produces the sinogram entry y. The adjoint kernel revisits the same traversed cells and scatters the incoming gradient back with cuda.atomic.add, which realizes the discrete transpose relation P Θ x , y = x , P Θ y tested by test_adjoint_inner_product.py. The third panel sketches how the paired kernels plug into the PyTorch autograd contract for x y and L / y L / x .
Preprints 214637 g002

6. Analytical Reconstruction in diffct

6.1. Detector Coordinates and Angular Weights

The analytical module builds FBP/FDK pipelines out of small helpers. Detector coordinates reuse the dev convention in Equation (1). For open angle lists and short scans, angular integration weights are trapezoidal:
Δ β 1 = β 2 β 1 2 , Δ β n = ( β n β n 1 ) + ( β n + 1 β n ) 2 , Δ β N views = β N views β N views 1 2 .
Here { β n } n = 1 N views are the per-view angles and Δ β n is the trapezoidal angular integration weight attached to view n. For periodic full circular fan or cone scans, the implementation closes the angle loop before applying the extra redundancy factor M M 1 / 2 . For a short scan with Parker weighting, the open-interval trapezoidal weights in Equation (23) are kept and redundancy is handled by the Parker window itself.

6.2. Ramp Filter and Apodization

Given a weighted sinogram p w , the filtered projection is computed by a 1D FFT-based ramp filter along the detector axis:
q = F 1 | 2 π ξ | W ( ξ ) F [ p w ] 1 Δ s ,
where F and F 1 are the 1-D Fourier transform and its inverse along the detector axis, ξ is the physical Fourier frequency obtained from the normalized torch.fft.fftfreq grid after division by the detector spacing, W ( ξ ) is an apodization window, and Δ s is the detector spacing along the filtered axis (either Δ u or Δ v depending on the pipeline). The explicit factor 1 / Δ s is the implementation’s conversion from sample frequency to physical units. The apodization window W ( ξ ) supports Ram–Lak, Hann, Hamming, cosine, and Shepp–Logan choices. Using the normalized frequency ν = 2 | ξ | [ 0 , 1 ] , the implemented window functions are
W Ram Lak ( ν ) = 1 ,
W Hann ( ν ) = 1 2 1 + cos ( π ν ) ,
W Hamming ( ν ) = 0.54 + 0.46 cos ( π ν ) ,
W Cosine ( ν ) = cos π ν 2 ,
W Shepp Logan ( ν ) = sin ( π ν / 2 ) π ν / 2 .

6.3. Parallel-Beam FBP

Parallel beam requires no source-dependent distance weighting. Let u n ( x ) denote the detector coordinate obtained by projecting voxel x onto the detector axis:
u n ( x ) = x o n · e u , n .
After angular weighting and ramp filtering, diffct reconstructs via
B par [ q ] ( x ) = 1 2 π n = 1 N views q n u n ( x ) .
The factor 1 / ( 2 π ) is applied inside parallel_weighted_backproject, so the wrapper already returns an amplitude-calibrated reconstruction.

6.4. Fan-Beam FBP

For fan beam geometry diffct computes a per-view detector normal n n by rotating e u , n by 90 and orienting it so that ( C n S n ) · n n > 0 . It then defines
sdd n = ( C n S n ) · n n , sid n = ( S n ) · n n , U n ( x ) = ( x S n ) · n n .
Here sdd n and sid n are the signed source-to-detector and source-to-isocenter distances along n n , and U n ( x ) is the signed distance from the source to the plane orthogonal to n n through x ; it reduces to sid n when x is at the isocenter. For the classical circular short-scan specialization with constant sdd , the fan angle and cosine weight are
γ ( u ) = arctan u sdd , w cos fan ( u ) = sdd sdd 2 + u 2 = cos γ ( u ) .
For a short scan the Parker window used by the code is
w P ( β , γ ) = sin 2 π 4 β γ max γ , 0 β < 2 ( γ max γ ) , 1 , 2 ( γ max γ ) β π 2 γ , sin 2 π 4 π + 2 γ max β γ max + γ , π 2 γ < β π + 2 γ max , 0 , otherwise ,
with γ max = max u | γ ( u ) | . This Parker window is the circular short-scan helper used by the code; the general trajectory-array fan pipeline on dev reuses without elevating Parker weighting to a generic arbitrary-trajectory claim. A voxel x is projected onto the detector plane through the source:
H n ( x ) = S n + sdd n U n ( x ) x S n , u n ( x ) = H n ( x ) C n · e u , n .
diffct then applies the weighted FBP gather
B fan [ q ] ( x ) = sdd n ¯ 2 π sid n ¯ n = 1 N views sid n U n ( x ) 2 q n u n ( x ) ,
where the overline denotes the mean over views. For a circular orbit Equation (36) reduces to the textbook factor
U ( β ; x , y ) = sid + x sin β y cos β , u ( β ; x , y ) = sdd x cos β + y sin β U ( β ; x , y ) .

6.5. Cone-Beam FDK

The cone-beam extension follows the same structure with detector normal
n n = σ n e u , n × e v , n e u , n × e v , n 2 ,
where σ n is chosen so that ( C n S n ) · n n > 0 . The detector-normal distances are
sdd n = ( C n S n ) · n n , sid n = ( S n ) · n n , U n ( x ) = ( x S n ) · n n .
The cosine pre-weight is
w cos cone ( u , v ) = sdd n sdd n 2 + u 2 + v 2 .
The detector intersection of a voxel x is
H n ( x ) = S n + sdd n U n ( x ) x S n ,
with
u n ( x ) = H n ( x ) C n · e u , n , v n ( x ) = H n ( x ) C n · e v , n .
The weighted FDK gather used by the code is
B cone [ q ] ( x ) = sdd n ¯ 2 π sid n ¯ n = 1 N views sid n U n ( x ) 2 q n u n ( x ) , v n ( x ) .
For a circular orbit this becomes
u ( β ; x , y ) = sdd x cos β + y sin β U ( β ; x , y ) , v ( β ; x , y , z ) = sdd z U ( β ; x , y ) .
diffct applies the ramp filter row-wise along the u direction, matching the standard FDK construction. For non-circular trajectories, Equation (43) is a principled implementation-level extension rather than an exact closed-form inversion.

6.6. Main-Branch Separable-Footprint Family

The main branch adds a second circular-orbit backprojection family alongside the Siddon-based weighted gather path. These SF options are exposed through backend selectors on the stable fan- and cone-beam APIs, remain restricted to circular geometry, and inherit detector-footprint logic from [9] together with the matched-adjoint implementation style used in LEAP [10,11]. For a 2D fan-beam pixel x , let the four projected pixel corners induce the ordered detector abscissae
u min u lo u hi u max .
The detector footprint is then approximated by the trapezoid
h n , x fan ( u ) = u u min u lo u min , u min u < u lo , 1 , u lo u u hi , u max u u max u hi , u hi < u u max , 0 , otherwise .
The SF fan-beam gather used in main can therefore be summarized as
B fan SF [ q ] ( x ) = c fan SF n γ n , x fan R h n , x fan ( u ) q n ( u ) d u ,
where the implementation uses a LEAP-inspired chord-weighted factor
γ n , x fan = sid n U n ( x ) ϕ , n ( x ) ,
with ϕ , n ( x ) denoting the in-plane chord through the unit pixel along the source-to-pixel direction. Relative to the Siddon analytical gather in Equation (36), the distinguishing feature is the explicit integration over detector footprint support rather than point sampling at u n ( x ) .
For 3D cone beam, the SF family stays separable:
h n , x cone ( u , v ) = h n , x u ( u ) h n , x v ( v ) .
The transaxial factor h n , x u is trapezoidal in both sf_tr and sf_tt. The axial factor h n , x v is rectangular for sf_tr and trapezoidal for sf_tt. The corresponding gather is
B cone SF [ q ] ( x ) = c cone SF n γ n , x cone R 2 h n , x cone ( u , v ) q n ( u , v ) d u d v ,
with the chord-weighted per-view factor implemented in the code base summarized by
γ n , x cone = ϕ , n ( x ) 1 + v n ( x ) sdd n 2 .
This is the mathematical reason the SF backends in main require dedicated kernels and smaller launch blocks: they integrate filtered detector values against voxel footprints, whereas the Siddon analytical path samples a single detector coordinate per voxel and view.

7. Software Architecture and CUDA Realization

The library is deliberately modular. diffct.projectors contains the PyTorch-facing autograd functions, diffct.geometry generates trajectory tensors, diffct.analytical builds analytical FBP/FDK pipelines, diffct.kernels stores the Numba CUDA kernels, diffct.utils handles device and stream management, and diffct.constants exposes low-level launch parameters and numerical constants. This separation is one of the clearest architectural improvements of the dev branch over the older monolithic diffct.differentiable interface.
Algorithm 2 Shared analytical preprocessing and the branch-specific backprojection stage used in diffct
Require:  
sinogram p , geometry tensors Θ , short-scan flag s { 0 , 1 } , backprojection selector b, with b = siddon on dev and b { siddon , sf } on circular main fan/cone paths
Ensure:  
reconstruction f ^
1:
compute detector coordinates u k , v     ▹ Equation (1)
2:
{ Δ β n } n = 1 N views TrapezoidalWeights( { β n } )     ▹ Equation (23)
3:
if geometry { fan , cone }  then
4:
     p n , k ( , ) w cos ( u k ( , v ) ) · p n , k ( , )     ▹ Equations (33) and (40)
5:
end if
6:
if s = 1 then
7:
     p n , k w P ( β n , γ ( u k ) ) · p n , k     ▹ Parker window, Equation (34)
8:
end if
9:
q n , · F 1 W ( ξ ) | ξ | F { p n , · } / Δ u     ▹ Equation (24)
10:
q n , · Δ β n · q n , ·     ▹ angular integration weight
11:
f ^ 0
12:
for each voxel x i  do
13:
    if  b = siddon  then
14:
         f ^ i n = 1 N views w n , i q ˜ n u n ( x i ) , v n ( x i )     ▹ point-sampled gather
15:
    else
16:
         f ^ i n = 1 N views k , φ n , i ( u k , v ) q n , k ,     ▹ separable-footprint, Equation (51)
17:
    end if
18:
end for
19:
f ^ C geom · f ^     ▹ geometry-dependent global scale
20:
return f ^

7.1. PyTorch and Numba Execution Model

diffct combines PyTorch’s tensor/autograd ecosystem [3] with Numba’s CUDA JIT compilation model [15]. The forward wrappers first normalize all inputs to contiguous float32 tensors on the active CUDA device. They then call TorchCUDABridge.tensor_to_cuda_array, which exposes zero-copy Numba views through cuda.as_cuda_array. This detail matters because the projector kernels operate directly on PyTorch-owned GPU memory rather than on copied buffers.
Stream semantics are equally important. diffct retrieves the active PyTorch stream and wraps it with a cached numba.cuda.external_stream handle. As a result, Numba kernel launches respect the same dependency order as surrounding PyTorch code. The implementation therefore avoids an entire class of subtle synchronization bugs in which PyTorch and Numba would otherwise enqueue work on unrelated streams.

7.2. Kernel Families and Decorators

The code base uses two explicit numerical kernel families.
  • Siddon projector and pure adjoint kernels are decorated with cuda.jit(cache=True, fastmath=True) to maximize throughput for ray traversal and the cell-constant segment accumulation.
  • Analytical FBP/FDK gather kernels are decorated with cuda.jit(cache=True, fastmath=False) because the Fourier-convention constants and the geometry-dependent magnification weights amplify round-off errors more strongly than the projector kernels do.
This design explains why the code distinguishes between “autograd backprojection” and “analytical backprojection” even when both appear to map projection data back into image space. The former is the exact discrete transpose used by backward; the latter is a weighted reconstruction operator intended to approximate inversion.
The default thread-block sizes are ( 16 , 16 ) for 2D kernels and ( 8 , 8 , 8 ) for 3D kernels. The circular-orbit SF cone-beam kernels on main use smaller launch blocks than the default 3D Siddon kernels because the footprint integrals are more register-intensive. Cone-beam tensors are also permuted internally from ( D , H , W ) to ( W , H , D ) for memory-coalesced access in the 3D gather path and then permuted back before returning to the user.

7.3. Gradient Transport and Atomic Accumulation

The gradient path follows the same hardware design as the forward path. Each projector forward method stores geometry tensors with ctx.save_for_backward and keeps output-shape metadata in the autograd context. During backward, diffct allocates a zero-filled gradient image or volume, converts the incoming PyTorch gradient tensor into a Numba CUDA view, and launches the matched adjoint kernel on the same CUDA stream.
The backward kernel mirrors the cell-constant forward in Equation (16): forward kernels accumulate the traversed cell value weighted by the chord length Δ t m along each ray, while backward kernels scatter the incoming ray-domain gradient back into the same cell with the same Δ t m . Because multiple rays update the same pixel or voxel, the scatter step uses cuda.atomic.add. This is the implementation mechanism that makes the discrete adjoint relation in Equation (17) true to machine precision up to the expected floating-point tolerance.

8. Experimental Protocol and Repository Validation

8.1. Reference Phantoms and Example Setups

The analytical accuracy tests use Shepp–Logan phantoms in 2D and 3D. The public main branch additionally ships synthetic real-data pipelines and a walnut CBCT example. The circular phantom examples documented in this report follow three reference configurations:
  • parallel-beam FBP on a 256 × 256 phantom with 360 views and 512 detectors,
  • fan-beam FBP on a 256 × 256 phantom with 360 views, 600 detectors, sid = 500 , and sdd = 800 ,
  • cone-beam FDK on a 128 3 phantom with 360 views, a 256 × 256 detector, sid = 600 , and sdd = 900 .
The non-circular iterative examples then replace circular trajectories by sinusoidal, wobbling, elliptical, spiral, saddle, and custom figure-eight variants while reusing the same projector classes.

8.2. Circular Analytical Reconstructions

The trajectory-array analytical helper line on dev inherits the amplitude correction introduced on main 1.2.10/1.2.11 by routing the filtered sinogram through the dedicated weighted gather kernels. The change log records the corresponding raw reconstruction errors listed in Table 2.

8.3. Iterative and Arbitrary-Trajectory Reconstructions

The iterative examples solve Equation (21) with a learned correction variable and an AdamW optimizer. Parallel-beam experiments compare sinusoidal and wobbling trajectories. Fan-beam experiments compare sinusoidal and elliptical source paths. Cone-beam experiments compare spiral, sinusoidal, saddle, and custom figure-eight trajectories. The dev branch ships generator scripts and plotting utilities for all of these families, so reproducing the iterative experiments requires no additional modelling effort beyond swapping the trajectory factory.

8.4. Correctness Tests

The repository-level validation strategy is unusually strong for a compact research library. On the public dev branch, the default test suite contains 62 tests; the current public main branch contains 71 tests, while an additional 27 benchmark cases are kept opt-in. The test categories summarized in Table 3 describe the shared validation themes across the two lines.
One representative test threshold is the cone-beam analytical reconstruction guard: the 3D Shepp–Logan FDK test requires a raw RMSE below 0.1 , a center-slice RMSE below 0.1 , bounded axial-profile deviation, finite values everywhere, and a reconstruction maximum below 1.5 . This combination protects against the exact classes of bugs that commonly appear in analytical CT implementations: wrong Fourier constants, omitted cosine weights, omitted ( sid / U ) 2 factors, or detector-axis scaling mistakes.

8.5. Performance Benchmarking

The benchmark suite measures three kernels for each geometry: forward projector, pure adjoint backprojector, and the full analytical pipeline. The analytical benchmarks include the complete sequence of cosine weighting, ramp filtering, angular weighting, and weighted gather backprojection because this is the user-visible reconstruction cost. In the current public benchmark scripts, the measured kernels are the Siddon forward path, the pure adjoint path, and the full analytical pipeline; the SF backends are part of the stable main API but are not benchmarked separately in the published benchmark files.

9. Discussion

diffct now occupies a clean position in the CT software landscape. It is small enough to audit mathematically, but broad enough to support analytical reconstruction, differentiable inversion, and arbitrary-trajectory experimentation inside a single code base. The current dev line is especially attractive for research because it preserves the classical circular-orbit formulas as special cases while making trajectory tensors first-class objects, even though the public main branch has already moved beyond the 1.2.11 analytical-sync point.
The main limitation is equally clear. The weighted fan and cone analytical operators implemented in dev are a geometry-aware extension of FBP/FDK based on detector-normal distances and per-view mean scaling. For circular orbits they collapse to the familiar closed forms. For non-circular orbits they provide a practical and principled reconstruction heuristic, but they do not claim an exact inversion formula beyond the classical setting. A second limitation is branch asymmetry: the main branch still owns the SF backends, so the highest-fidelity detector-footprint model is currently available only for circular trajectories. A third limitation is gradient scope. The present autograd wrappers propagate gradients through images and sinograms, while geometry tensors are runtime inputs with no returned gradients. A fourth limitation is hardware: the entire library assumes CUDA-capable execution and uses PyTorch plus Numba as its software substrate.
The branch split also reveals a natural roadmap. The main branch proves that separable-footprint projectors are useful in the circular setting. The dev branch proves that the generic per-view tensor API is the right abstraction for arbitrary trajectories. Bridging those two lines of work will require a deeper reformulation of footprint models for detector planes that are no longer aligned with a closed-form circular geometry.

10. Conclusions

This report rewrites diffct from the code upward. The stable main branch contributes the circular-orbit baseline, while the dev branch turns the Siddon-based projector/backprojector interface into a general arbitrary-trajectory CT framework. The resulting mathematical picture is compact: projector kernels implement Siddon traversal with cell-constant segment integration; autograd uses exact discrete adjoints with fixed geometry tensors; analytical reconstruction uses explicit cosine weighting, Parker short-scan weighting in the circular fan-beam specialization, FFT-based ramp filtering, angular integration weights, and dedicated weighted gather backprojectors. With this structure in place, diffct is already well positioned for deep-learning integration, gradient-based reconstruction, and arbitrary-trajectory CT research.

Appendix A. Implementation Formula Reference

For quick reference, the dev-line geometry and analytical-helper equations used throughout the report can be grouped as follows.

Detector coordinates.

u k = k N u 2 Δ u + u 0 , v = N v 2 Δ v + v 0 .

Forward operators.

P par [ f ] ( n , k ) = R f o n + u k e u , n + t d n d t ,
P fan [ f ] ( n , k ) = 0 f S n + t d ^ n , k d t ,
P cone [ f ] ( n , k , ) = 0 f S n + t d ^ n , k , d t .

Siddon discretization.

P [ f ] ( ray ) m = 0 M 1 Δ t m f cell ( m ) .

Adjoint relation and inverse problem.

P Θ x , y = x , P Θ y , f ^ = arg min f P Θ f p 2 2 + λ R ( f ) , f L ( f ) = 2 P Θ ( P Θ f p ) + λ R ( f ) .
L x = P Θ L y , L s = P Θ L z , L Θ = 0 in the current implementation .

Angular weights and ramp filter.

Δ β 1 = β 2 β 1 2 , Δ β n = ( β n β n 1 ) + ( β n + 1 β n ) 2 , Δ β N = β N β N 1 2 ,
q = F 1 | 2 π ξ | W ( ξ ) F [ p w ] 1 Δ s .

Cosine weights.

w cos fan ( u ) = sdd sdd 2 + u 2 , w cos cone ( u , v ) = sdd sdd 2 + u 2 + v 2 .

Weighted analytical backprojection.

B par [ q ] ( x ) = 1 2 π n q n u n ( x ) ,
B fan [ q ] ( x ) = sdd n ¯ 2 π sid n ¯ n sid n U n ( x ) 2 q n u n ( x ) ,
B cone [ q ] ( x ) = sdd n ¯ 2 π sid n ¯ n sid n U n ( x ) 2 q n u n ( x ) , v n ( x ) .

Separable-footprint backprojection on main.

h n , x fan ( u ) = u u min u lo u min , u min u < u lo , 1 , u lo u u hi , u max u u max u hi , u hi < u u max , 0 , otherwise ,
B fan SF [ q ] ( x ) = c fan SF n γ n , x fan h n , x fan ( u ) q n ( u ) d u ,
B cone SF [ q ] ( x ) = c cone SF n γ n , x cone h n , x u ( u ) h n , x v ( v ) q n ( u , v ) d u d v .
Table A1. Crosswalk from the report’s mathematical notation to the current public diffct API, including projector classes, trajectory generators, analytical helpers, and the main-only SF backend names.
Table A1. Crosswalk from the report’s mathematical notation to the current public diffct API, including projector classes, trajectory generators, analytical helpers, and the main-only SF backend names.
Report object Code object
P par , P fan , P cone ParallelProjectorFunction, FanProjectorFunction, ConeProjectorFunction
P par , P fan , P cone ParallelBackprojectorFunction, FanBackprojectorFunction, ConeBackprojectorFunction
Circular trajectories circular_trajectory_2d_parallel, circular_trajectory_2d_fan, circular_trajectory_3d
Non-circular trajectories spiral_trajectory_3d, sinusoidal_trajectory_*, saddle_trajectory_3d, random_trajectory_3d, custom_trajectory_*
u k , v detector coordinates detector_coordinates_1d
Δ β n angular weights angular_integration_weights
w cos fan , w cos cone fan_cosine_weights, cone_cosine_weights
w P Parker window parker_weights
Ramp filter q ramp_filter_1d
B par , B fan , B cone parallel_weighted_backproject, fan_weighted_backproject, cone_weighted_backproject
Main-branch SF backend selectors backend="sf" for fan beam, backend="sf_tr" and backend="sf_tt" for cone beam

References

  1. Kak, A.C.; Slaney, M. Principles of Computerized Tomographic Imaging; Society for Industrial and Applied Mathematics: Philadelphia, PA, 2001. [Google Scholar] [CrossRef]
  2. Sun, Y. diffct: Differentiable Computed Tomography Reconstruction with CUDA. https://doi.org/10.5281/zenodo.14999333, 2025. Zenodo software release. [CrossRef]
  3. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adv. Neural Inf. Process. Syst. 2019, 32, 8024–8035. [Google Scholar]
  4. Siddon, R.L. Fast Calculation of the Exact Radiological Path for a Three-Dimensional CT Array. Med. Phys. 1985, 12, 252–255. [Google Scholar] [CrossRef] [PubMed]
  5. Feldkamp, L.A.; Davis, L.C.; Kress, J.W. Practical Cone-Beam Algorithm. J. Opt. Soc. Am. A 1984, 1, 612–619. [Google Scholar] [CrossRef]
  6. Parker, D.L. Optimal Short Scan Convolution Reconstruction for Fanbeam CT. Med. Phys. 1982, 9, 254–257. [Google Scholar] [CrossRef] [PubMed]
  7. Syben, C.; Michen, M.; Stimpel, B.; Seitz, S.; Ploner, S.; Maier, A.K. Technical Note: PYRO-NN: Python Reconstruction Operators in Neural Networks. Med. Phys. 2019, 46, 5110–5115. [Google Scholar] [CrossRef] [PubMed]
  8. Thies, M.; Wagner, F.; Maul, N.; Folle, L.; Meier, M.; Rohleder, M.; Schneider, L.S.; Pfaff, L.; Gu, M.; Utz, J.; et al. Gradient-Based Geometry Learning for Fan-Beam CT Reconstruction. Phys. Med. Biol. 2023, 68, 205004. [Google Scholar] [CrossRef] [PubMed]
  9. Long, Y.; Fessler, J.A.; Balter, J.M. 3D Forward and Back-Projection for X-Ray CT Using Separable Footprints. IEEE Trans. Med. Imaging 2010, 29, 1839–1850. [Google Scholar] [CrossRef] [PubMed]
  10. Kim, H.; Champley, K. Differentiable Forward Projector for X-ray Computed Tomography, 2023, [arXiv:eess.IV/2307.05801].
  11. Champley, K.; Kim, H. LivermorE AI Projector for Computed Tomography Tasks (LEAP). 2023. Software release. [CrossRef]
  12. Tuy, H.K. An Inversion Formula for Cone-Beam Reconstruction. SIAM J. Appl. Math. 1983, 43, 546–552. [Google Scholar] [CrossRef]
  13. Smith, B.D. Image Reconstruction from Cone-Beam Projections: Necessary and Sufficient Conditions and Reconstruction Methods. IEEE Trans. Med. Imaging 1985, 4, 14–25. [Google Scholar] [CrossRef] [PubMed]
  14. Ye, C.; Schneider, L.S.; Sun, Y.; Thies, M.; Mei, S.; Maier, A.K. DRACO: Differentiable Reconstruction for Arbitrary CBCT Orbits. Phys. Med. Biol. 2025, 70, 075005. [Google Scholar] [CrossRef] [PubMed]
  15. Lam, S.K.; Pitrou, A.; Seibert, S. Numba: A LLVM-Based Python JIT Compiler. In Proceedings of the Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, 2015; pp. 1–6. [Google Scholar] [CrossRef]
Table 1. Repository lineage used by this report.
Table 1. Repository lineage used by this report.
Aspect main branch dev branch
Release status Stable, versioned, published on PyPI Development branch, source install only
Geometry API Circular-orbit closed-form parameters Explicit per-view arrays for arbitrary trajectories
Supported scan families Parallel, fan, cone on circular orbits Parallel, fan, cone on circular and non-circular trajectories
Projector backends Siddon plus SF variants (sf, sf_tr, sf_tt) Siddon family generalized to arbitrary trajectories
Analytical pipeline Current stable circular pipeline on the 1.3.4 release line Trajectory-array adaptation of the main 1.2.10/1.2.11 analytical overhaul
Testing focus Circular-orbit reference behavior Circular plus arbitrary-trajectory correctness and parity
Deferred feature Separable-footprint backends available Separable-footprint backends not yet generalized
Table 2. Circular analytical reconstructions documented in the synchronized code base.
Table 2. Circular analytical reconstructions documented in the synchronized code base.
Example Raw MSE Reported reconstruction range
circular_trajectory/fbp_parallel.py 0.00540 [ 0.03 , 1.01 ]
circular_trajectory/fbp_fan.py 0.00509 [ 0.11 , 1.03 ]
circular_trajectory/fdk_cone.py 0.00737 [ 0.13 , 1.11 ]
Table 3. Validation layers encoded in the repository.
Table 3. Validation layers encoded in the repository.
Test group What it protects
test_adjoint_inner_product.py Verifies A x , y = x , A y for parallel, fan, and cone operators
test_gradcheck.py Confirms that CUDA backward kernels match finite-difference Jacobians for all projector classes
test_cuda_smoke.py Checks end-to-end execution of projector, adjoint, and analytical helper paths
test_weights.py Verifies detector coordinates, angular weights, cosine weights, and Parker weights
test_ramp_filter_windows.py Verifies ramp-filter windows, DC behavior, FFT parity, and detector-spacing scaling
test_fbp_*_accuracy.py, test_fdk_cone_accuracy.py Bound RMSE and amplitude for analytical circular reconstructions
test_fbp_fan_offsets.py, test_fdk_cone_offsets.py Checks that shifted trajectory arrays are reconstructed correctly
test_cone_projector_autograd.py Guards cone-projector gradient finiteness and consistency under circular and spiral trajectories
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Prerpints.org logo

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

Disclaimer

Terms of Use

Privacy Policy

Privacy Settings

© 2026 MDPI (Basel, Switzerland) unless otherwise stated