Wood Quality Assessment of Standing Stems When Measurements Do Not Line Up: A Knot Geometry Model for Spatially Misaligned Tree Data

Udayalakshmi Vepakomma; Isabelle Duchesne; Magloire Loudegui Djimdou; Arusharka Sen

doi:10.20944/preprints202603.1527.v1

Submitted: 18 March 2026

Posted: 19 March 2026


Abstract

Nondestructive assessment of wood quality in standing trees is increasingly essential for value-based harvesting, precision forestry, and large-scale monitoring. Yet most existing approaches rely on destructive sampling or plot-scale measurements that cannot feasibly be extended across extensive forest areas. A major barrier to scalable assessment is the spatial misalignment between external structural measurements and internal wood-quality responses, which introduces systematic bias when conventional regression methods are applied. We propose a generalized statistical framework, called Regression of Misaligned Covariates (RMC), that reconciles covariates and responses measured at separate spatial locations. By combining the Law of Total Expectation with kernel-based estimation of marginal relationships, RMC recovers height-dependent conditional means without requiring one-to-one spatial correspondence. The framework is demonstrated on two physiologically contrasting conifers, eastern white pine (Pinus strobus) and red pine (Pinus resinosa), using covariates that are directly measurable through nondestructive, remote-sensing-based structural characterization. RMC produced smooth, biologically interpretable estimates of knot volume along the bole for both species, even when trained on small datasets and using only a single structural covariate in addition to height. The resulting cumulative knot-volume profiles captured species-specific differences in knot accumulation and aligned with known patterns of crown architecture and stem form. RMC provides a scalable, nondestructive pathway for predicting internal wood quality from external structural measurements, supporting pre-harvest planning, log-sorting decisions, and long-term monitoring of stand development.
Because the method is portable, parsimonious, and not species-specific, it offers a general solution for any ecological or remote-sensing application involving misaligned covariates and responses, extending well beyond wood-quality modelling.

Keywords: non-coincident observations; kernel density estimation; CT scanning; knot characteristics; knot modeling; log sort; white pine; red pine

Subject: Biology and Life Sciences - Forestry

1. Introduction

In North America, the assessment of standing wood quality has emerged as a critical early-intervention strategy in modern forest management, marking a fundamental shift from traditional volume-based metrics to value-based recovery [1,2]. This paradigm shift is essential for optimizing the wood supply chain, ensuring the structural soundness of timber products, and maximizing market value while reducing waste and supporting sustainable silvicultural practices [3,4,5]. By implementing assessment protocols early in the production cycle, forest managers can segregate timber strategically by quality prior to harvest. For example, adding even simple visual parameters, such as the number of branches in the lower part of the bole, has significantly improved predictions of lumber yield [6,7]. This proactive approach enables the precise allocation of raw materials to their most economically viable end uses, such as distinguishing high-stiffness structural lumber from pulpwood, thereby maximizing the return on investment for each individual stem [3,2].

Beyond immediate economic gains, early assessment is a prerequisite for meeting the rigorous strength and stiffness standards required by the modern construction industry [1,4]. Instead of viewing timber as a generic product, managers must align forest growth with industry demands by predicting wood quality long before harvest. Predicting the quality of potential products within specific growing environments allows managers to tailor silvicultural prescriptions to optimize both the quantity and the internal characteristics of the standing resource [8,9]. Because the standing tree is the origin of the forest-products value chain, it is the point at which wood quality is most susceptible to management influence [10]. Consequently, targeting wood quality during silvicultural planning creates value that extends beyond mere fiber extraction and supports more robust decision-making under varying environmental and management scenarios [3,8]. However, to realize these benefits across broad forest landscapes, it is necessary to move beyond plot-level sampling toward integrated, large-scale assessment methodologies.

In recent years, LiDAR (Light Detection and Ranging) has evolved from a simple tool for inventory and volume assessment into a sophisticated technology that either substitutes for or complements traditional Non-Destructive Evaluation (NDE) by predicting internal traits from external structure and high-resolution geometry [2]. Although current resolutions are insufficient to detect individual branch scars, high-resolution LiDAR has been used successfully to delineate crown features, canopy density, and bole characteristics such as crown base height (CBH), diameter along the bole, and stem straightness (sweep). These metrics serve as effective proxies for wood quality in various stand types [11,12,13,14]. Furthermore, airborne laser scanning (ALS) has proven effective for predicting site-level wood density and general stem dimensions across large territories [14]. However, ALS often suffers from significant occlusion at near-nadir scanning angles and from canopy interference, which prevent detailed characterization of the lower stem, where the most valuable sawlogs are located [15]. Operating at lower altitudes and slower speeds, and from multiple scanning perspectives, drone- or UAV-based LiDAR (ULS) acquires much denser point clouds that penetrate the canopy more effectively and capture lower-stem morphology [13,16].

The size and distribution of knots are widely recognized as critical determinants of wood quality, directly influencing lumber grading, mechanical performance, and aesthetic value [2,17,18]. Although restricted to plot-level assessments, the proximity and perspective of Terrestrial (TLS) and Mobile Laser Scanning (MLS) enable the capture of millimeter-level point clouds. These high-density datasets allow for the accurate 3D reconstruction of woody structures, including the measurement of taper, sweep, and ovality [19,20], as well as the detection of branch scars and whorl patterns [21,22].

Recent studies have leveraged these external morphological features to infer internal characteristics; for instance, TLS data have been correlated with CT scans to achieve 62.5% accuracy in the automated quality grading of European beech sawlogs [23]. Furthermore, when integrated with resistance drilling, these scanners can predict wood basic density with moderate success (R² = 0.51) [14]. While they provide the high-resolution, knot-level detail required for internal trait prediction, their limited spatial reach necessitates a bridge toward more scalable platforms for operational efficiency. One strategy for scaling wood-quality predictions is to use ULS-derived crown and bole metrics as inputs to empirical models. Calibrated with meticulous ground-based data, such models translate external structural geometry into broad-scale wood-quality indicators.

Despite these high-resolution capabilities, spatial misalignment between external predictors and internal responses remains a fundamental bottleneck when integrating multi-source datasets [24]. In wood-quality modeling, the covariates derived from LiDAR (such as external bole surface irregularities or branch metrics along the bole) and the response variables (such as internal knot geometry from destructive sampling or computed tomography (CT) scans) are rarely evaluated at the same spatial coordinates. This “change of support” problem introduces significant bias, because traditional regression techniques often assume a one-to-one spatial correspondence between variables [25,26,27]. Standard alignment methods frequently fail to account for the irregular 3D topology of the tree or for sensor-specific occlusions that shift the perceived location of a feature [28,29]. Consequently, there is a critical need for a generalized modeling framework capable of reconciling these disjoint data points without sacrificing the localized precision required for knot-level or even log-segment-level analysis.

The primary objective of this study is to develop a generalized modeling framework for cases in which the covariate (e.g., external tree characteristics) and the response variable (e.g., knot geometry) are evaluated at disjoint spatial points. By leveraging the Law of Total Expectation in conjunction with kernel density estimation, we propose a novel statistical method, called regression of misaligned covariates (RMC), that reconciles these misaligned datasets and enables the prediction of internal wood-quality attributes from external structural tree geometry.

The development of such a framework requires validation across contrasting physiological profiles; thus, red pine (Pinus resinosa) and eastern white pine (Pinus strobus) are used here as distinct prototypes of conifer physiology. Red pine, characterized by exceptional genetic and structural uniformity, serves as a stable baseline for quality-related assessments [30,31]. In contrast, eastern white pine presents significant geometric complexity due to its discrete branching architecture and susceptibility to terminal leader damage [32,33,34]. These biological stressors often result in morphological irregularities, including crooks, sweeps, and multi-leader crowns, which make the species a rigorous testbed for LiDAR-based algorithms designed to map knot distribution and volumetric recovery [35,36].

Through this dual-species validation, we illustrate the estimation of primary wood quality metrics in standing trees and demonstrate how these data can directly inform management decisions and industrial processing. By bridging the gap between raw spatial data and volumetric recovery, this research ultimately provides a robust, scalable pathway for assessing wood quality at both the individual tree and sub-tree levels across extensive forest blocks.

2. Materials and Methods

2.1. Regression of Misaligned Covariates (RMC)

2.1.1. Model Presentation

The goal is to predict knot geometry, such as knot volume, diameter, and size, from external tree characteristics, such as log diameter, branch length, and branch diameter, measured along the height (or length) of the tree bole. However, in contrast with the usual regression framework, here we deal with misaligned covariates, i.e., the covariate and response data are collected at different times and at different heights along the bole. Specifically:

the knot geometry measurements (internal characteristics) are available at certain height points.
the external tree characteristics are recorded at other, possibly different, height points.

Let Y denote the response variable (indexed by 0), i.e., the internal knot characteristics; X the covariate (indexed by 1), i.e., the external tree characteristics; and H the relative height (indexed by 2). The two samples can then be denoted as:

\{(Y_{i}, H_{i})\}_{i=1}^{n_{02}} \quad \text{and} \quad \{(X_{i}, H_{i})\}_{i=1}^{n_{12}}

The objective, hence, is to link Y to X, noting that the Y_{i}'s and the X_{i}'s are misaligned, i.e., they are not available at the same height points. From the available data, one can directly estimate

m_{02} (h) : = E (Y ∣ H = h)

(1)

m_{12} (h) : = E (X ∣ H = h)

(2)

The goal, however, is to estimate the regression of Y on both X and H:

m_{012} (x, h) : = E (Y ∣ X = x, H = h)

(3)

The proposed approach is based on the well-known law of total expectation, which states that the expectation of a conditional expectation is the corresponding unconditional (or less-conditioned) expectation. Here it gives:

E_{X}\left[E(Y \mid X, H = h) \mid H = h\right] = E(Y \mid H = h)

(4)

E_{X, H}\left[E(Y \mid X, H)\right] = E(Y)

(5)
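Equation (4) can be checked numerically in a setting where every conditional mean is known in closed form. The sketch below is a minimal illustration, assuming a trivariate normal (Y, X, H) with hypothetical means and covariance chosen only so that the covariance matrix is positive definite (they are not estimated from the study data). It draws X from its conditional distribution given H = h, averages the closed-form E(Y | X, H = h) over those draws, and compares the result with the closed-form E(Y | H = h). The helper names (`m012`, `m02`, `check_total_expectation`) are illustrative.

```python
import numpy as np

# Hypothetical parameters for (Y, X, H); chosen only so that Sigma is positive definite.
mu = np.array([10.0, 5.0, 0.5])  # (mu_0, mu_1, mu_2)
Sigma = np.array([[4.0, 1.5, -0.3],
                  [1.5, 2.0, -0.2],
                  [-0.3, -0.2, 0.08]])

# For a multivariate normal, E(Y | X, H) = mu_0 + beta_x (x - mu_1) + beta_h (h - mu_2).
beta_x, beta_h = np.linalg.solve(Sigma[1:, 1:], Sigma[0, 1:])


def m012(x, h):
    """Closed-form E(Y | X = x, H = h)."""
    return mu[0] + beta_x * (x - mu[1]) + beta_h * (h - mu[2])


def m02(h):
    """Closed-form E(Y | H = h)."""
    return mu[0] + Sigma[0, 2] / Sigma[2, 2] * (h - mu[2])


def check_total_expectation(h, n=200_000, seed=0):
    """Monte Carlo value of E_X[E(Y | X, H = h) | H = h] next to E(Y | H = h)."""
    rng = np.random.default_rng(seed)
    # X | H = h is normal with these conditional moments.
    mean_x = mu[1] + Sigma[1, 2] / Sigma[2, 2] * (h - mu[2])
    var_x = Sigma[1, 1] - Sigma[1, 2] ** 2 / Sigma[2, 2]
    x = rng.normal(mean_x, np.sqrt(var_x), size=n)
    return m012(x, h).mean(), m02(h)


lhs, rhs = check_total_expectation(h=0.8)
print(f"E_X[E(Y|X,H=h)|H=h] ~ {lhs:.3f}   E(Y|H=h) = {rhs:.3f}")
```

The two printed values should agree up to Monte Carlo error, which shrinks as `n` grows.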

For estimation, a suitable finite-dimensional model for m_{012}(x, h) can be chosen, for example,

m_{012}(x, h) = a_{0} + a_{1} x + a_{2} h + a_{3} x^{2} + a_{4} h^{2} + \dots

(6)

after which the unknowns a_{0}, a_{1}, \dots are solved for.

Equations (4) and (5) can be written more precisely as,

\int \left( \frac{\int y\, f_{012}(y, x, h)\, dy}{f_{12}(x, h)} \right) \frac{f_{12}(x, h)}{f_{2}(h)}\, dx = \frac{\int y\, f_{02}(y, h)\, dy}{f_{2}(h)}

(7)

\iint \left( \frac{\int y\, f_{012}(y, x, h)\, dy}{f_{12}(x, h)} \right) f_{12}(x, h)\, dx\, dh = \iiint y\, f_{012}(y, x, h)\, dy\, dx\, dh

(8)

where it is assumed that (Y, X, H) has joint density f_{012}(y, x, h), (Y, H) has joint density f_{02}(y, h), (X, H) has joint density f_{12}(x, h), and H has marginal density f_{2}(h), and that

\int f_{012}(y, x, h)\, dx = f_{02}(y, h)

(9)

\iint f_{012}(y, x, h)\, dx\, dh = f_{0}(y)

(10)

with f_{0}(y) being the marginal density of Y.

2.1.2. Regression Estimation Approach

Recognizing the inner integral in Equation (7) as m_{012}(x, h), one can write

\int \underbrace{\left( \frac{\int y\, f_{012}(y, x, h)\, dy}{f_{12}(x, h)} \right)}_{m_{012}(x, h)} \frac{f_{12}(x, h)}{f_{2}(h)}\, dx = \frac{\int y\, f_{02}(y, h)\, dy}{f_{2}(h)} \quad \text{for all } h

(11)

and cancelling f_{2}(h) from both sides gives

\int m_{012}(x, h)\, f_{12}(x, h)\, dx = \int y\, f_{02}(y, h)\, dy \quad \text{for all } h

(12)

so that Equations (7) and (8) become

\int m_{012}(x, h)\, f_{12}(x, h)\, dx = \int y\, f_{02}(y, h)\, dy

(13)

E\left[m_{012}(X, H)\right] = E(Y)

(14)

Plugging the corresponding kernel estimators into both sides gives

\int m_{012}(x, h)\, \hat{f}_{12}(x, h)\, dx = \int y\, \hat{f}_{02}(y, h)\, dy

(15)

\frac{1}{n_{12}} \sum_{i=1}^{n_{12}} m_{012}(X_{i}, H_{i}) = \frac{1}{n_{02}} \sum_{i=1}^{n_{02}} Y_{i}

(16)

where

\hat{f}_{12}(x, h) = \frac{1}{n_{12} b_{n}^{2}} \sum_{i=1}^{n_{12}} K\left(\frac{x - X_{i}}{b_{n}}\right) K\left(\frac{h - H_{i}}{b_{n}}\right)

(17)

\hat{f}_{02}(y, h) = \frac{1}{n_{02} b_{n}^{2}} \sum_{i=1}^{n_{02}} K\left(\frac{y - Y_{i}}{b_{n}}\right) K\left(\frac{h - H_{i}}{b_{n}}\right)

(18)

and K(\cdot) is a kernel with bandwidth b_{n}.

See, for example, [37] for an introduction to kernel smoothing.
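A minimal sketch of the product-kernel estimator in Equation (17), assuming a Gaussian kernel for K(·) (the text leaves the kernel unspecified) and a single bandwidth b shared by both coordinates, as written there. The function name `product_kde` and the sample values are hypothetical; Equation (18) is the same computation with (Y_i, H_i) in place of (X_i, H_i).

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal kernel K(u): an assumed choice; the text leaves K unspecified."""
    return np.exp(-0.5 * np.asarray(u) ** 2) / np.sqrt(2.0 * np.pi)


def product_kde(x, h, X, H, b):
    """Product-kernel estimate of f_12(x, h), Equation (17).

    X, H: the n_12 paired covariate and height observations.
    b: the bandwidth b_n, shared by both coordinates as in Equation (17).
    """
    X = np.asarray(X, dtype=float)
    H = np.asarray(H, dtype=float)
    weights = gaussian_kernel((x - X) / b) * gaussian_kernel((h - H) / b)
    return weights.sum() / (X.size * b ** 2)


# Hypothetical standardized covariate and relative-height observations.
X = np.array([-0.8, -0.1, 0.3, 1.2])
H = np.array([0.15, 0.40, 0.55, 0.90])
print(product_kde(0.0, 0.5, X, H, b=0.3))
```

Because x and h share one bandwidth in Equation (17), it may help in practice to put the covariate and the relative height on comparable scales (for example, by standardizing X) before smoothing; this is a design suggestion, not part of the method as stated.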

Equations (15) and (16) can then be solved by choosing, as mentioned above, a suitable finite-dimensional model for m_{012}(x, h), for example,

m(x, h) = a_{0} + a_{1} x + a_{2} h + a_{3} x^{2} + a_{4} h^{2} + \dots

(19)

The coefficients a_{0}, a_{1}, \dots can be solved for by creating as many equations as there are coefficients: Equation (16) supplies one equation, and evaluating Equation (15) at different heights h_{1}, h_{2}, \dots supplies the remainder.

To illustrate the proposed estimation method, the following subsection examines the case in which the variables of interest follow a trivariate normal distribution.

2.1.3. (Y, X, H) Follow a Trivariate Normal Distribution

Suppose

(\begin{array}{l} Y \\ X \\ H \end{array}) \sim N_{3} ((\begin{array}{l} μ_{0} \\ μ_{1} \\ μ_{2} \end{array}), Σ)

(20)

From the properties of multivariate normal distributions, the conditional expectations m_{12}(h) = E(X \mid H = h), m_{02}(h) = E(Y \mid H = h), and m(x, h) = m_{012}(x, h) = E(Y \mid X = x, H = h) are all linear.

Thus we may write

m_{12} (h) = E (X ∣ H = h) = b_{0} + b_{2} h

(21)

m_{02}(h) = E(Y \mid H = h) = c_{0} + c_{2} h

(22)

and

m (x, h) = m_{012} (x, h) = E (Y ∣ X = x, H = h) = a_{0} + a_{1} x + a_{2} h

(23)

Equations (4) and (5) then become

a_{0} + a_{1}(b_{0} + b_{2} h) + a_{2} h = c_{0} + c_{2} h \quad \text{for all } h

(24)

a_{0} + a_{1} \mu_{1} + a_{2} \mu_{2} = \mu_{0}

(25)

Equating the constant terms and the coefficients of h in Equation (24), together with Equation (25), yields a system of three equations in three unknowns (a_{0}, a_{1}, a_{2}):

a_{0} + b_{0} a_{1} = c_{0}

(26)

b_{2} a_{1} + a_{2} = c_{2}

(27)

a_{0} + \mu_{1} a_{1} + \mu_{2} a_{2} = \mu_{0}

(28)

These equations can be written in matrix form as:

(\begin{matrix} 1 & b_{0} & 0 \\ 0 & b_{2} & 1 \\ 1 & μ_{1} & μ_{2} \end{matrix}) (\begin{matrix} a_{0} \\ a_{1} \\ a_{2} \end{matrix}) = (\begin{matrix} c_{0} \\ c_{2} \\ μ_{0} \end{matrix})

(29)

(\begin{matrix} a_{0} \\ a_{1} \\ a_{2} \end{matrix}) = {(\begin{matrix} 1 & b_{0} & 0 \\ 0 & b_{2} & 1 \\ 1 & μ_{1} & μ_{2} \end{matrix})}^{- 1} (\begin{matrix} c_{0} \\ c_{2} \\ μ_{0} \end{matrix})

(30)

Given a sample,

{\hat{μ}}_{0} = \frac{1}{n_{02}} \sum_{i = 1}^{n_{02}} Y_{i}

(31)

{\hat{μ}}_{1} = \frac{1}{n_{12}} \sum_{i = 1}^{n_{12}} X_{i}

(32)

{\hat{μ}}_{2} = \frac{1}{n_{2}} \sum_{i = 1}^{n_{2}} H_{i} = \frac{1}{n_{02} + n_{12}} (\sum_{i = 1}^{n_{02}} H_{i} + \sum_{i = 1}^{n_{12}} H_{i})

(33)

where b_{0}, b_{2}, c_{0}, and c_{2} can be estimated by Ordinary Least Squares (OLS), regressing X on H in the (X, H) sample and Y on H in the (Y, H) sample, as in Equations (21) and (22). This yields

\begin{pmatrix} \hat{a}_{0} \\ \hat{a}_{1} \\ \hat{a}_{2} \end{pmatrix} = \begin{pmatrix} 1 & \hat{b}_{0} & 0 \\ 0 & \hat{b}_{2} & 1 \\ 1 & \hat{\mu}_{1} & \hat{\mu}_{2} \end{pmatrix}^{-1} \begin{pmatrix} \hat{c}_{0} \\ \hat{c}_{2} \\ \hat{\mu}_{0} \end{pmatrix}

(34)
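The plug-in solution in Equation (34) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the helper names and the simulated data are hypothetical, and `np.polyfit` stands in for the OLS fits of Equations (21) and (22). Because OLS fits pass through the sample means, the third row of the matrix differs from the first row plus \hat{\mu}_{2} times the second only through the gap between \hat{\mu}_{2} and the mean height of the (X, H) sample, so the system can be close to singular; the sketch therefore also returns the condition number as a diagnostic.

```python
import numpy as np


def ols_line(h, v):
    """OLS fit of v = intercept + slope * h; returns (intercept, slope)."""
    slope, intercept = np.polyfit(h, v, deg=1)
    return intercept, slope


def gaussian_rmc(Y, H_y, X, H_x):
    """Plug-in solution of Equation (34) from two misaligned samples.

    (Y, H_y): the n_02 response observations; (X, H_x): the n_12 covariate
    observations. Returns (a_hat, A, rhs, cond) so the system can be inspected.
    """
    b0, b2 = ols_line(H_x, X)                      # Equation (21)
    c0, c2 = ols_line(H_y, Y)                      # Equation (22)
    mu0, mu1 = np.mean(Y), np.mean(X)              # Equations (31) and (32)
    mu2 = np.mean(np.concatenate([H_y, H_x]))      # Equation (33), pooled heights
    A = np.array([[1.0, b0, 0.0],
                  [0.0, b2, 1.0],
                  [1.0, mu1, mu2]])
    rhs = np.array([c0, c2, mu0])
    a_hat = np.linalg.solve(A, rhs)
    return a_hat, A, rhs, np.linalg.cond(A)


# Hypothetical misaligned samples (not the study data).
rng = np.random.default_rng(1)
H_y = rng.uniform(0.0, 1.0, 60)
Y = 12.0 - 3.0 * H_y + rng.normal(0.0, 0.5, 60)    # response sample (Y, H)
H_x = rng.uniform(0.0, 1.0, 80)
X = 40.0 - 25.0 * H_x + rng.normal(0.0, 2.0, 80)   # covariate sample (X, H)

a_hat, A, rhs, cond = gaussian_rmc(Y, H_y, X, H_x)
print("a_hat =", a_hat, " condition number =", cond)
```

A large condition number indicates that the third equation adds little information beyond the first two, so the resulting coefficients should be interpreted with care.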

2.1.4. Generalisation of the Model

To demonstrate the flexibility of the proposed framework, two specific functional forms for m(x, h) are considered and their estimation is presented in this section.

Case 1: m (x,h) is linear

When m(x, h) is linear,

m_{012} (x, h) = E (Y ∣ X = x, H = h) = a_{0} + a_{1} x + a_{2} h

(35)

Equations (15) and (16) then become

\int (a_{0} + a_{1} x + a_{2} h)\, \hat{f}_{12}(x, h)\, dx = \int y\, \hat{f}_{02}(y, h)\, dy

(36)

\frac{1}{n_{12}} \sum_{i=1}^{n_{12}} (a_{0} + a_{1} X_{i} + a_{2} H_{i}) = \frac{1}{n_{02}} \sum_{i=1}^{n_{02}} Y_{i}

(37)

that is,

\int (a_{0} + a_{1} x + a_{2} h)\, \hat{f}_{12}(x, h)\, dx = \int y\, \hat{f}_{02}(y, h)\, dy

(38)

a_{0} + a_{1} \overline{X}_{n_{12}} + a_{2} \overline{H}_{n_{12}} = \overline{Y}_{n_{02}}

(39)

where

\overline{X}_{n_{12}} = \iint x\, \hat{f}_{12}(x, h)\, dx\, dh = \frac{1}{n_{12}} \sum_{i=1}^{n_{12}} X_{i}

(40)

\overline{H}_{n_{12}} = \iint h\, \hat{f}_{12}(x, h)\, dx\, dh = \frac{1}{n_{12}} \sum_{i=1}^{n_{12}} H_{i}

(41)

\overline{Y}_{n_{02}} = \iint y\, \hat{f}_{02}(y, h)\, dy\, dh = \frac{1}{n_{02}} \sum_{i=1}^{n_{02}} Y_{i}

(42)

Taking h = h_{1} and h = h_{2} in the first equation (Equation (36)), we have

\int (a_{0} + a_{1} x + a_{2} h_{1})\, \hat{f}_{12}(x, h_{1})\, dx = \int y\, \hat{f}_{02}(y, h_{1})\, dy

(43)

\int (a_{0} + a_{1} x + a_{2} h_{2})\, \hat{f}_{12}(x, h_{2})\, dx = \int y\, \hat{f}_{02}(y, h_{2})\, dy

(44)

a_{0} + a_{1} \overline{X}_{n_{12}} + a_{2} \overline{H}_{n_{12}} = \overline{Y}_{n_{02}}

(45)

a_{0} \int \hat{f}_{12}(x, h_{1})\, dx + a_{1} \int x\, \hat{f}_{12}(x, h_{1})\, dx + a_{2} h_{1} \int \hat{f}_{12}(x, h_{1})\, dx = \int y\, \hat{f}_{02}(y, h_{1})\, dy

(46)

a_{0} \int \hat{f}_{12}(x, h_{2})\, dx + a_{1} \int x\, \hat{f}_{12}(x, h_{2})\, dx + a_{2} h_{2} \int \hat{f}_{12}(x, h_{2})\, dx = \int y\, \hat{f}_{02}(y, h_{2})\, dy

(47)

a_{0} + a_{1} \overline{X}_{n_{12}} + a_{2} \overline{H}_{n_{12}} = \overline{Y}_{n_{02}}

(48)

Replacing \int \hat{f}_{12}(x, h_{j})\, dx, which estimates the marginal density of H at h_{j}, by the pooled estimate \hat{f}_{2}(h_{j}) of Equation (54) gives

a_{0} \hat{f}_{2}(h_{1}) + a_{1} \int x\, \hat{f}_{12}(x, h_{1})\, dx + a_{2} h_{1} \hat{f}_{2}(h_{1}) = \int y\, \hat{f}_{02}(y, h_{1})\, dy

(49)

a_{0} \hat{f}_{2}(h_{2}) + a_{1} \int x\, \hat{f}_{12}(x, h_{2})\, dx + a_{2} h_{2} \hat{f}_{2}(h_{2}) = \int y\, \hat{f}_{02}(y, h_{2})\, dy

(50)

a_{0} + a_{1} \overline{X}_{n_{12}} + a_{2} \overline{H}_{n_{12}} = \overline{Y}_{n_{02}}

(51)

\begin{pmatrix} \hat{f}_{2}(h_{1}) & \int x\, \hat{f}_{12}(x, h_{1})\, dx & h_{1} \hat{f}_{2}(h_{1}) \\ \hat{f}_{2}(h_{2}) & \int x\, \hat{f}_{12}(x, h_{2})\, dx & h_{2} \hat{f}_{2}(h_{2}) \\ 1 & \overline{X}_{n_{12}} & \overline{H}_{n_{12}} \end{pmatrix} \begin{pmatrix} a_{0} \\ a_{1} \\ a_{2} \end{pmatrix} = \begin{pmatrix} \int y\, \hat{f}_{02}(y, h_{1})\, dy \\ \int y\, \hat{f}_{02}(y, h_{2})\, dy \\ \overline{Y}_{n_{02}} \end{pmatrix}

(52)

\begin{pmatrix} a_{0} \\ a_{1} \\ a_{2} \end{pmatrix} = \begin{pmatrix} \hat{f}_{2}(h_{1}) & \int x\, \hat{f}_{12}(x, h_{1})\, dx & h_{1} \hat{f}_{2}(h_{1}) \\ \hat{f}_{2}(h_{2}) & \int x\, \hat{f}_{12}(x, h_{2})\, dx & h_{2} \hat{f}_{2}(h_{2}) \\ 1 & \overline{X}_{n_{12}} & \overline{H}_{n_{12}} \end{pmatrix}^{-1} \begin{pmatrix} \int y\, \hat{f}_{02}(y, h_{1})\, dy \\ \int y\, \hat{f}_{02}(y, h_{2})\, dy \\ \overline{Y}_{n_{02}} \end{pmatrix}

(53)

where, for each evaluation height h_{j},

\hat{f}_{2}(h_{j}) = \frac{1}{n_{2} b_{n}} \sum_{i=1}^{n_{2}} K\left(\frac{h_{j} - H_{i}}{b_{n}}\right)

(54)

\int x\, \hat{f}_{12}(x, h_{j})\, dx = \frac{1}{n_{12} b_{n}} \sum_{i=1}^{n_{12}} X_{i} K\left(\frac{h_{j} - H_{i}}{b_{n}}\right)

(55)

\int y\, \hat{f}_{02}(y, h_{j})\, dy = \frac{1}{n_{02} b_{n}} \sum_{i=1}^{n_{02}} Y_{i} K\left(\frac{h_{j} - H_{i}}{b_{n}}\right)

(56)

with n_{2} = n_{02} + n_{12}; the sum in Equation (54) runs over the pooled heights of both samples, as in Equation (33), whereas the sums in Equations (55) and (56) run over the (X, H) and (Y, H) samples, respectively.
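The Case 1 system of Equations (52)–(56) can be assembled directly from kernel sums. The sketch below assumes a Gaussian kernel, two hypothetical evaluation heights, and a hypothetical bandwidth; the helper names and simulated data are illustrative only. As in Equation (54), the pooled heights of both samples enter \hat{f}_{2}.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal kernel K(u) (an assumed choice)."""
    return np.exp(-0.5 * np.asarray(u) ** 2) / np.sqrt(2.0 * np.pi)


def case1_solve(Y, H_y, X, H_x, heights=(0.25, 0.75), b=0.1):
    """Assemble Equation (52) for a linear m(x, h) and solve it as in Equation (53).

    heights: the two evaluation heights (h_1, h_2); b: the bandwidth b_n.
    Both defaults are hypothetical.
    """
    Y, H_y, X, H_x = (np.asarray(v, dtype=float) for v in (Y, H_y, X, H_x))
    H_all = np.concatenate([H_y, H_x])             # pooled heights, n_2 = n_02 + n_12
    rows, rhs = [], []
    for hj in heights:
        f2 = gaussian_kernel((hj - H_all) / b).sum() / (H_all.size * b)  # Equation (54)
        ix = (X * gaussian_kernel((hj - H_x) / b)).sum() / (X.size * b)  # Equation (55)
        iy = (Y * gaussian_kernel((hj - H_y) / b)).sum() / (Y.size * b)  # Equation (56)
        rows.append([f2, ix, hj * f2])
        rhs.append(iy)
    rows.append([1.0, X.mean(), H_x.mean()])       # Equation (39)
    rhs.append(Y.mean())
    A, rhs = np.array(rows), np.array(rhs)
    return np.linalg.solve(A, rhs), A, rhs


# Hypothetical misaligned samples: X follows a curved, taper-like trend in height.
rng = np.random.default_rng(2)
H_x = rng.uniform(0.0, 1.0, 300)
X = 40.0 * np.exp(-2.0 * H_x) + rng.normal(0.0, 1.0, 300)
H_y = rng.uniform(0.0, 1.0, 200)
X_unseen = 40.0 * np.exp(-2.0 * H_y) + rng.normal(0.0, 1.0, 200)  # never observed
Y = 5.0 + 0.2 * X_unseen + 3.0 * H_y + rng.normal(0.0, 0.5, 200)

a_hat, A, rhs = case1_solve(Y, H_y, X, H_x)
print("a_hat =", a_hat)
```

The simulated covariate follows a curved trend in height on purpose: if E(X | H = h) were linear in h, the three rows would be close to collinear and the solution would be poorly determined.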

Case 2: m (x,h) is quadratic

When m (x,h) is quadratic,

m_{012} (x, h) = E (Y ∣ X = x, H = h) = a_{0} + a_{1} x + a_{2} h + a_{3} x^{2} + a_{4} h^{2} + a_{5} x h

(57)

Equations (15) and (16) then become:

\int (a_{0} + a_{1} x + a_{2} h + a_{3} x^{2} + a_{4} h^{2} + a_{5} x h)\, \hat{f}_{12}(x, h)\, dx = \int y\, \hat{f}_{02}(y, h)\, dy

(58)

\frac{1}{n_{12}} \sum_{i=1}^{n_{12}} (a_{0} + a_{1} X_{i} + a_{2} H_{i} + a_{3} X_{i}^{2} + a_{4} H_{i}^{2} + a_{5} X_{i} H_{i}) = \frac{1}{n_{02}} \sum_{i=1}^{n_{02}} Y_{i}

(59)

a_{0} \hat{f}_{2}(h) + a_{1} \int x\, \hat{f}_{12}(x, h)\, dx + a_{2} h \hat{f}_{2}(h) + a_{3} \int x^{2} \hat{f}_{12}(x, h)\, dx + a_{4} h^{2} \hat{f}_{2}(h) + a_{5} h \int x\, \hat{f}_{12}(x, h)\, dx = \int y\, \hat{f}_{02}(y, h)\, dy

(60)

a_{0} + a_{1} \overline{X}_{n_{12}} + a_{2} \overline{H}_{n_{12}} + \frac{a_{3}}{n_{12}} \sum_{i=1}^{n_{12}} X_{i}^{2} + \frac{a_{4}}{n_{12}} \sum_{i=1}^{n_{12}} H_{i}^{2} + \frac{a_{5}}{n_{12}} \sum_{i=1}^{n_{12}} X_{i} H_{i} = \overline{Y}_{n_{02}}

(61)

Evaluating Equation (60) at five heights h_{1}, \dots, h_{5} and appending Equation (61), we have

a_{0} \hat{f}_{2}(h_{1}) + a_{1} \int x\, \hat{f}_{12}(x, h_{1})\, dx + a_{2} h_{1} \hat{f}_{2}(h_{1}) + a_{3} \int x^{2} \hat{f}_{12}(x, h_{1})\, dx + a_{4} h_{1}^{2} \hat{f}_{2}(h_{1}) + a_{5} h_{1} \int x\, \hat{f}_{12}(x, h_{1})\, dx = \int y\, \hat{f}_{02}(y, h_{1})\, dy

(62)

a_{0} \hat{f}_{2}(h_{2}) + a_{1} \int x\, \hat{f}_{12}(x, h_{2})\, dx + a_{2} h_{2} \hat{f}_{2}(h_{2}) + a_{3} \int x^{2} \hat{f}_{12}(x, h_{2})\, dx + a_{4} h_{2}^{2} \hat{f}_{2}(h_{2}) + a_{5} h_{2} \int x\, \hat{f}_{12}(x, h_{2})\, dx = \int y\, \hat{f}_{02}(y, h_{2})\, dy

(63)

a_{0} \hat{f}_{2}(h_{3}) + a_{1} \int x\, \hat{f}_{12}(x, h_{3})\, dx + a_{2} h_{3} \hat{f}_{2}(h_{3}) + a_{3} \int x^{2} \hat{f}_{12}(x, h_{3})\, dx + a_{4} h_{3}^{2} \hat{f}_{2}(h_{3}) + a_{5} h_{3} \int x\, \hat{f}_{12}(x, h_{3})\, dx = \int y\, \hat{f}_{02}(y, h_{3})\, dy

(64)

a_{0} \hat{f}_{2}(h_{4}) + a_{1} \int x\, \hat{f}_{12}(x, h_{4})\, dx + a_{2} h_{4} \hat{f}_{2}(h_{4}) + a_{3} \int x^{2} \hat{f}_{12}(x, h_{4})\, dx + a_{4} h_{4}^{2} \hat{f}_{2}(h_{4}) + a_{5} h_{4} \int x\, \hat{f}_{12}(x, h_{4})\, dx = \int y\, \hat{f}_{02}(y, h_{4})\, dy

(65)

a_{0} \hat{f}_{2}(h_{5}) + a_{1} \int x\, \hat{f}_{12}(x, h_{5})\, dx + a_{2} h_{5} \hat{f}_{2}(h_{5}) + a_{3} \int x^{2} \hat{f}_{12}(x, h_{5})\, dx + a_{4} h_{5}^{2} \hat{f}_{2}(h_{5}) + a_{5} h_{5} \int x\, \hat{f}_{12}(x, h_{5})\, dx = \int y\, \hat{f}_{02}(y, h_{5})\, dy

(66)

a_{0} + a_{1} \overline{X}_{n_{12}} + a_{2} \overline{H}_{n_{12}} + \frac{a_{3}}{n_{12}} \sum_{i=1}^{n_{12}} X_{i}^{2} + \frac{a_{4}}{n_{12}} \sum_{i=1}^{n_{12}} H_{i}^{2} + \frac{a_{5}}{n_{12}} \sum_{i=1}^{n_{12}} X_{i} H_{i} = \overline{Y}_{n_{02}}

(67)

In matrix form

(\begin{matrix} {\hat{f}}_{2} (h_{1}) & \int x {\hat{f}}_{12} (x, h_{1}) d x & h_{1} {\hat{f}}_{2} (h_{1}) & \int x^{2} {\hat{f}}_{12} (x, h_{1}) d x & h_{1}^{2} {\hat{f}}_{2} (h_{1}) & h_{1} \int x {\hat{f}}_{12} (x, h_{1}) d x \\ {\hat{f}}_{2} (h_{2}) & \int x {\hat{f}}_{12} (x, h_{2}) d x & h_{2} {\hat{f}}_{2} (h_{2}) & \int x^{2} {\hat{f}}_{12} (x, h_{2}) d x & h_{2}^{2} {\hat{f}}_{2} (h_{2}) & h_{2} \int x {\hat{f}}_{12} (x, h_{2}) d x \\ {\hat{f}}_{2} (h_{3}) & \int x {\hat{f}}_{12} (x, h_{3}) d x & h_{3} {\hat{f}}_{2} (h_{3}) & \int x^{2} {\hat{f}}_{12} (x, h_{3}) d x & h_{3}^{2} {\hat{f}}_{2} (h_{3}) & h_{3} \int x {\hat{f}}_{12} (x, h_{3}) d x \\ {\hat{f}}_{2} (h_{4}) & \int x {\hat{f}}_{12} (x, h_{4}) d x & h_{4} {\hat{f}}_{2} (h_{4}) & \int x^{2} {\hat{f}}_{12} (x, h_{4}) d x & h_{4}^{2} {\hat{f}}_{2} (h_{4}) & h_{4} \int x {\hat{f}}_{12} (x, h_{4}) d x \\ {\hat{f}}_{2} (h_{5}) & \int x {\hat{f}}_{12} (x, h_{5}) d x & h_{5} {\hat{f}}_{2} (h_{5}) & \int x^{2} {\hat{f}}_{12} (x, h_{5}) d x & h_{5}^{2} {\hat{f}}_{2} (h_{5}) & h_{5} \int x {\hat{f}}_{12} (x, h_{5}) d x \\ 1 & {\overline{X}}_{n_{12}} & {\overline{H}}_{n_{12}} & \frac{1}{n_{12}} \sum_{i = 1}^{n_{12}} X_{i}^{2} & \frac{1}{n_{12}} \sum_{i = 1}^{n_{12}} H_{i}^{2} & \frac{1}{n_{12}} \sum_{i = 1}^{n_{12}} X_{i} H_{i} \end{matrix}) (\begin{matrix} a_{0} \\ a_{1} \\ a_{2} \\ a_{3} \\ a_{4} \\ a_{5} \end{matrix}) = (\begin{matrix} \int y {\hat{f}}_{02} (y, h_{1}) d y \\ \int y {\hat{f}}_{02} (y, h_{2}) d y \\ \int y {\hat{f}}_{02} (y, h_{3}) d y \\ \int y {\hat{f}}_{02} (y, h_{4}) d y \\ \int y {\hat{f}}_{02} (y, h_{5}) d y \\ {\overline{Y}}_{n_{02}} \end{matrix})

(68–73)
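Extending the same idea to the quadratic model, the sketch below builds the six rows of Equations (68–73) from five evaluation heights plus Equation (61). It again assumes a Gaussian kernel, which lets \int x^{2} \hat{f}_{12}(x, h_{j})\, dx be written in closed form: for a Gaussian kernel with bandwidth b_n, integrating x^{2} against each kernel term gives X_i^{2} + b_n^{2}. The heights, bandwidth, helper names, and simulated data are hypothetical.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal kernel K(u) (an assumed choice)."""
    return np.exp(-0.5 * np.asarray(u) ** 2) / np.sqrt(2.0 * np.pi)


def case2_solve(Y, H_y, X, H_x, heights=(0.1, 0.3, 0.5, 0.7, 0.9), b=0.1):
    """Assemble and solve the 6 x 6 system of Equations (68-73) for a quadratic m(x, h).

    heights: five evaluation heights h_1..h_5; b: bandwidth b_n (both hypothetical).
    """
    if len(heights) != 5:
        raise ValueError("the quadratic model needs exactly five evaluation heights")
    Y, H_y, X, H_x = (np.asarray(v, dtype=float) for v in (Y, H_y, X, H_x))
    H_all = np.concatenate([H_y, H_x])             # pooled heights, n_2 = n_02 + n_12
    rows, rhs = [], []
    for hj in heights:
        w_all = gaussian_kernel((hj - H_all) / b)
        w_x = gaussian_kernel((hj - H_x) / b)
        w_y = gaussian_kernel((hj - H_y) / b)
        f2 = w_all.sum() / (H_all.size * b)                    # Equation (54)
        ix = (X * w_x).sum() / (X.size * b)                     # Equation (55)
        # Gaussian kernel: integrating x^2 against each term gives X_i^2 + b^2.
        ix2 = ((X ** 2 + b ** 2) * w_x).sum() / (X.size * b)
        iy = (Y * w_y).sum() / (Y.size * b)                     # Equation (56)
        rows.append([f2, ix, hj * f2, ix2, hj ** 2 * f2, hj * ix])
        rhs.append(iy)
    rows.append([1.0, X.mean(), H_x.mean(), np.mean(X ** 2),
                 np.mean(H_x ** 2), np.mean(X * H_x)])          # Equation (61)
    rhs.append(Y.mean())
    A, rhs = np.array(rows), np.array(rhs)
    return np.linalg.solve(A, rhs), A, rhs


# Hypothetical misaligned samples (not the study data).
rng = np.random.default_rng(3)
H_x = rng.uniform(0.0, 1.0, 400)
X = 40.0 * np.exp(-2.0 * H_x) + rng.normal(0.0, 1.0, 400)
H_y = rng.uniform(0.0, 1.0, 400)
X_unseen = 40.0 * np.exp(-2.0 * H_y) + rng.normal(0.0, 1.0, 400)  # never observed
Y = 5.0 + 0.2 * X_unseen + 3.0 * H_y + 0.01 * X_unseen ** 2 + rng.normal(0.0, 0.5, 400)

a_hat, A, rhs = case2_solve(Y, H_y, X, H_x)
print("a_hat =", a_hat)
```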

2.1.5. Performance Evaluation of the Models with Residual Analysis

Recall Equation (15):

\int m_{012} (x, h) {\hat{f}}_{12} (x, h) d x = \int y {\hat{f}}_{02} (y, h) d y

(15)

Taking h = H_{j} for j \in \{1, \dots, n_{02}\} and dividing by \hat{f}_{2}(H_{j}), we get

\hat{\hat{m}} (H_{j}) = \hat{E} (m_{012} (X, H) ∣ H = H_{j})

(74)

= \frac{\int {\hat{m}}_{012} (x, H_{j}) {\hat{f}}_{12} (x, H_{j}) d x}{{\hat{f}}_{2} (H_{j})}

(75)

= \frac{\frac{1}{n_{12} b_{n}} \sum_{i = 1}^{n_{12}} {\hat{m}}_{012} (X_{i}, H_{j}) K (\frac{H_{j} - H_{i}}{b_{n}})}{\frac{1}{n_{2} b_{n}} \sum_{i = 1}^{n_{2}} K (\frac{H_{j} - H_{i}}{b_{n}})}

(76)

= \frac{n_{2} \sum_{i = 1}^{n_{12}} {\hat{m}}_{012} (X_{i}, H_{j}) K (\frac{H_{j} - H_{i}}{b_{n}})}{n_{12} \sum_{i = 1}^{n_{2}} K (\frac{H_{j} - H_{i}}{b_{n}})}

(77)

as the predicted values. Note that \hat{\hat{m}}(H_{j}) should be close to

\hat{m} (H_{j}) = \frac{\int y {\hat{f}}_{02} (y, H_{j}) d y}{{\hat{f}}_{2} (H_{j})}

(78)

= \frac{\frac{1}{n_{02} b_{n}} \sum_{i = 1}^{n_{02}} Y_{i} K (\frac{H_{j} - H_{i}}{b_{n}})}{\frac{1}{n_{2} b_{n}} \sum_{i = 1}^{n_{2}} K (\frac{H_{j} - H_{i}}{b_{n}})}

(79)

= \frac{n_{2} \sum_{i = 1}^{n_{02}} Y_{i} K (\frac{H_{j} - H_{i}}{b_{n}})}{n_{02} \sum_{i = 1}^{n_{2}} K (\frac{H_{j} - H_{i}}{b_{n}})}

(80)

because of Equation (15).

The residuals can then be computed as

r_{j} = Y_{j} - \hat{\hat{m}}(H_{j}), \quad j \in \{1, \dots, n_{02}\}

(81)

and the mean squared error (MSE) as

M S E = \frac{1}{n_{02}} \sum_{j = 1}^{n_{02}} r_{j}^{2} = \frac{1}{n_{02}} \sum_{j = 1}^{n_{02}} {(Y_{j} - \hat{\hat{m}} (H_{j}))}^{2}

(82)
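The predicted values, residuals, and MSE of Equations (77), (81), and (82) can be computed as sketched below, assuming a Gaussian kernel and a fitted model supplied as a Python function `m_hat(x, h)` (a hypothetical interface). As in Equation (76), the fitted model is evaluated at the observed X_i; this matches \int \hat{m}_{012}(x, H_j) \hat{f}_{12}(x, H_j)\, dx exactly when \hat{m}_{012} is linear in x and the kernel is symmetric.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal kernel K(u) (an assumed choice)."""
    return np.exp(-0.5 * np.asarray(u) ** 2) / np.sqrt(2.0 * np.pi)


def rmc_residuals(m_hat, Y, H_y, X, H_x, b):
    """Predicted values (Equation (77)), residuals (81) and MSE (82).

    m_hat: fitted model m_hat(x, h) accepting a NumPy array x and a scalar h.
    Predictions are made at the response heights H_y; b is the bandwidth b_n.
    """
    Y, H_y, X, H_x = (np.asarray(v, dtype=float) for v in (Y, H_y, X, H_x))
    H_all = np.concatenate([H_y, H_x])   # pooled heights, n_2 = n_02 + n_12
    n2, n12 = H_all.size, X.size
    pred = np.empty_like(Y)
    for j, hj in enumerate(H_y):
        num = np.sum(m_hat(X, hj) * gaussian_kernel((hj - H_x) / b))
        den = np.sum(gaussian_kernel((hj - H_all) / b))
        pred[j] = n2 * num / (n12 * den)
    resid = Y - pred
    return pred, resid, np.mean(resid ** 2)


# Sanity check with hypothetical values: both samples share the same heights and
# the fitted model is the constant 3, so every prediction should equal 3.
heights = np.linspace(0.0, 1.0, 11)
X_obs = np.linspace(20.0, 40.0, 11)
Y_obs = np.full(11, 4.0)
pred, resid, mse = rmc_residuals(lambda x, h: np.full_like(x, 3.0),
                                 Y_obs, heights, X_obs, heights, b=0.1)
print(pred.round(6), mse)
```

With identical height sets, the pooled denominator counts every kernel term twice and n_2 = 2 n_12, so the two factors cancel and a constant model is reproduced exactly.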

2.2. Demonstration Data Set

2.2.1. Destructive Sampling and External Tree Characterization

More than 40 candidate trees were initially identified through random selection on the LiDAR map of two unmanaged sites within the mixedwood stands of the Petawawa Research Forest (PRF) in Ontario, Canada (Figure 1). Of these, 18 trees were excluded due to decay or major structural defects, including forking, crooks, rotten branches, dead tops, and stem lean greater than 5°. The final sample consisted of 14 white pine trees (aged 113–131 years) and 8 red pine trees (aged 70–128 years), which were manually felled using two approaches: (1) full-tree felling followed by delimbing on the ground (Figure 2a), and (2) delimbing the standing tree, lowering the branches to the ground with a crane, and subsequently felling the bole (Figure 2b). The second method provided greater control and minimized damage, whereas the first resulted in breakage of approximately 5–10% of the branches.

After felling, all branches were organized to ensure that each limb remained intact and correctly oriented, and the bole and branches were positioned on a level surface close to the stump for measurement. Stump characteristics, including stump height at the lowest and highest points and stump diameter, were recorded. A reference line was then marked along the stem from the cut base to the tip (Figure 2c), and the total length of the felled part was measured. Stump height was recorded as the average of the heights above ground at its highest and lowest points. Total tree height is defined as the sum of stump height and total tree length. The height to the base of the live crown (crown base height, CBH) is defined as the position of the lowest living whorl that contains at least one live branch and has continuous whorls above it. Stem diameter was recorded at 1-m intervals up to the height of the first living branch.

For each whorl, starting from the last living branch to the tree apex, its position from the base of the stem was recorded. Figure 2d shows how the stem diameter at each recorded whorl was measured: the sum of the lengths between the branches of the whorl, taken exactly 5 cm below the whorl, was used as the stem diameter at that whorl. For every branch within each whorl, measurements included position relative to the reference line, branch length, branch diameter, branch inclination, and vitality status (dead or live). Finally, the stem was divided into 5-m sections for further bucking and evaluation. A small notch was made along the reference line at each 5-m mark (needed to keep track of the original orientation of each log within each tree for stem reconstruction), after which the stem was cut into logs and each segment was numbered.

To facilitate visualization and validation of the detailed measurements collected from the felled trees, a digital tree-reconstruction tool was developed in-house using OpenGL (Figure 3d,e). The tool visualizes the tree segments to scale. Each tree was validated against the visualization model and against bi-temporal LiDAR data acquired before and after the harvest. Because of either the 50-cm width constraint of log CT scanning or branch losses of 5–10% resulting from the felling process, four white pine and two red pine trees were excluded from further analysis.

Finally, a total of 58 white pine and 20 red pine 5-m logs were sent for log reconstruction and knot characterisation. Because tree heights varied, the number of logs and their lengths varied between trees. Over 97% of the logs were 4.5–5.5 m long, and 2–7 logs were cut per tree (Table 1).

2.2.2. Knot Characterization

Following harvesting at the PRF, the segmented logs were transported to the Institut national de la recherche scientifique (INRS) in Quebec City for internal characterization. The knot measurement protocol was executed in two primary stages: initial X-ray Computed Tomography (CT) scanning, followed by automated feature extraction and measurement using the CT2Opti software suite.

X-Ray CT Scanning and Image Acquisition

Log sections were scanned using a Siemens Somatom Definition 128 AS+ medical CT scanner (Siemens-Healthineers, Erlangen, Germany) available at the INRS. To accommodate the scanner’s 2.1-m length limit, each 5-m log was halved (Figure 4a) and scanned in two passes (Figure 4b). CT images were reconstructed at 1-mm intervals along the longitudinal axis with a 2-mm slice thickness. The resulting 1-mm overlap between consecutive slices was intentional, to maximize knot detection across consecutive frames [38]. Scanning parameters included an X-ray tube voltage of 120 kV, a tube current–time product of 275 mAs, a pitch of 0.5, and a variable field of view yielding a 0.6 mm × 0.6 mm pixel resolution. Image reconstruction used the I70f Safir 3 filter, producing approximately 2,500 images per 2.5-m section.

The resulting Hounsfield units (gray levels) were used as a proxy for wood density [39]. Knots typically appear brighter than clear wood because of their higher density [40]. However, sapwood and wet pockets can have similar densities, which can obscure this contrast. This challenge was addressed with the extraction pipeline detailed below.
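The sketch below shows why density alone does not separate knots from wet pockets. It maps Hounsfield units to an approximate density using a common first-order calibration. That calibration is an assumption here, not the one applied in CT2Opti: attenuation is taken to scale with density, so air (−1000 HU) maps to 0 kg/m³ and water (0 HU) to 1000 kg/m³. The function name and the gray values are hypothetical.

```python
import numpy as np

def hu_to_density(hu):
    """Approximate density (kg/m^3) from Hounsfield units.

    Assumed first-order calibration: air (-1000 HU) -> 0 kg/m^3 and
    water (0 HU) -> 1000 kg/m^3, i.e. density = 1000 * (1 + HU / 1000).
    Negative results (below air) are clipped to zero.
    """
    hu = np.asarray(hu, dtype=float)
    return np.clip(1000.0 * (1.0 + hu / 1000.0), 0.0, None)

# Hypothetical gray values for clear wood, a knot, and a wet pocket.
values = hu_to_density([-550.0, -100.0, -50.0])
print(values)  # the knot and the wet pocket both look "dense"
```

Because the knot and the wet pocket land at similar densities, a threshold alone cannot separate them. This is what motivates the geometric filtering step described next.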

Knot Extraction and Measurements with CT2Opti

Internal features were extracted using CT2Opti and Optitek 10, a specialized image-processing suite that combines morphological filtering, thresholding, and edge detection (see [38,41] for details). The extraction followed a refined three-step pipeline:

Candidate identification: potential knot sections were identified in each 2D image.
3D reconstruction: individual sections were grouped into 3D objects by connecting adjacent pixels with similar characteristics across the image stack.
Biological filtering: real knots were distinguished from false positives (e.g., wet pockets) by applying a knot model that requires objects to originate near the pith and extend radially toward the log surface.
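The three steps above can be illustrated on a toy CT volume. This is a minimal sketch, not the CT2Opti algorithm. It assumes a single intensity threshold, 6-connectivity between voxels, and a straight pith at a fixed position in every slice. The function names and parameters (`near_pith`, `min_extent`, in voxels) are hypothetical.

```python
from collections import deque
import numpy as np

def candidate_mask(volume, threshold):
    """Step 1: flag voxels denser (brighter) than a threshold in every 2D slice."""
    return volume > threshold

def label_components(mask):
    """Step 2: group candidate voxels into 3D objects (6-connected flood fill)."""
    labels = np.zeros(mask.shape, dtype=int)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    n_objects = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        n_objects += 1
        labels[seed] = n_objects
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in steps:
                nb = (z + dz, y + dy, x + dx)
                inside = all(0 <= nb[i] < mask.shape[i] for i in range(3))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = n_objects
                    queue.append(nb)
    return labels, n_objects

def biological_filter(labels, n_objects, pith_yx, near_pith=3.0, min_extent=5.0):
    """Step 3: keep objects that start near the pith and extend radially outward."""
    ny, nx = labels.shape[1:]
    yy, xx = np.mgrid[0:ny, 0:nx]
    radius = np.hypot(yy - pith_yx[0], xx - pith_yx[1])  # pith assumed straight
    radius = np.broadcast_to(radius, labels.shape)
    kept = []
    for k in range(1, n_objects + 1):
        r = radius[labels == k]
        if r.min() <= near_pith and r.max() - r.min() >= min_extent:
            kept.append(k)
    return kept

# Toy 5-slice volume with the pith at (y=10, x=10).
vol = np.zeros((5, 21, 21))
vol[1:4, 10, 10:19] = 1.0   # knot: runs from the pith toward the surface
vol[1:3, 2:5, 2:5] = 1.0    # wet pocket: dense but isolated near the surface
labels, n = label_components(candidate_mask(vol, 0.5))
knots = biological_filter(labels, n, pith_yx=(10, 10))
print(n, "objects found;", len(knots), "kept as knots")
```

Both objects pass the density threshold. Only the one that starts at the pith and extends outward survives the biological filter.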

This process enabled the detailed 3D reconstruction of diverse log profiles, ranging from large-diameter white pine butt logs with significant flare (Figure 4d) and low-tapered intermediary sections (Figure 4e) to complex crown logs with smaller diameters and higher knot densities (Figure 4f–g). High-resolution visualizations further demonstrate the characteristic anatomical expansion of knot size as branches grow from the pith toward the outer bark (Figure 4h).

After manual review, the logs were digitally reassembled along the pith line to reconstruct the complete tree architecture, and detailed metrics were then measured for each sample. The metrics extracted by CT2Opti are:

- the total number of knots;
- total knot volume (cm³), together with average, minimum, and maximum knot volume;
- average knot area (cm²), together with minimum and maximum knot area (measured at the log surface);
- total log volume (cm³).

Table 2 summarizes the measurements extracted from the felled white and red pine logs. The K/T index was also computed. It is defined as the proportion of knot volume in the total volume of the reconstructed tree and provides insight into between-tree and between-species variation [42].
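The per-log metrics and the K/T index described above can be sketched as simple aggregations. This assumes each knot record carries a volume and a surface area. The `Knot` type, the dictionary keys, and the example values are hypothetical; CT2Opti computes these quantities internally.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Knot:
    volume_cm3: float
    surface_area_cm2: float   # knot area where it meets the log surface

def log_metrics(knots, log_volume_cm3):
    """Per-log summary of the kind listed above (field names are hypothetical)."""
    vols = [k.volume_cm3 for k in knots]
    areas = [k.surface_area_cm2 for k in knots]
    return {
        "n_knots": len(knots),
        "total_knot_volume": sum(vols),
        "mean_knot_volume": mean(vols),
        "min_knot_volume": min(vols),
        "max_knot_volume": max(vols),
        "mean_knot_area": mean(areas),
        "min_knot_area": min(areas),
        "max_knot_area": max(areas),
        "log_volume": log_volume_cm3,
    }

def kt_index(per_log):
    """K/T: total knot volume over the total reconstructed volume of one tree."""
    knot_volume = sum(m["total_knot_volume"] for m in per_log)
    tree_volume = sum(m["log_volume"] for m in per_log)
    return knot_volume / tree_volume

# Hypothetical tree made of two logs.
butt = log_metrics([Knot(10.0, 2.0), Knot(30.0, 4.0)], log_volume_cm3=150_000.0)
top = log_metrics([Knot(50.0, 5.0), Knot(110.0, 9.0)], log_volume_cm3=50_000.0)
print(f"K/T = {kt_index([butt, top]):.2%}")
```

K/T is computed at the tree level by summing over all of its logs, rather than by averaging per-log ratios, so longer logs carry proportionally more weight.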

2.3. Application of the Model – Estimating Knot Characteristics of White and Red Pines

2.3.1. Regression Strategy and Data Preparation

Because the covariates differed between the lower and upper portions of the tree, the Regression of Misaligned Covariates (RMC) was estimated independently for the Clear Stem (CS) and the Living Crown (LC) of both species, white pine (WP) and red pine (RP), with trees grouped by species. Model performance was evaluated by leave-one-tree-out cross-validation. For each species, one tree was held out as the evaluation (test) dataset, and the remaining trees were pooled as the training dataset.
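A minimal sketch of the leave-one-tree-out partitioning follows. It assumes that each observation is a record carrying a species code and a tree identifier. The record schema and the function name are hypothetical, and the same routine would be run separately on clear-stem and live-crown data.

```python
def leave_one_tree_out(records):
    """Yield (held_out_tree, train, test); folds are formed within each species.

    `records` are dicts with at least "species" and "tree" keys (hypothetical schema).
    """
    trees_by_species = {}
    for r in records:
        trees_by_species.setdefault(r["species"], set()).add(r["tree"])
    for species in sorted(trees_by_species):
        same = [r for r in records if r["species"] == species]
        for held_out in sorted(trees_by_species[species]):
            train = [r for r in same if r["tree"] != held_out]  # pooled other trees
            test = [r for r in same if r["tree"] == held_out]   # every knot of one tree
            yield held_out, train, test

# Hypothetical records: two knots per tree.
pairs = [("WP", "WP1"), ("WP", "WP2"), ("RP", "RP1"), ("RP", "RP2"), ("RP", "RP3")]
records = [{"species": s, "tree": t, "knot_volume": v} for s, t in pairs for v in (12.0, 30.0)]
for tree, train, test in leave_one_tree_out(records):
    print(tree, "train trees:", sorted({r["tree"] for r in train}), "test knots:", len(test))
```

Splitting on the tree identifier rather than on individual knots keeps all knots of the held-out tree out of training. This is the independence property the text relies on.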

To account for the varying total heights among the pooled samples, the vertical coordinate was transformed into a relative altitude (RelAltitude) variable. This dimensionless index ranges from 0 to 1, where:

For the CS: 0 represents the base of the bole and 1 represents the highest point of the clear stem.
For the LC: 0 represents the transition from the clear stem and 1 represents the topping height.
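The rescaling above is a min-max transform within each segment. The sketch below uses the WP11 clear-stem height (10.3 m) and, as an assumption for illustration only, its total height (26.46 m) as the topping height. Function names are hypothetical.

```python
def rel_altitude(z, bottom, top):
    """Rescale an absolute height z (m) to [0, 1] within a stem segment."""
    if top <= bottom:
        raise ValueError("segment top must lie above its bottom")
    return (z - bottom) / (top - bottom)

def segment_and_rel_altitude(z, clear_stem_top, topping_height):
    """Assign a height to the clear stem (CS) or living crown (LC) and rescale it."""
    if z <= clear_stem_top:
        return "CS", rel_altitude(z, 0.0, clear_stem_top)
    return "LC", rel_altitude(z, clear_stem_top, topping_height)

# WP11-like values; total height stands in for the topping height (assumption).
print(segment_and_rel_altitude(5.15, 10.3, 26.46))   # mid-point of the clear stem
print(segment_and_rel_altitude(18.38, 10.3, 26.46))  # mid-point of the living crown
```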

Prior to modeling, each tree was validated by checking that its total bole length matched the cumulative length of its CT-scanned logs after knot characterization.

Variable Selection: The internal characteristics used as response variables (Y) were Knot Volume (KnotVolume), Surface Knot Diameter (SurfaceKnotDiameter), and Knot Size (KnotSize). KnotSize was derived from the surface diameter assuming a circular knot geometry. For the LC, the external covariates were Log Diameter (LogDiameter), Branch Length (BranchLength), and Branch Diameter (BranchDiameter); the CS model relied primarily on LogDiameter.
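Under the circular-geometry assumption, and taking KnotSize as an area in cm² (the unit reported in Section 3.1), the derivation reduces to the area of a circle. The worked value below is illustrative only:

```latex
\text{KnotSize} = \pi \left( \frac{\text{SurfaceKnotDiameter}}{2} \right)^{2},
\qquad
\text{e.g. } \pi \left( \frac{5\ \text{cm}}{2} \right)^{2} \approx 19.6\ \text{cm}^{2}
```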

2.3.2. Model Application to Different Pine Species

As described above, given the small set of felled trees, leave-one-tree-out cross-validation was used to train and evaluate the RMC model. Data were partitioned strictly at the tree level rather than at the individual-knot level, to ensure independence between the training and validation sets. In this demonstration, knot volume was chosen as the response because it links statistical estimation directly to cumulative defect risk along the bole. This allows model performance to be judged in terms immediately relevant to log sorting and wood-quality assessment. In addition, only covariates that can be estimated directly from LiDAR point clouds were used: BranchLength and BranchDiameter within the live crown, and LogDiameter for the clear stem. This choice facilitates integration of the RMC framework with operational remote-sensing workflows and enables stand-level prediction of knot volume without relying on destructive measurements.

Discrete knot volume predictions were integrated along the length of the bole to generate cumulative response profiles characterizing the rate of defect accumulation from the stump to the tree top. These profiles provide a spatially explicit basis for identifying optimal bucking points and log-quality thresholds.
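The integration of discrete predictions can be sketched as a running sum ordered from stump to top. This assumes each prediction is a knot volume attached to a height. If predictions were instead per unit length, a trapezoidal rule would replace the cumulative sum. Names and values are hypothetical.

```python
import numpy as np

def cumulative_knot_profile(heights_m, knot_volume_cm3):
    """Discrete integration of knot volumes from the stump to the top of the bole.

    Returns sorted heights, the running total (cm^3), and the fraction of the tree total.
    """
    h = np.asarray(heights_m, dtype=float)
    v = np.asarray(knot_volume_cm3, dtype=float)
    order = np.argsort(h)            # stump -> top
    running = np.cumsum(v[order])
    return h[order], running, running / running[-1]

# Hypothetical predicted knot volumes at four heights (m).
heights, running, fraction = cumulative_knot_profile(
    [12.0, 2.0, 7.0, 20.0], [40.0, 10.0, 30.0, 120.0]
)
print(heights, running, fraction)
```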

2.3.3. Model Formulations and Software

Four distinct variations of the generalized model were implemented to evaluate different structural assumptions regarding the relationship between the response (Y), the covariates (X), and the relative height (H):

norm: A parametric approach assuming (Y, X, H) follows a trivariate normal distribution.
linear: A model assuming the regression function m(x, h) is linear in both x and h.
quad: A model assuming m(x, h) follows a quadratic polynomial relationship in x and h.
cubic: A model assuming m(x, h) follows a cubic polynomial relationship in x and h.
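The four variants differ only in the basis used for m(x, h). The sketch below builds the polynomial bases, using the quad term order of the fitted equations reported in Section 3.3; the cubic term order is an assumption. It also shows why the norm model is linear in x and h: for a trivariate normal, E[Y | X, H] = μ_Y + Σ_{Y,Z} Σ_{ZZ}^{-1}(z − μ_Z). In RMC the joint moments involving Y and X cannot be computed directly, because Y and X are not co-located, so here they are simply given. All names and values are hypothetical.

```python
import numpy as np

def design_matrix(x, h, model):
    """Basis of m(x, h) for the polynomial variants ('linear', 'quad', 'cubic')."""
    x, h = np.asarray(x, dtype=float), np.asarray(h, dtype=float)
    cols = [np.ones_like(x), x, h]
    if model in ("quad", "cubic"):
        cols += [x**2, h**2, x * h]                   # order used in the fitted equations
    if model == "cubic":
        cols += [x**3, h**3, x**2 * h, x * h**2]      # cubic term order is assumed
    return np.column_stack(cols)

def normal_regression(mu, cov):
    """E[Y | X=x, H=h] for trivariate normal (Y, X, H): returns intercept and slopes.

    mu = (mu_Y, mu_X, mu_H); cov is the 3x3 covariance matrix in the same order.
    """
    mu, cov = np.asarray(mu, dtype=float), np.asarray(cov, dtype=float)
    slopes = np.linalg.solve(cov[1:, 1:], cov[1:, 0])   # Sigma_ZZ^{-1} Sigma_ZY
    intercept = mu[0] - slopes @ mu[1:]
    return intercept, slopes

intercept, slopes = normal_regression(
    mu=[10.0, 2.0, 0.5],
    cov=[[12.0, 3.0, 0.5], [3.0, 1.0, 0.0], [0.5, 0.0, 0.25]],
)
print(intercept, slopes)
print(design_matrix([1.0, 2.0], [0.1, 0.2], "quad").shape)
```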

Although a cubic model was also implemented, this study reports only the norm, linear, and quad models to keep the scope focused.

Model performance was evaluated using the mean squared error (MSE) derived in the preceding section. All modeling and statistical analyses were performed in R (v.2026.01.0; [43]).
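The comparison measures used in the results can be sketched as follows. The sketch uses plain MSE; the exact MSE definition derived earlier in the paper may differ. It also computes the mean signed bias and the ratio of a model's error to that of the m-hat benchmark. Function names and data are hypothetical.

```python
import numpy as np

def mse(observed, predicted):
    """Mean squared error between measured and predicted knot volumes."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.mean((o - p) ** 2))

def mean_bias(observed, predicted):
    """Average signed error; positive values mean over-prediction."""
    return float(np.mean(np.asarray(predicted, float) - np.asarray(observed, float)))

def ratio_to_benchmark(model_error, benchmark_error):
    """Error of a proposed model relative to the m-hat benchmark (1.0 = parity)."""
    return model_error / benchmark_error

observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(mse(observed, predicted), mean_bias(observed, predicted))
```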

3. Results and Discussion

3.1. Summary of the Demonstration Dataset

The demonstration dataset comprised mature timber. WP trees ranged from 111 to 131 years of age and RP trees from 70 to 128 years, with a single specimen (RP4, 70 years) representing a younger age class (Table 1). All samples met the 50 cm diameter restriction imposed by the physical aperture of the CT scanner. The WP specimens had an average diameter at breast height (DBH) of 43.2 cm and an average total height of 27.54 m, while RP specimens averaged 40.0 cm in DBH and 25.22 m in height. Structural complexity differed markedly between the species: WP trees averaged 181 whorls per tree, compared with 129 for RP. Of the 2,104 recorded branches (1,590 WP; 514 RP), approximately 60% were identified as dead.

The internal wood characterisation spanned a total bole length of 155 m for WP and 87 m for RP, covering 3,316 and 1,150 knots, respectively (Table 2). In both species, over 90% of knots had surface diameters below 8 cm, and 90% had knot sizes below 20 cm². Knot volume, however, varied more between species: 90% of RP knots were below 150 cm³, whereas the 90th percentile for WP reached 400 cm³ (Table 2).

3.2. Applying RMC to the Demonstration Dataset

Figure 5 illustrates the implementation of the RMC framework in R, predicting internal knot volume from externally measured log diameter along the entire bole of a sample RP specimen. It documents a two-step estimation strategy. First, the one-covariate conditional means

m_{0,2}(h) = E[Y \mid H = h], \qquad m_{1,2}(h) = E[X \mid H = h]

are estimated from the training data. Second, the integral equations that recover the two-covariate regression

m_{0,1,2}(x, h) = E[Y \mid X = x, H = h]

are solved under a chosen finite-dimensional model (normal, linear, quadratic, cubic, etc.). In this example, the training phase used three RP trees, comprising a total of 1,107 RelAltitude, 361 LogDiameter, and 726 KnotVolume observations. LogDiameter (x) and KnotVolume (y) are not aligned in space, yet each is distinctly correlated with the shared spatial index, RelAltitude (h).
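The two-step strategy can be sketched for the linear model. By the Law of Total Expectation, if m(x, h) = a + b x + c h, then E[Y | H = h] = a + b E[X | H = h] + c h. Kernel estimates of the two one-covariate means can therefore be regressed on each other over a grid of heights, even though each comes from its own misaligned sample. This is a minimal sketch under stated assumptions, not the authors' R implementation. The Gaussian kernel, the bandwidth, the evaluation grid, the least-squares solver, the function names, and the synthetic data are all illustrative choices. Note that the coefficients are identifiable only if E[X | H = h] is not itself linear in h.

```python
import numpy as np

def nadaraya_watson(h_obs, v_obs, h_grid, bandwidth):
    """Gaussian-kernel estimate of E[V | H = h] at each grid height."""
    u = (h_grid[:, None] - h_obs[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)
    return (w @ v_obs) / w.sum(axis=1)

def rmc_linear(h_x, x, h_y, y, h_grid, bandwidth):
    """Fit m(x, h) = a + b*x + c*h from misaligned (H, X) and (H, Y) samples."""
    m02 = nadaraya_watson(h_y, y, h_grid, bandwidth)   # E[Y | H = h], response sample
    m12 = nadaraya_watson(h_x, x, h_grid, bandwidth)   # E[X | H = h], covariate sample
    # Law of Total Expectation: E[Y | H=h] = a + b*E[X | H=h] + c*h
    design = np.column_stack([np.ones_like(h_grid), m12, h_grid])
    coef, *_ = np.linalg.lstsq(design, m02, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 20_000
h_x = rng.uniform(0, 1, n)                               # heights where diameter is measured
x = 40 - 30 * h_x**2 + rng.normal(0, 2, n)               # tapering "diameter" (hypothetical)
h_y = rng.uniform(0, 1, n)                               # heights where knots are measured
x_hidden = 40 - 30 * h_y**2 + rng.normal(0, 2, n)        # never observed alongside y
y = 5 + 1.5 * x_hidden - 20 * h_y + rng.normal(0, 5, n)  # generated with a=5, b=1.5, c=-20

a, b, c = rmc_linear(h_x, x, h_y, y, np.linspace(0.15, 0.85, 36), bandwidth=0.05)
print(f"a={a:.2f}  b={b:.2f}  c={c:.2f}")
print("predicted E[Y | X=30, H=0.5]:", round(a + 30 * b + 0.5 * c, 2))
```

The grid stays away from the ends of [0, 1] to limit kernel boundary bias. For the quad model, the same identity also involves E[X² | H = h], which could be estimated from the covariate sample in the same way.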

The resulting predicted surfaces show how the framework reconciles these disjoint variables in a unified 3D modeling space. Apart from their ranges, the distributions of the training and test variables are largely consistent. KnotVolume is highly noisy in both datasets, but most observations remain below 70 cm³. Within this range, all candidate models except quad track the central density of the measured data across the tree's relative height. In terms of quantitative prediction performance (MSE), the norm and linear models yield results closely aligned with the measured data and with the optimized m-hat estimator.

3.3. Application of RMC to Two Different Species

One sample tree of each species (WP11 and RP5) was randomly selected to evaluate the application of RMC to the two pine species. Both are mature trees of similar age, 125 years (WP11) and 128 years (RP5), but they exhibit distinct structural differences that test the model's flexibility (Table 3). RP5 reached a greater height (29.92 m) than WP11 (26.46 m), yet WP11 has the longer clear stem (10.3 m vs. 7.75 m). In the living crown, RP5 has more branches and a lower proportion of dead branches. As a result, RP5 has more knots and almost twice the total knot volume of WP11. The wide spread of knot volumes (up to 428.60 cm³) and the distinct K/T ratios of the two trees provide a comprehensive dataset for evaluating the predictive accuracy of the RMC framework across pine species.

Table 4 details the data partitioning used for the two primary physiological segments (clear stem and live crown). In the clear stem, the dataset contains many knot observations relative to the primary covariate (LogDiameter), which allows the spatial misalignment between the disjoint measurements to be modeled effectively. The living crown had substantially larger training sets, reflecting the greater morphological complexity of the upper bole. For instance, training data comprised up to 2,130 knot observations for white pine and 620 for red pine, consistent with the higher density and volumetric variability of branch bases.

The resulting estimated regressions after applying the RMC framework, which characterize these spatial relationships for both species, are presented below and illustrated in Figure 6.

Estimated regressions:

(a) White pine – clear stem with LogDiameter:

\hat{m}(x, h) = \hat{m}_{012}(x, h) = \hat{E}(Y \mid X = x, H = h) = 5.69 \times 10^{1} + 4.47 \times 10^{-2} x + 2.59 h \quad \text{(normal)}

\hat{m}_{012}(x, h) = 7.82 \times 10^{1} - 5.93 \times 10^{-1} x + 8.41 h \quad \text{(linear)}

\hat{m}_{012}(x, h) = 1.68 \times 10^{2} + 1.49 \times 10^{1} x - 2.01 \times 10^{2} h - 3.79 \times 10^{-1} x^{2} + 2.23 \times 10^{2} h^{2} - 3.51 x h \quad \text{(quad)}

(b) White pine – live crown with BranchLength:

\hat{m}(x, h) = \hat{m}_{012}(x, h) = \hat{E}(Y \mid X = x, H = h) = 6.53 \times 10^{1} - 5.28 \times 10^{-1} x - 4.05 \times 10^{1} h \quad \text{(normal)}

\hat{m}_{012}(x, h) = -3.20 \times 10^{1} + 2.52 x + 3.41 h \quad \text{(linear)}

\hat{m}_{012}(x, h) = 7.14 \times 10^{1} - 2.20 x + 1.59 \times 10^{1} h + 4.01 \times 10^{-2} x^{2} - 3.76 h^{2} - 1.76 x h \quad \text{(quad)}

(c) White pine – live crown with BranchDiameter:

\hat{m}(x, h) = \hat{m}_{012}(x, h) = \hat{E}(Y \mid X = x, H = h) = 1.11 \times 10^{2} - 1.01 \times 10^{1} \quad \text{(normal)}

\hat{m}_{012}(x, h) = 8.19 \times 10^{1} - 6.04 x - 2.85 h \quad \text{(linear)}

\hat{m}_{012}(x, h) = 4.79 \times 10^{1} + 6.04 \times 10^{1} x + 4.60 \times 10^{1} h - 1.04 \times 10^{1} x^{2} - 2.25 \times 10^{1} h^{2} - 2.36 \times 10^{1} x h \quad \text{(quad)}

(d) Red pine – clear stem with LogDiameter:

\hat{m}(x, h) = \hat{m}_{012}(x, h) = \hat{E}(Y \mid X = x, H = h) = 1.96 \times 10^{1} + 4.83 \times 10^{-1} x + 2.16 \times 10^{1} h \quad \text{(normal)}

\hat{m}_{012}(x, h) = -1.64 \times 10^{1} + 2.25 x - 3.71 h \quad \text{(linear)}

\hat{m}_{012}(x, h) = 3.15 \times 10^{1} - 2.95 x + 8.87 h + 9.04 \times 10^{-2} x^{2} - 3.86 \times 10^{1} h^{2} + 1.98 x h \quad \text{(quad)}

(e) Red pine – live crown with BranchLength:

\hat{m}(x, h) = \hat{m}_{012}(x, h) = \hat{E}(Y \mid X = x, H = h) = -9.34 + 3.49 \times 10^{-1} x - 2.74 \times 10^{1} h \quad \text{(normal)}

\hat{m}_{012}(x, h) = 7.88 \times 10^{1} - 9.02 \times 10^{-2} x - 3.18 h \quad \text{(linear)}

\hat{m}_{012}(x, h) = 8.65 \times 10^{1} - 9.58 \times 10^{-1} x + 1.19 \times 10^{1} h + 2.87 \times 10^{-3} x^{2} - 1.24 h^{2} + 2.25 \times 10^{-2} x h \quad \text{(quad)}

(f) Red pine – live crown with BranchDiameter:

\hat{m}(x, h) = \hat{m}_{012}(x, h) = \hat{E}(Y \mid X = x, H = h) = 2.11 \times 10^{2} - 3.86 \times 10^{1} x - 3.37 \times 10^{1} h \quad \text{(normal)}

\hat{m}_{012}(x, h) = 1.08 \times 10^{2} - 2.40 \times 10^{1} x + 4.31 \times 10^{1} h \quad \text{(linear)}

\hat{m}_{012}(x, h) = 8.17 \times 10^{1} - 2.03 \times 10^{1} x - 4.21 \times 10^{1} h + 1.74 x^{2} + 5.98 \times 10^{1} h^{2} - 2.52 x h \quad \text{(quad)}

Note that the m-hat estimator (a Nadaraya–Watson estimator) serves as a benchmark: because it uses the known response (y) values, it provides an effective upper bound on predictive accuracy and approximates the lowest error achievable for a given data structure. In practice, prediction for unmeasured timber relies on the proposed parametric models. These use only the covariates, here variables measurable externally with LiDAR such as log diameter, branch length, and height.
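A Nadaraya–Watson benchmark of this kind can be sketched as a kernel-weighted average of observed y values around a target (x, h). This assumes a product Gaussian kernel and co-located (x, h, y) triples. That data requirement is exactly what misaligned field data lack, which is why such an estimator serves only as a reference. The function name, bandwidths, and synthetic data are hypothetical.

```python
import numpy as np

def nw_benchmark(x_obs, h_obs, y_obs, x0, h0, bx, bh):
    """Nadaraya-Watson estimate of E[Y | X=x0, H=h0] with a product Gaussian kernel.

    Requires (x, h, y) observed together, which misaligned field data do not provide.
    """
    w = np.exp(-0.5 * ((x_obs - x0) / bx) ** 2 - 0.5 * ((h_obs - h0) / bh) ** 2)
    return float(np.sum(w * y_obs) / np.sum(w))

# Synthetic co-located data generated from Y = 5 + 1.5 X - 20 H + noise (hypothetical).
rng = np.random.default_rng(1)
n = 20_000
h = rng.uniform(0, 1, n)
x = 40 - 30 * h**2 + rng.normal(0, 2, n)
y = 5 + 1.5 * x - 20 * h + rng.normal(0, 5, n)
print(round(nw_benchmark(x, h, y, x0=32.5, h0=0.5, bx=2.0, bh=0.05), 2))
```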

In this application, m-hat performed well across all variable combinations, species, and physiological sections, which justifies its use as a benchmark (Figure 6 and Table 5). Its error is low and close to the standard deviation (sd) of the measured data, for example 37.98 vs. 38.71 and 37.95 vs. 38.05 in the clear stems of WP11 and RP5, respectively. Its spread is conservative and avoids extreme overpredictions at high values. In the living crown, performance is similarly comparable for both covariates: BranchLength (38.37 vs. 41.09 and 82.3 vs. 83.92 for WP11 and RP5, respectively) and BranchDiameter (82.3 vs. 83.92 and 37.96 vs. 38.03 for WP11 and RP5, respectively). This indicates robustness to complex structural heterogeneity and supports its role as a strong reference benchmark for LiDAR-based wood-quality estimation in standing trees.

Within the clear stem, the predictive performance of the proposed models varied between species (Tables 5 and 6). Both the linear and normal models closely followed the trajectory of the m-hat benchmark along the bole. The linear model showed slightly higher bias at higher altitudes in WP11 than in RP5 (Figure 6). Both models had low bias and error ratios consistently near parity with m-hat, establishing their reliability for predicting knot volume from external LogDiameter measurements. In contrast, the quadratic (quad) model showed a consistent upward bias at higher RelAltitude and an error about five times that of m-hat. This sensitivity to extreme values was particularly pronounced in the morphologically complex white pine (WP11) relative to the more uniform red pine (RP5).

Predictions in the crown show higher overall error for all models because of the increased data variance, yet the performance hierarchy remains consistent (Figure 6, Tables 5 and 6). For WP11, all models reached near parity with m-hat for both BranchLength and BranchDiameter. The normal model in particular performed within 10% of m-hat, although it showed a higher mean bias in estimating KnotVolume when BranchLength was used. For RP5, the quadratic model performed surprisingly well in the crown with BranchDiameter (41.92 vs. 37.96 for m-hat, with low mean bias). This suggests that, for this species and section, a slightly curved relationship better captures the volume of larger branch bases. With BranchLength as the predictor, however, the linear model performed closest to m-hat, with a comparatively low mean bias (Tables 5 and 6).

3.4. Management Implications – Turning Data into Decisions for Bucking and Log Sort

Profiles of the cumulative estimated knot volume along the bole for the proposed and benchmark models (WP11 and RP5) are illustrated in Figure 7. Vertical markers indicate the heights at which 25%, 50%, and 75% of the total knot volume is reached. These profiles provide a high-resolution map of internal wood quality for individual standing trees. They identify transition points and accumulation rates that allow foresters to move beyond external visual assessment toward more precise, volume-based grading.
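The threshold heights can be read off a cumulative profile by linear interpolation. This sketch assumes the profile starts at the stump with a zero fraction and increases strictly with height. It also applies the first-log rule discussed below, with a hypothetical clear-stem height. Function names and values are hypothetical.

```python
import numpy as np

def threshold_heights(heights_m, cum_fraction, levels=(0.25, 0.50, 0.75)):
    """Heights (m) at which the cumulative knot-volume fraction reaches each level.

    Linear interpolation between profile points; assumes heights sorted from the stump
    and a strictly increasing cumulative fraction.
    """
    return {p: float(np.interp(p, cum_fraction, heights_m)) for p in levels}

# Hypothetical profile starting at the stump (0 m, 0%).
heights = [0.0, 2.0, 7.0, 12.0, 20.0]
fraction = [0.0, 0.05, 0.20, 0.40, 1.00]
marks = threshold_heights(heights, fraction)
clear_stem_top = 7.75   # hypothetical clear-stem height (m)
print(marks, "25% mark above clear stem:", marks[0.25] > clear_stem_top)
```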

For both species, the 25% threshold identifies the portion of the bole most likely to yield high-quality butt logs with minimal knot impact. When this threshold lies above the clear stem height, the first log can be confidently assigned to sawlog or veneer classes. The 50% threshold marks a transition zone where knot influence becomes substantial; logs cut above this height are increasingly likely to require downgrading to lower-grade sawlogs and lumber. Finally, the 75% threshold suggests that the upper bole sections contribute little additional high-quality material and are better directed to pulp or biomass streams.

Except for the quadratic model in WP11, all proposed models captured the accumulation trends observed in the benchmark and measured profiles. Although the clear stem of WP11 (10.3 m) is longer than that of RP5 (7.75 m), WP11 accumulates the first 25% of its knot volume at a markedly faster rate. Beyond this point, accumulation in WP11 becomes gradual, whereas RP5, with its more complex branch structure, adds knot volume rapidly after the 50% threshold. Specifically, WP11 reaches the 25% mark at 10.7 m (near the top of its clear stem), whereas RP5 does not reach the same threshold until 13.8 m.

Species-specific differences in these threshold positions provide a framework for differentiated sorting strategies. WP11’s later accumulation of most of its knot volume offers a greater proportion of high-grade lower logs. Conversely, the earlier accumulation trends in RP5 imply that a more conservative bucking height is required for premium products. Ultimately, these cumulative curves allow managers to predefine bucking heights that balance value recovery and defect risk, improving consistency and efficiency in industrial log sorting.

4. Discussion

This study introduces a generalized statistical framework, Regression of Misaligned Covariates, for reconciling covariates and responses measured at different spatial locations. This is a long-standing challenge in wood-quality modelling and in the integration of multi-source forest data. By combining the Law of Total Expectation with kernel-based estimation of marginal relationships, the RMC approach provides a principled way to recover conditional mean structure even when the underlying datasets are spatially misaligned. This directly addresses the "change of support" problem that arises when LiDAR-derived external attributes and destructively sampled internal knot properties are not observed at the same coordinates.

Applying RMC to eastern white pine and red pine demonstrates the flexibility of the framework across contrasting physiological architectures. Red pine, with its uniform crown structure, provided a stable baseline. Eastern white pine, with its irregular branching and susceptibility to structural disturbances, served as a rigorous test of robustness. Across both species, RMC produced smooth, biologically interpretable patterns of knot volume along the bole, capturing species-specific differences in the rate and height at which knot volume accumulates.

The study also demonstrates the simplicity and operational practicality of the approach. The results show that the method does not require species-specific parameterization, complex covariate sets, or information on stand history or conditions. In fact, the model performed well using only height and a single LiDAR-derivable covariate. Notably, the framework produced stable estimates even when trained on a very small dataset; for instance, only four red pine trees were used to train the model that predicted the knot characteristics of RP5. This suggests that RMC can extract meaningful structure from limited data, although performance should improve as larger datasets become available.

The cumulative knot-volume profiles derived from RMC also illustrate the direct management value of the framework. Expressing knot accumulation as a function of height provides a clear link to operational decisions such as bucking, log sorting, and product allocation. Thresholds marking where 25%, 50%, and 75% of total knot volume occur allow managers to identify the heights at which log quality begins to decline, assess the likely grade of the first and second logs, and anticipate the proportion of the stem suitable for sawlog, dimension lumber and/or studwood, or pulp. Because these thresholds are expressed as cumulative proportions, they can be compared across species, stands, and management regimes, making the approach broadly applicable in both industrial and silvicultural contexts.

Beyond wood-quality modelling, RMC represents a general framework for any application involving misaligned covariates and responses. Many ecological, environmental, and remote-sensing datasets share this structure—for example, soil properties measured at sparse points paired with canopy metrics from LiDAR, or wildlife observations paired with environmental covariates collected at different scales. The RMC formulation is therefore not limited to forestry; it provides a transferable solution for a broad class of misalignment problems where traditional regression approaches fail.

There are, however, several avenues for further work. First, the choice of kernel bandwidth b_{n} plays a central role in shaping the smoothness and accuracy of the marginal estimates that feed into the RMC solution. A careful, data-driven selection of b_{n}, potentially through cross-validation or adaptive bandwidth strategies, will be important for optimizing performance in different forest conditions. Second, the present study demonstrates the framework using two covariates (height and one structural attribute) to illustrate the core idea. Extending RMC to three or more covariates (H, X_{1}, \dots, X_{p}) is a natural next step and would allow the integration of richer LiDAR-derived metrics such as branch angle, crown width, or local stem curvature. Finally, although the method performed well on small datasets, broader validation across larger samples, additional species, and diverse stand structures will help refine its operational reliability.
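Leave-one-out cross-validation, one of the data-driven options for choosing b_{n} mentioned above, can be sketched for a one-covariate kernel estimate of E[Y | H = h]. This is an assumption-laden illustration, not a recommendation for a specific forest setting. The Gaussian kernel, the candidate grid, the function names, and the synthetic data are all hypothetical.

```python
import numpy as np

def loo_cv_score(h, y, bandwidth):
    """Leave-one-out mean squared error of a Gaussian-kernel Nadaraya-Watson smoother."""
    u = (h[:, None] - h[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)
    np.fill_diagonal(w, 0.0)                 # each point is predicted without itself
    fitted = (w @ y) / w.sum(axis=1)
    return float(np.mean((y - fitted) ** 2))

def select_bandwidth(h, y, candidates):
    """Return the candidate b_n with the smallest leave-one-out score, plus all scores."""
    scores = [loo_cv_score(h, y, b) for b in candidates]
    return candidates[int(np.argmin(scores))], scores

# Synthetic marginal relationship E[Y | H = h] (hypothetical).
rng = np.random.default_rng(2)
h = rng.uniform(0, 1, 400)
y = 10 * np.sin(2 * np.pi * h) + rng.normal(0, 1, 400)
best, scores = select_bandwidth(h, y, [0.01, 0.03, 0.1, 0.3])
print(best, [round(s, 2) for s in scores])
```

Large bandwidths oversmooth the height trend and inflate the score, while very small ones chase noise. The selected value balances the two. The same score could be computed separately for E[Y | H] and E[X | H], since their optimal bandwidths need not coincide.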

Overall, RMC provides a portable and adaptable modelling framework that can be applied to any species or region without species-specific calibration. Because the covariates used here are directly measurable from terrestrial or airborne LiDAR, the approach enables non-destructive prediction of internal wood quality in standing trees. This positions RMC as a practical tool for pre-harvest planning, value-based harvesting, and long-term monitoring of stand development, supporting more precise and sustainable forest management.

5. Conclusions

This study presents RMC as a general and scalable solution to the persistent problem of misaligned covariates and responses in forestry and beyond. By validating the framework across two physiologically contrasting conifer species, we demonstrate that reliable, biologically interpretable estimates of knot volume can be recovered even from small datasets and using only minimal LiDAR-derived covariates. The resulting cumulative profiles translate directly into operational guidance for bucking, sorting, and value-based harvesting, while also enabling non-destructive assessment of standing-tree wood quality. Because RMC is not tied to species-specific parameters and can incorporate additional covariates as needed, it is readily portable across regions, forest types, and remote-sensing platforms. As multi-source data become increasingly central to forest monitoring, RMC offers a flexible pathway for integrating external structural measurements with internal wood-quality attributes, supporting both pre-harvest decision-making and long-term sustainable management.

Author Contributions

Conceptualization, UV; methodology, UV, ID, AS, MD; software, MD; validation, MD, UV; formal analysis, UV; investigation, UV, MD, AS; resources, UV, ID, AS; data curation, UV, ID, MD; writing – original draft preparation, UV, MD, ID, AS; writing – review and editing, UV; visualization, UV; supervision, UV, ID, AS; project administration, ID, UV; funding acquisition, ID, UV. All authors have read and agreed to the published version of the manuscript.

Funding

The research was funded by multiple sources. Udayalakshmi Vepakomma received financial support from Natural Resources Canada under the Transformative Technologies contribution agreement with FPInnovations and the industry members of FPInnovations, and Isabelle Duchesne obtained funding from the Forest Innovation Program of Natural Resources Canada (via the Canadian Wood Fibre Centre).

Acknowledgments

The authors would like to thank the Petawawa Research Forest (Canadian Forest Service, Natural Resources Canada) for providing access to the study area and supporting our field trials. We are grateful to Jacques Lirette and Josianne Guay (FPInnovations) for their essential support and assistance with field assessments, tree harvesting, and meticulous data processing. Special thanks go to:

- Steve Vallarand (FPInnovations), for his technical expertise in modifying CT2Opti to meet the specific requirements of this project;
- Airu Ji, for assistance with knot characterization;
- Luc Bedard (FPInnovations), for his valuable support on Optitek;
- Pierre Francus and Louis-Frédéric Daigle (INRS), for providing CT scanning expertise.

We also thank Denis Cormier (FPInnovations) for his thorough review of the manuscript and for insightful suggestions throughout the project. During the preparation of this work, the authors used Gemini and Copilot to improve the clarity and brevity of the text, especially in the methods section. After using these tools, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

RMC	Regression of Misaligned Covariates
RP	Red pine
WP	White Pine
LiDAR	Light Detection and Ranging
NDE	Non-Destructive Evaluation
ALS	Airborne LiDAR Scanning
TLS	Terrestrial LiDAR Scanning
ULS	UAV-borne LiDAR Scanning
UAV	Unmanned Aerial Vehicle
CT	X-ray Computed Tomography
PRF	Petawawa Research Forest
INRS	Institut national de la recherche scientifique
CS	Clear Stem
LC	Living Crown
DBH	Diameter at Breast Height
CBH	Canopy Base Height
MSE	Mean-Squared Error

References

Wang, X.; Ross, R.J. Forest wood quality assessments: NDT technologies; USDA Forest Service, 2022.
Nocetti, M.; Brunetti, M. Advancements in standing-tree visual wood-quality assessment: A review. Forests 2024, 15, 943. [Google Scholar] [CrossRef]
Rudnicki, M.; Wang, X.; Ross, R.J.; Allison, R.B.; Perzynski, K. Measuring wood quality in standing trees: A review; USDA Forest Service, 2017; p. FPL-GTR-248. [Google Scholar]
Schimleck, L.; Dahlen, J.; Apiolaza, L.A.; et al. Non-destructive evaluation techniques for wood property variation. Forests 2019, 10, 728. [Google Scholar] [CrossRef]
Schimleck, L.; Dahlen, J.; Mora, C.; et al. Maps of within-tree wood property variation in North American conifers. Canadian Journal of Forest Research 2025, 55, 1–14. [Google Scholar] [CrossRef]
Clark, A., III; McAlister, R.H. Visual tree grading systems for estimating lumber yields in southern pine. Forest Products Journal 1998, 48(10), 59–67. [Google Scholar]
Trincado, G.; Burkhart, H. A model of knot shape and volume in loblolly pine. Wood and Fiber Science 2008, 40(4), 634–646. [Google Scholar]
Burkhart, H.E.; Tomé, M. Modeling Forest Trees and Stands; Springer: Heidelberg, 2012. [Google Scholar]
Barrette, J.; Achim, A.; Auty, D. Impact of intensive forest management practices on wood quality from conifers: Literature review and reflection on future challenges. Current Forestry Reports 2023, 9, 101–130. [Google Scholar] [CrossRef]
Drew, D.M.; Downes, G.M.; Seifert, T.; et al. Progress and applications in wood quality modelling. Current Forestry Reports 2022, 8, 317–332. [Google Scholar] [CrossRef]
Holmgren, J.; Persson, Å. Identifying species of individual trees using airborne laser scanning. Remote Sensing of Environment 2004, 90, 415–423. [Google Scholar] [CrossRef]
Popescu, S.C.; Zhao, K. A voxel-based LiDAR method for estimating crown base height. Remote Sensing of Environment 2008, 112, 767–781. [Google Scholar] [CrossRef]
Vepakomma, U.; Cormier, D. Valuing forest stands using UAV-based LiDAR. ISPRS Archives 2019, XLII-2/W13, 643–647. [Google Scholar]
Li, T.; Shen, X.; Zhou, K.; Cao, L. Estimating tree structure and wood density of Ginkgo biloba using TLS and resistance drilling. Remote Sensing 2024, 17(1), 99.
Alvites, C.; Marchetti, M.; Lasserre, B.; Santopuoli, G. LiDAR as a tool for assessing timber assortments: A systematic literature review. Remote Sensing 2022, 14(18), 4466.
Vepakomma, U.; Cormier, D.; Hansson, L.; Talbot, B. Remote sensing at local scales for operational forestry. In Boreal Forests in the Face of Climate Change; Springer, 2023.
Zolotarev, F.; Eerola, T.; Lensu, L.; et al. Modelling internal knot distribution using external log features. Computers and Electronics in Agriculture 2020, 179, 105795.
Miao, Z.; Zhao, X.; Jiang, Y.; et al. Integrating taper and knot-length models for Korean pine. Journal of Forestry Research 2026, 37, 37.
An, Z.; Froese, R.E. Tree stem volume estimation from terrestrial LiDAR point cloud by unwrapping. Canadian Journal of Forest Research 2023, 53(1), 60–70.
Vandendaele, B.; Martin-Ducup, O.; Fournier, R.A.; Pelletier, G. Evaluation of mobile laser scanning scenarios for automated wood volume estimation. Canadian Journal of Forest Research 2024, 54(6), 774–792.
Stängle, S.M.; Brüchert, F.; Kretschmer, U.; Spiecker, H.; Sauter, U.H. Clear wood content in standing trees predicted from branch scar measurements with terrestrial LiDAR and verified with X-ray computed tomography. Canadian Journal of Forest Research 2014, 44(2), 145–153.
Pehkonen, M.; Vastaranta, M.; Holopainen, M.; et al. Identification of branch whorls and sawlogs using TLS and deep learning. Forestry 2025, 98(5), 712.
Morhart, C.; Schindler, Z.; Frey, J.; et al. Limitations of estimating branch volume from TLS. European Journal of Forest Research 2024, 143, 687–702.
Balestra, M.; Marselis, S.; Sankey, T.T.; et al. LiDAR data fusion to improve forest attribute estimates: A review. Current Forestry Reports 2024, 10, 281–297.
Tho, Z.Y.; Hui, F.K.C.; Welsh, A.H.; Zou, T. Cokrig-and-Regress for spatially misaligned environmental data. arXiv 2024.
Gotway, C.A.; Young, L.J. Combining incompatible spatial data. Journal of the American Statistical Association 2002, 97(458), 632–648.
Gelfand, A.E.; Zhu, L.; Carlin, B.P. On the change of support problem for spatio-temporal data. Biostatistics 2001, 2(1), 31–45.
Calders, K.; Newnham, G.; Burt, A.; et al. Nondestructive estimates of above-ground biomass using terrestrial laser scanning. Methods in Ecology and Evolution 2015, 6(2), 198–208.
Raumonen, P.; Kaasalainen, M.; Åkerblom, M.; Kaasalainen, S.; Kaartinen, H.; Vastaranta, M.; Holopainen, M.; Disney, M.; Lewis, P. Fast automatic precision tree models from terrestrial laser scanner data. Remote Sensing 2013, 5(2), 491–520.
Fowler, D.P.; Morris, R.W. Genetic diversity in red pine: Evidence for low heterozygosity. Canadian Journal of Forest Research 1977, 7(2), 349–357.
Gilmore, D.W.; Palik, B.J. A Synthesis of Red Pine Silviculture in the Great Lakes Region; USDA Forest Service General Technical Report NC-262; 2006.
Rapraeger, E.F. Development of branches and knots in western white pine. Journal of Forestry 1939, 37(3), 239–245.
Mottet, M.J.; Daoust, G.; Zhang, S.Y. Impact of white pine weevil on Norway spruce lumber properties. Forestry Chronicle 2006, 82(6), 834–843.
Dreibelbis, S.R.; Germain, R.H.; Smith, W.B. Longer crowns and fewer knots: Managing for high-quality eastern white pine. Journal of Forestry 2025, 123, 321–338.
Côté, J.F.; Fournier, R.A.; Frazer, G.W.; Niemann, K.O. A fine-scale architectural model of trees to enhance LiDAR-derived measurements of forest canopy structure. Agricultural and Forest Meteorology 2012, 166, 72–85.
Klockow, P.A.; Putman, E.B.; Vogel, J.G.; Moore, G.W.; Edgar, C.B.; Popescu, S.C. Allometry and structural volume change of standing dead southern pine trees using non-destructive terrestrial LiDAR. Remote Sensing of Environment 2020, 241, 111729.
Wand, M.P.; Jones, M.C. Kernel Smoothing; Chapman & Hall, 1995.
Belley, D.; Duchesne, I.; Vallerand, S.; Barrette, J.; Beaudoin, M. Computed tomography scanning of internal log attributes prior to sawing increases lumber value in white spruce and jack pine. Canadian Journal of Forest Research 2019, 49, 1516–1524.
Freyburger, C.; Longuetaud, F.; Mothe, F.; Constant, T.; Leban, J.-M. Measuring wood density using X-ray CT. Annals of Forest Science 2009, 66, 804.
Boutelje, J.B. On the anatomical structure, moisture content, density, shrinkage, and resin content of the wood in and around knots in Swedish pine and spruce. Svensk Papperstidning 1966, 69(1), 1–10.
Vallerand, S.; Belley, D.; Duchesne, I.; Beaudoin, M. Utilisation d’Images CT pour la Modélisation 3D de Billes Réelles; ForêtValeur, 2011.
Ji, A.; Cool, J.; Duchesne, I. Using CT-reconstructed logs to predict knot characteristics and tree value. Forests 2021, 12, 720.
R Core Team. R: A Language and Environment for Statistical Computing; R Foundation: Vienna, 2026.

Figure 1. Location and characteristics of the site where the sample trees were felled.

Figure 2. Destructive sampling and external tree measurement: (a) felling of the whole tree prior to delimbing; (b) arborist carefully delimbing the branches before felling; (c) reference line along the bole for measurement; (d) sample guideline for measuring diameter at a whorl; (e) 5-m log segments of the bole after measurements were completed; (f) logs carefully transported to the lab for CT scanning of internal log attributes.

Figure 3. Validation and comparative visualization of the external measurements of the felled trees: (a) field photo of red and white pines at the test site; (b) and (c) bi-temporal lidar point clouds showing the structural state of a felled white pine tree; (d) and (e) scaled digital reconstructions of the sampled white and red pine felled trees, respectively, using the developed architectural model based on precise manual field measurements; (f) and (g) segmented white pine log showing the cross-section and side profile, respectively.

Figure 4. Knot extraction using CT2Opti and visualization using Optitek: (a) segmenting a 5 m log to meet scanner size constraints; (b) X-ray CT scanning of a white pine log at INRS (photo credit: INRS); (c) a typical X-ray image with contrasting (denser) knots; (d)–(h) samples visualized in Optitek 10 (after CT2Opti knot extraction): (d) 2.3 m white pine butt log (LED = 48.73 cm, Volume = 330.39 dm³); (e) 2.02 m intermediate white pine log (LED = 42.13 cm, Volume = 259.96 dm³); (f) 1.84 m white pine crown log (LED = 17.87 cm, Volume = 37.33 dm³); (g) 2.34 m red pine crown log (LED = 25.98 cm, Volume = 67.36 dm³); (h) close-up of a whorl from a log, illustrating the radial increase in knot size from pith to surface.

Figure 5. Illustration of the proposed workflow with a sample prediction of knot volume along the clear stem of a felled 125-year-old red pine tree. Left: training data for log diameter and knot volume versus relative altitude, with fitted conditional means $m(x, h)$ and $m(y, h)$. Center: predicted knot-volume surfaces from three candidate models (normal, linear, quadratic). Right: test data and predicted knot volumes from each model; the bottom table summarizes sample sizes and prediction metrics (min, mean, max, $\sqrt{\mathrm{MSE}}$). See Methods for model definitions and evaluation metrics.
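The left panel of Figure 5 shows height-dependent conditional means of the covariate (log diameter) and of the response (knot volume), each fitted against relative altitude from its own sample; the abstract states that RMC estimates such marginal relationships with kernel methods. The Python sketch below illustrates only that smoothing step and the √MSE metric. It assumes a Nadaraya-Watson estimator with a Gaussian kernel and a fixed, hand-picked bandwidth; the function names and toy data are hypothetical, and the sketch does not reproduce the authors' step that combines the two conditional means through the Law of Total Expectation, nor their bandwidth selection.

```python
import math
from typing import List, Sequence


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_conditional_mean(h_obs: Sequence[float], v_obs: Sequence[float],
                        h0: float, bandwidth: float) -> float:
    """Nadaraya-Watson estimate of E[V | H = h0] from paired (h, v) observations."""
    weights = [gaussian_kernel((h - h0) / bandwidth) for h in h_obs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations near h0; increase the bandwidth")
    return sum(w * v for w, v in zip(weights, v_obs)) / total


def root_mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Square root of the mean squared prediction error."""
    errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(errors) / len(errors))


# Hypothetical toy data: the covariate (e.g. log diameter) and the response
# (e.g. knot volume) are each recorded against height, but at different heights.
h_x: List[float] = [0.1, 0.3, 0.5, 0.7, 0.9]
x: List[float] = [40.0, 35.0, 30.0, 24.0, 15.0]
h_y: List[float] = [0.05, 0.25, 0.45, 0.65, 0.85]
y: List[float] = [5.0, 12.0, 20.0, 35.0, 60.0]

grid = [0.2, 0.5, 0.8]
m_x = [nw_conditional_mean(h_x, x, h, bandwidth=0.15) for h in grid]  # E[X | H = h]
m_y = [nw_conditional_mean(h_y, y, h, bandwidth=0.15) for h in grid]  # E[Y | H = h]
```

Because each smoother uses only its own (height, value) pairs, the covariate and response samples never need to be matched point by point, which is the situation the caption describes.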


Figure 6. Predicted knot volume of white and red pine trees in two physiological sections: clear stem and living crown. Panels show knot volume (cm³) versus relative altitude for white pine (top row) and red pine (bottom row). Columns show relationships with LogDiameter (left), BranchLength (center), and BranchDiameter (right). Blue points are measured values; colored lines are fitted models: Normal (orange), Linear (green), Quadratic (blue), and mHat (purple).

Figure 7. Cumulative knot volume along the bole for red pine and white pine. Cumulative knot volume is plotted as a function of height from the stump, expressed as a proportion of total knot volume per tree. Vertical markers indicate the heights at which 25%, 50%, and 75% of total knot volume are exceeded. The clear stem–living crown boundary is shown in different colors for each species. Curves represent measured values (and model estimates where applicable). These cumulative thresholds illustrate how rapidly knot volume accumulates with height and provide a direct basis for evaluating log quality and bucking decisions along the stem.
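Figure 7 summarizes each tree by the heights at which the cumulative knot volume, counted upward from the stump, passes 25%, 50% and 75% of the tree total. A minimal Python sketch of that computation follows. It assumes per-knot heights and volumes are available and reads "exceeded" as "reaches or exceeds"; the function name and example values are hypothetical, not the authors' code or data.

```python
from typing import Dict, Sequence, Tuple


def cumulative_share_heights(
    heights: Sequence[float],
    volumes: Sequence[float],
    thresholds: Tuple[float, ...] = (0.25, 0.50, 0.75),
) -> Dict[float, float]:
    """For each threshold p, return the lowest knot height at which the cumulative
    knot volume, summed from the stump upward, reaches or exceeds p of the tree total."""
    pairs = sorted(zip(heights, volumes))  # order knots from stump to top
    total = sum(v for _, v in pairs)
    if total <= 0:
        raise ValueError("total knot volume must be positive")
    pending = sorted(thresholds)
    result: Dict[float, float] = {}
    running = 0.0
    for h, v in pairs:
        running += v
        # A single large knot can push the running share past several thresholds.
        while pending and running / total >= pending[0]:
            result[pending.pop(0)] = h
    return result


# Hypothetical example: four knots of equal volume at 1, 2, 3 and 4 m.
print(cumulative_share_heights([4.0, 1.0, 3.0, 2.0], [10.0, 10.0, 10.0, 10.0]))
# {0.25: 1.0, 0.5: 2.0, 0.75: 3.0}
```

A low 25% height indicates that knot volume accumulates close to the stump, which is the kind of contrast Figure 7 uses to compare the two species.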

Table 1. Summary of the external characteristics of the felled sample trees used for modeling.

Measured characteristic	White pine mean	White pine range	Red pine mean	Red pine range
Number of subject trees	10		5
Tree age (yr)	124	111 - 131	100	70 - 128
Density (kg/m³)	360.19	312.99 - 624.13	380.68	325.77 - 450.24
DBH (cm)	43.2	26.5 - 51.8	40	35.0 - 47.7
Total height (m)	27.54	24.7 - 31.79	25.22	22.07 - 29.92
Stump height (cm)	31.8	10.0 - 70.7	34.3	19.0 - 44.4
Topping length at 9.1 cm diameter (m)	3.17	1.37 - 4.35	3.2	3.07 - 3.45
Height at topping (m)	24.36	20.8 - 27.99	21.99	19.0 - 26.47
Height to the last living branch (m)	10.05	3.9 - 16.16	11.62	5.2 - 16.42
Total number of dead branches	111	59 - 228	80	18 - 121
Total number of live branches	70	22 - 199	49	34 - 63
Number of whorls	181	108 - 427	129	67 - 162
Crown diameter (m)	6.11	2.93 - 8.76	5.54	2.86 - 7.6
Branch diameter (cm)	4.45	3.28 - 6.29	3.82	3.24 - 4.27
Branch length (cm)	215.4	45.52 - 336.52	211.8	200.8 - 232.08
Branch inclination (°)	110	107 - 140	120	25 - 160
Number of logs	5	4 - 7	5	4 - 6
Log diameter (cm)	29.4	10.3 - 64	24.63	9.2 - 42.2
Log length (m)	4.28	2.5 - 5.05	4.33	1.37 - 5.04

Table 2. Summary of the internal (knot) characteristics of the felled sample trees used for modeling.

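Species	TreeId	#Knots	Knot volume total (cm³)	Knot volume average (cm³)	Knot volume min (cm³)	Knot volume max (cm³)	Knot area average (cm²)	Knot area min (cm²)	Knot area max (cm²)	Log volume (cm³)	K/T*100 (knot volume as % of log volume)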
WhitePine	WP1	345	12606.33	45.27	0.19	277.83	15.39	0.22	54.04	1885519.03	0.67
	WP2	329	13011.17	48.35	0.08	332.31	14.87	0.14	55.84	2255631.28	0.58
	WP6	237	7878.58	33.40	0.06	568.29	12.48	0.12	63.91	1092508.77	0.72
	WP8	249	4139.20	20.84	0.01	159.05	7.11	0.05	49.77	498720.45	0.83
	WP9	254	10360.23	37.86	0.07	299.63	10.09	0.14	50.37	1491214.65	0.69
	WP11	331	9027.89	27.13	0.09	348.16	11.49	0.16	62.03	1395931.70	0.65
	WP12	435	12196.71	41.78	0.06	361.86	14.30	0.09	96.15	20280238.00	0.06
	WP5	361	17660.22	48.92	0.05	1564.05	17.27	0.06	160.58	2148683.20	0.82
	WP7	369	26603.19	72.10	0.06	676.09	20.66	0.11	100.48	1876483.12	1.42
	WP14	406	23265.39	57.30	0.00	1050.82	10.61	0.01	117.77	2325245.90	1.00
RedPine	RP1	259	13620.45	19.46	0.01	273.42	12.50	0.01	57.24	986707.53	1.38
	RP2	290	17377.75	67.07	0.05	359.33	18.77	0.06	61.33	1454018.80	1.20
	RP4	184	6504.10	40.99	0.06	163.86	14.51	0.07	41.99	915729.20	0.71
	RP5	417	17393.11	44.03	0.02	428.59	12.72	0.05	62.03	1666761.00	1.04

¹ WP stands for white pine and RP for red pine; all volumes are in cm³.
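The K/T*100 column is consistent with total knot volume divided by total log volume, multiplied by 100: the WP1, RP2 and RP4 rows, for example, reproduce their listed values this way. The short Python check below makes that reading explicit for WP1, using the totals from Table 2; the function name is hypothetical.

```python
def knot_to_log_volume_percent(knot_volume_cm3: float, log_volume_cm3: float) -> float:
    """Total knot volume expressed as a percentage of total log volume (K/T*100)."""
    if log_volume_cm3 <= 0:
        raise ValueError("log volume must be positive")
    return knot_volume_cm3 / log_volume_cm3 * 100.0


# Totals for tree WP1 from Table 2 (both in cm³).
print(round(knot_to_log_volume_percent(12606.33, 1885519.03), 2))  # 0.67
```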

Table 3. Comparative summary of test tree characteristics for RMC application.

Table 4. Sample sizes of the training and test (prediction) datasets used for WP11 and RP5.

Response (stem section)	Variable	WP11 train	WP11 test	RP5 train	RP5 test
KnotVolume - Clear stem	LogDiameter	32	10	26	8
	KnotVolume	173	100	124	88
	RelAltitude	205	110	150	96
KnotVolume - Crown	BranchLength	474	85	180	56
	KnotVolume	2130	215	620	329
	RelAltitude	2604	300	782	385
KnotVolume - Crown	BranchDiameter	127	56	438	78
	KnotVolume	396	244	1660	141
	RelAltitude	523	300	2098	291

Table 5. Summary of predictive performance of the models in estimating knot volume (cm³) of white and red pine specimens.

Table 6. Performance of the candidate models relative to m̂: ratio of each model's MSE to the MSE of m̂, and mean bias.

		Clear stem		Living crown
		LogDiameter, h		BranchLength, h		BranchDiameter, h
Tree	Model	MSE ratio	Mean bias	MSE ratio	Mean bias	MSE ratio	Mean bias
White pine - WP11	normal	1.37	28.78	1.09	17.30	1.07	-8.37
	linear	1.53	34.94	1.12	21.15	1.10	-3.87
	quadratic	5.04	158.95	1.20	25.22	1.05	-15.43
Red pine - RP5	normal	1.00	7.22	1.10	-14.38	1.28	30.59
	linear	1.01	-7.79	1.08	2.73	1.39	26.51
	quadratic	1.05	11.23	1.12	-4.33	1.10	10.91
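Table 6 compares each candidate model with m̂ through an MSE ratio and a mean bias. The Python sketch below shows one way to compute both from test-set predictions. It assumes the ratio is the candidate's MSE divided by the MSE of m̂ (so values above 1 mean a larger error than m̂) and that bias is predicted minus observed; the sign convention is not stated in the table and is an assumption, as are the function names and the toy values.

```python
from typing import Sequence, Tuple


def _check(observed: Sequence[float], predicted: Sequence[float]) -> None:
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")


def mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared prediction error."""
    _check(observed, predicted)
    return sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def mean_bias(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of (predicted - observed); positive values indicate over-prediction.
    This sign convention is an assumption."""
    _check(observed, predicted)
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def compare_to_reference(observed: Sequence[float], predicted: Sequence[float],
                         reference: Sequence[float]) -> Tuple[float, float]:
    """Return (MSE of the candidate / MSE of the reference predictor, mean bias)."""
    return mse(observed, predicted) / mse(observed, reference), mean_bias(observed, predicted)


# Hypothetical test-set values (cm³): observed knot volumes, a reference
# predictor standing in for m-hat, and one candidate model.
observed = [10.0, 20.0, 30.0]
m_hat = [11.0, 19.0, 31.0]
candidate = [13.0, 22.0, 33.0]
ratio, bias = compare_to_reference(observed, candidate, m_hat)
```

With these toy values the candidate's MSE is 22/3 against 1 for the reference, so the ratio is about 7.33, and the mean bias is 8/3, reflecting systematic over-prediction.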

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

