2. Materials and Methods
2.1. Paired Binary Data Structure and Notation
We consider an anatomic variant that may be present on the left side, the right side, on both sides, or on neither side of an individual. Each observation therefore consists of a paired binary outcome, reflecting the presence or absence of the variant on the left and right sides within the same individual.
For conceptual clarity, it is useful to distinguish three related but distinct levels of description: individual-level pairing, joint occurrence, and marginal prevalences.
At the individual level, left- and right-side observations are inherently paired, because they arise within the same body and are subject to shared developmental, genetic, and environmental influences. This pairing implies that left- and right-side occurrences cannot, in general, be treated as independent realizations.
At the population level, the paired structure can be summarized by the joint distribution of left- and right-side presence. This joint distribution specifies the proportions of individuals in whom the variant is absent bilaterally, present only on the left, present only on the right, or present on both sides. These four joint probabilities fully characterize laterality and bilateralism at the individual level.
In practice, however, most primary anatomic studies do not report this joint distribution. Instead, they typically report only marginal side-specific prevalences: the proportion of individuals in whom the variant is observed on the left side and the proportion in whom it is observed on the right side. These marginal quantities summarize how frequently the variant appears on each side considered separately, but they do not indicate whether left- and right-side occurrences arise in the same individuals.
This distinction is critical. Marginal prevalences alone do not determine how often a variant occurs bilaterally, nor how often left-only and right-only cases occur. Multiple joint distributions—corresponding to different degrees of within-individual dependence—can produce exactly the same marginal prevalences. Consequently, when joint left–right information is unreported, the individual-level structure underlying laterality and bilateralism is fundamentally unobserved.
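As a small numerical illustration, the Python sketch below builds two joint distributions that share left-side prevalence 0.20 and right-side prevalence 0.30. The prevalences are hypothetical and do not come from any study. Cell keys are written left digit first ("10" = left only, "01" = right only). This labeling and the helper names are assumptions made for the example.

```python
import math

def marginals(joint):
    """Return (left, right) marginal prevalences from joint cell probabilities."""
    p_left = joint["10"] + joint["11"]   # left only + bilateral
    p_right = joint["01"] + joint["11"]  # right only + bilateral
    return p_left, p_right

def paired_or(joint):
    """Right-only over left-only probability (paired odds ratio)."""
    return joint["01"] / joint["10"]

# Scenario A (hypothetical): sides independent, bilateral = 0.20 * 0.30 = 0.06.
joint_a = {"00": 0.56, "10": 0.14, "01": 0.24, "11": 0.06}
# Scenario B (hypothetical): strong within-individual concordance, bilateral = 0.15.
joint_b = {"00": 0.65, "10": 0.05, "01": 0.15, "11": 0.15}

same_marginals = all(
    math.isclose(a, b) for a, b in zip(marginals(joint_a), marginals(joint_b))
)
print("identical marginals:", same_marginals)
for name, joint in [("A", joint_a), ("B", joint_b)]:
    p_left, p_right = marginals(joint)
    print(f"{name}: left={p_left:.2f} right={p_right:.2f} "
          f"bilateral={joint['11']:.2f} paired OR={paired_or(joint):.2f}")
```

Both scenarios reproduce the same marginals (0.20 and 0.30). Yet the bilateral prevalence differs (0.06 versus 0.15), and so does the paired odds ratio (about 1.71 versus 3.00).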
Throughout this study, we explicitly distinguish between marginal side-specific prevalences and the unobserved joint distribution that links them. All subsequent modeling is built on this distinction. Formally, we denote by $p_{ij}$, with $i, j \in \{0, 1\}$, the joint probability that the variant is present ($i = 1$) or absent ($i = 0$) on the left side and present ($j = 1$) or absent ($j = 0$) on the right side. These four probabilities fully characterize the paired distribution. The marginal prevalences reported in primary studies are linear combinations of the $p_{ij}$: $p_{10} + p_{11}$ for the left side and $p_{01} + p_{11}$ for the right side.
2.2. Target Estimands: Laterality and Bilateralism
We focus on two co-primary endpoints that capture complementary aspects of anatomic variation: laterality and bilateralism. Although both depend on the same underlying joint distribution, they address different anatomical questions and behave differently when joint information is missing.
2.2.1. Laterality
Laterality concerns whether a variant preferentially affects one side of the body over the other at the individual level. In paired binary data, laterality is quantified using the paired odds ratio, which compares the frequency of individuals with right-only manifestation to the frequency of individuals with left-only manifestation.
Importantly, this measure depends exclusively on discordant individuals—those in whom the variant is present on one side but absent on the other. Bilateral cases do not contribute to laterality, because they exhibit no side preference within individuals. As a result, laterality reflects directional asymmetry rather than overall prevalence or symmetry.
When discordant counts are explicitly reported in primary studies, the paired odds ratio is directly identifiable without additional assumptions. When only marginal prevalences are reported, however, the number of discordant individuals cannot be determined without making an assumption about how probability mass is distributed between discordant and concordant outcomes. Laterality inference therefore becomes sensitive to assumptions about within-individual dependence.
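When discordant counts are reported, the paired odds ratio can be estimated directly. The sketch below assumes the conventional conditional estimator (right-only count divided by left-only count) with a Wald interval based on $\operatorname{Var}(\log \mathrm{OR}) \approx 1/b + 1/c$. The function name and counts are hypothetical. Bilateral and bilaterally absent counts do not enter the calculation.

```python
import math

def paired_odds_ratio(n_right_only, n_left_only, z=1.96):
    """Paired (conditional) odds ratio b/c with a Wald CI on the log scale.

    Hypothetical helper; uses Var(log OR) = 1/b + 1/c, which requires both
    discordant counts to be positive.
    """
    if n_right_only <= 0 or n_left_only <= 0:
        raise ValueError("both discordant counts must be positive")
    log_or = math.log(n_right_only / n_left_only)
    se = math.sqrt(1.0 / n_right_only + 1.0 / n_left_only)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical study: 18 right-only and 9 left-only individuals.
or_hat, lo, hi = paired_odds_ratio(18, 9)
print(round(or_hat, 2), round(lo, 2), round(hi, 2))
```

For 18 right-only and 9 left-only individuals, the point estimate is 2.0 regardless of how many bilateral cases the study contains.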
2.2.2. Bilateralism
Bilateralism addresses a different anatomical question: how often a variant occurs symmetrically on both sides within the same individual. We quantify bilateralism using bilateral prevalence, defined as the proportion of individuals in whom the variant is present on both the left and right sides.
Bilateral prevalence is conceptually distinct from side-specific prevalence and from the prevalence of having the variant on at least one side. Whereas marginal prevalences describe how frequently a variant appears on each side separately, bilateral prevalence captures the tendency toward symmetric expression within individuals.
Because bilateral prevalence depends directly on the joint occurrence of left and right manifestations, it cannot be inferred from marginal prevalences alone when joint data are unreported. Any estimate of bilateral prevalence under such conditions necessarily relies on an assumption about the underlying left–right dependence structure.
Taken together, laterality and bilateralism provide complementary and anatomically meaningful summaries of paired binary variation. Laterality captures directional asymmetry among discordant individuals. Bilateralism captures symmetric expression, that is, concordant presence of the variant on both sides. The remainder of the Methods section develops a principled framework for analyzing both endpoints when the joint distribution underlying these quantities is unobserved.
2.3. Feasible Joint Distributions
When joint left–right data are unreported, the central inferential difficulty is that the joint distribution linking left- and right-side occurrences is unknown. However, it is not unconstrained. For any given pair of marginal side-specific prevalences, only a restricted set of joint distributions is mathematically admissible.
This restriction arises because joint probabilities must be non-negative, sum to one, and reproduce the observed marginal prevalences. As a consequence, the probability that a variant occurs bilaterally cannot vary freely: it is bounded above and below by limits determined entirely by the marginals. These limits are known as the Fréchet bounds.
Conceptually, the Fréchet bounds define the most extreme joint structures that are compatible with the observed marginals. At one extreme, left and right occurrences are arranged so as to minimize bilateral co-occurrence, subject to the marginal constraints. At the other extreme, left and right occurrences are arranged to maximize bilateral co-occurrence. Any joint distribution consistent with the reported marginals must lie between these two extremes.
The lower bound corresponds to the most negative within-individual association compatible with the marginals, and the upper bound corresponds to the strongest positive association. Importantly, neither bound corresponds to independence except in degenerate cases in which a marginal prevalence equals 0 or 1. Independence is therefore one admissible joint structure, lying in the interior of the feasible range, but not a privileged one.
Because laterality and bilateral prevalence both depend on how probability mass is allocated between concordant and discordant outcomes, their values are constrained by the same feasibility limits. When only marginal prevalences are known, neither laterality nor bilateral prevalence is identifiable without an additional assumption specifying where the true joint distribution lies within the admissible range.
This observation is fundamental. It implies that the inferential problem posed by unreported joint data is not one of estimation error, but of non-identifiability. Multiple, mutually incompatible joint distributions can reproduce the same marginal prevalences while implying different values of laterality and bilateral prevalence. Any analysis based on marginal data alone must therefore make an explicit assumption about the within-individual dependence structure.
In the following section, we make these feasibility constraints explicit by expressing the Fréchet bounds as inequalities on the joint probabilities. We also introduce a scalar parameter that spans the range of non-negative dependence, from independence to maximal feasible concordance.
2.4. Feasibility-Based Dependence Parameterization
As established in Section 2.3, when only marginal side-specific prevalences are available, the joint left–right distribution is not identifiable but is constrained to lie within a well-defined feasible region. We now formalize these constraints. We then introduce a scalar parameter that spans the admissible range of non-negative within-individual dependence, from independence to maximal feasible concordance.
Let $L$ and $R$ denote binary indicators of variant presence on the left and right sides, respectively, with joint probabilities
$$p_{ij} = \Pr(L = i,\, R = j), \qquad i, j \in \{0, 1\}.$$
The marginal prevalences are given by
$$p_L = \Pr(L = 1) = p_{10} + p_{11}, \qquad p_R = \Pr(R = 1) = p_{01} + p_{11}.$$
Because probabilities must be non-negative and sum to one, the joint probability of bilateral occurrence $p_{11}$ cannot take arbitrary values once $p_L$ and $p_R$ are fixed. Instead, it is constrained by the Fréchet bounds,
$$\max(0,\, p_L + p_R - 1) \;\le\; p_{11} \;\le\; \min(p_L, p_R).$$
These bounds are sharp. Every value of $p_{11}$ within this interval corresponds to exactly one valid joint distribution consistent with the observed marginals, and values outside the interval are mathematically impossible. Independence corresponds to the specific interior value $p_{11} = p_L p_R$, which is admissible but not privileged.
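The bounds are straightforward to compute. The Python sketch below returns the sharp interval for $p_{11}$ and checks whether a candidate value, such as the independence value $p_L p_R$, is feasible. The function names are hypothetical.

```python
def frechet_bounds(p_left, p_right):
    """Sharp bounds on the bilateral probability p11 given the two marginals."""
    for p in (p_left, p_right):
        if not 0.0 <= p <= 1.0:
            raise ValueError("prevalences must lie in [0, 1]")
    lower = max(0.0, p_left + p_right - 1.0)
    upper = min(p_left, p_right)
    return lower, upper

def is_feasible(p11, p_left, p_right, tol=1e-12):
    """True if p11 lies within the Frechet bounds (small numerical tolerance)."""
    lower, upper = frechet_bounds(p_left, p_right)
    return lower - tol <= p11 <= upper + tol

p_left, p_right = 0.20, 0.30
print(frechet_bounds(p_left, p_right))                 # (0.0, 0.2)
print(is_feasible(p_left * p_right, p_left, p_right))  # True: independence
print(is_feasible(0.25, p_left, p_right))              # False: exceeds min(pL, pR)
```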
The Fréchet bounds therefore define a one-dimensional feasible segment of joint distributions compatible with the observed marginals. Any assumption about within-individual dependence in the absence of joint data amounts to selecting a point along this segment.
To parameterize this selection transparently, we introduce a feasibility-based dependence index $\lambda \in [0, 1]$, defined through the linear interpolation
$$p_{11}(\lambda) = (1 - \lambda)\, p_L p_R + \lambda \min(p_L, p_R).$$
Under this construction, $\lambda = 0$ corresponds to independence, and $\lambda = 1$ corresponds to maximal feasible bilateral concordance. Intermediate values of $\lambda$ span the admissible range of non-negative dependence between these two extremes. All assumed joint distributions are therefore feasible for the observed marginals.
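Once $p_{11}$ is specified through $\lambda$, the remaining joint probabilities follow deterministically:
$$p_{10} = p_L - p_{11}(\lambda), \qquad p_{01} = p_R - p_{11}(\lambda), \qquad p_{00} = 1 - p_L - p_R + p_{11}(\lambda).$$
These quantities determine both laterality and bilateral prevalence. The probabilities of discordant outcomes ($p_{10}$ and $p_{01}$) govern the paired odds ratio used to quantify laterality, $\mathrm{OR}_{\mathrm{paired}} = p_{01}/p_{10}$ (right-only over left-only). The bilateral probability $p_{11}$ directly defines bilateral prevalence.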
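The deterministic mapping from $(p_L, p_R, \lambda)$ to the four joint probabilities can be sketched in Python as follows. The code transcribes the interpolation defining $\lambda$ and the complement relations for the remaining cells. The function names are hypothetical. At $\lambda = 1$ with unequal marginals, one discordant cell is exactly zero, so the sketch returns an infinite paired odds ratio. With equal marginals both discordant cells vanish, and the sketch returns NaN.

```python
import math

def joint_from_lambda(p_left, p_right, lam):
    """Joint cell probabilities for p11(lam) = (1 - lam) pL pR + lam min(pL, pR)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    p11 = (1.0 - lam) * p_left * p_right + lam * min(p_left, p_right)
    return {
        "11": p11,                           # bilateral
        "10": p_left - p11,                  # left only
        "01": p_right - p11,                 # right only
        "00": 1.0 - p_left - p_right + p11,  # neither side
    }

def paired_or(joint):
    """Paired odds ratio p01 / p10; inf if only p10 is zero, nan if both are."""
    if joint["10"] <= 0.0:
        return math.nan if joint["01"] <= 0.0 else math.inf
    return joint["01"] / joint["10"]

for lam in (0.0, 0.5, 1.0):
    cells = joint_from_lambda(0.10, 0.15, lam)
    print(lam, round(cells["11"], 4), paired_or(cells))
```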
By construction, $\lambda$ is a dimensionless index that depends only on feasibility and not on any assumed correlation scale. Correlation measures such as the phi coefficient arise as derived quantities once $\lambda$ and the marginal prevalences are specified. We therefore treat $\lambda$ as the primary modeling assumption, with all downstream estimands interpreted conditionally on this choice.
2.5. The Midway Dependence Hypothesis
The feasibility-based parameterization introduced in Section 2.4 spans the range of joint left–right dependence from independence to maximal feasible concordance through the scalar index $\lambda$. In the absence of joint data, however, computing laterality and bilateral prevalence from marginal prevalences alone requires a specific working assumption. We therefore define a neutral reference point within this dependence range, termed the midway dependence hypothesis.
Formally, the midway dependence hypothesis corresponds to the choice
$$\lambda = 0.5,$$
which places the joint probability of bilateral occurrence exactly halfway between the value implied by independence and the value implied by maximal feasible concordance. Under this assumption, the bilateral joint probability is given by
$$p_{11}^{\mathrm{mid}} = \tfrac{1}{2}\left[\, p_L p_R + \min(p_L, p_R) \,\right].$$
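Because the joint probability $p_{11}$ varies linearly with $\lambda$ by construction, the midway hypothesis corresponds to the arithmetic midpoint of the interval $[\,p_L p_R,\ \min(p_L, p_R)\,]$ between independence and maximal concordance. The same linearity places the induced probabilities of the laterality-relevant discordant outcomes, $p_{10}$ and $p_{01}$, halfway between their independence and maximal-concordance values as well. The log paired odds ratio, $\log(p_{01}/p_{10})$, does not share this property, because it is a nonlinear function of $p_{11}$. Moreover, at maximal concordance one discordant probability equals zero. The paired odds ratio is therefore unbounded at that limit when $p_L \neq p_R$ and undefined when $p_L = p_R$.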
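The following quick numerical check uses hypothetical marginals $p_L = 0.10$ and $p_R = 0.15$. At $\lambda = 0.5$, the left-only probability is exactly the average of its two endpoint values. The paired odds ratio, by contrast, has no finite value at maximal concordance because one discordant cell vanishes. The helper name is hypothetical.

```python
import math

def discordant_cells(p_left, p_right, lam):
    """Return (p10, p01) under p11(lam) = (1 - lam) pL pR + lam min(pL, pR)."""
    p11 = (1.0 - lam) * p_left * p_right + lam * min(p_left, p_right)
    return p_left - p11, p_right - p11

p_left, p_right = 0.10, 0.15
p10_ind, p01_ind = discordant_cells(p_left, p_right, 0.0)
p10_mid, p01_mid = discordant_cells(p_left, p_right, 0.5)
p10_max, p01_max = discordant_cells(p_left, p_right, 1.0)

# Discordant probabilities are linear in lambda, hence exactly halfway.
print(math.isclose(p10_mid, (p10_ind + p10_max) / 2))   # True
# The log paired OR is finite at lambda = 0 and 0.5 ...
print(math.log(p01_ind / p10_ind), math.log(p01_mid / p10_mid))
# ... but not at lambda = 1, where the left-only cell is empty.
print(p10_max)  # 0.0
```

Because the log odds ratio diverges as $\lambda \to 1$ when $p_L \neq p_R$, its value at $\lambda = 0.5$ cannot be read as a midpoint. This divergence is one plausible source of instability in laterality inference under strong assumed dependence.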
It is important to emphasize that the midway dependence hypothesis does not assert that the true within-individual dependence equals $\lambda = 0.5$, nor does it correspond to a fixed correlation on any conventional scale. Rather, it serves as a neutral feasibility-based reference, analogous to choosing the center of a bounded parameter space when no empirical information justifies favoring either extreme.
This role is particularly relevant in anatomical datasets with partial or unknown pairing, where neither complete independence nor maximal bilateral concordance is anatomically or empirically defensible. Anchoring inference at the midpoint of this dependence range gives the midway hypothesis a transparent and reproducible basis for estimation. Sensitivity analyses can still explore the full dependence spectrum defined by $\lambda$.
In subsequent sections, we examine how laterality and bilateral prevalence behave across the admissible dependence range and assess the robustness of conclusions drawn under the midway dependence hypothesis, particularly in settings involving rare variants or imbalanced marginal prevalences.
2.6. Derived Correlation Measures and Non-Invariance
Correlation measures are often invoked to summarize within-individual dependence in paired data. In the present framework, however, such measures are not treated as primary modeling quantities. Instead, they arise as derived consequences of the assumed joint distribution, conditional on the marginal prevalences and the chosen value of the feasibility index $\lambda$.
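For paired binary outcomes, a commonly reported association measure is the phi coefficient ($\phi$), defined as the Pearson correlation between the binary indicators $L$ and $R$. In terms of the joint probabilities introduced in Section 2.1 and Section 2.4, $\phi$ is given by
$$\phi = \frac{p_{11}\,p_{00} - p_{10}\,p_{01}}{\sqrt{p_L(1 - p_L)\,p_R(1 - p_R)}} = \frac{p_{11} - p_L p_R}{\sqrt{p_L(1 - p_L)\,p_R(1 - p_R)}}.$$
This expression makes explicit that $\phi$ depends jointly on the bilateral probability $p_{11}$, the discordant probabilities $p_{10}$ and $p_{01}$, and the marginal prevalences $p_L$ and $p_R$.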
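For any valid joint distribution, $p_{11}p_{00} - p_{10}p_{01} = p_{11} - p_L p_R$, so the numerator of $\phi$ can be written either way. The Python sketch below computes $\phi$ from the four cells and checks it against the simplified numerator. The joint distributions are hypothetical, and so is the function name.

```python
import math

def phi_coefficient(p11, p10, p01, p00):
    """Pearson correlation of the left and right binary indicators."""
    p_left, p_right = p11 + p10, p11 + p01
    denom = math.sqrt(p_left * (1 - p_left) * p_right * (1 - p_right))
    if denom == 0.0:
        raise ValueError("phi is undefined when a marginal prevalence is 0 or 1")
    return (p11 * p00 - p10 * p01) / denom

# Hypothetical joint distribution with pL = 0.20, pR = 0.30 and p11 = 0.15.
phi = phi_coefficient(0.15, 0.05, 0.15, 0.65)
simplified = (0.15 - 0.20 * 0.30) / math.sqrt(0.20 * 0.80 * 0.30 * 0.70)
print(round(phi, 4), math.isclose(phi, simplified))
```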
Substituting the feasibility-based parameterization from Section 2.4, with $p_{11}$ expressed as a function of $\lambda$, yields an induced correlation $\phi(\lambda)$. Because both the numerator and denominator of $\phi(\lambda)$ depend on the marginals, the same value of $\lambda$ generally corresponds to different values of $\phi$ across prevalence settings. In particular, $\phi(\lambda)$ is not invariant under changes in overall prevalence or left–right imbalance.
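The numerator of $\phi$ equals $p_{11} - p_L p_R$, and $p_{11}(\lambda) - p_L p_R = \lambda\,[\min(p_L, p_R) - p_L p_R]$. The induced correlation therefore takes a particularly simple form. The short derivation below works from the definitions above, with $\phi_{\max}$ denoting the value of $\phi$ at the Fréchet upper bound $p_{11} = \min(p_L, p_R)$:

$$\phi(\lambda) = \frac{p_{11}(\lambda) - p_L p_R}{\sqrt{p_L(1-p_L)\,p_R(1-p_R)}} = \lambda\,\frac{\min(p_L, p_R) - p_L p_R}{\sqrt{p_L(1-p_L)\,p_R(1-p_R)}} = \lambda\,\phi_{\max}(p_L, p_R).$$

Thus $\lambda$ fixes the fraction of the maximum attainable correlation, while $\phi_{\max}$, and with it $\phi(\lambda)$, varies with the marginals. For example, $\phi(0.5) = 0.5$ when $p_L = p_R$ (where $\phi_{\max} = 1$), but $\phi(0.5) \approx 0.23$ when $p_L = 0.05$ and $p_R = 0.20$.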
This non-invariance has important implications. First, it implies that no single correlation value can represent a fixed level of within-individual dependence across studies with differing marginal prevalences. Second, it explains why adopting a fixed correlation assumption—either implicitly or explicitly—can lead to incompatible or infeasible joint distributions when applied across heterogeneous datasets.
The attainable range of $\phi$ further illustrates its dependence on feasibility constraints. For fixed marginals, $\phi$ reaches its maximum feasible value when $p_{11}$ attains its Fréchet upper bound and its minimum feasible value at the Fréchet lower bound. These extrema depend explicitly on $p_L$ and $p_R$, reinforcing that the correlation scale itself is constrained by feasibility and is not freely specifiable.
Under the midway dependence hypothesis ($\lambda = 0.5$), the induced correlation $\phi_{\mathrm{mid}}$ is obtained by evaluating $\phi$ at the midpoint between the independence value and the Fréchet upper bound of $p_{11}$. This quantity provides a convenient reference value for interpretation, but it should not be construed as a universal or biologically meaningful correlation. Rather, it represents the correlation implied by a neutral feasibility-based assumption, conditional on the observed marginals.
For completeness, later sections report both exact and approximate expressions for the maximum feasible correlation and for $\phi_{\mathrm{mid}}$, including analytic simplifications relevant for rare variants. These results help interpret simulation outputs and contextualize how sensitive laterality inference is to dependence assumptions. They do not alter the primary role of $\lambda$ as the fundamental modeling parameter.
2.7. Exact Feasible Range of the Phi Correlation
For fixed marginal prevalences $p_L$ and $p_R$, the phi coefficient $\phi$ defined in Section 2.6 cannot vary freely. Instead, the Fréchet bounds on the joint probability $p_{11}$ constrain its admissible range. In this section, we make these constraints explicit and derive the exact feasible limits of $\phi$ implied by the marginal prevalences.
Recall that the joint probability of bilateral occurrence satisfies
$$\max(0,\, p_L + p_R - 1) \;\le\; p_{11} \;\le\; \min(p_L, p_R).$$
Because $\phi$ is a monotone increasing function of $p_{11}$ for fixed marginals, the minimum and maximum attainable values of $\phi$ occur at the lower and upper Fréchet bounds, respectively.
Substituting the Fréchet upper bound into the definition of $\phi$ yields the maximum feasible phi correlation, denoted $\phi_{\max}$. Analogously, substituting the Fréchet lower bound yields the minimum feasible phi correlation, denoted $\phi_{\min}$. These quantities define the full admissible correlation range for paired binary data with the given marginals.
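Importantly, both $\phi_{\max}$ and $\phi_{\min}$ depend explicitly on the marginal prevalences. When the marginals are balanced ($p_L = p_R$), the maximum attainable correlation equals one at every prevalence level. Any left–right imbalance makes it strictly less than one, and marked imbalance may narrow the attainable range substantially. At the other end, $\phi_{\min} = -1$ is attainable only when $p_L + p_R = 1$, so for rare variants the admissible range of negative correlation is very limited.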
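Under the feasibility-based parameterization introduced in Section 2.4, the induced correlation corresponding to a given value of $\lambda$ is obtained by evaluating $\phi$ at $p_{11}(\lambda)$. Because $\phi$ is linear in $p_{11}$ and equals zero under independence, this evaluation gives exactly $\phi(\lambda) = \lambda\,\phi_{\max}$. In particular, under the midway dependence hypothesis ($\lambda = 0.5$), the induced correlation $\phi_{\mathrm{mid}} = \phi_{\max}/2$ lies strictly between $\phi_{\min}$ and $\phi_{\max}$. Its exact value is determined jointly by $p_L$ and $p_R$.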
For later reference, we provide exact closed-form expressions for $\phi_{\max}$ and $\phi_{\min}$ as functions of the marginal prevalences. These expressions are used to interpret the magnitude of correlation implied by feasibility-based assumptions and to assess how fixed values of $\lambda$ translate into different correlation scales across studies. Numerical values for representative prevalence scenarios are reported in Supplementary Material.
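One way to write these closed forms is given below, using $p_{(1)} = \min(p_L, p_R)$ and $p_{(2)} = \max(p_L, p_R)$, notation introduced here for convenience. They follow from substituting each Fréchet bound into $\phi = (p_{11} - p_L p_R)/\sqrt{p_L(1-p_L)\,p_R(1-p_R)}$ and simplifying, using $p_L + p_R - 1 - p_L p_R = -(1-p_L)(1-p_R)$ for the second case of $\phi_{\min}$:

$$\phi_{\max} = \sqrt{\frac{p_{(1)}\,\bigl(1 - p_{(2)}\bigr)}{p_{(2)}\,\bigl(1 - p_{(1)}\bigr)}}, \qquad \phi_{\min} = \begin{cases} -\sqrt{\dfrac{p_L\, p_R}{(1 - p_L)(1 - p_R)}}, & p_L + p_R \le 1, \\[2ex] -\sqrt{\dfrac{(1 - p_L)(1 - p_R)}{p_L\, p_R}}, & p_L + p_R > 1. \end{cases}$$

For marginals strictly between 0 and 1, $\phi_{\max} = 1$ if and only if $p_L = p_R$, and $\phi_{\min} = -1$ if and only if $p_L + p_R = 1$.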
This exact characterization of the feasible correlation range highlights a central implication of the feasibility framework: correlation is not an intrinsic or invariant descriptor of within-individual dependence in paired binary data, but a marginal-dependent quantity whose attainable values are constrained by probability theory. As a result, comparisons or assumptions based solely on correlation coefficients must be interpreted in light of the underlying marginal prevalences.
2.8. Behavior of the Midway Dependence Hypothesis Under Rare and Imbalanced Marginals
The interpretability of any dependence assumption in paired binary data depends critically on the marginal prevalences. This is particularly relevant in anatomic studies, where many variants are rare and left–right prevalences may differ substantially. In this section, we examine how the midway dependence hypothesis ($\lambda = 0.5$) behaves under such conditions and clarify its implications for derived correlation measures.
When joint left–right information is unavailable, the feasibility-based framework parameterizes the unidentified joint probability $p_{11}$ along the admissible segment between independence and maximal feasible concordance. The midway dependence hypothesis places $p_{11}$ at the midpoint of this segment, regardless of the absolute magnitude or balance of the marginals. As a result, the implied joint structure adapts automatically to the prevalence setting, even though the reference position within the feasible range remains fixed.
In the regime of rare variants, where both marginal prevalences are small, the Fréchet upper bound on $p_{11}$ greatly exceeds the independence value $p_L p_R$. This bound equals the smaller of the two marginals. Under these conditions, the maximum feasible phi correlation is approximately $\sqrt{\min(p_L, p_R)/\max(p_L, p_R)}$ and approaches one only when the marginals are nearly balanced. The correlation implied by independence is zero. The midpoint assumption therefore induces a correlation strictly between these extremes, with a numerical value that depends on the marginal prevalences and their balance.
Marginal imbalance further modifies this relationship. When left- and right-side prevalences differ, the maximum attainable value of $\phi$ falls below one, so the admissible correlation range narrows. Under the midway hypothesis, the induced correlation reflects this narrowing automatically, without requiring any adjustment to $\lambda$. Thus, although $\lambda$ is fixed at 0.5, the implied correlation scale contracts or expands depending on the marginals.
These properties highlight an important distinction between the feasibility-based dependence index and conventional correlation measures. A fixed value of $\lambda$ does not correspond to a fixed value of $\phi$, particularly in settings involving rare or imbalanced marginals. Instead, $\lambda$ specifies a relative position within the admissible dependence range, and $\phi$ reflects the absolute magnitude of association implied by that position under the given marginal constraints.
To facilitate interpretation, we evaluate both exact and approximate expressions for the maximum feasible correlation and for the correlation induced by the midway dependence hypothesis across representative scenarios involving rare variants and marginal imbalance. These results contextualize the magnitude of $\phi$ observed in simulations and empirical examples. They also demonstrate that the midway hypothesis remains a neutral and internally consistent reference even in extreme prevalence regimes.
2.9. Dependence Parameterization Under Rare Variants
To further clarify the relationship between the feasibility-based dependence index and conventional correlation measures, we consider the limiting behavior of the phi coefficient under rare variants. This regime is common in anatomic prevalence studies and provides useful analytic insight into how feasibility constraints translate into correlation scales.
We focus on settings in which both marginal prevalences are small, $p_L, p_R \ll 1$, while allowing for possible left–right imbalance. In this regime, the Fréchet upper bound on the joint probability of bilateral occurrence, $\min(p_L, p_R)$, dominates the independence value, so that
$$\min(p_L, p_R) - p_L p_R \approx \min(p_L, p_R)$$
up to negligible corrections of order $p_L p_R$. Under independence, the joint probability satisfies $p_{11} = p_L p_R$, which is of smaller order.
Substituting these expressions into the definition of the phi coefficient yields simple approximations. At maximal feasible concordance, the maximum attainable correlation satisfies
$$\phi_{\max} \approx \sqrt{\frac{\min(p_L, p_R)}{\max(p_L, p_R)}},$$
highlighting the strong dependence of the feasible correlation range on marginal imbalance. In particular, even when variants are rare, substantial left–right imbalance can sharply reduce the maximum attainable correlation.
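To gauge the quality of this approximation, the sketch below compares it with the exact value obtained by substituting $p_{11} = \min(p_L, p_R)$ into the definition of $\phi$. The prevalence pairs and function names are hypothetical.

```python
import math

def phi_max_exact(p_left, p_right):
    """Exact phi at the Frechet upper bound p11 = min(pL, pR)."""
    p_lo, p_hi = min(p_left, p_right), max(p_left, p_right)
    return math.sqrt(p_lo * (1 - p_hi) / (p_hi * (1 - p_lo)))

def phi_max_rare(p_left, p_right):
    """Rare-variant approximation sqrt(min / max)."""
    return math.sqrt(min(p_left, p_right) / max(p_left, p_right))

for p_left, p_right in [(0.01, 0.01), (0.01, 0.02), (0.01, 0.04), (0.05, 0.20)]:
    print(p_left, p_right,
          round(phi_max_exact(p_left, p_right), 4),
          round(phi_max_rare(p_left, p_right), 4))
```

The exact value never exceeds the approximation, because the factor $(1-p_{(2)})/(1-p_{(1)})$ is at most one. The two agree closely once both prevalences are small.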
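Under the midway dependence hypothesis ($\lambda = 0.5$), the joint probability is approximated by
$$p_{11}^{\mathrm{mid}} \approx \tfrac{1}{2}\min(p_L, p_R),$$
and the induced correlation satisfies
$$\phi_{\mathrm{mid}} \approx \tfrac{1}{2}\sqrt{\frac{\min(p_L, p_R)}{\max(p_L, p_R)}}.$$
Thus, in the rare-variant limit, the midway hypothesis corresponds to halving the maximum feasible correlation, regardless of the absolute prevalence level. This halving is in fact exact for any marginals, because $\phi$ is linear in $\lambda$ and vanishes at independence. The approximation enters only through the simplified expression for $\phi_{\max}$. This relationship provides a simple and intuitive interpretation of $\lambda = 0.5$ in settings where exact expressions are cumbersome.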
These approximations are not used for estimation or inference. Rather, they serve to illustrate how a fixed feasibility-based assumption translates into different correlation magnitudes depending on the marginal prevalences and their balance. Exact expressions, which account for finite-prevalence corrections, are used in all numerical evaluations and are reported in Supplementary Material.
By making explicit the rare-variant behavior of $\phi_{\max}$ and $\phi_{\mathrm{mid}}$, this section reinforces a central message of the feasibility framework: correlation is a marginal-dependent quantity whose scale and interpretation cannot be divorced from prevalence. The dependence index $\lambda$, by contrast, keeps the same meaning across prevalence regimes, as a relative position within the feasible range. It therefore provides a more stable and transparent basis for modeling unreported joint dependence.
2.10. Simulation Study Design
Simulation studies were conducted to evaluate the behavior of laterality and bilateral prevalence estimands when joint left–right information is unreported and dependence must be reconstructed under assumed values of the feasibility-based index $\lambda$. The simulation framework mirrors the analytical structure developed in the preceding sections. It allows direct comparison between gold-standard inference based on fully observed joint data and reconstructed inference based on marginal data alone.
For each simulated study, marginal prevalences for left- and right-side occurrence were specified in advance. These marginals were chosen to represent a range of anatomically realistic scenarios, including rare variants, balanced and imbalanced prevalences, and varying degrees of overall occurrence. Given the specified marginals, a joint distribution was constructed by selecting a value of $\lambda$ and computing the corresponding joint probability $p_{11}(\lambda)$ using the feasibility-based parameterization described in Section 2.4. The remaining joint probabilities were then determined uniquely by the marginals.
Individual-level paired binary outcomes were generated by sampling from the resulting multinomial distribution over the four joint outcome categories. From these simulated data, gold-standard values of the paired odds ratio and bilateral prevalence were computed directly using the true joint distribution.
To mimic the reporting limitations of primary anatomic studies, joint information was then discarded, retaining only marginal side-specific prevalences. Using these marginals, laterality and bilateral prevalence were reconstructed under assumed dependence scenarios, including the midway dependence hypothesis ($\lambda = 0.5$) and alternative values spanning the admissible dependence range.
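The generate, discard, and reconstruct sequence for a single simulated study can be sketched as follows. This is a Python analogue written for illustration only; the study itself used R. The function names, sample size, seed, true $\lambda$, and prevalence values are hypothetical. Here the "gold-standard" quantities are computed from the simulated joint counts.

```python
import math
import random

def joint_from_lambda(p_left, p_right, lam):
    """Cell probabilities (p00, p10, p01, p11) under the feasibility interpolation."""
    p11 = (1.0 - lam) * p_left * p_right + lam * min(p_left, p_right)
    return (1.0 - p_left - p_right + p11, p_left - p11, p_right - p11, p11)

def simulate_study(n, p_left, p_right, lam_true, rng):
    """Multinomial sample of n individuals over the four joint categories."""
    cells = ("00", "10", "01", "11")
    weights = joint_from_lambda(p_left, p_right, lam_true)
    draws = rng.choices(cells, weights=weights, k=n)
    return {c: draws.count(c) for c in cells}

def gold_standard(counts):
    """Paired OR and bilateral prevalence from fully observed joint counts."""
    n = sum(counts.values())
    por = counts["01"] / counts["10"] if counts["10"] > 0 else math.inf
    return por, counts["11"] / n

def reconstruct(counts, lam_assumed):
    """Discard joint information, keep marginals, rebuild under an assumed lambda."""
    n = sum(counts.values())
    p_left = (counts["10"] + counts["11"]) / n
    p_right = (counts["01"] + counts["11"]) / n
    _, p10, p01, p11 = joint_from_lambda(p_left, p_right, lam_assumed)
    por = p01 / p10 if p10 > 0 else math.inf
    return por, p11

rng = random.Random(2024)
counts = simulate_study(500, 0.10, 0.15, lam_true=0.3, rng=rng)
print("gold standard:", gold_standard(counts))
print("midway reconstruction:", reconstruct(counts, lam_assumed=0.5))
```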
Simulated studies were aggregated using standard meta-analytic procedures. Laterality was meta-analyzed on the log paired odds ratio scale using inverse-variance weighting, while bilateral prevalence was meta-analyzed on the logit scale. Random-effects models were employed throughout to reflect between-study heterogeneity.
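The section does not name the between-study variance estimator. The sketch below assumes the DerSimonian–Laird moment estimator as one common choice. Its inputs are study-level estimates on the analysis scale (log paired odds ratios, or logit-transformed bilateral prevalences) together with their within-study variances. Function and variable names and the example inputs are hypothetical.

```python
import math

def dersimonian_laird(estimates, variances):
    """Random-effects pooling with the DerSimonian-Laird estimate of tau^2.

    estimates: study-level values on the analysis scale (e.g. log paired ORs)
    variances: their within-study sampling variances
    Returns (pooled estimate, its standard error, tau^2).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]                                  # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]                      # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2

# Hypothetical log paired ORs and variances from three studies.
pooled, se, tau2 = dersimonian_laird([0.40, 0.75, 0.10], [0.05, 0.08, 0.06])
print(round(pooled, 3), round(se, 3), round(tau2, 3))
```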
Performance was evaluated by comparing reconstructed estimates with gold-standard values across simulation replicates. Metrics included bias, root mean squared error, confidence interval coverage, and measures of inferential instability, particularly for laterality under strong dependence and rare-variant conditions. These summaries quantify both the accuracy and robustness of inference under different dependence assumptions and prevalence regimes.
By explicitly separating data generation, information removal, reconstruction under assumed dependence, and meta-analytic aggregation, the simulation design provides a controlled setting in which to assess the practical consequences of feasibility-based dependence assumptions and to interpret the sensitivity analyses presented in the Results.
2.11. Propagation of Uncertainty in the Dependence Assumption and Unequal Marginals
The analyses described thus far treat the feasibility-based dependence index $\lambda$ as a fixed working assumption. In practice, however, uncertainty about within-individual dependence may itself be substantial, particularly when joint left–right information is entirely unreported. To assess the robustness of laterality inference to such uncertainty, we extended the deterministic framework by allowing $\lambda$ to vary stochastically rather than remaining fixed.
Uncertainty in the dependence assumption was modeled on the logit scale. Specifically, we assumed that $\operatorname{logit}(\lambda)$ follows a normal distribution with specified mean and variance, and mapped realizations back to the unit interval via the inverse logit transformation. This construction keeps all sampled values of $\lambda$ within the admissible range $(0, 1)$ while allowing flexible control over the degree of uncertainty around a chosen reference value, such as the midway hypothesis.
For each scenario, we evaluated how uncertainty in $\lambda$ affects the precision of laterality inference by computing the expected standard error of the log paired odds ratio as a function of the induced variance in $\lambda$. This approach parallels classical analyses of uncertainty propagation in paired continuous outcomes, adapted here to the feasibility-based dependence framework for paired binary data.
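The sketch below illustrates this propagation. It assumes the standard large-sample (delta-method) standard error $\sqrt{1/(n p_{10}) + 1/(n p_{01})}$ for the log paired odds ratio in a study of $n$ individuals, and it averages over Monte Carlo draws of $\lambda$. The mean and standard deviation of $\operatorname{logit}(\lambda)$, the sample size, the prevalences, and all names are hypothetical choices for illustration.

```python
import math
import random

def inv_logit(x):
    """Map a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def se_log_paired_or(n, p_left, p_right, lam):
    """Large-sample SE of log(p01 / p10) implied by lambda (assumed formula)."""
    p11 = (1.0 - lam) * p_left * p_right + lam * min(p_left, p_right)
    p10, p01 = p_left - p11, p_right - p11
    if p10 <= 0.0 or p01 <= 0.0:
        return math.inf
    return math.sqrt(1.0 / (n * p10) + 1.0 / (n * p01))

def expected_se(n, p_left, p_right, mu, sigma, draws=20_000, seed=1):
    """Average SE over lambda = inv_logit(Z), with Z ~ Normal(mu, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        lam = inv_logit(rng.gauss(mu, sigma))
        total += se_log_paired_or(n, p_left, p_right, lam)
    return total / draws

# Centered on the midway hypothesis: logit(0.5) = 0.
for sigma in (0.0, 0.5, 1.0, 2.0):
    print(sigma, round(expected_se(500, 0.10, 0.15, mu=0.0, sigma=sigma), 4))
```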
In addition to uncertainty in dependence, we examined the effect of unequal marginal prevalences on the behavior and interpretability of the midway dependence hypothesis. Marginal asymmetry was summarized using the ratio of the two side-specific prevalences. Across a range of imbalance scenarios, we evaluated (i) the correlation implied by the midway dependence assumption as a function of marginal imbalance, and (ii) the relative precision of laterality estimates under the midway hypothesis compared with independence, expressed as the ratio of standard errors.
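Both summaries can be computed directly from the parameterization. The sketch below fixes a hypothetical left-side prevalence of 0.05 and scales the right-side prevalence by an imbalance ratio. It reports $\phi$ at $\lambda = 0.5$ and the ratio of standard errors (midway over independence). The SE is the assumed large-sample formula $\sqrt{1/(n p_{10}) + 1/(n p_{01})}$, in which $n$ cancels from the ratio. Names and values are illustrative.

```python
import math

def cells(p_left, p_right, lam):
    """Return (p11, p10, p01) under the feasibility interpolation."""
    p11 = (1.0 - lam) * p_left * p_right + lam * min(p_left, p_right)
    return p11, p_left - p11, p_right - p11

def phi_at(p_left, p_right, lam):
    """Phi coefficient induced by lambda for the given marginals."""
    p11, _, _ = cells(p_left, p_right, lam)
    denom = math.sqrt(p_left * (1 - p_left) * p_right * (1 - p_right))
    return (p11 - p_left * p_right) / denom

def se_ratio_mid_vs_independence(p_left, p_right):
    """SE(log paired OR) at lambda = 0.5 divided by SE at lambda = 0 (n cancels)."""
    def se(lam):
        _, p10, p01 = cells(p_left, p_right, lam)
        return math.sqrt(1.0 / p10 + 1.0 / p01)
    return se(0.5) / se(0.0)

p_left = 0.05
for ratio in (1.0, 1.5, 2.0, 4.0):
    p_right = ratio * p_left
    print(ratio,
          round(phi_at(p_left, p_right, 0.5), 3),
          round(se_ratio_mid_vs_independence(p_left, p_right), 3))
```

With balanced marginals, the midway assumption gives $\phi = 0.5$ and inflates the standard error by exactly $\sqrt{2}$, because both discordant probabilities are halved. As imbalance grows, $\phi$ at $\lambda = 0.5$ decreases, tracking $\phi_{\max}/2$.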
These analyses characterize how both uncertainty in the dependence assumption and asymmetry in marginal prevalences influence the stability and precision of laterality inference. They also clarify the conditions under which the midway dependence hypothesis provides a practically useful reference point, and those under which laterality estimates become inherently unstable regardless of the assumed dependence structure.
2.12. Computational Implementation and Software
All analytical derivations, simulations, and graphical displays were implemented using the R statistical environment (version 4.2.2; R Foundation for Statistical Computing, Vienna, Austria). Data manipulation and aggregation were performed using the dplyr and tidyr packages, and all figures were produced using ggplot2. Multinomial sampling for paired binary outcomes was carried out using base R functions.
Simulation studies were implemented to evaluate the behavior of laterality and bilateral prevalence estimands across the dependence range spanned by the feasibility-based index $\lambda$. Values of $\lambda$ were varied on a fine grid to ensure smooth and interpretable summaries, and simulation settings were chosen to reflect realistic anatomic prevalence scenarios. All simulation code was written to ensure reproducibility and consistency with the analytical framework described in Sections 2.1–2.11.
As a supportive aid during manuscript development, ChatGPT (version 5.2; OpenAI, San Francisco, CA, USA) was used for assistance in code structuring, language refinement, and consistency checking of analytical descriptions. All methodological choices, mathematical formulations, simulation designs, and interpretations were conceived, verified, and approved by the authors. ChatGPT was not used to generate data, perform statistical analyses, or determine scientific conclusions.