1. Introduction
In animal breeding, standard multiple-trait mixed models (SMTM) are often applied to analyze multiple traits [
1], estimating unstructured covariance matrices. The number of (co)variance parameters involved grows quadratically with the number of traits, and when phenotypes are highly correlated, the residual and genetic covariance matrices can be singular, making the convergence of estimation algorithms complex [
1,
2] and inferences unreliable.
Structural equation models (SEM) are often used to obtain more parsimonious (and sometimes more interpretable) parametrization of multivariate models. This approach has been used in quantitative genetics to structure the genetic and environmental covariance matrices of multivariate linear mixed models [
2,
3,
4,
5,
6], commonly used for genetic evaluations. Primarily due to its ability to consider causal relationships between latent variables, allowing for the modeling of complex phenomena while simultaneously reducing the dimensionality of the data [
7,
8,
9]. SEM methodology comprises many different techniques and procedures used together to model covariance structures. These models can be viewed as an extension of the multivariate linear mixed model [
10] that can express functional networks among traits [
2]. Gianola and Sorensen [
3] discussed the use of recursive (REC) and simultaneous equation models (both special cases of SEM) acting on phenotypes. Valente et al. [
11] proposed a methodology to search for recursive causal structures among phenotypes within a multivariate linear mixed model, which has been applied to various species and traits, including carcass and meat quality traits in beef cattle [
12]. Some authors discussed the use of REC models operating on genetic covariance matrices [
6,
8,
13]. In certain cases, modeling the covariance matrices helps infer or analyze causality between the breeding values of different traits or examine the ratio of two correlated traits, thereby reducing the model’s dimension with more parsimonious and efficient parameterizations [
8,
14].
Alternatively, factor analysis (FA), another specific case of SEM, can be used to reduce the dimension of the estimated genetic covariance matrix [
15,
16] and achieve a more parsimonious model of genetic effects within a multivariate linear mixed model, without decreasing the original number of traits and records. In the FA framework, the observed traits are modeled as linear combinations of fewer unobservable latent factors and trait-specific residual components, effectively capturing the common sources of variation underlying correlated traits while preserving individual trait identities. This approach has been successfully applied in quantitative genetics to structure genetic and environmental covariance matrices [
16,
17], proving particularly useful when analyzing highly correlated traits where traditional unstructured covariance matrices may suffer from convergence issues or poor estimability. The reduction in the number of parameters not only improves computational efficiency but can also lead to more stable parameter estimates and better model convergence, especially in datasets with limited sample sizes relative to the number of traits being analyzed [
7]. Moreover, FA models can provide biological insights by identifying latent factors that may represent underlying biological processes or developmental pathways common to multiple phenotypes.
While previous studies have demonstrated the utility of SEM in various contexts, their application to structuring both genetic and residual covariance matrices independently in beef cattle breeding remains underexplored. In the current omics era, breeding programs are increasingly collecting larger numbers of correlated phenotypes, making computationally efficient, statistically robust methods for multi-trait genetic evaluation even more critical. These more parsimonious models offer practical advantages for routine genetic evaluations in breeding programs, enabling the simultaneous analysis of multiple traits across large populations with improved computational efficiency and numerical stability compared to fully parameterized models. Therefore, the objective of this study was to evaluate the performance of factor analysis and recursive models applied independently to additive genetic and residual covariance matrices for growth and ultrasound carcass traits in Nellore cattle, comparing them with standard multiple-trait mixed models in terms of parameter estimation, breeding value prediction, and model fit criteria.
2. Materials and Methods
2.1. Dataset Description
The present study used data from 2,942 Nellore animals collected over 3 years from an existing database of 10 herds participating in the Brazilian Nellore Breeding Program across 6 States. On-farm data collection was conducted according to routine management procedures and in accordance with the [
18] guidelines. To collect carcass data without slaughtering the animals, the ultrasound methodology was applied [
19]. Real-time ultrasound measurements of carcass traits included the longissimus muscle area (LMA, cm
2) and backfat thickness (BF, mm), both obtained from a cross-sectional image of the longissimus dorsi muscle between the 12th and 13th ribs. Rump fat thickness (RF, mm) was measured at the intersection of the gluteus medius and biceps femoris muscles, located between the hooks and pin bones. Backfat thickness was measured at the 3/4 position from the chine bone end of the longissimus muscle, using the cross-sectional ribeye image. Other traits recorded included body weight (BW, kg) and hip height (HH, cm), both measured at the date of ultrasound scanning, and 450-day age-standardized scrotal circumference (SC, mm). Age-standardized SC was obtained as described in the Guidelines for Uniform Beef Improvement Programs [
20]. The traits were assessed in animals aged between 480 and 629 days.
Contemporary groups (CG) were defined as animals of the same sex, born in the same herd, year, and season (dry or rainy), and reared within the same management group. A total of 302 contemporary groups were formed, with an average of 9.7 animals per group (range: 3 to 45 animals). Additional effects considered in the analyses included age of animal at scanning (linear covariate for all traits except SC) and age of dam at calving (linear and quadratic covariates for BF, RF, HH, and BW; classes for LMA and SC). A description of the traits and data structure for Brazilian Nellore cattle is presented in
Table 1.
2.2. Standard Multi-Trait Mixed Model
Genetic parameters and model solutions were estimated using Bayesian inference implemented via Gibbs sampling. In the following sections, we first describe the standard multi-trait mixed model commonly used in genetic evaluation, then present extensions based on factor analytic (FA) and recursive (REC) model structures.
In the standard multi-trait mixed model (SMTM), a sire model was fitted for
p traits measured on
n individuals. The mixed-model equation can be described as:
where
is the vector of phenotypic observations,
is the vector of fixed effects,
is the vector of random additive sire effects,
is the vector of residual effects, and
and
are the corresponding incidence matrices. Under the Bayesian framework, the likelihood function was specified as:
with prior distributions
and
, where
is the
additive genetic (co)variance matrix among sire effects,
is the
residual (co)variance matrix,
is Wright’s numerator relationship matrix among sires,
is an identity matrix of order
n, and ⊗ denotes the Kronecker product operator.
Prior distributions for model parameters were specified as follows. Systematic effects included contemporary groups, animal age at scanning (linear effect, except for SC), and dam age (linear and quadratic effects for BF, RF, HH, and BW). For the vector of systematic effects
with vague information [
21], prior distributions were assigned as:
For the (co)variance matrices, scaled inverse Wishart distributions were assigned as prior distributions:
where
and
are
scale matrices and
and
are degrees of freedom parameters. The joint posterior distribution of all model parameters is proportional to the product of the likelihood and prior distributions:
2.3. Factor Analytic Model
Factor analysis is a multivariate statistical technique that simplifies complex and correlated data structures [
22,
23]. It can be viewed as an extension of principal component analysis, where the factor-analytic variance-covariance structure serves as a parsimonious approximation to a fully unstructured variance-covariance matrix, enabling more efficient parameterization of complex correlation patterns. In a standard FA, a vector of random variables (say
) can be decomposed as a linear combination of fewer unobservable random variables called common factors (
), with an unobservable incidence matrix (
) of factor loadings, plus a vector of trait-specific errors (
) to each individual
i that represent the lack of fit of the model. In compact notation:
In vector notation, the factor analytic model for these effects can be written as:
where
,
, and
. This results in the following covariance matrix of
:
where
is a diagonal matrix of specific variances. Likewise, the residual vector (
) can be decomposed into common and trait-specific factors. Consequently, the marginal distributions of
and
are:
where “iid” stands for “independent and identically distributed”.
To model the additive genetic and residual (co)variance matrices using FA, it was assumed that Equation 2 holds for the vector of random additive genetic effects (
) in Equation 1 and, likewise, for the vector of random residual effects (
) such that
where
,
,
and
are as before and
and
are interpreted as vectors of common and specific additive genetic effects, respectively, similarly for the random residual effects (
). Combining the assumptions of the FA model described above with those of SMTM leads to the following random effects joint distribution,
where
and
are the matrices of additive genetic and residual factor loads, respectively,
and
are the diagonal matrices of specific additive genetic and residual variances, respectively. More details on the implementation of this model can be found in [
17].
2.4. Recursive Models
Recursive models are a category of SEMs that postulate causal and unidirectional relationships between latent variables, unlike simultaneous models, which admit mutual feedback effects between traits. For more information on the procedure for REC effects decomposition, see [
17,
21]. Using REC, the vector
can be decomposed as
where is the random effect of the ith sire, is a strictly lower-triangular matrix; i.e., for all , whose non-zero entries define recursive effects, and is a vector of random effects whose (co)variance matrix is: where is a diagonal matrix. The reduced form model is (Equation 4).
Similarly to the FA model, to implement the REC to model the additive genetic and the residual (co)variance matrices, it was assumed that Equation 4 holds the vector of random additive genetic effects () in Equation 1, likewise for the vector of random residual effects (). Therefore, the additive genetic covariance matrix is equal to , as . Similarly, using REC for residuals, can be represented as , where is a strictly lower-triangular matrix defining recursive effects between model residuals, and is diagonal. A fully recursive model (FRM) occurs when all lower-triangular entries of and of are non-zero, i.e., all and with are parameters to be estimated. This model has as many (co)dispersion parameters as the SMTM, and there is a one to one map from the unknowns in to those in . Therefore, the FRM is just a re-parameterization of SMTM; however, various degrees of parsimony can be obtained by setting some of the lower-triangular entries of and to zero. Models in this study differed in how and were structured.
2.5. Statistical Analyses
Bayesian inference and MCMC implementation: All models were implemented in a Bayesian framework using Gibbs sampling via the MTM package [
24] in R software [
25]. An initial chain of 1,500,000 iterations was run to assess convergence. The final chain length, burn-in period, and thinning interval were determined using the criteria of [
26], implemented in the Bayesian Output Analysis (BOA) package [
27]. Convergence was further evaluated through visual inspection of trace plots for variance components. Based on these diagnostics, posterior inference was based on 146,000 samples obtained after discarding the first 40,000 iterations as burn-in and retaining every 10th iteration thereafter.
Model comparison: Multiple alternative models were fitted to assess the effectiveness of parsimonious covariance structures. The standard multi-trait mixed model (SMTM) with unstructured (co)variance matrices served as the baseline, requiring estimation of 42 (co)variance parameters (21 for and 21 for ).
Factor analytic models: Four FA models were evaluated. The FA2F model specified two common factors for both additive genetic and residual (co)variance matrices, reducing the total number of parameters to 24 (12 genetic + 12 residual). The FA3F model extended this to three factors, increasing the parameter count to 36 (18 genetic + 18 residual). To evaluate the benefit of parsimony in only one (co)variance component, two additional models were fitted: FA2G applied two factors to while maintaining the unstructured (33 parameters total), whereas FA2R applied two factors to while maintaining the unstructured (33 parameters total).
Recursive models: Three recursive models were evaluated. First, a fully recursive model (FRM) was fitted, which is equivalent to SMTM in terms of number of parameters but reparameterizes the covariance structures using recursive paths. Based on FRM results, two reduced models were constructed. The REC1 model set the th entry of to zero if or , where and denote the posterior mean and standard deviation of the th entry of from FRM, respectively. This criterion resulted in six recursive effects being removed from . The REC2 model removed any effect in or whose posterior mean from FRM had absolute value less than 0.15 (an arbitrary threshold chosen to evaluate a more parsimonious structure), resulting in removal of 15 recursive effects (10 from and 5 from ).
Model selection criteria: Models were compared using the deviance information criterion [DIC; [
28,
29], the posterior mean of the log-likelihood [Mean(
)], and the effective number of parameters [
; [
28,
29]. Lower DIC values indicate better model fit penalized for complexity, while
quantifies model complexity. Additionally, Spearman and Pearson correlations between predicted sire breeding values across models were calculated to assess ranking stability.
4. Discussion
This study aimed to evaluate the potential advantages of using structured equation models, such as recursive and factor-analytic models, for sire modeling in beef cattle genetic evaluations of ultrasound carcass and growth traits.
Given the magnitude of standard errors, differences between heritability estimates obtained by SMTM and the two best models (REC1 and FA2G) were only observed for BF and RF traits in the FA2G model, which yielded slightly lower estimates, except for SC in the REC1 model, but with the same pattern and within one standard deviation. The correlations (Spearman and Pearson) between estimated breeding values for each trait, across models were high and close to the unit, showing that selections decision would be basically the same using any of the methodologies considered here, despite the lower genetic variability (heritability estimates) of some traits in some models, because when estimating genetic breeding values the variances are generally fixed.
The FA2G model estimated genetic correlations with the same pattern as the SMTM model, but with a lower magnitude. It is suggested that only when genetic correlations are of lower magnitude can FA models properly estimate the true covariance structure; therefore, FA models are not a good alternative for estimating high correlations, as discussed by [
30]. On the other hand, most of the genetic correlations estimated with the REC1 model were of slightly larger magnitude than those estimated by the SMTM, but yet very similar, indicating a good alternative for modeling additive genetic covariance matrices in beef cattle.
As expected, REC1 and FA2G modeled residual covariance matrices of beef cattle data very similarly to the SMTM model, except for the association between SC and BF in the REC1 model. Nevertheless, the residual correlation in the REC1 model was basically null . Considering the reasonable agreement between covariance parameters and breeding values estimated by the REC1 and FA2G models with the traditional model (SMTM), it could be suggested that these structural equations can be successfully used to reduce dimensionality of genetic evaluations, optimizing processing by a single analysis including all traits of interest and, consequently, providing faster and reliable estimation of breeding values compared to the traditional univariate analyses for each selection criteria. Thus, beef cattle breeding programs could run multiple-trait mixed-model analyses of large datasets nearly every day, such that as soon as the farmers send in new field data, they can get prompt results for selection decisions on their herds.
Recursive models act separately on and , allowing specific patterns to be captured in each component (Table 3). The different patterns of genetic and residual correlations can be interpreted using estimates of recursive effects, and based on this information, irrelevant effects can be suppressed from the covariance matrices. The six recursive effects zeroed out in for the REC1 model were between RF and LMA, SC and LMA, BW and RF, HH and RF, SC and RF, and SC and HH. Therefore, the REC1 model saves these six recursive effects, which were virtually null, out of the total 21 parameters to be estimated, suggesting that this type of structural equation model can facilitate the estimation of breeding values for multiple traits in a single analysis.
In terms of the
criterion, all recursive models (FRM, REC1, and REC2) had smaller
values than the SMTM model; even for FRM, which has the same dimension as the SMTM, some benefits were observed when using this kind of structural equation model. For the FA models,
favored the FA2G (two factors) model over all other FA and REC models. However, when we tested a model with three factors (FA3F), the
was the largest. This was expected due to the number of parameters (
p) of
in the FA, which can be calculated as
parameters, where
are the dimensions of the
matrix of factor loads (Equation 2). FA3F has 21
p in each matrix (additive genetic and residual), while FA2F has 17
p. In the SMTM, the
p of
can be calculated as
parameters, where
are the dimensions of the
matrix, thus yielding 21
p in each matrix (additive genetic and residual) in the SMTM model. Depending on the data set, i.e., the number of records, a greater number of traits, and the most correlated traits, the FA models may be used as a special case of SEM to reduce the covariance matrices’ rank in the model and provide a more parsimonious estimation of genetic parameters compared to SMTM. In this paper, the FA3F model had a poorer fit to the data than the other models because this dataset has only six traits. Nevertheless, if the dataset had more traits, a model with more factors would probably work better at reducing the
. Generally, when more factors are included in the model, it better explains the relationships between traits; however, with fewer traits, fewer factors can be precisely estimated. Because we have only six traits and the number of records of SC used is limited, just because it is collected only in males (
), this could at least partially explain the worst performance in terms of DIC,
, and Mean(L) of the FA3F model. As discussed by [
31] and [
30], the FA model usually reduces the rank of covariance matrices. Still, the
matrix has to be close to zero
, when specific effects are assumed absent. This results in a mixed-model formulation with covariance matrices that are not of full rank.
In the same sense, the Mean(L) and DIC demonstrated considerably better fits to the data for the models SMTM and REC1 than for all the other FA and REC models. Since DIC also accounts for dimensionality through a penalty (), the REC1 model might be an alternative to the SMTM in beef cattle genetic evaluation schemes that involve multiple-trait estimates.
In general, even though FA2G was effective in terms of , comparing all the criteria and parameters, the REC1 model showed a good alternative for this kind of dataset (e.g., six traits) because it had a smaller compared to the SMTM model, better DIC, and a reasonable Mean(L) compared with all other models tested. This model, when applied to the residual effect, can be used to obtain more parsimonious representations of the residual covariance matrix. This reduces the number of parameters to be estimated, and may yield useful interpretations of the underlying patterns of (co)variability. Exploration of genetic and residual relationships between traits can be detected using the REC1 model, helping to understand how selecting certain criteria can affect another trait. Moreover, unlike recursive models acting on phenotypes, these models do not impose cross-restrictions between the genetic and residual covariance matrices. Results from this study indicate that beef cattle growth and ultrasound carcass traits can be successfully analyzed using recursive models to estimate breeding values and identify a possible causal effect between latent variables, allowing for the modeling of complex phenomena while simultaneously reducing the dimensionality of the data.
Table 1.
Descriptive statistics of ultrasound carcass and growth traits in Nellore cattle
Table 1.
Descriptive statistics of ultrasound carcass and growth traits in Nellore cattle
| Traits |
No. of records |
Mean ± SD |
No. of sires |
No. of dams |
No. CG |
| LMA |
2,770 |
48.05 ± 8.36 |
231 |
2,552 |
243 |
| BF |
2,577 |
1.87 ± 1.07 |
226 |
2,397 |
253 |
| RF |
2,566 |
2.95 ± 1.94 |
226 |
2,384 |
252 |
| SC |
1,340 |
245.87 ± 30.22 |
106 |
1,009 |
88 |
| HH |
2,349 |
136.06 ± 5.04 |
226 |
2,308 |
250 |
| BW |
2,942 |
339.69 ± 65.98 |
236 |
2,683 |
302 |
Table 2.
Estimates of heritability (diagonal), genetic (above diagonal) and residual (below diagonal) correlations, with standard errors, from the standard multi-trait mixed model with sire model (SMTM)
Table 2.
Estimates of heritability (diagonal), genetic (above diagonal) and residual (below diagonal) correlations, with standard errors, from the standard multi-trait mixed model with sire model (SMTM)
| Traits |
LMA |
BF |
RF |
BW |
HH |
SC |
| LMA |
|
|
|
|
|
|
| BF |
|
|
|
|
|
|
| RF |
|
|
|
|
|
|
| BW |
|
|
|
|
|
|
| HH |
|
|
|
|
|
|
| SC |
|
|
|
|
|
|
Table 3.
Estimates of heritability (diagonal), genetic (above diagonal) and residual (below diagonal) correlations, with standard errors, from model FA2G (model with two factors only for the matrix of additive genetic loading factors and in the residual matrix it was considered as in the standard multi-trait mixed models)
Table 3.
Estimates of heritability (diagonal), genetic (above diagonal) and residual (below diagonal) correlations, with standard errors, from model FA2G (model with two factors only for the matrix of additive genetic loading factors and in the residual matrix it was considered as in the standard multi-trait mixed models)
| Traits |
LMA |
BF |
RF |
BW |
HH |
SC |
| LMA |
|
|
|
|
|
|
| BF |
|
|
|
|
|
|
| RF |
|
|
|
|
|
|
| BW |
|
|
|
|
|
|
| HH |
|
|
|
|
|
|
| SC |
|
|
|
|
|
|
Table 4.
Estimates of heritability (diagonal), genetic (above diagonal) and residual (below diagonal) correlations, with standard errors, from model REC1 (six recursive effects zeroed-out at residual covariance matrix)
Table 4.
Estimates of heritability (diagonal), genetic (above diagonal) and residual (below diagonal) correlations, with standard errors, from model REC1 (six recursive effects zeroed-out at residual covariance matrix)
| Traits |
LMA |
BF |
RF |
BW |
HH |
SC |
| LMA |
|
|
|
|
|
|
| BF |
|
|
|
|
|
|
| RF |
|
|
|
|
|
|
| BW |
|
|
|
|
|
|
| HH |
|
|
|
|
|
|
| SC |
|
|
|
|
|
|
Table 5.
Estimates of genetic (above diagonal) and residual (below diagonal) recursive effects, with standard errors, from the fully-recursive model (FRM) in Nellore cattle
Table 5.
Estimates of genetic (above diagonal) and residual (below diagonal) recursive effects, with standard errors, from the fully-recursive model (FRM) in Nellore cattle
| Traits |
LMA |
BF |
RF |
BW |
HH |
SC |
| LMA |
— |
|
|
|
|
|
| BF |
|
— |
|
|
|
|
| RF |
|
|
— |
|
|
|
| BW |
|
|
|
— |
|
|
| HH |
|
|
|
|
— |
|
| SC |
|
|
|
|
|
— |
Table 6.
Model comparison criteria to assess performance of different Bayesian models
Table 6.
Model comparison criteria to assess performance of different Bayesian models
| Models |
DIC |
|
Mean(L) |
| SMTM |
37,874.7 |
456.5 |
|
| FA2F |
38,036.8 |
455.2 |
|
| FA3F |
38,915.2 |
564.1 |
|
| FA2G |
37,898.3 |
|
|
| FA2R |
38,043.5 |
499.7 |
|
| FRM |
37,871.8 |
447.5 |
|
| REC1 |
|
443.6 |
|
| REC2 |
37,937.4 |
440.2 |
|