Submitted: 06 March 2024
Posted: 07 March 2024
Abstract
Keywords:
1. Introduction
1.1. Current Testing Recommendations
1.2. The Null Hypothesis in Conditional Independence Testing in Causal DAGs
2. A New Bayesian Non-Parametric Bootstrap Method
2.1. The Basic Premise
2.2. Bayesian Bootstrap
2.3. The Test Procedure
- Establish a Dirichlet distribution: Set the parameter values of the Dirichlet distribution.
- Initial tuning: Tune the regression model on the original data. Tuning uses n-fold cross-validation and a grid search over the hyperparameters to determine the best range of hyperparameters.
- Generate weights: Generate n sample weights from the prior Dirichlet distribution; these weights are used for resampling.
- Bootstrap sample: Resample the original data according to the sample weights to create the bootstrap sample.
- Train-test split: Split the bootstrap sample into a training set and a test set.
- Bootstrap-specific tuning: For each bootstrap sample, perform a grid search over the best range of hyperparameters from the initial tuning to determine the best bootstrap-specific set of hyperparameters, using the training set.
- Train the first model: Train the regression model on the training set using the bootstrap-specific hyperparameters.
- Permute: Reorder the elements of x in the training set to obtain a permuted copy of x.
- Train the second model: With the same bootstrap-specific hyperparameters as for the first model, train the regression model on the training set containing the permuted x.
- Predict: Predict with both models for the test set.
- Comparison: Calculate a performance metric (e.g., prediction error) for each model and compare the two models using the difference between their metrics.
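The steps above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the authors' implementation: an ordinary least-squares fit stands in for the tuned regression model, so both hyperparameter-tuning steps are omitted; the Dirichlet parameters are all set to 1; RMSE is the performance metric; and the difference is taken as permuted minus original. The names `fit_linear`, `rmse`, and `bb_ci_test`, the roles of `y` (outcome) and `z` (conditioning variable), and the 30 % test fraction are hypothetical choices made for the example.

```python
import math
import random
import statistics


def fit_linear(rows, y):
    """Least-squares coefficients (intercept first) of y on the feature rows."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def rmse(beta, rows, y):
    """Root mean square error of the linear predictor on (rows, y)."""
    sq = [(t - beta[0] - sum(b * v for b, v in zip(beta[1:], r))) ** 2
          for r, t in zip(rows, y)]
    return math.sqrt(sum(sq) / len(sq))


def bb_ci_test(y, x, z, n_boot=200, test_frac=0.3, seed=1):
    """2.5th and 97.5th percentiles of RMSE(permuted x) - RMSE(original x)."""
    rng = random.Random(seed)
    n = len(y)
    n_test = int(test_frac * n)
    diffs = []
    for _ in range(n_boot):
        # Assumption: Dirichlet(1, ..., 1) weights, drawn as Exp(1) variates;
        # random.choices normalises the weights internally.
        w = [rng.expovariate(1.0) for _ in range(n)]
        # Bootstrap sample: resample n row indices according to the weights.
        idx = rng.choices(range(n), weights=w, k=n)
        # The draws are i.i.d., so slicing gives a random train/test split.
        test, train = idx[:n_test], idx[n_test:]
        y_train = [y[i] for i in train]
        # Model trained on the original x.
        m_orig = fit_linear([(x[i], z[i]) for i in train], y_train)
        # Model trained on x permuted within the training set.
        x_perm = [x[i] for i in train]
        rng.shuffle(x_perm)
        m_perm = fit_linear([(xp, z[i]) for xp, i in zip(x_perm, train)],
                            y_train)
        # Both models predict the same, unpermuted test set.
        rows_test = [(x[i], z[i]) for i in test]
        y_test = [y[i] for i in test]
        diffs.append(rmse(m_perm, rows_test, y_test)
                     - rmse(m_orig, rows_test, y_test))
    q = statistics.quantiles(diffs, n=40)  # cut points at 2.5 %, 5 %, ...
    return q[0], q[-1]
```

Calling `bb_ci_test(y, x, z)` returns the lower and upper percentile of the bootstrap differences. Under the sign convention assumed here, an interval that contains 0 means the permuted model predicts about as well as the original one, which would be read as consistent with conditional independence of `y` and `x` given `z`; an interval entirely above 0 suggests that `x` carries predictive information beyond `z`.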
3. Simulation Study
- DGF 1 – Simple Linear Fork: This function simulates data from a straightforward fork-shaped DAG, utilizing a linear model for the relationships.
- DGF 2 – Non-Linear Fork: DGF 2 also derives from a fork DAG structure but introduces non-linear functional forms, adding complexity to the data simulation.
- DGF 3 – Double Fork with Non-Linearity: DGF 3 evolves from a single fork to a double fork configuration, maintaining non-linear relationships among the variables.
- DGF 4 – Mixed Relationship Model: This function is based on a five-variable DAG. It encompasses both linear and non-linear relations, presenting a more intricate scenario.
- DGF 5 – Diverse Variable Types: Based on the same five-variable DAG as DGF 4, DGF 5 combines non-linear relationships with categorical data types. Three of its variables are continuous and two are categorical.
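As an illustration of what a fork-shaped data-generating function can look like, the sketch below simulates a linear and a non-linear fork. The functional forms, coefficients, noise levels, the variable names `x`, `y`, `z`, and the function name `simulate_fork` are all hypothetical; they are not the DGFs used in the simulation study, only a minimal example of the DAG structure X <- Z -> Y.

```python
import math
import random


def simulate_fork(n, linear=True, seed=0):
    """Draw n observations from a fork DAG  X <- Z -> Y  (hypothetical forms)."""
    rng = random.Random(seed)
    data = {"z": [], "x": [], "y": []}
    for _ in range(n):
        z = rng.gauss(0, 1)  # common cause
        if linear:
            # Linear fork: both children are linear in z plus Gaussian noise.
            x = 0.8 * z + rng.gauss(0, 1)
            y = -0.5 * z + rng.gauss(0, 1)
        else:
            # Non-linear fork: same graph, non-linear functional forms.
            x = math.sin(z) + rng.gauss(0, 0.5)
            y = z ** 2 + rng.gauss(0, 0.5)
        data["z"].append(z)
        data["x"].append(x)
        data["y"].append(y)
    return data
```

In both variants `x` and `y` are marginally dependent through `z` but conditionally independent given `z`, because neither variable enters the other's equation. Adding a term in `x` to the equation for `y` would give a case where the conditional independence is false.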
3.1. Implementation of the BB CI Test Procedure
3.2. Simulation Results
4. Industrial Case
5. Discussion
Heterogeneity of Conditional Independencies
6. Conclusion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| DAG | Directed Acyclic Graph |
| CI | Conditional Independence |
| BB CI test | Bayesian Bootstrap Conditional Independence test |
| GCM | Generalized Covariance Measure |
| MSE | Mean Square Error |
| RMSE | Root Mean Square Error |
| BB | Bayesian Bootstrap |

| DGF | C.I. Condition | [T/F]¹ | 200 obs. | 400 obs. | 800 obs. | 1600 obs. | 3200 obs. |
|---|---|---|---|---|---|---|---|
| P-value | | | | | | | |
| 1 | | [T] | 93 % | 94 % | 94 % | 95 % | 95 % |
| 1 | | [F] | 0 % | 0 % | 0 % | 0 % | 0 % |
| 2 | | [T] | 93 % | 94 % | 94 % | 95 % | 95 % |
| 2 | | [F] | 0 % | 0 % | 0 % | 0 % | 0 % |
| 3 | | [T] | 92 % | 88 % | 87 % | 83 % | 80 % |
| 3 | | [F] | 49 % | 48 % | 40 % | 37 % | 34 % |
| 4 | | [T] | 95 % | 95 % | 97 % | 95 % | 94 % |
| 4 | | [F] | 0 % | 0 % | 0 % | 0 % | 0 % |
| 4 | | [T] | 88 % | 90 % | 92 % | 92 % | 94 % |
| 4 | | [F] | 3 % | 0 % | 0 % | 0 % | 0 % |
| 5 | | [T] | 93 % | 94 % | 96 % | 94 % | 95 % |
| 5 | | [F] | 0 % | 0 % | 0 % | 0 % | 0 % |
| 5 | | [T] | 96 % | 96 % | 95 % | 95 % | 95 % |
| 5 | | [F] | 29 % | 6 % | 0 % | 0 % | 0 % |

| DGF | C.I. Condition | [T/F]¹ | Metric | 200 obs. | 400 obs. | 800 obs. | 1600 obs. | 3200 obs. |
|---|---|---|---|---|---|---|---|---|
| 0 within 2.5th and 97.5th percentile | | | | | | | | |
| 1 | | [T] | RMSE | 100 % | 100 % | 100 % | 100 % | 100 % |
| 1 | | [F] | RMSE | 0 % | 0 % | 0 % | 0 % | 0 % |
| 2 | | [T] | RMSE | 100 % | 100 % | 100 % | 100 % | 100 % |
| 2 | | [F] | RMSE | 0 % | 0 % | 0 % | 0 % | 0 % |
| 3 | | [T] | RMSE | 100 % | 100 % | 100 % | 100 % | 100 % |
| 3 | | [F] | RMSE | 49 % | 0 % | 0 % | 0 % | 0 % |
| 4 | | [T] | RMSE | 100 % | 100 % | 100 % | 100 % | 100 % |
| 4 | | [F] | RMSE | 86 % | 20 % | 0 % | 0 % | 0 % |
| 4 | | [T] | RMSE | 100 % | 100 % | 100 % | 100 % | 100 % |
| 4 | | [F] | RMSE | 1 % | 0 % | 0 % | 0 % | 0 % |
| 5 | | [T] | RMSE | 100 % | 100 % | 100 % | 100 % | 100 % |
| 5 | | [F] | RMSE | 84 % | 79 % | 18 % | 0 % | 0 % |
| 5 | | [T] | RMSE | 100 % | 100 % | 100 % | 100 % | 100 % |
| 5 | | [F] | RMSE | 100 % | 99 % | 30 % | 0 % | 0 % |
| Categorical outcome | | | | | | | | |
| 5 | | [T] | Acc. | 100 % | 100 % | 100 % | 100 % | 100 % |
| 5 | | [T] | Kappa | 100 % | 100 % | 100 % | 100 % | 100 % |
| 5 | | [F] | Acc. | 61 % | 18 % | 3 % | 0 % | 0 % |
| 5 | | [F] | Kappa | 16 % | 0 % | 0 % | 0 % | 0 % |
| 5 | | [T] | Acc. | 100 % | 100 % | 100 % | 100 % | 100 % |
| 5 | | [T] | Kappa | 100 % | 100 % | 100 % | 100 % | 100 % |
| 5 | | [F] | Acc. | 22 % | 1 % | 0 % | 0 % | 0 % |
| 5 | | [F] | Kappa | 24 % | 1 % | 0 % | 0 % | 0 % |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).