Submitted: 28 May 2024
Posted: 29 May 2024
Abstract
Keywords:
1. Introduction
- Introduced NodeFlow, which is, to the best of our knowledge, the first framework to apply an end-to-end, tree-structured deep learning model to probabilistic regression on tabular data;
- Demonstrated superior performance in multivariate probabilistic regression and competitive results in univariate tasks on benchmark datasets, establishing NodeFlow’s effectiveness;
- Conducted a focused ablation study, hyperparameter sensitivity analysis, and computational efficiency assessment, validating NodeFlow’s design and scalability.
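The contributions above rest on two components developed in Sections 3.1 and 3.2: a NODE-style soft (differentiable) tree feature extractor and a conditional normalizing flow (CNF). As a loose, hypothetical sketch of the general mechanism only — the function names, the sigmoid split, and the single affine flow layer are our simplifications, not NodeFlow's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_tree_embedding(x, thresholds, temperature=1.0):
    """Soft, differentiable feature-vs-threshold splits (a crude stand-in
    for a NODE layer): sigmoid responses per (feature, threshold) pair."""
    logits = (x[:, None] - thresholds) / temperature
    return 1.0 / (1.0 + np.exp(-logits))  # values in (0, 1)

def conditional_flow_logpdf(y, h, w_mu, w_log_sigma):
    """Log-density of a scalar target y under a one-layer affine flow
    conditioned on the embedding h: z = (y - mu(h)) / sigma(h),
    log p(y | h) = log N(z; 0, 1) - log sigma(h) (change of variables)."""
    mu = h @ w_mu
    log_sigma = h @ w_log_sigma
    z = (y - mu) / np.exp(log_sigma)
    base_logpdf = -0.5 * (z ** 2 + np.log(2.0 * np.pi))
    return base_logpdf - log_sigma  # Jacobian correction of the flow

x = rng.normal(size=8)                        # one tabular row, 8 features
thresholds = rng.normal(size=(8, 4))          # 4 soft splits per feature
h = soft_tree_embedding(x, thresholds).ravel()  # flattened embedding
w_mu = rng.normal(size=h.shape[0]) * 0.1      # toy conditioning weights
w_log_sigma = np.zeros(h.shape[0])
logp = conditional_flow_logpdf(2.0, h, w_mu, w_log_sigma)
```

In the actual model both components are trained jointly end to end on the negative log-likelihood, which is what distinguishes it from pipelines that fit the tree ensemble and the density model separately.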
2. Literature Review
2.1. Tree-based Regression on Tabular Data
2.2. Tree-based Probabilistic Regression on Tabular Data
2.3. Deep Learning Regression on Tabular Data
2.4. Deep Learning Probabilistic Regression on Tabular Data
3. NodeFlow
3.1. Extracting Hierarchical Representation with NODE
3.2. Probabilistic Modeling with CNF
3.3. Training NodeFlow
4. Experiments
4.1. Methodology
4.2. Probabilistic Regression Framework
4.3. Point Prediction Regression Setup
4.4. Summary
5. Ablation Studies
5.1. Feature Representation Component
5.2. Probabilistic Modeling Component
6. Time Complexity Analysis
7. Conclusions
Author Contributions
Funding
Appendix A. Datasets
| Dataset | N | CV Splits / Test Size | D | P |
|---|---|---|---|---|
| Concrete | 1030 | 20 CV | 8 | 1 |
| Energy | 768 | 20 CV | 8 | 1 |
| Kin8nm | 8192 | 20 CV | 8 | 1 |
| Naval | 11934 | 20 CV | 16 | 1 |
| Power | 9568 | 20 CV | 4 | 1 |
| Protein | 45730 | 5 CV | 9 | 1 |
| Wine | 1588 | 20 CV | 11 | 1 |
| Yacht | 308 | 20 CV | 6 | 1 |
| Year MSD | 515345 | 1 CV | 90 | 1 |
| Parkinsons | 4112 | 1763 | 16 | 2 |
| scm20d | 7173 | 1793 | 61 | 16 |
| WindTurbine | 4000 | 1000 | 8 | 6 |
| Energy | 57598 | 14400 | 32 | 17 |
| usFlight | 500000 | 200000 | 8 | 2 |
| Oceanographic | 373227 | 41470 | 9 | 2 |
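The "20 CV" entries denote repeated random train/test splits, while for the multivariate datasets the column instead gives a fixed test-set size. A minimal sketch of the repeated-split protocol — the 90/10 fraction, seeding, and function name are our assumptions, not taken from the paper:

```python
import numpy as np

def make_cv_splits(n_samples, n_splits=20, test_frac=0.1, seed=0):
    """Generate repeated random train/test index splits, as commonly used
    for the UCI regression benchmarks (e.g. 20 splits for Concrete)."""
    rng = np.random.default_rng(seed)
    n_test = int(round(n_samples * test_frac))
    splits = []
    for _ in range(n_splits):
        perm = rng.permutation(n_samples)
        splits.append((perm[n_test:], perm[:n_test]))  # (train_idx, test_idx)
    return splits

splits = make_cv_splits(1030, n_splits=20)  # Concrete: N = 1030
```

Reported scores are then the mean ± standard deviation of the metric across the splits, which is how the ± intervals in the result tables should be read.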
Appendix B. Implementation Details

| Dataset | NUM LAYERS | DEPTH | TREE OUTPUT DIM | NUM TREES | FLOW HIDDEN DIMS | N EPOCHS | # OF ITERATIONS |
|---|---|---|---|---|---|---|---|
| CONCRETE | 1-8 | 1-7 | 1-3 | 100-600 | [4,4],[8,8],[16,16],[32,32] | 400 | 400 |
| ENERGY | 1-8 | 1-6 | 1-3 | 100-600 | [4,4],[8,8],[16,16],[32,32] | 400 | 300 |
| KIN8NM | 1-8 | 1-6 | 1-3 | 100-600 | [4,4],[8,8],[16,16],[32,32] | 100 | 100 |
| NAVAL | 1-8 | 1-6 | 1-3 | 100-600 | [4,4],[8,8],[16,16],[32,32] | 300 | 100 |
| POWER | 1-8 | 1-6 | 1-3 | 100-600 | [4,4],[8,8],[16,16],[32,32] | 200 | 100 |
| PROTEIN | 1-8 | 1-6 | 1-3 | 100-600 | [4,4],[8,8],[16,16],[32,32] | 100 | 100 |
| WINE | 1-8 | 1-6 | 1-3 | 100-600 | [4,4],[8,8],[16,16],[32,32] | 400 | 500 |
| YACHT | 1-8 | 1-6 | 1-3 | 100-500 | [4,4],[8,8],[16,16],[32,32] | 400 | 400 |
| YEAR MSD | 6 | 2, 4 | 1 | 100, 300 | [4,4],[8,8],[16,16],[32,32] | 10 | 16 |
References
- Borisov, V.; Leemann, T.; Seßler, K.; Haug, J.; Pawelczyk, M.; Kasneci, G. Deep Neural Networks and Tabular Data: A Survey. CoRR 2021, abs/2110.01889, [2110.01889].
- Grinsztajn, L.; Oyallon, E.; Varoquaux, G. Why do tree-based models still outperform deep learning on typical tabular data? Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016. ACM, 2016, pp. 785–794. [CrossRef]
- Prokhorenkova, L.O.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: unbiased boosting with categorical features. Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, 2018, pp. 6639–6649.
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, 2017, pp. 3146–3154.
- Popov, S.; Morozov, S.; Babenko, A. Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data. 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020.
- Abutbul, A.; Elidan, G.; Katzir, L.; El-Yaniv, R. DNF-Net: A Neural Architecture for Tabular Data. CoRR 2020, abs/2006.06465, [2006.06465].
- Gorishniy, Y.; Rubachev, I.; Khrulkov, V.; Babenko, A. Revisiting Deep Learning Models for Tabular Data. Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual; Ranzato, M.; Beygelzimer, A.; Dauphin, Y.N.; Liang, P.; Vaughan, J.W., Eds., 2021, pp. 18932–18943.
- Duan, T.; Anand, A.; Ding, D.Y.; Thai, K.K.; Basu, S.; Ng, A.Y.; Schuler, A. NGBoost: Natural Gradient Boosting for Probabilistic Prediction. Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. PMLR, 2020, Vol. 119, Proceedings of Machine Learning Research, pp. 2690–2700.
- Sprangers, O.; Schelter, S.; de Rijke, M. Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression. KDD ’21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, August 14-18, 2021. ACM, 2021, pp. 1510–1520. [CrossRef]
- Malinin, A.; Prokhorenkova, L.; Ustimenko, A. Uncertainty in Gradient Boosting via Ensembles. 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.
- Wielopolski, P.; Zięba, M. TreeFlow: Going Beyond Tree-Based Parametric Probabilistic Regression. In ECAI 2023; IOS Press, 2023; Vol. 372, Frontiers in Artificial Intelligence and Applications, pp. 2631–2638. [CrossRef]
- Ren, L.; Sun, G.; Wu, J. RoNGBa: A Robustly Optimized Natural Gradient Boosting Training Approach with Leaf Number Clipping. CoRR 2019, abs/1912.02338, [1912.02338].
- Arik, S.Ö.; Pfister, T. TabNet: Attentive Interpretable Tabular Learning. Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021. AAAI Press, 2021, pp. 6679–6687.
- Somepalli, G.; Goldblum, M.; Schwarzschild, A.; Bruss, C.B.; Goldstein, T. SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training, 2021, [arXiv:cs.LG/2106.01342].
- Lakshminarayanan, B.; Pritzel, A.; Blundell, C. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, 2017, pp. 6402–6413.
- Gal, Y.; Ghahramani, Z. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016; Balcan, M.; Weinberger, K.Q., Eds. JMLR.org, 2016, Vol. 48, JMLR Workshop and Conference Proceedings, pp. 1050–1059.
- Hernández-Lobato, J.M.; Adams, R.P. Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks. Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015; Bach, F.R.; Blei, D.M., Eds. JMLR.org, 2015, Vol. 37, JMLR Workshop and Conference Proceedings, pp. 1861–1869.
- Peters, B.; Niculae, V.; Martins, A.F.T. Sparse Sequence-to-Sequence Models. Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers; Korhonen, A.; Traum, D.R.; Màrquez, L., Eds. Association for Computational Linguistics, 2019, pp. 1504–1519. [CrossRef]
- Yang, G.; Huang, X.; Hao, Z.; Liu, M.; Belongie, S.J.; Hariharan, B. PointFlow: 3D Point Cloud Generation With Continuous Normalizing Flows. 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019. IEEE, 2019, pp. 4540–4549. [CrossRef]
- Sendera, M.; Tabor, J.; Nowak, A.; Bedychaj, A.; Patacchiola, M.; Trzcinski, T.; Spurek, P.; Zieba, M. Non-Gaussian Gaussian Processes for Few-Shot Regression. Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, 2021, pp. 10285–10298.
- Grathwohl, W.; Chen, R.T.Q.; Bettencourt, J.; Sutskever, I.; Duvenaud, D. FFJORD: Free-Form Continuous Dynamics for Scalable Reversible Generative Models. 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019.
- McInnes, L.; Healy, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. CoRR 2018, abs/1802.03426, [1802.03426].
- Li, L.; Jamieson, K.G.; DeSalvo, G.; Rostamizadeh, A.; Talwalkar, A. Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization. J. Mach. Learn. Res. 2017, 18, 185:1–185:52.
- Hutter, F.; Hoos, H.H.; Leyton-Brown, K. An Efficient Approach for Assessing Hyperparameter Importance. Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014. JMLR.org, 2014, Vol. 32, JMLR Workshop and Conference Proceedings, pp. 754–762.



NLL on univariate benchmarks (lower is better):

| Dataset | Deep. Ens. | CatBoost | NGBoost | RoNGBa | PGBM | TreeFlow | NodeFlow |
|---|---|---|---|---|---|---|---|
| Concrete | 3.06 ± 0.18 | 3.06 ± 0.13 | 3.04 ± 0.17 | 2.94 ± 0.18 | 2.75 ± 0.21 | 3.02 ± 0.15 | 3.15 ± 0.21 |
| Energy | 1.38 ± 0.22 | 1.24 ± 1.28 | 0.60 ± 0.45 | 0.37 ± 0.28 | 1.74 ± 0.04 | 0.85 ± 0.35 | 0.90 ± 0.25 |
| Kin8nm | -1.20 ± 0.02 | -0.63 ± 0.02 | -0.49 ± 0.02 | -0.60 ± 0.03 | -0.54 ± 0.04 | -1.03 ± 0.06 | -1.10 ± 0.05 |
| Naval | -5.63 ± 0.05 | -5.39 ± 0.04 | -5.34 ± 0.04 | -5.49 ± 0.04 | -3.44 ± 0.04 | -5.54 ± 0.16 | -5.45 ± 0.08 |
| Power | 2.79 ± 0.04 | 2.72 ± 0.12 | 2.79 ± 0.11 | 2.65 ± 0.08 | 2.60 ± 0.02 | 2.65 ± 0.06 | 2.62 ± 0.05 |
| Protein | 2.83 ± 0.02 | 2.73 ± 0.07 | 2.81 ± 0.03 | 2.76 ± 0.03 | 2.79 ± 0.01 | 2.02 ± 0.02 | 2.04 ± 0.04 |
| Wine | 0.94 ± 0.12 | 0.93 ± 0.08 | 0.91 ± 0.06 | 0.91 ± 0.08 | 0.97 ± 0.20 | -0.56 ± 0.62 | -0.21 ± 0.28 |
| Yacht | 1.18 ± 0.21 | 0.41 ± 0.39 | 0.20 ± 0.26 | 1.03 ± 0.44 | 0.05 ± 0.28 | 0.72 ± 0.40 | 0.79 ± 0.55 |
| Year MSD | 3.35 ± NA | 3.43 ± NA | 3.43 ± NA | 3.46 ± NA | 3.61 ± NA | 3.27 ± NA | 3.09 ± NA |
NLL on multivariate benchmarks (lower is better):

| Dataset | Ind. NGBoost | NGBoost | TreeFlow | NodeFlow |
|---|---|---|---|---|
| Parkinsons | 6.86 | 5.85 | 5.26 | 5.06 |
| Scm20d | 94.40 | 94.81 | 93.41 | 91.98 |
| Wind | -0.65 | -0.67 | -2.57 | -3.20 |
| Energy | 166.90 | 175.80 | 180.00 | 163.86 |
| USflight | 9.56 | 8.57 | 7.49 | 7.38 |
| Ocean. | 7.74 | 7.73 | 7.84 | 7.81 |
RMSE on univariate benchmarks (lower is better):

| Dataset | Deep. Ens. | CatBoost | NGBoost | RoNGBa | PGBM | TreeFlow (@2) | NodeFlow (@2) |
|---|---|---|---|---|---|---|---|
| Concrete | 6.03 ± 0.58 | 5.21 ± 0.53 | 5.06 ± 0.61 | 4.71 ± 0.61 | 3.97 ± 0.76 | 5.41 ± 0.71 | 5.51 ± 0.66 |
| Energy | 2.09 ± 0.29 | 0.57 ± 0.06 | 0.46 ± 0.06 | 0.35 ± 0.07 | 0.35 ± 0.06 | 0.65 ± 0.12 | 0.70 ± 0.40 |
| Kin8nm | 0.09 ± 0.00 | 0.14 ± 0.00 | 0.16 ± 0.00 | 0.14 ± 0.00 | 0.13 ± 0.01 | 0.10 ± 0.01 | 0.08 ± 0.00 |
| Naval | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Power | 4.11 ± 0.17 | 3.55 ± 0.27 | 3.70 ± 0.22 | 3.47 ± 0.19 | 3.35 ± 0.15 | 3.79 ± 0.25 | 3.94 ± 0.16 |
| Protein | 4.71 ± 0.06 | 3.92 ± 0.08 | 4.33 ± 0.03 | 4.21 ± 0.06 | 3.98 ± 0.06 | 3.01 ± 0.06 | 4.32 ± 0.03 |
| Wine | 0.64 ± 0.04 | 0.63 ± 0.04 | 0.62 ± 0.04 | 0.62 ± 0.05 | 0.60 ± 0.05 | 0.41 ± 0.09 | 0.44 ± 0.03 |
| Yacht | 1.58 ± 0.48 | 0.82 ± 0.40 | 0.50 ± 0.20 | 0.90 ± 0.35 | 0.63 ± 0.21 | 0.75 ± 0.26 | 1.18 ± 0.47 |
| Year MSD | 8.89 ± NA | 8.99 ± NA | 8.94 ± NA | 9.14 ± NA | 9.09 ± NA | 8.64 ± NA | 8.84 ± NA |
Ablation of the feature representation component (lower is better):

| Dataset | NLL: CNF | NLL: CNF + MLP | NLL: NodeFlow | CRPS: CNF | CRPS: CNF + MLP | CRPS: NodeFlow | RMSE: CNF | RMSE: CNF + MLP | RMSE: NodeFlow |
|---|---|---|---|---|---|---|---|---|---|
| Concrete | 3.24 ± 0.28 | 3.15 ± 0.13 | 3.15 ± 0.21 | 3.80 ± 1.33 | 3.39 ± 0.34 | 2.80 ± 0.34 | 7.16 ± 2.22 | 6.43 ± 0.54 | 5.51 ± 0.66 |
| Energy | 2.90 ± 0.45 | 2.43 ± 0.31 | 0.90 ± 0.25 | 2.73 ± 1.45 | 1.73 ± 0.77 | 0.35 ± 0.14 | 4.90 ± 2.41 | 3.26 ± 1.26 | 0.70 ± 0.40 |
| Kin8nm | -0.66 ± 0.12 | -0.86 ± 0.07 | -1.10 ± 0.05 | 0.07 ± 0.01 | 0.06 ± 0.00 | 0.04 ± 0.00 | 0.14 ± 0.02 | 0.11 ± 0.01 | 0.08 ± 0.00 |
| Naval | -3.42 ± 0.34 | -3.55 ± 0.21 | -5.45 ± 0.08 | 0.01 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.01 ± 0.00 | 0.01 ± 0.00 | 0.00 ± 0.00 |
| Power | 2.92 ± 0.24 | 2.90 ± 0.26 | 2.62 ± 0.05 | 2.59 ± 1.00 | 2.61 ± 1.15 | 1.95 ± 0.06 | 4.69 ± 1.71 | 4.77 ± 1.94 | 3.94 ± 0.16 |
| Protein | 2.57 ± 0.03 | 2.56 ± 0.02 | 2.04 ± 0.04 | 2.69 ± 0.04 | 2.67 ± 0.03 | 1.75 ± 0.03 | 5.88 ± 0.11 | 5.81 ± 0.10 | 4.32 ± 0.03 |
| Wine | 0.07 ± 0.62 | 0.34 ± 0.63 | -0.21 ± 0.28 | 0.36 ± 0.04 | 0.37 ± 0.04 | 0.34 ± 0.02 | 0.54 ± 0.14 | 0.61 ± 0.14 | 0.44 ± 0.09 |
| Yacht | 1.92 ± 1.67 | 1.35 ± 1.82 | 0.79 ± 0.55 | 2.45 ± 3.06 | 1.26 ± 2.35 | 0.50 ± 0.19 | 5.06 ± 5.42 | 2.71 ± 4.33 | 1.18 ± 0.47 |
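The CRPS values in these tables are typically estimated from predictive samples. A minimal sample-based estimator of CRPS(F, y) = E|S − y| − ½ E|S − S′|, where S, S′ are independent draws from the predictive distribution — the helper name is ours, and this is an illustration, not the paper's code:

```python
import numpy as np

def crps_from_samples(samples, y):
    """Sample-based CRPS estimate: E|S - y| - 0.5 * E|S - S'|,
    with both expectations approximated over the given draws."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - y))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

rng = np.random.default_rng(0)
draws = rng.normal(loc=0.0, scale=1.0, size=2000)  # toy predictive samples
crps = crps_from_samples(draws, 0.0)
```

By the triangle inequality the estimate is always nonnegative, and for a well-calibrated predictive distribution it rewards both sharpness and accuracy, which is why it complements NLL and RMSE above.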
Ablation of the probabilistic modeling component (lower is better):

| Dataset | NLL: NodeGauss | NLL: NodeGMM | NLL: NodeFlow | CRPS: NodeGauss | CRPS: NodeGMM | CRPS: NodeFlow | RMSE: NodeGauss | RMSE: NodeGMM | RMSE: NodeFlow |
|---|---|---|---|---|---|---|---|---|---|
| Concrete | 3.13 ± 0.39 | 3.03 ± 0.18 | 3.15 ± 0.21 | 8.54 ± 0.49 | 9.04 ± 0.49 | 2.80 ± 0.34 | 15.52 ± 0.86 | 16.08 ± 0.86 | 5.51 ± 0.66 |
| Energy | 1.84 ± 0.23 | 1.70 ± 0.21 | 0.90 ± 0.25 | 5.16 ± 0.27 | 5.59 ± 0.27 | 0.35 ± 0.14 | 9.53 ± 0.41 | 9.94 ± 0.41 | 0.70 ± 0.40 |
| Kin8nm | -0.90 ± 0.07 | -0.97 ± 0.06 | -1.10 ± 0.05 | 0.14 ± 0.00 | 0.15 ± 0.00 | 0.04 ± 0.00 | 0.18 ± 0.01 | 0.22 ± 0.01 | 0.08 ± 0.00 |
| Naval | -4.91 ± 0.29 | -4.95 ± 0.15 | -5.45 ± 0.08 | 0.01 ± 0.00 | 0.01 ± 0.00 | 0.00 ± 0.00 | 0.01 ± 0.00 | 0.01 ± 0.00 | 0.00 ± 0.00 |
| Power | 2.84 ± 0.05 | 2.76 ± 0.04 | 2.62 ± 0.05 | 8.88 ± 0.12 | 9.59 ± 0.12 | 1.95 ± 0.06 | 16.10 ± 0.22 | 16.88 ± 0.23 | 3.94 ± 0.16 |
| Protein | 2.84 ± 0.07 | 2.36 ± 0.12 | 2.04 ± 0.04 | 3.39 ± 0.02 | 3.39 ± 0.03 | 1.75 ± 0.03 | 6.03 ± 0.06 | 7.40 ± 0.36 | 4.32 ± 0.03 |
| Wine | 0.97 ± 0.08 | 0.51 ± 0.37 | -0.21 ± 0.28 | 0.45 ± 0.03 | 0.45 ± 0.03 | 0.34 ± 0.02 | 0.82 ± 0.05 | 0.59 ± 0.16 | 0.44 ± 0.09 |
| Yacht | 2.26 ± 0.72 | 1.84 ± 0.63 | 0.79 ± 0.55 | 6.67 ± 1.52 | 6.62 ± 1.58 | 0.50 ± 0.19 | 14.19 ± 3.02 | 14.26 ± 2.95 | 1.18 ± 0.47 |
Training time in seconds (mean ± standard deviation):

| Dataset | CNF | CNF + MLP | NodeGauss | NodeGMM | NodeFlow |
|---|---|---|---|---|---|
| Concrete | 335.23 ± 64.91 s | 431.65 ± 232.73 s | 43.82 ± 15.28 s | 25.20 ± 9.74 s | 482.69 ± 127.31 s |
| Energy | 70.63 ± 6.34 s | 80.83 ± 7.33 s | 23.25 ± 7.35 s | 15.48 ± 6.36 s | 687.24 ± 99.62 s |
| Kin8nm | 137.19 ± 9.76 s | 169.22 ± 40.49 s | 45.72 ± 13.31 s | 55.14 ± 16.32 s | 308.89 ± 61.57 s |
| Naval | 213.13 ± 61.62 s | 228.93 ± 20.99 s | 56.22 ± 20.75 s | 47.74 ± 27.42 s | 2413.23 ± 649.67 s |
| Power | 141.33 ± 12.30 s | 180.81 ± 17.90 s | 40.19 ± 15.56 s | 43.93 ± 15.51 s | 1360.29 ± 192.94 s |
| Protein | 373.26 ± 40.39 s | 417.45 ± 52.54 s | 217.13 ± 22.18 s | 224.45 ± 63.75 s | 3018.98 ± 616.95 s |
| Wine | 352.96 ± 69.65 s | 353.93 ± 67.75 s | 26.82 ± 10.80 s | 11.92 ± 6.41 s | 614.85 ± 136.68 s |
| Yacht | 203.56 ± 117.80 s | 259.64 ± 135.60 s | 19.50 ± 10.33 s | 13.31 ± 4.60 s | 567.44 ± 216.81 s |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).