Submitted: 22 July 2025
Posted: 23 July 2025
Abstract
Keywords:
1. Introduction
2. Methodology
2.1. Forward Feature Selection
2.2. Backward Feature Deletion
2.3. MI Main-Effect Model Selection
3. Empirical Example
3.1. Ethnic/Culture Groups
3.2. Psychological Questionnaires
3.3. SNP Genotyping Data
3.4. Results
4. Discussion
Supplementary Materials
Author Contributions
Acknowledgments
References
| Gene | Split | Patients | Controls |
|---|---|---|---|
| ADCY2 | ≤ 0 | 54 | 241 |
| ADCY2 | > 0 | 0 | 369 |
| DARPP32 | ≤ 1 | 54 | 546 |
| DARPP32 | > 1 | 0 | 64 |
| GRP6 | ≤ 0 | 54 | 529 |
| GRP6 | > 0 | 0 | 81 |
| HTR2C | ≤ 0 | 54 | 241 |
| HTR2C | > 0 | 0 | 369 |
| OXTR | ≤ 11 | 27 | 510 |
| OXTR | > 11 | 27 | 100 |
| PENK | ≤ 0 | 54 | 249 |
| PENK | > 0 | 0 | 361 |
| TF | ≤ 0 | 54 | 354 |
| TF | > 0 | 0 | 256 |
| WFS1 | ≤ 10 | 54 | 280 |
| WFS1 | > 10 | 0 | 330 |

| Gene | Split | Patients | Controls |
|---|---|---|---|
| ADCY2 | ≤ 0 | 54 | 241 |
| ADCY2 | > 0 | 0 | 369 |
| CALU | ≤ 5 | 41 | 430 |
| CALU | (5, 6] | 10 | 46 |
| CALU | > 6 | 3 | 134 |
| GRIK4 | ≤ 7 | 0 | 128 |
| GRIK4 | (7, 10] | 2 | 63 |
| GRIK4 | (10, 14] | 19 | 117 |
| GRIK4 | (14, 15] | 14 | 45 |
| GRIK4 | (15, 19] | 16 | 124 |
| GRIK4 | > 19 | 3 | 133 |
| MAOA | ≤ 3 | 54 | 531 |
| MAOA | > 3 | 0 | 79 |
| PPP1R1B | ≤ 0 | 54 | 405 |
| PPP1R1B | > 0 | 0 | 205 |
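Each gene in the tables above can be read as a small contingency table of group (patients, controls) by split interval. As a minimal sketch, assuming the counts are exactly as transcribed here and using the standard plug-in estimator rather than anything specific to the authors' selection procedure, the code below computes the sample mutual information Î between group and split (in nats) and the likelihood-ratio statistic G² = 2NÎ. Under independence, G² is approximately chi-square with (2 − 1)(k − 1) degrees of freedom for a split into k intervals. The names `mutual_information`, `likelihood_ratio_g2`, and `split_tables` are hypothetical.

```python
import math
from typing import Sequence

Table = Sequence[Sequence[int]]


def mutual_information(table: Table) -> float:
    """Plug-in mutual information (nats) between the row variable
    (patients vs controls) and the column variable (split intervals)."""
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table must contain at least one observation")
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij > 0:  # convention: 0 * log 0 = 0
                mi += (n_ij / n) * math.log(n_ij * n / (row_tot[i] * col_tot[j]))
    return mi


def likelihood_ratio_g2(table: Table) -> tuple[float, int]:
    """Likelihood-ratio statistic for independence, G^2 = 2 N * MI (natural log),
    with (rows - 1) * (columns - 1) degrees of freedom."""
    n = sum(sum(row) for row in table)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return 2 * n * mutual_information(table), df


# Counts transcribed from the tables above; row 0 = patients, row 1 = controls.
split_tables = {
    "ADCY2 (<= 0 | > 0)": [[54, 0], [241, 369]],
    "OXTR (<= 11 | > 11)": [[27, 27], [510, 100]],
    "CALU (<= 5 | (5,6] | > 6)": [[41, 10, 3], [430, 46, 134]],
    "GRIK4 (six intervals)": [[0, 2, 19, 14, 16, 3], [128, 63, 117, 45, 124, 133]],
}

if __name__ == "__main__":
    for name, counts in split_tables.items():
        g2, df = likelihood_ratio_g2(counts)
        print(f"{name}: MI = {mutual_information(counts):.4f} nats, G2 = {g2:.2f} on {df} df")
```

Because the degrees of freedom grow with the number of intervals, splits with different k (for example the two-interval OXTR split and the six-interval GRIK4 split) are better compared through p-values or an information criterion than through raw G². Empty cells, such as the patients count for ADCY2 > 0, contribute nothing to Î under the convention 0 log 0 = 0.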
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).