A primer on the use of machine learning to distil knowledge from data in biological psychiatry.
Journal
Molecular psychiatry
ISSN: 1476-5578
Abbreviated title: Mol Psychiatry
Country: England
NLM ID: 9607835
Publication information
Publication date:
04 Jan 2024
History:
received: 12 Jul 2022
revised: 21 Sep 2023
accepted: 17 Nov 2023
entrez: 4 Jan 2024
pubmed: 5 Jan 2024
medline: 5 Jan 2024
Status:
aheadofprint
Abstract
Applications of machine learning in the biomedical sciences are growing rapidly. This growth has been spurred by diverse cross-institutional and interdisciplinary collaborations, public availability of large datasets, an increase in the accessibility of analytic routines, and the availability of powerful computing resources. With this increased access and exposure to machine learning comes a responsibility for education and a deeper understanding of its bases and bounds, borne equally by data scientists seeking to ply their analytic wares in medical research and by biomedical scientists seeking to harness such methods to glean knowledge from data. This article provides an accessible and critical review of machine learning for a biomedically informed audience, as well as its applications in psychiatry. The review covers definitions and expositions of commonly used machine learning methods, and historical trends of their use in psychiatry. We also provide a set of standards, namely Guidelines for REporting Machine Learning Investigations in Neuropsychiatry (GREMLIN), for designing and reporting studies that use machine learning as a primary data-analysis approach. Lastly, we propose the establishment of the Machine Learning in Psychiatry (MLPsych) Consortium, enumerate its objectives, and identify areas of opportunity for future applications of machine learning in biological psychiatry. This review serves as a cautiously optimistic primer on machine learning for those on the precipice as they prepare to dive into the field, either as methodological practitioners or well-informed consumers.
Identifiers
pubmed: 38177352
doi: 10.1038/s41380-023-02334-2
pii: 10.1038/s41380-023-02334-2
Publication types
Journal Article
Review
Languages
eng
Citation subsets
IM
Copyright information
© 2023. The Author(s), under exclusive licence to Springer Nature Limited.
Références
Alpaydin E. Machine learning, revised and updated edition. The MIT Press; 2021.
Deo RC. Machine learning in medicine. Circulation. 2015;132:1920–30.
pubmed: 26572668
pmcid: 5831252
doi: 10.1161/CIRCULATIONAHA.115.001593
Tarca AL, Carey VJ, Chen X-W, Romero R, Drăghici S. Machine learning and its applications to biology. PLoS Comput Biol. 2007;3:e116.
pubmed: 17604446
pmcid: 1904382
doi: 10.1371/journal.pcbi.0030116
de Ridder D, de Ridder J, Reinders MJT. Pattern recognition in bioinformatics. Brief Bioinform. 2013;14:633–47.
pubmed: 23559637
doi: 10.1093/bib/bbt020
Perlman ZE, Slack MD, Feng Y, Mitchison TJ, Wu LF, Altschuler SJ. Multidimensional drug profiling by automated microscopy. Science. 2004;306:1194–8.
pubmed: 15539606
doi: 10.1126/science.1100709
Li A, Walling J, Ahn S, Kotliarov Y, Su Q, Quezado M, et al. Unsupervised analysis of transcriptomic profiles reveals six glioma subtypes. Cancer Res. 2009;69:2091–9.
pubmed: 19244127
pmcid: 2845963
doi: 10.1158/0008-5472.CAN-08-2100
Drysdale AT, Grosenick L, Downar J, Dunlop K, Mansouri F, Meng Y, et al. Resting-state connectivity biomarkers define neurophysiological subtypes of depression. Nat Med. 2017;23:28–38.
pubmed: 27918562
doi: 10.1038/nm.4246
Cheng W-Y, Ou Yang T-H, Anastassiou D. Development of a prognostic model for breast cancer survival in an open challenge environment. Sci Transl Med. 2013;5:181ra50.
pubmed: 23596202
doi: 10.1126/scitranslmed.3005974
Cheng W-Y, Ou Yang T-H, Anastassiou D. Biomolecular events in cancer revealed by attractor metagenes. PLoS Comput Biol. 2013;9:e1002920.
pubmed: 23468608
pmcid: 3581797
doi: 10.1371/journal.pcbi.1002920
Gao M, Igata H, Takeuchi A, Sato K, Ikegaya Y. Machine learning-based prediction of adverse drug effects: An example of seizure-inducing compounds. J Pharm Sci. 2017;133:70–8.
doi: 10.1016/j.jphs.2017.01.003
Leung MKK, Delong A, Alipanahi B, Frey BJ. Machine learning in genomic medicine: a review of computational problems and data sets. Proc IEEE. 2016;104:176–97.
doi: 10.1109/JPROC.2015.2494198
Steyerberg EW. Clinical prediction models: a practical approach to development, validation, and updating. New York, NY: Springer; 2009.
Varghese B, Chen F, Hwang D, Palmer SL, De Castro Abreu AL, Ukimura O, et al. Objective risk stratification of prostate cancer using machine learning and radiomics applied to multiparametric magnetic resonance images. Sci Rep. 2019;9:1570.
pubmed: 30733585
pmcid: 6367324
doi: 10.1038/s41598-018-38381-x
Pandey G, Pandey OP, Rogers AJ, Ahsen ME, Hoffman GE, Raby BA, et al. A nasal brush-based classifier of asthma identified by machine learning analysis of nasal RNA sequence data. Sci Rep. 2018;8:8826.
pubmed: 29891868
pmcid: 5995932
doi: 10.1038/s41598-018-27189-4
Karczewski KJ, Snyder MP. Integrative omics for health and disease. Nat Rev Genet. 2018;19:299–310.
pubmed: 29479082
pmcid: 5990367
doi: 10.1038/nrg.2018.4
Zhang K, Sun Y, Wu S, Zhou M, Zhang X, Zhou R, et al. Systematic imaging in medicine: a comprehensive review. Eur J Nucl Med Mol Imaging. 2021;48:1736–58.
pubmed: 33210241
doi: 10.1007/s00259-020-05107-z
Oliveira AL. Biotechnology, Big Data and Artificial Intelligence. Biotechnol J. 2019;14:e1800613.
pubmed: 30927505
doi: 10.1002/biot.201800613
Iniesta R, Stahl D, McGuffin P. Machine learning, statistical learning and the future of biological research in psychiatry. Psychol Med. 2016;46:2455–65.
pubmed: 27406289
pmcid: 4988262
doi: 10.1017/S0033291716001367
Kuhn M, Johnson K. Applied predictive modeling. New York, NY: Springer; 2013.
Lee Y, Ragguett R-M, Mansur RB, Boutilier JJ, Rosenblat JD, Trevizol A, et al. Applications of machine learning algorithms to predict therapeutic outcomes in depression: a meta-analysis and systematic review. J Affect Disord. 2018;241:519–32.
pubmed: 30153635
doi: 10.1016/j.jad.2018.08.073
Bzdok D, Meyer-Lindenberg A. Machine learning for precision psychiatry: opportunities and challenges. Biol Psychiatry Cogn Neurosci Neuroimaging. 2018;3:223–30.
pubmed: 29486863
Vieira S, Pinaya WHL, Mechelli A. Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: Methods and applications. Neurosci Biobehav Rev. 2017;74:58–75.
pubmed: 28087243
doi: 10.1016/j.neubiorev.2017.01.002
Dwyer DB, Falkai P, Koutsouleris N. Machine learning approaches for clinical psychology and psychiatry. Annu Rev Clin Psychol. 2018;14:91–118.
pubmed: 29401044
doi: 10.1146/annurev-clinpsy-032816-045037
Wolfers T, Buitelaar JK, Beckmann CF, Franke B, Marquand AF. From estimating activation locality to predicting disorder: A review of pattern recognition for neuroimaging-based psychiatric diagnostics. Neurosci Biobehav Rev. 2015;57:328–49.
pubmed: 26254595
doi: 10.1016/j.neubiorev.2015.08.001
Schwarz E, Guest PC, Rahmoune H, Harris LW, Wang L, Leweke FM, et al. Identification of a biological signature for schizophrenia in serum. Mol Psychiatry. 2012;17:494–502.
pubmed: 21483431
doi: 10.1038/mp.2011.42
Bahn S, Noll R, Barnes A, Schwarz E, Guest PC. Challenges of introducing new biomarker products for neuropsychiatric disorders into the market. Int Rev Neurobiol. 2011;101:299–327.
pubmed: 22050857
doi: 10.1016/B978-0-12-387718-5.00012-2
Lee G, Nho K, Kang B, Sohn K-A, Kim D. Predicting Alzheimer’s disease progression using multi-modal deep learning approach. Sci Rep. 2019;9:1952.
pubmed: 30760848
pmcid: 6374429
doi: 10.1038/s41598-018-37769-z
Cui R, Liu M. RNN-based longitudinal analysis for diagnosis of Alzheimer’s disease. Comput Med Imaging Graph. 2019;73:1–10.
pubmed: 30763637
doi: 10.1016/j.compmedimag.2019.01.005
Bellazzi R, Zupan B. Towards knowledge-based gene expression data mining. J Biomed Inf. 2007;40:787–802.
doi: 10.1016/j.jbi.2007.06.005
Zhang C, Zhang S. Association rule mining: models and algorithms. Berlin, Heidelberg: Springer-Verlag; 2002.
Chandola V, Banerjee A, Kumar V. Anomaly detection: a survey. ACM Comput Surv. 2009;41:15:1–15:58.
doi: 10.1145/1541880.1541882
Noto K, Majidi S, Edlow AG, Wick HC, Bianchi DW, Slonim DK. CSAX: characterizing systematic anomalies in eXpression data. J Comput Biol. 2015;22:402–13.
pubmed: 25651392
pmcid: 4424968
doi: 10.1089/cmb.2014.0155
Quinn TP, Nguyen T, Lee SC, Venkatesh S. Cancer as a tissue anomaly: classifying tumor transcriptomes based only on healthy data. Front Genet. 2019;10:599.
pubmed: 31312210
pmcid: 6614188
doi: 10.3389/fgene.2019.00599
Legendre P, Gallagher ED. Ecologically meaningful transformations for ordination of species data. Oecologia. 2001;129:271–80.
pubmed: 28547606
doi: 10.1007/s004420100716
Pinaya WHL, Mechelli A, Sato JR. Using deep autoencoders to identify abnormal brain structural patterns in neuropsychiatric disorders: a large-scale multi-sample study. Hum Brain Mapp. 2019;40:944–54.
pubmed: 30311316
doi: 10.1002/hbm.24423
Zhang-James Y, Buitelaar JK, Rooij D, Faraone SV, The ENIGMA-ASD Working Group. Ensemble classification of autism spectrum disorder using structural magnetic resonance imaging features. JCPP Adv. 2021;1:e12042.
pubmed: 37431438
pmcid: 10242907
doi: 10.1002/jcv2.12042
Geddes TA, Kim T, Nan L, Burchfield JG, Yang JYH, Tao D, et al. Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis. BMC Bioinform. 2019;20:660.
doi: 10.1186/s12859-019-3179-5
Gligorijevic V, Barot M, Bonneau R. deepNF: deep network fusion for protein function prediction. Bioinformatics. 2018;34:3873–81.
pubmed: 29868758
pmcid: 6223364
doi: 10.1093/bioinformatics/bty440
Zhu X, Goldberg AB. Introduction to Semi-Supervised Learning. Synth Lect Artif Intell Mach Learn. 2009;3:1–130.
Mitchell TM. Machine learning. 1st ed. New York: McGraw-Hill Education; 1997.
Demšar J. Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res. 2006;7:1–30.
Bzdok D, Altman N, Krzywinski M. Statistics versus machine learning. Nat Methods. 2018;15:233–4.
pubmed: 30100822
pmcid: 6082636
doi: 10.1038/nmeth.4642
Breen MS, Tylee DS, Maihofer AX, Neylan TC, Mehta D, Binder EB, et al. PTSD blood transcriptome mega-analysis: shared inflammatory pathways across biological sex and modes of trauma. Neuropsychopharmacology. 2018;43:469–81.
pubmed: 28925389
doi: 10.1038/npp.2017.220
Bousman CA, Chana G, Glatt SJ, Chandler SD, Lucero GR, Tatro E, et al. Preliminary evidence of ubiquitin proteasome system dysregulation in schizophrenia and bipolar disorder: convergent pathway analysis findings from two independent samples. Am J Med Genet B Neuropsychiatr Genet. 2010;153B:494–502.
pubmed: 19582768
doi: 10.1002/ajmg.b.31006
Vawter MP, Philibert R, Rollins B, Ruppel PL, Osborn TW. Exon array biomarkers for the differential diagnosis of schizophrenia and bipolar disorder. Mol Neuropsychiatry. 2018;3:197–213.
pubmed: 29888231
pmcid: 5981774
Nazeen S, Palmer NP, Berger B, Kohane IS. Integrative analysis of genetic data sets reveals a shared innate immune component in autism spectrum disorder and its co-morbidities. Genome Biol. 2016;17:228.
pubmed: 27842596
pmcid: 5108086
doi: 10.1186/s13059-016-1084-z
Kong SW, Collins CD, Shimizu-Motohashi Y, Holm IA, Campbell MG, Lee I-H, et al. Characteristics and predictive value of blood transcriptome signature in males with autism spectrum disorders. PLoS One. 2012;7:e49475.
pubmed: 23227143
pmcid: 3515554
doi: 10.1371/journal.pone.0049475
Hicks SD, Ignacio C, Gentile K, Middleton FA. Salivary miRNA profiles identify children with autism spectrum disorder, correlate with adaptive behavior, and implicate ASD candidate genes involved in neurodevelopment. BMC Pediatr. 2016;16:52.
pubmed: 27105825
pmcid: 4841962
doi: 10.1186/s12887-016-0586-x
Tsuang MT, Nossova N, Yager T, Tsuang M-M, Guo S-C, Shyu KG, et al. Assessing the validity of blood-based gene expression profiles for the classification of schizophrenia and bipolar disorder: a preliminary report. Am J Med Genet B Neuropsychiatr Genet. 2005;133B:1–5.
pubmed: 15645418
doi: 10.1002/ajmg.b.30161
Tylee DS, Hess JL, Quinn TP, Barve R, Huang H, Zhang-James Y, et al. Blood transcriptomic comparison of individuals with and without autism spectrum disorder: a combined-samples mega-analysis. Am J Med Genet B Neuropsychiatr Genet. 2017;174:181–201.
pubmed: 27862943
doi: 10.1002/ajmg.b.32511
Takahashi M, Hayashi H, Watanabe Y, Sawamura K, Fukui N, Watanabe J, et al. Diagnostic classification of schizophrenia by neural network analysis of blood-based gene expression signatures. Schizophr Res. 2010;119:210–8.
pubmed: 20083392
doi: 10.1016/j.schres.2009.12.024
Zhang H, Xie Z, Yang Y, Zhao Y, Zhang B, Fang J. The correlation-base-selection algorithm for diagnostic schizophrenia based on blood-based gene expression signatures. Biomed Res Int. 2017;2017:7860506.
pubmed: 28280741
pmcid: 5322573
Yi Z, Li Z, Yu S, Yuan C, Hong W, Wang Z, et al. Blood-based gene expression profiles models for classification of subsyndromal symptomatic depression and major depressive disorder. PLoS ONE. 2012;7:e31283.
pubmed: 22348066
pmcid: 3278427
doi: 10.1371/journal.pone.0031283
Struyf J, Dobrin S, Page D. Combining gene expression, demographic and clinical data in modeling disease: a case study of bipolar disorder and schizophrenia. BMC Genomics. 2008;9:531.
pubmed: 18992130
pmcid: 2628394
doi: 10.1186/1471-2164-9-531
Breen MS, Uhlmann A, Nday CM, Glatt SJ, Mitt M, Metsalpu A, et al. Candidate gene networks and blood biomarkers of methamphetamine-associated psychosis: an integrative RNA-sequencing report. Transl Psychiatry. 2016;6:e802.
pubmed: 27163203
pmcid: 5070070
doi: 10.1038/tp.2016.67
Hess JL, Tylee DS, Barve R, de Jong S, Ophoff RA, Kumarasinghe N, et al. Transcriptome-wide mega-analyses reveal joint dysregulation of immunologic genes and transcription regulators in brain and blood in schizophrenia. Schizophr Res. 2016;176:114–24.
pubmed: 27450777
pmcid: 5026943
doi: 10.1016/j.schres.2016.07.006
Nicodemus KK, Malley JD. Predictor correlation impacts machine learning algorithms: implications for genomic studies. Bioinformatics. 2009;25:1884–90.
pubmed: 19460890
doi: 10.1093/bioinformatics/btp331
Nicodemus KK, Malley JD, Strobl C, Ziegler A. The behaviour of random forest permutation-based variable importance measures under predictor correlation. BMC Bioinform. 2010;11:110.
doi: 10.1186/1471-2105-11-110
Yassin W, Nakatani H, Zhu Y, Kojima M, Owada K, Kuwabara H, et al. Machine-learning classification using neuroimaging data in schizophrenia, autism, ultra-high risk and first-episode psychosis. Transl Psychiatry. 2020;10:278.
pubmed: 32801298
pmcid: 7429957
doi: 10.1038/s41398-020-00965-5
Cho G, Yim J, Choi Y, Ko J, Lee S-H. Review of machine learning algorithms for diagnosing mental illness. Psychiatry Investig. 2019;16:262–9.
pubmed: 30947496
pmcid: 6504772
doi: 10.30773/pi.2018.12.21.2
Zhang-James Y, Chen Q, Kuja-Halkola R, Lichtenstein P, Larsson H, Faraone SV. Machine-Learning prediction of comorbid substance use disorders in ADHD youth using Swedish registry data. J Child Psychol Psychiatry. 2020;61:1370–9.
pubmed: 32237241
pmcid: 7754321
doi: 10.1111/jcpp.13226
Zhang-James Y, Razavi AS, Hoogman M, Franke B, Faraone SV. Machine learning and MRI-based diagnostic models for ADHD: are we there yet? J Atten Disord. 2023;27:335–53.
pubmed: 36651494
doi: 10.1177/10870547221146256
Chekroud AM, Zotti RJ, Shehzad Z, Gueorguieva R, Johnson MK, Trivedi MH, et al. Cross-trial prediction of treatment outcome in depression: a machine learning approach. Lancet Psychiatry. 2016;3:243–50.
pubmed: 26803397
doi: 10.1016/S2215-0366(15)00471-X
Nie Z, Vairavan S, Narayan VA, Ye J, Li QS. Predictive modeling of treatment resistant depression using data from STAR*D and an independent clinical study. PLoS ONE. 2018;13:e0197268.
pubmed: 29879133
pmcid: 5991746
doi: 10.1371/journal.pone.0197268
Lenhard F, Sauer S, Andersson E, Månsson KN, Mataix-Cols D, Rück C, et al. Prediction of outcome in internet-delivered cognitive behaviour therapy for paediatric obsessive-compulsive disorder: a machine learning approach. Int J Methods Psychiatr Res. 2018;27:e1576.
pubmed: 28752937
doi: 10.1002/mpr.1576
Flygare O, Enander J, Andersson E, Ljótsson B, Ivanov VZ, Mataix-Cols D, et al. Predictors of remission from body dysmorphic disorder after internet-delivered cognitive behavior therapy: a machine learning approach. BMC Psychiatry. 2020;20:247.
pubmed: 32429939
pmcid: 7238519
doi: 10.1186/s12888-020-02655-4
van Breda W, Bremer V, Becker D, Hoogendoorn M, Funk B, Ruwaard J, et al. Predicting therapy success for treatment as usual and blended treatment in the domain of depression. Internet Inter. 2018;12:100–4.
doi: 10.1016/j.invent.2017.08.003
Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE Trans Evol Comput. 1997;1:67–82.
doi: 10.1109/4235.585893
Koohy H. The rise and fall of machine learning methods in biomedical research. F1000Res. 2017;6:2012.
pubmed: 29375816
doi: 10.12688/f1000research.13016.1
Molnar C. Interpretable machine learning. 2020. Lulu.com .
Gountouna V-E, Bermingham M, Kuznetsova K, Urda Munoz D, Agakov F, Robson S, et al. Predictive machine learning for personalised medicine in major depressive disorder. medRxiv. 2022. https://doi.org/10.1101/2022.02.11.22270724 .
Strobl C, Boulesteix A-L, Zeileis A, Hothorn T. Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinform. 2007;8:25.
doi: 10.1186/1471-2105-8-25
Boulesteix A-L, Slawski M. Stability and aggregation of ranked gene lists. Brief Bioinform. 2009;10:556–68.
pubmed: 19679825
doi: 10.1093/bib/bbp034
Verma S, Dickerson J, Hines K. Counterfactual explanations for machine learning: challenges revisited. arXiv. 2021. https://doi.org/10.48550/arXiv.2106.07756 .
Lundberg S, Lee S-I. A unified approach to interpreting model predictions. arXiv. 2017. https://doi.org/10.48550/arXiv.1705.07874 .
Tsang M, Liu H, Purushotham S, Murali P, Liu Y. Neural interaction transparency (NIT): disentangling learned interactions for improved interpretability. 2018:5809–18.
Zhang Y, Yang Q. A Survey on Multi-Task Learning. IEEE Trans Knowl Data Eng. 2021. 1–1
Widmer C, Rätsch G. Multitask learning in computational biology. In: Guyon I, Dror G, Lemaire V, Taylor G, Silver D, editors. Proc. ICML Workshop Unsupervised Transf. Learn., vol. 27, Bellevue, Washington, USA.
Li Y, Wang J, Ye J, Reddy CK. A multi-task learning formulation for survival analysis. KDD 16, New York, NY, USA: ACM; 2016. p. 1715–24.
Yuan H, Paskov I, Paskov H, González AJ, Leslie CS. Multitask learning improves prediction of cancer drug sensitivity. Sci Rep. 2016;6:31619.
pubmed: 27550087
pmcid: 4994023
doi: 10.1038/srep31619
Feriante J. Massively multitask deep learning for drug discovery. University of Wisconsin-Madison, 2015.
Xu Q, Pan SJ, Xue HH, Yang Q. Multitask learning for protein subcellular location prediction. IEEEACM Trans Comput Biol Bioinform. 2011;8:748–59.
doi: 10.1109/TCBB.2010.22
Zhou J, Liu J, Narayan VA, Ye J, Alzheimer’s Disease Neuroimaging Initiative. Modeling disease progression via multi-task learning. Neuroimage. 2013;78:233–48.
pubmed: 23583359
doi: 10.1016/j.neuroimage.2013.03.073
Perlich C. Learning curves in machine learning. In: Encyclopedia of machine learning and data mining, Boston, MA: Springer US; 2011. p. 577–80.
Barnett E, Onete D, Salekin A, Faraone SV Genomic machine learning meta-regression: Insights on associations of study features with reported model performance. BioRxiv. 2022. https://doi.org/10.1101/2022.01.10.22268751 .
Stripelis D, Gupta U, Saleem H, Dhinagar N, Ghai T, Sanchez R, et al. Secure federated learning for neuroimaging. arXiv. 2022. https://doi.org/10.48550/arXiv.2205.05249 .
Vaid A, Jaladanki SK, Xu J, Teng S, Kumar A, Lee S, et al. Federated Learning of Electronic Health Records to Improve Mortality Prediction in Hospitalized Patients With COVID-19: Machine Learning Approach. JMIR Med Inf. 2021;9:e24207.
doi: 10.2196/24207
Zhai Y, Ong Y-S, Tsang IW. The Emerging ‘Big Dimensionality’. IEEE Comput Intell Mag. 2014;9:14–26.
doi: 10.1109/MCI.2014.2326099
Saeys Y, Inza I, Larrañaga P. A review of feature selection techniques in bioinformatics. Bioinformatics. 2007;23:2507–17.
pubmed: 17720704
doi: 10.1093/bioinformatics/btm344
Tang J, Alelyani S, Liu H. Feature selection for classification: a review. data classif. Algorithms Appl., CRC Press; 2014. p. 37–64.
Sorzano COS, Vargas J, Pascual Montano A. A survey of dimensionality reduction techniques. ArXiv. 2014. https://doi.org/10.48550/arXiv.1403.2877
Smialowski P, Frishman D, Kramer S. Pitfalls of supervised feature selection. Bioinformatics. 2010;26:440–3.
pubmed: 19880370
doi: 10.1093/bioinformatics/btp621
Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A. Feature selection for high-dimensional data. Springer; 2015.
Ambroise C, McLachlan GJ. Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Natl Acad Sci USA. 2002;99:6562–6.
pubmed: 11983868
pmcid: 124442
doi: 10.1073/pnas.102102699
Goodfellow I, Bengio Y, Courville A. Deep learning. MIT Press; 2016.
Quinn TP, Lee SC, Venkatesh S, Nguyen T. Improving the classification of neuropsychiatric conditions using gene ontology terms as features. Am J Med Genet B Neuropsychiatr Genet. 2019;180:508–18.
pubmed: 31025483
doi: 10.1002/ajmg.b.32727
Chawla NV Data Mining for Imbalanced Datasets: An Overview. In: Maimon O, Rokach L, editors. Data mining and knowledge discovery handbook. Boston, MA: Springer US; 2005. p. 853–67.
Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE. 2015;10:e0118432.
pubmed: 25738806
pmcid: 4349800
doi: 10.1371/journal.pone.0118432
Altman N, Krzywinski M. Graphical assessment of tests and classifiers. Nat Methods. 2021;18:840–2.
pubmed: 34316067
doi: 10.1038/s41592-021-01232-1
Zhao Q, Adeli E, Pohl KM. Training confounder-free deep learning models for medical applications. Nat Commun. 2020;11:6010.
pubmed: 33243992
pmcid: 7691500
doi: 10.1038/s41467-020-19784-9
Loughman A, Quinn T, Nation ML, Reichelt A, Moore RJ, Van TTH, et al. Infant microbiota in colic: predictive associations with problem crying and subsequent child behavior. J Dev Orig Health Dis. 2021;12:260–70.
Wang H, Wu Z, Xing EP. Removing confounding factors associated weights in deep neural networks improves the prediction accuracy for healthcare applications. Pac Symp Biocomput. 2019;24:54–65.
pubmed: 30864310
pmcid: 6417810
Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F, et al. Domain-adversarial training of neural networks. J Mach Learn Res. 2016;17:2096–30.
Liu Y, Nyunoya T, Leng S, Belinsky SA, Tesfaigzi Y, Bruse S. Softwares and methods for estimating genetic ancestry in human populations. Hum Genomics. 2013;7:1.
pubmed: 23289408
pmcid: 3542037
doi: 10.1186/1479-7364-7-1
Leek JT, Storey JD. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007;3:1724–35.
pubmed: 17907809
doi: 10.1371/journal.pgen.0030161
Rajkomar A, Hardt M, Howell MD, Corrado G, Chin MH. Ensuring Fairness in Machine Learning to Advance Health Equity. Ann Intern Med. 2018;169:866–72.
pubmed: 30508424
pmcid: 6594166
doi: 10.7326/M18-1990
Quinn TP, Coghlan S. Readying medical students for medical AI: the need to embed AI ethics education. ArXiv. 2021. https://doi.org/10.48550/arXiv.2109.02866 .
Food and Drug Administration (FDA). Artificial intelligence/machine learning (AI/ML)-based software as a medical device (SaMD) action plan.
Franco D, Oneto L, Navarin N, Anguita D. Toward learning trustworthily from data combining privacy, fairness, and explainability: an application to face recognition. Entropy Basel Switz. 2021;23:1047.
Nicodemus KK, Callicott JH, Higier RG, Luna A, Nixon DC, Lipska BK, et al. Evidence of statistical epistasis between DISC1, CIT and NDEL1 impacting risk for schizophrenia: biological validation with functional neuroimaging. Hum Genet. 2010;127:441–52.
pubmed: 20084519
doi: 10.1007/s00439-009-0782-y
Nicodemus KK, Law AJ, Radulescu E, Luna A, Kolachana B, Vakkalanka R, et al. Biological validation of increased schizophrenia risk with NRG1, ERBB4, and AKT1 epistasis via functional neuroimaging in healthy controls. Arch Gen Psychiatry. 2010;67:991–1001.
pubmed: 20921115
pmcid: 4291187
doi: 10.1001/archgenpsychiatry.2010.117
Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019;110:12–22.
pubmed: 30763612
doi: 10.1016/j.jclinepi.2019.02.004
Song X, Liu X, Liu F, Wang C. Comparison of machine learning and logistic regression models in predicting acute kidney injury: A systematic review and meta-analysis. Int J Med Inf. 2021;151:104484.
doi: 10.1016/j.ijmedinf.2021.104484
Smith DL, Held P. Moving toward precision PTSD treatment: predicting veterans’ intensive PTSD treatment response using continuously updating machine learning models. Psychol Med. 2023;53:5500–9.
Hess JL, Tylee DS, Barve R, de Jong S, Ophoff RA, Kumarasinghe N, et al. Transcriptomic abnormalities in peripheral blood in bipolar disorder, and discrimination of the major psychoses. Schizophr Res. 2020;217:124–35.
pubmed: 31391148
doi: 10.1016/j.schres.2019.07.036
Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine learning for integrating data in biology and medicine: principles, practice, and opportunities. Inf Fusion. 2019;50:71–91.
pubmed: 30467459
doi: 10.1016/j.inffus.2018.09.012
Schwarz E, Leweke FM, Bahn S, Liò P. Clinical bioinformatics for complex disorders: a schizophrenia case study. BMC Bioinform. 2009;10:S6.
doi: 10.1186/1471-2105-10-S12-S6
Xia CH, Ma Z, Ciric R, Gu S, Betzel RF, Kaczkurkin AN, et al. Linked dimensions of psychopathology and connectivity in functional brain networks. Nat Commun. 2018;9:3003.
pubmed: 30068943
pmcid: 6070480
doi: 10.1038/s41467-018-05317-y
Shomorony I, Cirulli ET, Huang L, Napier LA, Heister RR, Hicks M, et al. Unsupervised integration of multimodal dataset identifies novel signatures of health and disease. BioRxiv. 2018. https://doi.org/10.1101/432641 .
Argelaguet R, Velten B, Arnol D, Dietrich S, Zenz T, Marioni JC, et al. Multi-omics factor analysis-a framework for unsupervised integration of multi-omics data sets. Mol Syst Biol. 2018;14:e8124.
pubmed: 29925568
pmcid: 6010767
doi: 10.15252/msb.20178124
Koutsouleris N, Meisenzahl EM, Borgwardt S, Riecher-Rössler A, Frodl T, Kambeitz J, et al. Individualized differential diagnosis of schizophrenia and mood disorders using neuroanatomical biomarkers. Brain. 2015;138:2059–73.
pubmed: 25935725
pmcid: 4572486
doi: 10.1093/brain/awv111
Doan NT, Kaufmann T, Bettella F, Jørgensen KN, Brandt CL, Moberget T, et al. Distinct multivariate brain morphological patterns and their added predictive value with cognitive and polygenic risk scores in mental disorders. Neuroimage Clin. 2017;15:719–31.
pubmed: 28702349
pmcid: 5491456
doi: 10.1016/j.nicl.2017.06.014
Cao H, Duan J, Lin D, Shugart YY, Calhoun V, Wang Y-P. Sparse representation based biomarker selection for schizophrenia with integrated analysis of fMRI and SNPs. Neuroimage. 2014;102:220–8.
pubmed: 24530838
doi: 10.1016/j.neuroimage.2014.01.021
Li YC, Wang L, Law JN, Murali TM, Pandey G. Integrating multimodal data through interpretable heterogeneous ensembles. Bioinforma Adv. 2022;2:vbac065.
doi: 10.1093/bioadv/vbac065
Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, et al. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface. 2018;15:20170387.
Lin E, Kuo P-H, Liu Y-L, Yu YW-Y, Yang AC, Tsai S-J. A deep learning approach for predicting antidepressant response in major depression using clinical and genetic biomarkers. Front Psychiatry. 2018;9:290.
pubmed: 30034349
pmcid: 6043864
doi: 10.3389/fpsyt.2018.00290
Sundaram L, Bhat RR, Viswanath V, Li X. DeepBipolar: Identifying genomic mutations for bipolar disorder via deep learning. Hum Mutat. 2017;38:1217–24.
pubmed: 28600868
doi: 10.1002/humu.23272
Plis SM, Hjelm DR, Salakhutdinov R, Allen EA, Bockholt HJ, Long JD, et al. Deep learning for neuroimaging: a validation study. Front Neurosci. 2014;8:229.
pubmed: 25191215
pmcid: 4138493
doi: 10.3389/fnins.2014.00229
Vilhjálmsson BJ, Yang J, Finucane HK, Gusev A, Lindström S, Ripke S, et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am J Hum Genet. 2015;97:576–92.
pubmed: 26430803
pmcid: 4596916
doi: 10.1016/j.ajhg.2015.09.001
Bulik-Sullivan, Loh BK, Finucane P-R, Ripke HK, Yang S, Schizophrenia Working Group of the Psychiatric Genomics Consortium J, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 2015;47:291–5.
pubmed: 25642630
pmcid: 4495769
doi: 10.1038/ng.3211
Krapohl E, Patel H, Newhouse S, Curtis CJ, von Stumm S, Dale PS, et al. Multi-polygenic score approach to trait prediction. Mol Psychiatry. 2018;23:1368–74.
pubmed: 28785111
doi: 10.1038/mp.2017.163
Barnett EJ, Biederman J, Doyle AE, Hess J, DiSalvo M, Faraone SV. Identifying pediatric mood disorders from transdiagnostic polygenic risk scores: a study of children and adolescents. J Clin Psychiatry. 2022;83:40635.
Chen J, Schwarz E. BioMM: Biologically-informed Multi-stage Machine learning for identification of epigenetic fingerprints. arXiv:171200336. 2017. https://doi.org/10.48550/arXiv.1712.00336 .
Vu M-AT, Adalı T, Ba D, Buzsáki G, Carlson D, Heller K, et al. A shared vision for machine learning in neuroscience. J Neurosci. 2018;38:1601–7.
pubmed: 29374138
pmcid: 5815449
doi: 10.1523/JNEUROSCI.0508-17.2018
Jiang F, Jiang Y, Zhi H, Dong Y, Li H, Ma S, et al. Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol. 2017;2:230–43.
pubmed: 29507784
pmcid: 5829945
doi: 10.1136/svn-2017-000101
Kang J, Rancati T, Lee S, Oh JH, Kerns SL, Scott JG, et al. Machine learning and radiogenomics: lessons learned and future directions. Front Oncol. 2018;8:228.
pubmed: 29977864
pmcid: 6021505
doi: 10.3389/fonc.2018.00228
Shrager J, Tenenbaum JM. Rapid learning for precision oncology. Nat Rev Clin Oncol. 2014;11:109.
pubmed: 24445514
doi: 10.1038/nrclinonc.2013.244
Doyle-Lindrud S. Watson will see you now: a supercomputer to help clinicians make informed treatment decisions. Clin J Oncol Nurs. 2015;19:31–2.
pubmed: 25689646
doi: 10.1188/15.CJON.31-32
Wang S, Summers RM. Machine learning and radiology. Med Image Anal. 2012;16:933–51.
pubmed: 22465077
pmcid: 3372692
doi: 10.1016/j.media.2012.02.005
Neale B. Perspective on data sharing and open science in psychiatric genetics. Eur Neuropsychopharmacol. 2019;29:S778.
Bell V. Open science in mental health research. Lancet Psychiatry. 2017;4:525–6.
pubmed: 28552500
doi: 10.1016/S2215-0366(17)30244-4
Hardwicke TE, Mathur MB, MacDonald K, Nilsonne G, Banks GC, Kidwell MC, et al. Data availability, reusability, and analytic reproducibility: evaluating the impact of a mandatory open data policy at the journal Cognition. R Soc Open Sci. 2018;5:180448.
pubmed: 30225032
pmcid: 6124055
doi: 10.1098/rsos.180448
National Research Council (US) Committee on Applied, Statistics T. The current state of data integration in science. USA: National Academies Press; 2010.
Sullivan PF, Agrawal A, Bulik CM, Andreassen OA, Børglum AD, Breen G, et al. Psychiatric Genomics: An Update and an Agenda. Am J Psychiatry. 2018;175:15–27.
pubmed: 28969442
doi: 10.1176/appi.ajp.2017.17030283
MacArthur J, Bowler E, Cerezo M, Gil L, Hall P, Hastings E, et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 2017;45:D896–D901.
pubmed: 27899670
doi: 10.1093/nar/gkw1133
Shen Z, Spruit M. A systematic review of open source clinical software on GitHub for improving software reuse in smart healthcare. Appl Sci. 2019;9:150.
Guinney J, Saez-Rodriguez J. Alternative models for sharing confidential biomedical data. Nat Biotechnol. 2018;36:391–2.
pubmed: 29734317
doi: 10.1038/nbt.4128
Huang S, Chaudhary K, Garmire LX. More is better: recent progress in multi-omics data integration methods. Front Genet. 2017;8:84.
pubmed: 28670325
pmcid: 5472696
doi: 10.3389/fgene.2017.00084
Lapatas V, Stefanidakis M, Jimenez RC, Via A, Schneider MV. Data integration in biological research: an overview. J Biol Res. 2015;22:9.
Lam RW, Milev R, Rotzinger S, Andreazza AC, Blier P, Brenner C, et al. Discovering biomarkers for antidepressant response: protocol from the Canadian biomarker integration network in depression (CAN-BIND) and clinical characteristics of the first patient cohort. BMC Psychiatry. 2016;16:105.
pubmed: 27084692
pmcid: 4833905
doi: 10.1186/s12888-016-0785-x
Blum AL, Langley P. Selection of relevant features and examples in machine learning. Artif Intell. 1997;97:245–71.
doi: 10.1016/S0004-3702(97)00063-5
Kolachalama VB, Garg PS. Machine learning and medical education. Npj Digit Med. 2018;1:54.
pubmed: 31304333
pmcid: 6550167
doi: 10.1038/s41746-018-0061-1
Hekler A, Utikal JS, Enk AH, Solass W, Schmitt M, Klode J, et al. Deep learning outperformed 11 pathologists in the classification of histopathological melanoma images. Eur J Cancer Oxf Engl. 2019;118:91–6.
doi: 10.1016/j.ejca.2019.06.012
McKinney SM, Sieniek M, Godbole V, Godwin J, Antropova N, Ashrafian H, et al. International evaluation of an AI system for breast cancer screening. Nature. 2020;577:89–94.
pubmed: 31894144
doi: 10.1038/s41586-019-1799-6
Wang S, Manning C. Fast dropout training. In: Dasgupta S, McAllester D, editors. Proc. 30th International Conference on Machine Learning, vol. 28, PMLR: Atlanta, Georgia, USA; 2013. p. 118–26.
Lakshminarayanan B, Pritzel A, Blundell C. Simple and scalable predictive uncertainty estimation using deep ensembles. ArXiv. 2017. https://doi.org/10.48550/arXiv.1612.01474 .
Wang G, Li W, Aertsen M, Deprest J, Ourselin S, Vercauteren T. Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks. Neurocomputing. 2019;338:34–45.
doi: 10.1016/j.neucom.2019.01.103
Dolezal JM, Srisuwananukorn A, Karpeyev D, Ramesh S, Kochanny S, Cody B, et al. Uncertainty-informed deep learning models enable high-confidence predictions for digital histopathology. Nat Commun. 2022;13:6572.
pubmed: 36323656
pmcid: 9630455
doi: 10.1038/s41467-022-34025-x
Tandon N, Tandon R. Machine learning in psychiatry- standards and guidelines. Asian J Psychiatr. 2019;44:A1–A4.
pubmed: 31530438
doi: 10.1016/j.ajp.2019.09.009
Arrieta AB, Díaz-Rodríguez N, Del Ser J, Bennetot A, Tabik S, Barbado A, et al. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. ArXiv. 2019. https://doi.org/10.48550/arXiv.1910.10045 .
Bollen KA, Jackman RW. Regression diagnostics: an expository treatment of outliers and influential cases. Sociol Methods Res. 1985;13:510–42.
doi: 10.1177/0049124185013004004
Samiei M, Würfl T, Deleu T, Weiss M, Dutil F, Fevens T, et al. The TCGA meta-dataset clinical benchmark. ArXiv. 2019. https://doi.org/10.48550/arXiv.1910.08636 .
Feng J, Xu H, Mannor S, Yan S. Robust logistic regression and classification. In: Advances in neural information processing systems, vol. 27, Curran Associates, Inc.; 2014.
Ying X. An overview of overfitting and its solutions. J Phys Conf Ser. 2019;1168:022022.
doi: 10.1088/1742-6596/1168/2/022022
Orrù G, Pettersson-Yeo W, Marquand AF, Sartori G, Mechelli A. Using support vector machine to identify imaging biomarkers of neurological and psychiatric disease: a critical review. Neurosci Biobehav Rev. 2012;36:1140–52.
pubmed: 22305994
doi: 10.1016/j.neubiorev.2012.01.004
Vabalas A, Gowen E, Poliakoff E, Casson AJ. Machine learning algorithm validation with a limited sample size. PLoS ONE. 2019;14:e0224365.
pubmed: 31697686
pmcid: 6837442
doi: 10.1371/journal.pone.0224365
Ramteke RJ, Khachane M. Automatic medical image classification and abnormality detection using K-nearest neighbour. Int J Adv Comput Res. 2012;2:190–6.
Awoyemi JO, Adetunmbi AO, Oluwadare SA. Credit card fraud detection using machine learning techniques: a comparative analysis. 2017 Int. Confer. Comput. Netw. Inform. Lagos, Nigeria: ICCNI; 2017. p. 1–9.
Delgadillo J, Lutz W. A development pathway towards precision mental health care. JAMA Psychiatry. 2020;77:889.
pubmed: 32459326
doi: 10.1001/jamapsychiatry.2020.1048
Lutz W, Schwartz B, Martín Gómez Penedo J, Boyle K, Deisenhofer A-K. Working towards the development and implementation of precision mental healthcare: an example. Adm Policy Ment Health. 2020;47:856–61.
pubmed: 32715429
pmcid: 8316220
doi: 10.1007/s10488-020-01053-y
Heckerman D. A Bayesian approach to learning causal networks. ArXiv. 2015. https://doi.org/10.48550/arXiv.1302.4958 .
Madar IH, Sultan G, Tayubi IA, Hasan AN, Pahi B, Rai A, et al. Identification of marker genes in Alzheimer’s disease using a machine-learning model. Bioinformation. 2021;17:348–55.
pubmed: 34234395
pmcid: 8225597
doi: 10.6026/97320630017363
Chen Y-C, Wheeler TA, Kochenderfer MJ. Learning discrete bayesian networks from continuous data. J Artif Intell Res. 2017;59:103–32.
doi: 10.1613/jair.5371
Cheng J, Liu H-P, Lin W-Y, Tsai F-J. Machine learning compensates fold-change method and highlights oxidative phosphorylation in the brain transcriptome of Alzheimer’s disease. Sci Rep. 2021;11:13704.
pubmed: 34211065
pmcid: 8249453
doi: 10.1038/s41598-021-93085-z
Dwyer K, Holte R. Decision Tree Instability and Active Learning. In: Kok JN, Koronacki J, de Mantaras RL, Matwin S, Mladenič D, Skowron A, editors. Mach. Learn. ECML 2007, Berlin, Heidelberg: Springer; 2007. p. 128–39.
Breiman L. Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci. 2001;16:199–231.
doi: 10.1214/ss/1009213726
Ainscough BJ, Barnell EK, Ronning P, Campbell KM, Wagner AH, Fehniger TA, et al. A deep learning approach to automate refinement of somatic variant calling from cancer sequencing data. Nat Genet. 2018;50:1735–43.
pubmed: 30397337
pmcid: 6428590
doi: 10.1038/s41588-018-0257-y
Sole X, Ramisa A, Torras C. Evaluation of random forests on large-scale classification problems using a Bag-of-Visual-Words representation.
Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. 2nd ed. New York, NY: Springer; 2009.
Vangay P, Hillmann BM, Knights D. Microbiome Learning Repo (ML Repo): a public repository of microbiome regression and classification tasks. GigaScience. 2019;8:giz042.
pubmed: 31042284
pmcid: 6493971
doi: 10.1093/gigascience/giz042
Zhang H, Nettleton D, Zhu Z. Regression-enhanced random forests. ArXiv. 2019. https://doi.org/10.48550/arXiv.1904.10416 .
Hinton G, Deng L, Yu D, Dahl GE, Mohamed A, Jaitly N, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag. 2012;29:82–97.
doi: 10.1109/MSP.2012.2205597
Suk H-I, Shen D. Deep learning-based feature representation for AD/MCI classification. Med Image Comput Comput Assist Interv. 2013;16:583–90.
pubmed: 24579188
pmcid: 4029347
Telenti A, Lippert C, Chang P-C, DePristo M. Deep learning of genomic variation and regulatory network data. Hum Mol Genet. 2018;27:R63–R71.
pubmed: 29648622
pmcid: 6499235
doi: 10.1093/hmg/ddy115
Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, et al. Residual attention network for image classification. ArXiv. 2017. https://doi.org/10.48550/arXiv.1704.06904 .
Young T, Hazarika D, Poria S, Cambria E. Recent trends in deep learning based natural language processing. IEEE Comput Intell Mag. 2018;13:55–75.
doi: 10.1109/MCI.2018.2840738
Zrimec J, Börlin CS, Buric F, Muhammad AS, Chen R, Siewers V, et al. Deep learning suggests that gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure. Nat Commun. 2020;11:6141.
pubmed: 33262328
pmcid: 7708451
doi: 10.1038/s41467-020-19921-4
Pandey M, Fernandez M, Gentile F, Isayev O, Tropsha A, Stern AC, et al. The transformational role of GPU computing and deep learning in drug discovery. Nat Mach Intell. 2022;4:211–21.
doi: 10.1038/s42256-022-00463-x
Koppe G, Meyer-Lindenberg A, Durstewitz D. Deep learning for small and big data in psychiatry. Neuropsychopharmacology. 2021;46:176–90.
pubmed: 32668442
doi: 10.1038/s41386-020-0767-z
Alwosheel A, van Cranenburgh S, Chorus CG. Is your dataset big enough? Sample size requirements when using artificial neural networks for discrete choice analysis. J Choice Model. 2018;28:167–82.
doi: 10.1016/j.jocm.2018.07.002
Passafaro TL, Lopes FB, Dórea JRR, Craven M, Breen V, Hawken RJ, et al. Would large dataset sample size unveil the potential of deep neural networks for improved genome-enabled prediction of complex traits? The case for body weight in broilers. BMC Genomics. 2020;21:771.
pubmed: 33167865
pmcid: 7654004
doi: 10.1186/s12864-020-07181-x
Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36.
pubmed: 7063747
doi: 10.1148/radiology.143.1.7063747