Early detection of ovarian cancer by wavelet analysis of protein mass spectra.
classification
distance variance
ovarian cancer diagnostics
wavelet spectra
Journal
Statistics in medicine
ISSN: 1097-0258
Abbreviated title: Stat Med
Country: England
NLM ID: 8215016
Publication information
Publication date:
15 06 2023
History:
revised: 24 01 2023
received: 14 07 2022
accepted: 17 03 2023
medline: 29 5 2023
pubmed: 1 4 2023
entrez: 31 3 2023
Status:
ppublish
Abstract
Accurate and efficient detection of ovarian cancer at early stages is critical to ensure proper treatments for patients. Among the first-line modalities investigated in studies of early diagnosis are features distilled from protein mass spectra. This method, however, considers only a specific subset of spectral responses and ignores the interplay among protein expression levels, which can also contain diagnostic information. We propose a new modality that automatically searches protein mass spectra for discriminatory features by considering the self-similar nature of the spectra. Self-similarity is assessed by taking a wavelet decomposition of protein mass spectra and estimating the rate of level-wise decay in the energies of the resulting wavelet coefficients. Level-wise energies are estimated in a robust manner using distance variance, and rates are estimated locally via a rolling window approach. This results in a collection of rates that can be used to characterize the interplay among proteins, which can be indicative of cancer presence. Discriminatory descriptors are then selected from these evolutionary rates and used as classifying features. The proposed wavelet-based features are used in conjunction with features proposed in the existing literature for early stage diagnosis of ovarian cancer using two datasets published by the American National Cancer Institute. Including the wavelet-based features from the new modality results in improvements in diagnostic performance for early-stage ovarian cancer detection. This demonstrates the ability of the proposed modality to characterize new ovarian cancer diagnostic information.
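The pipeline described in the abstract (wavelet decomposition, level-wise energies estimated with distance variance, and decay rates estimated locally in a rolling window) can be illustrated with a minimal sketch. This is not the authors' implementation. The Haar wavelet, the window length of 256, the step of 128, the minimum of 8 coefficients per level, and all function names are assumptions chosen for illustration; the abstract does not state these settings. The sample distance variance is computed in its V-statistic form from the double-centered matrix of pairwise distances.

```python
import numpy as np


def haar_details(x):
    """Haar detail coefficients per level, finest level first.

    Assumes len(x) is a power of two (illustrative choice of wavelet).
    """
    approx = np.asarray(x, dtype=float)
    details = []
    while approx.size >= 2:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return details


def distance_variance(x):
    """Squared sample distance variance of a 1-D sample (V-statistic form)."""
    x = np.asarray(x, dtype=float)
    d = np.abs(x[:, None] - x[None, :])  # pairwise distances
    # Double-center the distance matrix (subtract row and column means, add grand mean).
    a = d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()
    return float(np.mean(a * a))


def decay_rate(window, min_coeffs=8):
    """Slope of log2(level-wise energy) against level (1 = finest).

    Levels with fewer than `min_coeffs` coefficients are skipped
    (hypothetical threshold). Returns NaN if fewer than two levels remain.
    """
    levels, log_energy = [], []
    for level, coeffs in enumerate(haar_details(window), start=1):
        if coeffs.size < min_coeffs:
            break
        energy = distance_variance(coeffs)
        if energy > 0:
            levels.append(level)
            log_energy.append(np.log2(energy))
    if len(levels) < 2:
        return float("nan")
    slope, _intercept = np.polyfit(levels, log_energy, 1)
    return float(slope)


def rolling_decay_rates(spectrum, window=256, step=128):
    """Local decay rates over sliding windows of a 1-D spectrum."""
    if window < 2 or window & (window - 1):
        raise ValueError("window must be a power of two")
    spectrum = np.asarray(spectrum, dtype=float)
    starts = range(0, spectrum.size - window + 1, step)
    return np.array([decay_rate(spectrum[s:s + window]) for s in starts])
```

Under these assumptions, `rolling_decay_rates(spectrum)` returns one slope per window. White noise should give slopes near zero, while a random walk should give clearly positive slopes, because its energy grows toward coarser levels. In a classifier, such slopes would be candidate features to be screened for discriminatory power. A library such as PyWavelets could replace the hand-written Haar transform if other wavelet families are wanted.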
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
2257-2273
Copyright information
© 2023 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
References
American Cancer Society. Key statistics for ovarian cancer. 2022 https://www.cancer.org/cancer/ovarian-cancer/about/key-statistics.html. Accessed July 7, 2022.
Torre LA, Trabert B, DeSantis CE, et al. Ovarian cancer statistics, 2018. CA Cancer J Clin. 2018;68(4):284-296. doi:10.3322/caac.21456
Petricoin EF, Ardekani AM, Hitt BA, et al. Use of proteomic patterns in serum to identify ovarian cancer. Lancet. 2002;359(9306):572-577. doi:10.1016/S0140-6736(02)07746-2
Tang H, Mukomel Y, Fink E. Diagnosis of ovarian cancer based on mass spectra of blood samples. Paper presented at: 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583). vol. 4. IEEE; 2004; The Hague, Netherlands:3444-3450.
Vannucci M, Sha N, Brown PJ. NIR and mass spectra classification: Bayesian methods for wavelet-based feature selection. Chemom Intel Lab Syst. 2005;77(1):139-148. doi:10.1016/j.chemolab.2004.10.009
Li L, Tang H, Wu Z, et al. Data mining techniques for cancer detection using serum proteomic profiling. Artif Intell Med. 2004;32(2):71-83.
Jung YY, Park Y, Jones DP, Ziegler TR, Vidakovic B. Self-similarity in NMR spectra: an application in assessing the level of cysteine. J Data Sci. 2021;8(1):1-19.
Jeon S, Nicolis O, Vidakovic B. Mammogram diagnostics via 2-D complex wavelet-based self-similarity measures. São Paulo J Math Sci. 2014;8(2):265-284.
Edelmann D, Richards D, Vogel D. The distance standard deviation. Ann Stat. 2020;46(6):3395-3416.
ANCII Repository. The datasets used in the study. 2022 https://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp. Accessed May 6, 2022.
Sorace JM, Zhan M. A data review and re-assessment of ovarian cancer serum proteomic profiling. BMC Bioinform. 2003;4(24):572-577.
Vidakovic B. Statistical Modeling by Wavelets. New York, NY: John Wiley and Sons Inc; 1999.
Morettin PA. Wavelets in Statistics. São Paulo, Brazil: University of São Paulo; 1997.
Abry P, Gonçalvès P, Flandrin P. Wavelets, spectrum analysis and 1/f processes. In: Antoniadis A, Oppenheim G, eds. Wavelets and statistics. New York: Springer; 1995:15-29. doi:10.1007/978-1-4612-2544-7_2
Hamilton EK, Jeon S, Cobo PR, Lee KS, Vidakovic B. Robust wavelet-based assessment of scaling with applications. 2022 https://arxiv.org/abs/1902.01032. Accessed May 15, 2022.
Campbell OL, Weber AM. Monofractal analysis of functional magnetic resonance imaging: an introductory review. Hum Brain Mapp. 2022;43(8):2693-2706.
Roberts T, Newell MS, Auffermann WF, Vidakovic B. Wavelet-based scaling indices for breast cancer diagnostics. Stat Med. 2017;36:1989-2000.
Kong T, Vidakovic B. Non-decimated complex wavelet spectral tools with applications. 2019 https://arxiv.org/abs/1902.01032. Accessed May 15, 2022.
Székely GJ, Rizzo ML. On the uniqueness of distance covariance. Stat Prob Lett. 2012;82(12):2278-2282.
Székely GJ, Rizzo ML. Partial distance correlation with methods for dissimilarities. 2013 https://arxiv.org/abs/1310.2926. Accessed May 15, 2022.
Matteson DS, Tsay RS. Independent component analysis via distance covariance. J Am Stat Assoc. 2013;112:623-637.
Cowley B, Semedo J, Zandvakili A, Smith M, Kohn A, Yu B. Distance covariance analysis. In: Singh A, Zhu J, eds. Artificial Intelligence and Statistics. Vol 54. PMLR; 2017:242-251.
Tsyawo ES, Soale AN. A distance covariance-based estimator. 2021 https://arxiv.org/abs/2102.07008. Accessed May 15, 2022.
Székely GJ, Rizzo ML. Brownian distance covariance. Ann Appl Stat. 2009;3(4):1236-1265.
Li R, Zhong W, Zhu L. Feature screening via distance correlation learning. J Am Stat Assoc. 2012;107(499):1129-1139.
Huo X, Székely GJ. Fast computing for distance covariance. 2014 https://arxiv.org/abs/1410.1503. Accessed May 15, 2022.
Chaudhuri A, Hu W. A fast algorithm for computing distance correlation. Comput Stat Data Anal. 2019;135:15-24.
Vidakovic B. Engineering Biostatistics: An Introduction Using Matlab and Winbugs. Hoboken, NJ: John Wiley & Sons; 2007.
Tiwari P, Viswanath S, Kurhanewicz J, Sridhar A, Madabhushi A. Multimodal wavelet embedding representation for data combination (MaWERiC): integrating magnetic resonance imaging and spectroscopy for prostate cancer detection. NMR Biomed. 2012;25(4):607-619.
Wang F, Han XH, Chen YW. Biomedical imaging modality classification using combined visual features and textual terms. Int J Biomed Imag. 2011;2011:241396.
Usman K, Rajpoot K. Brain tumor classification from multi-modality MRI using wavelets and machine learning. Pattern Anal Applic. 2017;20(3):871-881.
Sun L, Li L, Li Z, et al. Alterations in the serum proteome profile during the development of ovarian cancer. Int J Oncol. 2014;45(6):2495-2501.
Zhang H, Kong B, Qu X, Jia L, Deng B, Yang Q. Biomarker discovery for ovarian cancer using SELDI-TOF-MS. Gynecol Oncol. 2006;102(1):61-66.
Fung ET, Yip TT, Lomas L, et al. Classification of cancer types by measuring variants of host response proteins using SELDI serum assays. Int J Cancer. 2005;115(5):783-789.
Sugasawa S, Kobayashi G. Robust fitting of mixture models using weighted complete estimating equations. Comput Stat Data Anal. 2022;174:107526.
McNamara ME, Zisser M, Beevers CG, Shumake J. Not just “big” data: importance of sample size, measurement error, and uninformative predictors for developing prognostic models for digital interventions. Behav Res Ther. 2022;153:104086.