Type2 diabetes mellitus prediction using data mining algorithms based on the long-noncoding RNAs expression: a comparison of four data mining approaches.


Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
27 Aug 2020
History:
received: 13 Dec 2019
accepted: 21 Aug 2020
entrez: 29 Aug 2020
pubmed: 29 Aug 2020
medline: 30 Sep 2020
Status: epublish

Abstract

BACKGROUND
About 90% of patients with diabetes have type 2 diabetes mellitus (T2DM). Many studies suggest that long noncoding RNAs (lncRNAs) play a significant role in T2DM and could be used to improve its diagnosis. Machine learning and data mining techniques can improve the analysis and interpretation of data and the extraction of knowledge from it, and may thereby enhance the prognosis and diagnosis of diseases such as T2DM. We applied four classification models, K-nearest neighbor (KNN), support vector machine (SVM), logistic regression, and artificial neural networks (ANN), to the diagnosis of T2DM and compared their diagnostic power. We ran the algorithms on six lncRNA variables (LINC00523, LINC00995, HCG27_201, TPT1-AS1, LY86-AS1, DKFZP) and demographic data.
RESULTS
To select the best-performing model, we considered the area under the ROC curve (AUC), sensitivity, and specificity, plotted the ROC curves, and showed the average curve and its range. For KNN, the mean AUC was 91% with a standard deviation (SD) of 0.09, and the mean sensitivity and specificity were 96% and 85%, respectively. For SVM, the mean AUC under stratified 10-fold cross-validation was 95% (SD 0.05), and the mean sensitivity and specificity were 95% and 86%, respectively. For ANN, the mean AUC was 93% (SD 0.03), and the mean sensitivity and specificity were 78% and 85%, respectively. Finally, for logistic regression, the mean AUC was 95% (SD 0.05), and the mean sensitivity and specificity were 92% and 85%, respectively. According to the ROC curves, logistic regression and SVM had a larger area under the curve than the other two algorithms.
CONCLUSIONS
We aimed to find the best data mining approach for predicting T2DM from the expression of six lncRNAs. SVM and logistic regression achieved the highest mean AUC; KNN and ANN also had high mean AUCs and small standard deviations of the AUC scores. KNN had the highest mean sensitivity, and SVM had the highest specificity. These results could improve our knowledge of the early detection and diagnosis of T2DM using lncRNAs as biomarkers.
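The evaluation quantities named in the abstract (AUC, sensitivity, specificity, and their mean and SD over stratified folds) can be illustrated with a minimal Python sketch. It is not the authors' code: the data are synthetic, the classifier scores are hypothetical random draws, fitting a model on the training folds is omitted, and the 0.5 decision threshold, the helper names, and the use of the sample SD are all assumptions made for illustration.

```python
import random
import statistics


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for labels coded 1 = case, 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc_score(y_true, scores):
    """AUC as the fraction of (case, control) pairs in which the case
    receives the higher score; tied pairs count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def demo():
    rng = random.Random(42)
    # Synthetic stand-in data: 50 cases and 50 controls (NOT the study data).
    y = [1] * 50 + [0] * 50
    # Hypothetical classifier scores; cases tend to score higher.
    scores = [rng.gauss(1.0 if t == 1 else 0.0, 1.0) for t in y]

    aucs, sens, spec = [], [], []
    for fold in stratified_folds(y, k=10):
        y_fold = [y[i] for i in fold]
        s_fold = [scores[i] for i in fold]
        # Assumed decision threshold of 0.5 on the score.
        pred = [1 if s >= 0.5 else 0 for s in s_fold]
        aucs.append(auc_score(y_fold, s_fold))
        se, sp = sensitivity_specificity(y_fold, pred)
        sens.append(se)
        spec.append(sp)

    # Sample SD across folds (an assumption; the abstract does not say which SD).
    print(f"mean AUC = {statistics.mean(aucs):.2f}, SD = {statistics.stdev(aucs):.2f}")
    print(f"mean sensitivity = {statistics.mean(sens):.2f}")
    print(f"mean specificity = {statistics.mean(spec):.2f}")


if __name__ == "__main__":
    demo()
```

In a real comparison, each of the four classifiers would be fitted on the nine training folds and scored on the held-out fold, and the per-fold metrics would then be summarized as above.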

Identifiers

pubmed: 32854616
doi: 10.1186/s12859-020-03719-8
pii: 10.1186/s12859-020-03719-8
pmc: PMC7451240

Chemical substances

Biomarkers
RNA, Long Noncoding
TPT1 protein, human
Tumor Protein, Translationally-Controlled 1
RNA, Antisense

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

372

Authors

Faranak Kazerouni (F)

Department of Laboratory Medicine, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

Azadeh Bayani (A)

Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

Farkhondeh Asadi (F)

Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran. Asadifar@sbmu.ac.ir.

Leyla Saeidi (L)

Department of Clinical Biochemistry, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran.

Nasrin Parvizi (N)

Department of Genetics, Faculty of Medicine, Babol University of Medical Sciences, Babol, Iran.

Zahra Mansoori (Z)

Department of Laboratory Medicine, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
