Analysis of the microarray gene expression for breast cancer progression after the application modified logistic regression.
MeSH terms:
Biomarkers, Tumor/genetics
Breast Neoplasms/genetics
Cell Line, Tumor
Disease Progression
Female
Gene Expression Profiling/methods
Gene Expression Regulation, Neoplastic/genetics
Gene Regulatory Networks/genetics
Humans
Logistic Models
MCF-7 Cells
Oligonucleotide Array Sequence Analysis/methods
Transcription Factors/genetics
Keywords:
GRN
MCF-7
New logistic regression-based model
Oncogenic
Samples
TFs
Tumor classification
Journal: Gene
ISSN: 1879-0038
Abbreviated title: Gene
Country: Netherlands
NLM ID: 7706761
Publication information
Publication date: 05 Feb 2020
History:
received: 05 Jul 2019
revised: 21 Sep 2019
accepted: 11 Oct 2019
pubmed: 25 Nov 2019
medline: 07 Jan 2020
entrez: 25 Nov 2019
Status: ppublish
Abstract
Methods based on statistics and linear algebra have been increasingly used to address emerging questions in the microarray literature. Microarray technology is a long-established tool for the global analysis of gene expression, allowing hundreds or thousands of genes in a sample to be investigated simultaneously. Microarray data are characterized by a small sample size and a large number of features, which produce a non-square data matrix. This matrix is also rank-deficient, so a classifier fitted to it can admit countless solutions. To avoid this 'curse of dimensionality', many authors have performed feature selection or otherwise reduced the size of the data matrix. In this work, we introduce a new logistic regression-based model that classifies breast cancer tumor samples from microarray expression data using all gene expression features, without reducing the data matrix. If the user still deems feature reduction necessary, it can be performed after the methodology has been applied, and good classification is maintained. The methodology correctly classified breast cancer samples from the Gene Expression Omnibus (GEO) data series GSE65194, GSE20711, and GSE25055. Classification achieved a minimum of 80% sensitivity and specificity across all possible data combinations, including breast cancer subtypes. The methodology also highlighted genes not yet studied in breast cancer, some of which have been observed in Gene Regulatory Networks (GRNs). We examine the patterns and features of a GRN composed of transcription factors (TFs) in the MCF-7 breast cancer cell line, which provides valuable information regarding breast cancer. In particular, genes whose associated αi* parameters took extreme positive or negative values can be identified as breast cancer prediction genes.
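The two quantitative ideas in the abstract can be illustrated with a minimal NumPy sketch: a samples-by-genes matrix that has fewer rows than columns is rank-deficient, and results are judged against a sensitivity/specificity criterion. The sketch is not the authors' model. It uses random toy data rather than the GEO series, and `sensitivity_specificity` is a hypothetical helper name. When there are fewer samples than genes, adding any null-space vector to a coefficient vector leaves the linear predictor X @ w unchanged. Any logistic-regression likelihood built on that predictor is therefore unchanged as well, which is why the fitted parameters are not unique. The sketch then computes sensitivity and specificity for a hypothetical set of predictions.

```python
import numpy as np


def sensitivity_specificity(y_true, y_pred):
    """Hypothetical helper: (sensitivity, specificity) for binary labels, 1 = tumor (positive class)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # tumors correctly called tumor
    fn = np.sum((y_true == 1) & (y_pred == 0))  # tumors missed
    tn = np.sum((y_true == 0) & (y_pred == 0))  # non-tumors correctly called non-tumor
    fp = np.sum((y_true == 0) & (y_pred == 1))  # non-tumors wrongly called tumor
    return float(tp / (tp + fn)), float(tn / (tn + fp))


# Hypothetical toy matrix: 10 samples x 1000 "genes" of random values (not GEO data).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 1000))
rank = np.linalg.matrix_rank(X)  # cannot exceed the number of samples (10)

# Right singular vectors beyond the rank span the null space of X.
_, _, vt = np.linalg.svd(X)
null_vec = vt[-1]

# Two different coefficient vectors give the same linear predictor X @ w,
# so a logistic model cannot distinguish them from these data alone.
w = rng.normal(size=1000)
w_alt = w + 5.0 * null_vec
same_predictor = np.allclose(X @ w, X @ w_alt)

# Hypothetical labels and predictions for 5 tumor and 5 non-tumor samples.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)

print(rank, same_predictor)
print(sens, spec, sens >= 0.8 and spec >= 0.8)
```

A penalty term such as a ridge or lasso penalty is one common way to select a unique solution in this setting. The abstract does not state which mechanism, if any, the authors' model uses to resolve this non-uniqueness.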
Our results indicate that the PKN2, MKL1, MED23, CUL5 and GLI genes exhibit a tumor suppressor profile, and that the MTR, ITGA2B, TELO2, MRPL9, MTTL1, WIPI1, KLHL20, PI4KB, FOLR1 and SHC1 genes exhibit an oncogenic profile. We propose these genes as potential breast cancer prediction genes that should be prioritized in further clinical studies of breast cancer. The new model assigns values to the αi* parameters associated with gene expression. Some of these parameters are associated with genes previously described as breast cancer biomarkers, and others with genes not yet studied in relation to this disease.
Identifiers
pubmed: 31759986
pii: S0378-1119(19)30827-3
doi: 10.1016/j.gene.2019.144168
Chemical substances:
Biomarkers, Tumor (registry number: 0)
Transcription Factors (registry number: 0)
Publication types: Journal Article
Languages: eng
Citation subsets: IM
Pagination: 144168
Copyright information
Copyright © 2019 Elsevier B.V. All rights reserved.