A Geometric Clustering Tool (AGCT) to robustly unravel the inner cluster structures of time-series gene expressions.
Algorithms
Animals
Cell Cycle / genetics
Computational Biology / methods
Datasets as Topic
Gene Expression
Gene Expression Regulation / drug effects
Liver / drug effects
Markov Chains
Mice
Pentachlorophenol / pharmacology
Random Allocation
Saccharomyces cerevisiae / cytology
Software
Systems Biology / methods
Wavelet Analysis
Journal
PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date: 2020
History:
received: 2019-08-15
accepted: 2020-05-12
entrez: 2020-07-07
pubmed: 2020-07-07
medline: 2020-09-09
Status: epublish
Abstract
Systems biology aims at a holistic understanding of the complexity of biological systems. With gene expression measurements now broadly available, one of its challenges is to decipher the genetic machinery of the cell from such data. To help researchers reverse engineer this machinery from noisy datasets, interactive exploratory clustering methods, pipelines, and gene clustering tools have to be developed specifically. Prior methods and tools for time-series data, however, lack four major ingredients from an analytic and methodological viewpoint: (i) principled time-series feature extraction methods, (ii) a variety of manifold learning methods for capturing a high-level view of the dataset, (iii) high-end automatic structure extraction, and (iv) friendliness to the biological user community. To meet these requirements, we present AGCT (A Geometric Clustering Tool), a software package for unraveling the complex architecture of large-scale, not necessarily synchronized time-series gene expression data. AGCT captures signals on exhaustive wavelet expansions of the data, which are then embedded on a low-dimensional non-linear map using manifold learning algorithms, where geometric proximity captures potential interactions. Post-processing techniques, including hard and soft information-geometric clustering algorithms, help summarize the complete map as a smaller number of principal factors, which can then be formally identified using embedded statistical inference techniques. Three-dimensional interactive visualization and scenario recording over the processing help reproduce data analysis results without additional time.
Analysis of the whole-cell Yeast Metabolic Cycle (YMC) and Yeast Cell Cycle (YCC) datasets demonstrates AGCT's ability to accurately dissect all stages of metabolism and of cell cycle progression, independently of the time course and of the number of patterns related to the signal. Analysis of a pentachlorophenol-induced dataset demonstrates how AGCT dissects the data to identify two networks: the interferon signaling and NRF2 signaling networks.
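The idea of extracting wavelet features from expression time series and reading geometric proximity as potential interaction can be illustrated with a minimal sketch. It is not AGCT's implementation: the abstract does not say which wavelet family AGCT uses, so the Haar wavelet is an assumption here, the manifold learning and clustering stages are omitted, and the function names and toy profiles are hypothetical.

```python
import math


def haar_coefficients(series):
    """Orthonormal Haar decomposition of a series whose length is a power of two.

    Returns [scaled overall average, coarsest detail, ..., finest details].
    """
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("series length must be a power of two")
    approx = [float(x) for x in series]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # One Haar step: scaled differences (detail) and scaled sums (approximation).
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        details = detail + details  # coarser levels are placed in front
    return approx + details


def feature_distance(u, v):
    """Euclidean distance between coefficient vectors, skipping the first
    coefficient so that a constant baseline shift is ignored."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u[1:], v[1:])))


# Hypothetical toy expression profiles (8 time points each).
gene_a = [1, 3, 1, 3, 1, 3, 1, 3]          # fast oscillation
gene_b = [11, 13, 11, 13, 11, 13, 11, 13]  # same oscillation, higher baseline
gene_c = [1, 1, 3, 3, 1, 1, 3, 3]          # slower oscillation

features = {
    "gene_a": haar_coefficients(gene_a),
    "gene_b": haar_coefficients(gene_b),
    "gene_c": haar_coefficients(gene_c),
}

dist_ab = feature_distance(features["gene_a"], features["gene_b"])
dist_ac = feature_distance(features["gene_a"], features["gene_c"])
print(f"a-b: {dist_ab:.3f}, a-c: {dist_ac:.3f}")
```

In this toy setting, gene_a and gene_b differ only by a baseline shift and end up at zero distance in the detail coefficients, whereas gene_c oscillates at a different scale and lands farther away. A full pipeline in the spirit of the abstract would pass such feature vectors to a manifold learning step and a clustering step.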
Identifiers
pubmed: 32628677
doi: 10.1371/journal.pone.0233755
pii: PONE-D-19-22852
pmc: PMC7337352
Chemical substances
Pentachlorophenol
D9BSU0SE4T
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
e0233755
Conflict of interest statement
The commercial affiliation of the authors does not alter our adherence to PLOS ONE policies on sharing data and materials. The authors have declared that no competing interests exist.