A Transferable Machine Learning Framework for Predicting Transcriptional Responses of Genes Across Species.

Keywords: Dinucleotide frequency; Gene annotation; Grasses; Machine learning; Random forest; Transfer learning

Journal

Methods in molecular biology (Clifton, N.J.)
ISSN: 1940-6029
Abbreviated title: Methods Mol Biol
Country: United States
NLM ID: 9214969

Publication information

Publication date:
2023
History:
medline: 11 9 2023
pubmed: 8 9 2023
entrez: 8 9 2023
Status: ppublish

Abstract

Leveraging existing resources in studied species to predict gene functions has the potential to rapidly expand understanding of annotated genes in other, less well-studied, species with assembled genomes. However, orthology is not a reliable predictor for the transcriptional responses of genes to stress. Machine learning methods can quantitatively estimate expression patterns and gene functions using known annotations and collections of features describing each gene. In this chapter, we describe a supervised machine learning framework to predict stress-responsive genes across species using only features derived from nucleotide sequences, using the example of cold stress-responsive genes in different Panicoid grass species.
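As an illustration of what "features derived from nucleotide sequences" can look like, the sketch below computes dinucleotide frequencies, one of the keywords listed for this chapter. It is a minimal example and not the chapter's actual pipeline: the function name, the choice to skip pairs containing ambiguous bases, and the input sequence are assumptions made here for illustration. The abstract does not specify which gene regions or window sizes are used.

```python
from itertools import product

# The 16 possible dinucleotides in a fixed order (AA, AC, ..., TT),
# so every gene yields a feature vector with the same layout.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_frequencies(sequence):
    """Return 16 relative dinucleotide frequencies for a DNA sequence.

    Hypothetical helper for illustration only. Overlapping pairs are
    counted; pairs containing characters other than A, C, G, T (for
    example N) are skipped, which is an assumption of this sketch.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # No valid pair found: return an all-zero vector.
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


# Example with a made-up sequence.
features = dinucleotide_frequencies("ACGTACGT")
```

Vectors of this kind, one per gene, could then serve as input to a supervised classifier such as a random forest, with labels taken from genes whose stress response is already known in a well-studied species.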

Identifiers

pubmed: 37682485
doi: 10.1007/978-1-0716-3354-0_21

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

361-379

Copyright information

© 2023. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.


Authors

Zhikai Liang (Z)

Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, MN, USA.

Xiaoxi Meng (X)

Department of Horticultural Science, University of Minnesota, Saint Paul, MN, USA.

James C Schnable (JC)

Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, USA. schnable@unl.edu.
Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, USA. schnable@unl.edu.
