A multiple alignment workflow shows the effect of repeat masking and parameter tuning on alignment in plants.


Journal

The Plant Genome
ISSN: 1940-3372
Abbreviated title: Plant Genome
Country: United States
NLM ID: 101273919

Publication information

Publication date:
06 2022
History:
received: 26 08 2021
accepted: 21 02 2022
pubmed: 14 4 2022
medline: 18 6 2022
entrez: 13 4 2022
Status: ppublish

Abstract

Alignments of multiple genomes are a cornerstone of comparative genomics, but generating these alignments remains technically challenging and often impractical. We developed the msa_pipeline workflow (https://bitbucket.org/bucklerlab/msa_pipeline) to allow practical and sensitive multiple alignment of diverged plant genomes and calculation of conservation scores with minimal user inputs. As high repeat content and genomic divergence are substantial challenges in plant genome alignment, we also explored the effect of different masking approaches and parameters of the LAST aligner using genome assemblies of 33 grass species. Compared with conventional masking with RepeatMasker, a masking approach based on k-mers (nucleotide sequences of k length) increased the alignment rate of coding sequence and noncoding functional regions by 25 and 14%, respectively. We further found that default alignment parameters generally perform well, but parameter tuning can increase the alignment rate for noncoding functional regions by over 52% compared with default LAST settings. Finally, by increasing alignment sensitivity from the default baseline, parameter tuning can increase the number of noncoding sites that can be scored for conservation by over 76%. Overall, tuning of masking and alignment parameters can generate optimized multiple alignments to drive biological discovery in plants.
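
To make the idea of k-mer-based masking concrete, the following is a minimal sketch and not the msa_pipeline implementation. It counts every k-mer in a sequence and soft-masks (lowercases) positions covered by k-mers that occur at least a given number of times. The function names `count_kmers` and `soft_mask_repeats` and the parameters `k` and `min_count` are hypothetical and chosen for illustration only; the abstract does not state which tool or thresholds the authors used, and a real genome would need a dedicated k-mer counter rather than an in-memory Python dictionary.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers in seq (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def soft_mask_repeats(seq: str, k: int, min_count: int) -> str:
    """Lowercase every base covered by a k-mer seen at least min_count times.

    Illustrative assumption: k-mer counts come from the sequence itself.
    Unmasked bases are returned in uppercase.
    """
    upper = seq.upper()
    counts = count_kmers(upper, k)
    masked = [False] * len(upper)
    for i in range(len(upper) - k + 1):
        if counts[upper[i:i + k]] >= min_count:
            # Mark all k positions spanned by this high-frequency k-mer.
            for j in range(i, i + k):
                masked[j] = True
    return "".join(
        base.lower() if is_masked else base
        for base, is_masked in zip(upper, masked)
    )


if __name__ == "__main__":
    # "ACGT" occurs three times, so its positions are soft-masked.
    print(soft_mask_repeats("ACGTACGTACGTTTGCA", k=4, min_count=3))
    # prints "acgtacgtacgtTTGCA"
```

Soft-masking keeps the original bases recoverable, which is why aligners can still extend alignments through masked regions while not seeding in them; whether and how LAST is configured to do so in the workflow is not described in the abstract.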

Identifiers

pubmed: 35416423
doi: 10.1002/tpg2.20204

Publication types

Journal Article; Research Support, U.S. Gov't, Non-P.H.S.; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e20204

Copyright information

© 2022 The Authors. The Plant Genome published by Wiley Periodicals LLC on behalf of Crop Science Society of America.

Authors

Yaoyao Wu (Y)

Institute for Genomic Diversity, Cornell Univ., Ithaca, NY, 14853, USA.
Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China.

Lynn Johnson (L)

Institute for Genomic Diversity, Cornell Univ., Ithaca, NY, 14853, USA.

Baoxing Song (B)

Institute for Genomic Diversity, Cornell Univ., Ithaca, NY, 14853, USA.

Cinta Romay (C)

Institute for Genomic Diversity, Cornell Univ., Ithaca, NY, 14853, USA.

Michelle Stitzer (M)

Institute for Genomic Diversity, Cornell Univ., Ithaca, NY, 14853, USA.
Dep. of Molecular Biology and Genetics, Cornell Univ., Ithaca, NY, 14853, USA.

Adam Siepel (A)

Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA.

Edward Buckler (E)

Institute for Genomic Diversity, Cornell Univ., Ithaca, NY, 14853, USA.
USDA-ARS, Ithaca, NY, 14853, USA.
Dep. of Molecular Biology and Genetics, Cornell Univ., Ithaca, NY, 14853, USA.

Armin Scheben (A)

Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA.
