Automated annotation of human centromeres with HORmon.
Journal
Genome research
ISSN: 1549-5469
Abbreviated title: Genome Res
Country: United States
NLM ID: 9518021
Publication information
Publication date:
06 2022
History:
received: 2021-11-03
accepted: 2022-05-06
pubmed: 2022-05-12
medline: 2022-06-29
entrez: 2022-05-11
Status:
ppublish
Abstract
Recent advances in long-read sequencing have made it possible to address long-standing questions about the architecture and evolution of human centromeres. They have also emphasized the need for centromere annotation (partitioning human centromeres into monomers and higher-order repeats [HORs]). Despite a half-century-long series of semi-manual studies of centromere architecture, a rigorous centromere annotation algorithm is still lacking. Moreover, automated centromere annotation is a prerequisite for studies of genetic diseases associated with centromeres and for evolutionary studies of centromeres across multiple species. Although monomer decomposition (transforming a centromere into a monocentromere written in the monomer alphabet) and HOR decomposition (representing a monocentromere in the alphabet of HORs) are currently viewed as two separate problems, we show that they should be integrated into a single framework in which HOR inference affects monomer inference and vice versa. We thus developed the HORmon algorithm, which integrates monomer and HOR inference and automatically generates human monomers and HORs that are largely consistent with the previous semi-manual inference.
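To make the notion of HOR decomposition concrete, the following toy sketch rewrites a monocentromere (a sequence of monomer labels) in an alphabet that contains one HOR symbol. It is only an illustration under strong assumptions, not the HORmon algorithm: the function name `hor_decompose`, the single-letter monomer labels, the exact-match rule, and the `"HOR"` symbol are all hypothetical.

```python
from typing import List


def hor_decompose(monocentromere: List[str], hor: List[str]) -> List[str]:
    """Greedily replace each complete, exact copy of `hor` with the symbol "HOR".

    Toy model only: real centromeres contain diverged and partial HOR copies,
    which this exact-match scan leaves as individual monomers.
    """
    k = len(hor)
    if k == 0:
        raise ValueError("hor must contain at least one monomer")
    out: List[str] = []
    i = 0
    while i < len(monocentromere):
        if monocentromere[i:i + k] == hor:
            out.append("HOR")  # a full HOR copy starts at position i
            i += k
        else:
            out.append(monocentromere[i])  # keep the monomer as-is
            i += 1
    return out


# Hypothetical monocentromere over monomers A, B, C, D with the HOR "ABC".
mono = list("ABCABCABDABC")
print(hor_decompose(mono, list("ABC")))
```

In this toy input the interrupted copy `A B D` stays in the monomer alphabet, which hints at why the choice of monomers and the choice of HORs cannot be made fully independently.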
Identifiers
pubmed: 35545449
pii: gr.276362.121
doi: 10.1101/gr.276362.121
pmc: PMC9248890
Publication types
Journal Article
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
1137-1151
Copyright information
© 2022 Kunyavskaya et al.; Published by Cold Spring Harbor Laboratory Press.