GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data.
Journal
bioRxiv : the preprint server for biology
Abbreviated title: bioRxiv
Country: United States
NLM ID: 101680187
Publication information
Publication date:
07 Aug 2023
History:
pubmed: 31 Jan 2023
medline: 31 Jan 2023
entrez: 30 Jan 2023
Status:
epublish
Abstract
New large-scale initiatives, such as the Earth BioGenome Project, require efficient automatic tools for eukaryotic genome annotation. GeneMark-ETP, a new automatic tool presented here, finds genes by integrating genomic, transcriptomic, and protein-derived evidence. GeneMark-ETP first identifies genomic loci where extrinsic data is sufficient for 'high confidence' gene prediction and then proceeds to find the remaining genes across the whole genome. The initial parameters of the statistical model are estimated on a training set built from the high-confidence genes. The model parameters are then updated iteratively in cycles of gene prediction and parameter re-estimation. Upon reaching convergence, GeneMark-ETP makes the final prediction of the whole complement of genes. The algorithm was developed with a focus on large plant and animal genomes. GeneMark-ETP compared favorably with gene finders that use a single type of extrinsic evidence, derived either from short RNA reads (GeneMark-ET) or from homologous proteins mapped to the genome (GeneMark-EP+), an outcome that could be expected. Comparisons were also made with pipelines that use both transcript- and protein-derived extrinsic evidence: TSEBRA, which combines BRAKER1 and BRAKER2, and MAKER2. The results demonstrated that GeneMark-ETP delivered state-of-the-art gene prediction accuracy, with a large margin of improvement in large eukaryotic genomes.
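The training scheme described in the abstract (seed a model from high-confidence genes, then alternate prediction and parameter re-estimation until the predictions stop changing) can be illustrated with a toy sketch. This is not the GeneMark-ETP implementation: the zero-order nucleotide-frequency model, the log-odds threshold, and the names `estimate`, `log_odds`, and `self_train` are all assumptions made for illustration only.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def estimate(seqs, pseudocount=1.0):
    """Estimate nucleotide frequencies (toy zero-order model, an assumption)."""
    counts = Counter()
    for s in seqs:
        counts.update(ch for ch in s if ch in ALPHABET)
    total = sum(counts[a] for a in ALPHABET) + pseudocount * len(ALPHABET)
    return {a: (counts[a] + pseudocount) / total for a in ALPHABET}


def log_odds(seq, coding, background):
    """Log-likelihood ratio of a segment under the coding vs. background model."""
    return sum(math.log(coding[c] / background[c]) for c in seq if c in ALPHABET)


def self_train(candidates, high_confidence, max_iter=20):
    """Iterate prediction and re-estimation until the predicted set is stable.

    candidates: list of candidate segments (strings over ACGT).
    high_confidence: indices of segments assumed to be supported by
        extrinsic evidence; they seed the model and are always kept.
    Returns (set of predicted indices, number of iterations used).
    """
    background = estimate(candidates)
    predicted = set(high_confidence)
    for iteration in range(1, max_iter + 1):
        # Re-estimate the coding model on the current predictions.
        coding = estimate([candidates[i] for i in predicted])
        # Predict again with the updated model.
        new_predicted = set(high_confidence) | {
            i for i, s in enumerate(candidates)
            if log_odds(s, coding, background) > 0
        }
        if new_predicted == predicted:  # convergence: predictions unchanged
            return predicted, iteration
        predicted = new_predicted
    return predicted, max_iter


if __name__ == "__main__":
    segments = ["GCGCGGCC", "GCCGCGGC", "ATATTAAT", "GGCGCCGC", "TTATAATA"]
    genes, n_iter = self_train(segments, high_confidence={0})
    print(sorted(genes), n_iter)
```

In this sketch the high-confidence segments are never dropped from the predicted set, mirroring the idea that loci with sufficient extrinsic support anchor the training; how the actual tool treats them is not specified in the abstract.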
Identifiers
pubmed: 36711453
doi: 10.1101/2023.01.13.524024
pmc: PMC9882169
Publication types
Preprint
Languages
eng
Grants
Agency: NIGMS NIH HHS
ID: R01 GM128145
Country: United States
Conflict of interest statement
Conflict of interest statement. None declared.