Exploring and retrieving sequence and metadata for species across the tree of life with NCBI Datasets.


Journal

Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192

Publication information

Publication date:
05 Jul 2024
History:
received: 15 Jan 2024
accepted: 24 Jun 2024
medline: 06 Jul 2024
pubmed: 06 Jul 2024
entrez: 05 Jul 2024
Status: epublish

Abstract

To explore complex biological questions, it is often necessary to access various data types from public data repositories. As the volume and complexity of biological sequence data grow, public repositories face significant challenges in ensuring that the data is easily discoverable and usable by the biological research community. To address these challenges, the National Center for Biotechnology Information (NCBI) has created NCBI Datasets. This resource provides straightforward, comprehensive, and scalable access to biological sequences, annotations, and metadata for a wide range of taxa. Following the FAIR (Findable, Accessible, Interoperable, and Reusable) data management principles, NCBI Datasets offers user-friendly web interfaces, command-line tools, and documented APIs, empowering researchers to access NCBI data seamlessly. The data is delivered as packages of sequences and metadata, thus facilitating improved data retrieval, sharing, and usability in research. Moreover, this data delivery method fosters effective data attribution and promotes its further reuse. This paper outlines the current scope of data accessible through NCBI Datasets and explains various options for exploring and downloading the data.
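
As a rough illustration of the programmatic access mentioned in the abstract, the sketch below builds a download URL for a genome data package. It is not taken from the paper: the base URL, the `genome/accession/.../download` path, and the `include_annotation_type` parameter reflect our understanding of the NCBI Datasets v2 REST API and should be checked against the current API documentation. The function name `genome_package_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumption: base URL of the NCBI Datasets REST API (v2); verify against the API docs.
BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2"


def genome_package_url(accessions, include=("GENOME_FASTA",)):
    """Build a download URL for a genome data package.

    accessions: iterable of assembly accessions, e.g. ["GCF_000001405.40"].
    include: file types to request; the value names are assumptions taken
             from the API documentation, not from the paper.
    """
    # Percent-encode each accession and join them with commas in the path.
    joined = ",".join(quote(acc, safe="") for acc in accessions)
    # Repeat the query parameter once per requested file type.
    query = urlencode([("include_annotation_type", item) for item in include])
    return f"{BASE_URL}/genome/accession/{joined}/download?{query}"
```

Calling `genome_package_url(["GCF_000001405.40"])` only constructs the URL; fetching it (for example with `urllib.request.urlretrieve`) should return a zip archive containing the requested sequences together with their metadata, if the endpoint behaves as assumed.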

Identifiers

pubmed: 38969627
doi: 10.1038/s41597-024-03571-y
pii: 10.1038/s41597-024-03571-y

Publication types

Journal Article; Dataset

Languages

eng

Citation subsets

IM

Pagination

732

Copyright information

© 2024. This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply.

Authors

Nuala A O'Leary (NA)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA. olearyna@ncbi.nlm.nih.gov.

Eric Cox (E)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA.

J Bradley Holmes (JB)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA.

W Ray Anderson (WR)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA.

Robert Falk (R)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA.

Vichet Hem (V)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA.

Mirian T N Tsuchiya (MTN)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA.

Gregory D Schuler (GD)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA.

Xuan Zhang (X)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA.

John Torcivia (J)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA.

Anne Ketter (A)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA.

Laurie Breen (L)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA.

Jonathan Cothran (J)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA.

Hena Bajwa (H)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA.

Jovany Tinne (J)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA.

Peter A Meric (PA)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA.

Wratko Hlavina (W)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA.

Valerie A Schneider (VA)

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD, 20894, USA.
