Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space.
Journal
Cell genomics
ISSN: 2666-979X
Abbreviated title: Cell Genom
Country: United States
NLM ID: 9918284260106676
Publication information
Publication date:
12 Jan 2022
History:
entrez: 24 Feb 2022
pubmed: 25 Feb 2022
medline: 25 Feb 2022
Status:
ppublish
Abstract
The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) was developed to address a widespread community need for a unified computing environment for genomics data storage, management, and analysis. In this perspective, we present the AnVIL, describe its ecosystem and interoperability with other platforms, and highlight how this platform and associated initiatives contribute to improved genomic data sharing efforts. The AnVIL is a federated cloud platform designed to manage and store genomics and related data, enable population-scale analysis, and facilitate collaboration through the sharing of data, code, and analysis results. By inverting the traditional model of data sharing, the AnVIL eliminates the need for data movement, adds security measures for active threat detection and monitoring, and provides scalable, shared computing resources for any researcher. We describe the core data management and analysis components of the AnVIL, which currently include Terra, Gen3, Galaxy, RStudio/Bioconductor, Dockstore, and Jupyter, as well as several flagship genomics datasets available within the AnVIL. We continue to extend the AnVIL ecosystem with new capabilities, including mechanisms for interoperability and responsible data sharing, while streamlining access management. The AnVIL opens many new opportunities for analysis, collaboration, and data sharing that are needed to drive research and to make discoveries through the joint analysis of hundreds of thousands to millions of genomes along with associated clinical and molecular data types.
Identifiers
pubmed: 35199087
doi: 10.1016/j.xgen.2021.100085
pmc: PMC8863334
mid: NIHMS1772401
Publication types
Journal Article
Languages
eng
Grants
Agency: NHGRI NIH HHS
ID: U24 HG006620
Country: United States
Agency: NHGRI NIH HHS
ID: U24 HG010262
Country: United States
Agency: NHGRI NIH HHS
ID: U24 HG010263
Country: United States