AiiDA 1.0, a scalable computational infrastructure for automated reproducible workflows and data provenance.
Journal
Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192
Publication information
Publication date:
08 09 2020
History:
received: 24 03 2020
accepted: 19 08 2020
entrez: 9 9 2020
pubmed: 10 9 2020
medline: 10 9 2020
Status:
epublish
Abstract
The ever-growing availability of computing power and the sustained development of advanced computational methods have contributed much to recent scientific progress. These developments present new challenges driven by the sheer number of calculations and amount of data to manage. Next-generation exascale supercomputers will intensify these challenges, such that automated and scalable solutions become crucial. In recent years, we have been developing AiiDA (aiida.net), a robust open-source high-throughput infrastructure addressing the challenges arising from the needs of automated workflow management and data provenance recording. Here, we introduce developments and capabilities required to reach sustained performance, with AiiDA supporting throughputs of tens of thousands of processes per hour, while automatically preserving and storing the full data provenance in a relational database, making it queryable and traversable and thus enabling high-performance data analytics. AiiDA's workflow language provides advanced automation, error-handling features and a flexible plugin model to allow interfacing with external simulation software. The associated plugin registry enables seamless sharing of extensions, empowering a vibrant user community dedicated to making simulations more robust, user-friendly and reproducible.
Identifiers
pubmed: 32901044
doi: 10.1038/s41597-020-00638-4
pii: 10.1038/s41597-020-00638-4
pmc: PMC7479590
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
300