MatKG: An autonomously generated knowledge graph in Material Science.
Journal
Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192
Publication information
Publication date:
17 Feb 2024
History:
received: 21 Nov 2023
accepted: 01 Feb 2024
medline: 19 Feb 2024
pubmed: 18 Feb 2024
entrez: 17 Feb 2024
Status:
epublish
Abstract
In this paper, we present MatKG, a knowledge graph in materials science that offers a repository of entities and relationships extracted from scientific literature. Using advanced natural language processing techniques, MatKG includes an array of entities, including materials, properties, applications, characterization and synthesis methods, descriptors, and symmetry phase labels. The graph is formulated based on statistical metrics, encompassing over 70,000 entities and 5.4 million unique triples. To enhance accessibility and utility, we have serialized MatKG in both CSV and RDF formats and made these, along with the code base, available to the research community. As the largest knowledge graph in materials science to date, MatKG provides structured organization of domain-specific data. Its deployment holds promise for various applications, including material discovery, recommendation systems, and advanced analytics.
Identifiers
pubmed: 38368452
doi: 10.1038/s41597-024-03039-z
pii: 10.1038/s41597-024-03039-z
pmc: PMC10874416
Publication types
Dataset
Journal Article
Languages
eng
Citation subsets
IM
Pagination
217
Copyright information
© 2024. The Author(s).