Code4ML: a large-scale dataset of annotated Machine Learning code.

Jupyter code snippets ML code dataset

Journal

PeerJ. Computer science
ISSN: 2376-5992
Abbreviated title: PeerJ Comput Sci
Country: United States
NLM ID: 101660598

Publication information

Publication date:
2023
History:
received: 05 10 2022
accepted: 09 01 2023
medline: 22 6 2023
pubmed: 22 6 2023
entrez: 22 6 2023
Status: epublish

Abstract

Data scientists increasingly use program code as a data source, for purposes ranging from the semantic classification of code to the automatic generation of programs. However, machine learning models are of limited use on code snippets that have not been annotated. To address the lack of annotated datasets, we present the Code4ML

Identifiers

pubmed: 37346615
doi: 10.7717/peerj-cs.1230
pii: cs-1230
pmc: PMC10280557

Publication types

Journal Article

Languages

eng

Pagination

e1230

Copyright information

© 2023 Drozdova et al.

Conflict of interest statement

The authors declare that they have no competing interests.

Authors

Anastasia Drozdova (A)

Department of Computer Science, NRU Higher School of Economics, Moscow, Russia.

Ekaterina Trofimova (E)

Department of Computer Science, NRU Higher School of Economics, Moscow, Russia.

Polina Guseva (P)

Department of Computer Science, NRU Higher School of Economics, Moscow, Russia.

Anna Scherbakova (A)

Department of Computer Science, NRU Higher School of Economics, Moscow, Russia.

Andrey Ustyuzhanin (A)

Department of Computer Science, NRU Higher School of Economics, Moscow, Russia.
National University of Science and Technology MISIS, Moscow, Russia.
Constructor University, Bremen, Germany.
Institute for Functional Intelligent Materials, National University of Singapore, Singapore.

MeSH classifications