CPIExtract: A software package to collect and harmonize small molecule and protein interactions.
Journal
bioRxiv: the preprint server for biology
ISSN: 2692-8205
Abbreviated title: bioRxiv
Country: United States
NLM ID: 101680187
Publication information
Publication date:
05 Jul 2024
History:
medline: 15 Jul 2024
pubmed: 15 Jul 2024
entrez: 15 Jul 2024
Status:
epublish
Abstract
Binding interactions between small molecules and proteins are the basis of cellular functions. Yet the available experimental data on compound-protein interactions are not harmonized into a single resource; they are scattered across multiple institutions, each maintaining databases with different formats. Extracting information from these sources remains challenging because of data heterogeneity. Here, we present CPIExtract (Compound-Protein Interaction Extract), a tool to interactively extract experimental binding interaction data from multiple databases, filter them, and harmonize the resulting information, thereby increasing the amount of compound-protein interaction data available for analysis. Compared with a single source, DrugBank, CPIExtract can collect more than 10 times as many annotations. The end user can apply custom filtering to the aggregated output and save it in any generic tabular file format suitable for downstream tasks such as network medicine analyses for drug repurposing and cross-validation of deep learning models. CPIExtract is an open-source Python package under an MIT license. It can be downloaded from https://github.com/menicgiulia/CPIExtract and https://pypi.org/project/cpiextract. The package can run on any standard desktop computer or computing cluster.
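The workflow the abstract describes (collect records from several sources, harmonize them into one schema, filter, and save a table) can be illustrated with a minimal sketch. This is not CPIExtract's API: the source names, column names, identifiers, affinity values, and the 10,000 nM threshold below are all hypothetical and chosen only for illustration, and the code uses just the Python standard library.

```python
import csv
import io

# Hypothetical records from two sources that use different column names
# and different affinity units. All values are made up for illustration.
SOURCE_A = [
    {"compound_id": "C1", "uniprot": "P00001", "Kd_nM": "50"},
    {"compound_id": "C2", "uniprot": "P00002", "Kd_nM": "20000"},
]
SOURCE_B = [
    {"cid": "C1", "protein": "P00001", "affinity_uM": "0.05"},
    {"cid": "C3", "protein": "P00003", "affinity_uM": "1.2"},
]


def harmonize(source_a, source_b):
    """Map both schemas onto one record per (compound, protein) pair."""
    normalized = [
        (r["compound_id"], r["uniprot"], float(r["Kd_nM"]), "source_a")
        for r in source_a
    ] + [
        # Convert micromolar to nanomolar so both sources share one unit.
        (r["cid"], r["protein"], float(r["affinity_uM"]) * 1000.0, "source_b")
        for r in source_b
    ]
    merged = {}
    for compound, protein, affinity_nm, source in normalized:
        entry = merged.setdefault(
            (compound, protein),
            {"compound": compound, "protein": protein,
             "affinities_nM": [], "sources": []},
        )
        entry["affinities_nM"].append(affinity_nm)
        entry["sources"].append(source)
    return list(merged.values())


def filter_pairs(pairs, max_affinity_nm=10_000.0):
    """Keep pairs whose strongest reported affinity is within the threshold."""
    return [p for p in pairs if min(p["affinities_nM"]) <= max_affinity_nm]


def to_csv(pairs):
    """Serialize the harmonized pairs as generic tabular (CSV) text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["compound", "protein", "best_affinity_nM", "n_sources"])
    for p in pairs:
        writer.writerow([p["compound"], p["protein"],
                         min(p["affinities_nM"]), len(set(p["sources"]))])
    return buffer.getvalue()


harmonized = harmonize(SOURCE_A, SOURCE_B)
kept = filter_pairs(harmonized)
csv_text = to_csv(kept)
```

In this toy example, the pair reported by both sources is merged into a single record that lists both sources, which is the sense in which aggregation can yield more annotations than any single database. For the package's actual functions and parameters, refer to the repository linked in the abstract.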
Identifiers
pubmed: 39005430
doi: 10.1101/2024.07.03.601957
pmc: PMC11245042
Publication types
Journal Article
Preprint
Languages
eng