Scalable search of massively pooled nucleic acid samples enabled by a molecular database query language.


Journal

medRxiv : the preprint server for health sciences
Abbreviated title: medRxiv
Country: United States
NLM ID: 101767986

Publication information

Publication date:
15 Apr 2024
History:
medline: 3 May 2024
pubmed: 3 May 2024
entrez: 3 May 2024
Status: epublish

Abstract

The surge in nucleic acid analytics requires scalable storage and retrieval systems akin to electronic databases used to organize digital data. Such a system could transform disease diagnosis, ecological preservation, and molecular surveillance of biothreats. Current storage systems use individual containers for nucleic acid samples, requiring single-sample retrieval that falls short compared with digital databases that allow complex and combinatorial data retrieval on aggregated data. Here, we leverage protective microcapsules with combinatorial DNA labeling that enables arbitrary retrieval on pooled biosamples analogous to Structured Query Languages. Ninety-six encapsulated pooled mock SARS-CoV-2 genomic samples barcoded with patient metadata are used to demonstrate queries with simultaneous matches to sample collection date ranges, locations, and patient health statuses, illustrating how such flexible queries can be used to yield immunological or epidemiological insights. The approach applies to any biosample database labeled with orthogonal barcodes, enabling complex post-hoc analysis, for example, to study global biothreat epidemiology.
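
As a purely digital analogy for the kind of combinatorial query the abstract describes, the following minimal sketch treats each encapsulated sample as a set of metadata barcodes and selects the samples that match every condition at once. It is not the authors' molecular protocol: the `Capsule` class, the `select` function, the barcode strings, and the mock locations, months, and health statuses are all hypothetical, chosen only so that the pool contains ninety-six samples as in the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from itertools import product


@dataclass(frozen=True)
class Capsule:
    """Hypothetical stand-in for one encapsulated, barcoded sample."""
    sample_id: int
    collected: date
    location: str
    status: str

    def barcodes(self) -> frozenset[str]:
        # One label per metadata field; the naming scheme is an assumption.
        return frozenset({
            f"month:{self.collected:%Y-%m}",
            f"loc:{self.location}",
            f"status:{self.status}",
        })


def select(pool: list[Capsule], groups: list[set[str]]) -> list[Capsule]:
    """Keep capsules carrying at least one barcode from EVERY group.

    Each group acts like an SQL `IN (...)` clause (logical OR);
    the groups are combined with logical AND.
    """
    return [c for c in pool if all(c.barcodes() & g for g in groups)]


# Mock pool: 4 months x 4 locations x 3 statuses x 2 replicates = 96 samples.
months = [date(2021, m, 1) for m in (1, 2, 3, 4)]
locations = ["Boston", "Chicago", "Denver", "Seattle"]
statuses = ["asymptomatic", "mild", "hospitalized"]

pool = [
    Capsule(i, month, loc, status)
    for i, (month, loc, status, _rep) in enumerate(
        product(months, locations, statuses, range(2))
    )
]

# Roughly: WHERE month IN (Jan, Feb) AND location = 'Boston'
#          AND status = 'hospitalized'
query = [
    {"month:2021-01", "month:2021-02"},  # collection date range
    {"loc:Boston"},                      # location
    {"status:hospitalized"},             # patient health status
]

matches = select(pool, query)
print(len(pool), len(matches))
```

Expressing the date range as a set of month barcodes is a simplification made for this sketch; how ranges are actually encoded on the capsules is not stated in the abstract.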

Identifiers

pubmed: 38699348
doi: 10.1101/2024.04.12.24305660
pmc: PMC11064994

Publication types

Preprint

Languages

eng

Authors

MeSH classifications