Uncovering the functional diversity of rare CRISPR-Cas systems with deep terascale clustering.
Journal
Science (New York, N.Y.)
ISSN: 1095-9203
Abbreviated title: Science
Country: United States
NLM ID: 0404511
Publication information
Publication date:
24 11 2023
History:
medline: 27 11 2023
pubmed: 23 11 2023
entrez: 23 11 2023
Status:
ppublish
Abstract
Microbial systems underpin many biotechnologies, including CRISPR, but the exponential growth of sequence databases makes it difficult to find previously unidentified systems. In this work, we develop the fast locality-sensitive hashing-based clustering (FLSHclust) algorithm, which performs deep clustering on massive datasets in linearithmic time. We incorporated FLSHclust into a CRISPR discovery pipeline and identified 188 previously unreported CRISPR-linked gene modules, revealing many additional biochemical functions coupled to adaptive immunity. We experimentally characterized three HNH nuclease-containing CRISPR systems, including the first type IV system with a specified interference mechanism, and engineered them for genome editing. We also identified and characterized a candidate type VII system, which we show acts on RNA. This work opens new avenues for harnessing CRISPR and for the broader exploration of the vast functional diversity of microbial proteins.
Identifiers
pubmed: 37995242
doi: 10.1126/science.adi1910
Chemical substances
CRISPR-Associated Proteins
0
RNA, Guide, CRISPR-Cas Systems
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM