Detection of PatIent-Level distances from single cell genomics and pathomics data with Optimal Transport (PILOT).
Clustering
Disease Progression
Multi-scale Analysis
Optimal Transport
Wasserstein Distance
Journal
Molecular systems biology
ISSN: 1744-4292
Abbreviated title: Mol Syst Biol
Country: England
NLM ID: 101235389
Publication information
Publication date:
19 Dec 2023
History:
received: 24 Aug 2023
accepted: 24 Nov 2023
revised: 20 Nov 2023
medline: 5 Jan 2024
pubmed: 5 Jan 2024
entrez: 4 Jan 2024
Status:
aheadofprint
Abstract
Although clinical applications represent the next challenge in single-cell genomics and digital pathology, we still lack computational methods to analyze single-cell or pathomics data to find sample-level trajectories or clusters associated with diseases. This remains challenging as single-cell/pathomics data are multi-scale, i.e., a sample is represented by clusters of cells/structures, and samples cannot be easily compared with each other. Here we propose PatIent Level analysis with Optimal Transport (PILOT). PILOT uses optimal transport to compute the Wasserstein distance between two individual single-cell samples. This allows us to perform unsupervised analysis at the sample level and uncover trajectories or cellular clusters associated with disease progression. We evaluate PILOT and competing approaches in single-cell genomics or pathomics studies involving various human diseases with up to 600 samples/patients and millions of cells or tissue structures. Our results demonstrate that PILOT detects disease-associated samples from large and complex single-cell or pathomics data. Moreover, PILOT provides a statistical approach to find changes in cell populations, gene expression, and tissue structures related to the trajectories or clusters supporting interpretation of predictions.
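As a minimal sketch of the core idea, and not the authors' implementation, the following Python snippet computes the Wasserstein distance between two samples, each summarized as a vector of cell-cluster proportions. The linear-programming formulation via `scipy.optimize.linprog`, the function name `wasserstein_distance`, the toy proportions, and the ground-cost matrix `cost` are all illustrative assumptions; the abstract does not specify how PILOT defines the cost between clusters.

```python
import numpy as np
from scipy.optimize import linprog


def wasserstein_distance(a, b, cost):
    """Exact optimal transport cost between two discrete distributions.

    a, b : 1-D arrays of cluster proportions (each assumed to sum to 1)
    cost : (len(a), len(b)) matrix of ground costs between clusters
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cost = np.asarray(cost, dtype=float)
    n, m = cost.shape

    # Marginal constraints on the transport plan P, flattened row-major:
    # row i of P must sum to a[i], column j of P must sum to b[j].
    row_sums = np.kron(np.eye(n), np.ones((1, m)))
    col_sums = np.kron(np.ones((1, n)), np.eye(m))
    A_eq = np.vstack([row_sums, col_sums])
    b_eq = np.concatenate([a, b])

    # Minimize sum_ij cost[i, j] * P[i, j] subject to P >= 0 and the marginals.
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.fun)


# Toy example with three cell clusters; cost[i, j] is a hypothetical
# dissimilarity between clusters i and j (here simply |i - j|).
cost = np.abs(np.subtract.outer(np.arange(3), np.arange(3))).astype(float)
sample_a = [0.5, 0.5, 0.0]  # hypothetical cluster proportions, sample A
sample_b = [0.0, 0.5, 0.5]  # hypothetical cluster proportions, sample B
distance_ab = wasserstein_distance(sample_a, sample_b, cost)
print(distance_ab)
```

Computing this quantity for every pair of samples gives a sample-by-sample distance matrix, which is the kind of input on which sample-level clustering or trajectory inference can then operate.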
Identifiers
pubmed: 38177382
doi: 10.1038/s44320-023-00003-8
pii: 10.1038/s44320-023-00003-8
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: Deutsche Forschungsgemeinschaft (DFG)
ID: KFO 5011
Agency: Deutsche Forschungsgemeinschaft (DFG)
ID: 322900939, 454024652, 432698239, 445703531
Agency: Deutsche Forschungsgemeinschaft (DFG)
ID: DFG-GE2811/3
Agency: Bundesministerium für Bildung und Forschung (BMBF)
ID: E:med Consortia Fibromap
Agency: Bundesministerium für Bildung und Forschung (BMBF)
ID: STOP-FSGS-01GM2202C
Agency: EC | ERC | HORIZON EUROPE European Research Council (ERC)
ID: No 101001791
Copyright information
© 2023. The Author(s).