A universal molecular control for DNA, mRNA and protein expression.


Journal

Nature Communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
20 Mar 2024
History:
received: 12 09 2022
accepted: 28 02 2024
medline: 21 3 2024
pubmed: 21 3 2024
entrez: 21 3 2024
Status: epublish

Abstract

The expression of genes encompasses their transcription into mRNA followed by translation into protein. In recent years, next-generation sequencing and mass spectrometry methods have profiled DNA, RNA and protein abundance in cells. However, there are currently no reference standards that are compatible across these genomic, transcriptomic and proteomic methods, and provide an integrated measure of gene expression. Here, we use synthetic biology principles to engineer a multi-omics control, termed pREF, that can act as a universal molecular standard for next-generation sequencing and mass spectrometry methods. The pREF sequence encodes 21 synthetic genes that can be in vitro transcribed into spike-in mRNA controls, and in vitro translated to generate matched protein controls. The synthetic genes provide qualitative controls that can measure sensitivity and quantitative accuracy of DNA, RNA and peptide detection. We demonstrate the use of pREF in metagenome DNA sequencing and RNA sequencing experiments and evaluate the quantification of proteins using mass spectrometry. Unlike previous spike-in controls, pREF can be independently propagated and the synthetic mRNA and protein controls can be sustainably prepared by recipient laboratories using common molecular biology techniques. Together, this provides a universal synthetic standard able to integrate genomic, transcriptomic and proteomic methods.
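As a rough illustration of how spike-in controls of known input amount can be used to assess sensitivity and quantitative accuracy, the sketch below fits observed signal against expected abundance on a log2 scale and reports the lowest input amount that was detected. It is a minimal sketch under stated assumptions, not the analysis used in the paper: the function names `loglog_fit` and `detection_limit` and the example values are hypothetical.

```python
import math


def loglog_fit(expected, observed):
    """Least-squares fit of log2(observed) on log2(expected).

    Returns (slope, r_squared). Controls with zero signal are skipped,
    because their logarithm is undefined.
    """
    pairs = [(math.log2(e), math.log2(o))
             for e, o in zip(expected, observed) if o > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two detected controls")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0:
        raise ValueError("expected amounts must not all be identical")
    slope = sxy / sxx
    # r_squared is reported as 0.0 when the observed signal does not vary.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, r_squared


def detection_limit(expected, observed):
    """Lowest expected amount with nonzero signal, or None if nothing was detected."""
    detected = [e for e, o in zip(expected, observed) if o > 0]
    return min(detected) if detected else None


# Hypothetical example: five controls spiked in at a two-fold dilution series.
expected_amounts = [1, 2, 4, 8, 16]      # assumed relative input amounts
observed_signal = [0, 20, 40, 80, 160]   # assumed read or peptide counts

slope, r_squared = loglog_fit(expected_amounts, observed_signal)
limit = detection_limit(expected_amounts, observed_signal)
```

A slope close to 1 with a high r_squared would suggest that measured abundance scales proportionally with input, while the detection limit gives a simple view of sensitivity.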

Identifiers

pubmed: 38509097
doi: 10.1038/s41467-024-46456-9
pii: 10.1038/s41467-024-46456-9

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

2480

Copyright information

© 2024. The Author(s).

Authors

Helen M Gunter (HM)

Australian Institute of Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Queensland, Australia.
BASE mRNA Facility, The University of Queensland, Brisbane, Queensland, Australia.
ARC Centre of Excellence in Synthetic Biology, The University of Queensland, Brisbane, Queensland, Australia.

Scott E Youlten (SE)

Department of Genetics, Yale University School of Medicine, New Haven, CT, 06510, USA.
Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
St Vincent's Clinical School, University of New South Wales, Sydney, New South Wales, Australia.

Andre L M Reis (ALM)

Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Sydney, New South Wales, Australia.
School of Electrical and Information Engineering, University of Sydney, Sydney, New South Wales, Australia.

Tim McCubbin (T)

Australian Institute of Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Queensland, Australia.
ARC Centre of Excellence in Synthetic Biology, The University of Queensland, Brisbane, Queensland, Australia.

Bindu Swapna Madala (BS)

Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Sydney, New South Wales, Australia.

Ted Wong (T)

Garvan Institute of Medical Research, Sydney, New South Wales, Australia.

Igor Stevanovski (I)

Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Sydney, New South Wales, Australia.

Arcadi Cipponi (A)

Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
St Vincent's Clinical School, University of New South Wales, Sydney, New South Wales, Australia.

Ira W Deveson (IW)

Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Sydney, New South Wales, Australia.
School of Electrical and Information Engineering, University of Sydney, Sydney, New South Wales, Australia.

Nadia S Santini (NS)

Centro Nacional de Investigación Disciplinaria en Conservación y Mejoramiento de Ecosistemas Forestales, INIFAP, Ciudad de México, 04010, Mexico.

Sarah Kummerfeld (S)

Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
St Vincent's Clinical School, University of New South Wales, Sydney, New South Wales, Australia.

Peter I Croucher (PI)

Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
St Vincent's Clinical School, University of New South Wales, Sydney, New South Wales, Australia.

Esteban Marcellin (E)

Australian Institute of Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Queensland, Australia.
ARC Centre of Excellence in Synthetic Biology, The University of Queensland, Brisbane, Queensland, Australia.

Tim R Mercer (TR)

Australian Institute of Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Queensland, Australia. t.mercer@uq.edu.au.
BASE mRNA Facility, The University of Queensland, Brisbane, Queensland, Australia. t.mercer@uq.edu.au.
ARC Centre of Excellence in Synthetic Biology, The University of Queensland, Brisbane, Queensland, Australia. t.mercer@uq.edu.au.
Garvan Institute of Medical Research, Sydney, New South Wales, Australia. t.mercer@uq.edu.au.

MeSH classifications