Accurate Prediction of Protein Sequences for Proteogenomics Data Integration.

DNA/RNA next-generation sequencing; Genomics; Mass spectrometry; Proteogenomics; Proteomics

Journal

Methods in molecular biology (Clifton, N.J.)
ISSN: 1940-6029
Abbreviated title: Methods Mol Biol
Country: United States
NLM ID: 9214969

Publication information

Publication date:
2022
History:
entrez: 14 Dec 2021
pubmed: 15 Dec 2021
medline: 29 Jan 2022
Status: ppublish

Abstract

This book chapter discusses proteogenomics data integration and gives an overview of the different omics layers involved in defining the proteome of a living organism. It covers aspects of genome variability that affect either the sequence or the abundance of proteins, such as the effects of single-nucleotide variants and larger genomic structural variants on the proteome. Next, sequencing technologies, including short- and long-read sequencing, are introduced from a proteogenomics data integration perspective, together with their respective advantages and shortcomings for accurate protein variant prediction from genomic and transcriptomic sequencing data. Finally, the bioinformatics tools used to process and analyze DNA/RNA sequencing data are discussed, with the ultimate goal of obtaining accurately predicted sample-specific protein sequences that can be used as a drop-in replacement in existing approaches to peptide and protein identification with popular database search engines such as MSFragger and SearchGUI/PeptideShaker.
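As an illustration of how a sequence variant propagates to a sample-specific protein sequence, the following is a minimal Python sketch and not the chapter's own pipeline. It assumes a toy coding sequence, a single hypothetical single-nucleotide variant given as a 1-based position within that coding sequence, and the standard genetic code. The function names (`translate`, `apply_snv`, `to_fasta`) and the FASTA header are invented for this example.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_snv(cds: str, pos: int, ref: str, alt: str) -> str:
    """Apply a single-nucleotide variant at a 1-based CDS position (hypothetical input format)."""
    if cds[pos - 1] != ref:
        raise ValueError(f"reference base mismatch at position {pos}")
    return cds[:pos - 1] + alt + cds[pos:]


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format one protein sequence as a FASTA record for a database search engine."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Toy example (assumed data): codon 3 changes from GAA (E) to AAA (K).
reference_cds = "ATGGCCGAAAAGTGA"
variant_cds = apply_snv(reference_cds, pos=7, ref="G", alt="A")
reference_protein = translate(reference_cds)
variant_protein = translate(variant_cds)
fasta_record = to_fasta("toy_protein_E3K", variant_protein)
```

In practice the variant calls and transcript models would come from dedicated tools such as those the chapter reviews; the sketch only shows the final step of turning a variant-bearing coding sequence into a FASTA entry that a search engine can read.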

Identifiers

pubmed: 34905178
doi: 10.1007/978-1-0716-1936-0_18

Chemical substances

Proteome 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

233-260

Copyright information

© 2022. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.

Authors

Yanick Paco Hagemeijer (YP)

Department of Analytical Biochemistry, University of Groningen, Groningen Research Institute of Pharmacy, Groningen, The Netherlands.
European Research Institute for the Biology of Ageing, University Medical Center Groningen, Groningen, The Netherlands.

Victor Guryev (V)

European Research Institute for the Biology of Ageing, University Medical Center Groningen, Groningen, The Netherlands.

Peter Horvatovich (P)

Department of Analytical Biochemistry, University of Groningen, Groningen Research Institute of Pharmacy, Groningen, The Netherlands. p.l.horvatovich@rug.nl.
