Modeling short visual events through the BOLD moments video fMRI dataset and metadata.
Journal
Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
24 Jul 2024
History:
received: 14 Aug 2023
accepted: 04 Jul 2024
medline: 26 Jul 2024
pubmed: 26 Jul 2024
entrez: 24 Jul 2024
Status:
epublish
Abstract
Studying the neural basis of human dynamic visual perception requires extensive experimental data to evaluate the large swathes of functionally diverse brain neural networks driven by perceiving visual events. Here, we introduce the BOLD Moments Dataset (BMD), a repository of whole-brain fMRI responses to over 1000 short (3 s) naturalistic video clips of visual events across ten human subjects. We use the videos' extensive metadata to show how the brain represents word- and sentence-level descriptions of visual events and identify correlates of video memorability scores extending into the parietal cortex. Furthermore, we reveal a match in hierarchical processing between cortical regions of interest and video-computable deep neural networks, and we showcase that BMD successfully captures temporal dynamics of visual events at second resolution. With its rich metadata, BMD offers new perspectives and accelerates research on the human brain basis of visual event perception.
Identifiers
pubmed: 39048577
doi: 10.1038/s41467-024-50310-3
pii: 10.1038/s41467-024-50310-3
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
6241
Copyright information
© 2024. The Author(s).
Références
Carandini, M. Do We Know What the Early Visual System Does? J. Neurosci. 25, 10577–10597 (2005).
pubmed: 16291931
pmcid: 6725861
doi: 10.1523/JNEUROSCI.3726-05.2005
DeYoe, E. A. & Van Essen, D. C. Concurrent processing streams in monkey visual cortex. Trends Neurosci. 11, 219–226 (1988).
pubmed: 2471327
doi: 10.1016/0166-2236(88)90130-0
DiCarlo, J. J., Zoccolan, D. & Rust, N. C. How Does the Brain Solve Visual Object Recognition? Neuron 73, 415–434 (2012).
pubmed: 22325196
pmcid: 3306444
doi: 10.1016/j.neuron.2012.01.010
Felleman, D. J. & Van Essen, D. C. Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex N. Y. N. 1, 1–47 (1991).
doi: 10.1093/cercor/1.1.1
Logothetis, N. K. & Sheinberg, D. L. Visual object recognition. Annu. Rev. Neurosci. 19, 577–621 (1996).
pubmed: 8833455
doi: 10.1146/annurev.ne.19.030196.003045
Ress, D. & Heeger, D. J. Neuronal correlates of perception in early visual cortex. Nat. Neurosci. 6, 414–420 (2003).
pubmed: 12627164
pmcid: 2278238
doi: 10.1038/nn1024
Fairhall, S. L., Albi, A. & Melcher, D. Temporal Integration Windows for Naturalistic Visual Sequences. PLoS ONE 9, e102248 (2014).
pubmed: 25010517
pmcid: 4092072
doi: 10.1371/journal.pone.0102248
Hasson, U., Yang, E., Vallines, I., Heeger, D. J. & Rubin, N. A Hierarchy of Temporal Receptive Windows in Human Cortex. J. Neurosci. 28, 2539–2550 (2008).
pubmed: 18322098
pmcid: 2556707
doi: 10.1523/JNEUROSCI.5487-07.2008
Lingnau, A. & Downing, P. E. The lateral occipitotemporal cortex in action. Trends Cogn. Sci. 19, 268–277 (2015).
pubmed: 25843544
doi: 10.1016/j.tics.2015.03.006
Orlov, T. & Zohary, E. Object Representations in Human Visual Cortex Formed Through Temporal Integration of Dynamic Partial Shape Views. J. Neurosci. 38, 659–678 (2018).
pubmed: 29196319
pmcid: 6596194
doi: 10.1523/JNEUROSCI.1318-17.2017
Wurm, M. F. & Caramazza, A. Two ‘what’ pathways for action and object recognition. Trends Cogn. Sci. 26, 103–116 (2022).
pubmed: 34702661
doi: 10.1016/j.tics.2021.10.003
McMahon, E., Bonner, M. F. & Isik, L. Hierarchical organization of social action features along the lateral visual pathway. Curr. Biol. 33, 5035–5047.e8 (2023).
pubmed: 37918399
doi: 10.1016/j.cub.2023.10.015
Pitcher, D., Dilks, D. D., Saxe, R. R., Triantafyllou, C. & Kanwisher, N. Differential selectivity for dynamic versus static information in face-selective cortical regions. NeuroImage 56, 2356–2363 (2011).
pubmed: 21473921
doi: 10.1016/j.neuroimage.2011.03.067
Pitcher, D. & Ungerleider, L. G. Evidence for a Third Visual Pathway Specialized for Social Perception. Trends Cogn. Sci. 25, 100–110 (2021).
pubmed: 33334693
doi: 10.1016/j.tics.2020.11.006
Bainbridge, W. A. Chapter One—Memorability: How what we see influences what we remember. in Psychology of Learning and Motivation (eds. Federmeier, K. D. & Beck, D. M.) 70 1–27 (Academic Press, 2019).
Bylinskii, Z., Goetschalckx, L., Newman, A. & Oliva, A. Memorability: An Image-Computable Measure of Information Utility. in Human Perception of Visual Information (eds. Ionescu, B., Bainbridge, W. A. & Murray, N.) 207–239 (Springer International Publishing, Cham, 2022). https://doi.org/10.1007/978-3-030-81465-6_8 .
Han, J. et al. Learning Computational Models of Video Memorability from fMRI Brain Imaging. IEEE Trans. Cybern. 45, 1692–1703 (2015).
pubmed: 25314715
doi: 10.1109/TCYB.2014.2358647
Hasson, U., Furman, O., Clark, D., Dudai, Y. & Davachi, L. Enhanced Intersubject Correlations during Movie Viewing Correlate with Successful Episodic Encoding. Neuron 57, 452–462 (2008).
pubmed: 18255037
pmcid: 2789242
doi: 10.1016/j.neuron.2007.12.009
Schneider, W. X. Selective visual processing across competition episodes: a theory of task-driven visual attention and working memory. Philos. Trans. R. Soc. B Biol. Sci. 368, 20130060 (2013).
doi: 10.1098/rstb.2013.0060
Bartels, A. & Zeki, S. Functional brain mapping during free viewing of natural scenes. Hum. Brain Mapp. 21, 75–85 (2004).
pubmed: 14755595
doi: 10.1002/hbm.10153
Konen, C. S. & Kastner, S. Representation of Eye Movements and Stimulus Motion in Topographically Organized Areas of Human Posterior Parietal Cortex. J. Neurosci. 28, 8361–8375 (2008).
pubmed: 18701699
pmcid: 2685070
doi: 10.1523/JNEUROSCI.1930-08.2008
Press, W. A., Brewer, A. A., Dougherty, R. F., Wade, A. R. & Wandell, B. A. Visual areas and spatial summation in human visual cortex. Vis. Res. 41, 1321–1332 (2001).
pubmed: 11322977
doi: 10.1016/S0042-6989(01)00074-8
Schultz, J. & Pilz, K. S. Natural facial motion enhances cortical responses to faces. Exp. Brain Res. 194, 465–475 (2009).
pubmed: 19205678
pmcid: 2755747
doi: 10.1007/s00221-009-1721-9
Yildirim, I., Wu, J., Kanwisher, N. & Tenenbaum, J. An integrative computational architecture for object-driven cortex. Curr. Opin. Neurobiol. 55, 73–81 (2019).
pubmed: 30825704
pmcid: 6548583
doi: 10.1016/j.conb.2019.01.010
Buccino, G. et al. Action Observation Activates Premotor and Parietal Areas in a Somatotopic Manner: An fMRI Study. in Social Neuroscience (Psychology Press, 2004).
Kret, M. E., Pichon, S., Grèzes, J. & de Gelder, B. Similarities and differences in perceiving threat from dynamic faces and bodies. An fMRI study. NeuroImage 54, 1755–1762 (2011).
pubmed: 20723605
doi: 10.1016/j.neuroimage.2010.08.012
Hasson, U. et al. Neurocinematics: The Neuroscience of Film. Projections 2, 1–26 (2008).
doi: 10.3167/proj.2008.020102
Roberts, J., Wallis, G. & Breakspear, M. Fixational eye movements during viewing of dynamic natural scenes. Front. Psychol. 4, 797 (2013).
pubmed: 24194727
pmcid: 3810780
doi: 10.3389/fpsyg.2013.00797
Kriegeskorte, N. Representational similarity analysis—connecting the branches of systems neuroscience. Front. Syst. Neurosci. https://doi.org/10.3389/neuro.06.004.2008 (2008).
Bainbridge, W. A., Dilks, D. D. & Oliva, A. Memorability: A stimulus-driven perceptual neural signature distinctive from memory. NeuroImage 149, 141–152 (2017).
pubmed: 28132932
doi: 10.1016/j.neuroimage.2017.01.063
Bainbridge, W. A. & Rissman, J. Dissociating neural markers of stimulus memorability and subjective recognition during episodic retrieval. Sci. Rep. 8, 1–11 (2018).
doi: 10.1038/s41598-018-26467-5
Mohsenzadeh, Y., Mullin, C., Oliva, A. & Pantazis, D. The perceptual neural trace of memorable unseen scenes. Sci. Rep. 9, 6033 (2019).
pubmed: 30988333
pmcid: 6465597
doi: 10.1038/s41598-019-42429-x
Misaki, M., Luh, W.-M. & Bandettini, P. A. Accurate decoding of sub-TR timing differences in stimulations of sub-voxel regions from multi-voxel response patterns. NeuroImage 66, 623–633 (2013).
pubmed: 23128073
doi: 10.1016/j.neuroimage.2012.10.069
Prince, J. S. et al. Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife 11, e77599 (2022).
pubmed: 36444984
pmcid: 9708069
doi: 10.7554/eLife.77599
Wittkuhn, L. & Schuck, N. W. Dynamics of fMRI patterns reflect sub-second activation sequences and reveal replay in human visual cortex. Nat. Commun. 12, 1795 (2021).
pubmed: 33741933
pmcid: 7979874
doi: 10.1038/s41467-021-21970-2
Mineault, P., Bakhtiari, S., Richards, B. & Pack, C. Your head is there to move you around: Goal-driven models of the primate dorsal pathway. in Advances in Neural Information Processing Systems 34 28757–28771 (Curran Associates, Inc., 2021).
Aliko, S., Huang, J., Gheorghiu, F., Meliss, S. & Skipper, J. I. A naturalistic neuroimaging database for understanding the brain using ecological stimuli. Sci. Data 7, 347 (2020).
pubmed: 33051448
pmcid: 7555491
doi: 10.1038/s41597-020-00680-2
Allen, E. J. et al. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nat. Neurosci. 25, 116–126 (2022).
pubmed: 34916659
doi: 10.1038/s41593-021-00962-x
Hanke, M. et al. A studyforrest extension, simultaneous fMRI and eye gaze recordings during prolonged natural stimulation. Sci. Data 3, 160092 (2016).
pubmed: 27779621
pmcid: 5079121
doi: 10.1038/sdata.2016.92
Hebart, M. N. et al. THINGS: A database of 1,854 object concepts and more than 26,000 naturalistic object images. PLOS ONE 14, e0223792 (2019).
pubmed: 31613926
pmcid: 6793944
doi: 10.1371/journal.pone.0223792
Newman, A. et al. Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability. in Computer Vision—ECCV 2020 (eds. Vedaldi, A., Bischof, H., Brox, T. & Frahm, J.-M.) 223–240 (Springer International Publishing, Cham). https://doi.org/10.1007/978-3-030-58517-4_14 (2020).
Monfort, M. et al. Moments in Time Dataset: One Million Videos for Event Understanding. IEEE Trans. Pattern Anal. Mach. Intell. 42, 502–508 (2020).
pubmed: 30802849
doi: 10.1109/TPAMI.2019.2901464
Monfort, M. et al. Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding. IEEE Trans. Pattern Anal. Mach. Intell. 44, 9434–9445 (2022).
pubmed: 34752386
doi: 10.1109/TPAMI.2021.3126682
Berkes, P., Orbán, G., Lengyel, M. & Fiser, J. Spontaneous Cortical Activity Reveals Hallmarks of an Optimal Internal Model of the Environment. Science 331, 83–87 (2011).
pubmed: 21212356
pmcid: 3065813
doi: 10.1126/science.1195870
Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).
pubmed: 8637596
doi: 10.1038/381607a0
Olshausen, B. A. & Field, D. J. Natural image statistics and efficient coding. Netw. Comput. Neural Syst. 7, 333–339 (1996).
doi: 10.1088/0954-898X_7_2_014
Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D. & Tolhurst, D. J. The Receptive-Field Organization of Simple Cells in Primary Visual Cortex of Ferrets under Natural Scene Stimulation. J. Neurosci. 23, 4746–4759 (2003).
pubmed: 12805314
pmcid: 6740783
doi: 10.1523/JNEUROSCI.23-11-04746.2003
Baddeley, A. Working Memory. Science 255, 556–559 (1992).
pubmed: 1736359
doi: 10.1126/science.1736359
Barrouillet, P., Bernardin, S. & Camos, V. Time Constraints and Resource Sharing in Adults’ Working Memory Spans. J. Exp. Psychol. Gen. 133, 83–100 (2004).
pubmed: 14979753
doi: 10.1037/0096-3445.133.1.83
Hasson, U., Nir, Y., Levy, I., Fuhrmann, G. & Malach, R. Intersubject Synchronization of Cortical Activity During Natural Vision. Science 303, 1634–1640 (2004).
pubmed: 15016991
doi: 10.1126/science.1089506
Haxby, J. V. et al. A Common, high-dimensional model of the representational space in human ventral temporal cortex. Neuron 72, 404–416 (2011).
pubmed: 22017997
pmcid: 3201764
doi: 10.1016/j.neuron.2011.08.026
Haxby, J. V., Guntupalli, J. S., Nastase, S. A. & Feilong, M. Hyperalignment: modeling shared information encoded in idiosyncratic cortical topographies. eLife 9, e56601 (2020).
pubmed: 32484439
pmcid: 7266639
doi: 10.7554/eLife.56601
Buccino, G., Binkofski, F. & Riggio, L. The mirror neuron system and action recognition. Brain Lang. 89, 370–376 (2004).
pubmed: 15068920
doi: 10.1016/S0093-934X(03)00356-0
Buccino, G. et al. Neural circuits underlying imitation learning of hand actions: an event-related fMRI study. Neuron 42, 323–334 (2004).
pubmed: 15091346
doi: 10.1016/S0896-6273(04)00181-3
Gazzola, V. & Keysers, C. The observation and execution of actions share motor and somatosensory voxels in all tested subjects: single-subject analyses of unsmoothed fMRI data. Cereb. Cortex 19, 1239–1255 (2009).
pubmed: 19020203
doi: 10.1093/cercor/bhn181
Rizzolatti, G. & Sinigaglia, C. The functional role of the parieto-frontal mirror circuit: interpretations and misinterpretations. Nat. Rev. Neurosci. 11, 264–274 (2010).
pubmed: 20216547
doi: 10.1038/nrn2805
Lafer-Sousa, R., Conway, B. R. & Kanwisher, N. G. Color-biased regions of the ventral visual pathway lie between face- and place-selective regions in humans, as in Macaques. J. Neurosci. 36, 1682–1697 (2016).
pubmed: 26843649
pmcid: 4737777
doi: 10.1523/JNEUROSCI.3164-15.2016
Hutchison, R. M. et al. Dynamic functional connectivity: Promise, issues, and interpretations. NeuroImage 80, 360–378 (2013).
pubmed: 23707587
doi: 10.1016/j.neuroimage.2013.05.079
Smith, S. M. et al. Functional connectomics from resting-state fMRI. Trends Cogn. Sci. 17, 666–682 (2013).
pubmed: 24238796
pmcid: 4004765
doi: 10.1016/j.tics.2013.09.016
Zhou, B., Lapedriza, A., Khosla, A., Oliva, A. & Torralba, A. Places: a 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 40, 1452–1464 (2018).
pubmed: 28692961
doi: 10.1109/TPAMI.2017.2723009
Hebart, M. N. et al. THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. eLife 12, e82580 (2023).
pubmed: 36847339
pmcid: 10038662
doi: 10.7554/eLife.82580
Monfort, M. et al. Spoken Moments: Learning Joint Audio-Visual Representations From Video Descriptions. in 14871–14881 (2021).
Gorgolewski, K. J. et al. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Sci. Data 3, 160044 (2016).
pubmed: 27326542
pmcid: 4978148
doi: 10.1038/sdata.2016.44
Esteban, O. et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat. Methods 16, 111–116 (2019).
pubmed: 30532080
doi: 10.1038/s41592-018-0235-4
Kay, K., Jamison, K. W., Zhang, R.-Y. & Uğurbil, K. A temporal decomposition method for identifying venous effects in task-based fMRI. Nat. Methods 17, 1033–1039 (2020).
pubmed: 32895538
pmcid: 7721302
doi: 10.1038/s41592-020-0941-6
Le, A., Vesia, M., Yan, X., Crawford, J. D. & Niemeier, M. Parietal area BA7 integrates motor programs for reaching, grasping, and bimanual coordination. J. Neurophysiol. 117, 624–636 (2017).
pubmed: 27832593
doi: 10.1152/jn.00299.2016
Silver, M. A. & Kastner, S. Topographic maps in human frontal and parietal cortex. Trends Cogn. Sci. 13, 488–495 (2009).
pubmed: 19758835
pmcid: 2767426
doi: 10.1016/j.tics.2009.08.005
VanRullen, R. & Thorpe, S. J. The Time Course of Visual Processing: From Early Perception to Decision-Making. J. Cogn. Neurosci. 13, 454–461 (2001).
pubmed: 11388919
doi: 10.1162/08989290152001880
Esteban, O. et al. MRIQC: Advancing the automatic prediction of image quality in MRI from unseen sites. PLOS ONE 12, e0184661 (2017).
pubmed: 28945803
pmcid: 5612458
doi: 10.1371/journal.pone.0184661
Friston, K. J. et al. Statistical parametric maps in functional imaging: A general linear approach. Hum. Brain Mapp. 2, 189–210 (1994).
doi: 10.1002/hbm.460020402
Khosla, M., Ratan Murty, N. A. & Kanwisher, N. A highly selective response to food in human visual cortex revealed by hypothesis-free voxel decomposition. Curr. Biol. 32, 4159–4171.e9 (2022).
pubmed: 36027910
pmcid: 9561032
doi: 10.1016/j.cub.2022.08.009
Naselaris, T., Kay, K. N., Nishimoto, S. & Gallant, J. L. Encoding and decoding in fMRI. NeuroImage 56, 400–410 (2011).
pubmed: 20691790
doi: 10.1016/j.neuroimage.2010.07.073
Ratan Murty, N. A., Bashivan, P., Abate, A., DiCarlo, J. J. & Kanwisher, N. Computational models of category-selective brain regions enable high-throughput tests of selectivity. Nat. Commun. 12, 5540 (2021).
pubmed: 34545079
pmcid: 8452636
doi: 10.1038/s41467-021-25409-6
Schrimpf, M. et al. Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like? 407007 Preprint at https://doi.org/10.1101/407007 (2020).
Haxby, J. V. Multivariate pattern analysis of fMRI: The early beginnings. NeuroImage 62, 852–855 (2012).
pubmed: 22425670
doi: 10.1016/j.neuroimage.2012.03.016
Haynes, J.-D. A Primer on Pattern-Based Approaches to fMRI: Principles, Pitfalls, and Perspectives. Neuron 87, 257–270 (2015).
pubmed: 26182413
doi: 10.1016/j.neuron.2015.05.025
Kriegeskorte, N. & Kievit, R. A. Representational geometry: integrating cognition, computation, and the brain. Trends Cogn. Sci. 17, 401–412 (2013).
pubmed: 23876494
pmcid: 3730178
doi: 10.1016/j.tics.2013.06.007
Kriegeskorte, N., Goebel, R. & Bandettini, P. Information-based functional brain mapping. Proc. Natl Acad. Sci. 103, 3863–3868 (2006).
pubmed: 16537458
pmcid: 1383651
doi: 10.1073/pnas.0600244103
Dayan, P. & Abbott, L. F. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. (Massachusetts Institute of Technology Press, Cambridge, Mass, 2001).
Chang, N. et al. BOLD5000, a public fMRI dataset while viewing 5000 visual images. Sci. Data 6, 49 (2019).
pubmed: 31061383
pmcid: 6502931
doi: 10.1038/s41597-019-0052-3
Rajalingham, R. et al. Large-Scale, High-Resolution Comparison of the Core Visual Object Recognition Behavior of Humans, Monkeys, and State-of-the-Art Deep Artificial Neural Networks. J. Neurosci. 38, 7255–7269 (2018).
pubmed: 30006365
pmcid: 6096043
doi: 10.1523/JNEUROSCI.0388-18.2018
Schrimpf, M. et al. Integrative Benchmarking to Advance Neurally Mechanistic Models of Human Intelligence. Neuron 108, 413–423 (2020).
pubmed: 32918861
doi: 10.1016/j.neuron.2020.07.040
Yamins, D. L., Hong, H., Cadieu, C. & DiCarlo, J. J. Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream. in Advances in Neural Information Processing Systems 26 (Curran Associates, Inc., 2013).
Krekelberg, B., Dannenberg, S., Hoffmann, K.-P., Bremmer, F. & Ross, J. Neural correlates of implied motion. Nature 424, 674–677 (2003).
pubmed: 12904793
doi: 10.1038/nature01852
Senior, C. et al. The functional neuroanatomy of implicit-motion perception or ‘representational momentum. Curr. Biol. 10, 16–22 (2000).
pubmed: 10660297
doi: 10.1016/S0960-9822(99)00259-6
Shirai, N. & Imura, T. Implied motion perception from a still image in infancy. Exp. Brain Res. 232, 3079–3087 (2014).
pubmed: 24888536
doi: 10.1007/s00221-014-3996-8
Nishimoto, S. et al. Reconstructing Visual Experiences from Brain Activity Evoked by Natural Movies. Curr. Biol. 21, 1641–1646 (2011).
pubmed: 21945275
pmcid: 3326357
doi: 10.1016/j.cub.2011.08.031
Seeliger, K., Sommers, R. P., Güçlü, U., Bosch, S. E. & Gerven, M. A. J. van. A large single-participant fMRI dataset for probing brain responses to naturalistic stimuli in space and time. 687681 Preprint at https://doi.org/10.1101/687681 (2019).
He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image Recognition. in 770–778 (2016).
Kietzmann, T. C. et al. Recurrence is required to capture the representational dynamics of the human visual system. Proc. Natl Acad. Sci. 116, 21854–21863 (2019).
pubmed: 31591217
pmcid: 6815174
doi: 10.1073/pnas.1905544116
Koivisto, M., Railo, H., Revonsuo, A., Vanni, S. & Salminen-Vaparanta, N. Recurrent Processing in V1/V2 Contributes to Categorization of Natural Scenes. J. Neurosci. 31, 2488–2492 (2011).
pubmed: 21325516
pmcid: 6623680
doi: 10.1523/JNEUROSCI.3074-10.2011
Pascual-Leone, A. & Walsh, V. Fast backprojections from the motion to the primary visual area necessary for visual awareness. Science 292, 510–512 (2001).
pubmed: 11313497
doi: 10.1126/science.1057099
Silvanto, J., Cowey, A., Lavie, N. & Walsh, V. Striate cortex (V1) activity gates awareness of motion. Nat. Neurosci. 8, 143–144 (2005).
pubmed: 15643428
doi: 10.1038/nn1379
Silvanto, J., Lavie, N. & Walsh, V. Double Dissociation of V1 and V5/MT activity in Visual Awareness. Cereb. Cortex 15, 1736–1741 (2005).
pubmed: 15703247
doi: 10.1093/cercor/bhi050
Kubilius, J. et al. Brain-like object recognition with high-performing shallow recurrent ANNs. in Advances in Neural Information Processing Systems 32 (Curran Associates, Inc., 2019).
Lin, J., Gan, C. & Han, S. TSM: Temporal Shift Module for Efficient Video Understanding. in 7083–7093 (2019).
Cichy, R. M., Khosla, A., Pantazis, D., Torralba, A. & Oliva, A. Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Sci. Rep. 6, 27755 (2016).
pubmed: 27282108
pmcid: 4901271
doi: 10.1038/srep27755
Yamins, D. L. K. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl Acad. Sci. USA 111, 8619–8624 (2014).
pubmed: 24812127
pmcid: 4060707
doi: 10.1073/pnas.1403112111
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. in 4510–4520 (2018).
Bertasius, G., Wang, H. & Torresani, L. Is Space-Time Attention All You Need for Video Understanding? in Proceedings of the 38th International Conference on Machine Learning 813–824 (PMLR, 2021).
Kay, W. et al. The Kinetics Human Action Video Dataset. Preprint at https://doi.org/10.48550/arXiv.1705.06950 (2017).
Miech, A. et al. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips. in 2630–2640 (2019).
Cichy, R. M., Pantazis, D. & Oliva, A. Similarity-based fusion of MEG and fMRI reveals spatio-temporal dynamics in human cortex during visual object recognition. Cereb. Cortex 26, 3563–3579 (2016).
pubmed: 27235099
pmcid: 4961022
doi: 10.1093/cercor/bhw135
Kriegeskorte, N. Deep neural networks: a new framework for modeling biological vision and brain information processing. Annu. Rev. Vis. Sci. 1, 417–446 (2015).
pubmed: 28532370
doi: 10.1146/annurev-vision-082114-035447
Eickenberg, M., Gramfort, A., Varoquaux, G. & Thirion, B. Seeing it all: convolutional network layers map the function of the human visual system. NeuroImage 152, 184–194 (2017).
pubmed: 27777172
doi: 10.1016/j.neuroimage.2016.10.001
Wurm, M. F., Caramazza, A. & Lingnau, A. Action categories in lateral occipitotemporal cortex are organized along sociality and transitivity. J. Neurosci. 37, 562–575 (2017).
pubmed: 28100739
pmcid: 6596756
doi: 10.1523/JNEUROSCI.1717-16.2016
Bakhtiari, S., Mineault, P., Lillicrap, T., Pack, C. & Richards, B. The functional specialization of visual cortex emerges from training parallel pathways with self-supervised predictive learning. in Advances in Neural Information Processing Systems 34 25164–25178 (Curran Associates, Inc., 2021).
Güçlü, U. & van Gerven, M. A. J. Increasingly complex representations of natural movies across the dorsal stream are shared between subjects. NeuroImage 145, 329–336 (2017).
pubmed: 26724778
doi: 10.1016/j.neuroimage.2015.12.036
Honey, C. J. et al. Slow cortical dynamics and the accumulation of information over long timescales. Neuron 76, 423–434 (2012).
pubmed: 23083743
pmcid: 3517908
doi: 10.1016/j.neuron.2012.08.011
Wang, L. et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. in Computer Vision—ECCV 2016 eds. Leibe, B., Matas, J., Sebe, N. & Welling, M.) 20–36 (Springer International Publishing, Cham) https://doi.org/10.1007/978-3-319-46484-8_2 . (2016)
Kiebel, S. J., Daunizeau, J. & Friston, K. J. A hierarchy of time-scales and the brain. PLoS Comput. Biol. 4, e1000209 (2008).
pubmed: 19008936
pmcid: 2568860
doi: 10.1371/journal.pcbi.1000209
Kanwisher, N. Functional specificity in the human brain: a window into the functional architecture of the mind. Proc. Natl Acad. Sci. 107, 11163–11170 (2010).
pubmed: 20484679
pmcid: 2895137
doi: 10.1073/pnas.1005062107
Kanwisher, N., McDermott, J. & Chun, M. M. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci. 17, 4302–4311 (1997).
pubmed: 9151747
pmcid: 6573547
doi: 10.1523/JNEUROSCI.17-11-04302.1997
Bojanowski, P., Grave, E., Joulin, A. & Mikolov, T. Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017).
doi: 10.1162/tacl_a_00051
Reimers, N. & Gurevych, I. Sentence-BERT: sentence embeddings using siamese BERT-Networks. https://doi.org/10.48550/ARXIV.1908.10084 . (2019)
Downing, P. E., Jiang, Y., Shuman, M. & Kanwisher, N. A Cortical Area Selective for Visual Processing of the Human Body. Science 293, 2470–2473 (2001).
pubmed: 11577239
doi: 10.1126/science.1063414
Epstein, R. & Kanwisher, N. A cortical representation of the local visual environment. Nature 392, 598–601 (1998).
pubmed: 9560155
doi: 10.1038/33402
Grill-Spector, K., Kourtzi, Z. & Kanwisher, N. The lateral occipital complex and its role in object recognition. Vis. Res. 41, 1409–1422 (2001).
pubmed: 11322983
doi: 10.1016/S0042-6989(01)00073-6
Hardwick, R. M., Caspers, S., Eickhoff, S. B. & Swinnen, S. P. Neural correlates of action: Comparing meta-analyses of imagery, observation, and execution. Neurosci. Biobehav. Rev. 94, 31–44 (2018).
pubmed: 30098990
doi: 10.1016/j.neubiorev.2018.08.003
Doerig, A. et al. Semantic scene descriptions as an objective of human vision. https://doi.org/10.48550/ARXIV.2209.11737 . (2022)
Kosakowski, H. L. et al. Selective responses to faces, scenes, and bodies in the ventral visual pathway of infants. Curr. Biol. 32, 265–274.e5 (2022).
pubmed: 34784506
doi: 10.1016/j.cub.2021.10.064
Bonner, M. F. & Epstein, R. A. Object representations in the human brain reflect the co-occurrence statistics of vision and language. Nat. Commun. 12, 4081 (2021).
pubmed: 34215754
pmcid: 8253839
doi: 10.1038/s41467-021-24368-2
Wang, J. et al. GIT: a generative image-to-text transformer for vision and language. Preprint at https://doi.org/10.48550/arXiv.2205.14100 (2022).
Goetschalckx, L., Moors, P. & Wagemans, J. Image memorability across longer time intervals. Memory 26, 581–588 (2018).
pubmed: 28974153
doi: 10.1080/09658211.2017.1383435
Isola, P., Parikh, D., Torralba, A. & Oliva, A. Understanding the intrinsic memorability of images. in Advances in Neural Information Processing Systems 24 (Curran Associates, Inc., 2011).
Khosla, A., Raju, A. S., Torralba, A. & Oliva, A. Understanding and predicting image memorability at a large scale. in 2390–2398 (2015).
Jaegle, A. et al. Population response magnitude variation in inferotemporal cortex predicts image memorability. eLife 8, e47596 (2019).
pubmed: 31464687
pmcid: 6715346
doi: 10.7554/eLife.47596
Lahner, B., Mohsenzadeh, Y., Mullin, C. & Oliva, A. Visual perception of highly memorable images is mediated by a distributed network of ventral visual regions that enable a late memorability response. PLoS Biol. 22, e3002564 (2024).
pubmed: 38557761
pmcid: 10984539
doi: 10.1371/journal.pbio.3002564
Cohen, J. D. et al. Temporal dynamics of brain activation during a working memory task. Nature 386, 604–608 (1997).
pubmed: 9121583
doi: 10.1038/386604a0
Martin, A. & Chao, L. L. Semantic memory and the brain: structure and processes. Curr. Opin. Neurobiol. 11, 194–201 (2001).
pubmed: 11301239
doi: 10.1016/S0959-4388(00)00196-3
Riou, B., Lesourd, M., Brunel, L. & Versace, R. Visual memory and visual perception: when memory improves visual search. Mem. Cogn. 39, 1094–1102 (2011).
doi: 10.3758/s13421-011-0075-2
Slotnick, S. D., Thompson, W. L. & Kosslyn, S. M. Visual memory and visual mental imagery recruit common control and sensory regions of the brain. Cogn. Neurosci. 3, 14–20 (2012).
pubmed: 24168646
doi: 10.1080/17588928.2011.578210
Vermeulen, N., Corneille, O. & Niedenthal, P. M. Sensory load incurs conceptual processing costs. Cognition 109, 287–294 (2008).
pubmed: 18996513
doi: 10.1016/j.cognition.2008.09.004
Weinberger, N. M. Specific long-term memory traces in primary auditory cortex. Nat. Rev. Neurosci. 5, 279–290 (2004).
pubmed: 15034553
pmcid: 3590000
doi: 10.1038/nrn1366
Bainbridge, W. A. & Baker, C. I. Multidimensional memory topography in the medial parietal cortex identified from neuroimaging of thousands of daily memory videos. Nat. Commun. 13, 6508 (2022).
pubmed: 36316315
pmcid: 9622880
doi: 10.1038/s41467-022-34075-1
Furman, O., Dorfman, N., Hasson, U., Davachi, L. & Dudai, Y. They saw a movie: long-term memory for an extended audiovisual narrative. Learn. Mem. 14, 457–467 (2007).
pubmed: 17562897
pmcid: 1896095
doi: 10.1101/lm.550407
Boyle, J. A. et al. The Courtois project on neuronal modelling-first data release. in 26th annual meeting of the organization for human brain mapping (2020).
Zhou, M. et al. A large-scale fMRI dataset for human action recognition. Sci. Data 10, 415 (2023).
pubmed: 37369643
pmcid: 10300118
doi: 10.1038/s41597-023-02325-6
Horikawa, T. & Kamitani, Y. Generic decoding of seen and imagined objects using hierarchical visual features. Nat. Commun. 8, 15037 (2017).
pubmed: 28530228
pmcid: 5458127
doi: 10.1038/ncomms15037
Heim, S. et al. The role of human parietal area 7A as a link between sequencing in hand actions and in overt speech production. Front. Psychol. 3, 534 (2012).
pubmed: 23227016
pmcid: 3514541
doi: 10.3389/fpsyg.2012.00534
Peeters, R. et al. The representation of tool use in humans and monkeys: common and uniquely human features. J. Neurosci. 29, 11523–11539 (2009).
pubmed: 19759300
pmcid: 6665774
doi: 10.1523/JNEUROSCI.2040-09.2009
Peeters, R. R., Rizzolatti, G. & Orban, G. A. Functional properties of the left parietal tool use region. NeuroImage 78, 83–93 (2013).
pubmed: 23591073
doi: 10.1016/j.neuroimage.2013.04.023
Murray, J. D. et al. A hierarchy of intrinsic timescales across primate cortex. Nat. Neurosci. 17, 1661–1663 (2014).
pubmed: 25383900
pmcid: 4241138
doi: 10.1038/nn.3862
Piasini, E. et al. Temporal stability of stimulus representation increases along rodent visual cortical hierarchies. Nat. Commun. 12, 4448 (2021).
pubmed: 34290247
pmcid: 8295255
doi: 10.1038/s41467-021-24456-3
Hu, M., Ge, P., Wang, X., Lin, H. & Ren, F. A spatio-temporal integrated model based on local and global features for video expression recognition. Vis. Comput. 38, 2617–2634 (2022).
doi: 10.1007/s00371-021-02136-z
Kahou, S. E. et al. EmoNets: multimodal deep learning approaches for emotion recognition in video. J. Multimodal Use. Interfaces 10, 99–111 (2016).
doi: 10.1007/s12193-015-0195-2
Tzirakis, P., Zhang, J. & Schuller, B. W. End-to-End Speech Emotion Recognition Using Deep Neural Networks. in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 5089–5093. https://doi.org/10.1109/ICASSP.2018.8462677 . (2018)
Carreira, J. & Zisserman, A. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. in 6299–6308 (2017).
Feichtenhofer, C., Pinz, A. & Zisserman, A. Convolutional Two-Stream Network Fusion for Video Action Recognition. in 1933–1941 (2016).
Ji, S., Xu, W., Yang, M. & Yu, K. 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35, 221–231 (2013).
pubmed: 22392705
doi: 10.1109/TPAMI.2012.59
Wang, Y. et al. Eidetic 3D LSTM: A Model for Video Prediction and Beyond. in (2023).
Fan, L., Zhang, T. & Du, W. Optical-flow-based framework to boost video object detection performance with object enhancement. Expert Syst. Appl. 170, 114544 (2021).
doi: 10.1016/j.eswa.2020.114544
Shafiee, M. J., Chywl, B., Li, F. & Wong, A. Fast YOLO: A fast you only look once system for real-time embedded object detection in video. https://doi.org/10.48550/ARXIV.1709.05943 (2017).
Chen, Z., Qing, J. & Zhou, J. H. Cinematic mindscapes: high-quality video reconstruction from brain activity. in Advances in Neural Information Processing Systems 36 (Curran Associates, Inc., 2024).
Kupershmidt, G., Beliy, R., Gaziv, G. & Irani, M. A Penny for Your (visual) Thoughts: Self-Supervised Reconstruction of Natural Movies from Brain Activity. https://doi.org/10.48550/ARXIV.2206.03544 (2022).
Luo, A. F., Henderson, M. M., Wehbe, L. & Tarr, M. J. Brain diffusion for visual exploration: cortical discovery using large scale generative models. in Advances in Neural Information Processing Systems 36 (Curran Associates, Inc., 2024).
Gu, Z. et al. NeuroGen: activation optimized image synthesis for discovery neuroscience. NeuroImage 247, 118812 (2022).
pubmed: 34936922
doi: 10.1016/j.neuroimage.2021.118812
Han, K. et al. Variational autoencoder: an unsupervised model for encoding and decoding fMRI activity in visual cortex. NeuroImage 198, 125–136 (2019).
pubmed: 31103784
doi: 10.1016/j.neuroimage.2019.05.039
Shmuelof, L. & Zohary, E. Dissociation between ventral and dorsal fMRI activation during object and action recognition. Neuron 47, 457–470 (2005).
pubmed: 16055068
doi: 10.1016/j.neuron.2005.06.034
Spunt, R. P., Satpute, A. B. & Lieberman, M. D. Identifying the what, why, and how of an observed action: an fMRI study of mentalizing and mechanizing during action observation. J. Cogn. Neurosci. 23, 63–74 (2011).
pubmed: 20146607
doi: 10.1162/jocn.2010.21446
Urgen, B. A., Pehlivan, S. & Saygin, A. P. Distinct representations in occipito-temporal, parietal, and premotor cortex during action perception revealed by fMRI and computational modeling. Neuropsychologia 127, 35–47 (2019).
pubmed: 30772426
doi: 10.1016/j.neuropsychologia.2019.02.006
Julian, J. B., Fedorenko, E., Webster, J. & Kanwisher, N. An algorithmic method for functionally defining regions of interest in the ventral visual pathway. NeuroImage 60, 2357–2364 (2012).
pubmed: 22398396
doi: 10.1016/j.neuroimage.2012.02.055
Guzman-Martinez, E., Leung, P., Franconeri, S., Grabowecky, M. & Suzuki, S. Rapid eye-fixation training without eyetracking. Psychon. Bull. Rev. 16, 491–496 (2009).
pubmed: 19451374
pmcid: 2777709
doi: 10.3758/PBR.16.3.491
Adelson, E. H. & Bergen, J. R. Spatiotemporal energy models for the perception of motion. JOSA A 2, 284–299 (1985).
doi: 10.1364/JOSAA.2.000284
Watson, A. B. & Ahumada, A. J. Model of human visual-motion sensing. JOSA A 2, 322–342 (1985).
doi: 10.1364/JOSAA.2.000322
Born, R. T. & Bradley, D. C. Structure and function of visual area MT. Annu. Rev. Neurosci. 28, 157–189 (2005).
pubmed: 16022593
doi: 10.1146/annurev.neuro.26.041002.131052
Nishimoto, S. & Gallant, J. L. A three-dimensional spatiotemporal receptive field model explains responses of area MT neurons to naturalistic movies. J. Neurosci. 31, 14551–14564 (2011).
pubmed: 21994372
pmcid: 3338855
doi: 10.1523/JNEUROSCI.6801-10.2011
Kamitani, Y. & Tong, F. Decoding seen and attended motion directions from activity in the human visual cortex. Curr. Biol. 16, 1096–1102 (2006).
pubmed: 16753563
pmcid: 1635016
doi: 10.1016/j.cub.2006.04.003
Roe, A. W. et al. Toward a unified theory of visual area V4. Neuron 74, 12–29 (2012).
pubmed: 22500626
pmcid: 4912377
doi: 10.1016/j.neuron.2012.03.011
Smith, A. T., Greenlee, M. W., Singh, K. D., Kraemer, F. M. & Hennig, J. The processing of first- and second-order motion in human visual cortex assessed by functional magnetic resonance imaging (fMRI). J. Neurosci. 18, 3816–3830 (1998).
pubmed: 9570811
pmcid: 6793149
doi: 10.1523/JNEUROSCI.18-10-03816.1998
Esteban, O. et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Zenodo https://doi.org/10.5281/zenodo.7430291 (2022).
Esteban, O. et al. nipy/nipype: 1.8.3. Zenodo https://doi.org/10.5281/ZENODO.596855 (2022).
Gorgolewski, K. et al. Nipype: a flexible, lightweight and extensible neuroimaging data processing framework in python. Front. Neuroinformatics 5, 13 (2011).
doi: 10.3389/fninf.2011.00013
Tustison, N. J. et al. N4ITK: improved N3 bias correction. IEEE Trans. Med. Imaging 29, 1310–1320 (2010).
pubmed: 20378467
pmcid: 3071855
doi: 10.1109/TMI.2010.2046908
Avants, B. B., Epstein, C. L., Grossman, M. & Gee, J. C. Symmetric diffeomorphic image registration with cross-correlation: Evaluating automated labeling of elderly and neurodegenerative brain. Med. Image Anal. 12, 26–41 (2008).
pubmed: 17659998
doi: 10.1016/j.media.2007.06.004
Zhang, Y., Brady, M. & Smith, S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans. Med. Imaging 20, 45–57 (2001).
pubmed: 11293691
doi: 10.1109/42.906424
Dale, A. M., Fischl, B. & Sereno, M. I. Cortical surface-based analysis. NeuroImage 9, 179–194 (1999).
pubmed: 9931268
doi: 10.1006/nimg.1998.0395
Klein, A. et al. Mindboggling morphometry of human brains. PLoS Comput. Biol. 13, e1005350 (2017).
pubmed: 28231282
pmcid: 5322885
doi: 10.1371/journal.pcbi.1005350
Fonov, V., Evans, A., McKinstry, R., Almli, C. & Collins, D. Unbiased nonlinear average age-appropriate brain templates from birth to adulthood. NeuroImage 47, S102 (2009).
doi: 10.1016/S1053-8119(09)70884-5
Glasser, M. F. et al. The minimal preprocessing pipelines for the human connectome project. NeuroImage 80, 105–124 (2013).
pubmed: 23668970
doi: 10.1016/j.neuroimage.2013.04.127
Greve, D. N. & Fischl, B. Accurate and robust brain image alignment using boundary-based registration. NeuroImage 48, 63–72 (2009).
pubmed: 19573611
doi: 10.1016/j.neuroimage.2009.06.060
Jenkinson, M., Bannister, P., Brady, M. & Smith, S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage 17, 825–841 (2002).
pubmed: 12377157
doi: 10.1006/nimg.2002.1132
Cox, R. W. & Hyde, J. S. Software tools for analysis and visualization of fMRI data. NMR Biomed. 10, 171–178 (1997).
pubmed: 9430344
doi: 10.1002/(SICI)1099-1492(199706/08)10:4/5<171::AID-NBM453>3.0.CO;2-L
Power, J. D. et al. Methods to detect, characterize, and remove motion artifact in resting state fMRI. NeuroImage 84, 320–341 (2014).
pubmed: 23994314
doi: 10.1016/j.neuroimage.2013.08.048
Behzadi, Y., Restom, K., Liau, J. & Liu, T. T. A component based noise correction method (CompCor) for BOLD and perfusion based fMRI. NeuroImage 37, 90–101 (2007).
pubmed: 17560126
doi: 10.1016/j.neuroimage.2007.04.042
Satterthwaite, T. D. et al. An improved framework for confound regression and filtering for control of motion artifact in the preprocessing of resting-state functional connectivity data. NeuroImage 64, 240–256 (2013).
pubmed: 22926292
doi: 10.1016/j.neuroimage.2012.08.052
Lanczos, C. Evaluation of noisy data. J. Soc. Ind. Appl. Math. Ser. B Numer. Anal. 1, 76–85 (1964).
doi: 10.1137/0701007
Abraham, A. et al. Machine learning for neuroimaging with scikit-learn. Front. Neuroinformatics 8, 14 (2014).
doi: 10.3389/fninf.2014.00014
Wang, L., Mruczek, R. E. B., Arcaro, M. J. & Kastner, S. Probabilistic Maps of Visual Topography in Human Cortex. Cereb. Cortex 25, 3911–3931 (2015).
pubmed: 25452571
doi: 10.1093/cercor/bhu277
Glasser, M. F. et al. A multi-modal parcellation of human cerebral cortex. Nature 536, 171–178 (2016).
pubmed: 27437579
pmcid: 4990127
doi: 10.1038/nature18933
Lage-Castellanos, A., Valente, G., Formisano, E. & Martino, F. D. Methods for computing the maximum performance of computational models of fMRI responses. PLOS Comput. Biol. 15, e1006397 (2019).
pubmed: 30849071
pmcid: 6426260
doi: 10.1371/journal.pcbi.1006397
Carlin, J. D. & Kriegeskorte, N. Adjudicating between face-coding models with individual-face fMRI responses. PLOS Comput. Biol. 13, e1005604 (2017).
pubmed: 28746335
pmcid: 5550004
doi: 10.1371/journal.pcbi.1005604
Nili, H. et al. A toolbox for representational similarity analysis. PLOS Comput. Biol. 10, e1003553 (2014).
pubmed: 24743308
pmcid: 3990488
doi: 10.1371/journal.pcbi.1003553
Li, Y., Song, Y. & Luo, J. Improving Pairwise Ranking for Multi-Label Image Classification. in 3617–3625 (2017).
Kriegeskorte, N. & Douglas, P. K. Interpreting encoding and decoding models. Curr. Opin. Neurobiol. 55, 167–179 (2019).
pubmed: 31039527
pmcid: 6705607
doi: 10.1016/j.conb.2019.04.002
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84–90 (2017).
doi: 10.1145/3065386
Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).
doi: 10.1007/s11263-015-0816-y