Modeling short visual events through the BOLD moments video fMRI dataset and metadata.

Carandini, M. Do We Know What the Early Visual System Does? J. Neurosci. 25, 10577–10597 (2005).

pubmed: 16291931 pmcid: 6725861 doi: 10.1523/JNEUROSCI.3726-05.2005

DeYoe, E. A. & Van Essen, D. C. Concurrent processing streams in monkey visual cortex. Trends Neurosci. 11, 219–226 (1988).

pubmed: 2471327 doi: 10.1016/0166-2236(88)90130-0

DiCarlo, J. J., Zoccolan, D. & Rust, N. C. How Does the Brain Solve Visual Object Recognition? Neuron 73, 415–434 (2012).

pubmed: 22325196 pmcid: 3306444 doi: 10.1016/j.neuron.2012.01.010

Felleman, D. J. & Van Essen, D. C. Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47 (1991).

doi: 10.1093/cercor/1.1.1

Logothetis, N. K. & Sheinberg, D. L. Visual object recognition. Annu. Rev. Neurosci. 19, 577–621 (1996).

pubmed: 8833455 doi: 10.1146/annurev.ne.19.030196.003045

Ress, D. & Heeger, D. J. Neuronal correlates of perception in early visual cortex. Nat. Neurosci. 6, 414–420 (2003).

pubmed: 12627164 pmcid: 2278238 doi: 10.1038/nn1024

Fairhall, S. L., Albi, A. & Melcher, D. Temporal Integration Windows for Naturalistic Visual Sequences. PLOS ONE 9, e102248 (2014).

pubmed: 25010517 pmcid: 4092072 doi: 10.1371/journal.pone.0102248

Hasson, U., Yang, E., Vallines, I., Heeger, D. J. & Rubin, N. A Hierarchy of Temporal Receptive Windows in Human Cortex. J. Neurosci. 28, 2539–2550 (2008).

pubmed: 18322098 pmcid: 2556707 doi: 10.1523/JNEUROSCI.5487-07.2008

Lingnau, A. & Downing, P. E. The lateral occipitotemporal cortex in action. Trends Cogn. Sci. 19, 268–277 (2015).

pubmed: 25843544 doi: 10.1016/j.tics.2015.03.006

Orlov, T. & Zohary, E. Object Representations in Human Visual Cortex Formed Through Temporal Integration of Dynamic Partial Shape Views. J. Neurosci. 38, 659–678 (2018).

pubmed: 29196319 pmcid: 6596194 doi: 10.1523/JNEUROSCI.1318-17.2017

Wurm, M. F. & Caramazza, A. Two ‘what’ pathways for action and object recognition. Trends Cogn. Sci. 26, 103–116 (2022).

pubmed: 34702661 doi: 10.1016/j.tics.2021.10.003

McMahon, E., Bonner, M. F. & Isik, L. Hierarchical organization of social action features along the lateral visual pathway. Curr. Biol. 33, 5035–5047.e8 (2023).

pubmed: 37918399 doi: 10.1016/j.cub.2023.10.015

Pitcher, D., Dilks, D. D., Saxe, R. R., Triantafyllou, C. & Kanwisher, N. Differential selectivity for dynamic versus static information in face-selective cortical regions. NeuroImage 56, 2356–2363 (2011).

pubmed: 21473921 doi: 10.1016/j.neuroimage.2011.03.067

Pitcher, D. & Ungerleider, L. G. Evidence for a Third Visual Pathway Specialized for Social Perception. Trends Cogn. Sci. 25, 100–110 (2021).

pubmed: 33334693 doi: 10.1016/j.tics.2020.11.006

Bainbridge, W. A. Memorability: How what we see influences what we remember. in Psychology of Learning and Motivation vol. 70 (eds. Federmeier, K. D. & Beck, D. M.) 1–27 (Academic Press, 2019).

Bylinskii, Z., Goetschalckx, L., Newman, A. & Oliva, A. Memorability: An Image-Computable Measure of Information Utility. in Human Perception of Visual Information (eds. Ionescu, B., Bainbridge, W. A. & Murray, N.) 207–239 (Springer International Publishing, Cham, 2022). https://doi.org/10.1007/978-3-030-81465-6_8 .

Han, J. et al. Learning Computational Models of Video Memorability from fMRI Brain Imaging. IEEE Trans. Cybern. 45, 1692–1703 (2015).

pubmed: 25314715 doi: 10.1109/TCYB.2014.2358647

Hasson, U., Furman, O., Clark, D., Dudai, Y. & Davachi, L. Enhanced Intersubject Correlations during Movie Viewing Correlate with Successful Episodic Encoding. Neuron 57, 452–462 (2008).

pubmed: 18255037 pmcid: 2789242 doi: 10.1016/j.neuron.2007.12.009

Schneider, W. X. Selective visual processing across competition episodes: a theory of task-driven visual attention and working memory. Philos. Trans. R. Soc. B Biol. Sci. 368, 20130060 (2013).

doi: 10.1098/rstb.2013.0060

Bartels, A. & Zeki, S. Functional brain mapping during free viewing of natural scenes. Hum. Brain Mapp. 21, 75–85 (2004).

pubmed: 14755595 doi: 10.1002/hbm.10153

Konen, C. S. & Kastner, S. Representation of Eye Movements and Stimulus Motion in Topographically Organized Areas of Human Posterior Parietal Cortex. J. Neurosci. 28, 8361–8375 (2008).

pubmed: 18701699 pmcid: 2685070 doi: 10.1523/JNEUROSCI.1930-08.2008

Press, W. A., Brewer, A. A., Dougherty, R. F., Wade, A. R. & Wandell, B. A. Visual areas and spatial summation in human visual cortex. Vis. Res. 41, 1321–1332 (2001).

pubmed: 11322977 doi: 10.1016/S0042-6989(01)00074-8

Schultz, J. & Pilz, K. S. Natural facial motion enhances cortical responses to faces. Exp. Brain Res. 194, 465–475 (2009).

pubmed: 19205678 pmcid: 2755747 doi: 10.1007/s00221-009-1721-9

Yildirim, I., Wu, J., Kanwisher, N. & Tenenbaum, J. An integrative computational architecture for object-driven cortex. Curr. Opin. Neurobiol. 55, 73–81 (2019).

pubmed: 30825704 pmcid: 6548583 doi: 10.1016/j.conb.2019.01.010

Buccino, G. et al. Action Observation Activates Premotor and Parietal Areas in a Somatotopic Manner: An fMRI Study. in Social Neuroscience (Psychology Press, 2004).

Kret, M. E., Pichon, S., Grèzes, J. & de Gelder, B. Similarities and differences in perceiving threat from dynamic faces and bodies. An fMRI study. NeuroImage 54, 1755–1762 (2011).

pubmed: 20723605 doi: 10.1016/j.neuroimage.2010.08.012

Hasson, U. et al. Neurocinematics: The Neuroscience of Film. Projections 2, 1–26 (2008).

doi: 10.3167/proj.2008.020102

Roberts, J., Wallis, G. & Breakspear, M. Fixational eye movements during viewing of dynamic natural scenes. Front. Psychol. 4, 797 (2013).

pubmed: 24194727 pmcid: 3810780 doi: 10.3389/fpsyg.2013.00797

Kriegeskorte, N., Mur, M. & Bandettini, P. Representational similarity analysis - connecting the branches of systems neuroscience. Front. Syst. Neurosci. https://doi.org/10.3389/neuro.06.004.2008 (2008).

Bainbridge, W. A., Dilks, D. D. & Oliva, A. Memorability: A stimulus-driven perceptual neural signature distinctive from memory. NeuroImage 149, 141–152 (2017).

pubmed: 28132932 doi: 10.1016/j.neuroimage.2017.01.063

Bainbridge, W. A. & Rissman, J. Dissociating neural markers of stimulus memorability and subjective recognition during episodic retrieval. Sci. Rep. 8, 1–11 (2018).

doi: 10.1038/s41598-018-26467-5

Mohsenzadeh, Y., Mullin, C., Oliva, A. & Pantazis, D. The perceptual neural trace of memorable unseen scenes. Sci. Rep. 9, 6033 (2019).

pubmed: 30988333 pmcid: 6465597 doi: 10.1038/s41598-019-42429-x

Misaki, M., Luh, W.-M. & Bandettini, P. A. Accurate decoding of sub-TR timing differences in stimulations of sub-voxel regions from multi-voxel response patterns. NeuroImage 66, 623–633 (2013).

pubmed: 23128073 doi: 10.1016/j.neuroimage.2012.10.069

Prince, J. S. et al. Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife 11, e77599 (2022).

pubmed: 36444984 pmcid: 9708069 doi: 10.7554/eLife.77599

Wittkuhn, L. & Schuck, N. W. Dynamics of fMRI patterns reflect sub-second activation sequences and reveal replay in human visual cortex. Nat. Commun. 12, 1795 (2021).

pubmed: 33741933 pmcid: 7979874 doi: 10.1038/s41467-021-21970-2

Mineault, P., Bakhtiari, S., Richards, B. & Pack, C. Your head is there to move you around: Goal-driven models of the primate dorsal pathway. in Advances in Neural Information Processing Systems 34 28757–28771 (Curran Associates, Inc., 2021).

Aliko, S., Huang, J., Gheorghiu, F., Meliss, S. & Skipper, J. I. A naturalistic neuroimaging database for understanding the brain using ecological stimuli. Sci. Data 7, 347 (2020).

pubmed: 33051448 pmcid: 7555491 doi: 10.1038/s41597-020-00680-2

Allen, E. J. et al. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nat. Neurosci. 25, 116–126 (2022).

pubmed: 34916659 doi: 10.1038/s41593-021-00962-x

Hanke, M. et al. A studyforrest extension, simultaneous fMRI and eye gaze recordings during prolonged natural stimulation. Sci. Data 3, 160092 (2016).

pubmed: 27779621 pmcid: 5079121 doi: 10.1038/sdata.2016.92

Hebart, M. N. et al. THINGS: A database of 1,854 object concepts and more than 26,000 naturalistic object images. PLOS ONE 14, e0223792 (2019).

pubmed: 31613926 pmcid: 6793944 doi: 10.1371/journal.pone.0223792

Newman, A. et al. Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability. in Computer Vision – ECCV 2020 (eds. Vedaldi, A., Bischof, H., Brox, T. & Frahm, J.-M.) 223–240 (Springer International Publishing, Cham, 2020). https://doi.org/10.1007/978-3-030-58517-4_14 .

Monfort, M. et al. Moments in Time Dataset: One Million Videos for Event Understanding. IEEE Trans. Pattern Anal. Mach. Intell. 42, 502–508 (2020).

pubmed: 30802849 doi: 10.1109/TPAMI.2019.2901464

Monfort, M. et al. Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding. IEEE Trans. Pattern Anal. Mach. Intell. 44, 9434–9445 (2022).

pubmed: 34752386 doi: 10.1109/TPAMI.2021.3126682

Berkes, P., Orbán, G., Lengyel, M. & Fiser, J. Spontaneous Cortical Activity Reveals Hallmarks of an Optimal Internal Model of the Environment. Science 331, 83–87 (2011).

pubmed: 21212356 pmcid: 3065813 doi: 10.1126/science.1195870

Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).

pubmed: 8637596 doi: 10.1038/381607a0

Olshausen, B. A. & Field, D. J. Natural image statistics and efficient coding. Netw. Comput. Neural Syst. 7, 333–339 (1996).

doi: 10.1088/0954-898X_7_2_014

Smyth, D., Willmore, B., Baker, G. E., Thompson, I. D. & Tolhurst, D. J. The Receptive-Field Organization of Simple Cells in Primary Visual Cortex of Ferrets under Natural Scene Stimulation. J. Neurosci. 23, 4746–4759 (2003).

pubmed: 12805314 pmcid: 6740783 doi: 10.1523/JNEUROSCI.23-11-04746.2003

Baddeley, A. Working Memory. Science 255, 556–559 (1992).

pubmed: 1736359 doi: 10.1126/science.1736359

Barrouillet, P., Bernardin, S. & Camos, V. Time Constraints and Resource Sharing in Adults’ Working Memory Spans. J. Exp. Psychol. Gen. 133, 83–100 (2004).

pubmed: 14979753 doi: 10.1037/0096-3445.133.1.83

Hasson, U., Nir, Y., Levy, I., Fuhrmann, G. & Malach, R. Intersubject Synchronization of Cortical Activity During Natural Vision. Science 303, 1634–1640 (2004).

pubmed: 15016991 doi: 10.1126/science.1089506

Haxby, J. V. et al. A Common, high-dimensional model of the representational space in human ventral temporal cortex. Neuron 72, 404–416 (2011).

pubmed: 22017997 pmcid: 3201764 doi: 10.1016/j.neuron.2011.08.026

Haxby, J. V., Guntupalli, J. S., Nastase, S. A. & Feilong, M. Hyperalignment: modeling shared information encoded in idiosyncratic cortical topographies. eLife 9, e56601 (2020).

pubmed: 32484439 pmcid: 7266639 doi: 10.7554/eLife.56601

Buccino, G., Binkofski, F. & Riggio, L. The mirror neuron system and action recognition. Brain Lang. 89, 370–376 (2004).

pubmed: 15068920 doi: 10.1016/S0093-934X(03)00356-0

Buccino, G. et al. Neural circuits underlying imitation learning of hand actions: an event-related fMRI study. Neuron 42, 323–334 (2004).

pubmed: 15091346 doi: 10.1016/S0896-6273(04)00181-3

Gazzola, V. & Keysers, C. The observation and execution of actions share motor and somatosensory voxels in all tested subjects: single-subject analyses of unsmoothed fMRI data. Cereb. Cortex 19, 1239–1255 (2009).

pubmed: 19020203 doi: 10.1093/cercor/bhn181

Rizzolatti, G. & Sinigaglia, C. The functional role of the parieto-frontal mirror circuit: interpretations and misinterpretations. Nat. Rev. Neurosci. 11, 264–274 (2010).

pubmed: 20216547 doi: 10.1038/nrn2805

Lafer-Sousa, R., Conway, B. R. & Kanwisher, N. G. Color-biased regions of the ventral visual pathway lie between face- and place-selective regions in humans, as in Macaques. J. Neurosci. 36, 1682–1697 (2016).

pubmed: 26843649 pmcid: 4737777 doi: 10.1523/JNEUROSCI.3164-15.2016

Hutchison, R. M. et al. Dynamic functional connectivity: Promise, issues, and interpretations. NeuroImage 80, 360–378 (2013).

pubmed: 23707587 doi: 10.1016/j.neuroimage.2013.05.079

Smith, S. M. et al. Functional connectomics from resting-state fMRI. Trends Cogn. Sci. 17, 666–682 (2013).

pubmed: 24238796 pmcid: 4004765 doi: 10.1016/j.tics.2013.09.016

Zhou, B., Lapedriza, A., Khosla, A., Oliva, A. & Torralba, A. Places: a 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 40, 1452–1464 (2018).

pubmed: 28692961 doi: 10.1109/TPAMI.2017.2723009

Hebart, M. N. et al. THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. eLife 12, e82580 (2023).

pubmed: 36847339 pmcid: 10038662 doi: 10.7554/eLife.82580

Monfort, M. et al. Spoken Moments: Learning Joint Audio-Visual Representations From Video Descriptions. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 14871–14881 (2021).

Gorgolewski, K. J. et al. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Sci. Data 3, 160044 (2016).

pubmed: 27326542 pmcid: 4978148 doi: 10.1038/sdata.2016.44

Esteban, O. et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat. Methods 16, 111–116 (2019).

pubmed: 30532080 doi: 10.1038/s41592-018-0235-4

Kay, K., Jamison, K. W., Zhang, R.-Y. & Uğurbil, K. A temporal decomposition method for identifying venous effects in task-based fMRI. Nat. Methods 17, 1033–1039 (2020).

pubmed: 32895538 pmcid: 7721302 doi: 10.1038/s41592-020-0941-6

Le, A., Vesia, M., Yan, X., Crawford, J. D. & Niemeier, M. Parietal area BA7 integrates motor programs for reaching, grasping, and bimanual coordination. J. Neurophysiol. 117, 624–636 (2017).

pubmed: 27832593 doi: 10.1152/jn.00299.2016

Silver, M. A. & Kastner, S. Topographic maps in human frontal and parietal cortex. Trends Cogn. Sci. 13, 488–495 (2009).

pubmed: 19758835 pmcid: 2767426 doi: 10.1016/j.tics.2009.08.005

VanRullen, R. & Thorpe, S. J. The Time Course of Visual Processing: From Early Perception to Decision-Making. J. Cogn. Neurosci. 13, 454–461 (2001).

pubmed: 11388919 doi: 10.1162/08989290152001880

Esteban, O. et al. MRIQC: Advancing the automatic prediction of image quality in MRI from unseen sites. PLOS ONE 12, e0184661 (2017).

pubmed: 28945803 pmcid: 5612458 doi: 10.1371/journal.pone.0184661

Friston, K. J. et al. Statistical parametric maps in functional imaging: A general linear approach. Hum. Brain Mapp. 2, 189–210 (1994).

doi: 10.1002/hbm.460020402

Khosla, M., Ratan Murty, N. A. & Kanwisher, N. A highly selective response to food in human visual cortex revealed by hypothesis-free voxel decomposition. Curr. Biol. 32, 4159–4171.e9 (2022).

pubmed: 36027910 pmcid: 9561032 doi: 10.1016/j.cub.2022.08.009

Naselaris, T., Kay, K. N., Nishimoto, S. & Gallant, J. L. Encoding and decoding in fMRI. NeuroImage 56, 400–410 (2011).

pubmed: 20691790 doi: 10.1016/j.neuroimage.2010.07.073

Ratan Murty, N. A., Bashivan, P., Abate, A., DiCarlo, J. J. & Kanwisher, N. Computational models of category-selective brain regions enable high-throughput tests of selectivity. Nat. Commun. 12, 5540 (2021).

pubmed: 34545079 pmcid: 8452636 doi: 10.1038/s41467-021-25409-6

Schrimpf, M. et al. Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like? Preprint at https://doi.org/10.1101/407007 (2020).

Haxby, J. V. Multivariate pattern analysis of fMRI: The early beginnings. NeuroImage 62, 852–855 (2012).

pubmed: 22425670 doi: 10.1016/j.neuroimage.2012.03.016

Haynes, J.-D. A Primer on Pattern-Based Approaches to fMRI: Principles, Pitfalls, and Perspectives. Neuron 87, 257–270 (2015).

pubmed: 26182413 doi: 10.1016/j.neuron.2015.05.025

Kriegeskorte, N. & Kievit, R. A. Representational geometry: integrating cognition, computation, and the brain. Trends Cogn. Sci. 17, 401–412 (2013).

pubmed: 23876494 pmcid: 3730178 doi: 10.1016/j.tics.2013.06.007

Kriegeskorte, N., Goebel, R. & Bandettini, P. Information-based functional brain mapping. Proc. Natl Acad. Sci. 103, 3863–3868 (2006).

pubmed: 16537458 pmcid: 1383651 doi: 10.1073/pnas.0600244103

Dayan, P. & Abbott, L. F. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. (Massachusetts Institute of Technology Press, Cambridge, Mass, 2001).

Chang, N. et al. BOLD5000, a public fMRI dataset while viewing 5000 visual images. Sci. Data 6, 49 (2019).

pubmed: 31061383 pmcid: 6502931 doi: 10.1038/s41597-019-0052-3

Rajalingham, R. et al. Large-Scale, High-Resolution Comparison of the Core Visual Object Recognition Behavior of Humans, Monkeys, and State-of-the-Art Deep Artificial Neural Networks. J. Neurosci. 38, 7255–7269 (2018).

pubmed: 30006365 pmcid: 6096043 doi: 10.1523/JNEUROSCI.0388-18.2018

Schrimpf, M. et al. Integrative Benchmarking to Advance Neurally Mechanistic Models of Human Intelligence. Neuron 108, 413–423 (2020).

pubmed: 32918861 doi: 10.1016/j.neuron.2020.07.040

Yamins, D. L., Hong, H., Cadieu, C. & DiCarlo, J. J. Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream. in Advances in Neural Information Processing Systems 26 (Curran Associates, Inc., 2013).

Krekelberg, B., Dannenberg, S., Hoffmann, K.-P., Bremmer, F. & Ross, J. Neural correlates of implied motion. Nature 424, 674–677 (2003).

pubmed: 12904793 doi: 10.1038/nature01852

Senior, C. et al. The functional neuroanatomy of implicit-motion perception or ‘representational momentum’. Curr. Biol. 10, 16–22 (2000).

pubmed: 10660297 doi: 10.1016/S0960-9822(99)00259-6

Shirai, N. & Imura, T. Implied motion perception from a still image in infancy. Exp. Brain Res. 232, 3079–3087 (2014).

pubmed: 24888536 doi: 10.1007/s00221-014-3996-8

Nishimoto, S. et al. Reconstructing Visual Experiences from Brain Activity Evoked by Natural Movies. Curr. Biol. 21, 1641–1646 (2011).

pubmed: 21945275 pmcid: 3326357 doi: 10.1016/j.cub.2011.08.031

Seeliger, K., Sommers, R. P., Güçlü, U., Bosch, S. E. & van Gerven, M. A. J. A large single-participant fMRI dataset for probing brain responses to naturalistic stimuli in space and time. Preprint at https://doi.org/10.1101/687681 (2019).

He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image Recognition. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 770–778 (2016).

Kietzmann, T. C. et al. Recurrence is required to capture the representational dynamics of the human visual system. Proc. Natl Acad. Sci. 116, 21854–21863 (2019).

pubmed: 31591217 pmcid: 6815174 doi: 10.1073/pnas.1905544116

Koivisto, M., Railo, H., Revonsuo, A., Vanni, S. & Salminen-Vaparanta, N. Recurrent Processing in V1/V2 Contributes to Categorization of Natural Scenes. J. Neurosci. 31, 2488–2492 (2011).

pubmed: 21325516 pmcid: 6623680 doi: 10.1523/JNEUROSCI.3074-10.2011

Pascual-Leone, A. & Walsh, V. Fast backprojections from the motion to the primary visual area necessary for visual awareness. Science 292, 510–512 (2001).

pubmed: 11313497 doi: 10.1126/science.1057099

Silvanto, J., Cowey, A., Lavie, N. & Walsh, V. Striate cortex (V1) activity gates awareness of motion. Nat. Neurosci. 8, 143–144 (2005).

pubmed: 15643428 doi: 10.1038/nn1379

Silvanto, J., Lavie, N. & Walsh, V. Double Dissociation of V1 and V5/MT activity in Visual Awareness. Cereb. Cortex 15, 1736–1741 (2005).

pubmed: 15703247 doi: 10.1093/cercor/bhi050

Kubilius, J. et al. Brain-like object recognition with high-performing shallow recurrent ANNs. in Advances in Neural Information Processing Systems 32 (Curran Associates, Inc., 2019).

Lin, J., Gan, C. & Han, S. TSM: Temporal Shift Module for Efficient Video Understanding. in Proceedings of the IEEE/CVF International Conference on Computer Vision 7083–7093 (2019).

Cichy, R. M., Khosla, A., Pantazis, D., Torralba, A. & Oliva, A. Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Sci. Rep. 6, 27755 (2016).

pubmed: 27282108 pmcid: 4901271 doi: 10.1038/srep27755

Yamins, D. L. K. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl Acad. Sci. 111, 8619–8624 (2014).

pubmed: 24812127 pmcid: 4060707 doi: 10.1073/pnas.1403112111

Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 4510–4520 (2018).

Bertasius, G., Wang, H. & Torresani, L. Is Space-Time Attention All You Need for Video Understanding? in Proceedings of the 38th International Conference on Machine Learning 813–824 (PMLR, 2021).

Kay, W. et al. The Kinetics Human Action Video Dataset. Preprint at https://doi.org/10.48550/arXiv.1705.06950 (2017).

Miech, A. et al. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips. in Proceedings of the IEEE/CVF International Conference on Computer Vision 2630–2640 (2019).

Cichy, R. M., Pantazis, D. & Oliva, A. Similarity-based fusion of MEG and fMRI reveals spatio-temporal dynamics in human cortex during visual object recognition. Cereb. Cortex 26, 3563–3579 (2016).

pubmed: 27235099 pmcid: 4961022 doi: 10.1093/cercor/bhw135

Kriegeskorte, N. Deep neural networks: a new framework for modeling biological vision and brain information processing. Annu. Rev. Vis. Sci. 1, 417–446 (2015).

pubmed: 28532370 doi: 10.1146/annurev-vision-082114-035447

Eickenberg, M., Gramfort, A., Varoquaux, G. & Thirion, B. Seeing it all: convolutional network layers map the function of the human visual system. NeuroImage 152, 184–194 (2017).

pubmed: 27777172 doi: 10.1016/j.neuroimage.2016.10.001

Wurm, M. F., Caramazza, A. & Lingnau, A. Action categories in lateral occipitotemporal cortex are organized along sociality and transitivity. J. Neurosci. 37, 562–575 (2017).

pubmed: 28100739 pmcid: 6596756 doi: 10.1523/JNEUROSCI.1717-16.2016

Bakhtiari, S., Mineault, P., Lillicrap, T., Pack, C. & Richards, B. The functional specialization of visual cortex emerges from training parallel pathways with self-supervised predictive learning. in Advances in Neural Information Processing Systems 34 25164–25178 (Curran Associates, Inc., 2021).

Güçlü, U. & van Gerven, M. A. J. Increasingly complex representations of natural movies across the dorsal stream are shared between subjects. NeuroImage 145, 329–336 (2017).

pubmed: 26724778 doi: 10.1016/j.neuroimage.2015.12.036

Honey, C. J. et al. Slow cortical dynamics and the accumulation of information over long timescales. Neuron 76, 423–434 (2012).

pubmed: 23083743 pmcid: 3517908 doi: 10.1016/j.neuron.2012.08.011

Wang, L. et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. in Computer Vision – ECCV 2016 (eds. Leibe, B., Matas, J., Sebe, N. & Welling, M.) 20–36 (Springer International Publishing, Cham, 2016). https://doi.org/10.1007/978-3-319-46484-8_2 .

Kiebel, S. J., Daunizeau, J. & Friston, K. J. A hierarchy of time-scales and the brain. PLoS Comput. Biol. 4, e1000209 (2008).

pubmed: 19008936 pmcid: 2568860 doi: 10.1371/journal.pcbi.1000209

Kanwisher, N. Functional specificity in the human brain: a window into the functional architecture of the mind. Proc. Natl Acad. Sci. 107, 11163–11170 (2010).

pubmed: 20484679 pmcid: 2895137 doi: 10.1073/pnas.1005062107

Kanwisher, N., McDermott, J. & Chun, M. M. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci. 17, 4302–4311 (1997).

pubmed: 9151747 pmcid: 6573547 doi: 10.1523/JNEUROSCI.17-11-04302.1997

Bojanowski, P., Grave, E., Joulin, A. & Mikolov, T. Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017).

doi: 10.1162/tacl_a_00051

Reimers, N. & Gurevych, I. Sentence-BERT: sentence embeddings using siamese BERT-Networks. https://doi.org/10.48550/ARXIV.1908.10084 (2019).

Downing, P. E., Jiang, Y., Shuman, M. & Kanwisher, N. A Cortical Area Selective for Visual Processing of the Human Body. Science 293, 2470–2473 (2001).

pubmed: 11577239 doi: 10.1126/science.1063414

Epstein, R. & Kanwisher, N. A cortical representation of the local visual environment. Nature 392, 598–601 (1998).

pubmed: 9560155 doi: 10.1038/33402

Grill-Spector, K., Kourtzi, Z. & Kanwisher, N. The lateral occipital complex and its role in object recognition. Vis. Res. 41, 1409–1422 (2001).

pubmed: 11322983 doi: 10.1016/S0042-6989(01)00073-6

Hardwick, R. M., Caspers, S., Eickhoff, S. B. & Swinnen, S. P. Neural correlates of action: Comparing meta-analyses of imagery, observation, and execution. Neurosci. Biobehav. Rev. 94, 31–44 (2018).

pubmed: 30098990 doi: 10.1016/j.neubiorev.2018.08.003

Doerig, A. et al. Semantic scene descriptions as an objective of human vision. https://doi.org/10.48550/ARXIV.2209.11737 (2022).

Kosakowski, H. L. et al. Selective responses to faces, scenes, and bodies in the ventral visual pathway of infants. Curr. Biol. 32, 265–274.e5 (2022).

pubmed: 34784506 doi: 10.1016/j.cub.2021.10.064

Bonner, M. F. & Epstein, R. A. Object representations in the human brain reflect the co-occurrence statistics of vision and language. Nat. Commun. 12, 4081 (2021).

pubmed: 34215754 pmcid: 8253839 doi: 10.1038/s41467-021-24368-2

Wang, J. et al. GIT: a generative image-to-text transformer for vision and language. Preprint at https://doi.org/10.48550/arXiv.2205.14100 (2022).

Goetschalckx, L., Moors, P. & Wagemans, J. Image memorability across longer time intervals. Memory 26, 581–588 (2018).

pubmed: 28974153 doi: 10.1080/09658211.2017.1383435

Isola, P., Parikh, D., Torralba, A. & Oliva, A. Understanding the intrinsic memorability of images. in Advances in Neural Information Processing Systems 24 (Curran Associates, Inc., 2011).

Khosla, A., Raju, A. S., Torralba, A. & Oliva, A. Understanding and predicting image memorability at a large scale. in Proceedings of the IEEE International Conference on Computer Vision 2390–2398 (2015).

Jaegle, A. et al. Population response magnitude variation in inferotemporal cortex predicts image memorability. eLife 8, e47596 (2019).

pubmed: 31464687 pmcid: 6715346 doi: 10.7554/eLife.47596

Lahner, B., Mohsenzadeh, Y., Mullin, C. & Oliva, A. Visual perception of highly memorable images is mediated by a distributed network of ventral visual regions that enable a late memorability response. PLoS Biol. 22, e3002564 (2024).

pubmed: 38557761 pmcid: 10984539 doi: 10.1371/journal.pbio.3002564

Cohen, J. D. et al. Temporal dynamics of brain activation during a working memory task. Nature 386, 604–608 (1997).

pubmed: 9121583 doi: 10.1038/386604a0

Martin, A. & Chao, L. L. Semantic memory and the brain: structure and processes. Curr. Opin. Neurobiol. 11, 194–201 (2001).

pubmed: 11301239 doi: 10.1016/S0959-4388(00)00196-3

Riou, B., Lesourd, M., Brunel, L. & Versace, R. Visual memory and visual perception: when memory improves visual search. Mem. Cogn. 39, 1094–1102 (2011).

doi: 10.3758/s13421-011-0075-2

Slotnick, S. D., Thompson, W. L. & Kosslyn, S. M. Visual memory and visual mental imagery recruit common control and sensory regions of the brain. Cogn. Neurosci. 3, 14–20 (2012).

pubmed: 24168646 doi: 10.1080/17588928.2011.578210

Vermeulen, N., Corneille, O. & Niedenthal, P. M. Sensory load incurs conceptual processing costs. Cognition 109, 287–294 (2008).

pubmed: 18996513 doi: 10.1016/j.cognition.2008.09.004

Weinberger, N. M. Specific long-term memory traces in primary auditory cortex. Nat. Rev. Neurosci. 5, 279–290 (2004).

pubmed: 15034553 pmcid: 3590000 doi: 10.1038/nrn1366

Bainbridge, W. A. & Baker, C. I. Multidimensional memory topography in the medial parietal cortex identified from neuroimaging of thousands of daily memory videos. Nat. Commun. 13, 6508 (2022).

pubmed: 36316315 pmcid: 9622880 doi: 10.1038/s41467-022-34075-1

Furman, O., Dorfman, N., Hasson, U., Davachi, L. & Dudai, Y. They saw a movie: long-term memory for an extended audiovisual narrative. Learn. Mem. 14, 457–467 (2007).

pubmed: 17562897 pmcid: 1896095 doi: 10.1101/lm.550407

Boyle, J. A. et al. The Courtois project on neuronal modelling-first data release. in 26th annual meeting of the organization for human brain mapping (2020).

Zhou, M. et al. A large-scale fMRI dataset for human action recognition. Sci. Data 10, 415 (2023).

pubmed: 37369643 pmcid: 10300118 doi: 10.1038/s41597-023-02325-6

Horikawa, T. & Kamitani, Y. Generic decoding of seen and imagined objects using hierarchical visual features. Nat. Commun. 8, 15037 (2017).

pubmed: 28530228 pmcid: 5458127 doi: 10.1038/ncomms15037

Heim, S. et al. The role of human parietal area 7A as a link between sequencing in hand actions and in overt speech production. Front. Psychol. 3, 534 (2012).

pubmed: 23227016 pmcid: 3514541 doi: 10.3389/fpsyg.2012.00534

Peeters, R. et al. The representation of tool use in humans and monkeys: common and uniquely human features. J. Neurosci. 29, 11523–11539 (2009).

pubmed: 19759300 pmcid: 6665774 doi: 10.1523/JNEUROSCI.2040-09.2009

Peeters, R. R., Rizzolatti, G. & Orban, G. A. Functional properties of the left parietal tool use region. NeuroImage 78, 83–93 (2013).

pubmed: 23591073 doi: 10.1016/j.neuroimage.2013.04.023

Murray, J. D. et al. A hierarchy of intrinsic timescales across primate cortex. Nat. Neurosci. 17, 1661–1663 (2014).

pubmed: 25383900 pmcid: 4241138 doi: 10.1038/nn.3862

Piasini, E. et al. Temporal stability of stimulus representation increases along rodent visual cortical hierarchies. Nat. Commun. 12, 4448 (2021).

pubmed: 34290247 pmcid: 8295255 doi: 10.1038/s41467-021-24456-3

Hu, M., Ge, P., Wang, X., Lin, H. & Ren, F. A spatio-temporal integrated model based on local and global features for video expression recognition. Vis. Comput. 38, 2617–2634 (2022).

doi: 10.1007/s00371-021-02136-z

Kahou, S. E. et al. EmoNets: multimodal deep learning approaches for emotion recognition in video. J. Multimodal User Interfaces 10, 99–111 (2016).

doi: 10.1007/s12193-015-0195-2

Tzirakis, P., Zhang, J. & Schuller, B. W. End-to-End Speech Emotion Recognition Using Deep Neural Networks. in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 5089–5093 (2018). https://doi.org/10.1109/ICASSP.2018.8462677 .

Carreira, J. & Zisserman, A. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 6299–6308 (2017).

Feichtenhofer, C., Pinz, A. & Zisserman, A. Convolutional Two-Stream Network Fusion for Video Action Recognition. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 1933–1941 (2016).

Ji, S., Xu, W., Yang, M. & Yu, K. 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35, 221–231 (2013).

pubmed: 22392705 doi: 10.1109/TPAMI.2012.59

Wang, Y. et al. Eidetic 3D LSTM: A Model for Video Prediction and Beyond. in International Conference on Learning Representations (2019).

Fan, L., Zhang, T. & Du, W. Optical-flow-based framework to boost video object detection performance with object enhancement. Expert Syst. Appl. 170, 114544 (2021).

doi: 10.1016/j.eswa.2020.114544

Shafiee, M. J., Chywl, B., Li, F. & Wong, A. Fast YOLO: A fast you only look once system for real-time embedded object detection in video. https://doi.org/10.48550/ARXIV.1709.05943 (2017).

Chen, Z., Qing, J. & Zhou, J. H. Cinematic mindscapes: high-quality video reconstruction from brain activity. in Advances in Neural Information Processing Systems 36 (Curran Associates, Inc., 2024).

Kupershmidt, G., Beliy, R., Gaziv, G. & Irani, M. A Penny for Your (visual) Thoughts: Self-Supervised Reconstruction of Natural Movies from Brain Activity. https://doi.org/10.48550/ARXIV.2206.03544 (2022).

Luo, A. F., Henderson, M. M., Wehbe, L. & Tarr, M. J. Brain diffusion for visual exploration: cortical discovery using large scale generative models. in Advances in Neural Information Processing Systems 36 (Curran Associates, Inc., 2024).

Gu, Z. et al. NeuroGen: activation optimized image synthesis for discovery neuroscience. NeuroImage 247, 118812 (2022).

pubmed: 34936922 doi: 10.1016/j.neuroimage.2021.118812

Han, K. et al. Variational autoencoder: an unsupervised model for encoding and decoding fMRI activity in visual cortex. NeuroImage 198, 125–136 (2019).

pubmed: 31103784 doi: 10.1016/j.neuroimage.2019.05.039

Shmuelof, L. & Zohary, E. Dissociation between ventral and dorsal fMRI activation during object and action recognition. Neuron 47, 457–470 (2005).

pubmed: 16055068 doi: 10.1016/j.neuron.2005.06.034

Spunt, R. P., Satpute, A. B. & Lieberman, M. D. Identifying the what, why, and how of an observed action: an fMRI study of mentalizing and mechanizing during action observation. J. Cogn. Neurosci. 23, 63–74 (2011).

pubmed: 20146607 doi: 10.1162/jocn.2010.21446

Urgen, B. A., Pehlivan, S. & Saygin, A. P. Distinct representations in occipito-temporal, parietal, and premotor cortex during action perception revealed by fMRI and computational modeling. Neuropsychologia 127, 35–47 (2019).

pubmed: 30772426 doi: 10.1016/j.neuropsychologia.2019.02.006

Julian, J. B., Fedorenko, E., Webster, J. & Kanwisher, N. An algorithmic method for functionally defining regions of interest in the ventral visual pathway. NeuroImage 60, 2357–2364 (2012).

pubmed: 22398396 doi: 10.1016/j.neuroimage.2012.02.055

Guzman-Martinez, E., Leung, P., Franconeri, S., Grabowecky, M. & Suzuki, S. Rapid eye-fixation training without eyetracking. Psychon. Bull. Rev. 16, 491–496 (2009).

pubmed: 19451374 pmcid: 2777709 doi: 10.3758/PBR.16.3.491

Adelson, E. H. & Bergen, J. R. Spatiotemporal energy models for the perception of motion. JOSA A 2, 284–299 (1985).

doi: 10.1364/JOSAA.2.000284

Watson, A. B. & Ahumada, A. J. Model of human visual-motion sensing. JOSA A 2, 322–342 (1985).

doi: 10.1364/JOSAA.2.000322

Born, R. T. & Bradley, D. C. Structure and function of visual area MT. Annu. Rev. Neurosci. 28, 157–189 (2005).

pubmed: 16022593 doi: 10.1146/annurev.neuro.26.041002.131052

Nishimoto, S. & Gallant, J. L. A three-dimensional spatiotemporal receptive field model explains responses of area MT neurons to naturalistic movies. J. Neurosci. 31, 14551–14564 (2011).

pubmed: 21994372 pmcid: 3338855 doi: 10.1523/JNEUROSCI.6801-10.2011

Kamitani, Y. & Tong, F. Decoding seen and attended motion directions from activity in the human visual cortex. Curr. Biol. 16, 1096–1102 (2006).

pubmed: 16753563 pmcid: 1635016 doi: 10.1016/j.cub.2006.04.003

Roe, A. W. et al. Toward a unified theory of visual area V4. Neuron 74, 12–29 (2012).

pubmed: 22500626 pmcid: 4912377 doi: 10.1016/j.neuron.2012.03.011

Smith, A. T., Greenlee, M. W., Singh, K. D., Kraemer, F. M. & Hennig, J. The processing of first- and second-order motion in human visual cortex assessed by functional magnetic resonance imaging (fMRI). J. Neurosci. 18, 3816–3830 (1998).

pubmed: 9570811 pmcid: 6793149 doi: 10.1523/JNEUROSCI.18-10-03816.1998

Esteban, O. et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Zenodo https://doi.org/10.5281/zenodo.7430291 (2022).

Esteban, O. et al. nipy/nipype: 1.8.3. Zenodo https://doi.org/10.5281/ZENODO.596855 (2022).

Gorgolewski, K. et al. Nipype: a flexible, lightweight and extensible neuroimaging data processing framework in python. Front. Neuroinformatics 5, 13 (2011).

doi: 10.3389/fninf.2011.00013

Tustison, N. J. et al. N4ITK: improved N3 bias correction. IEEE Trans. Med. Imaging 29, 1310–1320 (2010).

pubmed: 20378467 pmcid: 3071855 doi: 10.1109/TMI.2010.2046908

Avants, B. B., Epstein, C. L., Grossman, M. & Gee, J. C. Symmetric diffeomorphic image registration with cross-correlation: Evaluating automated labeling of elderly and neurodegenerative brain. Med. Image Anal. 12, 26–41 (2008).

pubmed: 17659998 doi: 10.1016/j.media.2007.06.004

Zhang, Y., Brady, M. & Smith, S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans. Med. Imaging 20, 45–57 (2001).

pubmed: 11293691 doi: 10.1109/42.906424

Dale, A. M., Fischl, B. & Sereno, M. I. Cortical surface-based analysis. NeuroImage 9, 179–194 (1999).

pubmed: 9931268 doi: 10.1006/nimg.1998.0395

Klein, A. et al. Mindboggling morphometry of human brains. PLoS Comput. Biol. 13, e1005350 (2017).

pubmed: 28231282 pmcid: 5322885 doi: 10.1371/journal.pcbi.1005350

Fonov, V., Evans, A., McKinstry, R., Almli, C. & Collins, D. Unbiased nonlinear average age-appropriate brain templates from birth to adulthood. NeuroImage 47, S102 (2009).

doi: 10.1016/S1053-8119(09)70884-5

Glasser, M. F. et al. The minimal preprocessing pipelines for the human connectome project. NeuroImage 80, 105–124 (2013).

pubmed: 23668970 doi: 10.1016/j.neuroimage.2013.04.127

Greve, D. N. & Fischl, B. Accurate and robust brain image alignment using boundary-based registration. NeuroImage 48, 63–72 (2009).

pubmed: 19573611 doi: 10.1016/j.neuroimage.2009.06.060

Jenkinson, M., Bannister, P., Brady, M. & Smith, S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage 17, 825–841 (2002).

pubmed: 12377157 doi: 10.1006/nimg.2002.1132

Cox, R. W. & Hyde, J. S. Software tools for analysis and visualization of fMRI data. NMR Biomed. 10, 171–178 (1997).

pubmed: 9430344 doi: 10.1002/(SICI)1099-1492(199706/08)10:4/5<171::AID-NBM453>3.0.CO;2-L

Power, J. D. et al. Methods to detect, characterize, and remove motion artifact in resting state fMRI. NeuroImage 84, 320–341 (2014).

pubmed: 23994314 doi: 10.1016/j.neuroimage.2013.08.048

Behzadi, Y., Restom, K., Liau, J. & Liu, T. T. A component based noise correction method (CompCor) for BOLD and perfusion based fMRI. NeuroImage 37, 90–101 (2007).

pubmed: 17560126 doi: 10.1016/j.neuroimage.2007.04.042

Satterthwaite, T. D. et al. An improved framework for confound regression and filtering for control of motion artifact in the preprocessing of resting-state functional connectivity data. NeuroImage 64, 240–256 (2013).

pubmed: 22926292 doi: 10.1016/j.neuroimage.2012.08.052

Lanczos, C. Evaluation of noisy data. J. Soc. Ind. Appl. Math. Ser. B Numer. Anal. 1, 76–85 (1964).

doi: 10.1137/0701007

Abraham, A. et al. Machine learning for neuroimaging with scikit-learn. Front. Neuroinformatics 8, 14 (2014).

doi: 10.3389/fninf.2014.00014

Wang, L., Mruczek, R. E. B., Arcaro, M. J. & Kastner, S. Probabilistic Maps of Visual Topography in Human Cortex. Cereb. Cortex 25, 3911–3931 (2015).

pubmed: 25452571 doi: 10.1093/cercor/bhu277

Glasser, M. F. et al. A multi-modal parcellation of human cerebral cortex. Nature 536, 171–178 (2016).

pubmed: 27437579 pmcid: 4990127 doi: 10.1038/nature18933

Lage-Castellanos, A., Valente, G., Formisano, E. & De Martino, F. Methods for computing the maximum performance of computational models of fMRI responses. PLOS Comput. Biol. 15, e1006397 (2019).

pubmed: 30849071 pmcid: 6426260 doi: 10.1371/journal.pcbi.1006397

Carlin, J. D. & Kriegeskorte, N. Adjudicating between face-coding models with individual-face fMRI responses. PLOS Comput. Biol. 13, e1005604 (2017).

pubmed: 28746335 pmcid: 5550004 doi: 10.1371/journal.pcbi.1005604

Nili, H. et al. A toolbox for representational similarity analysis. PLOS Comput. Biol. 10, e1003553 (2014).

pubmed: 24743308 pmcid: 3990488 doi: 10.1371/journal.pcbi.1003553

Li, Y., Song, Y. & Luo, J. Improving Pairwise Ranking for Multi-Label Image Classification. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 3617–3625 (2017).

Kriegeskorte, N. & Douglas, P. K. Interpreting encoding and decoding models. Curr. Opin. Neurobiol. 55, 167–179 (2019).

pubmed: 31039527 pmcid: 6705607 doi: 10.1016/j.conb.2019.04.002

Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84–90 (2017).

doi: 10.1145/3065386

Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).

doi: 10.1007/s11263-015-0816-y

Authors

Benjamin Lahner (B)

Kshitij Dwivedi (K)

Polina Iamshchinina (P)

Monika Graumann (M)

Alex Lascelles (A)

Gemma Roig (G)

Alessandro Thomas Gifford (AT)

Bowen Pan (B)

SouYoung Jin (S)

N Apurva Ratan Murty (NA)

Kendrick Kay (K)

Aude Oliva (A)

Radoslaw Cichy (R)
