Recognizing the Gist of a Scene

People can recognize the meaning, or “gist,” of a scene (for example, that it is a beach, a dining room, or a street) during their first eye fixation on it. In fact, our own research has shown that viewers can recognize the gist of a scene at over 80% accuracy after as little as 36 milliseconds of uninterrupted processing time. This raises the questions of how we are able to recognize images so rapidly, and what information we use to recognize them. Answering these questions is important for our understanding of scene perception, because research has shown that the gist of a scene activates our prior knowledge associated with the scene’s category (e.g., that beaches have water, sand, and possibly sunbathers and palm trees). This knowledge strongly guides where we pay attention, it may help us recognize objects in the scene, and it plays a big role in determining what information we remember from a scene. At its core, research on scene gist recognition explores the interface between perception and cognition, a problem that has proved extremely challenging to workers in both artificial intelligence and cognitive psychology. Such research can be applied in designing artificial intelligence systems capable of recognizing the categories of scenes.

We have carried out a number of studies on scene gist recognition over the past several years. A key question we have investigated is what information people use to rapidly categorize a scene as a “beach,” “street,” “mountain,” etc. Some prominent computational theories of scene gist recognition have proposed the counter-intuitive and provocative hypothesis that the unlocalized amplitude spectrum of an image (that is, its spatial frequencies and orientations, without regard to where they occur in the image) provides much of the most important information for categorizing a scene. In simple terms, this suggests that for recognizing a beach scene, it is more important to know that there is a strong horizontal and a strong diagonal than to know that the horizontal (the horizon) is above the diagonal (the water line). However, our studies with human subjects suggest that while the spatial frequencies and orientations of an image certainly play some role in recognizing it, they are not enough by themselves to categorize a scene; localized information is necessary for that. The importance of localization therefore suggests that the layout of a scene (the scene’s global configuration) is probably very important in recognizing its gist.
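To make the idea of an "unlocalized amplitude spectrum" concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not the stimulus-generation procedure used in our studies: the toy "scene," the function names, and the choice to borrow phases from a white-noise image are all assumptions made for the example. The sketch keeps an image's amplitude spectrum (how much of each spatial frequency and orientation it contains) while replacing its phase spectrum, which is what carries the information about where things are.

```python
import numpy as np


def amplitude_spectrum(image):
    """Return the amplitude (magnitude) of the 2-D Fourier transform."""
    return np.abs(np.fft.fft2(image))


def phase_randomize(image, rng):
    """Keep the image's amplitude spectrum but replace its phase spectrum.

    The new phases are taken from the Fourier transform of a white-noise
    image, which gives them the conjugate symmetry needed for the inverse
    transform to be real-valued.
    """
    amplitude = amplitude_spectrum(image)
    noise_phase = np.angle(np.fft.fft2(rng.random(image.shape)))
    scrambled = np.fft.ifft2(amplitude * np.exp(1j * noise_phase))
    return scrambled.real  # the imaginary part is only floating-point residue


# Hypothetical toy "scene": a bright upper half above a darker lower half,
# plus a little pixel noise so every frequency has nonzero amplitude.
rng = np.random.default_rng(0)
scene = np.zeros((64, 64))
scene[:32, :] = 1.0
scene += 0.1 * rng.random(scene.shape)

scrambled = phase_randomize(scene, rng)

# The two images share an amplitude spectrum but not a spatial layout.
same_amplitude = np.allclose(amplitude_spectrum(scene), amplitude_spectrum(scrambled))
same_layout = np.allclose(scene, scrambled)
print("same amplitude spectrum:", same_amplitude)
print("same layout:", same_layout)
```

Under these assumptions, the scrambled image contains the same mix of spatial frequencies and orientations as the original, yet the "horizon" no longer sits in any particular place. An image of this kind is the sort of stimulus for which an account based only on unlocalized amplitude would predict good categorization.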

[Images: White Noise Mask, RISE Mask, Recognizable Mask]
A related topic that we have investigated is the masking of scene gist. Visual masking occurs when one stimulus interferes with the processing of another stimulus. Masking is an important tool for studying the time course of visual processing, and it has a history of over 100 years in the field of psychology. Yet very little is known about the masking of complex stimuli like scene images, or of relatively high level perceptual tasks such as scene gist recognition. We have compared the effects of low level spatial masking (i.e., masking by spatial frequencies and orientations) with the effects of higher level “conceptual masking” (i.e., masking by meaning). Previous research has shown that recognition memory for a scene is more strongly masked by a recognizable scene (i.e., a scene masking another scene) than by meaningless noise, and this has been used to argue for the existence of conceptual masking. A key hypothesis we have tested is that such conceptual masking effects are actually due to the greater visual similarity between 1) any given pair of scenes versus 2) any given scene compared with random noise. Our results do not rule out the existence of conceptual masking of scene gist, because pure visual similarity, in terms of spatial frequencies and orientations, cannot explain all of the masking produced by a recognizable scene mask. However, our results also show that a good proportion of what has been called conceptual masking (namely, the greater masking produced by a recognizable scene compared to that produced by white noise) can actually be produced by an unrecognizable noise image that shares many statistical properties with a scene. Such research holds the potential to expand our understanding of both scene gist processing and the masking of complex stimuli.
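The visual-similarity hypothesis can be illustrated with a rough sketch in Python with NumPy. Everything here is an assumption made for the example rather than a measure or stimulus from our studies: the similarity index (a correlation between log amplitude spectra) is hypothetical, and the "scenes" are synthetic noise images filtered to have the roughly 1/f amplitude falloff that is typical of natural images. The point is only to show the kind of comparison the hypothesis turns on, namely that two scene-like images resemble each other in spatial-frequency content more than a scene-like image resembles white noise.

```python
import numpy as np


def log_amplitude(image):
    """Flattened log amplitude spectrum, with a small offset to avoid log(0)."""
    return np.log(np.abs(np.fft.fft2(image)) + 1e-12).ravel()


def spectral_similarity(image_a, image_b):
    """Hypothetical similarity index: correlation of log amplitude spectra."""
    return np.corrcoef(log_amplitude(image_a), log_amplitude(image_b))[0, 1]


def scene_like_noise(shape, rng):
    """Assumed stand-in for a scene: white noise filtered to a 1/f amplitude falloff."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    radius = np.hypot(fy, fx)
    radius[0, 0] = 1.0  # leave the mean (DC) term unscaled; avoids dividing by zero
    spectrum = np.fft.fft2(rng.standard_normal(shape)) / radius
    return np.fft.ifft2(spectrum).real


rng = np.random.default_rng(1)
shape = (64, 64)
scene_a = scene_like_noise(shape, rng)
scene_b = scene_like_noise(shape, rng)
white_noise = rng.standard_normal(shape)

scene_vs_scene = spectral_similarity(scene_a, scene_b)
scene_vs_noise = spectral_similarity(scene_a, white_noise)
print(f"scene vs. scene:       {scene_vs_scene:.2f}")
print(f"scene vs. white noise: {scene_vs_noise:.2f}")
```

In this toy setup the two scene-like images are clearly more similar to each other than either is to white noise, even though neither is recognizable as anything. That is the sense in which a mask can be visually similar to a scene without carrying any meaning.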

Past and present collaborators on this work include a number of students at Kansas State University: Adam Larson, Elise Matz, Dan Ochs, Jeremy Corbeille, Katie Brewton, Laura Artman, Ben Bilyeu, and Nick Forristal (Psychology), Scott Smerchek (Computer & Information Science), and Tejaswi Pydimarri (formerly a master’s student in Computer & Information Science). Other collaborators are Dan Simons (Psychology, University of Illinois) and Amit Sethi (formerly a doctoral student in Electrical and Computer Engineering at the University of Illinois).

Related References:

Loschky, L.C., Hansen, B.C., Sethi, A. & Pydimarri, T. (in press). The roles of higher-order image statistics in scene gist masking. Attention, Perception & Psychophysics.

Loschky, L.C., & Larson, A.M. (in press). The natural/man-made distinction is made prior to basic-level distinctions in scene gist processing. Visual Cognition.

Larson, A.M. & Loschky, L.C. (2009). The contributions of central versus peripheral vision to scene gist recognition. Journal of Vision, 9(10):6, 1-16, http://journalofvision.org/9/10/6/, doi:10.1167/9.10.6.

Loschky, L.C. & Larson, A. M. (2008). Localized information is necessary for scene categorization, including the Natural/Man-made distinction. Journal of Vision, 8(1):4, 1-9.

Loschky, L.C., Sethi, A., Simons, D.J., Pydimarri, T., Ochs, D., & Corbeille, J. (2007). The Importance of Information Localization in Scene Gist Recognition. Journal of Experimental Psychology: Human Perception and Performance, 33(6), 1431-1450.

Loschky, L.C., Simons, D.J., Smerchek, S., Matz, E., Bilyeu, B., & Artman, L. (2007). Is Unlocalized Amplitude Information of Any Use for Scene Gist Recognition? [Abstract]. Journal of Vision, 7(9):1051, 1051a, http://journalofvision.org/7/9/1051/

Loschky, L. C., Sethi, A., Simons, D. J., Pydimarri, T. N., Forristal, N., Corbeille, J., et al. (2006). The roles of amplitude and phase information in scene gist recognition and masking [Abstract]. Journal of Vision, 6(6), 802a, http://www.journalofvision.org/6/6/799/

Loschky, L.C., Sethi, A., Simons, D.J., Ochs, D., Corbeille, J. & Gibb, K. (2005, November). Using visual masking to explore the nature of scene gist. Poster presented at the 46th Annual Meeting of the Psychonomic Society, Toronto, Canada.

Loschky, L. C., & Simons, D. J. (2004). The effects of spatial frequency content and color on scene gist perception [Abstract]. Journal of Vision, 4(8), 881a, http://journalofvision.org/4/8/881/