Recognizing the Gist of a Scene
People can recognize the meaning, or “gist,” of a scene (for example, that it is a beach, a dining room, or a street) during their first eye fixation on it. In fact, our own research has shown that viewers can recognize the gist of a scene with over 80% accuracy after as little as 36 milliseconds of uninterrupted processing time. This raises two questions: how are we able to recognize images so rapidly, and what information do we use to recognize them? Answering these questions is important for our understanding of scene perception, because research has shown that the gist of a scene activates our prior knowledge associated with the scene’s category (e.g., that beaches have water, sand, and possibly sunbathers and palm trees). This knowledge strongly guides where we pay attention, may help us recognize objects in the scene, and plays a big role in determining what information we remember from a scene. At its core, research on scene gist recognition explores the interface between perception and cognition, a problem that has proved extremely challenging to researchers in both artificial intelligence and cognitive psychology. Such research can also inform the design of artificial intelligence systems capable of recognizing the categories of scenes.
We have carried out a number of studies on scene gist recognition over the past several years. A key question we have investigated is what information people use to rapidly categorize a scene as a “beach,” “street,” “mountain,” etc. Some prominent computational theories of scene gist recognition have proposed the counter-intuitive and provocative hypothesis that the unlocalized amplitude spectrum of an image (that is, its spatial frequencies and orientations, without regard to where they occur in the image) provides much of the information most important for categorizing a scene. In simple terms, this suggests that for recognizing a beach scene, it is more important to know that there is a strong horizontal and a strong diagonal than to know that the horizontal (the horizon) is above the diagonal (the water line). However, our studies with human subjects suggest that while the spatial frequencies and orientations of an image certainly play some role in recognizing it, they are not enough by themselves to categorize a scene; localized information is necessary for that. The importance of localization therefore suggests that the layout of a scene (its global configuration) is probably very important in recognizing its gist.
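To make “unlocalized” concrete, here is a minimal sketch that builds a hypothetical 8×8 toy “beach” from a horizon line and a short diagonal segment standing in for the water line. The scene, the image size, and the function names `make_scene`, `amplitude_spectrum`, and `spectra_match` are illustrative assumptions, not the stimuli or code used in these studies. Circularly shifting the image down four rows puts the water line above the horizon, yet every value of the amplitude spectrum stays the same.

```python
import cmath
import math

N = 8  # toy image size; small enough that a naive O(N^4) DFT is instant


def make_scene(horizon_row: int, diag_top_row: int) -> list[list[float]]:
    """Hypothetical toy 'beach': a full-width horizontal line (the horizon)
    plus a short three-pixel diagonal (a stand-in for the water line)."""
    img = [[0.0] * N for _ in range(N)]
    for x in range(N):
        img[horizon_row][x] = 1.0
    for k in range(3):
        img[(diag_top_row + k) % N][k] = 1.0
    return img


def amplitude_spectrum(img: list[list[float]]) -> list[list[float]]:
    """Return |F(u, v)| of the 2-D discrete Fourier transform.

    u indexes vertical frequency and v horizontal frequency. Taking the
    magnitude discards the phase, which is where position is encoded.
    """
    spectrum = []
    for u in range(N):
        row = []
        for v in range(N):
            coeff = sum(
                img[y][x] * cmath.exp(-2j * math.pi * (u * y + v * x) / N)
                for y in range(N)
                for x in range(N)
            )
            row.append(abs(coeff))
        spectrum.append(row)
    return spectrum


def spectra_match(a: list[list[float]], b: list[list[float]], tol: float = 1e-9) -> bool:
    """True if two spectra agree element-wise to within a float tolerance."""
    return all(
        abs(p - q) < tol for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)
    )


# Horizon (row 1) above the water line (rows 4-6) ...
horizon_above = make_scene(horizon_row=1, diag_top_row=4)
# ... versus the same image circularly shifted down 4 rows: the water line
# now occupies rows 0-2, above the horizon at row 5.
horizon_below = make_scene(horizon_row=5, diag_top_row=0)

print(horizon_above == horizon_below)  # False: different layouts
print(spectra_match(amplitude_spectrum(horizon_above),
                    amplitude_spectrum(horizon_below)))  # True: same amplitudes
```

The shift multiplies each Fourier coefficient by (-1)^u, so only the phase changes; in Fourier terms, the “horizon above the water line” arrangement lives in the phase spectrum that the amplitude spectrum throws away. This is a property of the Fourier transform itself, offered as an illustration of the hypothesis rather than as a model of human gist recognition.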
Figure: Example masks (panels: White Noise Mask, RISE Mask, Recognizable Mask).
Past and present collaborators on this work have included a number of Kansas State University students, among them Adam Larson, Elise Matz, Dan Ochs, Jeremy Corbeille, Katie Brewton, Laura Artman, Ben Bilyeu, and Nick Forristal (Psychology); Scott Smerchek (Computer & Information Science); and Tejaswi Pydimarri (formerly a master’s student in Computer & Information Science). Other collaborators have included Dan Simons (Psychology, University of Illinois) and Amit Sethi (formerly a doctoral student in Electrical and Computer Engineering at the University of Illinois).
Related References:
Loschky, L. C., Simons, D. J., Smerchek, S., Matz, E., Bilyeu, B., & Artman, L. (2007). Is unlocalized amplitude information of any use for scene gist recognition? [Abstract]. Journal of Vision, 7(9), 1051a, http://journalofvision.org/7/9/1051/
Loschky, L. C., Sethi, A., Simons, D. J., Pydimarri, T. N., Forristal, N., Corbeille, J., et al. (2006). The roles of amplitude and phase information in scene gist recognition and masking [Abstract]. Journal of Vision, 6(6), 802a, http://www.journalofvision.org/6/6/799/
Loschky, L. C., Sethi, A., Simons, D. J., Ochs, D., Corbeille, J., & Gibb, K. (2005, November). Using visual masking to explore the nature of scene gist. Poster presented at the 46th Annual Meeting of the Psychonomic Society, Toronto, Canada.
Loschky, L. C., & Simons, D. J. (2004). The effects of spatial frequency content and color on scene gist perception [Abstract]. Journal of Vision, 4(8), 881a, http://journalofvision.org/4/8/881/