Visual Narrative Perception and Comprehension
Visual Narrative Comprehension: Drawing Bridging Inferences
How do we comprehend visual narratives, such as films, comics, picture stories, or other graphic narratives? One interesting piece of the puzzle concerns how we draw bridging inferences that fill in gaps in a visual narrative. For example, if you look at the images from a children's picture story below (from Mercer Mayer's book, "Frog, Where Are You?"), you should have no problem comprehending the events depicted in them. However, one version shows three images (a beginning state, a bridging event, and an end state), while the other version shows only two of the images, with the bridging event missing:
In the version with the bridging event missing, you have to make an inference, namely that the boy tripped. That inference helps to explain what happened in the end-state image, namely that the boy fell in the pond. This seems to be a very simple inference to make, and when viewing visual narratives we do this all the time without thinking about it. But research suggests that drawing such bridging inferences requires working memory. For example, if you didn't expect the boy to fall in the water after seeing him run down the hill in the previous image, then when you see him with his head in the water and feet in the air, you need to explain that to maintain a coherent understanding of the story. To explain it, you presumably need to remember that in the previous image the boy was running down the hill (from short-term memory) and retrieve relevant information from long-term memory about how running down hills and falling are related, for example by tripping. These processes would presumably occur in working memory.
If so, then what sort of working memory is involved in generating such inferences while comprehending visual narratives? Is it visuospatial working memory (because the medium of the story is visual and spatial in nature), verbal working memory (because it is a narrative, which, like language, involves sequentially ordered information), or a combination of both? In a recent study (Magliano, Larson, Higgs, & Loschky, 2015-OnlineFirst), we investigated these questions. Our results suggest that both visuospatial and linguistic working memory are involved, but that sub-vocal processing (narrating the story verbally in your head) is not. For more details, see Magliano, Larson, Higgs, and Loschky (2015-OnlineFirst).
Film Comprehension and Eye Movements
When watching a film, viewers typically aren’t aware of how they make sense of the plot and understand the events that are occurring. Without realizing it, viewers must make connections and inferences across different scenes and camera shots. To comprehend what a character is thinking or feeling, they must interpret that character’s actions, body language, and facial expressions. A lot of unconscious effort goes into creating a mental model of an ongoing film, but experience makes this task easier, since many people are introduced to television at a young age and grow up watching movies and learning to make sense of them.
There are several elements that make up a film itself. A key element of a film is the sequence of ‘shots’, whose boundaries are marked by an edit or a change in camera angle. Shots are considered to be the minimal production unit of meaning, that is, the smallest piece that can be used to understand the story at that moment (Magliano et al., 2013). Over the years, Hollywood-style films have evolved to exert greater control over attention by using more numerous cuts and shorter shot lengths, more motion, and higher contrast between the light and dark areas of scenes (Cutting, Brunick, DeLong, Iricinschi, & Candan, 2011). This results in attentional synchrony, which occurs when viewers look at the same things at the same times. With these low-level cues providing exogenous, stimulus-driven control of attention, viewers are less likely to make eye movements based on endogenous, motivation-driven control of attention. This may eliminate individual differences in what viewers pay attention to, which we have dubbed the tyranny of film (Loschky, Larson, Magliano, & Smith, 2015). This phenomenon could potentially affect how viewers comprehend films and generate inferences, suggesting that such control of attention could even affect how viewers build mental models of, and memory for, films.
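To make the idea of attentional synchrony concrete, here is a minimal sketch of one simple way it could be quantified: the mean distance of each viewer's gaze point from the group's centroid on a given frame, where a smaller value means tighter clustering and therefore greater synchrony. This is an illustrative assumption, not necessarily the measure used in the studies cited on this page; the function names and the gaze coordinates are hypothetical.

```python
from math import hypot
from statistics import mean


def gaze_dispersion(points):
    """Mean Euclidean distance of each viewer's gaze point from the group centroid.

    `points` holds one (x, y) gaze position per viewer for a single frame.
    Lower values mean viewers are looking at more nearly the same place.
    """
    if not points:
        raise ValueError("need at least one gaze point")
    cx = mean(x for x, _ in points)
    cy = mean(y for _, y in points)
    return mean(hypot(x - cx, y - cy) for x, y in points)


def dispersion_over_time(frames):
    """Apply gaze_dispersion to each frame in a sequence of frames."""
    return [gaze_dispersion(points) for points in frames]


# Hypothetical gaze positions (in pixels) for four viewers on two frames.
clustered = [(400, 300), (404, 300), (400, 304), (396, 296)]  # high synchrony
scattered = [(100, 100), (700, 100), (100, 500), (700, 500)]  # low synchrony

if __name__ == "__main__":
    for label, value in zip(("clustered", "scattered"),
                            dispersion_over_time([clustered, scattered])):
        print(f"{label}: {value:.1f} px")
```

With a measure like this, two viewing conditions could be compared frame by frame, with lower dispersion in one condition indicating greater attentional synchrony there.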
A good example of using an inference to understand a film is illustrated in the 6-shot sequence from the James Bond film, Moonraker, shown above. This shows the character Jaws falling through the air and trying to open his parachute (Shot 1), realizing it will not deploy (Shots 2-3), and a couple of shots of a circus tent (Shots 4 and 6), with Jaws flapping his arms in an attempt to fly in between them (Shot 5). Previous research showed that in the context of watching the entire movie, after Shot 6, 100% of viewers made the predictive inference that Jaws would fall on the circus tent (Magliano, Dijkstra, & Zwaan, 1996). (In fact, in the next shot, that is what happens.) Making this inference clearly shows high-level understanding of the film, since viewers make the inference before seeing the outcome.
A recent study from our lab has used this 6-shot movie clip to study how people perceive and comprehend film (Loschky et al., 2015). This study used the Moonraker movie clip (shown above) in what we call the jumped-in-the-middle paradigm. This paradigm draws on the common experience of struggling at first to understand a movie when you begin watching it in the middle of the story, compared to someone who has watched it from the beginning. For these experiments, there were two conditions: the Context condition, where viewers watched almost 3 minutes of the film before viewing the 12-second clip, which gave them enough context to create a working mental model of the narrative; and the No-context condition, where viewers only watched the final 12-second clip and did not see anything before that. This research also employed both comprehension and perception measures. In three separate experiments, we used think-aloud protocols, event segmentation, and eye tracking to study how participants generate inferences and event models, and how this relates to their eye movements.
In the experiment using a think-aloud protocol, participants typed out their thoughts and interpretations of the current events in the film immediately after seeing each shot. This experiment showed that viewers in the Context condition were more likely to make the critical inference that Jaws would fall on the circus tent, while viewers in the No-context condition simply described the events shown in each shot.
In the experiment using event segmentation, participants responded when they thought something new had happened in the film. The findings from this experiment showed that viewers in the Context condition were less likely to think that the first shot showing the circus tent (Shot 4) was a new event, indicating that seeing the previous context and creating a mental model allowed them to create continuity between the scenes shown in Shots 1-3 and Shot 4. Together, the think-aloud protocol experiment and event segmentation experiment established that viewers in the Context condition had a better understanding of the film clip than those in the No-context condition. The question then was, what effect, if any, would these differences in film comprehension have on viewers' eye movements?
The last experiment had participants freely view the film clips without any interruptions and recorded their eye movements, and then asked them at the end of the clip to guess what would happen next in the film. We subdivided the eye movement analyses based on the viewing condition (Context vs. No-context) and whether or not the viewers made the critical predictive inference (that Jaws would fall on the circus tent). The results showed strong attentional synchrony across all participants and conditions, which supports the idea of the tyranny of film and indicates that viewers' eye movements were directed by exogenous cues in the film regardless of their comprehension of it. However, more detailed eye movement analyses did reveal some differences between conditions: there was greater attentional synchrony in the Context condition, suggesting that viewers who had created a mental model did not need to explore the scene as much to understand what was happening, and viewers without a previously established mental model tended to look around the screen more. This was statistically significant only in Shot 4, when the circus tent was first shown (see picture shown above). In addition, foveation probability (similar to fixation durations) was greatest for viewers in the No-context condition who made the critical inference, indicating that they were struggling and working harder mentally to draw the inference. Conversely, foveation probability was significantly lower for viewers in the Context condition (almost all of whom made the predictive inference), because the context made the inference easier to draw, and also for viewers in the No-context condition who did not make the inference, presumably because they were not trying to make sense of the film clip.
That is, the Context viewers had an easier time understanding the clip and so exerted less mental effort, while No-Context viewers who did not make the inference that Jaws would fall on the circus tent were “lazy viewers” who did not exert the extra mental effort necessary to understand what was happening.
This research shows the power of attentional synchrony and how the tyranny of film can nearly eliminate individual differences in eye movements and attention that are due to differences in film understanding. However, some subtle variations in eye movements can still reveal differences between viewers depending on their mental model and their understanding of a film. We are carrying out further research to understand the relationship between control of attention and the perception and comprehension of films.
A video of a talk on our follow-up research given at the 2015 European Conference on Eye Movements (ECEM) can be seen here.
Related Publications [Current or Former Students' Names in Italics]
Loschky, L.C., Larson, A.M., Magliano, J.P., & Smith, T.J. (2015). What would Jaws do? The tyranny of film and the relationship between gaze and higher-level narrative film comprehension. PLoS ONE 10(11): e0142474. doi:10.1371/journal.pone.0142474
Magliano, J. P., Larson, A. M., Higgs, K., & Loschky, L. C. (2015-OnlineFirst). The relative roles of visuospatial and linguistic working memory systems in generating inferences during visual narrative comprehension. Memory & Cognition.
Magliano, J. P., Loschky, L. C., Clinton, J., & Larson, A. M. (2013). Is viewing a narrative the same as reading a narrative? Differences and similarities in processing narratives across textual and visual media. In B. Miller, L. Cutting, and P. McCardle (Eds.), Unraveling the Behavioral, Neurobiological, & Genetic Components of Reading Comprehension, Baltimore, MD: Brookes Publishing Co.
Related Conference Presentations [Students' Names in Italics]
Loschky, L.C., Hutson, J., Larson, A. M., Magliano, J. P., & Smith, T. (2015, Aug). The “tyranny of film”: Movie viewers’ gaze minimally reflects differences in their comprehension processes. Talk presented at the European Conference on Eye Movements, Vienna, Austria.
Hutson, J., Smith, T., Magliano, J. P., Heidebrecht, G., Hinkel, T., Tang, J.-L., & Loschky, L.C. (2015, June). A general dissociation of eye movements and comprehension in Orson Welles’ “Touch of Evil”: The role of context and protagonist in narrative film viewing. Poster presented at the annual meeting of the Society for Cognitive Studies of the Moving Image, London, UK.
Loschky, L. C., Smith, T., & Magliano, J. P. (2015, June). An integrative framework for visual narrative perception and comprehension. Talk presented at the annual meeting of the Society for Cognitive Studies of the Moving Image, London, UK.
Hutson, J., Smith, T., Magliano, J. P., Heidebrecht, G., Hinkel, T., Tang, J.-L., & Loschky, L.C. (2015, May). Eye movements while watching narrative film: a dissociation of eye movements and comprehension. Poster presented at the Annual Meeting of the Vision Sciences Society, St. Pete Beach, FL.
Magliano, J. P., Larson, A. M., Higgs, K., & Loschky, L. C. (2015, May). Generating bridging inferences while viewing visual narratives. Poster presented at the Annual Meeting of the Vision Sciences Society, St. Pete Beach, FL.
Hutson, J., Smith, T., Magliano, J., & Loschky, L. C. (2014, Nov). What drives eye movements in narrative film viewing? The roles of the film stimulus versus higher-level comprehension. Poster presented at the 2014 Annual Meeting of the Psychonomic Society, Long Beach, CA.
Loschky, L. C., Hutson, J., Magliano, J. P., Larson, A. M., & Smith, T. (2014, June). Explaining the Film Comprehension/Attention Relationship with the Scene Perception and Event Comprehension Theory (SPECT). Talk presented at the annual meeting of the Society for Cognitive Studies of the Moving Image, Lancaster, PA.
Hutson, J., Smith, T., Magliano, J., & Loschky, L. C. (2014, June). The tyranny of film: Understanding the eye-movements/comprehension relationship in Orson Welles’ “Touch of Evil.” Poster presented at the annual meeting of the Society for Cognitive Studies of the Moving Image, Lancaster, PA.
Hutson, J., Loschky, L. C., Smith, T., & Magliano, J. (2014, May). The Look of Evil: How are Eye Movements Influenced by Film Comprehension? Poster presented at the annual meeting of the Vision Sciences Society, St. Pete Beach, FL.
Loschky, L.C., Larson, A.M., Magliano, J.P., & Smith, T.J. (2014, May). What Would Jaws Do? The tyranny of film and the relationship between gaze and higher-level comprehension processes for narrative film. Poster presented at the Vision Sciences Society Annual Meeting, Naples, FL.
Loschky, L.C., Larson, A.M., Magliano, J.P., & Smith, T.J. (2013, Nov). What would Jaws do? Investigating the eye movements and movie comprehension relationship. Talk presented at the 2013 Annual Meeting of the Psychonomic Society, Toronto, Canada.