Film Perception and Comprehension


When watching a film, viewers typically aren’t aware of how they make sense of the plot and understand the events taking place. Without realizing it, they must make connections and inferences across different scenes and camera shots. To comprehend what a character is thinking or feeling, they must interpret the character’s actions, body language, and facial expressions. A lot of unconscious effort goes into creating a mental model of an ongoing film, but experience makes this task easier, since many people are introduced to television at a young age and grow up watching movies and learning to make sense of them.

A film is made up of several elements. A key element is the sequence of ‘shots’, each bounded by an edit or a change in camera angle. Shots are considered to be the minimal production unit of meaning - the smallest piece that can be used to understand the story at that moment (Magliano et al., 2013). Over the years, Hollywood-style films have evolved to exert greater control over attention by using more numerous cuts and shorter shot lengths, more motion, and higher contrast between the light and dark areas of scenes (Cutting, Brunick, DeLong, Iricinschi, & Candan, 2011). This results in attentional synchrony, which occurs when viewers look at the same things at the same times. With these low-level cues providing an exogenous, stimulus-driven control of attention, viewers are less likely to make eye movements based on an endogenous, motivation-driven control of attention. This may eliminate individual differences in what viewers pay attention to, a phenomenon we have dubbed the tyranny of film (Loschky, Larson, Magliano, & Smith, submitted). The tyranny of film could in turn affect how viewers comprehend films and draw inferences, and possibly even how they build mental models of films and remember them.


[Figure: the six-shot Jaws sequence from Moonraker]

A good example of using an inference to understand a film comes from the six-shot sequence from the James Bond film Moonraker, shown above. The sequence shows the character Jaws falling through the air and trying to open his parachute (Shot 1), realizing it will not deploy (Shots 2-3), and then two shots of a circus tent (Shots 4 and 6), with a shot of Jaws flapping his arms in an attempt to fly between them (Shot 5).  Previous research showed that in the context of watching the entire movie, after Shot 6, 100% of viewers made the predictive inference that Jaws would fall on the circus tent (Magliano, Dijkstra, & Zwaan, 1996).  (In fact, in the next shot, that is what happens.)  Making this inference clearly shows high-level understanding of the film, since viewers predict the outcome before seeing it.

A set of recent experiments conducted by our lab has used this 6-shot movie clip to study how people perceive and comprehend film (Loschky et al., submitted).  These studies used the Moonraker movie clip (shown above) in what we call the jumped-in-the-middle paradigm.  This paradigm draws on the common experience of having difficulty understanding a movie when you begin watching it in the middle of the story, compared to someone who has been watching from the beginning.  For these experiments, there were two conditions: the Context condition, where viewers watched almost 3 minutes of the film before viewing the 12-second clip, which gave them enough context to create a working mental model of the narrative; and the No-context condition, where viewers only watched the final 12-second clip and did not see anything before that.  This research also employed both comprehension and perception measures.  In three separate experiments, we used think-aloud protocols, event segmentation, and eye tracking to study how participants create inferences and event models, and how this relates to their eye movements.

In the experiment using a think-aloud protocol, participants typed out their thoughts and interpretations of the current events in the film immediately after seeing each shot. This experiment showed that viewers in the Context condition were more likely to make the critical inference that Jaws would fall on the circus tent, while viewers in the No-context condition simply described the events shown in each shot.

In the experiment using event segmentation, participants responded when they thought something new had happened in the film. The findings from this experiment showed that viewers in the Context condition were less likely to think that the first shot showing the circus tent (Shot 4) was a new event, indicating that seeing the previous context and creating a mental model allowed them to create continuity between the scenes shown in Shots 1-3 and Shot 4.  Together, the think-aloud protocol experiment and event segmentation experiment established that viewers in the Context condition had a better understanding of the film clip than those in the No-context condition.  The question then was, what effect, if any, would these differences in film comprehension have on viewers' eye movements?


[Figure: eye-movement heat maps for the Jaws clip]

The last experiment had participants freely view the film clips without any interruptions while their eye movements were recorded, and then asked them at the end of the clip to guess what would happen next in the film.  We subdivided the eye movement analyses based on the viewing condition (Context vs. No-context) and whether or not the viewers made the critical predictive inference (that Jaws would fall on the circus tent).  The results showed strong attentional synchrony across all participants and conditions, which supports the idea of the tyranny of film and indicates that eye movements were directed by exogenous cues in the film regardless of viewers' comprehension of it. However, more detailed eye movement analyses did reveal some differences between conditions – there was greater attentional synchrony in the Context condition, suggesting that viewers who had created a mental model did not need to explore the scene as much to understand what was happening, and viewers without a previously established mental model tended to look around the screen more. This difference was statistically significant only in Shot 4, when the circus tent was first shown (see picture shown above).  In addition, foveation probability (similar to fixation durations) was greatest for viewers in the No-context condition who made the critical inference, indicating that they were struggling and working harder mentally to draw the inference.  Conversely, foveation probability was significantly lower for viewers in the Context condition (almost all of whom made the predictive inference), because the context made the inference easier to draw, and also for viewers in the No-context condition who did not make the inference, presumably because they were not trying to make sense of the film clip. That is, the Context viewers had an easier time understanding the clip and so exerted less mental effort, while No-context viewers who did not make the inference that Jaws would fall on the circus tent were “lazy viewers” who did not exert the extra mental effort necessary to understand what was happening.
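
To make the idea of attentional synchrony concrete, here is a minimal sketch of one simple way it could be quantified: the mean pairwise distance between viewers' gaze positions on a single frame, where a smaller value means tighter clustering and therefore greater synchrony. This is an illustrative assumption, not necessarily the measure used in the experiments described here; the function name `gaze_dispersion` and the pixel coordinates are hypothetical.

```python
from itertools import combinations
from math import hypot


def gaze_dispersion(points):
    """Mean pairwise Euclidean distance between gaze points on one frame.

    `points` is a list of (x, y) gaze positions, one per viewer.
    Lower values mean viewers are looking closer together,
    i.e. greater attentional synchrony (under this illustrative measure).
    """
    pairs = list(combinations(points, 2))
    if not pairs:
        raise ValueError("need gaze points from at least two viewers")
    total = sum(hypot(x1 - x2, y1 - y2) for (x1, y1), (x2, y2) in pairs)
    return total / len(pairs)


# Hypothetical gaze positions (in pixels) for four viewers on one frame.
tightly_clustered = [(400, 300), (410, 305), (395, 298), (405, 302)]
widely_scattered = [(400, 300), (520, 180), (250, 420), (610, 350)]

# The clustered group has the smaller dispersion, so the higher synchrony.
print(gaze_dispersion(tightly_clustered) < gaze_dispersion(widely_scattered))
```

Under this sketch, a finding such as greater synchrony in the Context condition would correspond to a lower dispersion value for Context viewers than for No-context viewers on the same frame.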

This research shows the power of attentional synchrony and how the tyranny of film can nearly eliminate individual differences in eye movements and attention that are due to differences in film understanding. However, some subtle variations in eye movements can still reveal differences between viewers depending on their mental model and their understanding of a film.  We are carrying out further research to understand the relationship between control of attention and the perception and comprehension of films.

Related Publications

Magliano, J. P., Loschky, L. C., Clinton, J., & Larson, A. M. (2013). Is viewing a narrative the same as reading a narrative? Differences and similarities in processing narratives across textual and visual media. In B. Miller, L. Cutting, & P. McCardle (Eds.), Unraveling the Behavioral, Neurobiological, & Genetic Components of Reading Comprehension. Baltimore, MD: Brookes Publishing Co.

Related Conference Presentations

Loschky, L.C., Larson, A.M., Magliano, J.P., & Smith, T.J. (2013, Nov.) What would Jaws do? Investigating the eye movements and movie comprehension relationship. Talk presented at the 2013 Annual Meeting of the Psychonomic Society, Toronto, Canada.