Volume 3, Number 9, Abstract 3, Page 3a doi:10.1167/3.9.3 http://journalofvision.org/3/9/3/ ISSN 1534-7362
Top-down control of visual attention in real world scenes
Aude Oliva
Department of Psychology, Cognitive Science Program, Michigan State University, USA
Antonio Torralba
Artificial Intelligence Laboratory, MIT, USA
Monica S Castelhano
Department of Psychology, Cognitive Science Program, Michigan State University, USA
John M Henderson
Department of Psychology, Cognitive Science Program, Michigan State University, USA
Abstract

During the first glance at a complex scene, the observer's attention is driven towards a particular region of the scene and the first saccade is programmed. Studies of scene recognition have established that substantial structural and spatial layout information is extracted within a single glance to form a semantic "gist" of the scene. The information carried by the gist therefore forms a visual context that is likely to modulate, in a top-down manner, where attention lands in a complex scene image. In this presentation, we extend a computational model of the gist, which encodes the coarse spectral layout of a scene image, to incorporate attentional guidance mechanisms and generate eye movements. The model uses the statistical correlations that exist between global scene structure (e.g., a street scene is in perspective) and object properties (e.g., the location of pedestrians) to define a region of interest in the image that is relevant for solving a task (e.g., looking for people). The eye movements of 8 human observers were monitored while they searched for a specific object (people) in 36 real-world scenes. The region of interest scrutinized by observers and the region determined by the gist guidance schema overlapped in more than 85% of the cases. Multiple fixation points (i.e., successive saccade targets) within the region of interest were generated by integrating a bottom-up saliency model with the top-down attentional guidance mechanism. Using a set of similarity metrics, we show that the locations of the multiple fixations generated by the integrative model and by human observers were very similar. The results validate the proposition that top-down information from visual context modulates the saliency of image regions early during the task of object detection.
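
The integration step described above can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not state how the two sources are combined, so the cell-by-cell product used here, the greedy winner-take-all selection with inhibition of return, and all names (`combine_maps`, `generate_fixations`, `row_prior`, the 4 x 5 grids) are assumptions made purely for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]


def combine_maps(saliency: Grid, context_prior: Grid) -> Grid:
    """Weight bottom-up saliency by a top-down location prior.

    Assumption: the two maps are multiplied cell by cell and the result
    is normalized to sum to 1. The abstract does not specify this rule.
    """
    combined = [[s * p for s, p in zip(s_row, p_row)]
                for s_row, p_row in zip(saliency, context_prior)]
    total = sum(sum(row) for row in combined)
    if total == 0:
        return combined
    return [[v / total for v in row] for row in combined]


def generate_fixations(priority: Grid, n_fixations: int,
                       radius: int = 1) -> List[Tuple[int, int]]:
    """Pick successive maxima of the priority map as (row, col) fixations.

    After each pick, a square neighborhood around it is zeroed
    (a simple inhibition-of-return rule, also an assumption).
    Stops early when no positive priority remains.
    """
    work = [row[:] for row in priority]
    rows, cols = len(work), len(work[0])
    fixations: List[Tuple[int, int]] = []
    for _ in range(n_fixations):
        r, c = max(((i, j) for i in range(rows) for j in range(cols)),
                   key=lambda rc: work[rc[0]][rc[1]])
        if work[r][c] <= 0:
            break
        fixations.append((r, c))
        for i in range(max(0, r - radius), min(rows, r + radius + 1)):
            for j in range(max(0, c - radius), min(cols, c + radius + 1)):
                work[i][j] = 0.0
    return fixations


# Hypothetical bottom-up saliency for a tiny street scene:
# the strongest response is in the sky (row 0, col 4).
saliency = [
    [0.1, 0.1, 0.1, 0.1, 0.9],
    [0.1, 0.2, 0.1, 0.1, 0.1],
    [0.3, 0.1, 0.1, 0.6, 0.1],
    [0.1, 0.1, 0.2, 0.1, 0.1],
]

# Hypothetical context prior for "looking for people": a horizontal
# band around street level (row 2), with no weight on the sky (row 0).
row_prior = [0.0, 0.2, 1.0, 0.2]
context_prior = [[w] * 5 for w in row_prior]

priority = combine_maps(saliency, context_prior)
bottom_up_only = generate_fixations(saliency, 1)
guided = generate_fixations(priority, 3)
```

In this toy example, saliency alone sends the first fixation to the sky at `(0, 4)`, whereas the context-weighted map yields `(2, 3)` followed by `(2, 0)`, both inside the street-level band. This is the kind of restriction to a task-relevant region of interest that the abstract attributes to the gist guidance schema.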

History
Received August 22, 2003; published October 22, 2003
Citation
Oliva, A., Torralba, A., Castelhano, M. S., & Henderson, J. M. (2003). Top-down control of visual attention in real world scenes [Abstract]. Journal of Vision, 3(9):3, 3a, http://journalofvision.org/3/9/3/, doi:10.1167/3.9.3.
Keywords
None