Volume 5, Number 10, Article 7, Pages 834-862
doi:10.1167/5.10.7
http://journalofvision.org/5/10/7/
ISSN 1534-7362
Focus cues affect perceived depth
Simon J. Watt
School of Psychology, University of Wales, Bangor, United Kingdom
Kurt Akeley
Microsoft Research Asia, Beijing, China
Marc O. Ernst
Max-Planck Institute for Biological Cybernetics, Tübingen, Germany
Martin S. Banks
Vision Science Program, Department of Psychology, and Wills Neuroscience Institute, University of California, Berkeley, CA, USA
Abstract
Depth information from focus cues—accommodation and the gradient of retinal blur—is typically incorrect in three-dimensional (3-D) displays because the light comes from a planar display surface. If the visual system incorporates information from focus cues into its calculation of 3-D scene parameters, this could cause distortions in perceived depth even when the 2-D retinal images are geometrically correct. In Experiment 1 we measured the direct contribution of focus cues to perceived slant by varying independently the physical slant of the display surface and the slant of a simulated surface specified by binocular disparity (binocular viewing) or perspective/texture (monocular viewing). In the binocular condition, slant estimates were unaffected by display slant. In the monocular condition, display slant had a systematic effect on slant estimates. Estimates were consistent with a weighted average of slant from focus cues and slant from disparity/texture, where the cue weights are determined by the reliability of each cue. In Experiment 2, we examined whether focus cues also have an indirect effect on perceived slant via the distance estimate used in disparity scaling. We varied independently the simulated distance and the focal distance to a disparity-defined 3-D stimulus. Perceived slant was systematically affected by changes in focal distance. Accordingly, depth constancy (with respect to simulated distance) was significantly reduced when focal distance was held constant compared to when it varied appropriately with the simulated distance to the stimulus. The results of both experiments show that focus cues can contribute to estimates of 3-D scene parameters. Inappropriate focus cues in typical 3-D displays may therefore contribute to distortions in perceived space.
History
Received April 28, 2005; published December 15, 2005
Citation
Watt, S. J., Akeley, K., Ernst, M. O., & Banks, M. S. (2005). Focus cues affect perceived depth.
Journal of Vision, 5(10):7, 834-862,
http://journalofvision.org/5/10/7/,
doi:10.1167/5.10.7.
Keywords
accommodation, blur, cue combination, depth perception, stereoscopic displays, virtual reality
Consider two viewing conditions: a complex real scene viewed binocularly and a computer display of the same scene. The computer display is carefully constructed so all the traditional depth cues (binocular disparity, texture gradients, occlusion, shading, etc.) are geometrically correct. Thus, the geometric patterns of stimulation striking the two eyes are the same in the two cases. Despite the fact that the stimulation patterns are the same, psychophysical research (e.g., Buckley & Frisby, 1993; Ellis, Smith, Grunwald, & McGreevy, 1991; Frisby, Buckley, & Duke, 1996; Frisby, Buckley, & Horsman, 1995; van Ee, Banks, & Backus, 1999) and experience with virtual reality displays (Thompson et al., 2004) lead one to expect that the perceived 3-D structure will differ in the two cases: the depth in the computer display will appear flattened relative to the real scene from which it is derived. A plausible cause for depth flattening is the fact that computer displays present images on one surface: the phosphor grid for cathode-ray displays (CRTs), the pixel grid for liquid crystal displays (LCDs), and the projection screen for projectors. This means that depth information from focus cues (accommodation and the retinal blur gradient) is inconsistent with the depicted scene. Instead the information specifies the depth of the display surface. We examined whether such inappropriate focus cues contribute to distortions in perceived depth when viewing 3-D computer displays.
Combining information from multiple depth cues
The 3-D structure of a visual scene is inferred from the 2-D retinal images. The visual system does not rely arbitrarily on one depth cue or another but combines information from multiple available cues to estimate the 3-D parameters of the scene. Consider the case of recovering the slant of a plane.
The visual system's estimate of slant from a given cue can be represented by
$$\hat{S}_i = f_i(S)$$
where $S$ is the slant being estimated and $f$ is the operation by which the visual system does the estimation; the cue is represented by the subscript. Estimates of slant from each cue ($\hat{S}_i$) are subject to error. When multiple cues are available, the most likely slant can be calculated from a weighted linear combination of the slant indicated by each cue (provided that the noises associated with cue measurement are independent and Gaussian distributed, and that all slants are equally likely)
$$\hat{S} = \sum_i w_i \hat{S}_i \qquad (1)$$
where
$$w_i = \frac{1/\sigma_i^2}{\sum_j 1/\sigma_j^2} \qquad (2)$$
The weights ($w_i$) are proportional to the normalized inverse variances ($1/\sigma_i^2$) of the cue distributions ($\hat{S}_i$), so greater weight is assigned to less variable (i.e., more reliable) cues (Backus & Banks, 1999; Ernst & Banks, 2002; Ghahramani, Wolpert, & Jordan, 1997; Jacobs, 1999; Oruç, Maloney, & Landy, 2003). The variance of the combined estimate is lower than the variance of any single-cue estimate, so by combining information from several depth cues, the visual system can in principle estimate slant (or any other 3-D property) with greater precision than it can by relying on one cue alone. There are now many empirical studies showing that cue reliability is taken into account when combining sensory signals (e.g., Backus & Banks, 1999; Buckley & Frisby, 1993; Jacobs, 1999; Körding & Wolpert, 2004; van Beers, Sittig, & Denier van der Gon, 1998; van Beers, Wolpert, & Haggard, 2002). Furthermore, several studies have tested the quantitative predictions of this model by measuring the reliability of the underlying estimators when only one cue is informative and using these to predict performance when multiple cues are available (Alais & Burr, 2004; Ernst & Banks, 2002; Gepshtein & Banks, 2003; Hillis, Watt, Landy, & Banks, 2004; Knill & Saunders, 2003; Landy & Kojima, 2001). These studies show that performance is often close to that predicted by the statistically optimal model (in the sense of being the minimum variance unbiased estimate; Ghahramani et al., 1997).
Inappropriate focus cues in 3-D displays
The abovementioned research suggests that the visual system uses all available sources of information to compute 3-D scene parameters. This has important implications for 3-D computer displays because unmodeled depth cues could affect the percept, causing it to differ from the depicted scene.
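The reliability-weighted combination in Equations 1 and 2 can be illustrated with a minimal sketch. The function name and the numerical values below are hypothetical and chosen only for illustration; they are not data from this study. The sketch assumes independent Gaussian noise on each cue, in which case the variance of the combined estimate is the reciprocal of the summed inverse variances.

```python
def combine_cues(estimates, variances):
    """Reliability-weighted average of single-cue slant estimates.

    estimates: single-cue slant estimates (deg), one per cue
    variances: variance of each single-cue estimate (deg^2)
    Returns (combined estimate, list of weights, combined variance).
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need one positive variance per estimate")
    # Reliability of a cue is the inverse of its variance.
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    # Equation 2: weights are normalized inverse variances (they sum to 1).
    weights = [r / total for r in reliabilities]
    # Equation 1: weighted linear combination of the single-cue estimates.
    combined = sum(w * s for w, s in zip(weights, estimates))
    # Assumption: independent Gaussian noise, so the combined variance is 1 / total.
    return combined, weights, 1.0 / total


# Hypothetical example: disparity signals 30 deg (variance 4 deg^2),
# focus cues signal 0 deg (variance 12 deg^2).
slant, weights, variance = combine_cues([30.0, 0.0], [4.0, 12.0])
print(slant, weights, variance)
```

With these illustrative numbers the more reliable cue receives three times the weight of the less reliable one, and the combined variance is smaller than either single-cue variance.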
In almost all computer displays, the focal distance of the light from the display is fixed because the images are presented on one surface (for counterexamples, see Akeley, Watt, Girshick, & Banks, 2004; McQuaide, Seibel, Burstein, & Furness, 2002). This provides inappropriate depth information in two ways. First, the variation in blur in the retinal image is consistent with the fixed distance of the display surface and not with the distances in the simulated scene. With real scenes, the amount of retinal blur varies because the distance of points in the scene varies with respect to the eye's focal distance: the retinal image is sharpest for objects at the focal distance and blurred for points nearer and farther away. In computer displays, the variation in blur specifies the constant distance of the display surface and is thus a cue to flatness. Second, accommodation provides an extra-retinal cue signaling the constant distance of the display surface. As the eye looks around a real scene, commands are sent to the ciliary muscles to change the refractive power of the crystalline lens and thereby minimize blur for the fixated part of the scene. As the eye looks around the simulated scene in a computer display, the focal distance of the light does not vary appropriately, so this again signals flatness rather than the simulated depth variation. If blur and accommodation provide inputs to the calculation of depth, their erroneous values can in principle adversely affect percepts of 3-D scene structure.
Inappropriate motion parallax in 3-D displays
In many settings (including psychophysical experiments), the observer's head position is not strictly constrained. For a viewing distance of 28.5 cm (used in our first experiment), head movements of a few millimeters could result in a detectable depth signal from motion parallax (Rogers & Graham, 1982).
As with focus cues, residual motion parallax specifies the distance to the display rather than distances in the simulated scene. If parallax is figured into the brain's calculation of depth, its erroneous value will adversely affect 3-D percepts. Unlike the problem of inappropriate focus cues, there are straightforward solutions to this problem: one can track head position and update the image accordingly (Welch et al., 1999) or one can immobilize the head position. Therefore, we did not explicitly examine whether residual motion parallax contributes to distortions in perceived depth when viewing 3-D displays (but see the Isolating information from accommodation and blur section).
Implications for psychophysics
Powerful 3-D computer graphics has revolutionized research on depth perception. Psychophysicists no longer have to rely on shadow casters (Gibson, Gibson, Smith, & Flock, 1959), glass plates (Ogle, 1950), or other mechanical means to create stimuli. Using modern computer graphics, they can now create realistic 3-D images and independently manipulate depth cues. As a result, great advances have occurred in the last three decades. However, if focus cues affect perceived depth from conventional computer displays, many observations in the depth perception literature may not be representative of vision in the natural environment. Here we describe two illustrative examples from the literature: (1) the perceived depth of computer-displayed versus real ridges, and (2) the slant-contrast illusion.
Buckley and Frisby (1993) examined the perceived depth of CRT-displayed and real ridges. The stimuli depicted vertical or horizontal parabolic ridges. The authors independently manipulated the disparity- and texture-specified depths of the ridges. With CRT stimuli, they did this in conventional fashion by programming different disparity and texture signals.
With the real ridges, they did it by distorting the texture on the card covering wooden forms to create the desired texture gradient viewed from the observer's eye. The data from the CRT stimuli (vertical ridges) revealed clear effects of disparity and texture: Disparity dominated when the texture-specified depth was large and texture dominated when the texture depth was small. In the framework of the cue-weight model (Equations 1 and 2), the disparity and texture weights changed depending on the texture-specified depth. The data from the real-ridge stimuli were quite different: The disparity-specified depth now dominated the percept. The important point for our purposes is that the CRT-based and real-ridge results differed dramatically. Buckley and Frisby (1993) speculated that focus cues played an important role in the striking difference between the CRT and real results. In Appendix C, we quantify and generalize their argument by translating it into the framework of the weight model. The fact that more depth was perceived in real than in CRT-displayed ridges suggests that focus cues contributed to the depth calculation in their experiments (see also Frisby et al., 1995). We cannot tell from the Buckley and Frisby (1993) experiments whether depth percepts were veridical once focus cues were consistent with the depth specified by disparity and texture. The reason is that responses were judgments of depth in cm and we cannot know whether the mapping between perceived depth and depth responses is veridical. For our purposes, the important point is that observers reported and therefore presumably saw more depth when focus cues were consistent with the depth specified by other cues.
Now consider the second example: the slant-contrast illusion (Sato & Howard, 2001; van Ee & Erkelens, 1996; Werner, 1937). In this illusion, a central object is presented that has the disparity and texture gradients of a frontoparallel plane.
It is surrounded by a surface that typically has the texture gradient of a frontoparallel plane but the disparity gradient of a slanted plane. The presence of the surrounding plane causes the central object to appear slanted in a direction opposite to the disparity-specified slant of the surround. Interesting psychophysical effects draw researchers' attention, so several theories have been developed to explain the illusory slant. Most share the idea that disparity-encoding mechanisms have antagonistic, center-surround receptive fields for disparity (in analogy to the center-surround organization of receptive fields in the luminance domain). Such mechanisms are allegedly less responsive to zero- and first-order disparities (absolute disparity and the relative disparity associated with a slanted plane, respectively) than to second- and higher-order disparities (the disparity associated with curvature or discontinuities in depth) (Anstis, Howard, & Rogers, 1978; Brookes & Stevens, 1989; Gillam, Chambers, & Russo, 1988; Mitchison, 1993; Rogers & Graham, 1983; van Ee & Erkelens, 1996; Westheimer, 1986).
van Ee et al. (1999) measured the magnitude of the slant-contrast illusion when the stimulus was presented as a conventional computer display and as real surfaces. They observed a typically large illusion with the computer display, but no illusion at all with the real surfaces. The computer-displayed and real-surface stimuli had the same dimensions and were viewed from the same distance, so the disparity- and texture-gradient signals created by the two stimuli were identical. The fact that one produced the illusion and the other did not means that the encoding of disparity (and the texture gradient) per se cannot be the cause of the illusion. van Ee et al. argued that cue conflicts between geometric cues (disparity and texture) and inappropriate focus cues caused the illusion in the computer-displayed stimuli.
The conflicts were eliminated in the real-surface stimulus and so the illusion was eliminated. Sato and Howard (2001) also showed that manipulating the magnitude of cue conflicts has a large effect on the slant-contrast illusion when the disparity signals are held constant. Our point is that cue conflicts between disparity, texture, and the previously unmodeled cues of blur and accommodation affect or may even cause the slant-contrast illusion. Thus, previous theories of the illusion are attempting to explain an illusion that may not occur in the natural environment, when all cues signal the same depth structure.
The potential importance of inappropriate focus cues is not restricted to stereoscopic vision. We argue in the General discussion section that investigations of any aspect of visual space perception should take the potentially confounding effects of those cues into account.
Recovering depth from blur
We define blur in the retinal image as the spread of the optical point-spread function (Westheimer, 1986). For a fixed accommodative state, the amount of blur in the image of an object is roughly proportional to the focus error in diopters (Green & Campbell, 1965; Mather & Smith, 2000; Smith, Jacobs, & Chan, 1989). Objects at different distances are blurred by different amounts, signaling depth variations in the scene. Interpreting this signal is complicated by two factors. First, the sign of depth variation is undetermined because the retinal images of objects nearer or farther than fixation can be equally blurred. Second, the magnitude of the depth signaled is ambiguous because for a given accommodative state, blur depends not only on the distance of an object from fixation, but also on the visual system's depth of focus, which in turn depends on pupil size and the spatial frequency content of the input, neither of which is known independently (Green, Powers, & Banks, 1980).
For these reasons, it seems unlikely that metric depth can be recovered directly from retinal blur. However, the continuous microfluctuations that occur in accommodation (Campbell & Westheimer, 1959) and chromatic aberration could be used to disambiguate the blur signal (Nguyen, Howard, & Allison, 2005; Pentland, 1987). Additionally, eye movements could be used to sample changes in blur dynamically as the observer focuses on different parts of the scene. The sign of depth variations could also be disambiguated by other depth cues including binocular disparity and occlusion.
Some psychophysical studies have reported a modest effect of the blur gradient on judgments of perceived depth (Marshall, Burbeck, Ariely, Rolland, & Martin, 1996; Mather, 1996, 1997; Mather & Smith, 2000, 2002; O'Shea, Govan, & Sekuler, 1997). In these studies, the blur gradient was varied artificially by blurring the displayed object in selected regions to simulate the effects of defocus, and most used brief presentations. This means that the abovementioned strategies for disambiguating the depth signaled by blur could not have been used. It is thus possible that the blur gradient is a more useful depth cue in natural viewing than previously realized.
Recovering depth from accommodation
The efferent signal to the muscles controlling the crystalline lens could be a depth cue because the magnitude of the response required to focus the retinal image depends directly on the distance from the eye to the fixated object. To be a useful depth cue, the accommodative system must respond reliably to changes in focal distance and the visual system must be able to monitor the muscle commands. Accommodation to isolated, high-contrast targets is reliably related to changes in a target's focal distance (Campbell & Westheimer, 1959; Charman & Tucker, 1977; Heath, 1956). Indeed, accommodation can occur to changes in retinal blur that are below perceptual threshold (Kotulak & Schor, 1986).
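The dioptric quantities discussed in the two preceding sections can be made concrete with a short sketch. It uses only the standard definition of the diopter (the reciprocal of distance in meters); the function names and the 25 cm object distance are hypothetical, and the 28.5 cm focal distance is borrowed from Experiment 1 purely as an example. The sketch shows the sign ambiguity of blur: a nearer and a farther object can have the same magnitude of focus error.

```python
def diopters(distance_m):
    """Dioptric distance: reciprocal of the distance in meters."""
    return 1.0 / distance_m


def focus_error(object_distance_m, focal_distance_m):
    """Signed focus error in diopters.

    Positive when the object is nearer than the distance to which the eye
    is focused, negative when it is farther.
    """
    return diopters(object_distance_m) - diopters(focal_distance_m)


def mirror_distance(object_distance_m, focal_distance_m):
    """Distance on the other side of the focal distance that has the same
    magnitude of focus error, or None if no finite distance exists."""
    mirrored = 2.0 * diopters(focal_distance_m) - diopters(object_distance_m)
    if mirrored <= 0.0:
        return None  # the matching point would lie beyond optical infinity
    return 1.0 / mirrored


focal = 0.285                      # eye focused at 28.5 cm
near = 0.25                        # hypothetical object at 25 cm
far = mirror_distance(near, focal)
print(focus_error(near, focal), focus_error(far, focal), far)
```

Because blur magnitude depends roughly on the absolute focus error, the two objects in this example would be about equally blurred even though one lies in front of fixation and the other behind it.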
In contrast to the blur gradient (and most other depth cues), accommodation can in principle be used to recover the absolute distance to fixation. Several studies have examined distance estimates with verbal or pointing responses based on the accommodative response to single targets and have shown that observers' estimates are correlated with target distance, but that accuracy is poor and variability is high (Baird, 1903; Biersdorf, 1966; Dixon, 1895; Fisher & Ciuffreda, 1988; Foley, 1977; Hillebrand, 1894; Künnapas, 1968; Mon-Williams & Tresilian, 2000; Peter, 1915; Swenson, 1932; Wundt, 1862). In principle, accommodation can also provide information about surface structure if estimates of relative distance are compared over successive fixations. Accommodation, like blur, could therefore be a more useful depth cue in complex scenes than the existing psychophysical data suggest.
Direct versus indirect influence of focus cues on perceived depth
The above discussion examined how blur and accommodation could be used directly in estimating depth. Accommodation could also have an indirect effect on perceived depth by interacting with stereopsis. Binocular disparity is an important and reliable depth cue. But horizontal disparities are inherently ambiguous because 3-D layout cannot be determined from them without scaling by an estimate of viewing distance (Gårding, Porrill, Mayhew, & Frisby, 1995). To perform the scaling, the brain uses the eyes' vergence and the horizontal gradient of vertical disparity (Rogers & Bradshaw, 1995). In principle, accommodation can also provide an estimate of fixation distance, which may in turn influence disparity scaling. In computer displays, the accommodative stimulus is the distance to the display screen and not the simulated distance. This erroneous information may affect depth percepts indirectly via disparity scaling. There is a small literature on indirect effects of accommodation on perception.
Fisher and Ebenholtz (1986), Mon-Williams and Tresilian (2000), and Wallach and Norris (1963) observed an influence of accommodation on depth interpretation (for a negative result, see Ritter, 1977). Heinemann, Tulving, and Nachmias (1959) and von Holst (1973) observed an influence of accommodation on perceived size.
Direct and indirect effects of focus cues were examined in Experiments 1 and 2, respectively.
In the first experiment, slant specified by geometric cues (texture and binocular disparity) was varied independently from slant specified by focus cues. Because so many reliable depth cues are available in natural viewing, focus cues should have only a small influence on the recovery of 3-D scene properties in natural conditions. The simulated scenes used in psychophysical experiments are often impoverished in order to study individual cues and their interactions. An example is the sparse random-dot stereogram, which allows researchers to isolate binocular disparity while making all other cues uninformative or unreliable. Focus cues may have more influence under these circumstances. To examine this possibility, we measured the effect of varying focus cues on slant estimates when the stimulus was defined by only binocular disparity or by only the texture gradient.
Three observers participated, aged 24–29 years. All had normal vision and stereoacuity. All were experienced psychophysical observers. One (AJW) was naïve to the experimental purpose. The other two knew the general purpose but not the specifics.
The layout of the apparatus is schematized in Figure 1. The stimuli were displayed on a conventional 21-in. CRT (KDS VS21e) with 1600 × 1024 resolution. Each pixel subtended 2.9 × 2.9 arcmin. To manipulate the information from focus cues, the monitor was rotated about the vertical axis passing through the center of its front surface.
Figure 1. Layout of the apparatus for Experiment 1. The stimulus monitor was straight ahead of the observer at different distances. It could be rotated about a vertical axis passing through the center of its front surface. The response monitor was to the left; observers made an eye movement to view the response figure on this monitor, which was visible only to the right eye. The response figure consisted of two line segments, one horizontal and the other variable in orientation. Observers adjusted the variable segment until the angle between it and the horizontal segment was the same as the perceived slant of the stimulus.
Focus cues issuing from the phosphor grid specified a surface that was not exactly a plane for two reasons: (1) the surface containing the phosphor grid was slightly curved, and (2) the grid's virtual distance was affected by refraction due to the front glass plate. (We could not use a flat-panel LCD because the luminance of such displays depends strongly on viewing angle.) Dichoptic presentation of the left- and right-eye images was achieved using CrystalEyes™ liquid crystal shutter glasses. The monitor refresh rate was 100 Hz, so each eye's image was redrawn at 50 Hz. It was crucial to have no artifactual cues to the monitor's slant, so we were careful to eliminate cross-talk through the glasses (aided by drawing the images with the red phosphor only) and to eliminate the observer's ability to see the monitor casing (accomplished by masking the casing and by periodically light-adapting the observer). We checked that observers could not determine the monitor's slant in a pilot experiment. In the monocular-viewing conditions of the main experiment, observers wore a patch over their left eye.
We used anti-aliasing to specify the position of stimulus elements to subpixel accuracy. Stimuli were rendered using OpenGL (Segal & Akeley, 2002) and the associated utility library, GLUT (Kilgard, 1996).
Precise reproduction of visual directions was achieved using a spatial calibration technique similar to the one described by Backus, Banks, van Ee, and Crowell (1999). A wire-filament loom was placed in a known position in front of the monitor and the experimenter aligned individual dots with the loom intersections. During calibration, the experimenter's head was carefully positioned using a bite bar, which was adjusted so as to position the eyes' centers of rotation in known positions relative to the display. Two-dimensional polynomial functions were used to fit the x and y values from the loom calibration to pixel space in which the stimuli were rendered. These equations provided a continuous look-up table relating pixel space and physical screen space. When the stimulus was drawn, each stimulus element (squares or lines) was subdivided into a series of smaller polygons and the position of each vertex of these was corrected using the look-up table. This procedure corrected overall dot positions and line endpoints, and it also closely approximated the correct calibration for the outlines of the stimulus elements. Because of the calibration procedure, the geometric properties of the stimulus were matched for all monitor slants. The spatial calibration procedure was carried out separately for the left and right eyes at each monitor slant used in the experiment.
During the main experiment, the observer's head position was stabilized using a conventional chin rest. A sighting technique (Hillis & Banks, 2001) was used to position the chin rest precisely. We chose this method for head constraint to mimic the most common practice in the psychophysical literature. As discussed previously, it is possible that motion parallax resulting from small head movements may have provided an additional cue to the physical slant of the monitor.
Possible implications of this, and additional control conditions in which the head was immobilized with a bite bar, are described in the Isolating information from accommodation and blur section.
A response figure was presented on a second CRT. It was viewed via a mirror so that observers could respond without making head movements (Figure 1).
The stimuli were planes rotated about the vertical axis (tilt = 0°). We independently manipulated two cues to slant: (1) focus cues, which were manipulated by varying monitor slant, and (2) the simulated slant of the surface, which was specified by geometric information from disparity and texture cues. We refer to monitor slant as Sm and simulated slant as Ss, respectively (Figure 2).
Figure 2. Plan view of the stimulus configuration for Experiment 1. The slants Sm and Ss were defined relative to the cyclopean line of sight. Slant in both cases is the angle between the line of sight to the middle of the display monitor (dotted line) and the surface normal for each cue (red and blue lines). Positive slant (shown here) is “right side back”.
Ss was specified either by binocular disparity (disparity condition) or by the perspective projection of a textured pattern (texture condition). For all viewing conditions and values of Ss and Sm, the stimulus width was 35° with respect to the cyclopean eye. Figure 3 shows how the stimuli were created. The stimulus generation method was used in both the disparity and texture conditions; only the right eye's image was displayed in the latter case. The stimulus width was matched with respect to the cyclopean eye (midway between the two eyes). Therefore, its angular extent in the right eye (its width in the texture condition) varied slightly as a function of Ss. The angular extent of the stimulus in either eye (and all other geometric properties) was unaffected by variations in Sm. Stimulus height at the axis of rotation on average was 28°.
Due to random aspects of the stimulus generation method, there were small variations in stimulus height and width from trial to trial. The distance from the cyclopean eye to the rotation axis of the stimulus was always 28.5 cm. We chose this distance because it was short enough to create discriminable changes in focal distance while being long enough to allow accurate accommodation.
Figure 3. The method of stimulus generation for Experiment 1. Step 1: Coordinates were defined for a homogeneous, frontoparallel pattern (randomly positioned squares or a Voronoi texture) 35° wide, measured at the cyclopean eye (CE, midway between the two eyes). Step 2: This pattern was scaled and translated in x such that after rotation by the angle Ss, it remained 35° wide, measured at the cyclopean eye. Step 3: The left- and right-eye's images were determined by projecting the pattern onto the monitor plane using each eye's position as the center of projection. The screen space was spatially calibrated (see text) so that the visual direction of each point on the stimulus was appropriate, and the retinal images at each value of Ss were geometrically equivalent at each monitor slant, Sm.
In the disparity condition, Ss was specified by the difference between left- and right-eye projections (calculated for each observer's inter-ocular distance) of a pattern of randomly positioned square elements. We used squares instead of the more typical Gaussian blobs to provide a better stimulus to accommodation. The initial 2-D pattern (Figure 3, Step 1) was generated by drawing x and y square positions from a uniform random distribution. The average size of each square was 1.7 × 1.7 mm (1.7 mm ≈ 0.34° at the center of the stimulus). We minimized the informativeness of the texture cue by presenting few squares (roughly 0.2 squares/deg²) in random positions.
We also clipped the stimulus with an elliptical window (whose size and orientation varied randomly within a small range) so that the outline of the stimulus pattern did not provide a cue to Ss. The scaling process (Figure 3, Step 2) stretched the entire pattern, including the squares, so that when the stimulus was rotated, the angular width of the squares was on average constant across values of Ss. Each eye's view was calculated by finding the intersection with the monitor plane of rays through the stimulus pattern and each eye's center of rotation (Figure 3, Step 3). Using this procedure, the outline of each square was correctly projected in each eye's view. This meant that the simulated slant of each square was consistent with Ss, and the monocular texture cue in each eye (including square density) was consistent with the disparity-specified slant. We could have used the conventional method, in which stimulus elements are shifted by equal and opposite amounts in the two eyes, thereby creating a texture gradient that is consistent with a frontoparallel plane. It was preferable, however, to use correct perspective projection because the conventional method yields a texture-specified slant of zero, which would have complicated the data interpretation.
In the texture condition, Ss was defined by the perspective projection of a Voronoi pattern (de Berg, van Kreveld, Overmars, & Schwarzkopf, 2000; see also Knill, 1998) viewed with the right eye. The stimuli consisted of 320 Voronoi cells on average. To create the Voronoi patterns, the initial pattern consisted of a grid of 20 × 16 regularly spaced points. The x and y coordinates of each point were then perturbed by a random amount in the range ±0.2 times the inter-point spacing (equivalent to 0.36° in the center), and the Voronoi pattern defined by these points was calculated. The resulting pattern had ~0.33 Voronoi cells/deg².
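The projection step described above (Figure 3, Step 3) can be sketched in plan view. This is a simplified two-dimensional illustration under stated assumptions, not the rendering code used in the experiment: it ignores the vertical dimension and the spatial calibration of the screen, the function name is hypothetical, and the 6.2 cm inter-ocular distance is an assumed value. Coordinates are in meters, with x to the observer's right and z straight ahead from the cyclopean eye.

```python
import math


def project_to_monitor(t, sim_slant_deg, mon_slant_deg, eye_x, dist=0.285):
    """Project a point on the simulated plane onto the monitor plane.

    t: signed position of the point along the simulated surface (m),
       measured from the rotation axis
    eye_x: lateral position of the eye (m); the eye is at z = 0
    Returns the signed position along the monitor surface (m),
    measured from the rotation axis.
    """
    ss = math.radians(sim_slant_deg)
    sm = math.radians(mon_slant_deg)
    # Point on the simulated plane; positive slant is "right side back".
    px, pz = t * math.cos(ss), dist + t * math.sin(ss)
    # Unit normal of the monitor plane, which passes through (0, dist).
    nx, nz = -math.sin(sm), math.cos(sm)
    # Ray from the eye (center of projection) through the simulated point.
    dx, dz = px - eye_x, pz
    k = (nx * (0.0 - eye_x) + nz * dist) / (nx * dx + nz * dz)
    qx, qz = eye_x + k * dx, k * dz
    # Express the intersection as a position along the monitor surface.
    return qx * math.cos(sm) + (qz - dist) * math.sin(sm)


half_iod = 0.062 / 2.0  # assumed inter-ocular distance of 6.2 cm
left = project_to_monitor(0.05, 30.0, 0.0, -half_iod)
right = project_to_monitor(0.05, 30.0, 0.0, +half_iod)
print(left, right, right - left)
```

When the monitor slant equals the simulated slant, both eyes' projections fall on the simulated point itself, so the on-screen disparity is zero. When the two slants differ, the screen positions change but each eye's visual direction to the point is preserved, which is the property the experimental design relies on.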
The stimulus was then scaled, rotated, and perspective projected into the monitor plane following the procedure in Figure 3. As with the random-dot stimulus, each line segment was correctly projected for the slant angle, Ss.
The average luminance of a square or line seen through the shutter glasses was 0.9 cd/m², and the background luminance was 0.01 cd/m². A new stimulus was drawn on each trial in both the disparity and texture conditions.
In our experimental design, it was critical that the geometric information at a given value of Ss was equivalent for all values of Sm. We checked this empirically by viewing a simulated frontoparallel plane (Ss = 0°) through the calibration loom. The stimulus was identical at a range of values of Sm.
Our stimuli should have been good stimuli to accommodation because they were spatially complex and therefore contained a wide range of spatial frequencies (Charman & Tucker, 1977).
Observers reported the amount of perceived slant for each combination of monitor slant (Sm) and simulated slant (Ss): 0°, ±10°, ±20°, and ±30°. They did so by setting the angle between two line segments to be equal to the perceived slant of the stimulus. The response figure consisted of a fixed horizontal line and a rotatable oblique line, the former representing the frontoparallel plane and the latter the perceived slant of the experimental stimulus. The oblique line started at a random orientation on each trial and could be adjusted by key presses in either direction in increments as small as 0.5°. This figure was viewed by the right eye in a second monitor via a mirror by making a small eye movement (Figure 1).
Before each trial, a small fixation square (0.35° × 0.35°) was presented in the center of the screen. The square was constructed and calibrated using the same methods as the stimulus. Its simulated slant was always frontoparallel, so it did not provide a cue to monitor slant. Each trial followed the same sequence.
The fixation square first appeared for 1 s, then the stimulus for 2 s. Following stimulus offset, the response figure appeared on the second monitor and observers indicated the amount of slant they had seen. The response figure then disappeared followed by a 1-s blank display before the fixation square appeared on the main monitor for the next trial. The fixation square was not present during the stimulus presentation and observers were given no specific instructions about where to look. The observers completed six trials for each Sm × Ss combination in both the disparity and texture conditions: a total of 588 trials. Trials were blocked by monitor slant and viewing condition, both randomly ordered. The apparatus was concealed behind a curtain when observers entered the room, and the experiment was conducted in complete darkness. Observers were always unaware of the monitor's slant (the naïve observer was not aware that the monitor ever rotated). Between experimental blocks, observers were exposed to normal light levels to prevent dark adaptation. Before the main experiment, the observers completed two blocks of practice trials. All three observers reported a clear percept of depth in binocular and monocular conditions, and they were all readily able to do the task. Normalization of slant estimates We cannot know the mapping between perceived slant and response setting, so we used the settings with the cues-consistent stimuli (Sm=Ss) to normalize the other data. We did this by transforming the raw data as follows. For each observer and condition, a response-mapping function was derived by least-squares fitting of a line (y = mx + c) to the mean slant estimates from the subset of the data for which Sm=Ss. If it is assumed that perceived slant was veridical for these cues-consistent stimuli, the settings can then be used as a yardstick to transform the data in the other conditions. The fitted function was used to scale each response into a normalized slant estimate. 
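The normalization described above can be sketched as a line fit followed by its inversion. This is an illustrative reading of the procedure rather than the authors' analysis code: the function names and the example settings are hypothetical, and inverting the fitted line is an assumption that is consistent with the statement that the cues-consistent data have a slope of 1 after normalization.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

def normalize(response, m, c):
    """Map a raw response setting onto the slant scale by inverting the fitted line."""
    return (response - c) / m

# Hypothetical mean settings for the cues-consistent subset (Sm = Ss), in degrees.
slants = [-30, -20, -10, 0, 10, 20, 30]
mean_settings = [-16, -10, -4, 2, 8, 14, 20]

m, c = fit_line(slants, mean_settings)
# A raw setting from any other condition is rescaled with the same m and c.
print(normalize(14.0, m, c))
```

With this mapping, an observer who systematically under-sets the response line (slope below 1) still yields normalized estimates that are comparable across observers and conditions.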
These values were then used to calculate the points in the data figures. Because the data were merely scaled to make effect sizes equivalent across conditions and observers, the relative effects within each condition were unaffected. The data were in every case well fitted by a line. The slopes of the normalization functions for the disparity (blue-gray bars) and texture conditions (red bars) are shown in Figure 4. The observers' settings in the cues-consistent conditions were reasonably consistent across the disparity and texture conditions with the exception of observer JDB, whose settings in the texture condition were considerably smaller than in the corresponding disparity condition. Despite possible differences in the use of the response measure, it seems likely that the texture-defined planes looked less slanted than the disparity-defined planes to this observer. Figure 4. Effect of slant on slant settings for the cues-consistent stimuli for each observer and condition in Experiment 1. The abscissa values are different observers. Blue-gray and red bars represent the disparity and texture conditions, respectively. The dark-blue and green bars represent two additional monocular conditions, described in isolating information from accommodation and blur. The ordinate values are the slopes of the best-fitting lines relating slant to observer responses in the cues-consistent ( Sm= Ss) subset of the data in each viewing condition. These values were used to normalize the raw responses for each observer (see the Normalization of slant estimates section). Effects of monitor orientation on perceived slant Figure 5 plots each observer's average normalized slant estimates as a function of Sm in the disparity and texture conditions. Different colors represent different values of Ss. The solid lines are the best-fitting lines for each value of Ss. The data are plotted as a function of Sm, so effects of this variable are indicated by deviations from a horizontal line. 
(The data were fitted with lines for simplicity, but one would expect departures from linearity because the effect of focus cues is likely to vary with Sm; see the Evidence for reliability-based cue weighting section.) The normalization of the data is indicated here by the diamonds on the right side of each panel (see caption for explanation). Figure 5. Average normalized slant estimates for each value of Ss as a function of Sm in Experiment 1. The upper row shows the data for the disparity condition and the lower row the data for the texture condition. The columns show the data for different observers. The horizontal dashed lines represent veridical estimates for each Ss. The colored symbols represent the data, different colors denoting different values of Ss. The circled points are the data for the cues-consistent ( Sm= Ss) conditions. The colored lines are the best fits to the data for each Ss. The error bars in the upper left corner of each panel are ± the average SEM. The diamonds on the right side of each panel indicate the actual response settings for the cues-consistent stimuli at Sm= Ss=±30°. The data were normalized such that the fitted settings at those points plotted at ordinate values of ±30°. Consider the data for observer PRM. In the disparity condition, the data are clearly separated according to Ss, indicating that disparity was an effective slant cue. Monitor slant did not affect his judgments in this condition. For example, his slant estimates in the cues-inconsistent conditions (Sm≠Ss) did not differ noticeably from estimates in cues-consistent conditions (Sm=Ss; circled data points). In contrast, PRM's slant estimates in the texture condition reveal a clear effect of monitor slant. Again, the data are separated according to Ss, indicating that the texture cue was effective. However, for most values of Ss, increasing or decreasing monitor slant had a systematic effect on his estimates, suggesting that focus cues affected perceived slant. 
The results for observer AJW are similar. In the disparity condition, her slant estimates were less consistent than those of PRM, but there was no systematic effect of monitor slant. Her slant estimates in the texture condition varied systematically with Sm. The results for observer JDB are more variable, but reasonably consistent with those of the other two observers. He showed no effect of monitor slant in the disparity condition and a somewhat inconsistent effect in the texture condition; in his data, the effect of monitor slant in the texture condition is most evident when one compares perceived slant when Sm=Ss to perceived slant when Sm=0 (see Figure 6).
Figure 6. Average normalized slant estimates for Sm=Ss and Sm=0 in Experiment 1. Each panel plots the normalized estimates as a function of Ss. Each column shows the data for a different observer. The upper and lower rows show the data from the disparity and texture conditions, respectively. The black circles are the data for Sm=Ss and the blue squares are the data for Sm=0. The lines are best fits to the data. The slopes of the fitted lines to the cues-consistent data are constrained to be 1 as a result of the normalization process. Error bars are ±1 SEM.
Implications of the direct effect of focus cues for 3-D displays
Figure 6 illustrates implications for viewing simulated scenes as opposed to real scenes. The figure re-plots two subsets of the data: (i) the cues-consistent conditions (Sm=Ss), and (ii) the Sm=0° condition. Normalized slant estimates are now plotted as a function of Ss instead of Sm. The cues-consistent condition is essentially equivalent to real-world viewing in that all cues specify the same depth structure. The Sm=0 condition is the typical viewing situation in psychophysics in which the display surface is frontoparallel. The lines in Figure 6 are the best fits for each data subset.
The slopes of the fitted lines to the cues-consistent data (black lines) are constrained to be 1 as a result of the normalization process. There was no systematic difference in the disparity condition between the cues-consistent and cues-inconsistent conditions. For all three observers, we calculated the difference between slant estimates in the cues-consistent and cues-inconsistent conditions at each value of Ss (except for Ss=0, where the data in the two conditions are the same). The signs of the differences were adjusted so that a negative difference always indicated less estimated slant (stimulus appeared closer to frontoparallel) in the Sm=0 condition irrespective of the sign of Ss. A one-sample t test showed that these difference scores were not significantly different from zero, indicating that slant estimates in the disparity condition were not reliably different in the cues-consistent and cues-inconsistent conditions, t(17)=0.19, p = 0.85. This shows again that focus cues had no direct influence on slant percepts under binocular viewing. In the texture condition, all three observers reported seeing less slant when the monitor was frontoparallel ( Sm=0) compared to when all cues were consistent ( Sm= Ss). The difference score analysis described above showed that this effect was statistically significant, t(17)=4.18, p < 0.001. This suggests again that focus cues affected slant percepts directly under monocular viewing. Isolating information from accommodation and blur To determine if residual motion parallax contributed to the monitor-slant effect, we re-ran the monocular condition with the observers' heads completely stabilized with a bite bar. 
To determine whether the blur gradient or accommodation made a greater contribution to the monitor-slant effect, we compared performance in two conditions: (1) the eye movement condition, in which observers made two horizontal eye movements from one edge of the stimulus to the other and back during the 2-s presentation, and (2) the fixation condition, in which observers maintained fixation on a small cross (0.75° × 0.75°) in the center of the screen before and during stimulus presentation. Accommodation should have varied much less in the fixation than in the eye movement condition, so by comparing the data in the eye movement and fixation conditions, we could assess the contribution of accommodation. By comparing the data in these two conditions to the original data in Figures 5 and 6, we could determine the contribution of residual motion parallax. The data were normalized using the abovementioned procedure. Figure 4 shows the slopes of the normalization functions for the eye movement (dark-blue bars) and fixation conditions (green bars). Once again, the observers' settings were consistent across conditions with the exception of observer JDB, who made very small settings in the fixation condition (similar to those he made in the original texture condition). Again, this may be because for this observer the surfaces looked less slanted in this condition, although it is unclear why this should have been the case. Figure 7 shows the results of the eye movement and fixation conditions in the same format as Figure 5. Results for the eye movement and the fixation conditions were quite similar to the results in the original texture condition ( Figure 5, see also Figure 9). The similarity between the results in Figures 5 and 7 implies that the monitor-slant effect in the texture condition was not caused by residual motion parallax or by differential accommodation accompanying eye movements. 
We conclude that retinal blur was the primary cause of the effect of monitor slant under monocular viewing. Figure 7. Average normalized slant estimates for Ss as a function of Sm in the eye movement and fixation conditions in Experiment 1. The upper and lower rows show the data from the eye movement and fixation conditions, respectively. Each column shows data from a different observer. The horizontal dashed lines represent veridical estimates for each Ss. The colored symbols represent the data, different colors denoting different values of Ss. The circled points are the data for the cues-consistent ( Sm= Ss) conditions. The colored lines are the best fits to the data for each Ss. The error bar in the upper left corner of each panel represents ± the average SEM. The diamonds on the right side of each panel indicate the actual response settings for the cues-consistent stimuli at Sm= Ss=±30°. The data were normalized such that the fitted settings at those points plotted at ordinate values of ±30°. Figure 8 re-plots two subsets of the data: the cues-consistent conditions ( Sm= Ss), and the Sm=0° condition, in the same format as Figure 6. The abscissa is Ss. The lines are the best fits for each data subset. The slopes of the fitted lines to the cues-consistent data (black lines) are constrained to be 1 as a result of the normalization process. All three observers in both conditions reported less slant when Sm=0 than when Sm= Ss (cues consistent), with the exception of AJW in the eye movement condition (she, however, showed a consistent effect of monitor slant overall; Figure 7). One-sample t tests were carried out on the differences between slant estimates for Sm= Ss and Sm=0°. The difference in reported slant was statistically significant: observers reported less slant when Sm=0 than when Sm= Ss in the eye movement condition, t(17)=2.63, p < 0.05, and fixation condition, t(17)=3.46, p < 0.01. 
This suggests again that focus cues affected slant percepts directly under monocular viewing.
Figure 8. Average normalized slant estimates for Sm=Ss and Sm=0 in the eye movement and fixation conditions. Each panel plots the normalized estimates as a function of Ss. Each column shows the data for a different observer. The upper and lower rows show the data from the eye movement and fixation conditions, respectively. Both of those conditions were conducted with monocular viewing. The black circles are the data for Sm=Ss and the blue squares are the data for Sm=0. The lines are best fits to the data. The slopes of the fitted lines to the cues-consistent data are constrained to be 1 as a result of the normalization process. Error bars are ±1 SEM.
The effects of monitor slant are summarized in Figure 9. The normalized data from each observer in each condition were entered into a multiple regression analysis with Sm and Ss as factors. The figure plots the regression weight for Sm separately for each condition and observer, as well as an average weight for each condition. The regression weights are the average weights given to monitor slant across all values of Sm and Ss. Regression weights greater than 0 indicate an effect of monitor slant. No effect was observed in the disparity condition. A consistent effect was observed in the texture condition, and it persisted in the eye movement and fixation conditions, where head position was fixed. Thus, the residual motion parallax that was possible with the chin rest had no discernible effect, perhaps because the head movements were small. The fact that the effect persisted in the fixation condition, where observers held fixation on one point in the stimulus, suggests also that accommodation accompanying 3-D eye movements had no effect.
Figure 9. Regression weights for Sm in Experiment 1. The abscissa values are the three observers and an overall summary. Different colors represent different viewing conditions.
The ordinate values are the multiple regression weights for Sm, obtained by entering the slant estimates in each case into a multiple regression analysis with Sm and Ss as factors. The overall weights were calculated by entering the data from all three observers into a single analysis. The regression weights are equivalent to the weights given to Sm in each condition, averaged across all values of Sm and Ss. Error bars are +95% confidence intervals for the regression weights. With monocular viewing, observers' slant estimates were systematically affected by the orientation of the monitor surface (Sm). Observers reported seeing more slant when Sm=Ss (as occurs with real stimuli) than when Sm=0 (as usually occurs in psychophysical experiments and with most 3-D displays). The effect for the conditions of our experiment was small but quite consistent. These results show that information from focus cues (specifically, retinal blur) can, under monocular viewing, contribute directly to the visual system's estimate of 3-D surface orientation. Evidence for reliability-based cue weighting We next asked whether our findings are consistent with reliability-based cue weighting ( Equations 1 and 2). To answer this, we first estimated the reliabilities of focus cues as well as texture and disparity cues for our stimuli. We then used those reliabilities to predict perceived slant for each combination of Sm and Ss in our experiment. Although we did not determine the single-cue reliabilities by our own experimental measurements, the exercise is useful for understanding the data. According to Equation 2, the reliability of each cue is the normalized reciprocal variance of the underlying estimator for that cue. To estimate this variance for the disparity and texture cues, we used previous slant discrimination measurements for each cue in isolation. 
To estimate this variance for focus cues, we simulated slant discrimination from blur using previous measurements of the visual system's depth of focus. Figure 10 plots estimates of the JNDs for slant from disparity, texture, and focus cues as a function of surface slant (tilt=0) and distance. Details of the calculations are provided in Appendix A. Figure 10. JND estimates for slant from disparity, texture, and focus as a function of slant and viewing distance. The different colored surfaces represent JNDs based on the individual cues. The disparity and texture JNDs were estimated from the data of Hillis et al. ( 2004) (see Appendix A). The focus JNDs were estimated by calculations described in Appendix A. The calculations determined how much slant would be required for the difference in defocus at the nearest and farthest points in the stimulus plane to exceed the visual system's depth of focus. The estimated JNDs from focus cues become very large at far distances and small slants, so the top portion of the focus surface has been clipped at 40°. The texture JNDs were estimated from measurements made by Hillis et al. ( 2004) for monocularly viewed Voronoi patterns (see Appendix A). They are represented by the orange surface in Figure 10. Texture JNDs decrease with increasing slant because the image changes associated with a given change in slant increase (Blake, Bülthoff, & Steinberg, 1993; Knill, 1998). Texture JNDs do not change with distance because doubling the size of a given textured surface and viewing it from twice the distance leaves the retinal image unchanged. The disparity JNDs were derived from discrimination thresholds for slant from disparity alone (Hillis et al., 2004), measured using sparse random-dot stereograms (see Appendix A). They are represented by the blue surface in Figure 10. 
Disparity JNDs increase with viewing distance because the magnitude of binocular disparities for a given depth difference decreases with increasing viewing distance (Howard & Rogers, 2002; Ogle, 1950). Disparity JNDs also vary with slant, which is expected from the viewing geometry (Hillis et al., 2004). The variation is distance dependent: JNDs increase with slant at short viewing distances and decrease with slant at long ones (see also Banks, Hooge, & Backus, 2001; Knill & Saunders, 2003). The steep rise at large slant and short viewing distance probably reflects the influence of the disparity-gradient limit. In that situation, the horizontal disparity gradient increases significantly, and the two retinal images are difficult to fuse (Banks, Gepshtein, & Landy, 2004; Burt & Julesz, 1980; Hillis et al., 2004).
We could not measure thresholds for slant from blur independent of other slant cues, but we can make a rough estimate of JNDs by considering how much slant would be required for the difference in defocus at the nearest and farthest points in the stimulus plane to exceed the visual system's depth of focus. We did this calculation for each combination of slant and distance in Figure 10, using the same stimulus viewing frustum as Experiment 1 (see the Methods section). Details are provided in Appendix A. For our dim viewing conditions, pupil size was 5–7 mm (Wyszecki & Stiles, 1982), so depth of focus was approximately ±0.33 diopters (Campbell, 1957; Charman & Whitefoot, 1977; Green & Campbell, 1965; Green et al., 1980). The red surface in Figure 10 represents the estimated focus JNDs as a function of slant and distance. The focus JNDs are generally larger than the disparity and texture JNDs, but the differences depend on slant and viewing distance. Specifically, focus-cue JNDs increase with increasing distance and decrease with increasing slant.
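The defocus comparison underlying this estimate can be illustrated with a simplified calculation. The sketch below is not the Appendix A procedure: it only computes the dioptric difference between the two horizontal edges of a slanted plane and compares it with a depth-of-focus value. The 10° half-width, the function names, and the choice to measure distance along each ray are assumptions made for illustration.

```python
import math

def ray_distance_to_plane(distance_m, slant_deg, azimuth_deg):
    """Distance (m) from the eye to a slanted plane along a ray at azimuth_deg.

    The plane passes through the point straight ahead at distance_m and is
    rotated about the vertical axis by slant_deg.
    """
    s = math.radians(slant_deg)
    a = math.radians(azimuth_deg)
    # Position along the plane (t) where the ray meets it, then its depth (z).
    t = distance_m * math.tan(a) / (math.cos(s) - math.sin(s) * math.tan(a))
    z = distance_m + t * math.sin(s)
    return z / math.cos(a)

def defocus_difference(distance_m, slant_deg, half_width_deg):
    """Absolute dioptric difference between the two edges of the stimulus."""
    near_edge = ray_distance_to_plane(distance_m, slant_deg, -half_width_deg)
    far_edge = ray_distance_to_plane(distance_m, slant_deg, half_width_deg)
    return abs(1.0 / near_edge - 1.0 / far_edge)

# Hypothetical stimulus half-width of 10 deg; 0.33 D taken from the text above.
DEPTH_OF_FOCUS_D = 0.33
for distance in (0.285, 0.57, 0.855):
    for slant in (10, 20, 30):
        diff = defocus_difference(distance, slant, 10.0)
        print(distance, slant, round(diff, 3), diff > DEPTH_OF_FOCUS_D)
```

Under these assumptions the dioptric difference grows with slant and shrinks with viewing distance, which is the qualitative pattern described for the focus JNDs.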
The optimal cue-combination scheme ( Equations 1 and 2) predicts therefore that focus cues should have little effect on 3-D percepts for many viewing situations. At short distances and large slants, however, focus JNDs can be equal to or less than those for disparity and texture. In these cases, optimal combination predicts a noticeable effect of focus cues. We can use the estimated JNDs to derive predictions of the effect of focus cues in the conditions of our experiment. The left panel in Figure 11 plots the estimated JNDs for slant from disparity, texture, and focus cues for the range of slants (±30°) and the viewing distance (28.5 cm) used in Experiment 1. From those JNDs, we estimated the standard deviations of the estimators associated with disparity, texture, and focus cues. Then using Equations 1 and 2, we calculated the slants an observer would perceive if he weighted the three cues optimally. The middle and right panels show those predicted perceived slants, plotted in the same format as Figures 5 and 7. In the disparity condition, the optimal cue combination predicts a small effect of monitor slant because the standard deviation of the disparity estimator is generally small relative to that of focus cues. In the texture condition, the model predicts a more systematic effect of monitor slant because in many cases the standard deviation associated with the competing cue—the texture gradient—does not differ very much from the standard deviation associated with focus cues. Figure 11. Estimated slant JNDs and predicted results for Experiment 1. Left: Estimated JNDs for slant from disparity, texture, and focus cues, plotted as a function of slant at the 28.5 cm viewing distance used in Experiment 1. The curves are a slice through the contours of Figure 10. Middle: Predicted perceived slant for the disparity-defined stimulus. Right: Predicted perceived slant for the texture-defined stimulus. 
The format of the middle and right panels is the same as Figures 5 and 7. The curves are plotted as a function of Sm; each color represents a different value of Ss. The variance of each cue's slant estimate was calculated from the estimated JNDs in the left panel. The predicted perceived slants were calculated using those variances and the cue-combination scheme described by Equations 1 and 2. Our empirical findings ( Figures 5 and 7) are generally quite similar to these predictions. The data exhibit a small but consistent effect of monitor slant in the texture condition; that effect is similar in magnitude to the predicted effect. The data reveal no effect of monitor slant in the disparity condition, while a very small effect is predicted. From a multiple regression analysis of the predictions and data, we find that the average predicted weights given to focus cues in the disparity and texture conditions were 0.07 and 0.15, respectively, and that empirical weights were 0.01 and 0.12 ( Figure 9). Despite this general similarity, the model does not capture the details of the empirical findings. In particular, in the monocular viewing conditions we found a significant difference between slant estimates in the cues-consistent ( Sm= Ss) and the Sm=0° conditions. The model predicts only small differences between these conditions because focus-cue JNDs are large when Sm=0. It is important to note that our predictions are based on a simple and untested model of how the visual system discriminates changes in slant from focus cues ( Appendix A). We do not know how the brain actually computes slant from those cues. Therefore, it is quite possible that the discrepancy between the predictions and observed effects of focus cues resulted from inadequacies in our model. 
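The prediction step can be sketched as follows, assuming that Equations 1 and 2 take the usual form in which each cue's weight is its reciprocal variance divided by the sum of reciprocal variances. The function names and the standard deviations in the example are hypothetical and are not the values estimated from Figure 10.

```python
def reliability_weights(sigmas):
    """Weights proportional to reciprocal variance, normalized to sum to 1."""
    reliabilities = [1.0 / (s ** 2) for s in sigmas]
    total = sum(reliabilities)
    return [r / total for r in reliabilities]

def combined_slant(estimates, sigmas):
    """Reliability-weighted average of single-cue slant estimates (degrees)."""
    weights = reliability_weights(sigmas)
    return sum(w * s for w, s in zip(weights, estimates))

# Hypothetical texture condition: texture specifies Ss = 30 deg, focus cues
# specify the monitor slant Sm = 0 deg; standard deviations are assumed.
texture_slant, focus_slant = 30.0, 0.0
texture_sigma, focus_sigma = 5.0, 10.0

print(reliability_weights([texture_sigma, focus_sigma]))
print(combined_slant([texture_slant, focus_slant], [texture_sigma, focus_sigma]))
```

In this toy case the less reliable focus cue still pulls the combined estimate toward the monitor slant, which is the direction of the effect reported for the texture condition.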
Furthermore, the reliability of slant estimates from focus cues surely depends on several factors including the spatial frequency, luminance, and contrast of the stimulus, as well as on fixation patterns and pupil size. Thus, a proper analysis would require empirical measurement of slant from focus for the stimuli used in the main experiment. Nonetheless, our analysis yields insight into the informativeness of focus cues as a function of slant and viewing distance, and the relationship to the informativeness of texture and disparity. Under reasonable assumptions, the pattern of effects across conditions in our empirical data was generally consistent with reliability-based cue weighting. We examined only two conventional depth cues, disparity and texture, so it remains to be determined whether inappropriate focus cues also contribute to perceived depth for stimuli defined by other conventional cues.
Experiment 1 revealed that focus cues can have a direct effect on 3-D percepts. Accommodation could also affect perceived depth indirectly through the process of disparity scaling. The disparity (δ) created by two points in space is related to viewing distance as follows:
$$\delta \approx \frac{I\,\Delta D}{D^{2} + D\,\Delta D} \qquad (3)$$
where Δ D is depth, D is viewing distance, and I is inter-pupillary distance (Howard & Rogers, 2002). To recover Δ D from δ, D must be estimated. We know that viewing distance is estimated from the eyes' vergence and the horizontal gradient of vertical disparity (Rogers & Bradshaw, 1993, 1995). In principle, it could also be estimated from accommodation. In computer displays, the focal distance to the display surface is fixed and often quite different from the simulated distances in the virtual scene. If the stimulus to accommodation (the focal distance of the display surface) affects the estimate of viewing distance, the distance to simulated points nearer than the display surface will be overestimated and the distance to points farther than the display surface will be underestimated. Such estimation errors might affect disparity scaling and hence the depth interpretation. There have been many studies of disparity scaling (e.g., Bradshaw, Glennerster, & Rogers, 1996; Glennerster, Rogers, & Bradshaw, 1996; Johnston, Cumming, & Parker, 1993; O'Leary & Wallach, 1980; Rogers & Bradshaw, 1993, 1995; van Damme & Brenner, 1997), but only one (Ritter, 1977) examined the contribution of focal distance directly, and he observed no effect. Frisby et al. ( 1996) observed veridical disparity scaling with real stimuli. The general consensus is that disparity scaling is most accurate when multiple cues are available and consistent with one another (e.g., vergence, vertical disparity, familiar size), but that scaling is usually non-veridical. At near viewing distances, the visual system behaves as if distance is overestimated, and at far distances, as if distance is underestimated (Collett, Schwarz, & Sobel, 1991; Foley, 1980; Glennerster et al., 1996; Johnston, 1991; Johnston et al., 1993; Rogers & Bradshaw, 1995; Wallach & Zuckerman, 1963). 
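The consequence of a misestimated viewing distance can be illustrated numerically. The sketch below assumes that the disparity relation takes the standard small-angle form δ ≈ IΔD/(D² + DΔD), with δ in radians and distances in meters; the function names, the 6.2-cm inter-pupillary distance, and the example depths are hypothetical.

```python
def disparity(depth_m, distance_m, ipd_m):
    """Small-angle disparity (radians) for a depth interval at a viewing distance."""
    return ipd_m * depth_m / (distance_m ** 2 + distance_m * depth_m)

def depth_from_disparity(delta_rad, distance_m, ipd_m):
    """Depth interval recovered from disparity, given an estimate of distance.

    Obtained by solving the disparity relation for the depth interval.
    """
    return delta_rad * distance_m ** 2 / (ipd_m - delta_rad * distance_m)

IPD = 0.062          # assumed inter-pupillary distance (m)
TRUE_DISTANCE = 0.57
TRUE_DEPTH = 0.02

delta = disparity(TRUE_DEPTH, TRUE_DISTANCE, IPD)
# Scale the same disparity with a correct, a too-near, and a too-far distance.
for assumed_distance in (0.57, 0.285, 0.855):
    print(assumed_distance, depth_from_disparity(delta, assumed_distance, IPD))
```

If the distance estimate is pulled toward a nearer display, the same disparity is interpreted as less depth; if it is pulled toward a farther one, it is interpreted as more depth.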
Although many of these studies varied display distance and simulated distance concordantly, this pattern of results is also generally what one would expect if focal distance (of a fixed display) affects the distance used for disparity scaling. In Experiment 2 we examined the contribution of accommodation to the estimate of the distance used to scale horizontal disparities. In particular, we examined the indirect influence of focal distance on disparity scaling by independently manipulating vergence distance (by varying absolute disparity) and focal distance, referred to hereafter as accommodative distance (by varying the distance to the display). The experiment required observers to decouple vergence and accommodation, which many people find difficult (Judge & Miles, 1985; Wann & Mon-Williams, 1997). We piloted the experiment on 12 observers and chose the four who could fuse the stimulus in all the conditions. They were 24, 25, 28, and 40 years old. Two had normal uncorrected vision and two wore their normal corrective lenses during the experiment. All four had normal stereoacuity and were experienced psychophysical observers. All were naïve to the specific purposes of the experiment. We used the same apparatus and stimulus-rendering techniques as in Experiment 1 except that only the stimulus display monitor was used. Monitor slant was always zero. The observer's head position was restrained using the same bite-bar apparatus. Distance to the display was varied by moving the bite bar relative to the monitor. Spatial calibration was done for each eye at each of the three viewing distances. To ensure precise, repeatable positioning of the observers, the table holding the bite bar was fixed with drilled holes in the floor. We used a task similar to the apparently circular cylinder task (Johnston, 1991). The stimuli were concave hinges: two planes slanted about the vertical axis (tilt=0°) and joined at their far point to form an “open book”. 
The dihedral angle between the two planes is the hinge angle. Observers indicated whether the perceived hinge angle was larger or smaller than 90°. With a related stimulus, Johnston, Cumming, and Landy (1994) showed that observers perceive the 3-D shape veridically when depth cues (disparity, texture gradient, and motion parallax) are consistent. The surfaces were defined by sparse randomly positioned squares. Square size and density were constant at 0.18° and 1.6 squares/deg², respectively. The width and height of the stimuli measured from the cyclopean eye were on average constant across hinge angles at 8.5° and 2°, respectively. A small random perturbation was added to both on each trial. The stimulus was clipped by an elliptical aperture to make the outline shape uninformative. To create the stimulus, we first rendered two frontoparallel grids of regularly spaced squares, one for each plane of the hinge. Each square's position was then jittered by a random amount horizontally and vertically in the range ±1.25 times the inter-square separation. Each plane was then rotated about the vertical axis by the appropriate amount for the desired hinge angle. Overlapping squares at the intersection of the hinge were deleted. The position of each square on the display surface was determined separately for the left- and right-eye images by calculating where projections from each eye's position intersected the monitor plane (calculated for each observer's inter-pupillary distance); the texture gradient was always appropriate for the slants of the two planes, and the individual squares had disparities consistent with the simulated surface slant.
We were careful not to introduce uncontrolled cues into our stimulus that could confound the measurements. Previous studies have often used dense random-dot stereograms in which disparity was introduced by shifting dots horizontally in each eye's view.
In such stimuli, the monocular texture gradient at each eye specifies a frontoparallel surface, which could cause objects to appear flatter than if disparity were the only informative cue. Moreover, the reliability of the disparity cue decreases with increasing distance while the reliability of the texture cue does not (Hillis et al., 2004), so the texture cue would likely be given more weight with increasing viewing distance, causing the stimulus to appear increasingly flattened. This is the same pattern of results that would be produced by misestimates of the distance for disparity scaling (Hillis et al., 2004). To minimize the probability of this bias, perspective projection of the stimulus was correct in each eye's view (the texture cue was consistent with disparity). We also attempted to minimize the contribution of the texture cue by presenting few stimulus elements. A pilot experiment confirmed that observers could not do the task with monocular information. Although vertical disparities were correct for the simulated distance, we minimized their influence by using short stimuli (Backus et al., 1999; Rogers & Bradshaw, 1995). In this way we could isolate the effects of vergence and accommodation. Of course, our stimulus contained a blur gradient consistent with a flat surface, but the failure to observe an effect of blur gradient with disparity-defined stimuli in Experiment 1 implies that the gradient did not affect performance in the direct sense in Experiment 2. On each trial, the entire hinge stimulus was rotated around the vertical axis (at the intersection of the hinge) by an angle chosen randomly in the range ±10°, so observers could not do the task by estimating the slant of one plane. A fixation cross was drawn in the center of the stimulus at the same depth as the intersection of the hinge planes. The cross was a good stimulus to accommodation, although it was slightly blurred by anti-aliasing. 
Because the cross was displayed at the intersection of the hinge, its vergence- and accommodation-specified distances were unaffected by changes in hinge angle. The room was dark except for the stimulus. The frame of the monitor was not visible. We were concerned that observer biases would affect the settings and would therefore affect the interpretation of the data. So we ran a pretest with a real hinge in which all cues were available and consistent and compared those data with the data from the computer-displayed stimuli. The real hinge consisted of two plywood planes (tilt=0) joined at their inside edges. The visible surfaces were covered with paper on which a Voronoi pattern like the one in Experiment 1 was printed. Pattern size was changed at each viewing distance so that the average angular size of the Voronoi cells and the line thickness were constant. The stimulus was illuminated by a diffuse light positioned so that there was no detectable variation in shading with changes in hinge angle. Observers viewed the stimulus through circular apertures, one for each eye, so that the visible portion of the stimulus had a diameter of 20°. At this size vertical disparities could provide reliable distance information (Rogers & Bradshaw, 1995). Head position was restrained using a chin-and-forehead rest. Nothing was visible except the hinge stimulus itself. The hinge was moved up or down behind the apertures after each trial so that different parts of the Voronoi patterns were presented on each trial. The entire hinge assembly was rotated from trial to trial in the range ±10° (as with the simulated hinge). Viewing distance was varied by positioning the observer at different distances from the apparatus. In the simulated surface conditions, each combination of accommodation distance (Da) and vergence distance (Dv)—28.5, 57.0, and 85.5 cm—was presented for a total of nine conditions. 
In the real-surface condition, the same set of distances was presented for a total of three conditions. The hinge task has significant advantages over other methods. For instance, it avoids the problem of response quantization (which can occur when observers are asked to report numerical estimates), and the problem of also having to estimate width in the apparently-circular-cylinder task (Johnston, 1991; Johnston et al., 1994). On each trial the fixation cross was presented for 2 s, followed by the hinge stimulus and the fixation cross for another 2 s. Observers were told to fixate the cross throughout each presentation. If they were unable to fuse the cross before the stimulus appeared, the trial was discarded; this rarely occurred. Observers indicated on each trial whether the hinge angle was larger or smaller than a right angle. We used 2-down/1-up and 1-down/2-up staircase procedures to vary the hinge angle. Each staircase was terminated after 12 reversals, resulting in 120–150 trials per condition. The responses were used to construct psychometric functions (percentage of responses that the angle was larger than 90° as a function of the specified angle). The 50% point was estimated by a maximum-likelihood procedure (Wichmann & Hill, 2001). That point served as the estimate of the dihedral angle that was perceived as 90°; we refer to this angle as the PSE. Our method allowed rapid measurement, which was important because vergence-accommodation dissociations can cause response adaptation and fatigue (Schor & Tsuetaki, 1987). Before each trial in the real-surface condition, observers' eyes were closed as the experimenter set the hinge to the appropriate angle for the next presentation. They then opened their eyes and indicated whether the hinge angle was larger or smaller than 90°. Then they closed their eyes in preparation for the next trial. Viewing time was not strictly controlled but was usually 2–3 s.
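The staircase and PSE estimation described above can be illustrated with a minimal Python sketch. Several details are assumptions rather than anything stated in the text: the mapping of the rules (n "larger" responses in a row lower the angle, n "smaller" responses in a row raise it), the 4° step size, the starting angles, the simulated cumulative-Gaussian observer, and the coarse grid-search likelihood fit, which merely stands in for the maximum-likelihood procedure of Wichmann and Hill (2001). All function names are hypothetical.

```python
import math
import random


def run_staircase(respond_larger, n_down, n_up, start_angle=90.0,
                  step=4.0, max_reversals=12):
    """Run one staircase and return the list of (angle, response) trials.

    respond_larger(angle) returns True for a "larger than 90 deg" response.
    n_down consecutive "larger" responses lower the hinge angle;
    n_up consecutive "smaller" responses raise it (assumed rule mapping).
    """
    angle = start_angle
    trials = []
    larger_run = smaller_run = 0
    last_direction = 0
    reversals = 0
    while reversals < max_reversals:
        resp = respond_larger(angle)
        trials.append((angle, resp))
        if resp:
            larger_run, smaller_run = larger_run + 1, 0
        else:
            smaller_run, larger_run = smaller_run + 1, 0
        direction = 0
        if larger_run >= n_down:
            direction, larger_run = -1, 0
        elif smaller_run >= n_up:
            direction, smaller_run = +1, 0
        if direction != 0:
            # A reversal is a change in the direction of the angle steps.
            if last_direction != 0 and direction != last_direction:
                reversals += 1
            last_direction = direction
            angle += direction * step
    return trials


def p_larger(angle, pse, sigma):
    """Cumulative-Gaussian psychometric function (assumed form)."""
    return 0.5 * (1.0 + math.erf((angle - pse) / (sigma * math.sqrt(2.0))))


def fit_pse(trials, pse_grid, sigma_grid):
    """Grid-search maximum-likelihood estimate of (PSE, sigma)."""
    best = None
    for pse in pse_grid:
        for sigma in sigma_grid:
            ll = 0.0
            for angle, resp in trials:
                p = min(max(p_larger(angle, pse, sigma), 1e-6), 1.0 - 1e-6)
                ll += math.log(p if resp else 1.0 - p)
            if best is None or ll > best[0]:
                best = (ll, pse, sigma)
    return best[1], best[2]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical observer whose true PSE is 100 deg.
    observer = lambda a: rng.random() < p_larger(a, pse=100.0, sigma=8.0)
    pooled = (run_staircase(observer, n_down=2, n_up=1, start_angle=90.0)
              + run_staircase(observer, n_down=1, n_up=2, start_angle=110.0))
    print(fit_pse(pooled, pse_grid=range(80, 121), sigma_grid=[2, 4, 8, 16]))
```

Pooling the two rules places trials on both sides of the 50% point, which is presumably why both were run; a single 2-down/1-up staircase would concentrate trials near the 71% point instead.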
The hinge angle was again varied according to 2-down/1-up and 1-down/2-up staircases. The simulated surface conditions were run in blocks in which Dv was varied and Da was constant. There were three values of Dv, so a block consisted of three randomly interleaved staircases. The two staircase rules were run in different blocks. Da was varied across blocks. Thus, there were six blocks in the simulated surface condition: three values of Da and two staircase rules. The real-surface conditions were run with one distance and one staircase rule in each block. There were again six blocks, but the blocks were briefer. Each observer completed all 12 blocks in random order, over several days. It is important to distinguish the stimulus to vergence from the vergence response, and the stimulus to accommodation from the accommodative response. We manipulated the stimuli to vergence and accommodation, which presumably produced changes in the responses, but we do not know how well correlated the responses were with changes in the stimuli because we did not measure the responses per se. Therefore, as we discuss the means by which vergence and accommodation contribute to distance estimation in disparity scaling, we will refer to the stimuli, vergence-specified distance (Dv) and accommodation-specified distance (Da), and not to the responses. The PSE in each condition (the hinge angle perceived on average as a right angle) was used to determine the equivalent distance in that condition. The method for calculating the equivalent distance is schematized in Figure 12. Each hinge angle PSE corresponds to a pattern of horizontal disparities that was perceived as a right angle. The disparity pattern can be expressed as the horizontal size ratio (HSR), the ratio of the widths of a small surface patch in the left- and right-eye images (Rogers & Bradshaw, 1993), for both planes of the hinge. For straight-ahead viewing, slant is

$$S \approx -\tan^{-1}\left(\frac{D}{I}\,\ln \mathrm{HSR}\right) \qquad (4)$$
where D is viewing distance and I is inter-pupillary distance (Howard & Rogers, 2002; Ogle, 1950). Equation 4 shows that a given HSR is consistent with many slants, and therefore many hinge angles, depending on the distance.

Figure 12. The calculation of equivalent distance for Experiment 2. (a) Example psychometric functions for observer CRLC at an accommodative distance (Da) of 57 cm. The proportion of trials in which he responded “greater than 90°” is plotted as a function of the hinge angle. Different colors represent different vergence distances (Dv). The vertical lines indicate the PSE, the hinge angle that was judged as greater than 90° 50% of the time. The size of the data points is proportional to the number of trials at that point. (b) Each curve is an “iso-disparity” line showing different hinge angles/distances consistent with the pattern of disparities defined by each of the PSEs from panel a. The horizontal lines show the simulated distance in each case. (c) The curves are the same as in panel b. The arrows show the distance at which the pattern of disparities associated with each PSE is actually consistent with a 90° hinge angle (using the relationship in Equation 4). This is the equivalent distance.

Figure 12a plots example psychometric functions at an accommodation distance (Da) of 57 cm for each of three vergence distances (Dv). Figure 12b shows the range of hinge angles that is consistent with the disparity pattern specified by the PSE (calculated by rearranging Equation 4, and considering the two planes of the hinge separately). The range of hinge angles consistent with the disparity pattern is large because the disparities specify different angles at different distances. Assuming that the observer's internal standard for 90° is unbiased (and that the visual system measures disparities without bias), the disparity pattern associated with the observer's setting would specify a right angle at some distance.
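As a rough illustration of this calculation, the following Python sketch converts a hinge-angle PSE into an equivalent distance. It rests on simplifying assumptions that are not the authors' exact procedure: the hinge is treated as symmetric about the line of sight (the random ±10° rotation is ignored, so both planes have the same slant magnitude), and the slant-disparity relation is taken in the magnitude form tan S ≈ (D / I)·|ln HSR|. The function names, the default inter-pupillary distance, and the 115° example PSE are hypothetical.

```python
import math


def plane_slant_deg(hinge_angle_deg):
    """Slant magnitude of each plane of a symmetric hinge.

    A 90 deg hinge has planes slanted 45 deg from frontoparallel;
    opening the hinge to 180 deg flattens both planes to 0 deg slant.
    """
    return 90.0 - hinge_angle_deg / 2.0


def abs_log_hsr(slant_deg, distance_cm, ipd_cm):
    """|ln HSR| produced by a given slant at a given distance.

    Assumed relation: tan(S) ~ (D / I) * |ln HSR|.
    """
    return (ipd_cm / distance_cm) * math.tan(math.radians(slant_deg))


def equivalent_distance(pse_hinge_deg, simulated_distance_cm, ipd_cm=6.2):
    """Distance at which the PSE's disparity pattern specifies a 90 deg hinge."""
    # Disparity pattern actually shown at the PSE (at the simulated distance).
    ln_hsr = abs_log_hsr(plane_slant_deg(pse_hinge_deg),
                         simulated_distance_cm, ipd_cm)
    # Solve tan(45 deg) = (D_eq / I) * |ln HSR| for D_eq.
    return ipd_cm * math.tan(math.radians(45.0)) / ln_hsr


if __name__ == "__main__":
    # Illustrative PSE only: a 115 deg hinge judged as a right angle at 28.5 cm.
    print(equivalent_distance(115.0, 28.5))
```

Under these assumptions the inter-pupillary distance cancels, and the equivalent distance reduces to the simulated distance divided by the tangent of the plane slant at the PSE. A PSE larger than 90° therefore implies an equivalent distance greater than the simulated distance, and a PSE smaller than 90° implies the reverse.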
This is the equivalent distance shown in Figure 12c. Figure 12a shows that for the near vergence distance (28.5 cm, red lines and symbols) an angle larger than 90° looked like a right angle. The distance at which this disparity pattern specifies a 90° hinge angle is 44.8 cm (red curve and arrow in Figure 12c). This suggests that viewing distance, which was 28.5 cm according to vergence, was overestimated. The blue lines and symbols denote the data for the far vergence distance (Dv = 85.5 cm); they show the converse pattern, consistent with an underestimate of viewing distance. Performance at the middle distance (green) was close to veridical. These data show a pattern of underconstancy with respect to changes in vergence-specified distance, in which near distances are overestimated and far distances are underestimated. Figure 13 shows the equivalent distances for the simulated and real surfaces as a function of the vergence-specified distance (Dv). The black points and lines represent the data from the real-surface measurements, in which all cues were consistent, and the colored points and lines represent the data from the simulated surface measurements, each color representing a different accommodative distance (Da). The circled points are the cues-consistent data: the subset of simulated surface data for which Da = Dv.

Figure 13. Equivalent distance as a function of vergence-specified distance in Experiment 2. The panels show data from different observers. The dotted diagonal lines represent veridical performance with respect to changes in Dv. The red, green, and blue symbols represent the simulated surface data for Da = 28.5, 57.0, and 85.5 cm, respectively. The colored lines are the best fits to these data. The data points for the cues-consistent conditions ( Da= D |