Volume 4, Number 12, Article 12, Pages 1136-1169
doi:10.1167/4.12.12
http://journalofvision.org/4/12/12/
ISSN 1534-7362
Crowding is unlike ordinary masking: Distinguishing feature integration from detection
Denis G. Pelli
Psychology & Neural Science, New York University, New York, NY, USA
Melanie Palomares
Psychology & Neural Science, New York University, New York, NY, USA
Najib J. Majaj
Center for Neural Science, New York University, New York, NY, USA
Abstract
A letter in the peripheral visual field is much harder to identify in the presence of nearby letters. This is “crowding.” Both crowding and ordinary masking are special cases of “masking,” which, in general, refers to any effect of a “mask” pattern on the discriminability of a signal. Here we characterize crowding, and propose a diagnostic test to distinguish it from ordinary masking. In ordinary masking, the signal disappears. In crowding, it remains visible, but is ambiguous, jumbled with its neighbors. Masks are usually effective only if they overlap the signal, but the crowding effect extends over a large region. The width of that region is proportional to signal eccentricity from the fovea and independent of signal size, mask size, mask contrast, signal and mask font, and number of masks. At 4 deg eccentricity, the threshold contrast for identification of a 0.32 deg signal letter is elevated (up to six-fold) by mask letters anywhere in a 2.3 deg region, 7 times wider than the signal. In ordinary masking, threshold contrast rises as a power function of mask contrast, with a shallow log-log slope of 0.5 to 1, whereas, in crowding, threshold is a sigmoidal function of mask contrast, with a steep log-log slope of 2 at close spacing. Most remarkably, although the threshold elevation decreases exponentially with spacing, the threshold and saturation contrasts of crowding are independent of spacing. Finally, ordinary masking is similar for detection and identification, but crowding occurs only for identification, not detection. More precisely, crowding occurs only in tasks that cannot be done based on a single detection by coarsely coded feature detectors. 
These results (and observers’ introspections) suggest that ordinary masking blocks feature detection, so the signal disappears, while crowding (like “illusory conjunction”) is excessive feature integration — detected features are integrated over an inappropriately large area because there are no smaller integration fields — so the integrated signal is ambiguous, jumbled with the mask. In illusory conjunction, observers see an object that is not there made up of features that are. A survey of the illusory conjunction literature finds that most of the illusory conjunction results are consistent with the spatial crowding described here, which depends on spatial proximity, independent of time pressure. The rest seem to arise through a distinct phenomenon that one might call “temporal crowding,” which depends on time pressure (“overloading attention”), independent of spatial proximity.
History
Received October 17, 2001; published December 30, 2004
Citation
Pelli, D. G., Palomares, M., & Majaj, N. J. (2004). Crowding is unlike ordinary masking: Distinguishing feature integration from detection.
Journal of Vision, 4(12):12, 1136-1169,
http://journalofvision.org/4/12/12/,
doi:10.1167/4.12.12.
Keywords
crowding, masking, peripheral vision, feature integration, illusory conjunction, critical spacing, letter identification, object recognition, isolation field, integration field, second-order mechanisms
Object identification involves the moderately well
understood process of feature detection, followed by a mysterious
“integration” process that combines the detected features to produce
a classification decision. The purpose of this paper is to characterize
“crowding.” Crowding is excessive integration, which spoils
identification and reveals the inner workings. With this characterization in
hand, one can address some longstanding questions about object identification,
such as whether faces are recognized by parts and the roles of letter and word
recognition in reading (Martelli, Pelli, & Majaj, in press; Su, Berger, Majaj, & Pelli, 2004).
Crowding and ordinary masking are special cases of
masking. In general, “masking” refers to the impairment of the
discriminability of a signal by another pattern. Ordinary masking, such as
masking by gratings (Legge & Foley, 1980;
Swift & Smith, 1983; Levi, Klein, &
Hariharan, 2002) or noise (Stromeyer &
Julesz, 1972; Pelli & Farell, 1999), is usually only effective when the mask
overlaps the signal. However, in the normal periphery or the amblyopic fovea,
neighboring letters with no overlap severely impair the identification of a
signal letter (Korte, 1923; Ehlers, 1936, 1953;
Bouma, 1970; Anstis, 1974; Flom, 1991). This particular masking phenomenon is
called “crowding” (Stuart & Burian, 1962; for historical review, see Strasburger,
Harvey, & Rentschler, 1991, or
Strasburger, 2002). Crowding is not
specific to letters. We will argue that ordinary masking occurs when signal and
mask stimulate the same feature detector and that crowding occurs when signal
and mask stimulate different feature detectors that both reach the same feature
integrator (where features are combined to recognize an object).
Despite progress in vision research, we still can only
barely begin to answer a simple question like, “How do I recognize the
letter A?” The literature on grating detection, with the ideas of spatial
frequency channels (feature detectors) and probability summation, offers a good
answer to the easier question, “How do I tell whether the screen is
blank?” The answer goes under many names, including “channels”
and “probability summation.” We follow Graham (1980) in calling it “feature
detection.” The observer has many independent units, “feature
detectors,” each with a receptive field (linear weighting over space and
time, summed to yield one number) followed by a nonlinear process that results
in a sharply increasing probability of response with contrast. 1, 2 The image that matches a detector’s
receptive field is called its “feature.” All feature detectors
operate independently and the observer detects a displayed image if and only if
any of the detectors do (Brindley, 1960;
Quick, 1974; Graham, 1980, 1989; Robson & Graham, 1981).
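The "detects if and only if any of the detectors do" rule can be sketched in a few lines. This is only an illustration under stated assumptions: the Weibull-style form of each detector's response probability, the steepness value, and the function names are ours, chosen for the example, and are not taken from the models cited above.

```python
import math


def detector_probability(contrast, threshold, beta=3.5):
    """Probability that one feature detector responds.

    Assumption: a Weibull-style function of contrast; `beta` sets how
    sharply the probability rises and is an illustrative value.
    """
    if contrast <= 0:
        return 0.0
    return 1.0 - math.exp(-((contrast / threshold) ** beta))


def observer_detects_probability(contrast, thresholds, beta=3.5):
    """Probability that at least one independent detector responds.

    Because the detectors are independent, the observer misses only if
    every detector misses, so the miss probabilities multiply.
    """
    p_all_miss = 1.0
    for threshold in thresholds:
        p_all_miss *= 1.0 - detector_probability(contrast, threshold, beta)
    return 1.0 - p_all_miss
```

With two equally sensitive detectors, the observer's detection probability at a given contrast is higher than either detector's alone, which is the sense in which probability is "summed" across detectors.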
In the various relevant papers, the word
“feature” sometimes refers, as above, to the elementary component of
the visual analysis (e.g., Graham, 1980)
and sometimes refers to the labeled value (e.g., “red” or
“A” or “triangle”) of a stimulus dimension that the
experimenter chose to vary (e.g., Treisman & Schmidt, 1982, p. 139). We will be referring to
elementary features, except in the Section 4.7
discussion of illusory conjunctions and Feature Integration Theory.
Elementary-feature detection provides a good account of
detecting simple targets (i.e., for which detection of a single feature suffices
for a correct response). However, identifying (or detecting a second-order
signal) usually requires combining the information from several feature
detections to respond correctly (see Chubb, Olzak, & Derrington, 2001). This (nonlinear) assembly process is called
“feature integration” (or “binding”). Feature
integration may internally represent the combined features as an object, but we
will not address that here. We will suggest that crowding is excessive feature
integration, integrating over an inappropriately large area that includes the
flanking mask as well as the signal.
This Introduction presents a simple intuition
(Section 1.1) that brings together ideas about
feature detection (1.2) with facts of ordinary
masking (1.3) and crowding (1.5). Later, in Discussion, we will review the close
connection between crowding and illusory conjunction (4.7).
This paper characterizes crowding, distinguishing it
from ordinary masking. We believe that the term “crowding” should
encompass not just the original task of identifying a letter among letters in
the periphery (or amblyopic fovea), but also any other task with similar
results: critical spacing proportional to eccentricity and independent of size.
A diagnostic test is proposed in Discussion (Section
4.1).
Past attempts to characterize and explain crowding have each varied a few parameters in similar tasks. In this experimental and theoretical synthesis we have tried to be more comprehensive. As we attempt to put it all together into one story, there are many points of agreement between our proposed explanation and earlier suggestions, but there are also some important differences. What is new here arrived late in the process, forced upon us by the data, after a long period of stumbling in the dark.
Perhaps the most important new fact emerging from this
union of old and new results is the effect of which task the observer is
assigned. In ordinary masking the signal disappears, so the observer cannot say
anything about it, and fails all tasks (Thomas, 1985b). Many investigators have assumed that
this would be true of crowding as well (e.g., see Cavanagh, 2001). But, in fact, conditions of crowding
that severely impair identification of a letter (reported here) or orientation
of a grating (Wilkinson, Wilson, & Ellemberg, 1997) have little or no effect on the
detectability of the target. Observers report seeing a jumbled target that
incorporates features from the mask. We struggled with this
detection/identification dichotomy for a long time, and failed in our attempts
to crowd gratings, until we eventually realized that the dichotomy is more
subtle than just detection versus identification. All the tasks susceptible to
crowding are tasks that, with some plausible assumptions, require more than one
feature-detection event (a “conjunction” of several feature
detections). Tasks that require only a single feature-detection event are
immune, or nearly so. This parallels the dichotomy found in searching for one
feature versus a conjunction of features — a feature pops out and a
conjunction does not 3 — and is strong
evidence that crowding interferes with feature integration, not feature
detection. The multiple detections must be integrated, and that integration is
susceptible to crowding; the single detection doesn’t need to be
integrated, so there’s no crowding.
Previous authors, aware that ordinary masking is
selective, have shown that crowding too is selective (e.g., Kooi, Toet,
Tripathy, & Levi, 1994). Here we compile old
and new results showing that the selectivity of crowding is vastly broader than
that of ordinary masking. Ordinary masking reveals the narrow selectivity of a
feature detector (the first stage), whereas crowding reveals the broad
selectivity of a feature integrator (the second stage).
It is more-or-less established that in ordinary masking
the same feature detector mediates the effects of mask and signal (Legge &
Foley, 1980; Foley & Chen, 1999; Wilson & Kim, 1998). 1
A new finding, the effect of mask contrast as a function of spacing (Section 3.6), provides strong evidence that, in crowding,
distinct feature detectors mediate the effects of mask and signal.
We survey the literature on illusory conjunctions at
the end of Discussion (Section 4.7), but the
only prerequisite for reading that section is the vocabulary established here in
the Introduction. Most of the illusory
conjunction papers’ results are consistent with crowding, as defined here,
but a few papers, including Treisman and Schmidt (1982), describe a different phenomenon that
we will call “temporal
crowding.”
1.2 Feature detection and integration
The familiar notion that the observer detects features
(components of the image) independently and then integrates them to perceive an
object goes back to Weber’s (1834, 1846) and Sherrington’s (1906) suggestions, based on their
psychophysical evidence, that neural receptive fields mediate the sense of
touch. Indeed, simply supposing that independent detection of features is a
necessary first stage of vision (i.e., cannot be bypassed) implies that any
observer response (e.g., object recognition) that communicates information about
a combination of features must be based on an integration (combination) of
several detected features (e.g., Selfridge, 1959; Neisser, 1967; Campbell & Robson, 1968; Thomas, Padilla, & Rourke, 1969; Rosch & Lloyd, 1978; Treisman & Gelade, 1980; Sagi & Julesz, 1984; Olzak & Thomas, 1986). Despite its appealing simplicity, feature
detection has been hard to establish convincingly. The grating detection
literature is convincing (e.g., Campbell & Robson, 1968; Robson & Graham, 1981; Graham, 1989), but that leaves open the possibility
that other tasks and targets (e.g., identifying letters) might bypass feature
detection. Judging whether or not a screen is blank, as one does in detection
experiments, might not be representative of what the visual system can do. Some
capabilities might appear only for important highly practiced tasks, like
reading faces or text. Part of this concern is allayed by the finding that
thresholds for identifying letters, across the entire range of size, font, and
alphabet, are accounted for by a slight extension of the standard
“probability summation” model of independent feature detection
(Pelli, Burns, Farell, & Moore, in
press). Finding, as predicted by feature detection, that efficiency for
identification is inversely proportional to complexity (number of features),
even when highly practiced, is strong evidence that observers cannot bypass the
feature-detection bottleneck (Pelli, Farell, & Moore, 2003).
We all want to know how features are integrated, but
findings to date provide only hints as to the nature of this computation.
Perception of coherent motion of two-grating plaids is based on a nonlinear
combination of the two grating components (Adelson & Movshon, 1982) and some MT neurons actually implement
this combination rule (Movshon, Adelson, Gizzi, & Newsome, 1986). Speed discrimination is affected by
whether the components are perceived to form an object (Verghese & Stone, 1995, 1996). Applying the classic summation
paradigm to motion discrimination and texture segregation reveals the exponent
of the nonlinear combination of multiple components (Morrone, Burr, & Vaina,
1995; Graham & Sutter, 1998). Accounts of texture discrimination suppose linear combination of nonlinearly transformed feature detection signals (for review, see Chubb et al., 2001; Landy & Graham, 2004). Visual search and crowding experiments have also contributed hints, as we will see below. Accounts of the feature integration that underlies identification of objects are more speculative. Much of the debate has distinguished the recognition-by-components approach championed by Biederman (1987) from the alignment approach championed by Ullman and Poggio (see Tarr & Bulthoff, 1998).
Alas, putting together the hints from all these studies fails to provide clear
guidance as to how to address the larger question of what kind of computation
underlies object
recognition.
Masking provides an important part of the evidence for
feature detection. Masking goes beyond the narrow domain of the question,
“Is the screen blank?” to examine the effect of an irrelevant
background mask on visibility of the signal. In ordinary masking, it is
generally supposed that the mask affects the visibility of the signal only to
the extent that the mask stimulates the receptive fields of the feature
detectors that pick up the signal. We will argue that crowding cannot be
explained as ordinary masking (i.e., mediated by mask stimulation of the feature
detector(s) that detect the signal).
Ordinary masking is most effective when the mask has
more or less the same spatial frequency, orientation, and location as the signal
(Legge & Foley, 1980; Phillips &
Wilson, 1984; Levi, Klein, et al., 2002). Critical-band masking experiments have
shown that the spatial frequency tuning of grating detection (Greis &
Rohler, 1970; Stromeyer & Julesz, 1972; Solomon & Pelli, 1994) and letter identification (Solomon
& Pelli, 1994; Majaj, Pelli, Kurshan,
& Palomares, 2002; Chung, Levi, &
Legge, 2001) is 1.6 octaves wide. And it is
independent of eccentricity, having the same tuning in central and peripheral
vision (Mullen & Losada, 1999).
Ordinary masking has very similar effects on detection
and identification (Thomas, 1985a, 1985b). As we shall see, our results show
that crowding affects only identification, not detection. (We would expect
crowding to affect detection of second-order signals, but no one has tried it
yet.) With no mask, threshold contrasts for identifying a signal are usually
higher than for detecting it, but, for a wide range of signal size (Pelli et
al., in press) and viewing
eccentricities (Raghavan, 1995; Thomas, 1987), identification and detection
thresholds are in a constant ratio (also see Graham, 1985). In critical band masking studies,
channel frequencies for detection and discrimination (of letters and gratings)
are the same (Majaj et al., 2002). Threshold
contrasts for identification and detection have similar dependence on mask
contrast (Raghavan, 1995; Pelli, Levi, &
Chung, 2004). These characteristics of
ordinary masking are evidence for the popular idea that ordinary masking impairs
discriminability of the signal by directly stimulating the feature detector that
mediates our judgments about the signal. The very different characteristics of
crowding will require a different kind of
explanation.
We restrict our scope to simultaneous mask and signal,
of any duration. A flanker that is delayed or prolonged relative to the signal
can produce “metacontrast” or “object substitution”
masking (Breitmeyer, 1984; Enns & Di
Lollo, 1997, 2000; Tata, 2002; Enns, 2004). These phenomena seem to be closely related
to motion perception (Didner & Sperling, 1980; Reeves, 1982; Burr, 1984;
Bischof & Di Lollo, 1995), and may be
related to what we will call “temporal crowding.” They are not
directly relevant to understanding (spatial) crowding, and will not be discussed
here (see Huckauf & Heller, 2004).
Our final conclusions rest on objective measurements:
thresholds for detection and identification. However, the subjective crowding
experience, all by itself, makes a strong case for a key point. Examine the two
blocks of letters in this demo while fixating on the central cross:
What you see on the left is a block of four A’s. What you see on the right is
much harder to describe. It’s a block of four letter-like objects. But
they aren’t clearly A’s or B’s; they’re in-between and
unstable. Each letter may seem at times to be an A and sometimes a B, but most
of the time it has a confusing hybrid A-B appearance that would be impossible
to draw. We usually assume that visual object recognition segments the scene and
accounts for each segment by hypothesizing an “object” with
appropriate properties. One supposes that all the object’s properties are
estimated from the same image segment. Surprisingly, this demo shows that a
single object’s several properties are estimates from various regions, large and small. Each letter is an object. The perceived presence and locations of the letters distinguish four objects, arranged in a square. To resolve four items these properties must each be assessed over a more-or-less one-letter region. Yet each item’s
shape has a hybrid A-B appearance, incorporating information from a region that
includes several letters. (Using your finger to cover other letters in the demo
above, you will find that to see one letter clearly you must cover the rest of
the letters in the block.) This seriously undermines the notion of object
recognition as a unitary process that takes in a region of the image and emits
an “object” with properties. Instead, our demo shows that, in this
case, the distinct properties of location (where) and shape (what) are estimates
from very differently sized regions. Perhaps, despite its unitary appearance, an
“object” is just a loose bundle of independently estimated
properties. [This differs from the Wolfe & Bennett (1997) suggestion that loose bundling results from inattention. Our demo of loose bundling occurs with full attention.] This demo, like the rest of this paper, reveals a dichotomy between properties (e.g., presence or location) that may be estimated from a single detected feature and those (e.g., letter identity or shape) that require integration of several features.
It is as if there is a pressure on both
sides of the word that tends to compress it. Then the stronger, i.e. the more
salient or dominant letters, are preserved and they ‘squash’ the
weaker, i.e. the less salient letters, between them. (Korte [1923], translated by Uta Wolfe)
It looks like one big mess. I keep seeing [the letter]
‘A’ even though there is no ‘A’ in the Sloan alphabet. I
seem to take features of one letter and mix them up with those of another.
(Observer JG)
When it’s difficult, I see a unit that is a
combination of letters and I can’t say how many there are. (Observer
MLL)
I know that there are three letters. But for some
reason, I can’t identify the middle one, which looks like it’s being
stretched and distorted by the outer flankers. (Observer MCP)
These are observers’ descriptions of how they see
a letter that is flanked by other letters in the periphery. This was first
described by Korte (1923), and was dubbed
crowding by Stuart & Burian (1962).
They and others showed that acuity is greatly impaired by crowding (Ehlers, 1936, 1953;
Woodworth, 1938; Flom, Weymouth, &
Kahneman, 1963; Bouma, 1970), which backs up the introspective
descriptions by objective measurement of impaired form recognition.
For identifying a letter among letters, the spatial
extent of crowding is roughly half the eccentricity (Bouma, 1970; Toet & Levi, 1992). For identifying a numeric character among
numeric characters, Strasburger et al. (1991) reported a similar proportionality
constant, 0.4, independent of character size (0.05 – 1.4 deg). Latham and
Whitaker (1996) report similar results for a
3-bar acuity target among four such distractors of random orientation. Tripathy
and Cavanagh (2002) report similar
results for identifying the orientation of a T among “squared
thetas.” Wilkinson et al. (1997), as
well, report a proportionality constant of 0.4 for fine discrimination of the
contrast or spatial frequency of a grating among gratings. Levi, Hariharan, and
Klein (2002, p. 175) report a
(center-to-center) proportionality constant of 0.5 for masking of an E by a bar,
both made up of grating patches.
This scaling with eccentricity, independent of size, is
utterly unlike ordinary masking, where critical spacing scales with signal size,
independent of eccentricity. As we’ll see, the most dramatic difference
between crowding and ordinary masking (for us the defining difference;
Section 4.1) lies in the complementary effects of signal size and eccentricity.
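The complementary scaling can be made concrete with a small sketch. The proportionality constant of 0.5 is the "roughly half the eccentricity" figure quoted above; the log-log slope estimate, the tolerance, and all function names are illustrative assumptions of ours, not the diagnostic procedure of Section 4.1.

```python
import math


def critical_spacing(eccentricity_deg, proportionality=0.5):
    """Critical spacing (deg) if it is proportional to eccentricity.

    0.5 is the rough value quoted in the text; reports of 0.4 also exist.
    """
    return proportionality * eccentricity_deg


def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    return sxy / sxx


def looks_like_crowding(slope_vs_eccentricity, slope_vs_size, tolerance=0.25):
    """Crowding pattern: slope near 1 against eccentricity, near 0 against size.

    Ordinary masking would show the reverse. `tolerance` is an arbitrary
    illustrative choice, not a value from the paper.
    """
    return (abs(slope_vs_eccentricity - 1.0) <= tolerance
            and abs(slope_vs_size) <= tolerance)
```

For example, `critical_spacing(4.0)` gives 2.0 deg. Applied to measured critical spacings, a slope near 1 against eccentricity together with a slope near 0 against signal size would point to crowding, and the opposite pattern to ordinary masking.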
Many lateral masking studies have varied size and
eccentricity, but, unfortunately, typically not in a way that would distinguish
crowding from ordinary masking. Under the rationale that acuity scaling would
provide a more level playing field for comparing different eccentricities, most
studies that varied signal size or eccentricity, varied both together, roughly
in proportion (e.g., Andriessen & Bouma, 1976; Loomis, 1978; Jacobs, 1979; Santee & Egeth, 1982b; Chung et al., 2001). Alas, proportional increase of the stimulus size and spacing with eccentricity would not be expected to affect either crowding or ordinary masking and thus does not distinguish the two kinds of effect. Chung et al. (2001) studied some of the
properties of crowding of letters by letters to compare crowding with ordinary
“pattern” masking of gratings by gratings. Filtering target and mask letters to one-octave bands, they identified the most effective mask frequency as a function of target frequency, and found that this agreed with the earlier literature on ordinary masking. At a large, near-critical spacing they found a shallow log-log slope (0.13 – 0.3) for the effect of mask contrast on threshold
contrast for identifying the target, which they noted is much shallower than the
slopes of 0.5 to 1 generally found in ordinary masking. Using ordinary,
unfiltered letters we further investigate the contrast response function here
(Figures 9–11, below).
Levi, Klein, et al. (2002) and Levi, Hariharan, et al. (2002) used a tumbling E and a flanking bar that were both made up of grating patches. They separately varied eccentricity, grating frequency, and patch extent. In the fovea, the critical spacing was proportional to signal extent, consistent with ordinary masking. In the periphery, the critical spacing was proportional to eccentricity, consistent with crowding.
Another important difference between crowding and
ordinary masking is that ordinary masking blocks both detection and
identification — the signal disappears — whereas crowding affects
only identification — the signal remains visible, but is jumbled with the
mask. This dichotomy has not been spelled out in the earlier literature,
although Wilkinson et al. (1997) noted a
much weaker effect of crowding on detection than on identification: Their
signals were still detectable when they could no longer be identified.
Because the range of crowding is roughly half the
eccentricity, it extends only a few minutes of arc for foveal targets (Flom,
Weymouth, et al., 1963; Bouma, 1970; Loomis, 1978; Jacobs, 1979; Levi, Klein, & Aitsebaomo, 1985; Toet & Levi, 1992; Wilkinson et al., 1997; Leat, Li, & Epp, 1999; Hess, Dakin, & Kapoor, 2000; Chung et al., 2001). Liu and Arditi (2000) found that letter-string length is
underestimated when observers are asked to judge the number of acuity-sized
letters in the fovea. Their descriptions of this foveal effect are similar to
those by Korte (1923) and our observers of
crowding in the periphery, but with the greatly reduced range, less than 5
arcmin, that one would expect from its proportionality to eccentricity. Thus
crowding treats fovea and periphery alike, following one eccentricity rule
throughout.
Lateral masking studies with larger signals find no
effect of nonoverlapping flankers on foveal targets (Strasburger et al., 1991; Leat et al., 1999). Bondarko and Danilova (1996, 1997) showed that nonoverlapping bars
slightly decrease acuity for a Landolt C signal in the fovea.
In foveal tasks that do show effects of laterally displaced masks, the spatial
extent of the lateral interference scales with the size of the signal: maximum
effect at a spacing of 5 times the gap width of a Landolt C (Flom, 1991) and 3 times the wavelength of a
grating (Polat & Sagi, 1993; Levi, 2000). Levi, Klein, et al. (2002) found that the critical spacing of a
tumbling E and flanking bars (all made up of grating patches) is proportional to
signal extent over a 50:1 range, independent of spatial frequency. This scaling
with signal size is characteristic of ordinary masking and unlike
crowding.
Because our experiments are done mostly with letters,
we postpone until Discussion the rest of
our review of crowding with other stimuli (Section 4.2).
What we have reviewed so far tells us to look at the effects of
spacing, eccentricity, contrast, and task. With those results in hand, we will
be ready to tackle illusory conjunctions (Section 4.7).
We begin by replicating previous results on the spatial
extent of crowding as a function of viewing eccentricity. We then explore the
effects of varying signal and mask (size, contrast, complexity, and type: letter
and grating) and task (identification and detection). (See Table 1.) The effects of spacing, eccentricity,
size, contrast, and task distinguish crowding from ordinary masking. The other
manipulations help characterize the selectivity of crowding. The selectivity of
ordinary masking is that of the feature detector. Our results indicate that the
selectivity of crowding is that of the feature integrator.
Effect of | Task | Signal and flanker (font or grating) | Signal size (deg) | Flanker size (deg) | Ecc. (deg) | Observer
eccentricity | Identify | Sloan | 1 | 1 | 0, 2, 4, 8, 12, 20, 24 | MCP, SJR, SSA
fovea vs. periphery | Identify | Sloan | 0.32 | 0.32 | 0, 4 | MCP, AG, MLL
size | Identify | Sloan | 0.32, 0.5, 1, 2 | same as signal | 4 | MCP, AG, SSA
flanker size | Identify | Sloan | 0.32 | 0.32, 0.64, 0.96, 1.6, 3.2 | 4 | MCP, AG
font | Identify | 2x3 Checkers, Sloan, Bookman, Outline Sloan | 0.25, 0.32, 0.50, 0.32 | same as signal | 4 | MCP, MLL
# of flankers | Identify | Sloan | 0.32 | 0.32 | 6 | MS, MLM, MCP
flanker contrast | Identify | Sloan | 0.32 | 0.32 | 4 | MCP, AG, MLL
task | identify, detect | Sloan | 0.32 | 0.32 | 4 | MCP, AG, MLL
eccentricity | Detect | Sloan | 0.75 | 0.75 | 2, 4, 8 | MLM
size | Detect | Sloan | 0.75, 1.5, 3.0 | same as signal | 8 | MCP, MLL
extent | Identify | 1 c/deg grating | 2, 4, 8 | same as signal | 20 | MCP, AG
letter vs. grating | identify, detect | Sloan, 8 c/deg grating | 0.32, 0.52 | 0.32, 0.52 | 4 | MCP, AG, MLL
Table 1. The experiments. For gratings,
“size” is the 1/e radius of the Gaussian envelope, and the observer
“identified” the ±45° orientation. Regarding Figure 8, observer MCP was tested at 4 instead of
6 deg eccentricity.
The experiments were exploratory, trying to
characterize the phenomenon, especially as a window into the mysterious feature
integration process. The results indicate that the observer’s
identification response is based on an amalgam of all the features detected in a
large region we call the “integration field,” which is approximately
centered on the signal (Toet & Levi, 1992).
Most relevant to this conclusion are the effect of task and the combined effects
of mask contrast and spacing.
Seven observers with normal or corrected-to-normal
acuity performed these experiments binocularly (see Table 1). One observer (MCP) is an author. The
other observers were paid for participating.
All experiments were performed on Apple Power Macintosh
computers using MATLAB software with the
Psychophysics Toolbox extensions
(Brainard, 1997; Pelli, 1997). The background luminance was set to the
middle of the monitor range, about 18 cd/m². Sloan letters were based
on Louise Sloan’s design specified by the NAS-NRC (1980). (The Sloan font is available from http://psych.nyu.edu/pelli/software.html). Sloan letters were usually 0.32 deg high and wide. Sinewave gratings were 1 or 8 c/deg with a circularly symmetric Gaussian envelope with a 1/e radius that we specify as “size.”
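As a rough illustration of what a "1/e radius" means for the grating stimuli, the sketch below evaluates the contrast of a Gaussian-windowed sinewave at one point. It is not the authors' MATLAB stimulus code: the cosine phase, the orientation convention, and the function name are assumptions made for the example.

```python
import math


def windowed_grating_contrast(x_deg, y_deg, size_deg, freq_cpd,
                              orientation_deg=45.0, peak_contrast=1.0):
    """Contrast of a Gaussian-windowed sinewave grating at (x, y) in deg.

    `size_deg` is the 1/e radius: the envelope exp(-(r/size)^2) falls to
    1/e of its peak at a distance `size_deg` from the center.
    Assumptions: cosine phase (peak at the center) and orientation
    measured as the direction of luminance modulation.
    """
    theta = math.radians(orientation_deg)
    # Position along the direction in which the grating modulates.
    u = x_deg * math.cos(theta) + y_deg * math.sin(theta)
    r_squared = x_deg ** 2 + y_deg ** 2
    envelope = math.exp(-r_squared / size_deg ** 2)
    carrier = math.cos(2.0 * math.pi * freq_cpd * u)
    return peak_contrast * envelope * carrier
```

Evaluating this function over a grid of pixel positions, converted to degrees of visual angle, would give a full stimulus image; the two orientations of the identification task correspond to `orientation_deg` of +45 and -45.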
Observers viewed a gamma-corrected grayscale monitor
(Pelli & Zhang, 1991). The fixation
point was a 0.15 deg black square. The position of the fixation point on the
screen determined the eccentricity of the signal (always presented at the center
of the display). For peripheral viewing conditions, the fixation point was
displayed for the entire trial. For foveal viewing, the fixation point was
presented for 200 ms, followed by a 200 ms blank and then the signal. The
signal, flanked by two horizontally aligned high-contrast masks of either
letters or gratings, appeared at the center of the screen for 200 ms (Figure 1). Because eccentricity was set by moving the fixation point, the signal was
presented at various eccentricities along the horizontal meridian in the right
visual field. Letter contrast is defined as the ratio of luminance increment to
background. Letter contrast can be greater than 1. Flanker contrast was usually
0.85. Each signal presentation was accompanied by a beep. Mask-to-signal spacing
is measured center to center. Usually the signal and each flanking letter were
independent random samples from the same alphabet. A response screen followed,
showing all the possible signals (usually the 10 letters of the Sloan
alphabet) at 80% contrast. Observers identified the signal by using a
mouse-controlled cursor to point and click on their answer. Correct
identification was rewarded with a
beep.
Figure 1. Typical condition for crowding. The black square is a 0.15 deg fixation mark. The signal is a faint 0.32 deg Sloan letter at 4 deg in the right visual field. Two 85%-contrast masks (S, Z) flank a signal letter (R) with a signal-to-mask center-to-center spacing of 0.64 deg. Letter contrast is defined as the ratio of luminance increment to background. Letter contrast can be greater than 1. The signal contrast changes from trial to trial.
The signal duration (200 ms) is too brief for eye
movements in response to the signal to help see it. We occasionally watched the
observer’s eyes while the observer was doing the task to detect
anticipatory eye movements, but we never saw any. The results presented in this
paper (e.g., Figure 3a) reveal a
more-than-tenfold threshold elevation and a steep dependence on spacing.
Anticipatory eye movements would reduce the signal eccentricity by an amount
that would vary between trials and among observers. The steep dependence of
threshold on spacing (e.g., Figure 3a) and the
consistent critical spacing among observers (e.g., Figure 3b) indicate that anticipatory eye
movements were not a problem.
Threshold contrast was measured by a modified QUEST
staircase procedure (Watson & Pelli, 1983; King-Smith, Grigsby, Vingrys, Benes,
& Supowit, 1994) using an 82% criterion
and β of 3.5 for 40-trial runs.
Log thresholds were averaged over two runs for each condition.
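To make the 82% criterion, the β of 3.5, and the averaging of log thresholds concrete, here is a minimal sketch. It assumes a Weibull psychometric function with no lapse rate and a guessing rate of 1/10 (one of 10 letters); the function names and the example numbers are illustrative assumptions, not the QUEST code used in the experiments.

```python
import math

def weibull(contrast, alpha, beta=3.5, gamma=0.1):
    """Proportion correct under an assumed Weibull form.
    alpha: contrast scale; beta: steepness; gamma: guessing rate (assumed 1/10)."""
    return 1.0 - (1.0 - gamma) * math.exp(-((contrast / alpha) ** beta))

def contrast_at(criterion, alpha, beta=3.5, gamma=0.1):
    """Invert the Weibull: the contrast at which proportion correct equals criterion."""
    return alpha * (-math.log((1.0 - criterion) / (1.0 - gamma))) ** (1.0 / beta)

def mean_log_threshold(thresholds):
    """Average log thresholds across runs, which is a geometric mean of the thresholds."""
    logs = [math.log10(t) for t in thresholds]
    return 10.0 ** (sum(logs) / len(logs))

# Hypothetical example: alpha = 0.1, threshold defined at 82% correct.
threshold = contrast_at(0.82, alpha=0.1)
# Two hypothetical run estimates combined by averaging their logs.
combined = mean_log_threshold([0.1, 0.4])
```

Averaging in the log domain means that a run that is a factor of 2 too high and a run that is a factor of 2 too low cancel exactly.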
In the detection task, the signal letter was randomly
presented in one of two consecutive intervals. The flankers were displayed in
both intervals, independently randomly selected for each interval. Observers
indicated their choice of interval by clicking the mouse once for first and
twice for second. Correct responses were rewarded with a
beep.
Strasburger et al. (1991) suggested that threshold contrast
for target identification is a good way to measure the effect of crowding, and
we agree. Most of our data are plots of threshold contrast against spacing, and
have a generally sigmoidal shape. We fit a clipped line to the data by eye. This
fit has three parts: a horizontal ceiling, a falling slope, and a horizontal
floor (Figure 2).
Threshold elevation (a ratio) is
measured from floor to ceiling. Critical
spacing is the least spacing at which there is no threshold elevation in
the fit (i.e., edge of the
floor).
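The clipped-line description can be sketched as a small function. This is only an illustration: it assumes the line is drawn in log threshold versus linear spacing, and the parameter names and example values are hypothetical. The fits reported here were made by eye, not by code.

```python
import math

def clipped_line(spacing, log_floor, log_ceiling, critical_spacing, slope):
    """Log10 threshold as a function of spacing (deg).
    Assumed form: log threshold falls linearly with spacing at `slope`
    log units per deg (slope > 0), reaches the floor at `critical_spacing`,
    and is clipped between floor and ceiling."""
    log_threshold = log_floor + slope * (critical_spacing - spacing)
    return min(log_ceiling, max(log_floor, log_threshold))

def threshold_elevation(log_floor, log_ceiling):
    """Ceiling:floor ratio of thresholds."""
    return 10.0 ** (log_ceiling - log_floor)

# Hypothetical parameters: floor 0.1, ceiling 0.7, critical spacing 1.2 deg.
log_floor = math.log10(0.1)
log_ceiling = math.log10(0.7)
elevation = threshold_elevation(log_floor, log_ceiling)  # about 7
```

With these parameters, any spacing at or beyond 1.2 deg returns the floor, which is how the critical spacing is read off the fit.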
Figure 2. Clipped line fit: threshold
contrast as a function of center-to-center spacing of signal and flanker. (At
zero spacing the signal and flanker are superimposed, added on top of one
another.) We fit a clipped line to each data set by eye. Two parameters of that
fit are of interest. Threshold
elevation is the ratio of thresholds at zero and infinite flanker spacing
(i.e., ceiling:floor ratio). Critical
spacing is the least spacing at which there is no threshold elevation
(i.e., edge of the floor).
Figure 3. Effect of eccentricity. Each symbol is the geometric average of two threshold estimates. (a). Threshold as a function of flanker spacing for a 1 deg Sloan letter between flankers at various eccentricities. The solid lines are clipped-line fits, as explained in Figure 2. The horizontal line at the bottom left, below the graph, represents the width of the signal in deg. Contrast is the ratio of increment to background, and can exceed 1. (b). Critical spacing plotted against viewing eccentricity for three observers. Like Bouma (1970), we find that critical spacing is roughly half of viewing eccentricity. Observers MCP, SJR, and SSA.
Figures 3–16 present our results. To help the reader make
sense of it all, Table 2 presents the nine
empirical differences between crowding and ordinary masking. We recommend
focusing on the sheer strangeness of crowding. Our intuitions, based on
familiarity with ordinary masking, were defied at every turn. The single most
important result is Bouma’s (1970),
greatly extended here, that critical spacing is roughly half the eccentricity
(distance from fixation), independent of everything
else.
| | Ordinary masking | Crowding | Figures |
| a | Similar in fovea and periphery. | Normally evident only in the periphery (Korte, 1923; Stuart & Burian, 1962; Flom, Weymouth, et al., 1963; Bouma, 1970). | |
| b | Signal disappears, suppressed by mask. | Signal is visible but ambiguous, incorporating features from mask (Korte, 1923; Flom, Weymouth, et al., 1963; Andriessen & Bouma, 1976; Wolford & Shum, 1980; Wilkinson et al., 1997; Parkes, Lund, Angelucci, Solomon, & Morgan, 2001). | |
| c | Occurs for any task and signal. | So far, specific to identification of letters (Flom, 1991), orientation of tumbling E (Levi, Hariharan, et al., 2002), and fine discrimination of contrast, spatial frequency, and orientation (Andriessen & Bouma, 1976; Wilkinson et al., 1997; Parkes et al., 2001). | |
| d | Similar effect on identification and detection. | Little or no effect on detection (Wilkinson et al., 1997) and coarse discrimination. | |
| e | Narrow critical spacing, little or no effect of nonoverlapping mask. | Wide critical spacing can be more than 10 times bigger than a small signal (Korte, 1923; Stuart & Burian, 1962; Bouma, 1970; Toet & Levi, 1992; Levi, Hariharan, et al., 2002). | |
| f | Critical spacing scales with signal, independent of eccentricity. | Critical spacing is roughly half of viewing eccentricity (Bouma, 1970; Toet & Levi, 1992), independent of signal size (Strasburger et al., 1991; Levi, Hariharan, et al., 2002), mask size, mask contrast, and number of masks. | |
| g | Spatiotemporal selectivity more or less consistent with a receptive field. | Remarkably unselective, showing equal effect over a wide range of flanker type (letter, black disk, or square; Eriksen & Hoffman, 1973; Loomis, 1978), flanker size (10:1), and flanker number (≥2). | |
| h | Shallow power-law contrast response (log-log slope of 0.5 to 1). | Steep sigmoidal contrast response. Log-log slope of 2 for close spacing. Log ceiling and log slope fall exponentially with spacing. | |
| i | Threshold mask contrast depends on spacing. No saturation. | Threshold and saturation mask contrasts are independent of spacing. | |
Table 2. Facts: summary of the differences between
crowding and ordinary masking. We cite the authors of the known facts about
crowding, many replicated here, and italicize
our new findings. We take line f as the defining difference: critical spacing scales with eccentricity, not size. (a). The extremely-short-range foveal effect described by Liu and Arditi (2000) is likely to be crowding. (c). Andriessen and Bouma (1976) show a large crowding-like
effect of flanking bars on fine discrimination of bar orientation, and a small
effect on detection threshold, too small to account for the effect on
orientation discrimination. Illusory conjunction provides evidence for crowding
of conjunction of color vs. shape (Treisman & Schmidt, 1982). (d). The critical spacing for detecting a letter among letters can be as large as that for identification, but we call it ordinary masking, not crowding, because it scales with letter size (Figure 14b), not eccentricity (Figure 13b). Despite refutations of the feature vs.
conjunction dichotomy in the search literature, we still expect a robust feature
vs. conjunction dichotomy in crowding. 3 (e). Figures
12, 16a, and 16b are examples of weak effects of nonoverlapping masks in ordinary masking. (f). At 0 deg eccentricity, Levi, Klein, et al. (2002)
found that critical spacing is proportional to signal size over a 50:1 range.
“Roughly half” can be as low as 0.3, as in Figure 5b. Fine (2003) also reported crowding to be independent of contrast. (g). We say “more or less
consistent” because current feature detector models to explain ordinary
masking have not just one but several similar receptive fields (to implement
divisive inhibition) as noted earlier. 1 The
spatiotemporal selectivity found by Chung et al. (2001) with filtered letters is like that for
ordinary masking, unlike our summary for crowding, but it is not certain whether
their paradigm elicited crowding or ordinary masking (see Section 4.6). Many studies have documented systematic effects
of the similarity of target and flanker (e.g., Estes, 1982; Ivry & Prinzmetal, 1991; Nazir, 1992;
Kooi et al., 1994; Donk, 1999; Chung et al., 2001).
There is a minor caveat to Bouma’s rule, but it
does not affect the basic intuition. The caveat is that critical spacing is
asymmetric, greater in the peripheral than in the central direction from the
target (Bouma 1970, 1973; Townsend, Taylor, & Brown, 1971; Banks, Larson, & Prinzmetal, 1979; Chastain & Lawson, 1979; Wolford & Shum, 1980; Toet & Levi, 1992). It is greater in radial directions
(peripheral and central) than in circumferential directions (Chambers &
Wolford, 1983; Toet & Levi, 1992). It is greater in the upper than in the lower
visual field (Intriligator & Cavanagh, 2001). These details matter when comparing results across conditions, but they reinforce the basic intuition that the extent of crowding depends almost exclusively on the local anatomy of the visual field, independent of the signal, unlike ordinary masking, which is co-extensive with the signal, independent of location in the visual field.
3.1 Effects of spacing and eccentricity
One of the stranger aspects of crowding is
Bouma’s (1970) finding that the critical
spacing is proportional to eccentricity, which we replicate here. We measured
the threshold contrast for identifying a 1 deg letter as a function of
signal-to-mask spacing. The signal was at 0 to 24 deg eccentricity in the right
visual field (see Table 1). There were two
flankers, one to the left and one to the right of the signal. Figure 3a shows that the letter masks have a very
strong effect, raising threshold tenfold. For each eccentricity, the
clipped-line fit provides an estimate of the critical spacing. Figure 3b shows that critical spacing is
proportional to eccentricity. Our data confirm the finding (Bouma, 1970; Strasburger et al., 1991; Toet & Levi, 1992; Levi, Hariharan, et al., 2002, p. 173) that the critical spacing is
roughly half of the viewing eccentricity. (Bouma, 1970, was right to say “roughly” 0.5.
For some of our data, this value drops as low as 0.3, as we will see below.)
Andriessen and Bouma (1976) report a
critical spacing of 0.4 of eccentricity for fine discrimination of line
orientation. Wilkinson et al. (1997) report
a critical spacing of 0.4 of eccentricity for fine discrimination of crowded
grating contrast and spatial frequency, and slightly higher for fine
discrimination of orientation.
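Bouma’s rule lends itself to a one-line calculation. The sketch below is illustrative only: the function names are hypothetical, and the proportionality constant is a free parameter, since the values quoted here range from roughly 0.3 to 0.5.

```python
def critical_spacing(eccentricity_deg, bouma_fraction=0.5):
    """Approximate critical spacing (deg) as a fixed fraction of eccentricity.
    bouma_fraction is 'roughly' 0.5; values as low as 0.3 are reported here."""
    return bouma_fraction * eccentricity_deg

def is_within_critical_spacing(spacing_deg, eccentricity_deg, bouma_fraction=0.5):
    """True if a flanker at this center-to-center spacing is expected to crowd."""
    return spacing_deg < critical_spacing(eccentricity_deg, bouma_fraction)

# The condition of Figure 1: 0.64 deg spacing at 4 deg eccentricity.
crowded_at_half = is_within_critical_spacing(0.64, 4.0, bouma_fraction=0.5)
crowded_at_low = is_within_critical_spacing(0.64, 4.0, bouma_fraction=0.3)
```

Under either fraction the Figure 1 spacing of 0.64 deg falls well inside the predicted critical spacing at 4 deg eccentricity.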
Figure 4 shows
threshold for observer AG in the presence of one mask, as a function of
horizontal mask offset, for a 0.32 deg signal. The width of the critical region
is the sum of the critical spacings, left and right. Separate curves show
results at 0 and 4 deg eccentricity. In the fovea, the critical region (i.e.,
the sum of critical spacings left and right) is about as wide (0.40 deg) as the
signal (0.32 deg). In the periphery, the critical spacings are 1.00 deg to the
left and 1.25 deg to the right, for a total critical region width (2.25 deg)
about 7 times the 0.32 deg width of the
signal. This replicates the asymmetry of previous findings that, for a given
signal location, crowding extends farther in the peripheral direction than in
the central direction (Bouma 1970, 1973; Townsend et al., 1971; Banks et al., 1979; Chastain & Lawson, 1979; Wolford & Shum, 1980; Toet & Levi, 1992).
Figure 4. Fovea vs. periphery. Threshold
for identifying a 0.32 deg Sloan letter (at 0 or 4 deg in right visual field) in
the presence of a single flanker of the same size, as a function of spacing
(i.e., flanker position). (Positive values are positions to the right of the
signal. Negative values are to the left.) At 4 deg eccentricity, the critical
region is 7 times wider than the letter (indicated by the horizontal bar).
Observer AG. Not shown: similar results for observers MCP and MLL. Similar
results at 6 deg eccentricity appear in Figure
8.
It may seem surprising that a more peripheral mask is
more effective than a more central mask equally distant from the target.
However, critical spacing is proportional to eccentricity, suggesting that the
relevant cortical representation of visual space is progressively more
compressed at greater eccentricity. Thus the more-eccentric mask is effectively
closer than the less-eccentric mask (i.e., at a smaller fraction of the
ever-increasing critical spacing).
Section 3.5 will show
that a single flanker is much less effective than flankers on both sides (Bouma,
1970).
3.2 Effect of signal size
Ordinary masking would lead one to expect that the
critical spacing in crowding would be proportional to signal size, not
eccentricity. What is the effect of size on critical spacing? Levi, Klein, et
al. (2002) and Levi, Hariharan, et al. (2002) found that, at 0 deg eccentricity,
the critical spacing is proportional to size over a 50:1 range, but that, in the
periphery, critical spacing is proportional to eccentricity, independent of
size. We measured threshold contrast for letters of various sizes at 4 deg
viewing eccentricity. Figure 5a shows threshold
contrast as a function of spacing for letter sizes of 0.32, 0.5, 1, and 2 deg.
For these sizes, threshold is elevated 26-fold (geometric mean). Figure 5b shows that the critical spacing did not
change with letter size, instead remaining constant at about 1.2 deg, which
replicates the Strasburger et al. (1991) finding, for numerals, that the
spatial extent of crowding is 1.2 deg at 4 deg eccentricity, independent of
size. Threshold elevation increases as a function of size (Figure 5c) because, as Figure 5a shows, the ceiling remains fixed at about 0.7 while the floor drops with size. This is just the familiar fact that contrast sensitivity for letters depends on size (see Pelli & Farell, 1999).
Figure 5. Effect of size. Identification of 0.32 - 2 deg Sloan letter (at 4 deg in the right visual field) between 2 flankers of the same size. (a). Threshold as a function of spacing. For observer AG, the threshold contrasts for all four sizes, 0.32, 0.5, 1, and 2 deg, nearly superimpose at spacings up to 1 deg. (b). Critical spacing vs. size, for three observers, showing no effect. Average (horizontal line) is 1.2 deg. (c). Threshold elevation increases somewhat with size: log-log slope of 0.6. This replicates the familiar finding that threshold contrast depends on size (Pelli & Farell, 1999).
3.3 Effect of flanker size
We also measured the effect of mask size on critical
spacing. We kept signal size at 0.32 deg and varied mask size from 0.32 to 3.2
deg. We didn’t know what to expect. On the one hand, increasing the
mask’s size increases its contrast energy, which we thought might increase
the mask’s effect. (For a letter, contrast energy is the product of area
and squared contrast.) On the other hand, enlarging the mask makes it less
similar to the signal, which might lessen its effect. Surprisingly, Figure 6 shows that the threshold curves nearly
superimpose, hardly affected by mask size, retaining a critical spacing of about
1.3 deg. (Figure 6a is for one observer; Figure 6b is for another.) Unlike ordinary
masking, the crowding effect is not tuned to size. The range (spatial extent) of
crowding is independent of signal size (Figure 5b) and mask size (Figure 6), depending
solely on eccentricity (Figure 3b).
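The contrast-energy argument can be put in numbers. This sketch uses the definition given above (area times squared contrast) and assumes, purely for illustration, that a letter's ink covers a fixed fraction of its bounding square; the ink fraction of 0.5 and the function name are hypothetical.

```python
def letter_contrast_energy(size_deg, contrast, ink_fraction=0.5):
    """Contrast energy = ink area x contrast squared.
    ink_fraction is an assumed, illustrative share of the size x size box."""
    ink_area = ink_fraction * size_deg ** 2
    return ink_area * contrast ** 2

# Mask sizes at the two ends of the tested range, both at 0.85 contrast.
small_mask = letter_contrast_energy(0.32, 0.85)
large_mask = letter_contrast_energy(3.2, 0.85)
energy_ratio = large_mask / small_mask  # a 10x larger letter has 100x the energy
```

On this account the largest mask carries about 100 times the contrast energy of the smallest, which is why the near-superposition of the threshold curves is surprising.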
Figure 6. Effect of flanker size. Signal is 0.32 deg Sloan letter at 4 deg in the right visual field. The two flankers were 1 to 10 times the size of the signal. One fit was made to data for all mask sizes. (a). Critical spacing is 1.3 deg for MCP. (b). Critical spacing is 1.2 deg for AG. Horizontal line at the bottom left of the graph represents the width of the signal.
3.4 Effect of font
We wondered whether perimetric complexity (perimeter,
squared, over ink area; Pelli et al., in
press) or some other aspect of letter shape is important for crowding. Figure 7a shows threshold as a function of spacing for several fonts, including a meaningless alphabet of twenty-six 2x3 checkers (e.g., d) and another
alphabet consisting of just two letters, N and Z, from the Sloan
alphabet. These curves are quite similar to each other, differing from one
another by large, but unimportant, vertical translations and small horizontal
translations. The vertical shifts track the different threshold contrasts for
different fonts, which is not of interest here. The small horizontal shifts are
small differences in critical spacing, which ranged from 1 to 1.3 deg. Pelli et al.
(in press) showed that efficiency for
letter identification is inversely proportional to perimetric complexity of the
font, but complexity seems to be irrelevant to crowding. Figures 7b and 7c
plot critical spacing and threshold elevation as a function of complexity,
showing no systematic effect of complexity.
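For readers who want to compute perimetric complexity themselves, here is a minimal sketch for a binary bitmap. It counts exposed pixel edges as the perimeter, which is a simplifying assumption (it overestimates the perimeter of curved or oblique strokes) and is not necessarily the procedure of Pelli et al.; the function name is hypothetical.

```python
def perimetric_complexity(bitmap):
    """Perimeter squared divided by ink area, for a list of rows of 0/1 pixels.
    Perimeter is approximated by the number of ink-pixel edges that border
    non-ink (or the image boundary)."""
    rows, cols = len(bitmap), len(bitmap[0])

    def is_ink(r, c):
        return 0 <= r < rows and 0 <= c < cols and bitmap[r][c] == 1

    area = 0
    perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not is_ink(r, c):
                continue
            area += 1
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                if not is_ink(r + dr, c + dc):
                    perimeter += 1
    if area == 0:
        raise ValueError("bitmap contains no ink")
    return perimeter ** 2 / area

solid_square = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]   # compact shape
thin_bar = [[1, 1, 1, 1, 1, 1, 1, 1, 1]]           # same ink area, elongated
square_complexity = perimetric_complexity(solid_square)
bar_complexity = perimetric_complexity(thin_bar)
```

The measure is dimensionless, so it does not change when a shape is uniformly scaled; a thin bar scores higher than a compact square of the same ink area.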
Figure 7. Effect of font. Signal is 0.32 deg letter at 4 deg in the right visual field. (a).
Threshold contrast as a function of spacing for various fonts: Bookman, 2 x 3 Checkers, Sloan, NZ Sloan, and Outline Sloan. (b). Critical spacing is independent
of complexity. (c). Threshold elevation as a function of perimetric
complexity, perimeter squared divided by area, showing that threshold elevation
does not seem to be systematically related to complexity. Observer MLL. Not
shown: similar results for observer MCP.
3.5 Effect of number of flankers
Would adding more flankers increase the crowding
effect? Figure 8a plots threshold for letter
identification in the periphery with 1, 2, and 4 flankers. The signal and
flankers are all Sloan letters, right-side up. Figure 8b shows that critical spacing is
independent of number of flankers. It is about 0.4 of the eccentricity. Figure 8c shows that threshold elevation increased
when flankers were increased from 1 to 2, but threshold was not further elevated
when flankers were increased from 2 to 4. Consistent with this, Wilkinson et al.
(1997) reported that reducing the number of
flanking gratings from 14 down to 2 did not significantly reduce their effect on
the discriminability of the signal. Toet and Levi (1992) report extensive measurements of the effect
of two T flankers on judging orientation of a T target, adding that, in pilot
measurements, they found no effect of a single flanker. However, Strasburger et
al. (1991) did report an increased threshold elevation when they increased the number of flankers from 2 to 4.
Figure 8. Effect of number of flankers. Signal is 0.32 deg Sloan letter at 6 deg in the right visual field. Flankers are letters, too, also right-side up, but displaced vertically or horizontally. (a). Threshold contrast as a function of spacing, for 1, 2, or 4 flankers. Flanker position (e.g., “right”) is relative to signal position. The
horizontal line at the bottom left of the graph represents the width of the
signal. Note that, lacking data at zero spacing (because the flankers would have
collided), it is not clear whether there is a ceiling at small spacing, so that
part of the clipped-line fit is somewhat arbitrary. The one-flanker data shown
here, for a signal at 6 deg ecc., are similar to the data shown in Figure 4, for a signal at 4 deg ecc. We replicate the well-established finding that the critical spacing is greater in the peripheral than in the central direction. (b). Critical spacing (estimated separately for each condition, but averaging results for 1-left and 1-right) as a function of number of flankers. (c). Threshold elevation (estimated separately for each condition) as a function of number of flankers. This last graph is tentative because it depends on the somewhat arbitrary ceilings of the clipped-line fits in panel a. Observers MS and MLM. Not shown: similar results for observer MCP at 4 deg eccentricity.
It makes sense that a single flanker would be much less effective than multiple flankers that surround the object. One imagines
that when there is only one flanker the observer may use a large but offset
integration field to pick off the exposed target. This strategy is not available
when there are two or more flankers surrounding the target.
For a signal 6 deg to the right of fixation, we find a
smaller critical spacing for flankers above and below the signal than for flankers
left and right of it, which is consistent with Toet and Levi’s
(1992) finding that the critical spacing is
smaller along the circumference than along a radial ray from the
fovea.
3.6 Effect of flanker contrast
The experiments presented above used flankers of a high
contrast, 0.85. Figure 9a shows threshold
signal contrast as a function of spacing for several mask contrasts. Figure 9b shows that critical spacing is
independent of mask contrast. There is an outlier, the X representing a critical
spacing of 0.5 deg at a mask contrast of 0.1 for observer MLL. This is based on
the fit shown in panel a to the 0.1 mask contrast data (solid diamonds). Note
that threshold is elevated only when the mask overlaps the
signal.
Figure 9. Effect of flanker contrast for three observers identifying a 0.32 deg Sloan letter at 4 deg in the right visual field. (a). Threshold contrast as a function of spacing for several flanker contrasts. (b). Critical spacing as a function of flanker contrast. Mask contrasts below 0.1 did not elevate threshold so they have no critical spacing. Observers MCP, AG, and MLL.
Thus the anomalous point in Figure 9b seems to
represent ordinary masking, not crowding. The rest of the data show no
consistent effect of mask contrast on critical spacing: For one observer,
critical spacing rises slightly with mask contrast, but it falls slightly for
the other two observers. Fine (2003), too,
reported crowding to be independent of contrast. So far, we have seen that
critical spacing is independent of signal size, mask size, mask contrast, signal
and mask font, and number of masks. Figure 10 demonstrates the effect of mask
contrast, showing an abrupt transition as mask contrast is increased. Once the
mask becomes visible it soon saturates, producing its full effect on the
signal.
Figure 10. Effect of flanker contrast.
Starting at the top, in each row, fixate the black square, and try to identify
the middle letter on the right. As you read down the chart, the contrast of the
center letter is always 0.50, while the contrast of the two outer letters
increases (0, 0.10, 0.15, 0.25, and 0.50). You’ll find that the central
letter becomes much harder to identify as soon as the flankers are at all
visible.
Based solely on this demo, one might wonder whether the
crowding is determined by similarity. The flankers become more similar to the
signal as their contrast approaches that of the signal. However, Chung et al.
(2001) manipulated signal and flanker contrast
to test this hypothesis, finding that, at least in their conditions, more mask
contrast always increased masking, even when this made the masks less similar to
the signal.
In another view of the same data, Figure 11a shows threshold contrast as a function
of mask contrast for several spacings. For a 0.32 deg letter, the contrast
response curves show that threshold elevation increases abruptly with mask
contrast, going from none to full effect as the mask goes from 0.1, the
threshold contrast for identifying an isolated letter, to about 3 times that,
saturating at higher contrast. There are two critical mask contrasts. In our
clipped-line fit, mask threshold is the
mask contrast at which threshold contrast of the signal begins to increase (edge
of floor). And mask saturation is the
mask contrast at which threshold contrast of the signal stops increasing (edge
of ceiling).
Figure 11. Effect of flanker contrast for
three observers identifying a 0.32 deg Sloan letter at 4 deg in the right visual
field. Same data as Figure 9. (a). Threshold contrast for identifying the target letter as a function of mask contrast for observer MLL. Clipped lines (shown) are fit to the (roughly sigmoidal) data, constrained to have equal threshold and saturation contrasts of the mask for all conditions. Threshold contrast rises at 0.1 mask contrast and saturates at 0.25 mask contrast for all spacings. Clipped lines (not shown) were also fit independently to the data for each condition for each observer, and the parameters of these fits are plotted in panel d. (b). Psychometric function. Proportion correct identification of a letter as a function of contrast. The knees (critical contrasts) of this psychometric function roughly match those of the contrast response function in panel a. This is a maximum likelihood fit of a Weibull function to the measured proportion correct (not shown) at several contrasts (see Pelli et al., in press; Strasburger, 2001). The lower asymptote is 1/10 because that is the chance of correctly guessing the identity of one of 10 letters. (c). The threshold elevation (left scale of panel c) and log-log slope (right scale) of the fits in panel a (and similar data for observers MCP and AG) are high at small spacings and fall exponentially with increased spacing. (d). Threshold and saturation contrasts of the mask as a function of spacing. Mask threshold is the first knee, where the signal threshold rises. Mask saturation is the second knee, where the signal threshold saturates. Each pair of points (solid and open) is based on an independent clipped-line fit (not shown) to the data for one condition and observer. The threshold contrast for identifying the mask may be estimated from that for the signal (0.1) at low (0.01) mask contrast (panel a). Observers MCP, AG, and MLL.
This contrast-response curve is quite unlike what is
usually seen in ordinary masking. Here the function rises steeply and hits a
hard ceiling, with no further increase over a wide range of high mask contrasts
(0.25 – 1). In ordinary masking, the function rises with a log-log slope
of 0.5 to 1 and continues to increase relentlessly. The log-log slope of the
(clipped line) contrast-response function for crowding is 2 at the closest
spacing and falls exponentially with spacing (Figure 11c, right-hand scale). The function found
here is more reminiscent of the sigmoidal form of a frequency-of-seeing curve,
rising suddenly from floor to ceiling over a narrow range of contrast. For
comparison, Figure 11b shows the
observer’s proportion of correct identifications for an unflanked signal
at this eccentricity as a function of signal contrast.
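The contrast between the two contrast-response functions can be sketched numerically. The code below is an illustration under stated assumptions: crowding is modeled as a clipped line in log-log coordinates using the knees (0.1 and 0.25) and the slope of 2 quoted above, and ordinary masking as an unsaturating power law with an arbitrarily chosen slope of 0.7 from the 0.5 to 1 range. The function names and the floor of 0.1 are illustrative choices, not fitted values.

```python
def crowding_threshold(mask_contrast, floor=0.1, mask_threshold=0.1,
                       mask_saturation=0.25, slope=2.0):
    """Signal threshold under crowding: flat below mask_threshold, rising with
    the given log-log slope, and flat again above mask_saturation."""
    clipped = min(max(mask_contrast, mask_threshold), mask_saturation)
    return floor * (clipped / mask_threshold) ** slope

def ordinary_masking_threshold(mask_contrast, floor=0.1, mask_threshold=0.1,
                               slope=0.7):
    """Signal threshold under ordinary masking: power law with no saturation.
    The slope of 0.7 is an assumed value within the 0.5 to 1 range."""
    return floor * max(mask_contrast / mask_threshold, 1.0) ** slope

# Elevation (ceiling:floor ratio) implied by the crowding parameters above.
crowding_elevation = crowding_threshold(0.85) / crowding_threshold(0.01)
```

With these parameters the crowding function rises by a factor of (0.25/0.1)² = 6.25 and then stays put, whereas the power law keeps growing as mask contrast increases toward 1.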
Chung et al. (2001)
measured the contrast-response function for a bandpass-filtered letter among
similar letters at a single separation (2.2 deg) at an eccentricity of 5 deg,
obtaining shallow log-log slopes (0.3 and 0.1) that are consistent with the less
than 0.4 slope found here at our maximum separation (1.5 deg at an eccentricity
of 4 deg). Testing at such large (near-critical) separations (about 0.4 of
eccentricity), the threshold elevation and slope are nearly
gone.
The series of functions plotted in Figure 11a reveals something quite remarkable. It
is hardly surprising that the threshold elevation (on the vertical scale) is
reduced at greater spacings, as shown in Figure
11c. But we were surprised to find that the critical mask contrasts (0.1 and
0.25 on the horizontal scale) are unaffected by the spacing. In Figure 11a every curve (one for each spacing)
turns up at a mask contrast of 0.1 and saturates when the mask contrast reaches
0.25, no matter how far away the signal is. Figure
11d shows explicitly that the critical mask contrasts are independent of
spacing. We will come back to this in Discussion.
3.7 Effect of task: identification and detection of letters and gratings
Most crowding studies have used identification tasks,
whereas most masking studies have used detection tasks. To determine whether
crowding depends on task, Figure 12 shows
identification and detection thresholds for a letter among letters as a function
of flanker spacing for two observers. (Figure
16a shows similar results for a third observer.) For identification,
averaging across the three observers, the threshold elevation is large
(ten-fold) and extends out to 1.3 deg (four signal widths). For detection, the
threshold elevation is only three-fold but extends about as far (average is 1.5
deg).
Figure 12. Effect of task.
Identification or detection of a letter between letter flankers. Signal is 0.32
deg Sloan letter at 4 deg in the right visual field. Threshold curves for
identifying a letter are sigmoidal with an average threshold elevation of about
1 log unit and a critical spacing of 1.2 deg. (a). Observer MCP. (b). Observer
MLL. There are some observer differences, but detection threshold is always
lower, with less threshold elevation. The average critical spacing for detection
is 1.5 deg. Horizontal line at the bottom left of the graph represents the width
of the signal. Figure 16a shows similar results
for a third observer.
To distinguish crowding from masking, we assessed the
effect of eccentricity and size on critical
spacing.
We measured the effect of eccentricity (2, 4, and 8 deg
in right visual field) on detection thresholds for 0.75 deg Sloan letters. Figure 13a plots threshold as a function of
spacing for each eccentricity. Figure 13b shows
that the critical spacing for detection is independent of eccentricity, unlike
the proportionality found for identification (Figure
3b).
Figure 13. Effect of eccentricity on detection. Detection of a letter among letters at several eccentricities in the right visual field. Signal and flankers are 0.75 deg Sloan. (a). Threshold as a function of spacing. (b). Critical spacing as a function of eccentricity. The critical spacing for letter detection is independent of eccentricity. This is characteristic of ordinary masking, whereas in crowding the critical spacing is proportional to eccentricity, as in Figure 3b.
Observer MLM.
We measured detection thresholds for Sloan letters of
three sizes, 0.75, 1.5, and 3 deg, at 8 deg in the right visual field. The
results in Figure 14b show that the critical spacing for letter detection is proportional to size, whereas the critical spacing for letter identification is independent of size (Figure
5b).
Figure 14. Effect of size on detection. Detection of a letter among letters at several sizes. Signal and flankers have equal size. Signal is Sloan letter at 8 deg in the right visual field. (a). Threshold as a function of spacing. (b). Critical spacing as a function of size. The critical spacing for letter detection is proportional to letter size. This is characteristic of ordinary masking, whereas in crowding the critical spacing is independent of size, as in Figure 5b.
Observer MCP. Not shown: similar results for observer MLL.
To us, this is the most telling difference: In ordinary
masking (e.g., letter detection), the critical spacing is proportional to signal
size (Figure 14b), independent of eccentricity
(Figure 13b), whereas in crowding (e.g., letter
identification) the critical spacing is proportional to eccentricity (Figure 3b), independent of size (Figure 5b).
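This diagnostic can be written as a simple decision rule on two log-log slopes: critical spacing versus eccentricity, and critical spacing versus size. The sketch below is a hypothetical illustration; the 0.5 cutoff, the function names, and the example data are invented for demonstration and are not measurements from this paper.

```python
import math

def log_log_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    denominator = sum((a - mean_x) ** 2 for a in lx)
    return numerator / denominator

def diagnose(slope_vs_eccentricity, slope_vs_size, cutoff=0.5):
    """Crowding: spacing scales with eccentricity, not size.
    Ordinary masking: spacing scales with size, not eccentricity.
    The cutoff is an arbitrary illustrative choice."""
    if slope_vs_eccentricity > cutoff and slope_vs_size < cutoff:
        return "crowding"
    if slope_vs_size > cutoff and slope_vs_eccentricity < cutoff:
        return "ordinary masking"
    return "indeterminate"

# Made-up critical spacings (deg) that follow the two idealized patterns.
identification = diagnose(
    log_log_slope([2, 4, 8], [0.6, 1.2, 2.4]),       # proportional to eccentricity
    log_log_slope([0.32, 1.0, 2.0], [1.2, 1.2, 1.2]) # independent of size
)
detection = diagnose(
    log_log_slope([2, 4, 8], [1.5, 1.5, 1.5]),       # independent of eccentricity
    log_log_slope([0.75, 1.5, 3.0], [1.5, 3.0, 6.0]) # proportional to size
)
```

A slope near 1 indicates proportionality and a slope near 0 indicates independence, so the two idealized patterns fall on opposite sides of the cutoff.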
We also changed the envelope size of 1 c/deg gratings
in a ±45° orientation discrimination task at 20 deg viewing
eccentricity (Figure 15a). We saw earlier (Figure 5b) that changing the size of letters did
not affect critical spacing. However, for gratings, the critical spacing scales
with the size of the envelope (Figure 15b).
There is no mystery here: The gratings mask each other only when they overlap;
at their critical spacing they are
abutting.
Figure 15. Effect of grating extent.
Size is the 1/e radius of the Gaussian envelope. Grating (at 20 deg in the right visual field) flanked by two gratings. (a). Threshold contrast for identification of ±45° orientation of 1 c/deg grating between two flanking gratings as a function of spacing, for several envelope sizes. Signal and flanking gratings had same spatial frequency and same size envelope. (b). Critical spacing as a function of envelope size. Observer MCP. Not shown: similar results for observer AG.
3.8 Effect of letter vs. grating
Here we tried letters (0.32 deg Sloan) and gratings
(8 c/deg) in every combination of target and flanker. Majaj et al. (2002) show that identification of letters is
mediated by a channel with a center frequency determined by the stroke frequency
of the letter. For a 0.32 deg Sloan letter the stroke frequency is 1.6/0.32 = 5
c/deg, and, by their formula, the channel frequency is 6.3 c/deg, which is very
close to the 8 c/deg spatial frequency of the grating we used. Thus the
identification of letter and grating in this experiment was mediated by channels
tuned to similar spatial
frequencies.
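The arithmetic in this paragraph can be checked with a short sketch. The text gives only the inputs (1.6 strokes per letter width, a 0.32 deg letter) and the resulting 6.3 c/deg channel frequency; the power-law form used below, with a 2/3 exponent and a 10 c/deg reference frequency, is our assumption about the Majaj et al. (2002) formula, chosen because it reproduces the quoted value. The function names are hypothetical.

```python
def stroke_frequency(strokes_per_width, letter_width_deg):
    """Stroke frequency in c/deg: strokes per letter width divided by width."""
    return strokes_per_width / letter_width_deg

def channel_frequency(stroke_freq_cpd, exponent=2.0 / 3.0, ref_cpd=10.0):
    """Assumed power law: f_channel / ref = (f_stroke / ref) ** exponent."""
    return ref_cpd * (stroke_freq_cpd / ref_cpd) ** exponent

f_stroke = stroke_frequency(1.6, 0.32)   # about 5 c/deg
f_channel = channel_frequency(f_stroke)  # about 6.3 c/deg
print(round(f_stroke, 2), round(f_channel, 1))
```

Under this assumed form, the channel frequency for the 0.32 deg Sloan letter comes out within about a quarter of an octave of the 8 c/deg grating, consistent with the claim that similar channels mediate both tasks.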
We measured thresholds for detection and identification
of 8 c/deg sinewave gratings. Signal and flanker gratings were each randomly
tilted ±45° on each trial. In the detection task the observer was
required to choose which of two intervals contained the signal grating (ignoring
orientation). In the identification task there was only one interval and the
observer was asked, on the response screen, to identify its +45° or
-45° orientation.
Figures 16c and 16d show that neither grating nor letter flankers
raised the grating signal’s threshold unless they overlapped it. (Letter
size is 0.32 deg; grating size is 0.52 deg; see Table 1.) Grating threshold elevation at all
spacings is similar for both tasks (detection and identification) and flanker
types (letter and grating). Compared with identifying a letter among letters,
the grating curves show no ceiling and have a small critical spacing (about one
signal width). The grating’s narrow critical spacing — threshold is
elevated only when the flanker overlaps the grating — suggests ordinary
masking, not crowding.
Figure 16. Effect of letter vs. grating. Identification or detection of a letter or grating flanked by letters or gratings. Signal at 4 deg in the right visual field. (a). Letter flanked by letters. Averaging across Figures 12ab and 16a, the critical spacing is about 5 letter widths (1.5 deg) for both identification and detection. (b). Letter flanked by gratings. Critical spacing is 2 times the width of the signal for identification, and 5 times the width of the signal for detection. (c). Grating flanked by letters. (d). Grating flanked by gratings. The results show that threshold is elevated only when the flankers overlap the signal. Sloan letters were 0.32 deg wide and sinewave gratings were 8 c/deg with a 0.52 deg Gaussian window (radius at 1/e). Horizontal line at lower left corner represents the width of the signal. There were always two flankers, to the right and left of the signal. Observer AG. Not shown: similar results for observers MCP and MLL.
When we originally got the grating results reported in
Figures 16c and 16d, we were led to think, wrongly as it turns
out, that gratings are immune to crowding. Our identification task was too
coarse. We had asked the observer to distinguish orientations 90° apart.
Ordinary masking studies have shown that we see gratings by means of feature
detectors that have an orientation bandwidth of ±15° to ±30°
(Phillips & Wilson, 1984). Thus
orthogonal gratings are detected by distinct feature detectors, and we would
expect the label of a single feature detector to suffice for identifying the
coarse orientation of the grating. Indeed, Thomas and Gille (1979) reported that two gratings differing in
orientation by 20° to 30° are identified just as accurately as they
are detected. And the thresholds for detection and identification in Figures 16c and 16d seem to be identical. This is the logic that
Watson and Robson (1981) applied to frequency
identification. When the two signals stimulated different detectors, observers
could identify at the threshold for detection. When the same feature detector
picks up both signals, then the observer cannot identify based on a single
feature detection and requires at least two detectors to be active. The ratio of
the detector responses would presumably be a good basis for fine discrimination.
(Treisman, 1991, makes the same point
for other stimulus dimensions.) Thus a parametric change in the task, from a
coarse (>2:1) to a fine (<2:1) frequency discrimination, results in a
qualitative change in the observer’s computational algorithm, from single-
to multi-feature detection and integration (also see Verghese & Nakayama, 1994, and Discussion, Section 4.3).
Our seven theoretical conclusions about the difference between crowding and ordinary masking are listed in Table 3 and discussed in Sections 4.1 to 4.7. The discussion of illusory conjunctions comes last (Section 4.7), but its only prerequisite is the vocabulary established in the Introduction (Section 1). We begin the discussion by proposing a definition.
| | Ordinary masking | Crowding | | Section |
|---|---|---|---|---|
| a | Critical spacing is proportional to size and independent of eccentricity. | Critical spacing is proportional to eccentricity (Bouma, 1970) and independent of size (Strasburger et al., 1991; Levi, Hariharan, et al., 2002). | f | |
| b | Occurs for any task. | Specific to tasks that could not be performed based on a single detection by coarsely coded feature detectors. | b - d | |
| c | Same feature detector mediates the effects of mask and signal. | Distinct feature detectors mediate the effects of mask and signal. | g - i | |
| d | Eccentricity doesn’t matter. | In the periphery, the observer uses an inappropriately large integration field because smaller fields are absent. | a - i | |
| e | Impairs feature detection. | Impairs feature integration (Flom, Weymouth, et al., 1963; Wolford & Shum, 1980; He et al., 1996; Parkes et al., 2001; Chung et al., 2001; Levi, Hariharan, et al., 2002). | a - i | |
| f | Selectivity is that of the feature detector. | Selectivity is that of the feature integrator. | g | |
| g | No signal feature is detected, so the signal is invisible. | Features of both signal and mask are detected and combined, so the signal is visible, but jumbled with the mask (Korte, 1923; Wolford & Shum, 1980; Parkes et al., 2001; Levi, Hariharan, et al., 2002). | b - d | |
Table 3. Theory: summary of the differences between crowding and ordinary masking. We cite the authors of existing theories about crowding, and italicize our new ideas. (b). Treisman (1991) makes a similar suggestion for illusory conjunctions. (c). This idea is implicit in the models that Wolford and Shum (1980), Treisman and Schmidt (1982), Wilkinson et al. (1997), and Parkes et al. (2001) use to explain their results. (f). Current feature detector models have several receptive fields, to implement divisive inhibition, but the differences in selectivity of these various fields are too small to matter here. (g). Treisman and Schmidt (1982) make a similar suggestion for illusory conjunction.
Using published and new results, we have established that
the original crowding phenomenon — impaired identification of a letter
among letters in the periphery — is unlike ordinary masking. We suggest
that the term “crowding” be applied to any phenomenon that exhibits
the critical-spacing dependence reported by Bouma (1970).
When defining a term already in use, the desire to
sharpen must be tempered by the need to respect established usage. Crowding was
discovered in the course of measuring letter acuity in patients with central
field loss (Korte, 1923) or amblyopia (Ehlers,
1936). 4
Stuart and Burian ( 1962) coined the term
“crowding” for the impairment of identification of a peripheral
letter by neighboring letters. Since then the term has been used primarily, but
not exclusively, to refer to lateral masking of letters by letters. Most
writings on crowding — and this manuscript is no exception |