Schor Lab

VS 220 Course Reader

Perceived Visual Direction

Oculocentric Direction

Under binocular viewing conditions we perceive a single view of the world as though seen by a single cyclopean eye. Singleness results from a mapping of the two visual fields onto a common binocular space. The topography of this map will be described subsequently as the horopter, which is an analytical tool that provides a reference for quantifying retinal image disparity. Stimulation of binocularly corresponding points by targets on the horopter results in percepts by each eye in identical visual directions (i.e. directions in reference to the point of binocular fixation). This eye referenced description of direction (oculocentric) can be transformed to a head referenced description (egocentric direction) by including information about eye position as well as a reference point from which the two eyes can judge direction.

The Cyclopean Eye

If we only had one eye, direction could be judged from the nodal point of the eye, a site where viewing angle in space equals visual angle in the eye, assuming the nodal point is close to the radial center of the retina. However, having two eyes presents a problem for a system that operates as though it has only a single cyclopean eye. The two eyes have viewpoints separated by approximately 6.5 cm. When the two eyes converge accurately on a near target placed along the midsagittal plane, the target appears straight ahead of the nose, even when one eye is occluded. In order for perceived egocentric direction to be the same when either eye views the near target monocularly, there needs to be a common reference point. This reference point is called the cyclopean locus or egocenter, and is located midway on the interocular axis. The location of the egocenter is found empirically as the site where perceptually aligned points at different depths in space are perceived to intersect the face. Thus the egocenter is the percept of a reference point for judging visual direction with either eye alone or under binocular viewing conditions. The validity of the egocenter is supported by sighting behavior in young children (< 2 years of age). When asked to sight targets through a tube, they place it between the eyes (Barbeito, 1983).

Egocentric Direction

Direction and distance can be described in polar coordinates as the angle and magnitude of a vector originating at the egocenter. For targets imaged on corresponding points, this vector is determined by the location of the retinal image and the direction of gaze resulting from versional or conjugate eye position. The angle the two retinal images form with the visual axes is added to the conjugate rotational vector component of binocular eye position (the average of right and left eye position). This combination yields the perceived egocentric direction. Convergence of the eyes, which results from disconjugate eye movements, has no influence on perceived egocentric direction. Thus, when the two eyes fixate near objects to the left or right of the midline in asymmetric convergence, only the version or conjugate component of the two eyes' positions contributes to perceived direction. These facets of egocentric direction were summarized by Hering (1879) as five laws of visual direction and they have been restated by Howard (1982). The laws are mainly concerned with targets imaged on corresponding retinal regions (i.e. targets on the horopter).
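The combination rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule (version plus retinal angle, with vergence ignored); the function names, the sign convention (rightward positive) and the use of degrees are assumptions made here, not part of the original account.

```python
def egocentric_direction(left_eye_deg, right_eye_deg, retinal_offset_deg=0.0):
    """Perceived head-referenced direction of a target on corresponding points.

    left_eye_deg, right_eye_deg: horizontal eye positions, rightward positive
        (sign convention assumed for this sketch).
    retinal_offset_deg: oculocentric direction of the image relative to the
        fovea, taken to be the same in both eyes for corresponding points.
    """
    version = (left_eye_deg + right_eye_deg) / 2.0  # conjugate component
    return version + retinal_offset_deg


def vergence(left_eye_deg, right_eye_deg):
    """Disconjugate component; it does not enter egocentric_direction."""
    return left_eye_deg - right_eye_deg


# Asymmetric convergence: same version, different vergence, same direction.
near = egocentric_direction(15.0, 1.0)   # vergence 14 deg
far = egocentric_direction(12.0, 4.0)    # vergence 8 deg
print(near, far)
```

In this sketch the two fixation states differ only in vergence, so both calls return the same direction, which is the point of the rule: only the average eye position contributes.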

Visual Directions of Disparate Images

How are visual directions judged for disparate targets (i.e. targets located nearer or farther than the horopter)? When target disparity is small and within Panum's fusional area, such that the target appears single, egocentric direction is based upon the average retinal image locus of the two eyes and it deviates from either monocular perceived direction of the disparate target by half the angular disparity. The consequence of averaging monocular visual directions of disparate targets is that binocular visual directions are mislocalized by half their retinal image disparity. Binocular visual directions can only be judged accurately for targets lying on the horopter. When retinal image disparity becomes large, disparate targets appear diplopic (i.e. they are perceived in two separate directions). The directions of monocular components of the diplopic pair are perceived as though each one was stimulated by a target on the horopter (i.e. the diplopic images are seen as though both had paired images on corresponding points in their respective contralateral eye).
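The averaging rule for fused disparate targets can be written compactly. The symbols are introduced here for illustration only: \theta_L and \theta_R are the monocular perceived directions of the target for the left and right eye, \theta_C is the binocular (cyclopean) direction, and \delta is the angular disparity.

```latex
\theta_C = \frac{\theta_L + \theta_R}{2}, \qquad
\delta = \theta_L - \theta_R, \qquad
\theta_L - \theta_C \;=\; \theta_C - \theta_R \;=\; \frac{\delta}{2}
```

For example, a fused target with 12 arc min of disparity would be seen 6 arc min away from each of its monocular directions.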

Visual Direction of Partially Occluded Objects

There are ambiguous circumstances where a target in the peripheral region of a binocular field is only seen by one eye because of occlusion by the nose. The monocular target could lie at a range of viewing distances; however, its direction is judged as though it were at the distance of the horopter, such that if it were seen binocularly, its images would be formed on corresponding retinal points (Barbeito & Simpson, 1991).

Violations of Hering's Laws of Visual Direction

The rules suggested by Hering for computing visual direction apply in many circumstances. However, several violations of Hering's rules for visual direction have been observed in both abnormal and normal binocular vision. In violation of Hering's rules, unilateral-constant strabismics can have constant diplopia, and they use the position of their preferred fixation eye to judge visual direction regardless of whether they fixate a target with their preferred or deviating eye. Alternating strabismics use the position of whichever eye is fixating to judge direction of objects (Mann, Hein, & Diamond, 1979). Both classes of strabismus use the position of only one eye to judge direction whereas non-strabismics use the average position of the two eyes to judge direction by either eye alone.

Three violations have been observed in normal binocular vision: two involve monocular images and the third involves judgment of the direction of binocular-disparate targets. Hering's rules predict that if a target is fixated monocularly, it will appear to move in the temporalward direction if the eyes accommodate, even if monocular fixation remains accurate. The temporalward movement results from the nasalward movement of the covered eye caused by the synkinesis between accommodation and convergence (Müller, 1843). Hering predicts that the average position of the eyes determines the egocentric direction of the foveated target. The first violation occurs when the apparent temporalward motion is greater during monocular fixation by one eye than the other. This violation resembles the perception of egocentric direction in constant-unilateral strabismus and it may be related to an extreme form of eye dominance.

The second violation of Hering's rules occurs when monocular targets are viewed in close proximity to disparate binocular targets, as might occur naturally in the periphery or in the vicinity of a proximal surface that occludes a portion of the background in the central visual field. The direction of the monocular target is judged as though it were positioned at the same depth as the disparate binocular target rather than at the horopter. The visual system might assume there is an occluded counterpart of the monocular line in the contralateral eye that has the same disparity as the nearby binocular target, even though the image is seen only by one eye. The behavioral observation that is consistent with this hypothesis is that alignment of a monocular and binocular line is based on information presented only to the eye seeing both targets (Erkelens & van de Grind, 1994; Erkelens & Van Ee, 1996a; Erkelens & Van Ee, 1996b).

A third violation of Hering's rules is demonstrated by the biasing of the visual direction of a fused disparate target when its monocular image components have unequal contrast (Banks, van Ee, & Backus, 1997). Greater weight is given to the retinal locus of the disparate pair of images that has the higher contrast. The average location in the cyclopean eye of the two disparate retinal sites is biased toward the monocular direction of the higher-contrast image. These are minor violations that mainly occur for targets lying nearer or farther than the plane of fixation or distance of convergence. Since the visual directions of off-horopter targets are mislocalized even when Hering's rules of visual direction are obeyed, the violations have only minor consequences.

Binocular Correspondence

We perceive space with two eyes as though they were merged into a single cyclopean eye. This merger is made possible by a sensory linkage between the two eyes that is facilitated by the anatomical superposition of homologous regions of the two retinae in the visual cortex. This is achieved by partial decussation, which is a characteristic of visual systems with overlapping visual fields. The Newton-Müller-Gudden law states that the degree of hemi-decussation is proportional to the amount of binocular overlap.

Why are the two retinal images matched at all? Primarily the matching allows us to reconstruct a 3-D world percept from a flat 2-D image. Three-dimensional space can be derived geometrically by comparing the small differences between the two retinal images that result from the slightly different vantage points of the two eyes caused by their 6.5 cm separation. Each eye sees slightly more of the temporal than nasal visual field and it also sees more of the ipsilateral side of a binocularly viewed object. This yields stereopsis but comes at the price of reducing the visual field from 360 deg to 190 deg. The binocular overlapping region is 114 deg and the remaining monocular portion is 37 deg for each eye.
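The geometric origin of these small image differences can be illustrated with a short Python sketch. It considers only targets on the midline and defines disparity as the difference between the angle a target subtends at the two eyes and the angle the fixation point subtends; the function names and the sign convention (positive for targets nearer than fixation) are assumptions made for this example.

```python
import math

IPD_M = 0.065  # interocular separation quoted in the text (6.5 cm)


def vergence_angle_deg(distance_m, ipd_m=IPD_M):
    """Angle subtended at a midline point by the two eyes, in degrees."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))


def horizontal_disparity_arcmin(target_m, fixation_m, ipd_m=IPD_M):
    """Disparity of a midline target relative to a midline fixation point.

    Positive values mean the target is nearer than fixation (assumed
    convention for this sketch).
    """
    difference_deg = (vergence_angle_deg(target_m, ipd_m)
                      - vergence_angle_deg(fixation_m, ipd_m))
    return 60.0 * difference_deg


# A target 10 cm in front of a fixation point at 1 m.
print(round(horizontal_disparity_arcmin(0.9, 1.0), 1))
```

With these assumptions a target 10 cm in front of a 1 m fixation point carries a disparity of roughly 25 arc min, and the same depth interval at a larger fixation distance yields a much smaller disparity.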

Binocular Disparity

Binocular disparity results from the projection of 3D objects onto two two-dimensional retinal surfaces that face the objects from slightly different angles and vantage points. The regions of the visual cortex that receive input from each eye are sensitive to various perspective differences or disparities of the two retinal images. These disparities take the form of horizontal, vertical, torsional and distortion or shear differences between the two images. The disparities result from surface shape and depth as well as the direction and distance of gaze, and the torsion of the eyes (van Ee and Erkelens 1996). These disparities are used to judge the layout of 3D space and to sense the solidness or curvature of surfaces as well as to break through camouflage in images such as seen in foliage.

Description and quantification of binocular disparity requires a coordinate system. The choice is primarily a matter of convenience, since many types of coordinate systems could accomplish this task and we are uncertain which system the visual system uses to encode disparity. The coordinate system requires a reference point from which to describe distance and direction. Since we are describing disparities of the two retinal images, the coordinate system typically chosen is retina based rather than head or world based. The reference point is the fovea, and distance from the fovea is traditionally described in Cartesian X and Y components of azimuth and elevation, but a polar description could be, and has been, used (Liu, Stevenson and Schor, 1994a).

Since retinal locations are described by the position of targets in space that are imaged on them, a transformation is needed to link retinal and visual space. The optical transformation is described above by visual direction. In computations of visual direction, retinal images are projected or sighted out through the nodal point of the eye, so that directions from objects in space to their images on the retina do not deviate from straight lines. As long as the eyes remain stationary, differences in visual directions correspond to differences in retinal locations.

Corresponding Retinal Points

Hering (1879) defined binocular correspondence by retinal locations in the two eyes, which when stimulated, resulted in a percept in identical visual directions. For a fixed angle of convergence, some of these identical visual directions converged upon real points in space. In other cases, corresponding points have visual directions that do not intersect in real space. Accordingly, some corresponding regions of the two retinas might only be stimulated by real objects in space under limited circumstances. We shall see that this only occurs for infinite viewing distances and that at finite viewing distances, only a small portion of corresponding points can be stimulated by real targets in space.

The Horizontal Horopter

The horopter is the locus in space of real objects or points whose images can be formed on corresponding retinal points. To appreciate the shape of the horopter, consider a theoretical case in which corresponding points are defined as homologous locations on the two retinae. Begin by first considering binocular matches between the horizontal meridians or equators of the two retinae. Under this circumstance, corresponding retinal loci lie equidistant from their respective foveas and the intersection of their visual directions in space defines the longitudinal horopter.

A geometric theorem states that "any two points on a circle subtend equal angles at any other two points on the same circle". Consider a circle that passes through the fixation point and the two nodal points of the eyes. Let two points be the two nodal points and let two other points be the fixation point and any other point on the circle. The theorem predicts that angles formed by the two nodal points and the other two points in space are equal. Since the angles pass through the nodal points they are also equal in the two eyes. One of the points is imaged on the two foveas, and the other point will be imaged at retinal loci that are equidistant from their respective foveas. By definition, these non-foveal points are geometrically corresponding points. From this you can generalize that any point on this circle will be imaged at equal eccentricities from the two foveas on corresponding points in the two eyes, except for the small arc of the circle that lies between the two eyes. This is the theoretical or geometric horopter. It was described by Alhazen, Aguilonius (1613) and finally by Vieth and Muller (1818) and it bears their names (the Vieth-Muller (V-M) circle) (Fig 1).
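The equal-angle property can be checked numerically. The sketch below places the nodal points on the x axis at plus and minus half the interocular distance, puts the fixation point on the midline, and compares the signed angle between the fixation point and a second point on the circle as seen from each eye. The coordinate layout, the parameter t (angle around the circle measured from the fixation point) and the function names are assumptions made for this illustration.

```python
import math


def vieth_muller_circle(fixation_m, half_ipd_m=0.0325):
    """Centre height and radius of the circle through both nodal points
    (at x = -a and x = +a, y = 0) and a midline fixation point (0, D)."""
    centre_y = (fixation_m ** 2 - half_ipd_m ** 2) / (2.0 * fixation_m)
    radius = fixation_m - centre_y
    return centre_y, radius


def eccentricity_deg(eye_x, point, fixation):
    """Signed angle at one eye between the fixation point and another point."""
    def direction(p):
        # Direction from the eye, measured from straight ahead (the y axis).
        return math.atan2(p[0] - eye_x, p[1])
    return math.degrees(direction(point) - direction(fixation))


def eccentricities_on_circle(t_deg, fixation_m=1.0, half_ipd_m=0.0325):
    """Left- and right-eye eccentricities of the circle point at parameter t."""
    centre_y, radius = vieth_muller_circle(fixation_m, half_ipd_m)
    t = math.radians(t_deg)
    point = (radius * math.sin(t), centre_y + radius * math.cos(t))
    fixation = (0.0, fixation_m)
    left = eccentricity_deg(-half_ipd_m, point, fixation)
    right = eccentricity_deg(+half_ipd_m, point, fixation)
    return left, right


for t_deg in (-100, -30, 10, 45, 90):
    left, right = eccentricities_on_circle(t_deg)
    print(t_deg, round(left, 6), round(right, 6))
```

For points on the circle the two eccentricities agree to within rounding error, whereas moving the point off the circle (for example by scaling the radius used to place it) would make them differ, which is one way to generate non-zero disparities for testing.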

The empirical horopter differs from the theoretical horopter in two ways. It can be skewed or tilted about a vertical axis as shown in Figure 2, and its curvature can be flatter or steeper than the V-M circle as shown in Figure 3. These two effects are described by fitting to the empirical data a conic section, such as an ellipse, that passes through the fixation point and the nodal points of the eyes. The curvature variation of the best-fit conic section from a circle at the fixation point is referred to as the Hering-Hillebrand deviation, and the skew or tilt is described by an overall magnification of an array of points along the retinal equator in one eye that correspond to an array of points along the equator of the other eye. These deviations cause parts of the empirical horopter to lie either distal or proximal to the theoretical horopter. When this occurs, the points on the horopter no longer subtend equal angles at the two eyes. The spatial plot of the horopter shown in Figure 2 illustrates that points on the empirical horopter that are closer than the theoretical horopter subtend a smaller angle in the ipsilateral than contralateral eye. When this occurs, empirically measured corresponding points are not equidistant from their respective foveas.

The horopter can also be represented analytically by a plot of the ratio of longitudinal angles subtended by empirical horopter points (Right/Left) on the Y axis as a function of retinal eccentricity of the image in the right eye (Fig 4). Changes in curvature from a circle to an ellipse result from non-uniform magnification of retinal points in one eye. If the empirical horopter is flatter than the theoretical horopter, corresponding retinal points are more distant from the fovea on the nasal than temporal hemi-retina (see Fig 3). A tilt of the horopter around a vertical axis results from a uniformly closer spacing of corresponding points to the fovea in one eye than the other eye (uniform magnification effect) (see Fig 2).

The Vertical Horopter

The theoretical vertical point-horopter for a finite viewing distance is limited by the locus of points in space where homologous visual directions will intersect real objects and is described by a vertical line that passes through the fixation point in the mid-sagittal plane (Fig 5). Eccentric points in tertiary gaze (points with both azimuth and elevation) lie closer to one eye than the other, and because they are imaged at different vertical eccentricities from the two foveas they cannot be imaged on theoretically corresponding retinal points. However, all points at an infinite viewing distance can be imaged on homologous retinal regions and at this viewing distance the vertical horopter becomes a plane.
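The unequal vertical eccentricities of tertiary points can be illustrated numerically. The sketch below assumes a Fick-style definition of elevation (the angle of the point above the horizontal plane through the eye), which is unaffected by horizontal rotation of the eye, so convergence can be ignored; other definitions of elevation would give different magnitudes. The axes (x rightward, y upward, z forward, in meters) and the function names are choices made for this example.

```python
import math


def elevation_deg(point, eye_x):
    """Fick-style elevation: angle of the point above the horizontal plane
    through an eye located at (eye_x, 0, 0)."""
    x, y, z = point
    horizontal_range = math.hypot(x - eye_x, z)
    return math.degrees(math.atan2(y, horizontal_range))


def vertical_disparity_arcmin(point, half_ipd_m=0.0325):
    """Right-eye minus left-eye elevation of a point, in arc min."""
    left = elevation_deg(point, -half_ipd_m)
    right = elevation_deg(point, +half_ipd_m)
    return 60.0 * (right - left)


print(vertical_disparity_arcmin((0.0, 0.2, 0.4)))      # midsagittal plane
print(vertical_disparity_arcmin((0.2, 0.2, 0.4)))      # near, up and to the right
print(vertical_disparity_arcmin((200.0, 200.0, 400.0)))  # same direction, far away
```

Under these assumptions the midsagittal point has no vertical disparity, the near tertiary point has a disparity of more than a degree, and the same direction at a very large distance yields a disparity that approaches zero, in line with the description above.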

The empirical vertical horopter is declinated in comparison to the theoretical horopter (Fig 6). Helmholtz (1909) reasoned that this was because of a horizontal shear of the two retinal images, which causes a real vertical plane to appear inclinated or a real horizontal plane such as the ground to lie close to the empirical vertical horopter.

Coordinate Systems for Binocular Disparity

All coordinate systems for binocular disparity use the same theoretical horopter as a reference for zero retinal image disparity. Targets not lying on the horopter subtend non-zero disparities and the magnitudes of these non-zero disparities depend on the coordinate system used to describe them. Howard and Rogers (1996) describe 5 coordinate systems in their Table 2.1 and these correspond to the oculomotor coordinate systems of Polar, Helmholtz, Fick, Harms and Hess (Schor et al., 1994). Howard and Rogers and most vision scientists use the Harms system. The different coordinate systems will quantify vertical and horizontal disparity equally for targets at optical infinity since all disparities equal zero at this distance, but they will differ for finite viewing distances. This is because the visual axes and retinal projection screens are not parallel or coplanar at near viewing distances associated with convergence of the two eyes. Each of the 5 coordinate systems may be suitable for only one aspect of vision and apparently none of them is adequate to describe binocular disparity as processed by the visual system.

Corresponding points can be described in terms of Cartesian coordinates of constant azimuth and elevation. Contours in space that appear at identical constant horizontal eccentricities or constant heights to the two eyes are, by Hering's definition of identical visual directions, imaged on binocular corresponding points. The theoretical retinal regions that correspond to constant azimuth and elevation can be described with epipolar geometry. Retinal division lines are retinal loci that give rise to the sense of constant elevation or azimuth. Regions in space that lie along the perspective projection of these retinal division lines are derived with transformations from the spherical retinal space to isopters in a planar Euclidean space. Isopters are contours on a tangent screen that appear to be straight and horizontal (or vertical). The relationship between the retinal division lines and isopters is a perspective projection through a projection center. The choice of retinal division lines (major or minor circles) and projection center (nodal point or bulbcenter) determines the coordinate system (Figure 7) (Tschermak-Seysenegg, 1952). The theoretical case is simplified by assuming the projection point and nodal point of the eye both lie at the center of curvature of the retinal sphere (bulbcenter). As shown in Figure 7, there are several families of theoretical retinal division lines. There are great circles, such as in the Harms system, which resemble lines of longitude on the globe (Schor et al., 1994). Their projections all pass through the bulbcenter as planes and form straight line isopters on a tangent screen. There could also be a family of minor circles, such as in the Hess system, which resemble lines of latitude on the globe (Schor et al., 1994). Their projections also pass through the bulbcenter as conic sections and form parabolic isopters on a tangent screen. None of these necessarily represents the set of retinal division lines used by the visual system because retinal division lines, by definition, must produce a sense of equal elevation or equal azimuth. As will be discussed below, empirical measures of iso-elevation indicate that the projection point actually lies in front of the nodal point, such that it produces barrel distortions of the visual field (Liu and Schor, 1997).

The tangent plane projections represent a map of constant or iso-elevation and iso-azimuth contours. The only viewing distance at which projection maps of the two eyes can be superimposed is at optical infinity. At finite viewing distances, the eyes converge and the retinal projection screens are no longer parallel or coplanar such that a single tangent plane will make different angles with the axes of the two eyes. Because the coordinate systems describing binocular disparity are retinal based, their origins (the lines of sight) move with the eyes so that they project relative to the direction of the visual axis. In eye movements, the reference systems are head-centric at the primary position of gaze so that the coordinate systems don't change during convergence. The consequence of having a retinal based coordinate system is that the projected tangent planes of the two eyes are not coplanar in convergence. This problem is solved by mapping both eyes onto a common tangent plane that is parallel to the face (fronto-parallel plane) which results in trapezoidal distortion of the projected isopters (Liu and Schor, 1997).

When the fronto-parallel maps of the two eyes' isopters are superimposed, they only match along the primary vertical and horizontal meridians and they are disparate everywhere else. Consequently, the theoretical horopter is not a surface but rather is made up of a horizontal (Vieth-Muller) circle and a vertical line in the midsagittal plane (Figure 5). All coordinate systems will predict the same pure horizontal or vertical disparity along these primary meridians but they will predict different magnitudes of combined vertical and horizontal disparities in tertiary field locations.

The separations of the two eyes' iso-elevation and iso-azimuth isopters describe vertical and horizontal misalignment of visual directions originating from corresponding retinal division lines (binocular discrepancy). Binocular discrepancy is distinguished from binocular disparity, which refers to the misalignment from corresponding points of retinal images of any object in space. Binocular discrepancy provides the objective reference points of each eye in the fronto-parallel plane for zero binocular disparity (Liu and Schor, 1997). Binocular discrepancy and disparity are related since the discrepancy map also describes the pattern of binocular disparities subtended by points lying in the fronto-parallel plane.

Monocular Spatial Distortions and the Empirical Binocular Disparity Map

The mismatch of the projection fields of iso-elevation and iso-azimuth isopters at near viewing distances results in a binocular discrepancy map or field on the fronto-parallel plane. An empirically derived map needs to take into account the horizontal shear of the two monocular image spaces, described by Helmholtz, as well as other monocular spatial distortions. For example, horizontal contours that are viewed eccentrically above or below the point of fixation are seen as bowed or convex relative to the primary horizontal meridian (barrel distortion). Lines must be adjusted to a pincushion distortion to appear straight (Helmholtz, 1909). The retinal division lines corresponding to the pincushion adjustments needed to make lines appear at a constant or iso-elevation resemble the minor circles of the Hess coordinate system (Liu and Schor, 1997). These retinal division lines have been used to predict an empirical binocular discrepancy map for finite viewing distances.

The discrepancy map is derived from the combination of horizontal shear with empirically measured iso-elevation and iso-azimuth retinal division lines. Their projection from the two eyes onto isopters in a common fronto-parallel plane centered along the midsagittal axis produces a complex pattern of vertical discrepancies which are greater in the lower than upper visual field as is illustrated in Figure 8 (Liu and Schor, 1997). Because the pincushion distortions vary dramatically between observers (Liu and Schor, 1997), it is impossible to generalize these results and predict the vertical discrepancy map for individuals and the magnitude of vertical disparities that are quantified relative to this map.

While the general pattern of the binocular discrepancy distribution is the direct consequence of the basic monocular distortions (horizontal shear and vertical trapezoid) and therefore should be valid for most cases, the quantitative details of the map may vary among observers because different observers have different amounts of monocular pincushion distortion (Liu & Schor, 1997). It is important to recognize the idiosyncrasy of binocular correspondence when designing experiments that involve manipulating large disparities in the periphery. Several theoretical models suggest that vertical disparity may carry information that is necessary for scaling stereoscopic depth (Mayhew & Longuet-Higgins, 1982; Gillam & Lawergren, 1983). Because vertical disparity is much smaller in magnitude than horizontal disparity, empirical verification of these theoretical proposals usually involves manipulating vertical disparities in the far periphery (Cumming, Johnston & Parker, 1991; Rogers and Bradshaw, 1993). In such studies, accurate knowledge about the status of vertical correspondence across the visual field becomes critical because the discrepancy between geometric stimulus correspondence and empirical retinal correspondence may be large enough in the periphery to actually affect the amount of vertical disparity delivered by the stimulus.

Binocular Sensory Fusion

Panum's Fusional Areas

Binocular correspondence is described above as though there was an exact point to point linkage between the two retinal images. However Panum (1858) observed that corresponding points on the two retinae are not points at all, but regions. He reported that in the vicinity of the fovea, the fusional system would accept two disparate contours as corresponding if they fell within a radius of 0.026 mm of corresponding retinal loci. This radius corresponds to a visual angle of approximately 1/4 degree. The consequence of Panum's area is that there is a range of disparities that yield binocular singleness. The horizontal extent of Panum's area gives the horopter a thickness in space (Figure 9). The vertical extent also contributes to single binocular vision when targets are viewed eccentrically in tertiary directions of gaze. Spatial geometry dictates that because these tertiary targets are closer to one eye than the other, they form unequal image sizes in the two eyes which subtend different heights and accordingly vertical disparities. Binocular sensory fusion of these targets is made possible by Panum's areas.

Panum's area functions as a buffer zone to eliminate diplopia for small disparities near the horopter. Stereopsis could exist without singleness but the double images near the fixation plane would be a distraction. The depth of focus of the human eye serves a similar function. Objects that are nearly conjugate to the retina appear as clear as objects focused precisely on the retina. The buffer for the optics of the eye is much larger than the buffer for binocular fusion. The depth of focus of the eye is approximately 0.75 diopters. Panum's area, expressed in equivalent units, is only 0.08 meter angles, or approximately one tenth the magnitude of the depth of focus. Thus we are more tolerant of focus errors than we are of convergence errors.
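The comparison in equivalent units can be reproduced approximately with a short calculation. The sketch assumes the usual definition of the meter angle (the convergence needed to fixate a midline target at 1 m), a 6.5 cm interocular distance, and the 1/4 degree foveal radius of Panum's area given earlier; whether the figure in the text refers to the radius or the full extent is not stated, so the result should be read as an order-of-magnitude check only. The function names are hypothetical.

```python
import math


def meter_angle_deg(ipd_m=0.065):
    """Convergence angle in degrees for a midline target at 1 m
    (one meter angle for the assumed interocular distance)."""
    return math.degrees(2.0 * math.atan(ipd_m / 2.0))


def disparity_in_meter_angles(disparity_deg, ipd_m=0.065):
    """Express an angular disparity in meter angles (small-angle scaling)."""
    return disparity_deg / meter_angle_deg(ipd_m)


panum_ma = disparity_in_meter_angles(0.25)   # 1/4 degree radius from the text
depth_of_focus_d = 0.75                      # diopters, from the text
print(round(panum_ma, 3), round(panum_ma / depth_of_focus_d, 2))
```

With these assumptions the fusional limit comes out well below a tenth of a meter angle, on the order of one tenth of the 0.75 diopter depth of focus.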

Both accommodation and stereopsis can discriminate differences in distance within their respective buffer zones. Accommodation can respond to defocus that is below our threshold for blur detection (Kotulak & Schor, 1986), and the eyes can sense small depth intervals (< 10 arc sec) and converge in response to disparities (3 arc min) (Riggs & Niehl, 1960) that are smaller than Panum's area. Thus Panum's area is not a threshold or limit to depth sensitivity, nor is it limited by the minimal noise or instability of the binocular vergence system. However, it does allow the persistence of single binocular vision in the presence of constant changes in retinal image disparity caused by various oculomotor disturbances. For example, considerable errors of binocular alignment (> 15 arc min) may occur during eye tracking of dynamic depth produced either by object motion or by head and body movements (Steinman & Collewijn, 1980).


The comparison between the depth of focus and Panum's area suggests that single binocular vision is simply a threshold or resolution limit of visual direction discrimination (Le Grand, 1953). While this is partially true, as will be described below, Panum's area also provides a combined percept in 3-D space that can be different in shape from either of its monocular components. The averaging of monocular shapes and directions is termed allelotropia. For example, it is possible to fuse two horizontal lines curved in opposite directions and perceive a straight line binocularly. There are limits to how dissimilar the two monocular images can be to support fusion, and when these limits are exceeded, binocular rivalry suppression occurs in which only one eye perceives in a given visual direction at one time. Clearly fusion is not a simple summation process or a suppression process (Blake & Camisa, 1978) or one of rapid alternate viewing as proposed by Verhoeff (1935). The combination of the two retinal images follows many of the rules of binocular visual direction as described by Hering (Ono, 1979).

Spatial Constraints

The range of singleness, or size of Panum's area, and its shape are not constant. It is not a static zone of fixed dimension; it varies with a wide variety of parameters. The classical description of the fusional area, as described by Panum, is an ellipse with the long axis in the horizontal meridian (Ogle & Prangen, 1953; Panum, 1858). The horizontal radius is 1/4 degree while the vertical radius is 1/20 degree. The elliptical shape indicates that we have a greater tolerance for horizontal than vertical disparities. Perhaps this is because vergence fluctuations associated with accommodation are mainly horizontal, and because the range of horizontal disparities presented in a natural environment is far greater than that of vertical disparities.

Spatial Frequency

Both the shape and size of Panum's area vary. Panum's area increases in size as the spatial frequency of the fusion target decreases (Schor, Wood, & Ogawa, 1984a) (Figure 10). The horizontal extent increases as spatial frequency falls below 2.5 cpd. Panum's fusional area, centered about the fovea, ranges from 10 to 400 arc min as spatial frequency is decreased to 0.075 cpd. At these lower spatial frequencies, the fusion range approaches the upper disparity limit for stereopsis. The variation of the horizontal fusion range may be interpreted as two sub-components of the fusion mechanism which process the position and phase of disparity. The fusion limit is determined by the least sensitive of the two forms of disparity. The fusion range at high spatial frequencies can be attributed to a 10 arc min positional limit and the range at low spatial frequencies can be attributed to a 90 degree phase disparity. At high spatial frequencies, the 90 degree phase limit corresponds to a smaller angle than the 10 arc min positional disparity, and consequently fusion is limited by the larger 10 arc min positional disparity. At low spatial frequencies, the 10 arc min positional disparity is smaller than the 90 degree phase disparity and consequently the fusion range rises at a rate fixed by the constant 90 degree phase limit (Schor, et al., 1984a). More recently, DeAngelis et al. have proposed a physiological analogue of this model that is supported by their observations of phase and position encoding disparity processing units in cat striate cortex (DeAngelis, Ohzawa, & Freeman, 1995).
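The two-component interpretation can be sketched as taking the larger of a fixed positional limit and a fixed phase limit converted to visual angle. The function below is a minimal illustration of that rule using the 10 arc min and 90 degree values quoted above; its name and parameters are hypothetical, and it is not fitted to the measured fusion ranges, so the exact values and the crossover frequency it produces need not match the data.

```python
def fusion_limit_arcmin(spatial_freq_cpd,
                        position_limit_arcmin=10.0,
                        phase_limit_deg=90.0):
    """Fusion limit under a position-or-phase rule (illustrative only).

    The phase limit is a fixed fraction of the stimulus period, so it grows
    as spatial frequency falls; the positional limit is constant. The less
    sensitive (larger) of the two sets the fusion limit.
    """
    period_arcmin = 60.0 / spatial_freq_cpd
    phase_limit_arcmin = period_arcmin * phase_limit_deg / 360.0
    return max(position_limit_arcmin, phase_limit_arcmin)


for cpd in (10.0, 2.5, 0.6, 0.075):
    print(cpd, fusion_limit_arcmin(cpd))
```

In this sketch the limit stays at the positional value for high spatial frequencies and rises in proportion to the stimulus period once a quarter of the period exceeds 10 arc min, which reproduces the qualitative shape described above.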

The shape of Panum's area also changes when spatial frequency is decreased to 2.5 cpd. Panum's area changes from an elliptical shape at high frequencies to a circular shape at frequencies lower than 2.5 cpd. This is because the vertical dimension continues to decrease as spatial frequency is increased above 2.5 cpd but the horizontal dimension remains constant at higher spatial frequencies. Interestingly, vertical disparity limits for fusion only follow the phase limit. Their dimension is not limited at high spatial frequencies by a constant positional disparity.

Retinal Eccentricity

Panum's area also increases with retinal eccentricity of fusion stimuli (Crone & Leuridan, 1973; Hampton & Kertesz, 1983; Mitchell, 1966; Ogle, 1952) . The increase in the fusion range is approximately 7% of the retinal eccentricity. These measures of fusion range have been made with broad band spatial frequency stimuli. When fusion ranges are measured with narrow band spatial frequency stimuli, such as the difference of Gaussian, (Schor, Wesson, & Robertson, 1986; Wilson, Blake, & Pokorny, 1988) , the range of fusion does not change with retinal eccentricity. Fusion ranges remain small with spatial frequencies above 2.5 cpd as retinal eccentricity increases, as long as the fusion stimulus can be resolved. Eventually, when visual resolution decreases below 2.5 cpd (at 10 degrees retinal eccentricity) lower spatial frequencies than 2.5 cpd must be used to stimulate fusion and Panum's area increases. However the fusion range is still the same as it would be at the fovea when measured with the same low spatial frequency. When measured with a broad band stimulus, the highest resolvable spatial frequency will limit the fusion range (Schor, Heckman, & Tyler, 1989) . Higher spatial frequency components are processed from the broad band stimulus when imaged in the central retina and the sensory fusion range will begin to increase when the peripheral retina is not able to resolve frequencies above 2.5 cpd and fusion is limited by the remaining lower spatial frequencies. These results and other studies of the independence of fusion limits and image contrast and luminance (Mitchell, 1966; Schor, et al., 1989; Siegel & Duncan, 1960) suggest that binocular fusion is based on information in independent spatial-frequency channels rather than on the overall luminance distribution of a broad-band stimulus.

Disparity Gradient Limits

Fusion ranges are also reduced by the presence of other nearby stimuli that subtend different disparities (Braddick, 1979; Helmholtz, 1909; Schor & Tyler, 1981). As a rule of thumb, two stimuli of unequal disparity cannot be fused simultaneously when their disparity difference is greater than their separation (Burt & Julesz, 1980). This disparity gradient limit, defined as the ratio of disparity difference to separation, is 1.0. For example, the two dot pairs of unequal disparity shown in Fig 11 can be fused as long as their vertical separation is greater than their disparity difference. The interaction between nearby targets is also influenced by their spatial frequency content. The disparity gradient limit is very strong when a small, high spatial frequency, crossed disparity stimulus is presented adjacent to a slightly lower spatial frequency background (2 octaves lower) subtending zero disparity. However, a much lower spatial frequency background (4 octaves lower) or a higher spatial frequency background has less or no influence on the fusion range with the same foreground stimulus (Scheidt & Kertesz, 1993; Wilson, Blake, & Halpern, 1991). This coarse to fine limit demonstrates how low frequency stimuli can constrain the matches of higher spatial frequency stimuli. This can be beneficial in large textured surfaces that contain coarse and fine spatial frequency information. The coarse features have less ambiguous matches than the high frequency information, and the disparity gradient limit helps to bias matches in ambiguous stimuli such as tree foliage to solve for smooth surfaces rather than irregular depth planes.
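The rule of thumb can be written as a short check. This is a minimal sketch: the function names are hypothetical, disparities and separation are assumed to be in the same angular units (for example arc min), and the limit of 1.0 is the value quoted above.

```python
def disparity_gradient(disparity_a, disparity_b, separation):
    """Ratio of the disparity difference to the angular separation of two targets."""
    if separation <= 0:
        raise ValueError("separation must be positive")
    return abs(disparity_a - disparity_b) / separation


def can_fuse_both(disparity_a, disparity_b, separation, gradient_limit=1.0):
    """True when the gradient is below the limit, so both targets can be fused at once."""
    return disparity_gradient(disparity_a, disparity_b, separation) < gradient_limit


print(can_fuse_both(0.0, 10.0, separation=20.0))  # widely separated targets
print(can_fuse_both(0.0, 10.0, separation=5.0))   # crowded targets
```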

Temporal Constraints.

The size and dimensions of Panum's area also depend upon the exposure duration and the velocity at which disparity increases. The horizontal radius of Panum's fusional area (PFA) increases from 2 to 4 arc min as the exposure of pulsed disparities increases from 5 to 100 msec and remains constant for longer durations (Woo, 1974). The horizontal dimension of Panum's area also increases as the velocity of slow continuous variations of disparity decreases, while the vertical dimension is unaffected by disparity velocity (Schor & Tyler, 1981). Thus at low velocities (2 arc min/sec), Panum's area extends beyond the static disparity limit to 20 arc min horizontally and has an elliptical shape. At higher velocities (> 10 arc min/sec) the horizontal dimension shrinks to equal the size of the vertical dimension (8 arc min) and the area has a circular shape. This velocity dependence of the fusion range may contribute to the small hysteresis of fusion, in which the amplitude of PFA is larger when measured with slowly increasing disparities than with decreasing large disparities (Erkelens, 1988; Fender & Julesz, 1967; Piantanida, 1986).

Color Fusion.

When narrow band green (530 nm) and red (680 nm) lights are viewed dichoptically in a small field, a binocular yellow percept occurs (Prentice, 1948). Color fusion is facilitated by small fields with textured patches, low luminance, desaturated colors, and flicker. Dichoptic color fusion or mixture suggests that a cortical process is involved (Hovis, 1989).

Encoding Disparity: The Matching Problem

We derive stereoscopic depth from small differences or disparities in the two retinal images that arise from the two separate vantage points of our eyes caused by their lateral separation. Disparities of the two retinal images could be analyzed in various ways. In a local analysis, individual perceived forms or images could be compared to derive their disparity and sense their depth. In this analysis, form perception would precede depth perception (Helmholtz, 1909). In a global analysis, luminance properties of the scene, such as texture and other token elements, could be analyzed to code a disparity map from which depth was perceived. The resulting depth map would yield perceptions of form. In this analysis, depth perception would precede form perception. In the local analysis it is clear which monocular components of the perceived binocular images are to be compared because of their uniqueness. The global analysis is much more difficult since many similar texture elements, such as exist in foliage on a tree, must be matched and the correct pairs of images to match are not obvious. An example of our prowess at accomplishing this feat is the perception of depth in the autostereogram shown in Fig 12. Free fusion of the two random dot patterns yields the percept of a checkerboard. There are thousands of similar texture elements (dots) yet we can correctly match them to derive the disparity map necessary to see a unique form in depth. An important question is how does the visual system identify corresponding monocular features? This problem is illustrated in Fig 13 which shows a schematic of four dots imaged on the two retinae. This pair of retinal images could arise from many depth-dot patterns depending on which dots were paired in the disparity analysis. The number of possible depth patterns that could be analyzed is N! where N is the number of dots. 
Thus for 10 dots there are 3,628,800 possible depth patterns that could be yielded from the same pair of retinal images. How does the visual system go about selecting one of these many possible solutions? Clearly the problem must be constrained or simplified by limiting the possible number of matches. This is done by restricting matches to certain features of texture elements (types of primitives or tokens) and by prioritizing certain matching solutions over others. In addition, there is certainly an interaction between the local and global analyses which simplifies the process. Local features are often visible within textured fields. For example, there are clumps of leaves in foliage patterns that are clearly visible prior to sensing depth. Vergence eye movements may be cued to align these unique monocular patterns on or near corresponding retinal points and thereby reduce the overall range of disparity subtended by the stimulus (Marr & Poggio, 1976). Once this has been done, the global system can begin to match tokens based upon certain attributes and priority solutions.
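The N! count can be checked directly. The sketch below assumes, as the text does, that each dot in one eye is paired with exactly one dot in the other eye, so every pairing is a permutation; the function name is hypothetical.

```python
import math
from itertools import permutations


def possible_pairings(n_dots):
    """Number of one-to-one pairings of n dots in one eye with n dots in the other."""
    return math.factorial(n_dots)


# Enumerate the pairings explicitly for the four-dot case.
four_dot_pairings = list(permutations(range(4)))

print(len(four_dot_pairings))
print(possible_pairings(10))
```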

Classes of Matchable Tokens:

There are many possible image qualities that are easily seen under monocular conditions, such as size, color, orientation, brightness and contrast. These attributes are processed early in the visual system by low-level filters, and they could be used to pair binocular images. Marr proposed that the most useful tokens would be invariant, i.e. reliable, under variable lighting conditions and that perhaps the visual system had prioritized these invariant features. For example, contrast is a more reliable feature than brightness in the presence of variations in lighting conditions caused by shadows, view point of each eye, and variable intensity of the light source. Zero-crossings, or the points of maximum rate of change in the luminance distribution, would be loci in the retinal image that would not change with light level. Similarly, color and the locations of uniquely oriented line segments or contours would be other stable features that vary only slightly with vantage point. Points of peak contrast and patterns of contrast variation are other local cues that could also be used to match binocular images (Frisby & Mayhew, 1978; Hess & Wilcox, 1994).
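The invariance of zero-crossings to light level can be illustrated with a one-dimensional luminance edge. This is a rough sketch under stated assumptions: the edge is a hypothetical sigmoid profile, the zero-crossing is taken as the sign change of the discrete second difference (the point of maximum rate of change), and no spatial filtering is applied first.

```python
import math


def second_difference(profile):
    """Discrete second derivative; element k corresponds to sample k + 1 of the profile."""
    return [profile[i - 1] - 2 * profile[i] + profile[i + 1]
            for i in range(1, len(profile) - 1)]


def zero_crossings(profile):
    """Sample indices i such that the second difference changes sign between i and i + 1."""
    d2 = second_difference(profile)
    return [k + 1 for k in range(len(d2) - 1) if d2[k] * d2[k + 1] < 0]


# Hypothetical blurred edge centred between samples 10 and 11.
edge = [1.0 / (1.0 + math.exp(-(i - 10.5) / 2.0)) for i in range(22)]
bright = [100.0 * v for v in edge]   # same edge under strong illumination
dim = [5.0 * v for v in edge]        # same edge under weak illumination

print(zero_crossings(bright), zero_crossings(dim))
```

Scaling the luminance changes the brightness at every sample but leaves the location of the zero-crossing where it was.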

When tested individually, none of these local tokens has been found to provide as much information as theoretically possible. Zero crossings predict better performance on stereo tasks at high spatial frequencies (>2 cpd) than is observed empirically (Schor, Wood, & Ogawa, 1984b). Contour orientation contributes to matching when the line segments are longer than 3 arc min (Mitchell and O'Hagen, 1972). Contrast polarity is a very important token for the matching process. As shown in Figure 14, sustained stereoscopic depth is impossible in patterns containing coarse detail of opposite contrast polarity (Krol and van de Grind, 1983). However, stereoscopic depth can be seen in line drawings of opposite contrast, principally as a result of misalignment of convergence to bring like contrast edges into alignment. Similarly, contrast variation within a patch can be used in matching to perform stereo-tasks near the upper disparity limits. Studies using Gabor patches show that the upper disparity limit for stereopsis is not limited by carrier spatial frequency; rather, it increases with the size of the envelope or Gabor patch (Hess & Wilcox, 1994). These results could be attributed to either luminance or contrast coding of disparity.

The upper disparity limit could increase with the size of first order binocular receptive fields that encode luminance, or it could increase with the size of second order binocular receptive fields that encode contrast. In the former case, stereopsis would require that similar spatial frequencies be presented to the two eyes, whereas in the latter case stereopsis would occur with very different spatial frequencies presented to the two eyes as long as the envelope in which they were presented was similar. Second order or contrast coding requires that information be rectified such that contrast variations could be represented by changes in neural activity, and this information could be used for binocular matching. Color has been investigated with stereo-performance using isoluminant patterns and found to support stereopsis only with coarse detail (de Weert and Sadza, 1983). However, color can disambiguate binocular matches in conditions such as depth transparency when combined with other cues (Jordan et al., 1990). Clearly, the visual system does not simply match one class of tokens, and redundant information present in several classes of tokens improves binocular matching performance.

Matching Constraints:

Because we are able to perceive form in textured scenes such as tree foliage by virtue of stereo-depth, the binocular matching process must be constrained to simplify the task of choosing among the enormous number of possible solutions to the matching problem. When matching targets in the fixation plane, which is parallel to the face, two rules can completely specify correct binocular matches. The nearest-neighbor rule specifies that matches are made between tokens that subtend the smallest disparity (a bias for percepts along the horopter), and the unique-match rule specifies that each individual token may only be used for a single match. Once a feature is matched it cannot do double duty and be matched to other features as well. Without this restriction the number of possible matches in a random dot field would greatly exceed N!. The nearest-neighbor rule is demonstrated in the double nail illusion (Fig 15). Two nails are placed in the midsagittal plane and convergence is adjusted to an intermediate distance between them. There are 2 possible matches, one corresponding to the true midsagittal depth and one corresponding to two nails located at the same depth in the fixation plane to the left and right of the point of convergence. The former solution involves two different disparities, and the latter solution involves two very small or zero disparities. The visual system chooses the latter solution and the two nails appear side by side even though they are really at different depths in the midsagittal plane. The unique-match rule appears to be violated by the phenomenon known as Panum's limiting case, which can be demonstrated by aligning two nails along one eye's line of sight (Fig 16) such that a single image is formed on one retina and two images corresponding to the two nails are imaged on the other retina.
Hering believed that the vivid appearance of depth of the two nails resulted from multiple matches of the single image in the aligned eye with the two images in the unaligned eye. However, recently Nakayama and Shimojo (1992) have accounted for the illusion as the result of a monocular or partial occlusion cue rather than a violation of the unique match rule. These two matching rules can be applied over a wide range of viewing distances when accompanied by convergence eye movements that can bring depth planes at any viewing distance into close proximity with the horopter.
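The two rules can be illustrated with a toy matcher applied to an idealized double nail configuration. This is a sketch under stated assumptions, not a model of the visual system: uniqueness is enforced by considering only one-to-one pairings, and the nearest-neighbor rule is operationalized (hypothetically) as choosing the pairing with the smallest total absolute disparity. Positions are horizontal retinal locations in arc min, and the two nails are assumed to lie symmetrically about the fixation point.

```python
from itertools import permutations


def nearest_neighbor_unique_match(left_positions, right_positions):
    """Return (pairing, disparities) for the one-to-one pairing with the
    smallest summed absolute disparity. pairing[i] is the right-eye index
    matched to left-eye token i."""
    best = None
    for perm in permutations(range(len(right_positions))):
        disparities = [right_positions[j] - left_positions[i]
                       for i, j in enumerate(perm)]
        cost = sum(abs(d) for d in disparities)
        if best is None or cost < best[0]:
            best = (cost, list(perm), disparities)
    return best[1], best[2]


# Idealized double nail stimulus: each eye sees one image on either side of the fovea.
left_eye = [-5.0, 5.0]
right_eye = [-5.0, 5.0]
pairing, disparities = nearest_neighbor_unique_match(left_eye, right_eye)
print(pairing, disparities)
```

The matcher selects the zero-disparity pairing (two nails side by side in the fixation plane) over the veridical pairing, which would require disparities of +10 and -10 arc min.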

Other rules account for our ability to match the images of targets that subtend large disparities because they lie at different depths from the horopter and fixation plane. Usually these matches result in a single percept of the target; however, when disparities become too large they exceed a diplopia threshold and are seen as double. Diplopic targets, especially those subtending small amounts of disparity, can still be seen in stereoscopic depth, which suggests that some form of binocular matching is occurring for non-fused percepts. However, as noted by Hering, depth of diplopic and even monocular images can also be judged on the basis of monocular cues such as hemi-retinal locus (Harris & McKee, 1996; Kaye, 1978), in which case binocular matching would not be required. Irrespective of whether the percept is diplopic or haplopic, several rules are needed to simplify the matching of images subtending retinal disparities. One of these is the smoothness or similarity constraint. Clearly it is much easier to fuse and see depth variations in a gravel bed than it is to see the form of the leafy foliage on a tree. If the matching process assumes that surfaces and boundary contours in nature are generally smooth, then matches will be biased to result in similar disparities between adjacent features. This rule is enforced by a disparity-gradient limit for stereopsis and fusion. Figure 11 is a stereogram that illustrates how two points lying at different depths are both fused if they are widely separated and only one can be fused at a time if they are crowded together. This effect is summarized by the rate of change of disparity as a function of the two targets' separation (disparity gradient). When the difference in disparity between the two targets is less than their separation (disparity gradient less than one), both targets can be fused. 
When the change in disparity is equal to or greater than their separation (disparity gradient equal to or greater than one), the targets can no longer be fused (Burt & Julesz, 1980) . Thus the bias helps to obtain the correct match in smooth surfaces but interferes with obtaining matches in irregular surfaces. Edge continuity might be considered as a corollary of the smoothness constraint. Matching solutions that result in continuous edges or surface boundaries are not likely to result from chance and are strong indicators that the correct binocular match has been made. Furthermore subsequent matches along the same boundary contour will be biased to converge on the same solution.

Finally, the number of potential matches can be reduced dramatically by restricting matches to retinal meridians that lie in epipolar planes. The retinal locus of epipolar images lies in a plane that contains the target in space and the two nodal points of the eyes. Assuming that the nodal point lies very near the center of curvature of the retina, this plane intersects the two retinae and forms great circles whose radius of curvature equals the radius of the eye-globe. When presented with multiple images, matching would be greatly simplified if searches were made along these epipolar lines. Because the eyes undergo cyclovergence when we converge or elevate our direction of gaze, the epipolar planes will intersect different co-planar retinal meridians depending on gaze. Thus utilization of the epipolar constraint requires that eye position and torsion information be used to determine which retinal meridians lie in the earth referenced epipolar plane. While this is feasible, the visual system apparently does not obey this restriction. Stevenson & Schor (1997) have shown that a wide range of binocular matches can be made between random-dot targets containing combinations of large horizontal and vertical disparities (>1 degree). The epipolar constraint would, however, be ideal for artificial vision systems in which the orientation of two cameras could be used to determine which meridians in the two screen planes were coplanar and epipolar.

Computational Algorithms:

In addition to the constraints listed above, several computational algorithms have been developed to solve the correspondence problem. Some of these algorithms exhibit global cooperativity in that the disparity processed in one region of the visual field influences the disparity solution in another region. Matches for different tokens are not made independently of one another. These algorithms are illustrated with a Keplerian grid, which represents an array of disparity detectors that sense depth over a range of distances and eccentricities in the visual plane from the point of convergence. Biological analogs to the nodes in this array are the binocularly innervated cortical cells in the primary visual cortex that exhibit sensitivity to disparity of similar features imaged in their receptive fields (Hubel & Wiesel, 1970; Poggio, Gonzalez, & Krause, 1988). Notice that the many possible matches of the points in Figure 17 fall along various nodes in the Keplerian grid. Cooperative models enforce the smoothness and disparity gradient constraints by facilitating activity of nodes stimulated simultaneously in the fronto-parallel plane and inhibiting activity of nodes stimulated simultaneously in the orthogonal depth planes (e.g. midsagittal). The consequence is that different disparity detectors inhibit one another and like disparity detectors facilitate one another (Dev, 1975; Nelson, 1975). This general principle has been elaborated upon in other models that extend the range of facilitation to regions falling outside the intersection of the visual axes and areas of inhibition to areas of the Keplerian grid that lie between the two visual axes (Marr & Poggio, 1976). For an excellent review of other cooperative models see Blake and Wilson (1991).

Several serial processing models have also been proposed to optimize solutions to the matching problem. These serial models utilize the spatial filters that operate at the early stages of visual processing. Center surround receptive fields and simple and complex cells in the visual cortex have optimal sensitivity to limited ranges of luminance periodicity that can be described in terms of spatial frequency. The tuning or sensitivity profiles of these cells have been modeled with various mathematical functions (difference of Gaussians, Gabor patches, Cauchy functions etc.), all of which have band pass characteristics. They are sensitive to a limited range of spatial frequencies referred to as a channel, and there is some overlap in the sensitivity range of adjacent channels. These channels are also sensitive or tuned to limited ranges of different orientations. Thus they encode both the size and orientation of contours in space. There is both psychophysical and neurophysiological evidence that disparities or binocular matches are formed within spatial channels. These filters serve to decompose a complex image into discrete ranges of its spatial frequency components. In the Pollard Mayhew and Frisby (PMF) model (Frisby & Mayhew, 1980), three channels tuned to different spatial scales filter different spatial scale image components. Horizontal disparities are calculated between contours having similar orientation and matched contrast polarity within each spatial scale. Matches are biased to obtain edge continuity. In addition, matches for contours of the same orientation and disparity that agree across all three spatial scales are favored. Most of these models employ both mutual facilitation and inhibition; however, several stereo-phenomena suggest a lesser role for inhibition. 
For example, we are able to see depth in transparent planes, such as views of a stream bed through a textured water surface, or views of a distant scene through a spotted window or intervening plant foliage. In addition, depth of transparent surfaces can be averaged, seen as filled in, or seen as two separate surfaces as their separation increases (Stevenson, Cormack, & Schor, 1989). Inhibition between dissimilar disparity detectors would make these percepts impossible.

Another serial model reduces the number of potential solutions with a coarse to fine strategy (Marr & Poggio, 1976). Problems arise in matching the high and low frequency components in a complex image when the disparity of the target exceeds one half period of the highest frequency component. A veridical match can be made for any spatial frequency for any disparity that is less than half the period (180 deg phase shift between binocular image components). However, there are unlimited matches that could be made once disparity exceeds the half period of a high frequency component. False matches in the high frequency range sensed by a small scale channel could be reduced by first matching low frequency components, which have a larger range within their 180 deg phase limit for unambiguous matches in a large scale channel. The large scale solution constrains the small scale (high frequency component) solution. Indeed, Wilson et al (1991) have shown that a low frequency background will bias the match of a high frequency pattern in the foreground to a solution that is greater than its half period unambiguous match. A phase limit of 90 rather than 180 degrees for the upper disparity limit for binocular matches has been found empirically using band pass targets (Schor, et al., 1984b). The 90 degree phase limit results from the two octave bandwidth of the stimulus as well as of the spatial channels that process spatial information, since phase is referenced to the mid frequency of the channel rather than its upper frequency range, and the 180 degree phase limit still applies to the upper range of these channels.
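The coarse to fine strategy can be sketched as follows. This is a simplified, noise-free illustration and not the published algorithm: each channel is assumed to signal disparity only modulo one period of its spatial frequency (the 180 degree limit), the coarsest channel is assumed to see the disparity within its unambiguous range, and the function names and the example frequencies are hypothetical.

```python
def wrap_to_half_period(disparity, period):
    """Disparity a single channel can signal: ambiguous beyond +/- half a period."""
    return (disparity + period / 2.0) % period - period / 2.0


def coarse_to_fine(true_disparity_arcmin, freqs_cpd):
    """Resolve each finer channel's ambiguity using the estimate from the coarser one."""
    estimate = 0.0  # assumes the coarsest channel's match nearest zero is veridical
    for freq in sorted(freqs_cpd):                    # low to high spatial frequency
        period = 60.0 / freq                          # arc min per cycle
        measured = wrap_to_half_period(true_disparity_arcmin, period)
        cycles = round((estimate - measured) / period)
        estimate = measured + cycles * period         # candidate nearest the coarse estimate
    return estimate


print(wrap_to_half_period(20.0, 60.0 / 8.0))   # what an 8 cpd channel alone would report
print(coarse_to_fine(20.0, [8.0, 2.0, 0.5]))   # after constraint by the coarser channels
```

For a 20 arc min disparity, the 8 cpd channel on its own settles on a false match within its half period, while the coarse channels steer it to the veridical solution.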

The matching task could be facilitated by vergence responses to the large scale component of the image, which would reduce overall disparity and bring the small-scale components within the unambiguous phase range of a small-scale channel (Marr & Poggio, 1979). Eventually, through iterations, a common disparity solution will be found for all spatial frequency components. A similar result could be obtained in a parallel process with a single broadly tuned spatial channel that could be constrained to find a common match for all spatial frequency components of the image, as long as they had a wide enough range of spatial frequencies. When the frequency range is too narrow, the nearest neighbor match will prevail, regardless of the real disparity of the target. This is seen in the wall paper illusion, in which a repetitive pattern of vertical lines can be fused at any horizontal vergence angle and the depth of the grating is always the solution that lies nearest to the point of convergence, irrespective of the real distance of the wall paper pattern.

The serial models, or any spatial scale models of binocular matching that rely on the 180 deg phase limit for unambiguous disparity, are not supported by empirical observations of fusion limits and stereopsis (Schor, Wood and Ogawa, 1984a,b). The theories predict that stereo threshold and binocular fusion ranges should become progressively lower as spatial frequency is increased. This prediction is borne out as spatial frequency increases up to 2.5 cpd; however, both stereo acuity and horizontal fusion ranges are constant at frequencies above 2.5 cpd. As a result, disparities are processed in the high spatial scales that greatly exceed the 180 degree phase limit. This is a clear violation of theories of disparity sensitivity based upon phase sensitivity within spatial channels. In addition to phase sensitivity, there may be other limits of disparity resolution, such as a disparity position limit. Given the presence of both a phase and a position limit, the threshold would be set by whichever of these two limits was less sensitive. For low spatial frequencies, the constant position limit would be smaller than the phase limit, and the converse would be true for high spatial frequencies.

Inter Ocular Correlation (IOC):

The computational algorithms and matching constraints described above rely upon a preattentive mechanism that is capable of comparing the two retinal images in order to quantify the strength of the many possible binocular matches. Ideally, all potential matches could be quantified with a cross-correlation or interocular correlation function (IOC). This is represented mathematically as the cross-correlation integral of the two retinal images.

$$\mathrm{IOC}(d) = \int f(x)\, h(x + d)\, dx$$

where f(x) and h(x) represent the intensity profiles (or some derivative of them) along the horizontal meridian of the right and left eye's retinae. The IOC can be thought of as the degree to which the two retinal images match one another.
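A discrete version of this function can be sketched for a one-dimensional row of random dots. The details here are assumptions made for illustration: dots are coded as +1 (white) and -1 (black), the sum is divided by the number of dots so that a perfect match gives 1, the row wraps around at its ends, and the seed, row length and disparity are arbitrary.

```python
import random


def interocular_correlation(right_eye, left_eye, disparity):
    """Normalized sum of products f(x) * h(x + d) over one row of dots."""
    n = len(right_eye)
    total = sum(right_eye[x] * left_eye[(x + disparity) % n] for x in range(n))
    return total / n


rng = random.Random(0)
n_dots = 200
right_eye = [rng.choice((-1, 1)) for _ in range(n_dots)]

# Left eye's image is the right eye's image displaced by a fixed disparity.
true_disparity = 7
left_eye = [right_eye[(x - true_disparity) % n_dots] for x in range(n_dots)]

ioc = {d: interocular_correlation(right_eye, left_eye, d) for d in range(-20, 21)}
best_disparity = max(ioc, key=ioc.get)
print(best_disparity, ioc[best_disparity])
```

The function peaks at the disparity of the stimulus, and the smaller nonzero values at other disparities arise from spurious matches.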

This nonlinear operation represents the strength of various matches with products between the two eyes' images as a function of retinal disparities along epipolar lines. A random dot stereogram (RDS) is an ideal target to test the binocular matching process in human vision because it is devoid of clear monocular forms; its form is revealed only after stereoscopic depth is perceived (Julesz, 1964). The interocular correlation of a random dot stereogram is determined by the proportion of dots in one eye's image that match dots in the other eye's image with the same contrast at the same relative location. The middle RDS shown in Figure 18 has 50 percent density and is composed of only black and white dots. Those dots that do not match correspond to images of opposite contrast. If all dots match, the correlation is +1. If no dots match, that is, they all have paired opposite contrasts, the correlation is -1. If half the dots match, the correlation is zero.

The image correspondence of the RDS can be quantified by a cross-correlation analysis of the luminance or contrast profile of the random dot pattern. Figure 18 illustrates 3 patterns of random element stereograms whose correlation varies from +1 at the top, where all dots match, to zero at the bottom, where left and right image contrasts are randomly related. Note the variation in flatness of the fused images. Figure 19 illustrates the cross correlation function of the previous autostereogram images, where the peak of each function represents the stimulus correlation at the disparity of the match, and the average amplitude of the surrounding noise represents a zero correlation composed of 50 percent matched dots. The noise fluctuations result from spurious matches that vary at different disparities. The negative side lobes about the peak result from our use of edge contrast rather than luminance in computing the cross-correlation function. With only two dot contrasts, the interocular correlation equals the percent of matched dots in the stimulus minus 50 percent, divided by 50 percent.

$$\mathrm{IOC} = 2P_d - 1$$ where $P_d$ is the proportion of matching dots
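The relation between the proportion of matching dots and the correlation can be verified on a small example. The sketch assumes dots coded as +1 and -1 and compared at the same location in each eye; the function names and the 100-dot pattern are hypothetical.

```python
def ioc_from_proportion(p_match):
    """Interocular correlation for a two-contrast RDS with a proportion p_match of matching dots."""
    return 2.0 * p_match - 1.0


def proportion_from_ioc(ioc):
    """Inverse relation: proportion of matching dots for a given correlation."""
    return (ioc + 1.0) / 2.0


def measured_ioc(left_eye, right_eye):
    """Mean product of paired dots: (matches - mismatches) / total."""
    return sum(l * r for l, r in zip(left_eye, right_eye)) / len(left_eye)


left_eye = [1, -1] * 50                                     # 100 dots
right_eye = [-v for v in left_eye[:25]] + left_eye[25:]     # 25 reversed, 75 matching

print(measured_ioc(left_eye, right_eye), ioc_from_proportion(0.75))
print(proportion_from_ioc(0.10))
```

With 75 percent matching dots both routes give a correlation of 0.5, and a correlation of 0.10 corresponds to 55 percent matching dots.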

The IOC is analogous to contrast in the luminance domain or to a Weber fraction. It represents the visibility of the stimulus disparity in the presence of a mean background correlation of zero with 50 percent matches. The cross-correlation provides a means of quantifying the visibility of the disparity much like contrast quantifies the visibility of a luminance contour.

Off-Horopter IOC Sensitivity:

The RDS has been used to measure the sensitivity of the binocular system to correlation at different standing disparities with respect to the horopter. This is analogous to extra-horopteral studies of stereopsis by Blakemore (1970) and by Badcock and Schor (1985), which measured stereopsis as a function of distance in front of or behind the fixation plane, only here we are measuring correlation detection as opposed to differential disparity detection. The task with the RDS is for the subject to identify which of two intervals presents a partially correlated surface as opposed to one having zero correlation, where 50% of the dots match. Surprisingly, we are extremely good at this task, and under optimal conditions some subjects can detect as little as a 5% increment in correlation whereas others can only detect 10%. Thus for a 10 percent correlation threshold, the latter subject is able to discriminate between 50 and 55 percent matching dots.

Figure 20 illustrates the off-horopter correlation thresholds for three subjects measured as a function of the disparity subtended by the correlated dots. Sensitivity falls off abruptly away from the horopter until, at one degree of disparity on either side of the horopter, 100% correlation is needed to discriminate the stimulus from a zero correlated field. Beyond that distance all correlations appear the same as zero. The range can be extended to 2 degrees simply by increasing the number of visible dots, either by increasing field size or by reducing dot size. There is improvement as the number of dots is increased up to 10,000 dots (Cormack, Stevenson, & Schor, 1994), and then performance remains static, demonstrating the limited efficiency of the visual system. The function illustrates that the horopter is to binocular vision as the fovea is to spatial resolution. It is the locus along the depth axis where correlation sensitivity is highest.

Extrinsic and Intrinsic Noise and IOC:

The matching problem is basically one of detecting a signal in the presence of noise. The peak of the cross-correlation function could represent the signal, which is to be detected in the presence of various sources of noise. One extrinsic noise source results from the spurious matches at non-optimal disparities. These are seen as the ripples in the flanks surrounding the peak of the IOC distribution. There are also intrinsic sources of noise that could result from the variable responses of physiological disparity detectors. The influence of these two noise sources can be revealed by measuring the correlation threshold of a RDS as a function of contrast. The IOC threshold is lowest at high contrasts and remains constant as contrast is reduced to 10 times the contrast threshold for detecting the RDS (approximately 16 percent contrast). Figure 21 illustrates that at lower contrasts the threshold increases in proportion to the square of the contrast reduction (i.e. a slope of -2 on a log-log scale). For example, if contrast is lowered by a factor of two, the correlation threshold for perception of a plane increases by a factor of four. Figures 22 and 23 illustrate the variations of signal-to-noise ratio that account for the flat and square-law regions of the contrast function. In Figure 22, assume the noise results from spurious matches in the stimulus. Accordingly, as the contrast of an image pair with some fixed interocular correlation is reduced, both the signal amplitude and the noise level decrease with the square of contrast. The covariation of signal and noise with contrast results in a constant signal-to-noise ratio. Figure 23 assumes the noise results from an intrinsic source that is independent of the stimulus, and that this intrinsic noise is greater than the extrinsic noise when the image contrast has been reduced below 10 times detection threshold. As contrast is reduced below 10 times detection threshold, the signal is still reduced with the square of contrast, but the intrinsic noise remains constant. Accordingly, the signal-to-noise ratio decreases abruptly with the square of contrast, causing a rapid rise in IOC threshold. These results illustrate the presence of intrinsic and extrinsic noise sources as well as a non-linearity which can be described as the binocular cross-correlation or product of the two monocular images.
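
The two regimes of the contrast function can be sketched numerically. The following Python fragment is only an illustration, not a fit to the data of Figure 21: the function name, the parameter names and the baseline threshold of 0.1 are hypothetical, while the knee at 10 times detection threshold and the square law are taken from the description above.

```python
def ioc_threshold(contrast_multiple, baseline=0.1, knee=10.0):
    """Illustrative IOC threshold as a function of stimulus contrast.

    contrast_multiple: contrast expressed in multiples of the contrast
        detection threshold for the RDS.
    baseline: hypothetical correlation threshold on the flat region
        (assumed value, not taken from the data).
    knee: contrast multiple below which intrinsic noise dominates
        (10 times detection threshold in the text).
    """
    if contrast_multiple <= 0:
        raise ValueError("contrast must be positive")
    if contrast_multiple >= knee:
        # Extrinsic noise scales with the signal: constant threshold.
        return baseline
    # Intrinsic noise is fixed while the signal falls with contrast squared.
    reduction = knee / contrast_multiple
    # Correlation cannot exceed 100 percent, so cap the threshold at 1.0.
    return min(1.0, baseline * reduction ** 2)


flat_region = ioc_threshold(20.0)     # above the knee
halved_contrast = ioc_threshold(5.0)  # contrast a factor of two below the knee
```

Under these assumptions, halving the contrast below the knee raises the threshold by a factor of four, which corresponds to the slope of -2 on log-log axes.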

Estimating Disparity Magnitude:

Once features have been matched in the two eyes, the resulting disparities must be estimated and scaled with viewing distance in order to obtain a veridical quantitative sense of depth. Lehky et al (1990) have summarized three classes of general mechanisms that have been proposed to quantify disparity. The first is an array of narrowly tuned units with non-overlapping sensitivities that are distributed over the range of disparities that stimulate stereopsis. The spatial layout of binocular receptive fields for these multiple local channels forms the nodes described in a Keplerian grid (Marr & Poggio, 1976; Mayhew & Frisby, 1980). The value of a disparity determines which channel is stimulated. A second process uses rate encoding to specify disparity with a single channel. Firing rate increases as disparity increases, as suggested by the models of Julesz (1971) and Marr and Poggio (1979). The third and most physiologically plausible means of disparity quantification utilizes a distribution of multiple channels with partially overlapping sensitivity. Because the sensitivities overlap, the activity of a single channel is ambiguous. However, disparity amplitude can be computed from the activity of all channels by such methods as averaging (Stevenson, Cormack, Schor, & Tyler, 1992) or by spectrum representation (Lehky & Sejnowski, 1990).
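
The third class of mechanism can be illustrated with a toy population code. The Python sketch below is a simplified illustration under stated assumptions, not a model from any of the cited papers: the Gaussian tuning shape, the channel spacing of 5 arc min, the tuning width of 8 arc min and all function names are hypothetical. It shows that one channel alone cannot distinguish two disparities on either side of its peak, whereas a response-weighted average across all channels recovers the stimulus disparity.

```python
import math


def channel_responses(disparity, centers, width):
    """Response of each Gaussian-tuned channel to one disparity (arc min)."""
    return [math.exp(-0.5 * ((disparity - c) / width) ** 2) for c in centers]


def decode_by_averaging(responses, centers):
    """Estimate disparity as the response-weighted mean of preferred disparities."""
    total = sum(responses)
    if total == 0:
        raise ValueError("no channel responded")
    return sum(r * c for r, c in zip(responses, centers)) / total


# Hypothetical bank of 25 overlapping channels from -60 to +60 arc min.
centers = [-60.0 + 5.0 * i for i in range(25)]
width = 8.0

# A single channel centered at zero responds identically to +3 and -3 arc min.
single_plus = channel_responses(3.0, [0.0], width)[0]
single_minus = channel_responses(-3.0, [0.0], width)[0]

# The whole population disambiguates the stimulus.
estimate = decode_by_averaging(channel_responses(7.0, centers, width), centers)
```

Averaging of this kind returns a single value, which is why it would have difficulty with two closely spaced depth planes presented in the same region.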

In all cases, if the disparity analysis is localized to a small region of space, there is little problem with confounding the analysis with multiple disparity stimuli. This only occurs in dense random-depth scenes such as close up views of tree foliage. In these circumstances, averaging mechanisms would have difficulty in resolving the separation of closely spaced depth planes, whereas spectral representations would have unique distributions for multiple disparities that could be analyzed with templates (Lehky, 1990). Our ability to solve the correspondence problem suggests however that there is some interaction between adjacent stimuli as described above in cooperative-global models. This is thought to involve inhibitory interactions between disparity detectors which have been described as inhibitory side-lobes of disparity tuned channels (Cormack, Stevenson, & Schor, 1993; Lehky, et al., 1990; Poggio, et al., 1988; Stevenson, et al., 1992) . This produces a band-pass sensitivity to periodic spatial variations of disparity such as seen in a depth-corrugated surface like a curtain (Tyler, 1975).

Our ability to perceive depth transparency is often cited as a phenomenon that is inconsistent with many models of stereopsis. However, if single values of depth are analyzed along discrete directions of space, some patches of space will be coded as near and some as far. These regions could be interpolated at a higher level to perceive that the near and far surfaces are actually continuous. The alternative is that the visual system processes the multiple matches in every visual direction, resulting in true transparency.

Finally there is the question of the metric used to quantify disparity. Disparity has been described either as a positional offset of the two retinal images from corresponding points (measured in angular units) or as a phase offset, where phase describes the disparity as a proportion of the luminance spatial period to which the detector is optimally tuned (Schor, Wood, & Ogawa, 1984a). Thus a 1/4 degree disparity would be a phase disparity of 180 degrees for a unit tuned to 2 cycles/deg. Positional disparity could be coded by the relative misalignment of two receptive fields that have the same distribution of excitatory and inhibitory zones in the two eyes (Barlow, Blakemore, & Pettigrew, 1967). Phase disparity could be coded by receptive fields that are not offset in the two eyes but have relative offsets between excitatory and inhibitory zones within the monocular receptive fields, such that one cell could have a peak centered in cosine phase and the other a displaced peak in sine phase (Figure 24) (DeAngelis, Ohzawa, & Freeman, 1991; Freeman & Ohzawa, 1990).
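
The relation between the two metrics is simple arithmetic, and the worked example in the text can be checked with a few lines of Python. The function name is hypothetical; the conversion follows directly from defining phase as the fraction of the spatial period to which the detector is tuned.

```python
def phase_disparity_deg(position_disparity_deg, spatial_freq_cpd):
    """Convert a positional disparity (degrees of visual angle) into a
    phase disparity (degrees of phase) for a detector tuned to
    spatial_freq_cpd cycles per degree."""
    period_deg = 1.0 / spatial_freq_cpd          # spatial period in degrees
    return 360.0 * position_disparity_deg / period_deg


# The example from the text: 1/4 degree at 2 cycles/deg.
example_phase = phase_disparity_deg(0.25, 2.0)
```

The same positional disparity corresponds to a smaller phase disparity for a detector tuned to a lower spatial frequency; at 1 cycle/deg the 1/4 degree disparity is only 90 degrees of phase.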

These models and physiological measures suggest several questions about stereopsis that include what is the minimum number of disparity channels that could account for stereo acuity, what are the crowding limits on stereo-resolution for adjacent stimuli, are multiple depths averaged or biased toward one another, is there evidence supporting the existence of inhibitory interactions in stereo-processing mechanisms, and is disparity coded by phase or position?

Disparity Pools or Channels:

Stereopsis has both a transient component that senses depth of very large (up to 12 degrees), briefly presented (<200 msec) disparities (Ogle, 1952; Westheimer & Tanzman, 1956) and a sustained sense of small disparities (<1 degree) (Ogle, 1952) . Behavioral studies suggest that different channel structures underlie these two forms of stereopsis. Transient depth stimuli are perceived qualitatively as nearer or farther from the fixation plane. Many individuals are unable to perceive transient depth to any disparity magnitude presented in one depth direction, either far or near, from the fixation plane while they are able to perceive it in response to a wide range of disparities presented in the opposite depth direction (Richards & Regan, 1973) . This condition is referred to as stereo anomalous (Richards, 1971) . Stereo anomalous subjects have normal stereo-acuity when measured with static (sustained) disparities. The stereo-anomalous observations have been used as evidence for three classes of disparity pools (crossed, uncrossed and zero) that sense transient disparities. Jones (1977) also observed a transient disparity vergence deficit in stereo-anomalous subjects that was consistent with the three-pool model.

Simulations with the three-pool model indicate that it has insufficient resolution to account for the static (sustained) stereo-hyperacuity of 5 to 10 arc seconds (Lehky, et al., 1990). Correlation detection studies employing depth adaptation (Stevenson, et al., 1992) and subthreshold summation (Cormack, et al., 1993) revealed multiple sustained-disparity tuned mechanisms with peak sensitivities along approximately a 2 degree disparity continuum that had opponent center-surround organization. The width of these disparity tuning functions varied from 5 arc min at the horopter to 20 arc min at a distance of 20 arc min from the fixation plane. Models of sustained stereo sensitivity based upon these data, as well as measures of differential depth sensitivity in front of and behind the fixation plane (Badcock & Schor, 1985), indicate that a minimum of approximately 20 channels is necessary to account for the sensitivity of the sustained stereo system (Lehky, et al., 1990; Stevenson, et al., 1992).

Stereoscopic Depth Perception

Depth Ordering and Scaling

A variety of cues are used to interpret a three-dimensional space from the two-dimensional retinal images. Static monocular cues rely upon some familiarity with the absolute size and shape of targets in order to make quantitative estimates of their relative distance and surface curvature. Monocular cues such as overlap do not require any familiarity with objects, but they give only qualitative information about depth ordering and do not provide depth magnitude information. Stereopsis and dynamic motion parallax cues do yield a quantitative sense of relative depth and 3-D shape, and they do not depend upon familiarity with the size and shape of objects. Depth from stereo and motion parallax can be calculated from geometrical relationships (triangulation) between two separate views of the same scene, taken simultaneously in stereopsis or sequentially in motion parallax.

Three independent variables involved in the calculation of stereo-depth are retinal image disparity, viewing distance, and the separation in space of the two view-points (i.e. the baseline). In stereopsis, the relationship between the linear depth interval between two objects and the retinal image disparity that they subtend is approximated by the following expression.

Delta d = h * d^2 / (2a)

where h is retinal image disparity in radians, d is viewing distance, 2a is the interpupillary distance and Delta d is the linear depth interval. 2a, d, and Delta d are all expressed in the same units (e.g. meters). The formula implies that the visual system has some knowledge about the interpupillary distance and the viewing distance, which could be sensed from the angle of convergence (Foley, 1980) or from other retinal cues such as oblique or vertical disparities that are produced geometrically by targets in tertiary directions from the point of fixation (Garding, Porrill, Mayhew, & Frisby, 1995; Mayhew & Longuet-Higgins, 1982; Gillam & Lawergren, 1983; Liu, Stevenson, & Schor, 1994a; Rogers & Bradshaw, 1993; Westheimer & Pettet, 1992).
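
As a worked illustration of the expression, the Python sketch below converts a disparity in arc seconds into a linear depth interval. The function names are hypothetical, the interpupillary distance of 0.065 m is the approximate value quoted earlier in this reader, and the 10 arc sec disparity is chosen only as an example.

```python
import math

ARCSEC_PER_RADIAN = 3600.0 * 180.0 / math.pi  # about 206265


def arcsec_to_rad(arcsec):
    """Convert an angle from seconds of arc to radians."""
    return arcsec / ARCSEC_PER_RADIAN


def depth_interval(disparity_rad, viewing_distance, interpupillary_distance):
    """Approximate linear depth interval: Delta d = h * d^2 / (2a).

    viewing_distance and interpupillary_distance must share the same
    units; the result is returned in those units.
    """
    return disparity_rad * viewing_distance ** 2 / interpupillary_distance


h = arcsec_to_rad(10.0)                 # example disparity of 10 arc sec
depth_at_1m = depth_interval(h, 1.0, 0.065)
depth_at_2m = depth_interval(h, 2.0, 0.065)
```

With these example values the depth interval is a little under 1 mm at a viewing distance of 1 m and four times larger at 2 m, because the interval grows with the square of viewing distance for a fixed disparity.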

The equation illustrates that for a fixed retinal image disparity, the corresponding linear depth interval increases with the square of viewing distance and that viewing distance is used to scale the horizontal disparity into a linear depth interval. When objects are viewed through base-out prisms which stimulate additional convergence, perceived depth should be reduced by underestimates of viewing distance. Furthermore, the pattern of zero retinal image disparities described by the curvature of the longitudinal horopter varies with viewing distance. It can be concave at near distances and convex at far distances in the same observer (Fig 3) (Ogle, 1964) . Thus, without distance information, the pattern of retinal image disparities across the visual field is insufficient to sense either depth ordering (surface curvature) or magnitude (Garding, et al., 1995) . Similarly, information about direction of gaze is an important source of information to interpret disparity fields associated with slanting surfaces since the same pattern of horizontal disparity can correspond to different slants about a vertical axis presented at various gaze eccentricities (Ogle, 1964). Clearly, stereo-depth perception is much more than a disparity map of the visual field.

There are lower and upper limits of retinal image disparity that can be coded by the nervous system and used to interpret relative depth. The region of useful disparities is illustrated in Figure 25 (Tyler, 1983), which plots perceived depth as a function of binocular disparity. The lower left-hand corner of the function represents the lower disparity limit, which equals stereo-acuity. The lower right-hand corner represents the upper disparity limit for stereopsis, beyond which no stereo depth is perceived. The upper disparity limit for stereopsis (approximately 1000 arc min) is much greater than the limit for singleness or Panum's fusional area, indicated by the vertical line at 6 arc min (Ogle, 1952). However, when evaluated with narrow-band stimuli the upper disparity limit is only slightly greater than the fusion limit (Figure 10). The largest upper disparity limit for stereopsis occurs at low spatial frequencies and corresponds to the large upper stereo limit shown at the right side of the horizontal axis in Figure 25. Below Panum's limit, targets are seen singly and in depth, whereas above Panum's limit they are seen as double and in depth for a limited range. Between the upper and lower limits there is a region where there is a veridical match between perceived depth and actual depth. The maximum perceived depth occurs just beyond the fusion limit. Then perceived depth actually diminishes as disparity is increased up to the upper disparity limit. The rising limb of this function describes quantitative stereopsis, in which perceived depth increases monotonically with retinal image disparity. The falling limb describes qualitative stereopsis, in which the sign or direction of depth is registered (near vs. far) but not its true magnitude (Ogle, 1952). The quantitative system is optimal with sustained stimuli lasting at least one second, whereas the qualitative system is optimal with transient or short exposure durations (less than 200 msec). These two components of stereopsis function as focused attention and surveillance mechanisms respectively. Quantitative stereo is used to interpret the depth and shape of targets near the horopter in the vicinity of the point of convergence, whereas qualitative stereo is a preattentive process used to roughly localize the appearance or sudden depth changes of targets that lie away from the point of attentive fixation.

Hyper-Acuity, Super-Resolution and Gap Resolution:

There are three classes of visual-direction acuity described by Westheimer (1979; 1987). They are referred to as Hyperacuity, Super-Resolution Acuity and Gap Resolution Acuity (Fig 26). Hyperacuity tasks involve detection of a small relative displacement of targets which are separated in space or time. The term hyperacuity refers to the extremely low threshold, which is less than the width of a single photoreceptor in the retina. A classical example is Vernier acuity, in which misalignment is detected between targets that are separated along an axis orthogonal to the direction of the displacement to be detected. Super-resolution involves size discrimination between sequentially viewed targets. Width discrimination is a super-resolution task. Gap-resolution represents our ability to resolve space between two separate targets that produces a dip in the combined target luminance profile. It is a judgment based upon something like a Rayleigh criterion. Measures of visual acuity with a Snellen E or Landolt C are examples of gap-resolution.

There are forms of stereo acuity that are analogous to the three forms of monocular acuity (Stevenson et al 1989). As shown in Fig 27, stereo-hyperacuity tasks involve discrimination of depth between adjacent targets. Stereo-super-resolution involves discriminating between the depth-axis thickness of surfaces (pykno-stereopsis) (Tyler, 1983). Stereo-gap-resolution tasks require discrimination between a single thick surface and two overlaying separate surfaces. Stereo-gap perception of two overlapping depth surfaces is referred to as dia-stereopsis (Tyler, 1983). The thresholds for these three stereo tasks are similar to their monocular counterparts. Thresholds for stereo-hyperacuity range from 3 to 6 arc seconds. Thresholds for stereo-super-resolution range from 15 to 30 arc sec, and thresholds for stereo-gap-resolution are approximately 200 arc sec. These distinct thresholds demonstrate that stereopsis can subserve different types of acuity tasks and that performance on these tasks follows performance on analogous visual direction acuity tasks. The thresholds presumably result from the spread functions depicted in Fig 28, which represent the combined noise produced by optical filtering, oculomotor vergence noise and neural filtering. Acuity limits for stereo and visual direction tasks could be attributed to a three-dimensional ellipsoid of positional uncertainty formed by the spread functions on each spatial dimension.


Under optimal conditions we are able to resolve depth differences as small as 2 to 6 seconds of arc. This performance level can be achieved with foveal fixation of targets located at the plane of fixation (on the horopter). When the stereo-threshold is measured with two spatial offsets in each eye, such as illustrated in Fig 29, it is remarkable that the offset in each eye at stereo threshold is smaller than vernier acuity or the minimum offset that can be detected by one eye alone (Berry, 1948; McKee & Levi, 1987; Schor & Badcock, 1985; Westheimer & McKee, 1979) . In particular this observation poses problems for ideal detector models of hyper-acuity that attempt to explain limitations of vernier-acuity with retinal factors.

Relative Disparity:

Stereopsis is our ability to detect relative depth between two objects and there are several ways that the relative depth could be computed. Each eye has an image of the two objects and relative disparity could be computed as the difference in the image separations formed in the two eyes. Alternatively, each of the compared objects subtends an absolute disparity at the horopter and relative depth could be computed from the difference in these absolute disparities. It has been argued that relative disparities for stereo-perception are computed from differences in absolute disparity rather than from comparisons of spatial offsets of the two monocular images (Westheimer & McKee, 1979) . Although absolute disparity provides only a vague sense of depth, which is supplemented by oculomotor convergence (Foley) and monocular depth cues, depth could be discriminated with hyper resolution by comparing activity of two separate detectors of absolute disparity (Regan, 1982) .

Stereo-Depth Contrast

Veridical space perception requires that the visual system not confuse disparities generated by object surfaces in our environment with disparities produced by errors of view point, caused for example by body sway or errors of gaze direction or eye alignment (Enright, 1990). For example, errors of cyclovergence and of gaze direction would produce disparity maps that would correspond to inclination and slant of real surfaces in space. Because the visual system is insensitive to whole field disparity gradients (uniform and shear) (Shipley and Hyson, 1972; Gillam et al 1984; Mitchison and Westheimer 1984; 1990; Stevens and Brookes, 1987; 1988; van Ee and Erkelens, 1995), any potential confusion between real depth surfaces and view point errors is averted. In part our insensitivity to whole field disparity gradients results from a global process that normalizes continuous variations of disparity (disparity gradients), so that an abrupt depth variation between a small surface and a background disparity gradient is perceived as a depth change relative to a normalized fronto-parallel background. The result is that a small object within a background composed of a disparity gradient will appear as though it were on a zero-disparity or fronto-parallel background field.

Depth contrast (Werner 1937; 1938) is a specific class of a general category of percepts referred to as simultaneous contrast. Contrast effects in other visual sensory modalities include luminance contrast (Helmholtz, 1909), induced motion such as the moon and moving cloud illusion (Duncker, 1929); color contrast (Kirschmann 1890); tilt or orientation contrast as seen in the rod and frame illusion (Gibson, 1937); spatial frequency (size) (MacKay, 1973) and curvature contrast (Gibson, 1937). These various forms of simultaneous contrast allow a constancy of space perception in the presence of luminance and spectral changes in overall lighting, optical distortion caused by atmospheric thermals or optical aberrations of the eye, and motion caused by passive movements of the body and eyes. Figures X and Y are illustrations of brightness and depth contrast illusions produced by a background gradient and a ring figure of either constant luminance (Koffka's ring) (Koffka, 1935) or zero disparity (Brookes and Stevens 1989).

Local and Global effects: Simultaneous contrast between adjacent or overlapping targets is referred to as a local effect if the contrast results from interactions between individual nearby disparities. Simultaneous depth contrast also operates globally over long ranges or large separations (Fahle and Westheimer, 1988). The global effect is the referencing of all depths relative to the overall background disparity, which has been normalized to the fronto-parallel plane. The local effects are relative depth contrasts between adjacent objects or surfaces, all of which are seen with respect to the normalized background disparity. The magnitude of long-range global interactions varies with background disparity, contrast and size (Kumar and Glaser, 1991). Local or short range simultaneous contrast between two disparity gradients is greater when they are separated along a common slant axis (van Ee and Erkelens, 1996). These effects are enhanced by short exposure durations (10 msec) (Werner, 1937) and are dramatically reduced with several seconds of exposure (Kumar and Glaser, 1992).

Position and Phase Limits

As with sensory fusion, stereo acuity depends upon spatial scale. When tested with a narrow-band stimulus of 1.75 octaves, stereo-threshold becomes elevated at spatial frequencies below 2.5 cycles/deg (Schor & Wood, 1983; Schor, et al., 1984b; Smallman & MacLeod, 1994). Interocular differences in spatial frequency also influence sensitivity to binocular disparity. Figure 30 illustrates that sensitivity to the depth of small tilted DOG or Gabor patches with moderate bandwidths (1.75 octaves), produced by horizontal disparities between patches of unequal size, becomes reduced when the center spatial frequencies of the patches differ by more than 3 octaves over a frequency range from 0.075 to 2 cpd. At higher spatial frequencies (2-19 cpd), stereoscopic depth thresholds of slanted patches remain acute over a much wider range of interocular differences in target width and corresponding spatial frequencies. These results suggest that disparity is processed early in the visual system within a broad range of linear channels with band-limited tuning for spatial frequency.

Differences in the range of tolerable interocular differences observed at high and low spatial frequencies may be related to the position and phase limits for disparity processing described previously for binocular sensory fusion. Physiologically, binocular receptive fields have two fundamental organizational properties. The receptive fields that represent the two eyes can be offset from a point of retinal correspondence (position disparity), and they can also have different structural organization in which there is a phase shift of the areas of excitation and inhibition in one eye's receptive field compared to the other (phase disparity) (Fig 24) (DeAngelis, Ohzawa, & Freeman, 1995). The position limit at high spatial frequencies could result from positional jitter in binocular receptive fields of all sizes, which has a minor influence on the sensitivity of large receptive fields but is large relative to the phase coding ability of small (high-frequency) receptive fields. The outcome is a breakdown of the size-disparity correlation at high spatial frequencies. Other models assume that disparity processing is based solely upon positional information in spatial channels tuned to frequencies above 2.5 cpd (Kontsevich & Tyler, 1994). In this model, elevated stereo-thresholds at low spatial frequencies (below 2.5 cpd) result from reduced effective contrast of low spatial frequencies passed by the lowest binocular spatial channel, which is tuned to 2.5 cpd. The band-pass stereo tuning characteristics observed with interocular differences in low spatial frequencies (< 2.5 cpd) (Schor, Wood and Ogawa, 1984) could result from the differences in effective contrast of these stimuli and interocular inhibition (Kontsevich & Tyler, 1994).

A variety of factors influence the stereo-threshold, and the conditions which yield peak stereo-acuity are fairly specific. Geometrically, relative disparity, or the difference in the absolute disparities subtended by two objects in depth, is independent of the distance of the two targets from the fixation plane (Fig 31). The same pattern of relative retinal disparities remains if the eyes converge to a nearer point or diverge to a farther point of fixation. However, our stereo-sensitivity to relative disparity varies dramatically with distance from the horopter. The distance of the two targets from the fixation plane is described as a depth pedestal. Depth discrimination thresholds, measured as a function of the depth pedestal, describe a depth-discrimination threshold function that is summarized by a Weber fraction (stereo-threshold/depth pedestal). The empirical measures of depth discrimination shown in Fig 32 indicate that the noise or variability of the absolute disparities subtended by the two targets is less than 5% over a range of disparity pedestals up to 0.5 degrees (Badcock & Schor, 1985). Judgments of depth beyond this range of pedestals are clouded by the appearance of diplopia, which can be overcome by jittering the magnitude of the pedestal (Blakemore, 1970) and interleaving crossed and uncrossed disparity pedestals (Siderov & Harwerth, 1993). When these precautions are taken, the stereo-threshold grows exponentially with disparity pedestal when the targets contain high spatial frequency components (Fig 33). However, when they are composed primarily of low spatial frequencies the stereo threshold tends to level off at pedestals greater than 0.5 degrees, even though the targets remain in the singleness or fusion range. Stereo-depth is perceived over a broader range of disparity pedestals with low spatial frequency stimuli. However, stereo-threshold becomes elevated as the spatial frequency composition of targets is decreased below 2.5 cycles per degree, for targets located on or off the horopter. Both the fall-off of sensitivity with disparity pedestal and the disparity range of quantitative stereo-depth indicate that different size-tuned channels process disparity differently.
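
The geometric claim that relative disparity does not depend on where the eyes are converged can be verified with a short calculation. The Python sketch below is a simplified illustration that assumes both targets and the fixation point lie on the midsagittal plane and that the eyes converge symmetrically. The function names, the sign convention (positive for targets nearer than fixation) and the half interpupillary distance of 0.0325 m are assumptions made for the example.

```python
import math


def vergence_angle(distance, half_ipd=0.0325):
    """Angle (radians) subtended at a midline point by the two eyes."""
    return 2.0 * math.atan(half_ipd / distance)


def absolute_disparity(target_distance, fixation_distance, half_ipd=0.0325):
    """Disparity of a midline target relative to the fixation point.

    Positive values denote targets nearer than fixation.
    """
    return (vergence_angle(target_distance, half_ipd)
            - vergence_angle(fixation_distance, half_ipd))


def relative_disparity(d1, d2, fixation_distance, half_ipd=0.0325):
    """Difference between the absolute disparities of two targets."""
    return (absolute_disparity(d1, fixation_distance, half_ipd)
            - absolute_disparity(d2, fixation_distance, half_ipd))


# Two targets at 1.0 m and 1.1 m, viewed while fixating at 0.5 m or 2.0 m.
rel_near_fixation = relative_disparity(1.0, 1.1, 0.5)
rel_far_fixation = relative_disparity(1.0, 1.1, 2.0)
```

The absolute disparity of each target changes with the fixation distance (it is the depth pedestal), but the fixation term cancels in the difference, so the two relative disparities are equal.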

Off-Horopter and Eccentric Depth Discrimination

The reduction of stereo acuity with off-horopter stimuli is not the result of retinal image eccentricity produced by the pedestals. Stereo-acuity is extremely robust with retinal eccentricity along the horopter or fixation plane. Thus the stereo-threshold is smaller at an eccentricity of 5 degrees than it is with a disparity pedestal of only 80 arc min centered in the midsagittal plane (Krekling, 1974) . Differences between stereo and vernier acuity described above are highlighted with tests of these two forms of hyper-acuity as a function of retinal eccentricity. Stereo-acuity remains constant with horizontal retinal eccentricities of up to 40 arc min whereas vernier acuity rises by nearly an order of magnitude over the same range of retinal eccentricities (Schor & Badcock, 1985) . Stereo-thresholds are elevated slightly with equivalent retinal eccentricities along the vertical meridian (McKee, 1983) . At larger horizontal eccentricities, stereo-thresholds increase gradually up to 6 degrees at an eccentricity of 8 degrees (Rawlings & Shipley, 1969) .

Spatial Interactions

Stereo-sensitivity can either be enhanced or reduced by nearby targets. The threshold for detecting depth corrugations of a surface, such as the folds in a hanging curtain, decreases with depth-modulation frequency (reciprocal of spacing between the folds) up to 0.3 cycles per degree, where it is lowest (Tyler, 1975). At depth-modulation frequencies lower than 0.3 cpd the threshold for stereopsis is elevated and appears to be limited by a disparity gradient or minimal rate of change of depth/degree of target separation. At depth-modulation frequencies higher than 0.3 cpd, stereo-threshold is elevated as a result of depth averaging. Similar effects are seen with separations between point stimuli for depth (Westheimer and McKee, 1980; Westheimer, 1986; Westheimer and Levi, 1987).

When a few isolated targets are viewed foveally, changes in the binocular disparity of one introduce suprathreshold depth changes or biases in the others when their separation is less than 4 arc min. This depth attraction illustrates a pooling of disparity signals. When targets are separated by more than 4-6 arc min, the bias is in the opposite direction and features appear to repel one another in depth. Attraction and repulsion also occur with cyclopean targets (Stevenson et al 1991), showing that they are not based simply on positional effects at the monocular level. The enhancement of depth differences by repulsion might be considered a depth-contrast phenomenon that is analogous to Mach bands in the luminance domain, which are thought to result from lateral inhibitory interactions. Depth distortions that are analogous to Mach bands have been demonstrated with horizontal disparity variations between vertically displaced contours (Lunn & Morgan, 1996), demonstrating analogous spatial interactions in the horizontal disparity domain and the luminance-contrast domain.

The Contrast Paradox

The visibility or luminance-contrast of targets also influences stereo-acuity. Cormack, Stevenson and Schor (1991) demonstrated that both correlation thresholds and stereo-acuity were degraded as contrast was reduced below 16 percent. Both thresholds were proportional to the square of the contrast reduction (e.g. reducing the contrast by a factor of two caused a four-fold elevation of the thresholds). This marked reduction of disparity sensitivity results from intrinsic noise in the disparity processing mechanism that exceeds the noise in low contrast stimuli.

Curiously, when the contrast of the two eyes' stimuli is reduced unequally, stereo-acuity is reduced more than if the contrast of both targets is reduced equally (Halpern & Blake, 1988; Legge & Gu, 1989; Lit, 1959; Rady & Ishak, 1955; Schor & Heckman, 1989; Cormack, Stevenson, & Landers, 1997). Stereo acuity is reduced twice as much by a reduction of contrast in one eye as by the same reduction in both eyes. This contrast paradox occurs only for stereopsis and is not observed for other disparity-based phenomena including correlation detection thresholds (Cormack, Stevenson, & Landers, 1996), Panum's fusional limits (Schor & Heckman, 1989) or transient disparity induced vergence (Schor, Edwards and Pope, 1997). The contrast paradox occurs mainly for low spatial frequency stimuli (< 2.0 cycles/degree) (Cormack, et al., 1997; Halpern & Blake, 1988; Schor & Heckman, 1989), and it could result from uncancelled noise sources in the two eyes' monocular signals (Legge & Gu, 1989), from an interocular inhibitory process (Schor & Heckman, 1989; Kontsevich and Tyler, 1994), or from a temporal asynchrony of the transmission time for the two eyes' signals to the visual cortex (Howard & Rogers, 1995).

Temporal Constraints

The duration and synchronization of monocular components of disparity signals greatly influence our stereo-sensitivity. While depth in a stereogram may be perceived with just a spark of illumination (Dove, 1841), the stereo-depth detection threshold decreases with exposure durations longer than 100 msec up to 3 seconds with line (Langlands, 1926) and random dot (Harwerth & Rawlings, 1977) patterns. Critical duration for perceiving depth in random dot stereograms can be several seconds in some individuals, while depth perception is almost immediate with line stereograms for the same individuals. The prolonged exposures needed for random dot stereograms can be shortened by presenting monocular cues that guide vergence eye alignment to the depth plane of the stereogram (Kidd, Frisby, & Mayhew, 1979). This suggests that the extra time needed for sensing form and depth in the RDS results from the ambiguity of matches in the stereogram and the inability of the binocular system to resolve this ambiguity for disparities exceeding only one degree (Stevenson, Cormack, & Schor, 1994). In effect, vergence must search blindly until the disparity of the correct match is reduced to less than one degree, within the operating range of the binocular cross-correlation process.

Stereo-acuity is optimal with simultaneous presentation of both eyes' stimuli. It remains high with interocular delays shorter than 25 msec (Ogle, 1964) or as long as visual persistence allows some overlap of the two monocular stimuli (Engel, 1970). Optimal stereo-acuity with thresholds of 3 arc sec also requires simultaneous presentation of at least two targets that define the stereo-depth interval. When foveally viewed targets of different depth are presented sequentially, with no time gap between them, stereo-threshold can be elevated by an order of magnitude (22-52 arc sec) (Westheimer, 1979). When time gaps that exceed 100 msec are introduced between the two successive views, the stereo-threshold increases dramatically to 0.5 degrees, presumably because of noise in the vergence system and a loss of memory of the vergence position of the first stimulus (Foley, 1976). Interestingly, sequential stereopsis is approximately the same for wide spatial separations (10 degrees) between sequentially foveally-fixated targets and narrow ones (Enright, 1991). This is surprising because the wide separation requires saccades between the first and second target, and these introduce a small time interval between the two targets equal to the duration of the saccade (approximately 50 msec). One would also expect that vergence errors introduced by saccades would add a noise source that would elevate the sequential stereo-threshold. Enright reports that eye movements are also unstable with sequential stereo measures between adjacent stimuli, and perhaps they are as variable as the disconjugacy of large saccades to widely spaced stimuli. It is also clear that eye movements are not entirely responsible for the reduction of sequential stereo acuity (Enright, 1991) and that temporal masking may contribute to the elevated sequential stereo-threshold. The small interval during the saccade between sequential stimuli might diminish the temporal masking between widely separated sequential stimuli compared to that associated with no time delay between adjacent sequential stimuli.
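To give the thresholds above a physical scale, the following sketch converts a disparity threshold into the depth interval it corresponds to at a given viewing distance. It assumes the standard small-angle approximation (disparity ≈ a·Δd/d², with a the interocular separation and d the viewing distance) and the interocular separation of approximately 6.5 cm mentioned earlier in this reader. The 1 m viewing distance and the function names are illustrative assumptions, and the approximation only holds when the depth interval is small relative to the viewing distance.

```python
import math

INTEROCULAR_M = 0.065  # approximately 6.5 cm, as stated earlier in the reader


def arcsec_to_rad(arcsec):
    """Convert seconds of arc to radians."""
    return math.radians(arcsec / 3600.0)


def depth_interval_m(disparity_arcsec, viewing_distance_m, interocular_m=INTEROCULAR_M):
    """Depth interval for a given disparity, small-angle approximation.

    disparity (rad) ~ interocular * depth_interval / distance**2
    Valid only when the depth interval is much smaller than the distance.
    """
    return arcsec_to_rad(disparity_arcsec) * viewing_distance_m ** 2 / interocular_m


VIEWING_DISTANCE_M = 1.0  # hypothetical viewing distance for the example

for label, threshold_arcsec in [
    ("simultaneous targets", 3),
    ("sequential, no gap (low end)", 22),
    ("sequential, no gap (high end)", 52),
]:
    interval_mm = 1000.0 * depth_interval_m(threshold_arcsec, VIEWING_DISTANCE_M)
    print(f"{label}: {threshold_arcsec} arc sec -> {interval_mm:.2f} mm at 1 m")
```

Under these assumptions a 3 arc sec threshold at 1 m corresponds to a depth interval of roughly 0.2 mm. Because the depth interval grows with the square of viewing distance, the same angular threshold corresponds to a much larger interval at far distances.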

Upper Disparity Limit for Stereopsis:

The upper disparity limit (UDL) for stereopsis describes the maximum disparity at which a particular target can be perceived with stereo-depth. As with the lower disparity limit or stereo acuity, the upper limit is criterion dependent. It can describe the upper limit for the quantitative perception of depth that is typical of sustained stimuli or the more vague qualitative sense of depth seen with diplopic transient images (Ogle, 1952). The UDL also varies with several stimulus parameters including exposure duration (Blakemore, 1970; Ogle, 1952; Richards & Foley, 1971; Westheimer & Tanzman, 1956), the spatial frequency and size of the stimulus (Hess & Wilcox, 1994; Schor & Wood, 1983; Schor, et al., 1984a), and the spacing between depth targets (Schor, Bridgeman, & Tyler, 1983; Tyler, 1975).

As described in the prior section, quantitative stereo-depth is optimal with sustained or prolonged viewing durations. When measured with broad band stimuli, the UDL for quantitative stereo ranges from 30-50 arc min (Schor and Wood, 1983; Schor, et al., 1984a). However, when measured with narrow-band stimuli that present a limited range of spatial frequencies (e.g. 1.75 octaves), the UDL for quantitative stereopsis increases with the spatial period of the stimulus when a fixed number of cycles of the pattern is used (Schor and Wood, 1983; Schor, et al., 1984a). The UDL ranges from 0.5 to 4 degrees as spatial frequency is reduced from 2 cpd to 0.1 cpd. If, however, the size or envelope of the stimulus patch is varied independently of the number of cycles it contains, the upper disparity limit for sustained stereopsis varies primarily with stimulus size (envelope), and not with stimulus spatial frequency content (Wilcox & Hess, 1995). This result suggests possible linear or non-linear operations that could determine the UDL, such as luminance coding by binocular receptive fields of different sizes (linear), or rectification of the luminance coded monocular images prior to the derivation of binocular disparity (non-linear). Rectification or any power function would transform a narrow band limited stimulus into a low-pass broad band stimulus such that stereopsis could be derived between dichoptic stimuli composed of different textures or spatial frequencies.
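The claim that rectification turns a narrow-band stimulus into one with low-pass content can be checked numerically. The sketch below is an illustration only, not a model from the cited studies: it builds a one-dimensional Gabor profile, full-wave rectifies it, and compares how much of the spectral energy falls below half the carrier frequency before and after rectification. The 2 cpd carrier, 0.5 degree envelope, sampling grid, and function names are all assumed values chosen for the example.

```python
import cmath
import math


def gabor(n=256, extent_deg=8.0, freq_cpd=2.0, sigma_deg=0.5):
    """Sample a cosine-phase Gabor profile; returns (samples, sample spacing in deg)."""
    dx = extent_deg / n
    xs = [(i - n / 2) * dx for i in range(n)]
    samples = [
        math.exp(-x * x / (2.0 * sigma_deg ** 2)) * math.cos(2.0 * math.pi * freq_cpd * x)
        for x in xs
    ]
    return samples, dx


def low_frequency_energy_fraction(signal, dx, cutoff_cpd):
    """Fraction of spectral energy at |frequency| below cutoff, via a direct DFT."""
    n = len(signal)
    total = 0.0
    low = 0.0
    for k in range(n):
        coeff = sum(signal[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        energy = abs(coeff) ** 2
        # Map DFT bin index to a signed frequency in cycles per degree.
        freq = (k if k <= n // 2 else k - n) / (n * dx)
        total += energy
        if abs(freq) < cutoff_cpd:
            low += energy
    return low / total


CARRIER_CPD = 2.0
CUTOFF_CPD = CARRIER_CPD / 2.0  # "low frequency" here means below half the carrier

narrow_band, dx = gabor(freq_cpd=CARRIER_CPD)
rectified = [abs(v) for v in narrow_band]  # full-wave rectification

fraction_before = low_frequency_energy_fraction(narrow_band, dx, CUTOFF_CPD)
fraction_after = low_frequency_energy_fraction(rectified, dx, CUTOFF_CPD)

print(f"low-frequency energy before rectification: {fraction_before:.4f}")
print(f"low-frequency energy after rectification:  {fraction_after:.4f}")
```

Before rectification almost none of the energy lies below the cutoff. After rectification most of it does, because the rectified carrier has a non-zero mean that is modulated by the envelope. That low-frequency component depends on the envelope rather than the carrier, which is consistent with a UDL that follows stimulus size.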

The UDL of the sustained stereo system also varies with target separation. The UDL increases linearly with the vertical separation of targets whose depth is discriminated (Tyler, 1973). This is similar to the upper disparity gradient limit for Panum's fusional area reported by Burt and Julesz (1980). The dependence of UDL on target separation undoubtedly serves the same function, which is to reduce the likelihood of false matches that are often associated with abrupt changes in disparity between nearby features.

Transient Stereopsis

When the UDL is measured with transient or brief stimuli with durations of less than 0.5 seconds, the UDL increases dramatically to more than 10 degrees of disparity (Blakemore, 1970; Richards & Foley, 1971; Westheimer & Tanzman, 1956). The direction of depth stimulated with these large disparities is easily discriminated but the magnitude is vague or qualitative (Ogle, 1952). Transient stereopsis does not appear to have the high degree of spatial selectivity for which sustained stereopsis is renowned. Large differences in shape or orientation between dichoptic stimuli can be tolerated by the transient stereo system (Mitchell, 1970) and vergence eye movements can be initiated by these shape-disparate stimuli (Jones & Kerr, 1972). However, the transient stereo system does have some degree of spatial selectivity. When tested with briefly presented (120 msec) narrow band Gabor patches, transient stereo responds to large 6 degree disparities subtended by matched spatial frequencies presented to the two eyes. The visibility of the stereo-depth decreases as spatial frequency is increased from 0.5 cpd to 5 cpd and the response is markedly attenuated with half octave interocular differences in spatial frequency or contrast (contrast paradox) (Edwards, Pope, Graf and Schor, 1997). Mixing a low and a high spatial frequency results in a performance level that is lower than that obtained with paired high frequencies. Thus transient disparities could be encoded by a single broad-band low-pass filter. The reduced performance with mixed spatial frequencies could result from the contrast paradox response to unequal strengths of different spatial frequencies when encoded by a single low-pass channel (Edwards et al, 1997).

Narrow band Gabor patches with large interocular differences in spatial frequency (1 and 5 cycles/deg) can also initiate transient vergence responses (Schor, Edwards, & Pope 1997) . Unlike the stereo system, lowering the spatial frequency of either eye's stimulus facilitates vergence and mixed low and high contrasts produce the same result as when contrasts are matched at the high level. These observations suggest that there is a single broad band spatial channel that encodes disparity for initiating vergence eye movements and that vergence is not affected by a contrast paradox (Schor, Edwards and Pope, 1997).

Occlusion Stereopsis

Prior to Wheatstone's hallmark publication in 1838, depth perception was thought to be based on monocular vision. Euclid (300 BC), Galen (175) and Leonardo da Vinci (1452) all noted that monocular information resulting from partial occlusion of the visual field could be used as a strong depth cue. Each eye has a unique view of the background on the temporal side of an occluder and these monocular views provide depth information (Figure 34). Usually, these monocular zones are regarded as problematic for solving the correspondence problem rather than as a useful source of information for the extraction of depth and direction. Matching the two eyes' images requires judgments about which images are seen monocularly and which binocularly. Most matching theories attempt to explain how the binocular images are matched, and the monocular images are considered a secondary problem. However, recent results show that the monocular images result in depth percepts similar to those produced by binocular disparity cues to stereopsis (Shimojo and Nakayama, 1990; 1992; Nakayama and Shimojo, 1990; Liu, Stevenson and Schor, 1994b; Anderson, 1994) (Figure 35). Thus the matching process needs to segregate monocular and binocular images to utilize these two different sources of depth information.

Discriminating Between Monocular and Binocular Features

Computationally, occlusion geometry is consistent with an abrupt depth discontinuity (Gillam and Borsting, 1988). Monocular features could be identified by searching in the vicinity of depth discontinuities or steep disparity gradients between binocular images. Conversely, depth discontinuities could be searched for in regions containing monocular images. The monocular region of the partially occluded background would be surrounded by binocular regions of the occluder and background and two adjacent left-right eye monocular regions would never occur in natural viewing (Geiger, Landendorf and Yuille, 1995). In addition, a partially occluded region would always be seen on the temporalward side of the binocularly viewed occluding surface (Shimojo and Nakayama 1990).

Occlusion Geometry

The visual system is very adept at discriminating between monocular regions that meet and violate these geometric constraints. For example figure 36 illustrates a display that presents a binocularly viewed occluding disk whose image in one eye is surrounded on the left and right by monocular regions which are either geometric-valid or geometric-invalid occluders. Valid monocular regions are seen on the temporal side of the occluder and invalid monocular regions are seen on the nasal side of the occluder. When the stereogram is fused, a stable background is seen in the geometric-valid temporal-monocular region and rivalry occurs between the background and geometric-invalid nasal-monocular region.
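The temporal-side rule described above can be written as a small lookup. This is only a restatement of the geometric constraint in code, assuming a single near occluder seen by both eyes. The function names and the "left"/"right" string labels are hypothetical, and the predicted percepts simply repeat the outcomes reported for the Figure 36 display.

```python
def temporal_side(eye):
    """Temporal side of the visual field: left for the left eye, right for the right eye."""
    sides = {"left": "left", "right": "right"}
    if eye not in sides:
        raise ValueError("eye must be 'left' or 'right'")
    return sides[eye]


def is_valid_monocular_region(viewing_eye, side_of_occluder):
    """A monocular region is geometrically valid only on the temporal side of the occluder."""
    if side_of_occluder not in ("left", "right"):
        raise ValueError("side_of_occluder must be 'left' or 'right'")
    return side_of_occluder == temporal_side(viewing_eye)


def predicted_percept(viewing_eye, side_of_occluder):
    """Outcome described in the text: stable background if valid, rivalry if invalid."""
    if is_valid_monocular_region(viewing_eye, side_of_occluder):
        return "stable background"
    return "rivalry"


for eye in ("left", "right"):
    for side in ("left", "right"):
        print(f"{eye} eye, region to the {side} of the occluder: {predicted_percept(eye, side)}")
```

A region seen only by the right eye is valid when it lies to the right of the occluder and invalid when it lies to the left; the pattern is mirrored for the left eye.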

Depth Ambiguity

Just as retinal image disparity provides insufficient information to make quantitative judgments of distance, shape and orientation of surfaces, the monocular image provides incomplete or ambiguous information regarding the depth-magnitude of targets in space. This problem is similar to the correspondence problem presented by Marr and Poggio (1979). The monocular image could be of a target lying at many possible depths and directions in space. Although the depth magnitude is ambiguous, it is not unlimited. It is bounded by the line of sight of the occluded eye and depth of the monocular image appears to lie at this boundary (Nakayama and Shimojo, 1990). Other solutions can be reached by constraining factors such as the proximity to other targets in the visual field (Gogel, 1965; Hakkinen and Nyman, 1996), vergence and version state of the eyes (Hering, 1861; Howard and Ohmi, 1992; Howard and Rogers, 1995), strength of monocular cues to perceived distance (Helmholtz,1909), sharpness of the occluder and background texture and their boundary (Marshall et al 1996), and the hemi-retinal location of the image (Hering, 1861; Kaye, 1978; Harris and McKee, 1996). Figure 35 is a unique demonstration of quantitative variations of depth resulting from occlusion cues because it lacks a reference binocular disparity of both the occluder and the background and it also lacks any of the above mentioned constraints. It is possible that binocular disparities of the foreground and background are derived from matches between dissimilar shapes seen by the two eyes that share a subset of common features (Liu, Stevenson and Schor, 1996). 
However, if conventional positional disparity detectors are responsible for extracting depth in partially occluded scenes, then this kind of stimulus should be equally efficient in driving vergence eye movements in both horizontal and vertical directions, since both horizontal and vertical vergence show robust tracking responses to conventional positional disparities. When vergence responses were compared between conventional disparities and horizontal and vertical occlusion cues, Liu et al (1995) found that vertical occlusion failed to drive vertical vergence whereas horizontal occlusion did drive horizontal vergence. These results indicate that depth-from-occlusion, or da Vinci stereopsis (Nakayama and Shimojo, 1990), may play an important role in depth perception without corresponding binocular features.

Binocular Suppression

In a complex 3-D scene, binocular information can be described in three categories. Some information is coherent, such as images formed within Panum's fusional areas, some is fragmented, such as partially occluded regions of space resulting in visibility to only one eye, and some is uncorrelated information that is either ambiguous or in conflict with other information, such as the superposition of separate diplopic images arising from objects seen by both eyes behind or in front of the plane of fixation. One objective of the visual system is to preserve as much information from all three sources as possible to make inferences about objective space without introducing ambiguity or confusion of space perception. In some circumstances conflicts between the two eyes are so great that conflicting percepts are seen alternately or in some cases one image is permanently suppressed. For example, the two ocular images may have unequal clarity or blur such as in asymmetric convergence, or large unfusable disparities originating from targets behind or in front of the fixation plane may appear overlapped with other large diplopic images. In the latter case the matching problem is exacerbated, particularly for transient stereopsis and vergence which do not have selective feature specificity (Schor, Edwards and Pope, 1997). The former case is dispensed with by permanent suppression of the blurred image while the latter condition is resolved with alternating views. As demonstrated above in Fig 36, occlusion geometry may constrain the solution to one of these two outcomes.

Four classes of stimuli evoke what appear to be different interocular suppression mechanisms. These include 1) unequal contrast or blur of the two retinal images which causes interocular blur suppression, 2) physiologically diplopic images of targets in front or behind the singleness horopter which result in suspension of one of the redundant images (Cline, Hofstetter and Griffin 1989), 3) targets of different shape presented in identical visual directions which cause an alternating appearance of the two images referred to as either binocular retinal or percept rivalry suppression and 4) partial occluders that obstruct the view of one eye such that the background is seen by the unoccluded eye and the overlapping region of the occluder is permanently suppressed.

Interocular Blur Suppression

There is a wide variety of natural conditions that present the eyes with unequal image contrast. These include naturally occurring anisometropia, unequal amplitudes of accommodation, and asymmetric convergence on targets that are closer to one eye than the other. This blur can be eliminated in part by a limited degree of differential accommodation of the two eyes (Maran and Schor, 1997) and by interocular suppression of the blur. The latter mechanism is particularly helpful for contact lens patients who can no longer accommodate (presbyopes) and who prefer to wear a near contact lens correction over one eye and a far correction over the other (monovision) rather than wearing bifocal spectacles. For most people, all of these conditions result in clear, non-blurred, binocular percepts with a retention of stereopsis (Schor, Landsman and Erickson, 1987), albeit with the stereo-threshold elevated by approximately a factor of two (see section above on unequal contrast and stereoacuity). Interocular blur suppression is reduced for high contrast targets composed of high spatial frequencies (Schor, Landsman and Erickson, 1987). There is an interaction between interocular blur suppression and binocular rivalry suppression. Measures of binocular rivalry reveal a form of eye dominance defined as the eye that is suppressed least when viewing dichoptic forms of different shape. When the dominant eye for rivalry and for aiming or sighting is the same, interocular suppression is more effective than when sighting dominance and rivalry dominance are crossed (Collins and Goode, 1994).


While binocular alignment of the eyes provides us with the opportunity to derive depth information from binocular disparities near the horopter, it has the disadvantage of producing large disparities for objects far in front of and behind the plane of fixation. These disparities contribute weakly to sustained depth perception and they introduce errors in perceived direction. Even though these disparities are well beyond the limits of Panum's fusional areas, they rarely evoke the perception of diplopia under normal casual viewing conditions. However, their suppression is not obligatory and physiological diplopia can be evoked by calling attention to the disparate target. The suppression of physiological diplopia is referred to as suspension (Cline, Hofstetter and Griffin 1989) because suppression does not alternate between the two images, but rather only one image is continually suppressed, leaving the target visible in the temporal visual field or the image formed on the nasal hemi-retina (Kollner, 1914; Crovitz and Lipscomb, 1963; Fahle, 1987). This mechanism may be involved in the permanent suppression of pathological diplopia in the deviating eye of individuals with strabismus (Schor, 1977; Schor, 1978; Fahle, 1987; Harrad, 1996).

Binocular Retinal Rivalry

Binocular rivalry is stimulated by non-fusable or uncorrelated ocular images that are formed in the vicinity of corresponding retinal regions such that they appear in identical visual directions. For example, when fusing the two orthogonal gratings shown in Fig 37, rivalry suppression takes on several forms. At times only one set of lines is seen and after several seconds the image of the other set of lines appears to wash over the first. At other times the two monocular images become fragmented into small interwoven retinal patches from each eye that alternate independently of one another. In the latter case, suppression is regional and localized to the vicinity of the contour intersections. The spread of suppression is demonstrated by the Helmholtz cross figure which, when fused, produces a halo percept about the points of intersection of the vertical and horizontal lines (Fig 38). Following a technique developed by Kaufman (1963), Liu and Schor (1994) presented two parallel band-pass spatially filtered lines to one eye and a single orthogonal line with the same spatial filtering to the other eye. The separation of the two parallel lines was increased to measure the largest gap that would totally suppress the overlapped region of the single line. Both the vertical and horizontal extents of the suppression zone increase with the spatial period or size of the lines, and the resulting suppression areas shown in Fig 39 were much greater than Panum's fusional area measured with the same targets. The difference in size of rival zones and Panum's area suggests that fusion is not likely to result from alternate suppression of the two ocular images as proposed by Verhoeff (1935) and others. The zone of rivalry also increases with image contrast up to about 30 percent. As image contrast is lowered to near threshold values, rivalry ceases and there is complete summation of the two ocular images (Fig 40) (Liu, Tyler and Schor, 1992).

The rivalrous patches alternate between the two ocular images approximately once every 4 seconds. The rate of oscillation and its duty cycle vary with the degree of difference between the two ocular images and the stimulus strength. The rate of rivalry increases as the orientation difference between the targets increases beyond 22 degrees (Schor 1977), indicating that rivalry is not likely to occur within cortical orientation columns. Levelt (1968) has formulated a series of rules describing how the on and off phases of rivalry vary with stimulus strength variables such as brightness, contrast and motion. His basic observation is that the duration for which a stimulus is suppressed decreases as its visibility or strength increases. If the strength of both eyes' stimuli is increased, the off-time is decreased for both eyes, and the rate of rivalry increases. Rivalry has a latency of approximately 200 msec so that briefly presented non-fusable patterns appear superimposed (Hering 1920); however, rivalry occurs between dichoptic patterns that are alternated rapidly at 7 Hz or faster, indicating an integration time of at least 150 msec (Wolfe 1983).
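The qualitative pattern of these rules can be made concrete with a toy calculation. The sketch below is not Levelt's formulation. It assumes, purely for illustration, that an eye's off-time is inversely proportional to the strength of its own stimulus, and that one eye's on-time equals the other eye's off-time. The 4 second baseline is chosen so that equal unit strengths give about one alternation every 4 seconds as described above; the strength units and function names are hypothetical.

```python
BASELINE_OFF_TIME_S = 4.0  # assumed off-time at unit strength (illustrative only)


def off_time_s(strength):
    """Toy rule: suppression (off) time shrinks as the eye's own stimulus gets stronger."""
    if strength <= 0:
        raise ValueError("strength must be positive")
    return BASELINE_OFF_TIME_S / strength


def rivalry_cycle(left_strength, right_strength):
    """Off-times for each eye and the resulting alternation rate.

    One full cycle is the left eye's off phase followed by the right eye's
    off phase, and it contains two perceptual alternations.
    """
    left_off = off_time_s(left_strength)
    right_off = off_time_s(right_strength)
    cycle = left_off + right_off
    return {
        "left_off_s": left_off,
        "right_off_s": right_off,
        "alternations_per_s": 2.0 / cycle,
    }


baseline = rivalry_cycle(1.0, 1.0)
stronger_left = rivalry_cycle(2.0, 1.0)   # only the left eye's stimulus is strengthened
stronger_both = rivalry_cycle(2.0, 2.0)   # both stimuli are strengthened

for name, result in [("baseline", baseline),
                     ("stronger left", stronger_left),
                     ("stronger both", stronger_both)]:
    print(name, result)
```

In this toy version, strengthening one eye's stimulus shortens only that eye's off-time, and strengthening both shortens both off-times and raises the alternation rate, which is the pattern the rules describe.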

When rivalrous and fusable stimuli are presented simultaneously, fusion takes precedence over rivalry (Blake and Boothroyd, 1985), and the onset of a fusable target can terminate suppression, although the fusion mechanism takes time (150-200 msec) to become fully operational (Wolfe, 1986; Harrad et al 1994). While suppression and stereo-fusion appear to be mutually exclusive outcomes of binocular stimuli presented in a given retinal location (Timney et al 1989; Blake and O'Shea, 1988), it is possible to perceive them simultaneously when their respective stimuli are presented in different spatial frequency bands (Julesz and Miller, 1975), with different shaped targets (Fig 41) (Ogle and Wakefield, 1967; Wolfe, 1986) or with unequal contrast produced by monocular blur (Schor, Landsman and Erickson, 1987).

Binocular Percept Rivalry

Most theories of binocular rivalry model it as a competition between monocular images (see review by Fox, 1991). However it is not clear if it is the retinal images or the ocular percepts that rival. As described above, in conventional rivalry between orthogonal grids, such as shown in Figure 38, rivalrous images can appear as coherent alternations of each eye's view or mixed complementary fragments of the two images. Both of these percepts may occur in response to a natural condition in which images of objects that lie in separate directions appear in the same visual direction, such as occurs when viewing extended objects lying behind the fixation plane. Under these conditions, the overlap of the disparate image components presents a conflict or "confusion" to the visual system that is solved by taking sequential-rivalrous views of the two images that appear in the same visual direction. When each eye's image can be perceived as a separate coherent form, the full ocular images can alternate or rival (Levelt, 1968; Walker, 1975). When the monocular images appear fragmented, they can rival as a piecemeal alternation between the fragments. Factors that determine image coherency vs. fragmentation include contour continuity, orientation, texture, color and depth (Nakayama et al 1989; Treisman and Gelade, 1980; Crick 1984). Thus depending on the composition of the two ocular images, it is possible for either whole images or fragmented images to rival.

The rivalry could be guided either by eye-of-origin information or by perceptually grouped attributes. These two factors can be dissociated by presenting a complementary interocular mixture or patchwork of two coherent images to the two eyes (Le Grand, 1967; Kovacs et al, 1997). Free fusion of the half-images in Figure 42 produces the same rivalry percept with the conventional stimulus at the bottom and with the mixed half-images of the top pair. In both cases, rivalry only affects parts of each ocular image in order to reconstruct a coherent figure, such that neither a monocular image nor one related to a particular cerebral hemisphere is dominant. If rivalry resulted exclusively from a competition between eyes, then the upper image pair would always appear piecemeal; however, it is clear that rivalry is between percepts, and not the individual retinal images. As discussed by Kovacs et al (1997), the illustration demonstrates that perceptual grouping can be derived interocularly as long as precise vergence aligns coherent image components on corresponding retinal points. There are also periods when the images do appear as fragmented combinations of the two percepts, especially when vergence becomes inaccurate, indicating that eye-of-origin information also influences perceptual dominance.

Interocular grouping of dichoptic images has also been demonstrated with rivalry for contour (Le Grand, 1967; Whittle, Bloor and Pocock, 1968), for motion (Beusmans, 1996) and for color (Treisman, 1962; Kulikowski,1992). Blurring edges of the monocular image fragments facilitates the demonstration (Kovacs et al, 1997) by increasing the size of potentially binocularly fragmented rival zones or patches (Liu and Schor, 1994) which reduces the likelihood that they will disrupt the larger image percepts.

Logothetis et al (1996) present additional evidence that supports the existence of perceptual rivalry. During the 2-3 second periods of on-time or visibility of one image during rivalry between orthogonal grids, such as shown in Fig 38, interchanging the two ocular images every 330 msec did not alter the longer dominance phase of a particular image orientation. Furthermore, increasing the contrast of one image orientation decreased its off-time even when it was rapidly switched between eyes. Thus the rivalry suppression followed the image percept, rather than a particular eye.

Piecemeal and whole image rivalry could both result from a perceptual competition that occurs after the point of binocular combination in the striate cortex, as is suggested by the phenomenon of monocular rivalry, in which orthogonal components of a grid seen by one eye appear to alternate in time (Breese, 1899; Bradley and Schor, 1988). There could also be at least two rivalry mechanisms, in which dominance is determined by either eye-of-origin information or perceptual grouping. However, rivalry that is based on eye of origin cannot explain the results presented in studies of whole percept rivalry.

Permanent-Occlusion Suppression:

There are many occasions in which the two eyes see dissimilar forms as a result of a partial occluder in the near visual field. Most often these are encountered when one eye's view of a distal target is partially obstructed by a nearby occluder such as our own nose, or when we view distal objects through a narrow aperture. Under these ecological or naturally occurring conditions we tend to consistently suppress the occluder and retain a constant view of the background. A striking demonstration of this is to hold a cylindrical tube before the right eye and place the palm of your hand before the left eye near the end of the tube. The combined stable percept is a hole in the hand. The hand is seen as the region surrounding the aperture through which the background is viewed. This is a generic or ecologically valid example of occlusion which gives priority to the background seen through the aperture, as described above in the section on occlusion stereopsis. This stable or permanent suppression of the center of the hand is unlike rivalry suppression in that it produces different changes in the increment-threshold spectral sensitivity function (Ridder et al 1992). Other natural conditions, such as the overlap of diplopic images of several objects behind the plane of fixation, do not conform to partial occlusion geometry and they do undergo rivalry. The dominance of the background may be influenced by low-pass blur of the out-of-focus occluder and the relative brightness of the background compared to the occluder. It remains to be seen what other depth ordering information, in addition to overlap, might result in permanent as opposed to rivalry suppression.
