We limit the review to contours in two dimensions. Such contours are called planar curves since they lie on a two-dimensional plane and bound a two-dimensional region.
Planar Curves
Curves on a plane either have an explicit representation on a Cartesian plane, a parametric form in u-v space, or an implicit version that models the space occupied by the curve.
If a point on the curve is defined as p(u), the curvature can be expressed mathematically through the second derivative, p''(u). Because a single variable, u, is used in the definition of the curvature, this is also known as the 1D definition of the contour. However, can one define curvature spatially and geometrically given only a set of discrete points?
Geometrically and spatially, the curvature is the inverse of the radius of the osculating circle. Three points that lie along a curve, p(u − Δu), p(u) and p(u + Δu), uniquely determine a circle that passes through them. As Δu approaches 0, this circle becomes the osculating circle. The line connecting the center of the circle to the point p(u) lies along the normal to the curve at p(u), and its length is the radius of the osculating circle. The curvature is the inverse of this radius. See Figure 1.
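The three-point construction can be tried numerically on discrete points. The following is a minimal sketch, not taken from the reviewed papers: it computes the inverse radius of the circle through three 2D points (often called the Menger curvature). The function names `menger_curvature` and `circle_point`, and the sample values, are assumptions made for illustration.

```python
import math

def menger_curvature(p0, p1, p2):
    """Inverse radius of the circle through three 2D points (0.0 if collinear)."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    # Twice the signed area of the triangle, from the 2D cross product.
    cross = (x1 - x0) * (y2 - y0) - (y1 - y0) * (x2 - x0)
    a = math.dist(p0, p1)
    b = math.dist(p1, p2)
    c = math.dist(p2, p0)
    if a == 0.0 or b == 0.0 or c == 0.0:
        raise ValueError("the three points must be distinct")
    # Circumradius R = abc / (4 * area), so 1/R = 4 * area / (abc) = 2|cross| / (abc).
    return 2.0 * abs(cross) / (a * b * c)

def circle_point(radius, angle):
    """Hypothetical test curve: a point on a circle centred at the origin."""
    return (radius * math.cos(angle), radius * math.sin(angle))

# Three neighbouring samples p(u - du), p(u), p(u + du) on a circle of radius 2.
radius, u, du = 2.0, 0.7, 0.05
kappa = menger_curvature(circle_point(radius, u - du),
                         circle_point(radius, u),
                         circle_point(radius, u + du))
collinear = menger_curvature((0.0, 0.0), (1.0, 1.0), (2.0, 2.0))
```

For samples taken from a circle, the estimate matches the inverse of that circle's radius for any spacing; for a general curve it only approaches the curvature at p(u) as the spacing shrinks.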
How does the human visual system form contours from everyday images on the retina? Is the system best understood as a mathematical definition of interconnected segments? Or is it a higher level cognitive process that builds a contour from curved or straight segments that cannot be represented in a very well defined formula?
When the eye sees a three-dimensional object, the object projects onto a 2D plane that is normal to the viewing direction, and its shape occupies a 2D region bounded by a 1D contour. See Figure 2. This definition of the contour also appears in the earlier work of Koenderink [7] (p. 172).
V1
Hubel and Wiesel [8,9], in their seminal papers, showed the presence of excitatory and inhibitory regions in the cells of the striate cortex. While the presence of concentric cells with antagonistic excitatory and inhibitory regions for light in the Lateral Geniculate Nucleus (LGN) was established by Kuffler [10], Hubel and Wiesel identified the physiology of cells that perform edge detection and line detection.
Hubel and Wiesel [8,9] showed that cells fire depending on the angle and direction of motion of a suitable visual stimulus presented on the retina.
For simple cells, if light fell on the excitatory region, the cell would fire above its threshold firing rate. If light fell on the inhibitory region, the cell would fire below its threshold firing rate. An inhibitory, excitatory, inhibitory arrangement of regions, or an excitatory, inhibitory, excitatory arrangement, constituted the physiology of a line detector. See Figure 3b,c for an edge detector and a line detector simple cell. Simple cells may vary in the size of their receptive fields. Complex cells are motion sensitive: they fire when lines and edges move across their receptive fields in a particular direction.
Simple and complex cells perform one-dimensional image operations such as edge operators. Two-dimensional operations such as contour detection are achieved by end-stopped cells, which we will discuss under V2.
End-stopped cells offer line-end, corner and curve segment detection. Hubel and Wiesel's study concerns how the area of the receptive fields of simple and complex cells changes with distance from the area centralis: as we move away from the fovea, the receptive fields grow larger. Psychophysical modelling of V1 cells is done using variants of the Gabor function. A Gabor function is a directional edge detector, modelled as a sinusoid modulated by a Gaussian kernel. One-dimensional feature extraction occurs via simple and complex cells [9]. As a result, Gabor functions are often used to extract the perceptually continuous path features that the physiology of V1 cells would extract, while taking eccentricity into account [1].
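To make the Gabor description concrete, here is a minimal sketch of a standard 2D Gabor function in its textbook form: a Gaussian envelope multiplied by an oriented sinusoid. The parameter names (`theta`, `wavelength`, `sigma`, `phase`, `gamma`) and the sample values are assumptions for illustration and are not taken from the cited studies.

```python
import math

def gabor(x, y, theta, wavelength, sigma, phase=0.0, gamma=1.0):
    """Gabor value at (x, y): Gaussian envelope times an oriented sinusoid."""
    # Rotate coordinates so the sinusoid varies along the direction theta.
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    carrier = math.cos(2.0 * math.pi * xr / wavelength + phase)
    return envelope * carrier

def gabor_kernel(half_size, theta, wavelength, sigma, phase=0.0):
    """Sample the Gabor on a (2 * half_size + 1)-wide square grid, indexed [row][col]."""
    span = range(-half_size, half_size + 1)
    return [[gabor(x, y, theta, wavelength, sigma, phase) for x in span] for y in span]

# phase = 0 gives an even (cosine, line-like) profile;
# phase = -pi/2 gives an odd (sine, edge-like) profile.
even = gabor_kernel(4, 0.0, 6.0, 2.0)
odd = gabor_kernel(4, 0.0, 6.0, 2.0, phase=-math.pi / 2)
```

The even kernel is symmetric about its centre and the odd kernel is antisymmetric, which loosely mirrors the line-detector and edge-detector arrangements of excitatory and inhibitory regions described for simple cells.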
V2
Neurophysiological studies indicate that curved segments are detected very early, in the region called V2. Lengthwise integration helps build a definition of orientation, but suppression in the orthogonal direction is needed to capture the definition of a curve. This is done through end-stopped cells [12,13,15,16]. With appropriate receptive fields that are small or oriented, they become selective for short line ends and corners.
Figure 4c shows the positioning of receptive fields that allows for very simple edges to be detected in V1 and more contoured representations in V4.
Heitger et al. [14] build a model for contour perception based on two principles. The first detects linear contrast borders. The second linearly aggregates occlusion features using the concept of end-stopped cells in the cortex. The image is first convolved with even- and odd-symmetric orientation filters. The original odd and even Gabor functions [17] are gradually turned into zero-mean functions by decreasing the frequency of the sine and cosine terms away from the center of the envelope. The filter outputs are combined to give an energy term. The energy term is then differentiated along the respective orientation by the single and double end-stopped operators, and the maxima are computed. The maxima represent curvature by defining inflection points such as those seen at strong curvatures or corners.
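The energy-then-differentiate idea can be illustrated with a toy example. This is only a sketch under simplifying assumptions and is not the Heitger et al. implementation: it uses plain (not zero-mean) even and odd Gabor filters, a synthetic image containing one horizontal bar that ends mid-image, and a forward difference along the bar as a stand-in for a single end-stopped operator. All names and values are hypothetical.

```python
import math

def gabor_pair(dx, dy, wavelength, sigma):
    """Even (cosine) and odd (sine) Gabor values; the carrier varies along y."""
    envelope = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))
    arg = 2.0 * math.pi * dy / wavelength
    return envelope * math.cos(arg), envelope * math.sin(arg)

def local_energy(image, row, col, half, wavelength, sigma):
    """sqrt(even^2 + odd^2) of the two filter responses centred on (row, col)."""
    even = odd = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            g_even, g_odd = gabor_pair(dx, dy, wavelength, sigma)
            pixel = image[row + dy][col + dx]
            even += pixel * g_even
            odd += pixel * g_odd
    return math.hypot(even, odd)

# Synthetic image: a 3-pixel-thick horizontal bar that stops at column BAR_END.
HEIGHT, WIDTH, HALF = 21, 41, 5
BAR_ROW, BAR_END = 10, 20
image = [[0.0] * WIDTH for _ in range(HEIGHT)]
for r in (BAR_ROW - 1, BAR_ROW, BAR_ROW + 1):
    for c in range(0, BAR_END + 1):
        image[r][c] = 1.0

# Energy along the bar, restricted to columns where the kernel fits in the image.
cols = list(range(HALF, WIDTH - HALF))
energy = [local_energy(image, BAR_ROW, c, HALF, 8.0, 2.0) for c in cols]

# Forward difference along the bar orientation; its largest magnitude marks the line end.
slope = [energy[i + 1] - energy[i] for i in range(len(energy) - 1)]
steepest = max(range(len(slope)), key=lambda i: abs(slope[i]))
line_end_col = cols[steepest]
```

In this toy setting the energy is high along the bar and falls to zero beyond it, so the steepest change along the bar's orientation sits at the last bar column. A second difference would play the role of a double end-stopped operator in the same spirit.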
Hegdé and Van Essen studied responses to 128 grating and geometric line stimuli that varied in shape characteristics and complexity. Stimuli were oriented bars, sinusoidal bars, angles, arcs, intersecting lines, and hyperbolic and polar gratings. The stimuli were organized in ten stimulus classes: (1) bars (2) three-way intersections (3) crosses (4) five- and six-armed stars (5) acute angles (6) right angles (7) obtuse angles (8) quarter arcs (9) semi-circles (10) three-quarter arcs. The experiment asked the following questions: (1) Did V2 cells prefer simple or complex shapes? (2) Is the shape characteristic captured by orientation, size and spatial frequency alone? (3) What is the distribution of V2 cells that encode the characteristic? One set of V2 cells showed narrow shape selectivity and was particularly selective to geometric shape and orientation taken together. These cells did not fire for the component features alone, such as the orientation or the shape on its own.
For example, if a cell fired for a right angle oriented in a particular direction, it would not fire for right-angled stimuli oriented in other directions, nor for intersections containing right angles. The preference for acute angles is important from a perceptual perspective since they are often present in corners and occluding contours. Another set of V2 cells showed preferences for arcs and circles, but only if they were significantly large. V2 showed a preference for a broad set of large curved contours. See Figure 6b.
Grating stimuli, rather than simple shape stimuli, were found to be more effective in V2 than V4. The V2 area showed responsiveness even with modulated complex shape characteristics. The fact that V2 is able to represent complex shape information is remarkable considering that this area is not far removed from V1. An example of the stimuli presented to V2 cells is shown below.
Receptive fields increase across the hierarchies as the feed forward connections intensify anatomically [
20]. V4 had a marked affinity to large or small right angles at
orientation whereas V2 showed a preference for large angles but at different orientations [19]. V2 preferred larger arcs but no intersections, and this was markedly different from V4. The most effective stimuli for both groups are shown in red. See Figure 6.
An interesting question that arises is how selectivity passes from V2 to V4 for further shape construction if the stimuli are not constituents in V2.
At any particular eccentricity, the average receptive field diameter doubles between V1 and V2 and again between V2 and V4 [25,30]. Likewise, a preference for the same shape at multiple levels of the hierarchy does not imply redundancy across the levels. The selectivity for complex contour shapes in addition to arcs and circles [19] can be explained by possible lateral connections, by top-down information flow [31], or by the organization of the receptive field itself [21,32], which allows for variations in stimulus preference and long-range horizontal connections in V1.
Figure 7 demonstrates how a set of neighbouring receptive fields can be non-overlapping, with preferences for orthogonal orientations, and yet their membership in the same cortical column may allow a stimulus falling outside their receptive fields to influence them. These influences are often termed long-range horizontal connections. These connections allow for perceptual contour closure even when there are gaps between the segments.
There is no precise topological configuration that perfectly explains shape processing in the ventral stream. Perceptual closure depends anatomically on the cortical column, and the retinal field of view can be probed further for location-dependent contour formation. Psychophysical experiments can help us understand this curvature formation at differing positions in the visual field.
V4
V4 is the biological structure responsible for putting together the input that feeds into the IT, the inferior temporal cortex, which then performs the task of object recognition.
V4 has been experimentally shown to display a marked preference for contour features such as angles and curves that point in a particular direction. V4 prefers convexity over concavity in curves [26] and shows multiplication in curvature processing [34,35].
Pasupathy and Connor [2,27,28] designed a large set of contour features, curves and angles, and recorded responses from 152 V4 cells in awake macaques. A small set of those stimuli is presented below in (a), (b) and (c). Each stimulus presented a single contour feature such as an angle or a straight edge. The experiment found that many V4 cells, even while selective for complex features, were also selective for their low-order constituent contour features such as angles and straight edges. Interestingly, this is in contrast to what was suggested by [20], where the stimuli that showed great neuronal selectivity did not have more complex superset stimuli fire for the same features.
Stimuli were rendered as a function of three variables: convexity, curvature and acuteness. Convexity was represented as convex projections, concave indentations or outlines. Curvature was represented as sharp angles or smooth B-spline approximations to that outline. Acuteness was represented as angles of 45°, 90°, 135° or 180°. Each stimulus was a function of the above three parameters and was presented at the center of the receptive field (RF), which was estimated to be approximately
based on the receptive field specifications studied by Gattass et al. [30].
In the set shown in Figure 8 (a,b,c), the angle featured was 90°. The angle pointed towards the right and was either filled in for the convex representation, hollowed out for the concave representation, or drawn as a mere outline. The stimuli were white and presented against a dark grey background.
The responses showed a bias for convexity over concavity, and the response was strongest when the convex feature was oriented between 135° and 180°. Sharp features were preferred over smooth features, and acute features were preferred over obtuse features. See Figure 9. The stimulus set suggests that angular acuity is a measurable quantity distinct from line and edge orientation, because certain lines and edges that would not normally elicit a response do so when they are part of a curvature. This is confirmed by several psychological studies [26,27,28,36,37].
Pasupathy and Connor further extended the set by combining convex and concave boundary elements into closed shapes [27,28]. Tuning for a particular contour feature was captured by Gaussian functions operating on a curvature and position domain.
The stimuli had four parameters that uniquely defined them: curvature, orientation, angular position and radial position. The curvature-based tuning function fit at two or three curvature values, suggesting a parts-based approach to shape selectivity in V4, with a preference for acute convex or concave curvature and for a convex angle next to a concave curve.
A single one-dimensional Gaussian function with peak μ and standard deviation σ recorded the response to a single characteristic of the stimuli. The response function was fit by products of Gaussians that recorded individual characteristics.
Each stimulus was described by p points in an n-dimensional space with k being the amplitude of the n-dimensional Gaussian.
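As a rough illustration of a Gaussian-product tuning function, the sketch below multiplies one 1D Gaussian per stimulus dimension and scales the result by an amplitude k. Taking the maximum over the p points of a stimulus is an assumption made here for illustration, as are the function names, the two chosen dimensions (curvature and angular position in degrees) and all numeric values. Angular position is treated as a linear variable for simplicity, which ignores its circular nature.

```python
import math

def gaussian_1d(x, mu, sigma):
    """Unnormalized 1D Gaussian with peak at mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def point_response(point, peaks, widths, amplitude=1.0):
    """Amplitude k times the product of 1D Gaussians, one per dimension."""
    response = amplitude
    for x, mu, sigma in zip(point, peaks, widths):
        response *= gaussian_1d(x, mu, sigma)
    return response

def stimulus_response(points, peaks, widths, amplitude=1.0):
    """Assumed pooling rule: the best-matching point drives the response."""
    return max(point_response(p, peaks, widths, amplitude) for p in points)

# Hypothetical tuning: sharp convex curvature (0.8) near 90 degrees.
peaks = (0.8, 90.0)
widths = (0.3, 30.0)
k = 25.0  # hypothetical amplitude

# A hypothetical stimulus described by p = 3 points in (curvature, angular position).
stimulus = [(0.1, 0.0), (0.8, 90.0), (-0.4, 200.0)]
best = stimulus_response(stimulus, peaks, widths, amplitude=k)
off_peak = point_response((0.5, 60.0), peaks, widths, amplitude=k)
```

Because the model is a product, a point must match the preferred value in every dimension to produce a strong response; a good match in curvature alone is attenuated by a poor match in position.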
Yau et al. [35] argue that simultaneous multi-orientation inputs trigger recurrent neural networks that synthesize curvature. Without the recurrent network in place, the original line segments themselves do not excite a non-linear threshold model. Line segments at orientations of 45°, 90° and 135° were used and combined with B-spline approximations to form new curved segments.
Figure 11. [33] (p. 2) (a) Orientation inputs that fail to initiate the recurrent network process for curvature synthesis. (b) Orientation inputs, administered simultaneously, initiate recurrent network processes and generate a tuning curve for orientation components, as seen in (c).
El-Shamayleh and Pasupathy [29] show that V4 neurons are scale-invariant by using stimuli at differing scales that have the same normalized curvature. Normalized curvature is described as the rate of change in tangent angle per unit of angular length. Absolute curvature is described as the rate of change in tangent angle with respect to contour length. When the shape scales up or down with object size, the normalized curvature remains the same while the absolute curvature changes. Most neurons maintained their shape selectivity across size changes, which is explained by normalized curvature. A small proportion showed a shift in shape selectivity when the object size changed.
Figure 12. [29] (p. 2) (a) Absolute curvature. (b) Normalized curvature.
In earlier papers, Pasupathy and Connor [26,28] showed the selectivity of V4 neurons for certain local contour curvatures. El-Shamayleh and Pasupathy [29] extended the experiments by presenting the stimuli at different scales within the receptive field. Scale differentiates the normalized and absolute curvature definitions. The responses from their experiment show that V4 encodes objects in a size-invariant manner, and that the normalized curvature definition explains the responses observed in monkeys better than a model that encodes absolute curvature.
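The difference between the two curvature definitions can be checked numerically on a discretized contour. The sketch below is an illustration under stated assumptions, not the procedure used in [29]: the contour is a hypothetical ellipse sampled as a closed polygon, the object centre is assumed to be the origin, and angular length is measured as the change in polar angle about that centre.

```python
import math

def wrap(angle):
    """Wrap an angle to the interval (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def curvatures(points, i):
    """Return (absolute, normalized) curvature estimates at vertex i of a closed polygon."""
    n = len(points)
    (xa, ya), (xb, yb), (xc, yc) = points[i - 1], points[i], points[(i + 1) % n]
    # Change in tangent angle between the incoming and outgoing edges.
    turn = wrap(math.atan2(yc - yb, xc - xb) - math.atan2(yb - ya, xb - xa))
    # Contour length associated with the vertex (mean of the two edge lengths).
    arc = 0.5 * (math.hypot(xb - xa, yb - ya) + math.hypot(xc - xb, yc - yb))
    # Angular length about the assumed object centre at the origin.
    angular = 0.5 * wrap(math.atan2(yc, xc) - math.atan2(ya, xa))
    return turn / arc, turn / angular

def ellipse(scale, n=360):
    """Hypothetical stimulus outline: a 2:1 ellipse scaled uniformly about the origin."""
    step = 2.0 * math.pi / n
    return [(scale * 2.0 * math.cos(k * step), scale * math.sin(k * step)) for k in range(n)]

# Same shape at two sizes, evaluated at the sharp end of the ellipse (vertex 0).
abs_small, norm_small = curvatures(ellipse(1.0), 0)
abs_large, norm_large = curvatures(ellipse(2.0), 0)
```

Doubling the object size halves the absolute curvature at the same contour location, while the normalized curvature is unchanged, which is the property that lets a normalized-curvature code remain selective for shape across size changes.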
An important feature of stimuli construction is the identification and boundary estimation of the receptive field. In Gattass et al. [
30], the diameter of the receptive field in the V4 region is defined as
Carlson et al. [38] provide a synthetic evolutionary model that emulates sparse coding. The model adds control points during each generation, building new stimuli from Bézier splines, and generates strong population neural responses in V4 as it progresses. The results from the simulations are compared to neural recordings from 165 V4 cells in monkeys. Higher responses are recorded in V4 for contours containing acute convex or acute concave curvature. This preference for acuteness becomes more defined as the sparseness constraints increase. The stimulus set is shown below.
Figure 13. [38] (a) Set of stimuli constructed using incrementally added Bézier splines. The left shows the neural spike responses for first-generation stimuli. The right shows the neural spike responses for seventh-generation stimuli.
Summarizing, the neural spikes, if they represent linearly independent traits, can be combined through a Gaussian product. The response can be made to fit an artificial neural network. However, state-of-the-art synthetic neural networks are often too complex to fit weights to. We often study the responses from these systems rather than designing them from scratch.