Looking Into Pictures
edited by
Heiko Hecht
Robert Schwartz
Margaret Atherton
A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England
© 2003 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any
electronic or mechanical means (including photocopying, recording, or informa-
tion storage and retrieval) without permission in writing from the publisher.
This book was set in Times New Roman on Quark by Asco Typesetters, Hong
Kong.
Printed and bound in the United States of America.
10 9 8 7 6 5 4 3 2 1
Contents
Acknowledgments
Introduction
PART I
THE DUAL NATURE OF PICTURE PERCEPTION
Chapter 1
In Defense of Seeing-In
Richard Wollheim
Chapter 2
Conjoint Representations and the Mental Capacity for Multiple Simultaneous Perspectives
Rainer Mausfeld
Chapter 3
Relating Direct and Indirect Perception of Spatial Layout
H. A. Sedgwick
Chapter 4
The Dual Nature of Picture Perception: A Challenge to Current General Accounts of Visual Perception
Reinhard Niederée and Dieter Heyer
Chapter 5
Perceptual Strategies and Pictorial Content
Mark Rollins
PART II
THE STATUS OF PERSPECTIVE
Chapter 6
Optical Laws or Symbolic Rules? The Dual Nature of Pictorial Systems
John Willats
Chapter 7
Perspective, Convention, and Compromise
Robert Hopkins
Chapter 8
Resemblance Reconceived
Klaus Sachs-Hombach
Chapter 9
What You See Is What You Get: The Problems of Linear Perspective
Klaus Rehkämper
Chapter 10
Pictures of Perspective: Theory or Therapy?
Patrick Maynard
PART III
THE NATURE AND STRUCTURE OF RECONCEIVED PICTORIAL SPACE
Chapter 11
Reconceiving Perceptual Space
James E. Cutting
Chapter 12
Pictorial Space
Jan J. Koenderink and Andrea J. van Doorn
Chapter 13
Truth and Meaning in Pictorial Space
Sheena Rogers
Chapter 14
Line and Borders of Surfaces: Grouping and Foreshortening
John M. Kennedy, Igor Juricevic, and Juan Bai
Chapter 15
Irreconcilable Views
Hermann Kalkofen
References
Contributors
Index
Introduction
Once Kepler correctly characterized how light reflected from objects pro-
jects a pattern of rays on the retina, a close relation between pictorial space
and perceived physical space became a guiding presupposition of most
work in the theory of vision. Given that the stimulus for vision is an
image, a picture of the world the light rays “paint” on the retina, percep-
tion of the spatial layout must somehow depend on processing this image.
We see the world through, or on the basis of, “picture images.” The link
between pictorial and physical space became even tighter as a result of the
Renaissance development of systems for perspective rendering. Pictures
constructed according to the rules of perspective seemed to mirror, and
hence resemble, what they depicted. In the case of trompe l’oeil, viewers
could not even tell the difference between the two. Perspective pictures
were thought, therefore, to provide a correct, perhaps the only correct,
way to represent physical space pictorially.
If any of this was in doubt, the subsequent invention of the camera
clinched the case for allying pictorial and perceived space. Photographs,
it was widely thought, represent space as it “really” is. Moreover, the
similarities between the eye and the camera are striking. As the camera
focuses light through a pinhole and lens onto a flat piece of film, so the
retinal image is a projection of the world through a lens and pinhole (iris)
onto the two-dimensional retinal surface. In both cases the projections
follow the path of linear perspective. Thus, the “eye as camera” metaphor
was born, and it came to dominate and set the problematic for the study
of perception. As influentially formulated by Hermann von Helmholtz
(1894) in the late nineteenth century, the goal of perceptual psychology
was to find the rules of inference our visual system uses to derive a unique
3-D interpretation from the 2-D retinal image.
However one settles the theoretical issues in parts I and II, there remains
the practical problem of applying the answers in the conduct of experi-
mental research. Alternative conceptions of pictorial representation and
pictorial space structure the problem space, the evidence sought, and the
methods of collecting, measuring, and interpreting the data. Just how
these matters play out will, of course, vary somewhat with the topic of
concern (e.g., size, shape, and distance recognition). The chapters in part
III explore several core empirical issues against the background of con-
cerns discussed in the previous parts.
James Cutting’s chapter, chapter 11, provides a transition between parts
II and III. Cutting proposes solutions to a variety of theoretical questions
about perception and perspective. He suggests that all perceived space has
an inherent ordinal metric, thereby reuniting the dual aspects of pictures
and creating an elegant synthesis of direct and indirect perception. By
distinguishing between close personal space and a farther vista space,
Cutting puts pictures in their (vista) place, which amounts to a perceiver
mode where would-be spatial distortions are no longer an issue. He also
offers us a new explanation of why line drawings work as well as they do.
In chapter 12, Jan Koenderink and Andrea van Doorn offer an in-depth
analysis of the problem of measuring pictorial space, exploring along the
way why common alternative attempts to measure a space that is non-
physical have gone wrong. Their own nonreductionist approach is uncon-
ventional and constructive. Koenderink and van Doorn first discuss their
We hope that the reader will not only enjoy the diverse discussion of pic-
torial space found in this collection, but that the volume’s chapters will be
an inspiration to continue the theoretical and empirical work advanced.
PART I
THE DUAL NATURE OF PICTURE PERCEPTION
Chapter 1
In Defense of Seeing-In
Richard Wollheim
Figure 1.1
Edouard Manet, Emilie Ambre. Philadelphia Museum. See plate 1 for color
version.
Figure 1.2
Aaron Siskind, Chicago 1948.
Figure 1.3
Alexandre I. Leroy de Barde, Reunion of foreign birds placed within different cases.
Nineteenth-century watercolor. Inv.: 23692. Photo: Michèle Bellot. Copyright
© Réunion des Musées Nationaux / Art Resource, NY. Louvre, Paris, France.
Figure 1.4
Barnett Newman, Vir Heroicus Sublimis. 1950–51. Oil on canvas, 7′ 11 3/8″ × 17′ 9 1/4″
(242.2 × 513.6 cm). The Museum of Modern Art, New York. Gift of Mr. and
Mrs. Ben Heller. Photograph © 2001 The Museum of Modern Art, New York.
Figure 1.5
Domenico Ghirlandaio (1448–94). Portrait of an old man and his grandson. Copy-
right © Alinari / Art Resource, NY. Louvre, Paris, France.
limited in this respect: that it is out of the things that can be seen in his
picture that his intention determines what the picture represents. If some-
thing cannot be seen in a painting, it forfeits all chance of being part of its
content.
So the question arises, How can we determine, for a given picture,
whether something can be seen in that picture? What are the limits of
visibility in a picture?
The question is clearly ambiguous between an extensional and a nonex-
tensional reading. Since there is little doubt but that when we ask whether
Figure 1.6
Nicolas Poussin, French, 1594–1665, Landscape with Saint John on Patmos, 1640,
oil on canvas, 100.3 × 136.4 cm, A. A. Munger Collection, 1930.500 © The Art
Institute of Chicago. All rights reserved.
tion, whereas with the rest, such as Manet’s La Prune (figure 1.7), there is
no answer to this question. To generalize, some pictures are of particular
things or events, and others are of things or events merely of some partic-
ular kind.
The explanation for why we classify the two paintings in different ways
is not that there is in reality such a person as Madame Brunet and that
Manet painted his painting by painting her, whereas there is no particular
person whom La Prune represents, though all this is in fact true. For, in
the first place, I believe (contra Nelson Goodman) that we should put
Jean-Auguste-Dominique Ingres’s Jupiter and Thetis in the same category
as Madame Brunet even though Jupiter and Thetis are not real persons.
Second, to invoke this kind of explanation at this stage would be to go
Figure 1.7
Edouard Manet, La Prune (Plum Brandy), c. 1877, oil on canvas, .736 × .502
(29 × 19 3/4); framed: .876 × .641 × .057 (34 1/2 × 25 1/4 × 2 1/4). Collection of Mr. and
Mrs. Paul Mellon. Photograph © 2001 Board of Trustees, National Gallery of
Art, Washington.
Our straw man goes wrong in locating the explicandum of his inquiry
where he does. For, when we look at, say, Emilie Ambre, it is not the case
that we have an experience of the sort that would, in the absence of ancil-
lary evidence to the contrary, lead us to believe that we were seeing Emilie
Ambre face to face. But might our straw man not yet be right in what he
takes to be the explicans? He might be. It might be the case that, whenever
we saw something in a marked surface, we were led to do so by a prior
experience of the surface for the duration of which we inhibited ourselves
from seeing something in it. But I am skeptical that this is so: I cannot
readily believe that every occasion on which we see something in a marked
surface is preceded by our seeing that surface without seeing anything in
it. So the latter kind of experience cannot furnish an explanation of the
former kind.
A different kind of explanation of why we see something in a marked
surface, though the two can get confused, is an appeal not to some way in
which we see the surface but to some way in which the surface is. The way
the surface is, in such cases, is not necessarily something of which we are
aware, but, unless it were as it is, we would not see in it what we do.
It is sometimes argued that, in any account of what we see in a surface,
there is no room for any feature of the surface to play a role unless this
feature is itself visible. But this betrays a confusion between two kinds of
account. For it is certainly the case that in an analysis of what it is to see
something in a marked surface, a feature of the surface of which there is
no awareness is out of place. This is because an experience can only be
analyzed in terms of constituent experiences. But when it comes to an
explanation of what it is to see something in a surface, the situation is
different. For we can explain an experience in terms of other experiences,
but also in terms of features of the environment to which the experience is
sensitive, even though the person might have no awareness of them as
such. In the second case, the explanation does not double as an analysis.
Rainer Mausfeld
of the cognitive system, we can encounter phenomena that are likely due
to the internal handling of multiple conjoint and often competing repre-
sentations. From this point of view, picture perception and its dual char-
acter are a special instance where we exploit these given capacities in the
context of human artifacts. Before arguing for this view in greater detail,
I will briefly delimit the topic of my inquiry.
In picture perception, more than, say, in research on stereo vision or
color coding, the tension looms large between what are to be considered
universal and what cultural and conventional aspects, a tension that mir-
rors and maintains the time-honored distinctions that placed physis and
ethos, nature and convention, essential and accidental properties in well-
nigh irreconcilable opposition to one another. The corresponding issues
are a matter of much debate between—to use some fashionable jargon—
universalists and cultural relativists. Outside some areas of vision and
language, little of substance is presently known about where to draw a
dividing line between universal aspects and aspects of cultural variation,
individual plasticity or learning history. However, the entire idea of cog-
nitive science rests on the assumption that some such distinction can be
drawn at all. This also holds for inquiries centering on picture perception.
The perception of pictures, on the basis of multiple pictorial components
of very different status, involves highly complex interactions of our per-
ceptual faculty and various interpretative faculties, which are presently
not well understood. These interactions give rise to a high degree of cul-
tural variation. I shall deliberately ignore the cultural dimension of picture
perception and, with respect to the so-called dual character of pictures,
focus on structural elements of perception that seem to be part of our
basic cognitive endowment.
to achieve “visual truth” in their paintings. This idea gave rise to related
inquiries into artistic techniques for the evocation of space and, in partic-
ular, into techniques for creating geometrically correct two-dimensional
pictorial representations on a canvas of the three-dimensional layout of
the pictured scene. In such investigations, as Martin Kemp observed, “the
eye figures little, the mind features even less” (1990, p. 165). Rather, what
was to be accomplished was “the demonstration of an internally consis-
tent system of the spatial elements in a picture and, above all, a proof that
the system rested upon non-arbitrary foundations” (ibid., p. 11). The can-
vas was regarded as a window, often referred to today as an “Alberti win-
dow,” through which the painter views the world and that intersects the
painter’s visual cone (Lindberg 1976). This conception gave rise to the
idea that a realistic appearance of depth and space can be achieved in pic-
tures by mimicking the exact geometrical relations in the structure of light
that reaches the eye from a three-dimensional scene. Consequently, a sys-
tem of construction rules, in the sense of artistic-engineering techniques
for the purpose of creating, on a flat canvas, pictorial representations that
induce a strong appearance of depth in the observer, gained prominence
in Renaissance art.2 Although these artistic techniques later joined with
ideas on geometrical processes of image formation in the eye, their use and
development were primarily shaped by considerations internal to the com-
plex variety of cultural purposes underlying artistic productions. However,
for the endeavor to imitate nature and to achieve visual truth in two-
dimensional representations of the world, the importance of rules for lin-
ear perspective is on a par with those for simulating the effect of lights
and the interaction of light and objects by using spatial pigment patterns
on a flat surface (Schöne 1954). It is a historically contingent development
of art history that resulted in linear perspective, rather than other aspects,
first gaining prominence in the context of artistic attempts to imitate
nature.
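The geometry of the Alberti window can be made concrete with a small sketch. Assume, purely for illustration, an eye at the origin looking along the z-axis and a picture plane at distance d in front of it; a scene point (x, y, z) is then drawn where its line of sight crosses the plane, at (d·x/z, d·y/z). The function name, the coordinate convention, and the numbers below are assumptions of this sketch, not a reconstruction of any Renaissance construction method.

```python
def project_to_window(point, d=1.0):
    """Central projection of a 3-D point onto a picture plane at distance d.

    The eye sits at the origin and looks along +z (an assumed convention).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the eye (z > 0)")
    return (d * x / z, d * y / z)


# Two parallel floor edges receding in depth (hypothetical scene):
# left edge at x = -1, right edge at x = +1, floor 1 unit below the eye.
depths = [2.0, 4.0, 8.0, 16.0]
left_edge = [project_to_window((-1.0, -1.0, z)) for z in depths]
right_edge = [project_to_window((1.0, -1.0, z)) for z in depths]

# Horizontal gap between the two edges on the canvas at each depth.
gaps = [r[0] - l[0] for l, r in zip(left_edge, right_edge)]
print(gaps)
```

In this sketch the gap between the two edges halves each time the depth doubles, so lines that are parallel in the scene converge on the canvas toward a single point, which is the familiar vanishing point of linear perspective.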
The notion of a dual character of pictures basically refers to the phe-
nomenon that pictures can generate an in-depth spatial impression of the
scene depicted while at the same time appearing as flat two-dimensional
surfaces hanging on a wall. Albert Michotte (1948/1991) recognized the
challenge that this kind of phenomenon poses for perception theory.3 A
description in terms of a perceptual conflict between the perceived flat-
ness of the picture’s surface and the perceived depth of what is depicted
captures only a small fraction of the perceptual enigmas involved. By
careful phenomenological observation when viewing pictures, one can
21 Conjoint Representations
The lines along which we can theoretically explore the dual character of
pictures considered as a case of cue integration are comparatively well
explored in visual psychophysics. Although problems of cue integration
are intricate enough, we have both a wealth of experimental results and
vividness of the scene depicted to zero, that is, a flat picture without
any indication of surface orientation or relative 3-D locations of objects.
Interestingly enough, our visual system uses almost the opposite strategy:
it is a well-known result from visual psychophysics that monocular cues
often dominate the resulting 3-D interpretation over stereopsis, even at
close range where stereopsis is most accurate. I will mention only a
few corresponding studies. For instance, in a pioneering study, Walter
Schriever (1925) found that perspective alone as well as occlusion could
overrule disparity information. Schriever also made a wealth of care-
ful observations about attentional effects and vagueness and instability,
as well as individual differences. Mümtaz Turhan (1937) found that in
center-surround situations, brightness gradients of opposite direction can
result in perceptual impressions that violate the physical depth relations
of infield and surround as provided by disparity information and motion
parallax (infields whose brightness gradient has the same direction as that
of the surround appear to lie in the same depth level as the surround,
whereas they appear to lie in a different depth plane and often look bent
in the case of opposite brightness gradients). Turhan observed that phys-
ically incompatible depth interpretations of the infield can occur at the
same time and often are accompanied by some kind of vagueness of the
perceptual impression.
John Yellott (1981) showed that an inside-out face—a mold of a face—
looks right side out as long as shading is present, despite the presence of
contradictory disparity information as provided by a random-dot stereo-
gram projected onto the surface of the mask. If the mold is presented
solely as a random-dot stereogram with no shading, it is seen inside
out, that is, consonant with the disparity information. Another interest-
ing example is provided by Kvetoslav Prazdny (1986), who described a
random-dot stereo cinematogram that portrays a flat object in front of
a background that changes its two-dimensional shape consistent with a
three-dimensional rotating wire object, while at the same time the binocu-
lar disparities are incompatible with the relative depth information speci-
fied by the image motions. Due to the appearance of three-dimensionality
in these displays he concluded that the kinetic depth effect effectively
vetoes the stereo disparity cue with respect to the shape of the object
(however, disparity determined the position of the object with respect to
the background).
A particularly effective demonstration of the monocular influence over
stereo information was provided by Kent Stevens and Allen Brookes
Figure 2.1
Stimulus configuration used by Stevens and Brookes (1988). The lines are stereo-
scopically presented as being coplanar, that is, they increase linearly in disparity
from left to right. The 3-D impression, however, is of a corridor extending in
depth, as suggested monocularly.
(1988; see also Stevens, Lees, and Brookes 1991). The lines in the stereo-
scopically presented figure 2.1 are coplanar, that is, they increase linearly
in disparity from left to right. The 3-D impression, however, is of a corri-
dor extending in depth, bordered on either side by columns of vertical
lines or stakes. In the stereo-apparatus the innermost lines on either side
of the vertical meridian had stereo disparities of ±1.1 minutes of arc; the
outermost lines had disparities of ±5.1′. It is remarkable that the line
with −1.1′ disparity appeared more distant than the line of disparity +5.1′.
Stevens and Brookes found empirical evidence that stereopsis extracts
3-D surface information only where the second spatial derivatives of dis-
parity are nonzero, corresponding to loci where the surface is curved,
creased, or discontinuous.
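The criterion reported by Stevens and Brookes can be illustrated numerically. The sketch below assumes evenly spaced sample positions and uses simple finite differences as a stand-in for the second spatial derivative; it compares a disparity profile that increases linearly (a planar surface) with one that contains a crease. The profiles and the function name are hypothetical and are not taken from their study.

```python
def second_differences(values):
    """Discrete second derivative of evenly spaced samples."""
    return [values[i - 1] - 2 * values[i] + values[i + 1]
            for i in range(1, len(values) - 1)]


# Disparity (arbitrary units) sampled left to right across the display.
planar = [-4, -2, 0, 2, 4]       # linear gradient: a flat, slanted plane
creased = [-4, -2, 0, -2, -4]    # slope reverses at the centre: a crease

print(second_differences(planar))   # zero everywhere
print(second_differences(creased))  # nonzero only at the crease
```

On this reading, a uniform disparity gradient carries no second-derivative signal at all, which is why a purely linear increase in disparity leaves the monocular interpretation unopposed.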
We now can directly apply their result to the situation of picture per-
ception. A picture hanging on a wall has no local disparity differences over
the picture surface but only a continuous uniform gradient of disparity
I will now turn to the second aspect of the notion of dual character of pic-
tures, namely, the issue of conjoint representations over the same input
and our ability to handle them smoothly. This issue is much more puzzling
and much less understood than the conflicting cue aspect. I shall first
describe the corresponding kind of phenomenon, as referred to in the lit-
erature on picture perception. Like the cue integration aspect, it is, in my
view, not specific to picture perception but rather a general and essential
property of our cognitive organization.
In picture perception, we can simultaneously have the phenomenal
impression of two different types of objects, each of which seems to thrive
in its own autonomous spatial framework,7 namely, on the one hand, the
picture surface as an object—with corresponding object properties such as
orientation or depth—and, on the other hand, the depicted objects them-
selves with their idiosyncratic spatial properties and relations. We seem to
have two mutually incompatible spatial representations at the same time;
at least in the sense that they are available internally and we can, without
any effort, switch back and forth between them.
Before venturing some more speculative ideas about some general prop-
erties of our cognitive architecture on which this ability rests, I will list
some observations concerning this dual nature of pictures that I consider
to be of particular relevance.
In an aside, in order to avoid potential misunderstandings I would like
to emphasize that pictorial art in general cannot simply be understood
as a kind of frozen optical array or a static boundary case of the optical
structure of the input from a scene. Pictorial art is much richer than nat-
uralistic artistic productions and serves a great many different symbolic
functions. Pictures are not surrogates for scenes, nor can they be sub-
jected to a criterion of some absolute notion of veridicality (whatever
that means) with respect to the scene depicted. Thus, those aspects of pic-
ture perception that we potentially can understand from core perceptual
principles—and, as I said in the beginning, it is only this part that I will
address here—are, from the perspective of cultural studies, the least inter-
esting aspect of picture perception. What we usually refer to when we
talk about picture perception are symbolic interpretations at various
levels and thus aspects that pertain to highly complex interactions of our
perceptual faculty and various interpretative faculties. Within the frame-
work of the cognitive sciences we virtually know next to nothing about the
cognitive principles underlying these achievements. Let us look at four
types of observations that are of particular theoretical relevance:
1. A continuous path of transitions exists from a view of a real 3-D scene
to a scene as depicted (or abstracted) on a canvas.
We can easily construct a continuous path from the view of a real 3-
D scene to a real 3-D scene viewed as a kind of frozen Alberti window
to a photo of this frozen optical array and to a highly reduced drawing
of the relevant contours of the scene. This allows us to experimentally
investigate all sorts of transitions and boundary conditions in picture
perception.8
2. We can phenomenally accentuate one or the other aspect and switch
back and forth in an effortless way.
This aspect is phenomenally so conspicuous and striking that we usual-
ly do not pay much attention to it. It is at the core of what we mean when
we refer to the dual nature of pictures. Though such switches are corre-
lated with depth aspects, they actually pertain to the entire perceptual
organization of the visual field and thus to attributes such as shape, or
shading and brightness gradients. A wealth of observations pertaining to
this kind of phenomenon have been reported in the literature, a wealth
that contrasts oddly with the silence about what to make of these obser-
vations theoretically.
Ernst Gombrich (1982) made the important observation that one has
to achieve the proper mental attitude to take full advantage of the capac-
ity to switch back and forth between the reality of the picture as an object
and the reality of the depicted objects. Because of this, people at earlier
stages of cultural development regularly seem to have problems seeing
what is depicted in a photo. For instance, Jan Deregowski, Muldrow, and
Muldrow (1972) reported that people from a remote Ethiopian tribe when
presented with a drawing of an animal would pay attention to the charac-
teristics of the drawing paper but would ignore the picture; they exhibited
a complete inattention to the content of the representation while concen-
trating on the medium. Several others have reported essentially the same
observation. When people at earlier stages of cultural development regu-
larly have problems in seeing what is depicted on a photo, they have
not yet attained the ability to exercise what Gombrich called the proper
mental attitude and thus cannot fully exploit a given cognitive capacity in
the case of previously unknown artifacts. However, as Margaret Hagen
and Rebecca Jones concluded from a review of corresponding studies,
“this coexistence of information poses few problems even for the naive
observers when pictures represented only single solid objects. There is no
evidence whatsoever that any group of people see pictures of faces, cups,
hunters, antelopes or elephants as flat ‘slices of life,’ as it were” (1978, p.
192). Many other interesting regularities were found with respect to the
ability to simultaneously handle both types of reality, as it were. For
instance, outline drawings present fewer difficulties than photos to naive
observers; thus contour information seems to be of greater importance
than texture.9
Corresponding observations are, of course, not confined to picture
perception, but pervade psychophysics and perceptual psychology. For
example, in the study mentioned previously, Stevens and Brookes (1988,
p. 383) made the observation that experienced stereo observers can also
discern the true stereo depth of the component lines with scrutiny, as if
they can selectively disregard the monocular depth interpretation.
3. The “realities” of pictures as objects and depicted objects bear dif-
ferent amounts of internal computational relevance and phenomenological
vividness.
We can switch back and forth between these two kinds of spatial repre-
sentations, but this is not a switch between a 2-D representation (within
Figure 2.2
Pieter Jansz Saenredam, Interior of the Saint Jacob Church in Utrecht, 1642.
Bayerische Staatsgemäldesammlungen, Alte Pinakothek, Munich. See plate 2 for
color version.
gained by one aspect is lost by the other. Second, as Hagen and Jones
(1978, p. 194) also pointed out, “the space behind the picture plane is not
completely separated from the space of ordinary environment which sur-
rounds the picture.” For instance, Deregowski, Muldrow, and Muldrow
(1972) observed an interesting effect of a horizontal versus a vertical pres-
entation of a picture of a profiled standing buck. When the picture was
presented lying flat on the ground, most observers reported that the buck
was lying down; when it was presented vertically, they reported that the
buck was standing up.
The proposals and corresponding observations by Maurice Pirenne
(1970), James Farber and Richard Rosinski (1978), Michael Kubovy
(1986), and others that the surface characteristics of the picture have to be
available internally, in order to allow some kind of compensation process
that corrects distortions caused by an inappropriate viewing geometry,
also indicate that both representations are interlocked in complex and
poorly understood ways. In other areas, such as color or brightness per-
ception, we have a better theoretical understanding about how conjoint
representations are interlocked.
The structural perceptual properties that we can identify in theoretical
analyses of phenomena centering around the dual character of pictures
cannot simply be regarded as kinds of “perceptual irregularities” that are
due to encountering an artifactual situation. Rather, these phenomena
seem to point in a particularly conspicuous way to a general perceptual
capacity to deal with conjoint representations.
visual system then has to provide computational means to deal with sen-
sory inputs that are compatible with different parameter combinations in
this joint region.25
The interplay of the two representational primitives involved is phenomenally
mirrored in many peculiarities that are characteristic of color
appearances under (chromatic) illumination. Of particular interest among
these is what Hermann von Helmholtz called seeing two colors “at the
same location of the visual field one behind the other” (1867, p. 407) and
what Karl Bühler referred to as “locating colors in perceptual space one
behind the other” (1922, p. 40; cf. Fuchs 1923a). For instance, in a room
illuminated by a reddish light, we can “see” both the color of the object
(e.g., “white” wall) and the color of the illumination, though there is,
as David Katz observed, a “curious lability of colors under chromatic
illumination” (1911, p. 274). Similar observations hold, with respect to
brightness, for the appearance of surfaces on which a shadow is cast. I will
briefly mention a few other observations that seem to be of relevance for
attempting to understand the internal structure underlying color and
brightness perception.
1. The dual nature of color coding that results from the exploitation of
the input by two different kinds of representational primitives is perceptu-
ally mirrored in what, since Katz’s (1911) groundbreaking work, has been
called “modes of appearance,” in particular a “surface color” mode and
an “aperture or light color” mode. This descriptive taxonomy, which itself
is in need of explanation by some deeper principles, has unfortunately
often been called upon as an explanation itself, thereby confusing the
observation with its explanation.
2. The phenomenal dissociation of brightness and grayness also suggests
different representational primitives in which “brightness” figures as a
parameter. Since Hering, it has been well known that even for achromatic
colors at least a bidimensional account is necessary, as can be witnessed
by appearances such as luminous gray. With respect to painting,
the difference between a “brightish white” and a “whitish bright” is cru-
cial and has been recognized as such since painters became interested in
representing the effects of light (Schöne 1954, p. 203).
3. The Mach card or Ewald Hering’s “stain versus shadow” demonstra-
tion (Fleckschattenversuch) are typical classical phenomena that demon-
strate how certain attributes can modulate the relation between different
representational primitives that exploit a given sensory input. In Hering’s
demonstrations slight changes in figural characteristics of the Alberti
Figure 2.3
Pieter Brueghel the Elder, The Big Fishes Eat the Small Fishes, 1557, copperplate
engraving, FM 1365, Rijksmuseum, Amsterdam.
Figure 2.4
Gerd Arntz, Kolyma, 1952, linocut, private collection.
as the ones that figure in the actually intended interpretation, which refers
to the threatening of the individual by the terror of the state.
The pictures displayed demonstrate again that we cannot understand
the use of allegories solely in terms of representational primitives. Rather,
highly complex interactions of our perceptual faculty and various inter-
pretative faculties are involved, about which we presently know next to
nothing, within the framework of cognitive science.
Figure 2.5
Sergei Eisenstein, Potemkin, 1926, still from the “Odessa steps” sequence, New
York, the Museum of Modern Art, Film Stills Archive.
Figure 2.6
Pablo Picasso, Weeping Woman, 1937, etching, aquatint, and drypoint on paper,
Paris, Musée Picasso. © Photo RMN—Gérard Blot.
Figure 2.7
Enrico Baj, I funerali dell’anarchico Pinelli, 1972, etching and aquatint, private
collection. It depicts the “defenestration” of the Italian anarchist Giuseppe Pinelli
from the Milan police headquarters on December 15, 1969, after the bomb attack
by right-wing extremists at the Piazza Fontana.
put it: “What we take as objects, how we refer to them and describe them,
and the array of properties with which we invest them, depend on their
place on a matrix of human actions, interest, and intent in respects that lie
far outside the potential range of naturalistic inquiry” (2000, p. 21). With
respect to those aspects that we hope are within the reach of naturalistic
inquiry we can, in situations of poor theoretical understanding, only
entrust ourselves to Helmholtz’s guiding principle that “order and coher-
ence, even if they ground on untenable principles, are to be preferred
to the disorder and incoherence of a mere collection of facts” (1867,
p. VI).
Whatever the specific nature of the representational primitives of the
perceptual system turns out to be, their categorical character necessitates,
from a functionalist point of view, additional general mechanisms for
handling continuous transitions in the sensory input, as well as for pro-
viding, whenever appropriate, smooth transitions between internal cate-
gories. Corresponding questions are of particular relevance with respect
to conjoint representations. I will therefore briefly address this issue using
examples from visual perception.
property is provided by the fact that, for instance, a green light (or a
greenish ambient illumination) and an olive-green surface, whose colors
are yielded by free parameters of different representational primitives,
exhibit some phenomenological similarity, although the classes of appear-
ances these two primitives give rise to could, in principle, have been com-
pletely divorced from each other.
An important consequence of the requirement of ensuring smooth tran-
sitions between conjoint representations is the existence of what is called
a “proximal mode” in perception. The existence of a proximal mode is,
as Rock noted, “not merely of interest as a phenomenological nicety but
rather has important ramifications for a thorough-going theory of per-
ceptual constancy” (1983, p. 254). Evidently, once we have attained the
ability to exercise a suitable “mental attitude,” we can perceptually detach
certain attributes from their “frame of reference” as given by a specific
representational primitive in which these attributes figure. For instance, a
coin lying on the ground at some distance from the observer and being
viewed at a slant is perceived as being circular in shape and of its usual
size. Still, we can also see it, in the proximal mode, as an elliptical shape
of diminutive size. In the same vein, railroad tracks that recede from the
observer toward the horizon are perceived as being parallel. Still, we can
also see them, in the proximal mode, as converging toward the horizon.
Attributes that figure in both types of conjoined representations can,
apparently, be dissociated from aspects that are proprietary to each of the
representations involved. Thus, the existence of a proximal mode helps to
protect the system from adopting a behavior where small continuous
changes in the input result in abrupt changes in internal representations.
It is important to note that only those aspects necessitated by corre-
sponding continuity considerations are accessible to a proximal mode.
There is no proximal mode in the sense of a measurement device miscon-
ception of perception, or in the sense of the (entirely obscure) notion of a
kind of retinal seeing (there is, for instance, no proximal mode for a veridi-
cal seeing of isolated elements of so-called geometrical illusions). Rather,
what can figure in a proximal mode is entirely determined by the structure
of conjoint representations involved.
In color perception, for instance, the proximal mode percept corre-
sponds to that combination of potential values for the free color parame-
ters of both representations involved that is determined by the internal
assumption of a “canonical” or default situation, which, in this case,
would correspond to a spatially homogeneous illumination that does not
ACKNOWLEDGMENT
I should like to thank Franz Faul and Johannes Andres for comments on
an earlier draft of this chapter.
Notes
1. Diels (1922, fragment 12, p. 123).
2. Also, pictorial devices—flattening techniques (Willats, 1997)—have been devel-
oped to control the perceptual balance between the flatness aspect of the picture
and the depth aspect of what is depicted, such as accidental alignments between
two or more parts of a scene and the position of the viewer, the use of mixed and
mutually inconsistent perspectives, or obtrusive surface marks.
3. These phenomena could, from a physicalistic perspective, be described as a
kind of discrepancy between what is physically there, namely, a flat surface,
and the perceptual impression evoked. However, framing the problem this way
amounts to conflating the level of the physical generation process of the sensory
input with the level referring to perceptual mechanisms by which this sensory
input is exploited (cf. Mausfeld 2002).
4. Phenomenological observations that appear particularly salient or enigmatic
do not necessarily have a particular relevance for perception theory. Although
phenomenological observations of various kinds are of prominent heuristic impor-
tance for perception theory, they do not carry a kind of “epistemological superi-
ority.” Phenomenological observations do not provide “direct access” to the
nature of representational primitives; rather, they result from an interplay of vari-
ous faculties, including linguistic and interpretative ones. Thus they are, within a
naturalistic inquiry into the principles of perception, on a par with many other
sources that provide relevant facts and observations.
5. The notion of “representation” is burdened with a high degree of ambiguity
due to its multifarious meanings. Many corresponding locutions in this chapter,
such as pictorial representation, refer to ordinary discourse. With respect to these,
I do not attach much importance, in the present context, to carefully distinguish-
ing different meanings. In the context of explanatory frameworks for certain phe-
nomena, however, I use the notion of “representation,” for example, in the terms
spatial representations or surface representations, to denote elements of postulated
internal structure that are part of an inference to the best explanation. In the con-
text of perception theory, this neither involves particular ontological commitments
nor any reference to the external world. More concretely, a “surface” representa-
tion is not, in any meaningful sense, to be understood as a representation of phys-
ical surfaces. Dispensing with notions of “reference,” “truth,” or “veridicality”
within explanatory frameworks of perception theory is, as Chomsky (1996, 2000)
has argued most convincingly and adamantly with respect to naturalistic inquiries
of the mind, entirely in line with standard methodological principles of the natu-
ral sciences.
mal) to adapt its reactions before any individual experience had the opportunity
to provide it with any structure” (Michotte 1954/1991, p. 45).
15. This distinction is different in character from widely made distinctions between
so-called earlier or lower-level systems and higher-level systems. The latter basi-
cally correspond to the sensation-perception distinction as used by Spencer,
James, Wundt or Helmholtz, which refers to an alleged hierarchy of processing
stages by which the sensory input is transformed into “perceptions.” In contrast,
the present distinction refers to two categorically different types of structures and
is more in line with corresponding distinctions by Descartes, Cudworth, or Reid
(cf. Mausfeld 2002, appendix).
16. Computational approaches of the kind pioneered by Marr almost exclusively
deal, with respect to this distinction as I conceive it, with the sensory system; they
have revealed that it has a much richer conceptual structure and greater computa-
tional power than previously assumed.
17. Among representational primitives pertaining to “objects” there are, as corre-
sponding evidence suggests, not only those that pertain to “physical objects” of
various types but also a great variety of specific types that pertain to intentional
physical objects or to biological objects.
18. Again, the internal concept “surface” is assumed to be entirely determined
syntactically, that is, by its data structure and the kind of transformations and
relations that operate on it. It is not, in any meaningful sense, a representation of
physical surfaces. I use the term surface representation only as a convenient abbre-
viation for a postulated representation (whose nature we presently only poorly
understand), whose properties seem to be conveniently describable, at the meta-
theoretical level of the scientist, in terms of perceptual achievements related to
actual surfaces.
19. More precisely, the two different parameters involved can be regarded as
pertaining to the same attribute, if they figure as parameters of the same type in
some superordinate structures and computations. Again, a label such as “color”
serves only as a convenient metatheoretical characterization of a certain type of
parameter.
20. Because of these interdependencies of free parameters, attempts to identify the
representational primitives of the structure of perception and their “data
structure” by investigating attributes like color or depth in isolation are doomed to fail
(apart from lucky coincidences). They are just as futile as it would be to try to
determine an n-dimensional manifold from a random sample of one-dimensional
projections.
21. Even in cases of physically identical input situations, perceptual properties
that have usually been regarded as predominantly mirroring properties of input
channels, such as discrimination, critically depend upon the settings made for con-
joint parameters. A case in point is the observation, known as the Aubert-Förster
phenomenon, that discrimination for objects that subtend the same visual angle is
better when the object is perceived as a small one at near distance than when it is
perceived as a large one at greater distance.
22. Jan Koenderink argued that “the notion of a depth map as summary repre-
sentation of pictorial relief is hardly tenable.” He concluded that it is likely “that
mental structure contains various (perhaps mutually inconsistent) fragments of
data structures and that only the execution of particular tasks may perhaps draw
on a variety of them and lead to some degree of coordination” (1998, p. 1083).
23. The Ames room demonstration also suggests that an “ambient space” repre-
sentation is triggered according to its own rules and has, in cases in which differ-
ent combinations of values can be assigned to free parameters, its proprietary
“default interpretations,” even if these result in ecologically odd parameter set-
tings for “object” representations pertaining to objects located in this ambient
space. (In the Ames room, a person who walks along the wall opposite the ob-
server, which physically recedes in depth from the observer, appears to shrink in
size.)
24. “Color” presumably also figures as a free parameter in a variety of superordi-
nate primitives that pertain to more complex, biologically relevant aspects of the
external world, such as those pertaining to “edible things” or to “emotional states
of others.”
25. Mausfeld and Andres (2002) found evidence that second-order statistics of
chromatic codes of the incoming light array differentially modulate, by a specific
class of parametrized transformations, the relation of the two kinds of representa-
tional primitives involved.
26. For recent demonstrations that bear on these issues see, for instance, Adelson
(1993), Knill and Kersten (1991), or Buckley, Frisby, and Freeman (1994).
27. Problems of figure-ground segmentations are also a “major obstacle in de-
veloping computational theories,” as Weisstein and Wong (1986, p. 61) noted,
because basic elements that are used in standard computational theories for the
extraction of surface properties are themselves dependent on figure-ground seg-
mentations. Figure-ground segmentation itself is a most fundamental variable that
determines and influences perceptual attributes such as color or depth. Surfaces
that are linked up as “background” can even survive inconsistent disparity infor-
mation, as Belhumeur (1996, p. 342) showed by a stereogram in which we perceive
a continuation of a background object behind foreground strips, even if this is not
consistent with the actual disparity relations for a part of the background section.
28. If, to some interesting extent, this should indeed turn out to be the case,
comparing the functioning of the perceptual system with language, as notably
Descartes and Cudworth did (cf. Mausfeld 2002, appendix), would not merely be
an illustrative or pedagogical metaphor but rather a theory-constitutive metaphor,
which invites exploring “the similarities and analogies between features of the
primary and secondary subject, including features not yet discovered, or not yet
fully understood” (Boyd 1979, p. 363).
Chapter 3
Relating Direct and Indirect Perception of Spatial Layout
H. A. Sedgwick
Pictures are made and used for many purposes. In my work, and in this
chapter, I focus on just one of these purposes, which is to afford the
accurate perception of the three-dimensional layout of a scene. This pur-
pose may be linked to, and in part derived from, many other purposes—
aesthetic, narrative, historical, architectural, navigational, instructional,
and so on—that give it a social setting and significance, but I shall not
pursue those links here. To say that the purpose is accurate perception
implies that we have knowledge of the true layout of the scene, to which
an observer’s perception of its representation can be compared. The scene
represented may actually have existed at one time, as in a photograph; it
may be pure invention, as in a 3-D computer graphic; or it may be some
combination of invention and reality, as in an architectural drawing. But
it is, in all the cases I am considering, precisely specifiable. I refer to this
represented scene as the virtual scene.
It would go beyond my purpose, and my competence, to pursue in this
chapter philosophical questions concerning what is real and how we know
it to be so. Instead, I shall adopt a naive realism that assumes the exis-
tence of a real world that surrounds us and that we perceive, more or less,
when we look around. Following James J. Gibson, I shall refer to our per-
ception of this real world, or scene, as direct perception. Gibson, and I,
then refer to our perception of a virtual scene that is represented to us in
a picture as indirect perception. These two terms, “direct” and “indirect,”
are to be understood in relation to each other. They refer to two dif-
ferent situations for perception. Direct perception is perception “without
intermediary agents,” to quote a dictionary definition (Webster’s College
Dictionary 1992); it is looking directly at the scene itself rather than look-
ing at a representation of it. Indirect perception refers to looking at a
rather than in empty space, then stereopsis is used to relate the object to
this background surface rather than to the viewer.
The relation between direct and indirect perception is much more
straightforward from the standpoint of the ground theory than from that
of the standard model. Because the standard model posits distance per-
ception through empty space, from the eye of the observer to the object,
it has traditionally been forced to rely heavily in its theorizing about space
perception on stereopsis and the oculomotor adjustments of the eyes. But
because these are precisely the forms of information absent from paint-
ings, photographs, and other nonstereoscopic pictorial representations,
an explanatory gulf is opened up between the direct and indirect percep-
tion of space. The ground theory, in contrast, is right at home in pictorial
representations of space. As is clear in the first Western work on the sub-
ject, written by Alberti in 1436 (Alberti, 1436/1972), the successful repre-
sentation of spatial relations depends on the accurate construction of the
ground plane. Without a connected layout of three-dimensional surfaces,
pictorial representations tend to collapse onto the picture surface. Some
modern artists, such as Wassily Kandinsky, have done fascinating experi-
ments on the creation of three-dimensional impressions from abstract
objects floating in empty space, but much of the fascination comes from
the lability and ambiguity of these impressions. Recently, Jan Koenderink,
Andrea van Doorn, and their colleagues have performed a series of care-
ful and very interesting experiments on the perception of representations
of sculptural objects floating in empty space. They have found that the
perceptions of such objects are stretched and even sheared in ways that
appear to be essentially arbitrary from one observer to the next (Koen-
derink, van Doorn, and Kappers 1995; Koenderink, van Doorn, Kappers,
and Todd 2000a).
Our predisposition to perceive the environment as a layout of con-
nected surfaces underlies the effectiveness of simple presentations such as
line drawings. A horizontal line across the middle of the page is sometimes
enough to invoke the perception of the ground plane. A complex archi-
tectural space of connected surfaces can be evoked by a few converging
and parallel lines.
In complex scenes, not everything rests directly on the ground or floor.
Instead, a lamp may rest on the table, or a book may rest on a shelf
attached to the wall. But the wall extends down to the floor, as does the
table, so that ultimately everything that does not float or fly is supported
by the ground. To apply the ground theory of space perception to com-
The second hypothesis I wish to discuss holds that direct perception is pri-
marily environment centered (Sedgwick 1983). By “environment centered”
I mean that what is most salient in our perception of spatial layout is how
loses its primacy altogether. What is primary instead is location, the point
at which an object contacts a supporting surface. Given the set of contact
relations between the objects and surfaces of the environment, distances
between locations can be derived as needed (Sedgwick 1983, 1987b).
Finally, let me illustrate this distinction between environment-centered
and viewer-centered models with the example of surface orientation. As
Gibson and Cornsweet (1952) pointed out, we can define the slant of a
surface either relative to the observer’s line of sight or relative to an envi-
ronmental surface such as the ground plane. They referred to these as
“optical slant” and “geographical slant,” respectively. The properties of
these two types of slant differ in important ways. Consider a flat surface
slanted relative to the ground, say at an angle of 45°. The geographical
(or environment-centered) slant of this surface is a constant 45° along its
entire length. It also remains a constant 45° as the observer moves around
in the environment. In contrast, the optical (or viewer-centered) slant,
where the line of sight meets the surface, changes as the line of sight
sweeps along the surface or as the observer moves around.
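
The contrast between the two kinds of slant can be made concrete with a small numerical sketch. It is not taken from Gibson and Cornsweet; it assumes a two-dimensional side view, a hypothetical eye height of 1.6 units, and the convention that optical slant is the angle between the line of sight and the surface normal. All function names are illustrative only.

```python
import math

def surface_points(base_x, slant_deg, distances_along_surface):
    """Points on a flat surface that meets the ground at base_x and rises
    away from the observer at slant_deg relative to the ground (side view)."""
    a = math.radians(slant_deg)
    return [(base_x + s * math.cos(a), s * math.sin(a))
            for s in distances_along_surface]

def optical_slant_deg(eye, point, slant_deg):
    """Angle in degrees between the line of sight and the surface normal
    (an assumed convention for viewer-centered slant)."""
    sx, sy = point[0] - eye[0], point[1] - eye[1]
    length = math.hypot(sx, sy)
    a = math.radians(slant_deg)
    nx, ny = -math.sin(a), math.cos(a)  # unit normal of the surface
    # The absolute value makes the result independent of the normal's sign.
    cos_angle = abs(sx * nx + sy * ny) / length
    return math.degrees(math.acos(min(1.0, cos_angle)))

GEOGRAPHICAL_SLANT = 45.0  # fixed by the environment, not by the observer
points = surface_points(3.0, GEOGRAPHICAL_SLANT, [0.0, 1.0, 2.0])

# Sweep the line of sight along the surface from one station point ...
from_first_eye = [optical_slant_deg((0.0, 1.6), p, GEOGRAPHICAL_SLANT)
                  for p in points]
# ... then look at the same points after the observer has stepped forward.
from_second_eye = [optical_slant_deg((1.5, 1.6), p, GEOGRAPHICAL_SLANT)
                   for p in points]

for p, a, b in zip(points, from_first_eye, from_second_eye):
    print(f"point ({p[0]:.2f}, {p[1]:.2f}): "
          f"optical slant {a:.1f} deg, then {b:.1f} deg")
```

Under these assumptions the geographical slant never enters the loop as a variable: it stays at 45° throughout, while the optical slant takes a different value at every point and for every station point.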
Surfaces that have the same environment-centered slant are parallel
to each other and thus share the same vanishing line in perspective. Each
distinct vanishing line in the optic array thus corresponds to a family
of parallel surfaces and specifies their environment-centered orientation
(Sedgwick 1983).
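
That parallel surfaces share a vanishing line follows from elementary projective geometry, and a short sketch may help. It assumes a pinhole projection with the eye at the origin, the picture plane at a hypothetical focal distance of 1, and coordinates with x to the right, y upward, and z in depth; none of these names comes from the cited work. Under these assumptions a surface with normal (nx, ny, nz) has the vanishing line nx*u + ny*v + nz*f = 0, whatever its position in space.

```python
import math

FOCAL = 1.0  # assumed distance from the eye to the picture plane

def vanishing_line(normal, focal=FOCAL):
    """Coefficients (a, b, c) of the image line a*u + b*v + c = 0 for any
    surface with the given normal; the surface's position plays no role."""
    nx, ny, nz = normal
    return (nx, ny, nz * focal)

def project(point, focal=FOCAL):
    """Pinhole projection of a scene point onto the picture plane."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

def residual(line, image_point):
    """Signed value of the line equation at an image point (0 means on the line)."""
    a, b, c = line
    u, v = image_point
    return a * u + b * v + c

slant = math.radians(45.0)
normal = (0.0, math.cos(slant), -math.sin(slant))    # surface rising away at 45 deg
in_plane = (0.3, math.sin(slant), math.cos(slant))   # a direction lying in the surface

def far_point(anchor, t):
    """Point reached by travelling t units along in_plane from anchor."""
    return tuple(a + t * d for a, d in zip(anchor, in_plane))

line = vanishing_line(normal)
# Two parallel surfaces, anchored at different depths, followed far into the distance.
nearer_surface = far_point((0.0, -1.6, 3.0), 1e6)
farther_surface = far_point((0.0, -1.6, 6.0), 1e6)
residuals = [residual(line, project(p)) for p in (nearer_surface, farther_surface)]
print(line, residuals)
```

Distant points on both surfaces project ever closer to the same image line, whereas a surface with a different orientation, such as the ground with normal (0, 1, 0), yields a different line (the horizon, v = 0).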
Marr suggested that the visual system first finds the viewer-centered
structure of the environment and then can derive other information, such
as parallelism, from it (Marr 1982). Some years ago Steve Levy and I
tested this idea, using computer graphics images, by asking observers to
adjust the orientation of one surface until it matched either the viewer-
centered or the environment-centered slant of another surface (Sedgwick
and Levy 1985). We found that observers were more precise in making
environment-centered matches. This makes it seem more plausible that
environment-centered slant is primary than that it is derived from viewer-
centered slant.
What are the implications of the environment-centered model for the
relation between direct and indirect perception? It seems to me that it
offers us some understanding of the ease with which a representation can
embed a virtual space in the real space of the observer. An environment-
centered virtual space can be coherent within itself without necessarily
being clearly related to, or even commensurable with, the real space of the
observer. Virtual objects’ sizes relative to the scale of their virtual envi-
ronment, their locations relative to the layout of virtual surfaces, and the
slants of their surfaces relative to the surfaces of this virtual space may all
be more salient to indirect perception than these virtual objects’ viewer-
centered distances or the sizes or slants that could be derived from such
distances.1
3.5 CROSS-TALK
The perceptual problem is that observers tend not to notice these dis-
tortions. Someone walking through a photography exhibit tends not to
notice distortions of the virtual space of the pictures, even though the
point of observation is continually changing. This has led to the sugges-
tion that there is a perceptual compensation mechanism (Pirenne 1970,
p. 162). Such a mechanism is hypothesized to recognize that the observer
is looking at the representation from the wrong viewpoint and to inter-
nally correct for the distortion that this produces. The existence of such a
mechanism in indirect perception would, it seems, open up a considerable
gulf between indirect and direct perception. There can be no need for such
a compensation mechanism in direct perception, because the viewpoint of
the observer is always, by definition, the correct viewpoint. Thus this
hypothesized compensation mechanism, which would need to be a mech-
anism of formidable complexity to correctly solve the problem of com-
pensation, would exist for indirect perception alone.
The cross-talk hypothesis offers an alternative to the compensation
hypothesis. As the observer moves around, the virtual scene specified by
perspective distorts, but the surface of the picture does not change. Thus
cross-talk from the picture surface to the perceived scene will tend to have
a conservative effect, that is, to reduce the perceived distortions of vir-
tual space. Unlike the compensation hypothesis, however, the cross-talk
hypothesis does not predict that there will be no perceived distortion. It only
predicts that the amount of perceived distortion will be less than what is
specified by perspective.
I shall briefly describe one study, done with Nicholls (Sedgwick and
Nicholls 1994), that addresses this issue. Using computer-generated scenes
with slanted rectangles similar to those I described earlier, we looked at
the effect of reducing the size of the picture, which is optically equivalent
to viewing the picture from a distance farther than the correct viewing dis-
tance. The depth dimension of the virtual space specified by perspective is
thus stretched, increasing the optically specified slant and width-to-height
proportions of the rectangle on the wall. We found that the perceived pro-
portions of the rectangle did indeed increase, but not by nearly as much as
perspective predicted. We were able to account for most of this shortfall,
however, with a control condition that measured the cross-talk from
the proportions, unchanged by minification, of the projected rectangle on
the surface of the picture. When we also included controls for the effects
of the smaller angular size of the minified picture and the smaller angular
size of the minified rectangle within it, both of which tended to reduce
perceived slant, we were able to account for almost all of the discrepancy
between the perceived virtual space and the optically specified virtual
space. We were able to do this without invoking any special perceptual
mechanism to compensate for being at the wrong viewpoint. This, along
with a number of other results (Rogers 1995; Sedgwick and Nicholls 1996;
Sedgwick, Nicholls, and Brehaut 1995; Yang and Kubovy 1999), suggests
to me that the hypothesis of a special compensation mechanism in indirect
perception is unnecessary and that we can account for the indirect per-
ception of virtual space, even when seen from the wrong viewpoint, in
terms of the same mechanisms, such as cross-talk, that we find operating
in direct perception.
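
The optical prediction against which such results are compared can be illustrated with a brief sketch. It is not the analysis used in the study; it assumes centered viewing, treats viewing from k times the correct distance as stretching virtual depth by the factor k (a standard result of perspective geometry), and uses hypothetical names and values throughout. Slant is measured here from the picture plane, and width is the rectangle's extent along the receding wall.

```python
import math

def optically_specified(slant_deg, width, height, k):
    """Return (slant in degrees, width-to-height ratio) of a rectangle on a
    receding wall after virtual depth has been stretched by the factor k."""
    a = math.radians(slant_deg)
    across = width * math.cos(a)         # extent parallel to the picture plane
    in_depth = width * math.sin(a) * k   # extent in depth, stretched by k
    new_width = math.hypot(across, in_depth)
    new_slant = math.degrees(math.atan2(in_depth, across))
    return new_slant, new_width / height

# k = 1: the picture is seen from the correct distance.
correct = optically_specified(45.0, 1.0, 1.0, k=1.0)
# k = 2: the picture is halved in size, or equivalently seen from twice the distance.
minified = optically_specified(45.0, 1.0, 1.0, k=2.0)
print(correct, minified)
```

In this sketch both the slant and the width-to-height ratio grow with k, which is the direction of change that perspective specifies; the sketch says nothing about how much of that change observers actually perceive.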
Let me briefly sum up. The technology of accurate spatial displays
produces an ever-changing, multidimensional continuum of pictorial rep-
resentations, ranging from the most compelling virtual realities to the sim-
plest pencil sketches.
We may imagine indirect and direct perception as a lock and key. When
we have the right theory of direct perception, it may provide the key that
opens up our understanding of the whole continuum of indirect percep-
tion. I have described three hypotheses about direct perception that may
provide a good fit with the problems of indirect perception.
First, direct perception is attuned to the complex spatial layouts of con-
nected surfaces that compose much of our environment. Display tech-
nologies have for a long time been well suited to the representation of such
layouts.
Second, direct perception is environment centered—attuned to geo-
graphical orientations of surfaces, locations of objects on those surfaces,
and sizes relative to the scale of the environment. This facilitates the indi-
rect perception of self-contained virtual scenes embedded within the real
space of the observer.
Third and finally, there is a dual awareness in direct perception—of the
three-dimensional scene and of its optical projection—and the perception
of each is influenced by cross-talk from the other. In indirect perception
there is a corresponding duality between the perception of the virtual
scene and the perception of the picture surface. Cross-talk from the stable
picture surface tends to reduce the perception of distortions in the virtual
scene when the picture is viewed from the wrong viewpoint.
Notes
1. This is not to deny the possibility of a viewer-centered mode of perception,
especially in near space, and in some visual-motor activities, although such situa-
tions can easily be mistaken as viewer centered when they are not. Instrumental
vision is rarely exactly viewer centered, where the center is taken to be the nodal
point of the eye, or the cyclopean eye of binocular vision. For manual tasks the
important relations are between the hands and the object; for locomotor tasks
they are between the feet and the goal; for tool-using or vehicular tasks they are
between the tool and its object or the vehicle and its goal; and so forth. It may be
that such tasks are better understood in environment-centered terms.
2. There is also cross-talk in the other direction: if we try to attend to the scene’s
projection, our perception is strongly influenced by the three-dimensional scene.
In indirect perception, cross-talk from the pictured scene to the picture surface can
account for some geometrical illusions, such as the Ponzo illusion (Sedgwick and
Nicholls 1993b).
Chapter 4
The Dual Nature of Picture Perception: A Challenge to Current
General Accounts of Visual Perception
4.1 INTRODUCTION
Pictures form a significant part not only of our visually oriented culture
but also of the stimulus material employed by perceptual psychologists.
Nevertheless, with few exceptions (e.g., Rock 1984), standard textbooks
on the psychology of perception tend to ignore the challenging pecu-
liarities of picture perception, and the same goes for mainstream vision
science.
One major reason for this is that, first of all, attention is restricted to
realistic pictures in the strict sense, based on central perspective, such as
Renaissance paintings or photographs. In a second step, then, the percep-
tion of these pictures is conceived as being on a par with the perception of
a scene as seen through a window. Hence, on this account, the perception
of a picture, that is, the perceptual emergence of pictorial space in the
observer’s consciousness, need not be distinguished in principle from the
perception of an everyday scene. Such a conception of picture perception
implies that the spatial percept evoked by a flat realistic picture is simply
a perceptual illusion, which is not different, in principle, from other well-
known perceptual illusions such as the Ames room.
Except for limiting cases such as perfect trompe l’oeil, this is not the
case, however. That is, as a rule, nobody misperceives a painted scene as
a real one. Rather we speak, without hesitation, of seeing a painting of
a scene. As Edmund Husserl (1980), Michael Polanyi (1970b), Maurice
Pirenne (1970), James J. Gibson (e.g., 1979), and others have forcefully
pointed out, this is due to what is often called the dual nature of picture
perception: the evoked percept involves both the perceived surface of the
picture as part of perceived real space and the experience of pictorial
space, both aspects being subtly interwoven. If our visual system were able
78 Reinhard Niederée and Dieter Heyer
ence would have to meet the challenge posed by this phenomenon. For
reasons to be pointed out, we are convinced, however, that phenomena
of that kind should not be considered marginal. On the contrary, we
believe that general theories of perception will profit substantially from
taking such complexities into account from the start. This would require,
however, a corresponding modification of standard scene-based overall
accounts of perception.
From this very perspective, pictures might indeed turn out to be natu-
ral objects of study not only for students of picture perception but for
visual scientists in general. To simplify matters, we will restrict attention
mainly to realist pictures (central perspective). Mutatis mutandis, our
considerations are meant to apply to other cases as well.
As the reader will have noticed, we have been talking about the dual
nature of picture perception and not about a “dual nature” of pictures
themselves. That is, duality is considered here as a feature of an observer’s
percept caused by a pictorial stimulus—the perceptual experience “of a
picture.” This duality has its origins in the visual processes underlying the
percept. Trivially enough, we will fail to find any such duality in the
“world out there,” that is, in the physical stimulus itself. Nor would it do
justice to the phenomenon at issue just to point at the “discrepancy”
between the real physical stimulus, a flat material object, on the one hand,
and the “illusory,” purely mental, pictorial space evoked by it, on the
other.
image itself must not, of course, be confused with “what is seen” by the
observer. Rather, by means of a stimulation of a field of retinal recep-
tors (rods, cones) it serves only as an “input” (“raw data”) to a complex
visual process. In the course of this process the visual system extracts
and integrates cues (or, in Gibsonian parlance, picks up invariants) that
eventually yield a richly structured conscious “output,” the percept. At
this level, meaningful “wholes” in the sense of Gestalt psychology (e.g.,
perceived objects), endowed with (perceived) shape, color, size, spatial
position, orientation relative to the observer, motion, and so on, are orga-
nized into a dynamic, more or less coherent, perceived scene. Note that
although the percept is part of our conscious experience, the underlying
visual processes (including the relevant cues) for the most part are not
accessible to consciousness. Needless to say, our present understanding of
these processes still is rather fragmentary.
In sum, this scheme assumes that a physical input evokes a conscious
experience, the percept. From a functional viewpoint, the latter in turn is
usually conceived as a partial 3-D scene-representation, whose compo-
nents may or may not be “veridical” with respect to the underlying phys-
ical scene. As usual in vision science, the term “perception” is employed
here, regardless of whether or not veridicality is met; that is, the evoca-
tion of a visual illusion by an external stimulus is equally counted as an
instance of perception. From a Helmholtzian viewpoint, this process
would be described as an unconscious inference yielding the most likely
hypothesis about a state of affairs “out there” that could have caused the
retinal input.
From the perspective of the scheme just outlined, it is indeed fairly
obvious that the notion of duality of picture perception pertains to the
visual processes and their “output,” that is, to the percept. However, in
being centered around the concept of a scene-representation, this scheme,
strictly speaking, does not allow for the possibility of this kind of duality.
the same location? On the preceding account, however, the system should
always come up with a single (most “simple” or “likely”) scene descrip-
tion. That is, we should either have (a more or less “veridical”) percep-
tual impression of a surface as part of perceived real space or the illusory
impression of pictorial space seen through a windowlike opening. (Here
and in what follows the term “impression” refers to perceptual experiences
and does not necessarily imply vagueness or the like.) At best, bistability—
perceptual switches between the two percepts—might occur, as known
from the Necker cube and many other examples. Regarding
picture perception, Ernst Gombrich (1960) in fact advocated such a view,
insofar as he postulated that we could only attend to one of the two just-
mentioned aspects at a time, ruling out a simultaneous coexistence of both
perceptual impressions.
Against this position, a number of authors—rightly, we believe—have
argued that in many situations we are in fact aware of both aspects simul-
taneously, this double impression making up what we call a prototypical
perceptual experience of a picture (e.g., Polanyi 1970b; Pirenne 1970).
What may be shifted is attention. Indeed, we are hardly ever simultaneously
aware of both aspects in a full-fledged manner. In many
situations, pictorial space is the more salient aspect, but our actions con-
cerning a picture will require an awareness of the picture as a flat object in
real space. In fact, even if attention is largely concentrated on one aspect,
there often is at least a residual awareness of the other aspect, too. The
dynamics of attention certainly needs to be taken into account in this con-
nection, but this is true of perception in general and does not imply bista-
bility as it is experienced, for instance, in the case of the Necker cube.
For brevity, the dual aspects of picture perception at issue will hence-
forth be called the planar and the spatial aspects of a picture percept,
respectively, even though the planar aspect itself does of course also
include a spatial component with respect to perceived real space. It is
worth noting that the distinction between planar and spatial aspects of a
picture percept involves not only genuinely spatial features but at the
same time a duality of aspects pertaining to perceived color. For instance,
what looks like a pattern of black, gray, and white patches on the picture
surface (planar aspect) may go hand in hand with the perceptual impres-
sion of a scene composed of objects of different achromatic colors under
certain perceived lighting conditions. As the case may be, this may include
perceived shadows, perceived light sources, perceived translucency, or
perceived gloss. Of course, the same goes for chromatic aspects of color
Figure 4.1
Extension of the standard framework of perception for the case of picture percep-
tion. As for the proximal aspect, see the section “Related Forms of Perceptual
Duality.”
85 The Dual Nature of Picture Perception
have the same effect on the visual system (when they are viewed monocu-
larly from a specified viewing position and in certain fixed illumination
conditions). From a modern viewpoint, the concept of equivalent stimuli
refers solely to the physical principles connecting the distal and the prox-
imal stimulus. Although this level of analysis is sufficient if merely the
Renaissance painters’ goal is pursued, it falls short of providing us with a
perceptual theory that allows us to predict what is seen when we are look-
ing at a scene or a realistic painting and to explain why we see what we see.
The first problem that comes to mind is that, given a retinal image, there
is an indefinite number of—equivalent—possible scenes (seen from a fixed
viewing position) that could have generated this proximal stimulus. If per-
ception is understood as generating a scene-representation then the ques-
tion arises, which of these possible scenes should be represented? These
scenes might be called the possible virtual spaces associated with the prox-
imal stimulus. Of course, this question not only arises in the context of
picture perception but applies also to perception in general. Consider, for
instance, a cube viewed from a certain position. Usually, we will perceive
it as a cube. This cube could now be distorted in many different ways with-
out changing the retinal image. Hence, in all these cases we would have the
same percept; that is, we would perceive these objects as cubes. The ques-
tion now arises of why, given only that proximal stimulus, our visual sys-
tem generates a representation of a cube rather than a representation of
any of the other compatible objects.
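The geometric point behind the cube example can be made concrete with a minimal sketch. It assumes an idealized pinhole eye at the origin looking along the z-axis; the function names, the cube's position, and the range of distortion factors are illustrative assumptions, not part of the original argument. Each vertex of the cube is slid by an arbitrary amount along its own line of sight, and the central projection is computed before and after.

```python
import random


def project(point, focal_length=1.0):
    """Central projection of a 3-D point (x, y, z), z > 0, onto the plane z = focal_length."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def slide_along_sight_line(point, factor):
    """Move a point along the ray from the eye (the origin) through that point."""
    return tuple(factor * c for c in point)


# Vertices of a unit cube in front of the eye (hypothetical placement).
cube = [(x, y, z) for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (3.0, 4.0)]

# Distort the cube: every vertex gets its own arbitrary positive factor.
rng = random.Random(0)
distorted = [slide_along_sight_line(v, rng.uniform(0.5, 2.0)) for v in cube]

image_of_cube = [project(v) for v in cube]
image_of_distorted = [project(v) for v in distorted]

# The two objects differ in space but agree in the image (up to rounding).
same_image = all(
    abs(a - b) < 1e-12
    for p, q in zip(image_of_cube, image_of_distorted)
    for a, b in zip(p, q)
)
print(same_image)
```

Because a straight edge between two displaced vertices stays in the plane spanned by the eye and the two original vertices, the edges project to the same image segments as well. The sketch illustrates only the geometry of equivalence; it says nothing about which of the compatible objects the visual system will represent.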
If one were now to produce an equivalent trompe l’oeil, we would of
course also see a cube. What the theory of central projection underlying
Renaissance art would then explain is that we perceive the same in both
situations, that is, when viewing the real cube and the picture of the cube,
but not why we see a cube at all. For when viewed from a fixed viewing
position, the picture (i.e., the proximal stimulus generated by it) is associ-
ated with an indefinite number of possible virtual spaces. Although it is
fairly unproblematic to speak of the perceived virtual (or pictorial) space
for a given observer at a given time, it seems unjustified to speak of the vir-
tual space of a picture independent of an observer. Unfortunately, many
investigators start out with the idea of a fixed virtual space of a picture,
directly or indirectly derived from the original scene underlying the pic-
ture (e.g., the scene photographed), which is then taken as a standard of
reference in their evaluation of the judgments made by their subjects con-
cerning their perceived virtual space.
scene, respectively) in most cases are clearly different, with contrast being
severely reduced in the former case.
Even if it were possible to achieve strict equivalence, the usefulness of
this concept is limited from the start because it is restricted to monocular
viewing and does not take into account the dynamics of perception. That
is, the observer’s viewpoint has to be fixed, thereby excluding bodily
movements, which are so important in ordinary vision. Even in the case of
monocular vision with a fixed viewing position, accommodation intro-
duces a dynamic element, yielding different results for an observed paint-
ing and the corresponding scene.
Last but not least, the fact that most realistic paintings evoke a dual
percept clearly shows the nonequivalence of the picture and the corre-
sponding scene. For, if they were equivalent, a trompe l’oeil effect would
have to occur. In other words, one would have the visual impression of a
real scene as viewed through a window. Instead, the visual system comes
up with a percept in which two perceived virtual spaces—one of them
being the picture plane in a certain position in space—are intertwined, as
it were. Why does this happen? Rather than being a nuisance, the afore-
mentioned difficulties turn out to be a clue for our theoretical under-
standing of the dual nature of picture perception and possibly even of
some basic structural features of perception in general.
just form a kind of simple compromise or decide on only one of them. The
formation of different subpercepts, each of which involves a process of cue
integration, goes hand in hand with a kind of cue clustering we will call
cue segregation.
These different subpercepts are not simply generated in isolation but
there are mutual interactions at various levels of processing. Adopting a
term coined by Sedgwick (chapter 3 this volume), we assume a cross-talk
between those processes. For instance, the reduction of the distinctive-
ness of perceived depth in ordinary picture viewing (as opposed to static
monocular viewing) might be explained by a cross-talk between (pro-
cesses generating) the perceived picture plane and the perceived pictorial
space. The same goes for the often reported inverse phenomenon that in
certain situations the planar aspect of a perceived picture seems less vivid
in the presence of a perceived pictorial space that involves a strong impres-
sion of depth. (Further potential instances of cross-talk are described later
on and in chapter 3 in this volume.)
Another kind of interlock between the generated subpercepts is some
kind of phenomenal binding, which in the case of picture perception is
experienced as the perceptual unity of a perceived picture. The perceived
pictorial space and the perceived picture plane perceptually belong to
each other, as it were. For certain types of duality, the visual system even
seems to create a suitable type of gestalt (e.g., “perceived picture”), which
due to learning possibly becomes equipped with specific higher-order cues.
Furthermore, as mentioned before, such subpercepts may be
experienced as standing in a hierarchical order to each other, only one of
them belonging to perceived real space.
Like any other account of perception, this scheme needs of course to be
complemented by the concept of attention. In particular, attention may be
shifted between different subpercepts, possibly leading to a refinement or
enhanced vividness of one subpercept at the expense of another.
Finally, one might contemplate integrating the concept of a proximal
mode into our scheme by assuming that it is primarily related to an initial
state of processing that forms the basis for the later extraction of cues
and precedes the processing of cue integration and segregation. The pre-
vious remarks on cross-talk, phenomenal binding, and shifts of attention
mutatis mutandis apply here as well.
Needless to say, our proposal of how duality concepts might be inte-
grated into common concepts of cue integration throws up more ques-
tions than it is able to answer. Nonetheless, we are convinced that duality
Figure 4.2
Strategies by which the visual system copes with perceptual ambiguities. The two
extremes (top/bottom) correspond to two stimuli that evoke different clear-cut
percepts. As an example, the case of picture perception is considered. If, now, one
proximal stimulus is gradually transformed into the other, conflicting cues will
arise. Some of the responses of our visual system are listed on the right.
These examples suggest that our visual system possesses a rather sophisticated
form of ambiguity management up to the level of the percept, which so far
has hardly been taken into account in a systematic way.
4.5 EPILOGUE
The preceding analysis gives only a first impression of the richness of the
perceptual phenomena involved in the perception of pictures and, to some
extent, mirrors and shadows. Further complexities arise when pictures are
painted or projected on curved or transparent surfaces. In those situations
sometimes a complex phenomenal interlock between perceived pictorial
space and perceived picture surface occurs, which no longer can simply be
described as the coexistence of two neatly separable aspects (several
examples of this can be found in contemporary art, e.g., in the work of
Tony Oursler). Another type of complexity arises when pictures involve
multiple perspectives, or when their rendering becomes less realistic, as for
example in impressionist art. Last but not least, the duality of the spatial
and the planar aspect extends to the dynamic case of motion pictures.
Without doubt, a much more refined theoretical account, thorough phe-
nomenological studies, and experimental investigations are needed to
obtain a deeper understanding of such phenomena in picture perception.
Once one has an increased awareness of duality in perception, a
plethora of analogous phenomena that go beyond the dichotomy of a
spatial and a planar aspect attracts one’s attention. Suffice it to briefly
mention a few examples. First, duality can occur at various levels simultaneously.
Think, for example, of perceived pictures in a picture or paintings
like Giuseppe Arcimboldo’s Summer. Here one perceives (as a
part of perceived pictorial space) a head composed of fruit and vegetables.
That is, at the same time a head and an arrangement of fruit and vegeta-
bles are seen. Or think of a wooden sculpture of a woman, where, loosely
speaking, the impression of a woman and of a piece of wood coexist. In
theater the simultaneous perception of an actor on the stage and of a
character within the drama “put on stage” may be counted as an instance
of dual perception (e.g., Polanyi 1970b). Of course, in all these cases, an
additional symbolic aspect can come into play that increases complex-
ity even further. All in all, perception seems to embody some of the com-
plexities typically thought of as being characteristic of language-based
cognition. In certain respects, at least, our perceptual experience some-
times seems to resemble a multilayered narration more closely than a
straightforward representation of “what there is.”
Notes
1. Note that in order to verify “that a change is less than expected,” one would,
strictly speaking, have to compare the perceived pictorial space with all of the pos-
sible virtual spaces associated with the retinal image. As a measure of discrepancy
one would then have to take the distance to the best-fitting virtual space, as it were.
Unfortunately, most studies in the field base this measure only on a single distin-
guished virtual space associated with the respective viewing position. To this end,
a “true” virtual space, associated with a standard viewing position
(station point), is singled out. This true virtual space is then geometrically transformed into
“the” virtual spaces for other viewing positions. So far we have not found an ade-
quate justification for this approach.
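The contrast drawn in this note can be written out as a sketch. The notation is an assumption introduced here for illustration only: P is the perceived pictorial space, r the retinal image at the given viewing position, V(r) the set of possible virtual spaces associated with r, d some distance measure between spaces, V_0 the “true” virtual space at the station point, and T_r the geometric transformation to the viewing position at issue.

```latex
\[
D_{\mathrm{best}}(P, r) \;=\; \min_{V \in \mathcal{V}(r)} d(P, V),
\qquad
D_{\mathrm{std}}(P, r) \;=\; d\bigl(P,\, T_r(V_0)\bigr).
\]
```

On the assumption that the transformed space T_r(V_0) is itself one of the members of V(r), it follows that D_best is never larger than D_std, so a measure based on the single distinguished space can overstate the discrepancy but cannot understate it.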
2. If the visual system were able to reconstruct the correct viewpoint, a skull
would be seen in both cases. If, in contrast, the system always picked a virtual
viewing position perpendicular to the perceived picture plane, the long egglike
object would be seen in both situations.
Chapter 5
Perceptual Strategies and Pictorial Content
Mark Rollins
5.1 INTRODUCTION
Recently in cognitive science there has been a healthy trend toward rethink-
ing some old debates. In particular, the traditional opposition between
constructivism and direct realism—the view that perception depends on
mental representations versus the view that it does not—has been cast
aside by several philosophers and scientists in favor of what can be seen
as a hybrid account. Such hybrids go by a variety of names: “animate”
or “utilitarian” or “interactive” theories of vision, for instance (Ballard
1991; Ramachandran 1990). But all are distinguished by their rejection
of central ideas in the work of David Marr. Although Marr’s “computa-
tional Gibsonism,” as he called it, is already a compromise, it combines
elements of constructivism and direct realism in the wrong way: it restricts
the range of knowledge on which the visual system can draw and builds
certain assumptions into it. Such constraints have the effect of making
the perceiver respond to visual information partly as if the response were
direct, because no variation is permitted across perceivers. There can be
no difference due to background knowledge; thus the response is to some
extent determined by causal law. But constraints of that sort merely pro-
vide a framework within which elaborate representations are constructed.
What is wrong with Marr’s compromise, according to recent research, is
that it presupposes that such rich representations are needed; and some
evidence strongly suggests that they are not.
An alternative that has recently emerged makes vision depend on rep-
resentations but of a rather sketchy sort: “visual semiworlds,” as they have
been called (Churchland, Ramachandran, and Sejnowski 1994). These
are sufficient, it is said, precisely because perception is not constrained in
Marr’s way. The visual system draws upon background knowledge of
various kinds, and it collaborates with other sensory systems and motor
control: the eyes conspire with the ears and with muscles, tendons, and
joints to enable the viewer to take representational shortcuts. The virtue
of such a cooperative sensory scheme is that it makes possible perceptual
strategies. With those in hand, precise computations and representations
are no longer required.1 I shall refer to a theory of this type as a strategic
design theory (SDT). The primary question I want to address is how SDT
might be applied to picture perception and, in particular, to the perception
of pictorial space in the context of visual art.
The claim that picture perception is partly direct means different things to
different people. It is sometimes said, for instance, that picture perception,
even from a Gibsonian perspective, is indirect, although the processes by
which the information in the picture is accessed involve no elaborate
internal representations (e.g., Sedgwick, chapter 3 in this volume). But
this means only that perception of the depicted scene is mediated by the
the way that an upright mask does. And this turns out to be the case (at
least up to a certain viewing distance): the inverted mask is seen as con-
cave rather than as convex. Thus, it has been argued that semantic cate-
gorization affects shape-from-shading (Churchland, Ramachandran, and
Sejnowski 1994, pp. 33–34). In which case, the strong modularity thesis
would appear to be false.
To be sure, Ramachandran has also argued that economizing strategies
presuppose the existence of modules, combinations of which get reinforced
for their success in performing a task and become part of an organism’s
standard repertoire. In that case, what the evidence actually shows is not
so much that the modularity thesis per se is false as that the nature of
modules and their impact on perception is not what Marr thought it to be.
Ramachandran suggests that there may be many such modules, over two
dozen in vision, whose functions are even more specialized than Marr’s
account would lead us to believe. But I am going to argue that even if the
modularity thesis is true, that fact does not matter much for our under-
standing of how pictures represent. That point applies with even greater
force, I believe, when there is the kind of proliferation of modules that
Ramachandran suggests. Such designer modules, as we might call them,
essentially undercut the original import of modularity, as providing a
fixed, basic set of distinct abilities that can contribute to the performance
of various perceptual tasks.
In any case, my claim is that modularity in any form is not able to
carry us very far toward a hybrid account and thus does not shed light on
the contribution of the direct component to pictures. The weakness of a
modularity-based hybrid account appears in two ways. First, the con-
straints imposed by modules are actually fairly minimal, to the extent that
tasks such as picture perception are performed on the basis of perceptual
experience, that is, conscious central processes on which diverse knowl-
edge is brought to bear. For example, although he defends the modular-
ity thesis, Danto’s view is predicated on the assumption that, especially
when perceived pictures are works of art, central processes must be brought
into play. In contrast to Gombrich, Danto believes that there is an inno-
cent eye. It exists in early vision. But for him, there is also a beguiled
and jaded eye, an eye of the world, which appears later in the process-
ing stream. And that is what eventually determines pictorial content.
Although picture perception depends on a basic perceptual competence
that all perceivers have in common, according to Danto, perceptual psy-
chology cannot tell us what the content, spatial or otherwise, of a picture
is. It cannot tell us what a picture means. The reason is that most of the
content of a picture is, in Danto’s words, “invisible.” It depends on his-
torical relations that cannot be seen, although they can be understood.
Danto’s point is that the content of a picture must be inferred, because
history, he says, supervenes on perception through the perceiver’s beliefs.
But if we agree that this sense is legitimate, then the modularity thesis
becomes more or less irrelevant. At higher levels, the constraints on mod-
ules can always be superseded. Danto suggests that the modularity of
mind insures that there will be something like theory-neutral observations
against which to check the visual hypotheses and interpretations that we
form when confronted with pictures. But if we think of the constraints
imposed by modularity on pictorial content as something like the con-
straints imposed by observation sentences on scientific theories, then it’s
clear that their role is not that of elements from which an interpretation is
constructed but only that of providing test cases for any interpretation.
And there may be many of those interpretations that can pass the test. If
so, then this stage-based compromise with Gibson is not very compelling.
In the end, it is the indirect component that carries all the explanatory
weight.
The second manifestation of the modularity-based hybrid’s weakness is
that constraints on vision really have more the status of antecedent causal
conditions than of premises, the conditions themselves having content, in
an inductive inference. And this points to another reason that a stage-based
hybrid account falls short of being a genuine compromise with
direct realism. Even if the visual system were modular, the modules could
be harnessed together in a variety of ways in the performance of a given
perceptual task. That is, modularity does not preclude the possibility that
a wide variety of perceptual strategies might be employed in picture recog-
nition; in that sense, the relation between informational arrays and the
visual system cannot be one to one.
relations between the picture producer and the objects represented in the
scene. These need not be actual causal relations; a painting might embody
a point of view that the artist would have, if that artist were to stand in a
certain place. But what the picture provides is a representation of the spa-
tial aspects of objects as defined by this perspective. In that case, the pic-
ture perceiver is able to gain access to the content of the picture by means
of this spatial aspect, which the perceiver can recognize without any
reliance on concepts.10 The net result is essentially a two-factored theory
of pictorial content: the reference of a picture is determined by psycho-
logical processes, the object of which is itself established by causal spatial
relations. The meaning or content of the picture in a larger sense is then a
function of higher-order interpretations that draw on background knowl-
edge, vested in a holistic conceptual scheme.
This is a theory of pictorial content that rests on shaky empirical
grounds. For example, as Patricia Churchland, Ramachandran, and
Terrence Sejnowski (1994) point out, although figure-ground organiza-
tion is generally thought to precede shape recognition, their model allows
for the order to be reversed and for shape recognition to contribute to
the identification of figure and ground. Other evidence also suggests
that the spatial organization of a picture does not always ground or con-
strain recognition in the way the Recognition theory seems to imply. For
instance, subjects appear to be able to spatially reorganize remembered
pictures and, as a consequence, identify new features and new objects in
them (Kosslyn 1994). Thus, it cannot be said that scenarios or spatial
aspects or perceptual reference frames are always fundamental. Even if
spatial properties are somehow encoded preconceptually, that fact does
not resolve the dilemma described here.
Moreover, what emerges from these results is that the processing of pic-
torial space and its relation to the recognition of depicted objects need not
be done in a single way; it depends on the strategies available to the per-
ceiver. For example, in some cases, subjects are not able to overcome the
initial perceptual organization of remembered pictures and so not able to
recognize new objects in them (Reisberg and Chambers 1991). However,
in those cases, the block can be eliminated if the subjects are allowed to
supplement perception or perceptual memory with support from motor
control. For instance, subjects can recognize verbal ambiguities from
memory (e.g., when the word “life” is repeated over and over in the mind,
it comes to be heard as “fly”); but only if allowed to engage in a kind of
silent speech. That involves more than simply rehearsing the sounds in the
Let us return for a moment to the Recognition theorist’s idea that what a
picture represents depends on what can be seen in it, using ordinary per-
ceptual abilities, and to the attempt to reconcile that idea with the view
that pictorial art is somehow languagelike. Recall that the marriage of
perception and convention was supposed to be arranged by grounding
pictorial representation on pictorial reference, using demonstrative refer-
ence as a general model. The assumption was that spatial layouts relating
the scene to the position of a viewer are necessary to establish reference
and that these layouts are perceivable without benefit of concepts. Conse-
quently, their perception is not learned and not dependent on background
knowledge. But in light of the problems with the modularity thesis and the
nonconceptual process claim that I have cited, this marriage of perception
and convention is, I believe, on the rocks. So I now want to propose the
following change. Insofar as reference does constrain what a picture can
represent, the implication of SDT is that reference is established more by
strategic success in performing a task than by causal relations. What I
mean by “strategic success” is not just that a unique computational result
is obtained, or even that a goal is achieved or a target is hit. I mean, rather,
that establishing reference is done in an efficient way that conserves the
resources of the system. This implication is suggested by the various ways
in which SDT assigns important roles to attentional control.
For example, Churchland, Ramachandran, and Sejnowski (1994) argue
that attention shifts can precede eye movements in tracking a moving
object and thus bind the visual stimulus to a location. Although Church-
land and colleagues do not describe the mechanisms by which such covert
attention shifts might be carried out, an empirically well-grounded model
based on brain studies has been suggested by David Van Essen and
Charles Anderson (Van Essen, Anderson, and Olshausen 1994; Anderson
and Van Essen 1987). Further, Ramachandran and William Hirstein
5.6 CONCLUSION
it highlighted the spectator. But it does suggest that pictorial space and the
appeal of perspective have something to do with mechanisms of perceptual
control. What I have argued is that adapting a plausible theory of picto-
rial representation, Recognition theory, to the various hybrid accounts of
perception that have emerged in recent years sheds light on how those
mechanisms operate. This light is, I think, more important than the ques-
tion of what it means for perception to be direct. But by articulating a
Recognition theory of this sort, we can also come to a better understanding
of the differences among certain hybrid theories, which is where the most
interesting issues remain to be explored.
Notes
1. It will soon become clear that these are strategies available to the visual sys-
tem rather than the perceiver. I do not wish to imply by the term strategy that
optional modes of processing are always selected as a matter of conscious choice.
2. And in either this or its original version, the idea that there are varieties of pic-
tures can be combined with the pictorial surrogate view. The result would be that
pictorial surrogacy would then be said to come in a variety of forms.
3. Of course, we then have to define the term “indirect,” which is not a simple task
(as Schwartz 1994 has shown). I simply stipulate for now that “indirect” percep-
tion involves processes that depend on internal representations; that it does not
matter whether those representations are thought to be neural or irreducibly men-
tal (as perhaps a functionalist philosophy of mind suggests); and that to represent
something is to stand for it in some way. Thus perception is indirect if it depends
on an internal event that refers to the object perceived and if the attribution of
properties to that object presupposes the reference.
4. Compare also Stillings et al. (1987, p. 456).
5. However, on this account, pictures need not be perceived exactly as ordinary
objects are. There can be significant differences in the ways the standard percep-
tual equipment is employed.
6. There are significant differences among these thinkers. In particular, Peacocke’s
view is sometimes treated as opposed to a recognition theory of the sort that Lopes
endorses because Peacocke makes basic recognition depend on resemblance. But
these differences do not affect the point I wish to make here.
7. That is, an accurate understanding of pictorial reference by a perceiver is one
that comports with its actual causal history.
8. To this, Lopes adds an important qualification: A picture is not limited to a sin-
gle consistent point of view, as vision is. It may contain multiple and visually
impossible perspectives. Nonetheless, it is essential for Lopes to lay down some
constraints on the spatial organization of a picture and its aspects, if he is to be
able to distinguish the case of a single picture with multiple perspectives from the
case of multiple pictures side by side or in the same frame, each with a single point
of view. This he does in terms of “spatial unity.” A picture, he says, “is a repre-
sentation whose content presents a ‘spatially unified’ aspect of its subject. . . .
[E]very part of the scene that a picture shows must be represented as standing in
certain spatial relations to every other part” (1996, p. 126). It is in this sense that
space plays a fundamental role for him. This would be true of cubist paintings or
impossible pictures, for instance, which contain multiple points of view but not
true of a picture postcard showing the various sights of Frankfurt. The former
could be related to a single object-centered representation; but the latter could not.
9. However, the converse is not true: perception might depend on concepts, but if
those are limited in scope (e.g., the concept of an edge, a line, an angle, etc. in early
vision) then perception need not be said to be affected by background knowledge.
Conceptualization does not entail cognitive penetration. This assumes, of course,
that concepts can be used in relative isolation, which is just the point that some
philosophers (Peacocke, for one) deny. To that extent, one’s view of the connec-
tions between concepts and cognitive penetrability in vision will depend on one’s
view of conceptual holism. In contrast to the other recognition theorists I have
cited, Schier does not seem to take a position on whether perceptual content, in
particular spatial content, depends on concepts or not. But for him, in either case,
early recognition processes are modular.
10. Of course, being nonconceptual is only a necessary and not a sufficient condi-
tion for spatial perception to play the dominant role that it is supposed to play. In
Marr’s model, shape and line representation involve no concepts either. Lopes
thinks that the perceiver-centered perspective is eventually to be translated into an
object-centered representation, as Marr has claimed. It is that translation, in fact,
that supports the recognition of objects under different aspects. Presumably, as
early stages of perception, shape and line recognition also constrain later stages.
Nonetheless, initially such information is viewpoint dependent; in that sense, spa-
tial aspects might be said to be fundamental.
11. Of course, the analogy to speech acts is only partial at best. The pragmatics
of language depend on more than the mechanics of verbal “gestures,” including
the intentions of the speaker and an audience as the primary context of speech.
But recognition theory does not claim to explain all of reference in terms of
demonstratives, or even all of demonstrative reference in terms of spatial aspects.
My claim about perceptual strategies, attention, and referring is intended to be
equally modest.
12. It is important for the account I am developing here that such rerouting can
also occur in other ways that do not involve shifts of attention. For example,
brain-imaging evidence strongly suggests that after even a brief training period,
some perceptual tasks become virtually automatic, in the sense that they no longer
require attention. When that happens, they appear to be performed by different
areas of the brain than when the task was new (Posner and Raichle 1994, pp.
125–129).
13. Churchland’s model applies to resources other than concepts, for example,
representations of abstract shapes for which we have no labels, as might be pro-
duced by early visual modules. I omit here an account of the actual mechanism of
122 Mark Rollins
John Willats
such as the map of the London Underground system, are better as route
maps than pictures in perspective because the maps show the connections
between stations without irrelevant details such as the shapes of the lines
or the stations.
There is, however, another more subtle source of confusion concealed
in these questions: the temptation to judge representational systems as
natural or conventional on the basis of whether they are best described in
terms of projective geometry or symbolic rules. “Natural” theories of pic-
tures are derived from the laws of optics, and because the projection of
light rays from the scene and their intersection with the picture plane can
be described in terms of three-dimensional projective geometry, the argu-
ment that pictures are natural has focused on the question of whether the
spatial layout of pictures can be described in these terms. Margaret Hagen
(1985, 1986) has argued that all representational pictures can be regarded
as being based on what she calls “natural perspective” because their spa-
tial systems can be described in terms of three-dimensional projective
geometry. In contrast, other types of pictures such as Byzantine mosaics
and cubist paintings, which are based on inverted perspective and whose
spatial systems are difficult to describe in these terms, are often said to be
“languagelike” because they symbolize features of the scene rather than
represent them optically (Goodman 1968). This analogy is typically used
to support the argument that pictures are conventional because linguists
generally agree that the relations between the units of language and their
referents are arbitrary. As a result, it is tempting to conclude that in the
case of pictures whose spatial systems are best described in terms of sym-
bolic rules, the relation between scenes and pictures is also arbitrary. My
thesis here is that these two modes of description—in terms of the laws of
optics or symbolic rules—are not mutually exclusive but complementary
and that in many cases descriptions given in terms of symbolic rules can
also be related back to optical laws. However, I shall also argue that in
some circumstances one form of description may be more appropriate and
revealing than the other.
Take, for example, Giovanni Canaletto’s line drawing in perspective of
the Campo di Santi Giovanni e Paolo, circa 1735.3 Martin Kemp argues
that the spatial system in this and a number of similar drawings “is best
explained by supposing that it is based on a camera image” (1990, p.
197)—that is, that Canaletto produced these drawings with the aid of a
camera obscura, the forerunner of the modern photographic camera. To
support his argument Kemp draws attention to the unusual quality of the
127 Optical Laws or Symbolic Rules?
projective geometry, pictures based on these systems exploit what she calls
“natural perspective, the geometry of the light that strikes the eye” (1986,
p. 8). To the extent that such pictures provide at least approximations to
possible views of objects or scenes, this argument has some force,
but only in relation to picture perception. However, her failure to recog-
nize that these systems can equally well be defined in terms of symbolic
rules—rules about the lengths and directions of the orthogonals—has led
her to a number of false conclusions.
The first of these is her claim that “there is no development in art”
(1985, p. 59), which she applies to both artists’ pictures and children’s
drawings. She was obliged to make this claim because it follows from her
contention that “all representational pictures from any culture or period
in history” and all children’s drawings (beyond the scribbling stage) are
based on “natural perspective, the geometry of the light that strikes the
eye” (1986, p. 8). If oblique projection, orthogonal projection, and the like
are simply varieties of perspective, and perspective is defined in terms of
three-dimensional projective geometry, how can there be any develop-
ment from one system to another? As Hagen puts it, “is it possible to
measure developmental level with a tool that itself shows no develop-
ment?” (1985, p. 76). The flaw in Hagen’s argument is that some periods
in art history do show a definite developmental sequence, and this is also
true in the development of children’s drawings. Moreover, this sequence is
not difficult to understand once the projection systems are defined in terms
of drawing rules.
Perhaps the most obvious example is the case of children’s drawings
of rectangular objects such as tables or cubes. Although the details vary
somewhat according to the experimental conditions, and the terminology
varies from one writer to another, a number of experiments have shown
that children’s drawing development follows the sequence: topological
geometry, orthogonal projection, horizontal and vertical oblique projec-
tion, oblique projection, and, finally, some form of perspective (Lee and
Bremner 1987; Nicholls and Kennedy 1992; Victoria 1982; Willats 1977).
This sequence is easily described in terms of increasingly complex draw-
ing rules. Thus in drawings based on topological geometry there are no
orthogonals; in orthogonal projection, the orthogonals are represented by
points; in horizontal and vertical oblique projection, the orthogonals are
horizontal or vertical; in oblique projection, the orthogonals run at an
oblique angle across the picture surface; and in perspective, they are also
oblique but converge to a vanishing point. So the rules for producing
pictures in these systems are increasingly complex (Willats 1977; 1997). In
scenes, not from pictures, and judged by the standards of optical realism
pictures in these systems must be regarded as anomalous. But what could
be the reason for the existence of such anomalies?
There appear to be at least four possible reasons for these anomalies.
The first, and simplest, is the use of crude or mistaken rules or of incom-
patible mixtures of rules. For example, the fifteenth-century Birth of
Saint John the Baptist by Giovanni di Paolo is in a crude form of per-
spective but also contains incompatible mixtures of oblique projections.6
Anomalies of this kind often occur during periods of transition when
one drawing rule is being replaced by another: in this case, when oblique
projection and its variants were being replaced by perspective. Similar
anomalies appear in children’s drawings, during the transitions between
different developmental stages.
Second, as I have argued elsewhere, the anomalies that are so charac-
teristic of Byzantine mosaics and fifteenth- and sixteenth-century Russian
icon paintings are not “mistakes” but may have served quite specific the-
ological purposes. Unlike Catholic fifteenth- and sixteenth-century art,
and seventeenth-century Protestant art, Orthodox paintings and mosaics
were not intended to look optically realistic; rather, they were intended to
depict a spiritual world, as opposed to the material world of the senses.
Moreover, the flattening of the pictorial space that resulted from the use
of such anomalous devices as inverted perspective and false attachment
drew attention to the picture as a physical object so that the union of the
depicted spiritual world and its material support provided a metaphor for
the incarnation (Willats 1997).
Third, many artists, particularly during the twentieth century, used
anomalies for expressive reasons. Much of the poetic quality of Marc
Chagall’s paintings, for example, can be attributed to the fact that they
contain mixtures of drawing systems that impart a dreamlike and other-
worldly quality to the painting, similar to that found in many icons, and
it is surely no coincidence that Chagall was influenced by icon painting.
Finally, a number of modern painters have used anomalies as a way of
investigating the nature of painting itself. This seems to have been the
impulse behind much of the work of such diverse painters as Juan Gris,
perhaps the purest and most technical of all the cubist painters; the surre-
alist painter René Magritte; and Paul Klee. An adequate account of the
anomalies in the work of these painters, however, requires a fuller account
of the spatial systems in pictures than that given earlier in terms of topo-
logical and projective geometry.
According to David Marr (1982) any scheme for representing shape and
space (and this includes pictures as well as scenes) must have at least two
components. The first component defines the spatial relations between the
units of which the scheme is composed. For example, the shapes of the
actual buildings in the Campo di Santi Giovanni e Paolo in Venice might
be described in terms of the relative directions of their edges in three-
dimensional space, whereas in drawings of this scene their shapes would
be represented by the relative directions of the lines representing those
edges. I shall refer to the systems that map the spatial relations in scenes
into corresponding relations on the picture surface as the drawing systems.
The second component in Marr’s account consists of the units of which
such representations are composed: Marr calls these the primitives of the
system. For example, a description of the shapes of the buildings in a
scene might be given in terms of edges as scene primitives, and the repre-
sentations of these buildings in a drawing might be made up of lines as
picture primitives. I shall refer to the systems in pictures that map scene
primitives into corresponding picture primitives as the denotation systems.
Thus an account of the representational systems in pictures can be given
in terms of two mutually dependent components: the drawing systems and
the denotation systems. In the drawing systems

spatial relations in the scene --(mapped by the drawing systems)--> spatial relations in the picture

and in the denotation systems

scene primitives (such as edges) --(mapped by the denotation systems)--> picture primitives (such as lines)

There are five main drawing systems: perspective; oblique projection
and its variants; orthogonal projection; inverted perspective; and systems
based on topological geometry. In addition, there are three main denota-
tion systems: optical systems; line drawings; and silhouettes in which the
primitives are two-dimensional regions. As (with certain restrictions) any
of the drawing systems can be combined with any of the denotation sys-
tems, this provides quite a rich method of classification that can be used
to describe the representational systems in pictures from a wide variety of
periods and cultures.
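
The classification just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, and because the "certain restrictions" on combining systems are not spelled out here, the sketch simply lists every pairing without filtering any out.

```python
from enum import Enum
from itertools import product


class DrawingSystem(Enum):
    """The five main drawing systems named in the text."""
    PERSPECTIVE = "perspective"
    OBLIQUE = "oblique projection and its variants"
    ORTHOGONAL = "orthogonal projection"
    INVERTED_PERSPECTIVE = "inverted perspective"
    TOPOLOGICAL = "systems based on topological geometry"


class DenotationSystem(Enum):
    """The three main denotation systems named in the text."""
    OPTICAL = "optical system"
    LINE_DRAWING = "line drawing"
    SILHOUETTE = "silhouette (two-dimensional regions as primitives)"


def all_combinations():
    """Pair every drawing system with every denotation system.

    Assumption: no restrictions are applied, since the text does not
    state which pairings are excluded.
    """
    return list(product(DrawingSystem, DenotationSystem))


for drawing, denotation in all_combinations():
    print(f"{drawing.value} + {denotation.value}")
```

On this scheme a picture is classified by a pair of labels, one from each component, rather than by a single label such as "perspective picture."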
As with the drawing systems, the denotation systems can be defined in
terms of either the laws of optics or symbolic rules. In optical systems
zero-dimensional dots of tone or color are used to denote the intercepts of
small bundles of light rays. Typical examples are television pictures and
tation system would have been based on an implicit rule: “use lines to
denote edges and contours.”
Gibson maintained that “there is no point-to-point correspondence of
brightness or color between the optic array from a line drawing and the
optic array from a scene” (1971, p. 28). Certainly, early attempts to obtain
line drawings from gray-scale images automatically, by scanning photo-
graphs and representing the luminance steps in the image (points where
the tonal values change abruptly), are fairly unconvincing (Marr 1982).
However, research by Don Pearson, Hanna, and Martinez (1990) suggests
that Gibson’s assertion that there is no correspondence between arrays
from line drawings and real scenes may have been unduly pessimistic,
because they were able to produce quite respectable line drawings from
gray-scale images by using what they called a “cartoon operator” tuned
to pick out a combination of luminance steps and luminance valleys.
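
To make the idea of picking out "luminance steps and luminance valleys" concrete, here is a toy sketch on a one-dimensional row of gray values. It is not the cartoon operator of Pearson and colleagues, whose details are not given here; the function name, the thresholds, and the sample profile are all assumptions chosen purely for illustration.

```python
def find_steps_and_valleys(profile, step_threshold=0.3, valley_depth=0.1):
    """Locate luminance steps and valleys in a 1-D luminance profile.

    A step is an abrupt change between neighbouring samples.
    A valley is a sample darker than both of its neighbours by at
    least valley_depth. Both thresholds are illustrative assumptions.
    """
    steps = []
    for i in range(1, len(profile)):
        if abs(profile[i] - profile[i - 1]) >= step_threshold:
            steps.append(i)

    valleys = []
    for i in range(1, len(profile) - 1):
        drop_from_left = profile[i - 1] - profile[i]
        rise_to_right = profile[i + 1] - profile[i]
        if drop_from_left >= valley_depth and rise_to_right >= valley_depth:
            valleys.append(i)

    return steps, valleys


# Hypothetical row of gray values: a bright surface, an abrupt drop to a
# darker surface, and a thin dark crease within the darker surface.
profile = [0.8, 0.8, 0.8, 0.3, 0.3, 0.3, 0.15, 0.3, 0.3]
steps, valleys = find_steps_and_valleys(profile)
print("steps at:", steps)      # steps at: [3]
print("valleys at:", valleys)  # valleys at: [6]
```

A line drawing derived this way would place a mark at each step and each valley, which is the sense in which such an operator combines the two kinds of feature.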
Nevertheless, what works for a computer does not necessarily work for
human beings. Many of the mistakes made by amateur artists and art stu-
dents come about as a result of the temptation to use lines to represent the
abrupt changes in luminance values that come about at the edges of shad-
ows, and this is particularly the case in figure drawing. An important part
of an artist’s training consists of learning to ignore the boundaries of the
shadows that often mask the points where contours end and, instead, to
use the implicit rule “use lines to denote contours.”
Thus when it comes to picture production, the denotation systems in
optically realistic pictures are often best described in terms of optics and
the fall of light, whereas the denotation systems in line drawings are usu-
ally best described in terms of symbolic rules. But if so, why is it that line
drawings can look so convincingly realistic? Pearson and colleagues sug-
gest that this might be because the early stages of the human visual sys-
tem include a feature detector, similar to their cartoon operator, that picks
out a combination of luminance step edges and luminance valleys, so
although the optic array from a line drawing is not physically similar to
that obtained from a real scene, it nevertheless provides an effective per-
ceptual equivalent. If this is indeed the case, it might make sense to
describe the perception of line drawings in terms of these features of the
optic array, rather than in terms of rules about what lines may and may
not denote.
Finally, are the denotation systems in pictures based on regions as
primitives best described in terms of optics or symbolic rules? It is not
difficult to see how pictures of this kind can be derived using symbolic
rules, such as the rule “use a round region to denote a round volume,” and
it seems likely that children employ implicit rules of this kind as a basis for
their earliest drawings. However, it also seems likely that rules of this kind
have a “natural” basis in optics, because round volumes will always proj-
ect a round region in the visual field, whereas a long volume will usually
project a long region, except in the unlikely case in which a long volume
is viewed end-on. In evolutionary terms, such “natural” associations pre-
sumably come about as the result of an association between the shape of
the object and the view of that object that is most frequently encountered
(Willats 1992b). Thus William Stern, commenting on the lines and round
regions that children use in their tadpole figures, said: “We call these sym-
bols ‘natural’ because their meaning does not first require to be learnt (as
in the case of letters or mathematical signs) but directly occur to the child,
and are used by him as a matter of course. Thus a long stroke is a natural
symbol for an arm or leg, a small circle for an eye or head” (1930, pp.
369–370).
By the standards of optical realism and projective geometry, tadpole
drawings must be judged to be anomalous, and as adults we do in fact
make such judgments when we look at children’s early drawings—which
is perhaps why they look so puzzling. When we look at tadpole figures we
interpret the lines as representing contours, and these lines look “wrong”
as the projections of the contours of the human figure. For the child, how-
ever, the lines do not represent contours, but form the outlines of regions
representing whole volumes. Thus the drawing rules underlying the per-
ception of these drawings by adults are quite different from the rules
by which such drawings have been produced. Perhaps an analogy might
be drawn here between children’s early drawings and their early speech.
Although sentences such as “Nobody don’t like me” sound wrong to
adults, “It is important to understand that when children make such
errors, they are not producing flawed or incomplete replicas of adult sen-
tences; they are producing sentences that are correct and grammatical
with respect to their current internalized grammar” (Moskowitz 1978,
p. 89). I suggest that, in the same way, the drawing rules by which the
tadpole figures have been produced are consistent in their own terms;
but they are not the rules normally used by adults.
According to Breyne Moskowitz, “Children’s errors are essential data
for students of child language because it is the consistent departures from
the adult model that indicate the nature of a child’s current hypotheses
about the grammar of a language” (ibid., p. 89). Similarly, the anomalies
lous denotation systems used for the arms and legs are highlighted with
patches of color: yellow for the arms and green for the stockings. In fig-
ure 6.1B these areas are shown using two different tones. Systems of this
kind are very rare in adult pictures but common in children’s early draw-
ings such as the tadpole figures.
The second group of anomalies violates Huffman’s rule that a line must
keep the same meaning along its whole length: in Klee’s painting this rule
is broken in at least six places. One instance of this can be seen in the line
that begins in the top of the patch of red paint representing the girl’s hair.
In its path round the girl’s head this line begins by representing the con-
tours of her head, cheek, and chin; then it turns sharply and continues as
a representation of her shoulder. Just past this sharp turn, however, at a
point marked by the beginning of the patch of yellow paint (1), the line
changes its reference and denotes the girl’s arm as a long volume. Two
more anomalous changes of reference occur in the line that begins at the
top right of the painting. At first this line represents the girl’s arm as a long
volume. Then, just past the edge of the patch of yellow paint it turns and
changes twice, representing first the back of her blouse as a contour (2)
and then the line of her belt as a long region (3). Finally, no fewer than
three anomalies are present in the near-vertical line in the center of the
painting. At the bottom of the painting, within the patch of green paint,
this line represents her foot and her leg as long volumes. Then, as it
crosses the edge of her skirt (4), it represents the contour of a fold in her
skirt seen against the skirt as a background. Next, as it passes the line
of her belt (5) it changes its reference again, representing the contour of
her bodice, but this time against a blank background to the left. Finally,
the line changes its reference for a third time (6), and ends by representing
the ridge in the girl’s neck as she turns her head.
The third group of anomalies in the picture concern the way Klee uses
line junctions. In normal line drawings of smooth objects, there are two
types of line junctions: T-junctions, which represent the points where con-
tours disappear behind surfaces, and end-junctions, which represent the
points where contours end. With Green Stockings contains examples of
both types of junction, used both anomalously and in the normal way. A
normal T-junction occurs at the bottom right of the painting, where the
line representing the girl’s left leg disappears behind the edge of her skirt.
A similar T-junction occurs at the bottom left of the painting, but here the
line of the leg, instead of disappearing behind the edge of the skirt, con-
tinues in view. We may interpret this anomalous junction in two ways.
Figure 6.1
(A) Paul Klee, Mit Grünen Strümpfen (With Green Stockings), 1939, watercolor
and blotting paper, 34.9 cm × 21.0 cm (13¾″ × 8¼″), Felix Klee Collection, Berne.
© 2001 Artists Rights Society (ARS), New York / VG Bild-Kunst, Bonn. (B)
Analysis of Paul Klee’s With Green Stockings. The patches of tone correspond to
patches of color in the painting (yellow for the arms and green for the stockings).
These patches highlight line segments based on a denotation system in which lines
denote volumes. Numbers 1 through 6 mark anomalous changes in meaning along
the lines: for example, 1 indicates the change from a system in which the line
denotes a contour to a system in which the line denotes a volume. The single aster-
isk marks a false attachment giving rise to an anomalous T-junction. The double
asterisks mark false attachments between features in the picture and the picture
frame: in this case, the edge of the blotting paper. The arrows indicate anomalous
end-junctions. See plate 3 for color version.
The first, and perhaps more obvious way, is to think of the continuation
of this line past the edge of the skirt as an example of transparency: we can
still see the leg as it passes behind the skirt, as if the skirt were transpar-
ent. Alternatively, we can think of this anomalous junction as occurring
as the result of a false attachment (see the part marked “*” in figure 6.1B):
the line above the edge of the skirt represents the contour of a fold in
the surface of the skirt, but this contour is falsely attached to the line
of the leg. With Green Stockings also contains both normal and anoma-
lous end-junctions. If we follow the line of the contour of the skirt on the
right of the painting, up past the point at which it is caught in by the girl’s
belt, it ends in the normal way. On the left of the painting, however, a simi-
lar line simply ends in mid-air (see the “→” symbol). Similar anomalous
end-junctions occur in the contours of the ball above the girl’s head. In
terms of picture perception we can no longer interpret these end-junctions
as picture primitives denoting the points where contours end; instead, they
simply become marks on the surface of the picture.
Finally, like Gris’s Breakfast, Klee’s painting contains examples of false
attachments (see the part marked “**” in figure 6.1B) between features of
the picture and the edge of the frame: these occur where the lines repre-
senting the ends of the arms just touch the edge of the paper.
I have analyzed Klee’s With Green Stockings in some detail, and alluded,
though more briefly, to the pictorial devices in Gris’s Breakfast, because
these two paintings represent extreme test cases for the argument that the
spatial systems in pictures of this kind are “conventional” in the sense that
in them the relations between scenes and pictures are arbitrary and sym-
bolic rather than natural. At first glance both these paintings may sug-
gest that this argument is convincing. Clearly, neither of these paintings
can easily be accounted for in terms of the laws of optics, or by simply say-
ing that they reproduce the structure of the optic array, or the “light that
strikes the eye.” Nevertheless, I have tried to show that in neither case are
the relations between scenes and pictures arbitrary. Both paintings con-
tain numerous instances of pictorial anomalies, but nearly all these
instances can be accounted for by saying that they break or reverse rules
derived from optical laws.
My purpose in giving these examples is to show that it is sometimes
appropriate to describe the representational systems in terms of the laws
of optics, and at other times it may be better to describe them in terms of
pictorial rules. Psychologists and theoretical writers on theories of pic-
tures have tended to favor descriptions based on projective geometry and
ACKNOWLEDGMENT
Notes
1. For example, Hagen regards all the representational systems that can be defined
in terms of projective geometry as perceptually equivalent, describing them as
varieties of “natural perspective” but argues that: “The selection from among the
representational options of natural perspective in terms of bias, preference, value
or function is determined by cultural convention” (1986, p. 83).
2. In practice, of course, photography nearly always involves some human
intervention.
3. Illustrated in Kemp (1990, fig. 390).
4. Kemp (1990, fig. 278, pp. 145, 146). See also Kemp’s account of the working
methods of Pieter Saenredam, on pp. 114–116, and see also Willats (1997), on
pp. 190–193.
5. Illustrated in Willats (1997, fig. 3.2).
6. Illustrated in Dubery and Willats (1983, fig. 57, p. 54).
7. Children’s early drawings are usually composed of marks consisting of lines,
but it is crucial to distinguish between the marks in a picture, and the more
abstract notion of the picture primitives that these marks represent (Willats 1992a).
8. Illustrated in Gablik (1985, fig. 196).
9. Illustrated and analyzed in Willats (1997, pl. 6, pp. 275–279).
10. Illustrated and analyzed in Willats (1997, pl. 5, pp. 264–267).
11. Illustrated and analyzed in Willats (1997, fig. 12.3, pp. 280–281).
Chapter 7
Perspective, Convention, and Compromise
Robert Hopkins
distant corners lie are closer to each other than are the directions of the
two nearer ones.3
I make two observations about visible figure. The first is that it is a gen-
uine property of things in our environment. For the point of view is just
the actual location from which the world is seen, and visible figure is sim-
ply a matter of the spatial relations in which parts of the object stand to
that location. This is a complex property determined by the object’s 3-D
shape and orientation to the point of view, but it is distinct from either of
these. Reid realized this, despite the misleading contrast with “real” figure,
for he described visible figure itself as “a real and external object to the
eye” (ibid., chap. 4, sec. 8).
The second observation is that visible figure deserves its name: it is
something we see.4 My comments about the directions of corners of the
table do not merely describe how the world is laid out. They also capture
how my experience presents that arrangement. I see the directions in
which the various corners lie, and see, for instance, that opposite points
on the table’s long edges lie in directions ever closer together, the farther
away from me they are. Of course, as with any property we perceive, our
experience can be misleading and will always be imprecise in certain
respects. When a half-submerged stick looks bent we misperceive, not just
its 3-D shape, but its visible figure too. And no visible figure is perceived
with complete precision: there is always a degree of specificity beyond
which experience represents the object neither as having this visible figure
nor as having that one. Indeed, imprecision is an important feature of our
perception of this property. For experienced visible figure is only as deter-
minate as the point of view it involves, and in general our visual experi-
ence presents the world not from a point but from a zone in environmental
space large enough, in binocular vision, to include the actual location of
both eyes.5 But the observation stands: visible figure is seen, albeit with
varying accuracy and precision. This is just as well. Visible figure can
hardly be central to making and understanding pictures, those represen-
tations so closely bound to vision, unless visible figure is something vision
makes available.
Reid noted that it is the artist’s job “to hunt this fugitive form [i.e., vis-
ible figure], and to take a copy of it” (ibid., chap. 6, sec. 7). I more or less
agree. For it is possible for a picture to resemble its object in visible figure,
even though they differ considerably in 3-D shape. A picture of our table,
for instance, might represent the four corners of its top by marks which lie
in just the same directions from my point of view as did the corners them-
selves, when I stood before the table. The marks lie in the same directions,
from the relevant point of view, as did the corners, even though, by lying
at different distances in those directions, they form a configuration very
different in 3-D shape from the table itself. But Reid’s formulation is not
quite right. What is crucial for pictorial representation is not actual resem-
blance in visible figure but that the marks be experienced as resembling the
depicted object in this respect. To experience this resemblance is to see the
object in the picture. And provided some intention or causal connection
renders it right to experience the picture in that way, it is the picture’s sus-
taining such experiences of resemblance in visible figure that constitutes its
depicting that thing. Thus we have the basics of an account of picturing.
What are the implications of this view for our question about perspective?
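
The geometrical core of this account, that marks can lie in the same directions from a point of view as the table's corners while differing from them in 3-D position, can be checked with a small numerical sketch. Everything specific in it is an assumption made for illustration: the point of view is placed at the origin, the picture is a flat plane at unit distance, and the table dimensions and function names are hypothetical.

```python
import math


def direction(point):
    """Unit vector from the point of view (the origin) toward a 3-D point."""
    length = math.sqrt(sum(c * c for c in point))
    return tuple(c / length for c in point)


def mark_on_picture_plane(point, plane_z=1.0):
    """Where the line of sight to `point` crosses the plane z = plane_z."""
    x, y, z = point
    scale = plane_z / z
    return (x * scale, y * scale, plane_z)


def angle_between(p, q):
    """Angle in degrees between the directions of two points."""
    dot = sum(a * b for a, b in zip(direction(p), direction(q)))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


# Hypothetical tabletop: 1 unit wide, 0.5 below eye level, 2 to 4 units away.
near_left, near_right = (-0.5, -0.5, 2.0), (0.5, -0.5, 2.0)
far_left, far_right = (-0.5, -0.5, 4.0), (0.5, -0.5, 4.0)
corners = [near_left, near_right, far_right, far_left]
marks = [mark_on_picture_plane(c) for c in corners]

for corner, mark in zip(corners, marks):
    same = all(math.isclose(a, b) for a, b in zip(direction(corner), direction(mark)))
    print(corner, "->", mark, "same direction:", same)

print("angle between near corners:", angle_between(near_left, near_right))
print("angle between far corners: ", angle_between(far_left, far_right))
```

The four marks all lie in one plane at the same depth, so the configuration they form is very different in 3-D shape from the tabletop, yet each mark shares its direction with the corner it stands for. The sketch also shows the directions of the far corners lying closer together than those of the near ones. It models only actual resemblance in visible figure, not the experienced resemblance on which the account finally rests.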
at best goes only a small way toward meeting the challenge posed. For
even if all picturing in perspective meets the view’s criterion for picto-
rial representation, that still leaves all other picturing, quite implausibly,
excluded.6 In effect, the resemblance view seems to have taken what is true
of a subset of picturing, a subset at best as large as that of all picturing in
perspective, and made it definitive of picturing per se. And this leaves it
giving implausible answers to the two questions before us. To the question
“what is pictorial representation?” it gives an implausibly narrow answer,
in effect one applying only to perspectival picturing. If so, it also gives the
wrong answer to the question “what is special about picturing in perspec-
tive?” For it implies that it is special precisely in being the only picturing
there is.
To save the resemblance view, we need first to argue that it covers all
the rich variety in ways of picturing. But, having done that, we need also
to find some other account of what is special about pictures in perspective.
The next section undertakes the first of these tasks. Because its main aim
is negative—that the view can evade the current objection—readers more
interested in my positive assertions about picturing and perspective may
choose to skip this section.
ally preserving visible figure and those experienced as doing so. More or
less, then, only those marks preserving visible figure will count as depict-
ing the table. The challenge to the view is to make room for a wider range
of marks, counting as depictions of the table.
Showing that the resemblance view can meet this challenge is a complex
matter. Here I can do no more than sketch some of the ingredients in a full
reply. In this section I make three points. All three embody the basic
observation that marks depicting the table need not match any further in
content than that.
The first point is that not all marks depicting the table need depict it
accurately. Some, that is, might depict the table as having properties it
does not actually enjoy. It is obvious that, in general, pictorial misrepre-
sentation is possible. A caricature of Tony Blair need not show his nose as
being the shape it is and may indeed distort his features quite drastically,
while still depicting him. So there is no reason for every picture that
depicts our table to show it as the shape or color it really is or as having
the appropriate number of legs. Such misrepresentations differ in content
from each other and from accurate depictions (save, of course, that at
least one element is common to the content of all: the table). Now, with
respect to such groups of pictures, the resemblance view, far from pre-
cluding differences in the marks composing them, actually expects such.
Like any account of depiction, it seeks to say what it is for a picture to
have its pictorial content. For instance, it will say that for something to be
a picture of the table as having six legs, a round top, and so forth, is for it
to be experienced as resembling in visible figure the table with six legs, a
round top, and so on; but that to depict the table as round topped, and so
forth, but with seven legs, the marks must be experienced as resembling
the table with seven legs, a round top, and so on. The view does not say
how the surfaces must be marked to sustain these experiences. But given
the rather limited slippage between actual resemblance and experience of
it, it will at least be comfortable with the idea that different experiences of
resemblance will be sustained by differently marked surfaces. Here then is
one simple way in which marks depicting the table, from the specified
angle, may differ from one another, so that not all of them preserve the
table’s actual visible figure.
The thoughts underlying this move are central to the second point,
too, so it is worth spelling them out. Pictures need not be of particular
things: a painting can depict a horse without there being any answer to the
question of which horse it represents. But let us for the moment stick to
pictures that do depict particulars. In effect, I have said that for these
pictures the view makes central, not experienced resemblance to the pic-
ture’s object as it actually is, but experienced resemblance to that item
with whatever properties it is depicted as having. As I have noted, it is in
a way obvious that this is how things should be. For only thus can the
view perform its main job, of stating what it is for a picture to have a given
pictorial content. Because the content of a picture is always richer
than just the representation of some particular, and because there is no
need for the properties filling out that content to be limited to ones the
particular actually enjoys, the view must say that what matters is that
resemblance to the object be experienced with whatever properties the
picture ascribes.
However, despite this obviousness, discussions of resemblance views
have always overlooked this point. There are several reasons for this.
Some are good reasons, although there is no space here to argue that they
do not justify rejecting the claims of the last paragraph.7 But others are
very poor. Among the poor reasons is a fixation on the idea of perspecti-
val systems as ways of projecting particular objects onto plane surfaces.
Thinking that this is what perspectival systems do, and thereby thinking
of nonperspectival picturing as another way, or set of ways, of achieving
the same result, makes it especially hard to grasp the thought that the test
of the resulting marks is experienced resemblance to whatever the picture
depicts and not experienced resemblance to the original projected object.
Yet in fact the notion of a perspectival system, at least in its full gener-
ality, has nothing to do with the projection of real things. As I noted at
the start, rather than conceiving of perspective in this historical way, it is
better to think of it as a set of conditions that completed pictures may or
may not meet. So conceived, perspective can apply, or fail to apply, to pic-
tures whether they are of particular objects represented accurately, of par-
ticular objects with properties they do not in fact enjoy, or not of any
particular object at all. If we do not conceive of perspectival picturing in
this way, it will seem less interesting than it really is. For, as noted, pic-
tures in general are not limited to representing particulars nor to rep-
resenting any particulars they do depict as having the properties they
actually do. So picturing in perspective, if it is conceived as a way of
projecting particular objects onto surfaces, does not, whatever its other
benefits and limitations, offer the same range of possible contents, in terms
of fundamental logical kinds, as picturing in general does.
This first point begins the process of broadening the range of pictures
the view can countenance. But it only gets us so far. For misrepresenting
pictures too might either be in perspective or not. We have seen how very
different marks might all depict our table, by ascribing different properties
to it. But nothing said so far shows how those pictures might be other than
perspectival. So let me turn to the second point. This is to identify a sec-
ond way in which pictures of the table might differ in content: rather than
being inaccurate, their content might be imprecise. One might represent
the table’s shape very precisely, while another represents it as merely
roughly round. For pictorial content to be imprecise is for there to be no
answer, beyond a certain level of detail, as to which of two distinct prop-
erties the picture ascribes to its object.8
It is clear that pictures can have imprecise content. A quick sketch
might represent nothing more than a figure’s rough shape and posture. It
should be equally clear that the resemblance view can allow for this fea-
ture of depiction. As we saw, what matters is experienced resemblance to
the object as depicted. There is no reason why marks should not be expe-
rienced as resembling something with fairly indeterminate properties. But
it will help meet the challenge to spell out just what such an experience
does and does not involve.
What is not required is that the subject misperceive the marks, or only
perceive them with some rather limited level of precision. I can see a
child’s drawing as resembling in visible figure our table, with the prop-
erties the picture ascribes, even if I see the wobbly lines that make up the
drawing with perfect clarity. Nor is it required that I experience those
marks as only resembling the (relatively indeterminately characterized)
table to a certain degree. In the preceding text I made no mention of
degrees of resemblance in explaining how the view accommodates mis-
representation, because there is no need to appeal to that notion. For all
I said was that we experience pictures as resembling their objects, as
depicted, perfectly, even when the objects as they actually are do not share
the marks’ visible figure. And the same holds here. I may experience the
child’s wobbly lines as perfectly resembling a table in visible figure, pro-
vided the table in question is only roughly characterized, for example, as
more or less cuboid, rather longer than it is wide, and so forth. What is
required for such experiences is that although I may see the detailed fea-
tures of the marks, only certain aspects of them are salient in my experi-
ence of resemblance. It is the rough shape of the line, not its meandering
edges, that is prominent for me in seeing the resemblance to the table,
indeterminately characterized.
How exactly do these observations about imprecise pictorial content
help the resemblance view respond to the challenge? Different pictures of
our table might, without misrepresenting it, differ in content from the
The points of the last section go a good way toward meeting the challenge
to the resemblance view. Marks very different from those preserving our
table’s visible figure might nonetheless depict it, from the right angle, pro-
vided they either misrepresent it or represent its properties only impre-
cisely, or, of course, do both. But I promised to do more than merely
deflect objections. I promised some form of compromise with those who
take depiction itself, and perspectival picturing in particular, to be either
properly conventional or a nonconventional consequence of conventions.
How does the view yield that compromise?
We can see room for compromise by getting clear about the view’s
ambitions. It seeks to analyze depiction in terms of the experience to
which it gives rise, seeing-in. To do that, it needs to tell us what that expe-
rience is. It is the experience of resemblance in visible figure. But charac-
terizing an experience is not the same thing as describing its causes. So far
our view has not said much about how the world must be for a given sur-
face to bring a given spectator to see a given thing in it. It is not entirely
without implications for this question. If seeing something in a set of
marks is a matter of experiencing them as resembling that thing in visible
figure, then, given some basic facts of optics and psychology, the only
marks that will do, except in exceptional circumstances, are ones at least
approximating to the object’s visible figure. And although the last section
argued that there is no need for the approximation to be very exact, it does
not follow that it need not hold at all.10 But the causal condition thus
stated is at best necessary. Many other factors will come into play in dic-
tating whether a given experience of resemblance in fact occurs. Saying
what these factors are is no part of the brief of our position. Its goal is to
analyze seeing-in, and thereby pictorial representation, not to describe the
empirical facts about when that experience is engendered. However, the
view can at least acknowledge these facts. It is here that the seeds of com-
promise can be found. For the view can allow that many of the factors
empirically determining a given experience of resemblance are precisely
to which a person has been exposed and the ways in which those have
affected that individual’s perceptual sensitivities. Although few have real-
ized this, the resemblance view is easily able to acknowledge the role such
factors play. As Reid noted, visible figure is “fugitive,” that is, it is hard to
perceive with precision and accuracy. The less sharp one’s perceptual
grasp of it, the less sensitive one will be to a picture’s failure to preserve it.
In consequence, many nonperspectival pictures may well have been expe-
rienced as resembling their objects in visible figure perfectly well, for all
that they don’t in fact resemble them more than approximately. For they
may have been intended for, and enjoyed by, viewers whose perception of
visible figure was less than ideally acute. And one result of the rediscovery
of perspective may have been to sharpen viewers’ ability to perceive visi-
ble figure, thereby altering which marks will be experienced as resembling
things in that respect.
Here the compromise with the skeptic is considerable. Our view allows
that picturing essentially involves something independent of conventions,
the generation of a particular experience, but the view also allows that cer-
tain pictures may themselves alter which pictures are experienced in that
way. The suggestion is that perspectival pictures may be one such case,
altering viewers’ general perceptual propensities in order to bolster their
own position as the marked surfaces most likely to elicit the required expe-
riential response. The point can be illustrated by tackling an argument of
Nelson Goodman’s, one intended precisely as an objection to perspec-
tive’s special status.
One of Goodman’s points against perspective was, in effect, that not
even pictures in Albertian perspective preserve visible figure (1969, chap.
1). The Albertian system requires receding horizontal parallels to be rep-
resented by converging lines but not so parallels receding in the vertical
plane. As Goodman observed, this is not true to the geometry. The differ-
ence between the directions of opposing points on the parallels narrows
just as rapidly with recession in the vertical plane as in the horizontal.
What Goodman overlooked, however, is that this discrepancy may reflect
an aspect of our experience of visible figure. Setting aside pictures for a
moment, consider our face-to-face experience with things. Suppose that in
such experience we are more sensitive to this shift in the direction of points
on receding parallels when the recession is horizontal, as along railway
tracks, than when it is vertical, as with the sides of a building viewed
from the street. Then we would expect pictures that are experienced as
preserving visible figure to exhibit precisely the discrepancy noted. Sys-
The last section tried to show that the resemblance view can accommodate
some of the factors that have impressed those skeptical about perspec-
tive’s special status. Section 7.4 argued that the view is in no way obliged
to restrict depiction to picturing in perspective. This leaves one issue out-
standing. We need to say what is special about perspectival picturing,
according to the resemblance view. If it’s not the only picturing there is,
and if the truth about both depiction in general and perspectival depiction
in particular leaves room for at least some of the culturally relative fac-
tors the skeptic emphasizes, then how should we answer the question with
which we began this chapter? Is perspectival picturing conventional or a
consequence of conventions? And if it is not conventional, what is the
problem that it solves better than any rivals?
Perspectival picturing is not a consequence of conventions, for pictur-
ing itself is not conventional. Pictorial representation essentially involves
the generation of a certain experience, one of resemblance in visible figure.
Generating this experience, in such a way as to effect communication, is
the problem that methods of picturing have to solve. More precisely, in
any particular case the problem is to mark a surface such that it can be
seen as resembling, in visible figure, a certain thing or scene, with certain
properties, from a certain angle. Given our account of convention (section
7.1), picturing will be conventional just if there are two or more equally
good solutions to each such problem, the choice between them being made
on the basis that others have also chosen likewise, it also being common
knowledge that this is what is going on. There are three reasons why this
condition is not met.
First, the range of possible marks in which the target content can be
seen is narrowly constrained. For despite all the concessions made above,
it remains the case that in general a causally necessary condition is that the
marks preserve, at least roughly, the relevant visible figure (section 7.5).
This excludes most of the possible ways of marking the surface from being
solutions to the problem in hand.
Now, those concessions were real. I allowed that other factors do play
a role in determining experiences of resemblance, factors other than the
marks’ actually reproducing the visible figure of the target scene. Nonethe-
less, none of these factors plays the role convention would require. For—
the second reason—they are not in general factors under the control of
a given group of picture makers or consumers. This is clearly true for
the general nature of the perceptual environment. We can control the
nature of what others or ourselves are in perceptual contact with but only
under special circumstances (e.g., persuading someone to take part in a
psychological experiment) and over very limited portions of space and
time. The limited context over which we thus exercise control plays a
highly restricted role in conditioning pictorial perception, compared with
the subject’s perceptual life history. The point also holds for pictorial
acculturation. No doubt for any pictorial education a viewer has had,
there is a different one that person might have undergone. But it does not
follow that either the viewer, or any picture maker currently attempting to
communicate with that viewer, could have chosen to give him the one
rather than the other. It is a matter of the history of the viewer’s culture
and own exposure, over many years, to various images. And something
similar is at least in part true of our habits of viewing pictures. One can,
by prescription or by constricting the viewer’s movement, in exceptional
circumstances limit the ways in which the picture is viewed. But even
here many of the key factors, such as the tendency to deploy eye or head
movement, reflect engrained perceptual habits and even basic facts about
visual physiology, which cannot be swept aside, however determined we
are to find a new way of doing things.
Perhaps these considerations do not exclude every relevant possibility.
Perhaps they leave some room at the margins, by manipulating the factors
discussed, for alternative means to the same experiential end. But, and this
is my third reason that the aforementioned condition on conventionality
is not met, even if it is true that alternative means exist, it is not common
knowledge that it is. Different viewing habits, for instance, may affect what
people see a set of marks as resembling, but those people are not aware
that this is so. This will prevent them from knowing that what they are
responding to is one of two equally good solutions to a given pictorial
problem. And thus, insofar as the factors of section 7.5 meet the control
condition for conventionality, they do not meet the common knowledge
condition. Picturing, it seems, is not conventional.
Nor is it the case that perspectival picturing is conventional. It might
seem that my attempt to defuse the objection that the resemblance view is
hopelessly narrow forces just this result. For if perspectival picturing is
not the only sort there is, and if the problem solved by all pictorial systems
is just that of generating experiences of resemblance to certain things, it
may seem that there is nothing left for perspective systems to do better
than their rivals. If not, we would have in place at least part of what is
required for their adoption to be a matter of convention. But this line is
mistaken. There is something perspective systems do that others do not:
they are not the only pictures, but they constitute the only way to depict a
certain kind of thing. For only pictures in perspective depict detailed spa-
tial content. Or so at least I will argue. I cannot derive this conclusion
from the resemblance view alone. However, the only extra support need-
ed is a little theoretical reflection and some plausible psychological specu-
lation. Given these foundations, although nonperspectival pictures might
depict spatial detail, it seems that they do not in fact do so. Moreover, a
lot would have to change before they would.
The theoretical reflection is this. Consider properties that admit of
ordered variation. Colors are one example—there are many shades
between pure green and pure blue. Spatial properties are another—my
hand can occupy many positions between my two feet. One can represent
such properties, of course, in more or less precise ways—merely by speci-
fying rough hues or locations, or by specifying more determinate ones.
Now, some forms of representation allow one to represent instances of
these properties in a highly precise but piecemeal way. One could, for
instance, have a word for a particular shade of blue, without having words
for any but the roughest categories of the other colors. But depiction is not
a form of representation that allows for this possibility. If you can depict
an instance of some color, at a certain level of precision, your represen-
tational resources must also allow you to depict other colors, with equal
precision. And likewise for spatial properties.13
Coupled with the resemblance view, this thought promises to ensure
that only pictures in perspective can depict precise spatial properties. For
suppose we consider a competing system—so-called inverse perspective.
In pictures drawn on these principles, receding parallel lines in pictorial
space are represented by lines on the canvas that diverge, in sharp con-
trast to perspectival systems, in which such lines converge. Thus, because
perspectival systems preserve visible figure, inverse perspective cannot do
so. Now, despite that, there is no reason to deny that inverse perspective
yields depictions. For an inverse perspective drawing of an oblong table
may indeed be seen as resembling, in visible figure, an oblong table. But
this is only obviously possible if the object the picture is experienced as
resembling is reasonably indeterminate—a table that is roughly oblong,
no more. So let us consider under what conditions, if any, such a system
would yield depictions of precise shape and other spatial features. They
must be conditions allowing for the depiction of any of a wide variety of
such shapes and arrays, or so our theoretical reflection suggests. So the
rules of inverse perspective will have to allow for projecting a wide variety
of such shapes and arrays onto surfaces. But if we are to appreciate the
detailed content of the pictures thus created, it seems we must be sensi-
tive to small differences between the way a given canvas is marked and the
ways in which it might have been. Now, those differences are precisely
between marks that do not preserve the visible figure of what they repre-
sent, and their not doing so is important to their representing, according
to the rules of inverse perspective, what they do. So an appropriate sen-
sitivity to the relevant details of the surface markings cannot, in all psy-
chological plausibility, accompany experiencing the marks as resembling,
in visible figure, something with the precise spatial features in question.
Thus, although the marks may depict something, and although they may
systematically represent spatial detail, they do not depict that detail. The
only conditions under which they would do so would be ones in which our
psychology were quite different from how it in fact is.
Of course, inverse perspective fails in a radical way to preserve visible
figure. However, any other way of marking surfaces that fails less spec-
tacularly will also confront the basic problem here. So the resemblance
view, coupled with our theoretical constraint on depiction and our sense
of what is psychologically possible for us, does seem to preclude any depic-
tion of spatial detail that is not depiction in perspective. If this conclusion
Notes
1. This is a drastically simplified version of David Lewis’s much admired account
(1969).
2. Logic leaves room for another position, on which picturing is not conven-
tional but picturing in perspective is so. Although intriguing, this view is not
adopted by any writer known to me and is not discussed here.
3. Visible figure is what I have elsewhere (Hopkins 1998) called “outline shape.”
4. Do not be misled by Reid’s term. It is not part of the notion of visible figure
that it is the only spatial property, or “figure,” that vision represents. My view is
that others, including 3-D shape, are also seen. Although Reid disagreed (see
Hopkins, 2002), this claim is a further, and optional, element in his position.
5. This last is an empirical speculation but a plausible one. If right, it allows us to
dismiss an incipient worry. This is that because we see with two eyes, we cannot
see from a single point of view and hence that something is amiss in my claims
about the perception of visible figure.
6. Given this, I need not at this point address the issue, set aside in section 7.1, of
which systems are the perspectival ones.
7. These reasons are (1) it is assumed the resemblance view is designed to do a
job it should not, in fact, attempt to do, namely, explaining what it is about the
picture that leads people to experience it as they do (see section 7.5); (2) it is cor-
rectly assumed that the resemblance view will want to give some role to how the
depicted item actually is, the appropriate role then being misunderstood; and (3)
the worry that if what matters is resemblance to the depicted object as depicted,
the view takes for granted part of what it is supposed to explain, thereby guaran-
teeing that its analysis cannot be sufficient. For discussion of these objections, see
Hopkins (1998).
8. As I will say, the content is imprecise, and it is indeterminate how things are in
the depicted realm.
9. This is only one way in which this might be achieved, because given the point
about salience, matching marks might, under the right conditions, differ in content
in virtue of different aspects of each being salient in the experience of resemblance.
Hence the view predicts, not that the perspectival and nonperspectival pictures will
differ but that they might.
10. In section 7.4 I said that the view makes no appeal to (experienced) degrees of
resemblance in characterizing seeing-in. Indeed, the notion of degrees of resem-
blance does not figure in any of the claims that constitute the view. It is quite con-
sistent with this that the view, coupled with some basic optics and physiology, has
implications involving (actual) degrees of resemblance.
11. Strictly, the relevant range is of different pairings of 3-D shape and orientation.
12. For La Gournerie’s experiment, see Pirenne (1970, p. 122). His engraving
seems certain to have been far smaller than Raphael’s original. This complicates
the relationship between the story and the moral I want to draw from it. However,
I think it obvious that the outcome would have been the same had La Gournerie
tampered with the Raphael itself, so I ignore this complication here.
13. This thought is really the offspring of Goodman’s account of depiction (1969).
It ignores some of his view’s difficulties by restricting itself to certain sorts of prop-
erties and by resisting the temptation to insist on infinitely precise representation.
But the most significant difference is that I do not offer it as a definition of depic-
tion, more as a test for it. For another use of a similar thought, see Lopes (1996).
14. John Hyman (1992) has reached somewhat similar conclusions independently.
It might be fairer to consider the Renaissance achievement one of rediscovery.
See, for instance, White (1987).
Chapter 8
Resemblance Reconceived
Klaus Sachs-Hombach
8.1 INTRODUCTION
other sources—for instance, via description.6 However, this does not apply
to a passport photograph. Unlike the proper name, the picture gives us the
clues necessary to determine the person that the photograph refers to. The
clues come from the picture and only from the picture. We do not need
any additional information, nor do we need to know the person.7 In this
regard, the mechanisms by which a photograph and a proper name func-
tion are basically different, although they may be used similarly in certain
situations. (The case of twins is a special case I will discuss later on.)
If we accept the assumption that we get the information necessary to
correctly interpret a passport photograph exclusively from the physical
basis of the picture—the “sign vehicle” so to speak—we must also assume
that there is a nonconventional relationship between the picture and the
object depicted. This relation has the following form: in order to be the
referent of a picture, an object must have some relevant properties in
common with the sign vehicle.8 Naturally, this relation can be thought of
as a resemblance relation. No other relation so clearly suggests itself. The
resemblance theorist thus wishes to use this example to elicit an intuition:
that resemblance is the only plausible relationship allowing us to describe
the identification. On this basis, the resemblance relationship could be
regarded as a necessary property of figurative pictures. Of course, the
resemblance theorist does not need to defend such a strong version. The
theorist might take the resemblance criterion merely as a sufficient condi-
tion for distinguishing pictorial signs from linguistic ones. This weaker
version of the theory is the one I shall argue for in what follows. The inter-
pretation of pictures does not necessarily make use of this resemblance cri-
terion. There might be some pictures (or some aspects of pictures) that we
interpret differently. But whenever we employ the resemblance criterion,
the sign thus interpreted should be classified as a picture. This allows the
resemblance theorist to speak of a core area of depiction. My explication
is meant to be applicable at least to that area.
nection with (S1), the statement suggests that such a distinction also
makes sense for pictures. So pictures should be considered as normally
having a vehicle, a content, and a referent. The vehicle is a physical object
with properties, such as the visual ones of shape and color. The referent of
a picture is what the picture is taken to refer to: this can include all kinds
of objects and events—for example, the person we recognized on the road
after seeing the passport photograph. Finally, the content consists in those
properties that we consider to be relevant for representation and that we
relate to an object or an event. Thus the content normally provides us with
a procedure by which to determine the referent of the picture. The content
can also be characterized as an intentional object, which—so to speak—
we can see in the surface of the picture vehicle.9 A somewhat different but,
I think, equivalent formulation would be to say that the content is what
we see the relevant properties of the picture vehicle as. For example, we
might see a certain line on a canvas as the silhouette of mountains. The
process that constitutes a content should be termed “interpretation.”
As far as the distinction of vehicle, content, and referent is presupposed
in the following, pictures are signs by stipulation. Because this is the
meaning of the term “picture,” I would find it difficult to understand what
was meant if someone denied that pictures were signs. But one might
also say that in the following I am going to deal with pictures only insofar
as they are signs. Thus the question I want to answer is What criterion
allows us to distinguish pictorial from other kinds of signs, particularly
linguistic signs? I would like to argue that the resemblance criterion will
perform this function. In order to argue this, I am now going to introduce
the notion of resemblance, in a form initially independent of the picture
notion.
(S3) Objects resemble each other if they have, in relevant respects, essen-
tial properties in common.
Putting it this way, the term “resemblance” remains vague, because there
is nothing said about the meaning of “relevant” and “essential.” But this
reflects the way we use this term. Normally, we take resemblance to be
a gradual relation that applies to different objects only with regard to a
certain respect. Which respect should count as relevant depends on our
intentions and on the context. Compared to tables, quite different chairs
resemble each other, but compared to a variety of other chairs they might
not, because now size, color, or height are relevant. Also, what property
should be taken as essential depends, of course, on what respect is rele-
vant. Furthermore, it depends on how our perception works. We take, for
172 Klaus Sachs-Hombach
resemble each other. Second, one determines the respects that are also
relevant for representation, that is, constitutive for interpretation. Thus,
“to take an object as a picture” means that we, sometimes automatically,
delimit the respects relevant for resemblance to the respects that are also
relevant for representation.
Compared to (S5), (S6) points out that there might be signs that are
similar to other objects without being pictures of these objects, because
some similarities are not relevant for depiction. In ruling out such cases
(for example, self-referential linguistic expressions), the reference to the process
of interpretation implies that our classification of different kinds of signs
depends on how we use the signs. This implication seems acceptable,
because, depending on the context, we actually classify some objects as
pictures or as texts. In both cases we have to take the object as standing
for something else, that is, we have to assume a content and, if possible, a
referent, but the respects we consider relevant—and which partially deter-
mine the content—are not the same for pictures and text. In the case of
linguistic expressions, we do not, for example, consider color relevant. In
determining the properties that are relevant for representing something,
we therefore also determine what kind of sign we take this to be. In some
cases, mainly perspectival pictures that conform to Albertian principles,
this happens immediately and involuntarily. According to my explication,
this is exactly what one would expect, because resemblance is a gradual
relation: the greater the number of essential properties a picture has in
common with another object, the more easily we recognize this object in
the picture. Often this also implies that more respects are relevant for
representation. Thus, perceiving the picture becomes more and more like
perceiving the object depicted itself. This case might be described more
adequately by (S7), which simply combines (S4) and (S6):
(S7) A sign is a picture if the perception of the essential properties that
the sign vehicle has in relevant respects is identical to the perception one
would have of the corresponding properties of some other object under a
certain perspective and if this perception is constitutive for the interpre-
tation of the sign.
Determining which properties we take to be relevant for representing
something is also helpful in distinguishing kinds of pictures. In a diagram,
for example, it is not important to know how thick a line is, whereas
this is normally relevant in a painting. Goodman has named this aspect
“repleteness.” In my explication, it is linked to the resemblance view.
Because the resemblance criterion is vague, that is, being determined in
175 Resemblance Reconceived
Notes
1. See Lopes (1996, p. 11).
2. See Goodman (1968).
3. This view is sometimes ascribed to Gombrich (see Gombrich 1960 and 1982),
but the ascription is only partially correct. See Lopes (1992). A refined version of
the resemblance view can be found in Peacocke (1987).
4. For critical comments, see Scholz (1991).
5. My proposal is oriented toward the view Gombrich holds. See also Sachs-
Hombach (1999).
6. There is, of course, much more to be said about proper names. In order to con-
trast them with our intuitive resemblance theory, however, it is sufficient to point
to their conventional nature.
7. The claim I am considering here is termed “natural generativity” by Flint
Schier (see 1986, p. 43). See also the comments on this feature by Hopkins (1998,
p. 31 f.).
8. Having some properties in common with the sign vehicle is not, however, a
sufficient condition for being the referent of a picture. Particularly in photography,
we also regard the underlying causal relation between object and picture as essen-
tial. I will discuss this problem later.
9. For a deeper discussion of “seeing-in,” see Wollheim (1980) and Hopkins
(1998, p. 15 f.).
10. For such an explication, see Rehkämper (1995).
11. Pictures should thus be seen as analogous to general terms, not to proper
names. See Sachs-Hombach and Rehkämper (1998, p. 124).
Chapter 9
What You See Is What You Get: The Problems of Linear Perspective
Klaus Rehkämper
9.1 INTRODUCTION
The two leading questions of this paper are, Is there just one way to see
the world? and, closely connected to that, Is there a correct way to repre-
sent the world pictorially? The plain and simple answer to both ques-
tions is, Yes, there is! Perspectival pictures represent the world correctly,
because the underlying theory describes correctly the way human beings
see the world as well as the making of correct pictorial representations.
The theory of perspective connects picture making and human vision.
Several arguments against this view, which is sketched here only very briefly, have
been formulated over the years. The three most prominent will be discussed
in the following: the curvilinearity argument, the immobile eye argument,
and the argument that a picture drawn according to the rules of
linear perspective would not be accepted as natural. A closer examination
of these arguments shows that none of the three is convincing and that they
also fail to show that the use of linear perspective in pictorial representations
is merely a convention.
Let me start with a few remarks about the different answers different scientists
have given over the years to the aforementioned questions. Euclid
was the first to describe the process of vision geometrically; he defended
the rectilinear propagation of light (a fact that no one has doubted, and one that
holds as long as we are talking about things happening on earth, where we
live in a Euclidean space). Euclid also introduced the cone of vision.
During the Renaissance the rules of linear perspective—a system of rep-
resenting spatial relations—were discovered and described by Filippo
Arguments 2 and 3 (from the end of section 9.2) can be found in Good-
man. At first he gives a correct summary of the basic assumptions of lin-
ear perspective: a picture drawn in correct perspective will, under specified
conditions, deliver to the eye a bundle of light rays matching that which
would be delivered by the object itself. This matching is a purely objec-
tive matter, measurable by instruments. And such matching constitutes
fidelity of representation; for because light rays are all the eye can receive
from either picture or object, identity in pattern of light rays must constitute
identity of appearance (Goodman 1968/1976, p. 11). Gibson would
agree with this summary; in his 1960 paper he states: “It is the light to the
eye that counts. . . . [O]ther things being equal, two identical instances of
stimulation must arouse the same percept” (Gibson 1960, pp. 219, 222).
Goodman takes his description as a basis for his arguments against the
theory of linear perspective. In the first of the two arguments, he tries to
show that the “specified conditions” required are too artificial and impossible
to achieve, an argument that can already be found in Panofsky’s article.
A picture drawn in linear perspective has to be looked at with a single
unmoving eye from a certain well-defined distance. Usually this is done by
looking through a peephole. The represented object itself must also be viewed
through a peephole, but not necessarily from the same distance and under
the same angle. First of all, according to Goodman, if the eye is not
allowed to move it becomes blind. Even if one allows the eye to move, “the
specified conditions of observation are grossly abnormal” (Goodman
1968/1976, p. 13). Under circumstances that are no more artificial than an
immobile eye—e.g., using suitably contrived lenses—one can wring out of
nearly every picture (i.e., pictures not drawn according to the rules of per-
spective) a pattern of light rays that matches. And so, he continues, even
if the patterns of stimulation are identical, “the same stimulus gives rise to
different visual experiences under different circumstances” (ibid., p. 14):
a red light might mean “Stop” on the highway and “Port” at sea.
Furthermore, we usually do not view pictures motionless through a peep-
hole but in a gallery, where we are free to walk around. So, the artist’s task
in representing an object is to decide what light rays, under gallery condi-
tions, will succeed in rendering what he sees. This is not a matter of copy-
ing but of conveying (ibid., p. 14). And Goodman closes this argument by
saying that “pictures in perspective, like any other pictures, have to be
read; and the ability to read has to be acquired” (ibid., p. 14).
For the sake of the first argument, Goodman has accepted the rules of
perspective, but in his second argument—argument 3 in section 9.2—he
attacks these rules. The main point now is handling the representation of
parallel lines. Goodman is not a curvilinearist, but he claims that pictures
that are accepted as natural do not obey the rules of perspective; in fact,
they disregard them. One of these rules, which he quotes, states that every
two parallel lines, which should be represented in a picture, have to be
drawn converging. This seems to conform with a statement of Panofsky’s,
about the rules of linear perspective: “[A]ll parallels, in whatever direction
they lie, have a common vanishing point” (Panofsky 1927/1991, p. 28).
But as a matter of fact, not all those lines are drawn in this correct way;
some stay parallel in perspectival pictures. And this is so only because of
convention, so the argument goes. Telephone poles bordering a street and
vanishing in the distance usually do not converge when represented in a
photograph but the street does; railroad tracks are another example of
this effect, as they are also drawn converging. But the edges of a facade
running upward from the eye are usually not represented as converging.
We use cameras with tilting backs and elevating lens-boards to avoid such
converging; we try to make vertical parallels come out parallel in a pic-
ture. And so Goodman concludes: “The rules of pictorial perspective no
more follow from the laws of optics than would rules calling for drawing
the tracks parallel and the poles converging. In diametric contradiction to
what Gibson says, the artist who wants to produce a spatial representa-
tion that the present day western eye will accept as faithful must defy the
‘laws of geometry’” (Goodman 1968/1976, p. 16).
It is clear from this short description of Goodman’s standpoint that he
argues for alternative (B). There is no natural system of representing
space: all such systems are based on convention and they have to be
Are these arguments convincing? In what follows, I will make some
remarks to rescue linear perspective. The position I try to defend is
alternative (Aa): there is a natural system of representing spatial relations,
and this system is that of linear perspective. But this does not mean that
everything said against linear perspective is unreasonable, even if it is not
entirely right, if right means that linear perspective is shown to be wrong
by these arguments.
Let me give a short introduction to the theory of perspective. In doing so,
I will differentiate between two relations: (1) the relation of a picture
drawn according to the rules of perspective to the object depicted, and (2)
the relation that holds between such a picture and the beholder of that
picture. Relation 1 is a matter of mathematics, or to be more precise a
matter of projective geometry; relation 2 is where interpretation and there-
fore psychology come into play. Unfortunately these two very different
relations are often mixed up in the discussion about linear perspective.
What you need to produce a two-dimensional representation of a three-dimensional
scene in perspective is an object or an arrangement of objects,
G, a center of projection (often called the eyepoint), O, a cone or pyramid
of vision having its apex in O and its base at G, and a surface, B, inter-
secting the cone of vision. Now imagine (light) rays connecting the ob-
ject and the eyepoint. These rays are also intersected by the surface, and
the intersection points make up the two-dimensional representation of
that three-dimensional object relative to the eyepoint. It is important to
note that the surface intersecting the cone of vision need not be flat. It
might also be curved as in GB (figure 9.1). And you can also extend the
cone of vision beyond the center of projection into another cone (figure
9.2).
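The construction just described can be checked numerically. The following is a minimal sketch in Python and not part of the original argument. It assumes, purely for illustration, that the eyepoint O sits at the origin, that the main axis of the cone of vision is the z-axis, and that the picture surface B is a flat plane perpendicular to that axis; the helper name `project` and the sample coordinates are hypothetical. A negative depth places the surface behind O, that is, in the extended cone.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def project(point: Point3, plane_depth: float) -> Point2:
    """Intersect the ray through the eyepoint O and `point` with a flat surface.

    Assumptions (illustrative only): O is the origin, the main axis of the
    cone of vision is the z-axis, and the picture surface B is the plane
    z = plane_depth. A negative plane_depth puts B behind O.
    """
    x, y, z = point
    if z == 0:
        raise ValueError("the ray through this point never meets the surface")
    scale = plane_depth / z  # similar triangles along the ray through O
    return (x * scale, y * scale)


# One corner of an object G, four units in front of the eyepoint.
corner = (2.0, 1.0, 4.0)

in_front = project(corner, 1.0)   # surface between G and O
behind = project(corner, -1.0)    # surface in the extended cone, behind O
print(in_front, behind)
```

Under these assumptions the two image points differ only in sign: both coordinates of the point on the surface behind O are negated relative to the point on the surface in front of O.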
Intersecting the cone of vision not between G and O but behind O you
get a representation in perspective that is turned upside down and left-
right reversed (B2 or GB2). That is the way a camera obscura works.
Having a curved surface, for example, part of a sphere instead of a flat
plane, you get a picture that is similar to a retinal image. But as you can
Figure 9.1
The cone of vision with a flat and curved picture plane.
Figure 9.2
The cone of vision extended beyond the center of projection.
Figure 9.3
Different picture planes. O, eyepoint; B, picture plane; S, cone of vision; G, object.
The next point of discussion involves the different distances and angles
under which both object and picture have to be seen. As explained earlier,
what you need is an eye point, an object, a cone of vision having its apex
at the eye point and its base at the object, and—last but not least—a plane
intersecting the cone of vision usually somewhere between object and eye.
As mentioned before this plane need not be flat and it also need not be ver-
tical. All that counts is that the cone of vision is intersected by the plane.
From this, I think, it is clear that the distance from the eye point to the
object is usually greater than the distance from the eye point to the picture.
And the angle under which the object is seen may differ from the angle of
eye point to picture. Suppose you are looking at the top of a tower; then the
main axis of your cone of vision meets the facade of the tower at an angle
less than 45°, but the picture plane may intersect the cone of vision at an
angle less than 90°.
That does not contradict the theory of perspective. But, and this leads
me to argument 3, Goodman is not right in assuming that all parallels of
the world always have to be drawn in a picture as converging. The parallel
edges of the facade are drawn converging in figure 9.3a but parallel in
figure 9.3b. Convergence holds only for those parallels that are not parallel
to the picture plane; the others remain parallel. Or, to rescue Panofsky, they have
a vanishing point at infinity. And if we use cameras with tilting backs, we
still obey the rules of perspective. The only thing we are doing is changing the
angle under which the picture plane intersects the cone of vision, making
it parallel to the facade. Using lenses does not defy these rules either; it
is all within the range of physical optics.
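The point about parallels can be illustrated with a small computation. This is a hedged sketch in Python, not the author's own construction: the eyepoint is assumed to be at the origin, the facade is assumed to lie in the plane z = 10 with vertical edges at x = -1 and x = 1, and the 30-degree tilt, the helper names (`dot`, `project`, `edge_gap`), and all numbers are hypothetical. It compares the gap between the two edges in the picture at two heights, once for a picture plane parallel to the facade and once for a picture plane tilted back.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def project(point: Vec, normal: Vec, right: Vec, up: Vec,
            distance: float = 1.0) -> Tuple[float, float]:
    """Picture coordinates of `point` as seen from an eyepoint at the origin.

    The picture plane is the set of X with dot(X, normal) == distance;
    `right` and `up` are unit vectors lying in that plane.
    """
    t = distance / dot(point, normal)  # where the ray t * point meets the plane
    hit = (t * point[0], t * point[1], t * point[2])
    return (dot(hit, right), dot(hit, up))


def edge_gap(y: float, normal: Vec, right: Vec, up: Vec) -> float:
    """Horizontal gap in the picture between the two facade edges at height y."""
    left_pt = project((-1.0, y, 10.0), normal, right, up)
    right_pt = project((1.0, y, 10.0), normal, right, up)
    return right_pt[0] - left_pt[0]


RIGHT = (1.0, 0.0, 0.0)

# Picture plane parallel to the facade (camera back kept vertical).
upright = dict(normal=(0.0, 0.0, 1.0), right=RIGHT, up=(0.0, 1.0, 0.0))

# Picture plane tilted back by 30 degrees (looking up at the tower).
a = math.radians(30.0)
tilted = dict(normal=(0.0, math.sin(a), math.cos(a)), right=RIGHT,
              up=(0.0, math.cos(a), -math.sin(a)))

for name, plane in (("upright", upright), ("tilted", tilted)):
    print(name, edge_gap(0.0, **plane), edge_gap(20.0, **plane))
```

With the upright plane the gap is the same at both heights, so the edges stay parallel in the picture. With the tilted plane the gap at height 20 is smaller than at height 0, so the edges converge. Both pictures come from the same projection rule; only the orientation of the picture plane differs.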
I fully agree with Goodman that a stimulus can give rise to different
visual experiences under different circumstances. But this does not affect
Figure 9.4
Floor or facade? (After Paul Klee.)
the theory of perspective, which holds for relation 1 described at the begin-
ning of this section. His argument belongs to the second relation, the
interpretation of the picture by someone who looks at it.
And while I am at it, Goodman also quotes the famous painter Paul
Klee to support his argument. Showing a drawing of Klee’s (imitated in
figure 9.4), he states: “As Klee remarks, the drawing looks quite normal
if taken as representing a floor but awry as representing a facade, even
though in the two cases parallels in the object represented recede equally
from the eye” (Goodman 1968/1976, p. 16 Fn. 16). But if one has a closer
look at Klee’s own description of the problem, it becomes obvious that he
is not supporting Goodman’s view.
Why is figure 9.4 wrong as a picture of a vertical facade? “It is not log-
ically wrong, because the lower windows are closer to the eye than the
upper windows, which in terms of perspective means ‘larger.’ As a repre-
sentation of a floor this perspective drawing would be at once accepted.
This picture is not logically wrong, but psychologically wrong. Because,
in the interests of maintaining its balance the animal wishes to see as
vertical all that is in reality vertical” (Klee 1925/1981, p. 31).2 Here again
Goodman, but not Klee, mixes up the two different relations. Arguing
against an interpretation, which is based on a psychological attitude, does
not affect the picture-object relation. Therefore it cannot show that per-
spective is merely a convention.
The last point I would like to mention is the problem that pictures
are usually looked at with two eyes and that pictures hang on walls. But
this also does not affect the theory of linear perspective. The theory
of linear perspective describes the relation between a picture and the
scene depicted under specified circumstances. Under these conditions
the theory obeys the laws of optics. And to borrow a phrase from Ernst
Gombrich: “Having achieved this aim, [the theory of perspective] makes
its bows and retires” (Gombrich 1960/1993, p. 217). The observer’s prob-
lem of how to read a picture while it hangs on the wall is not a problem of
the theory here under discussion. The conditions specified by the theory
are not fulfilled in this scenario. And if we leave the specified conditions
behind, then other problems crop up. In these cases we have to learn
to read the pictures. But in the case of perspectival pictures this is not
like learning a language, the symbols of which are chosen by convention.
These pictures have as a core the natural system of linear perspective—
a system that also describes correctly the way the human visual system
works—and that is why representational pictures of this kind are much
easier to read under “normal” conditions than any language is.
Notes
1. In the English edition Wood translates Sehbild as “visual image” in the first
quotation I have cited and “optical image” in the second. In doing this, he tones
down the contradiction but introduces a new, third term, which is, I believe, not
intended by Panofsky.
2. The original formulation used by Klee is “Warum ist Fig. 44 als Bild einer
senkrechten Hauswand falsch? Es ist nicht logisch falsch, denn die unteren
Fensterbreiten sind dem Auge näher als die oberen Fensterbreiten, was perspektivisch
“größer” bedeutet. Als Darstellung eines Bodens würde diese Perspektive
glatt akzeptiert. Dieses Bild ist also nicht logisch falsch, sondern psychologisch
falsch. Weil das Animal im Interesse seines Gleichgewichtes sämtliche Senkrechten
der Wirklichkeit auch als Senkrechte sehen will” (Klee 1925/1981, p. 31).
Chapter 10
Pictures of Perspective: Theory or Therapy?
Patrick Maynard
Dear Theo:
In my last letter you will have found a little sketch of that perspective frame I
mentioned. I just came back from the blacksmith, who made iron points for
the sticks and iron corners for the frame. It consists of two long stakes; the frame
can be attached to them either way with strong wooden pegs. So on the shore or
in the meadows or in the fields, one can look through it like a window. The ver-
tical lines and the horizontal lines of the frame and the diagonal lines and the
intersection, or else the division into squares, certainly give a few fundamental
pointers which help one make a solid drawing and which indicate the main lines
and proportions—at least for those who have some instinct for perspective and
some understanding of why and how the perspective causes an apparent change of
direction in the lines and change of size in the planes and in the whole mass.
Without this, the instrument is of little or no use, and looking through it makes
one dizzy. I think you can imagine how delightful it is to turn this “spy-hole”
frame on the sea, on the green meadows, or on the snowy fields in winter, or on
the fantastic network of thin and thick branches and trunks in autumn or on a
stormy sky. Long and continuous practice with it enables one to draw quick as
lightning—and, once the drawing is done firmly, to paint quick as lightning,
too. . . . The perspective frame is really a fine piece of workmanship. (van Gogh
1959, pp. 432–434)
Revealing shop talk about a technical device made for a stated practical
purpose, to be deployed by someone who knows how to use it: how dif-
ferent from what we are used to in the theoretical literature of perspective,
which as often as not can make one “dizzy.” For now we must put aside
the most interesting aspect of Vincent’s typically articulate and engaging
report: “pictorial space” in the sense of the composition of pictures, espe-
cially pictures of nature—including fantastic networks of branches and
the lie of the land and the clouds—so far removed from the cubes, crated
spheres and cylinders, checkerboard pavimenti of most theoretical presen-
tations. In this brief space we will stay with our project’s most simple,
Luckily, our problems about linear perspective are not so bad, as they
exist more in the realm of theory than of practice. Let us begin with some
brief reminders about general practice, before considering the conceptual
problems.
As described by Vincent, his viewing frame was a device, a simple bit
of engineering. I suggest that what he used it for be thought of techno-
logically, also; indeed, that in our considerations of “pictorial space” we
recognize linear perspective itself as a kind of technology. What is a tech-
nology but an amplifier of our powers to do certain things?1 Looking at it
that way encourages us to be specific about perspective, to ask, do what
things? For his part, Vincent was clear: to make convincingly laid-out
depictive pictures from nature, whether drawings or paintings. How does
linear perspective allow this? Like any technology, by exploiting preex-
isting natural structures or forces. In this case the tapping into natural
powers is rather direct: linear perspective gathers overlap/occlusion and
foreshortening (which it shares with other projective systems) together
with its distinctive diminution with distance according to a simple rela-
tionship, all into one correlated system in which foreshortening is affected
by diminution and by other factors—that is, into a “space”—then feeds
the result into a visual system that is naturally tuned to using such infor-
mation. Zoom. Like most successful technologies, linear perspective devel-
oped from a variety of related dodges or practices, got consolidated and
then greatly elaborated at historical points, was diffused, and then—
again, like most successful technologies, such as the wheel, plow, water-
mill, printing press, automobile, computer—created its own markets and
rather changed the ends for which it was the means. This change of ends
is partly due to the fact that every technological amplifier is also a filter—
indeed a suppressor: get one form of power, lose another (which normally
we soon come to disvalue, even forget).
Perspective, like other technologies, has usually been mixed in with
its alleged rivals. Rather as we might call a Degas “a pastel,” despite the
other things it is in (and the underlying monotype), we usually call pic-
tures “perspective” when they are partly in perspective while also produced
through other shape- and space-rendering techniques. In this respect, per-
spective’s history is like that of a close pictorial relative, namely shade and
shadow techniques, which were also consolidated, systematized, theo-
rized, closely controlled, passed from specialized refinement to mixed use.
My next observation is that the listed components of linear perspective—
including foreshortening, overlap, diminution as a function of distance,
unified space—are frequently disassembled and stuck in with other sys-
tems. In general, it is good to approach linear perspective that way, in
terms of its components and their relations, rather than as monolith. As
we look around, we find what one might call “perspective patois,” even
“pidgin perspective,” in much of its familiar use in mass printing of images
today. Indeed I think that you will often find mainly oblique techniques
predominating there (with some orthogonal), though often inflected with
a bit of diminution—sometimes convergence of parallels—for effect: and
effect is, after all, the whole point of “pictures on a page.” What about the
photography which pervades these media? Is not linear perspective, as
we often hear, simply built into camera optics? Again, when we actually
look, I think we find similar mixes in commercial photography, notably
in advertising: the use of long lenses and closing up of backgrounds (often
stressing L-, fork, and arrow junctions), as the parallel systems, oblique
and orthogonal, predominate, with emphasis on object shape and texture
(so well rendered by photographs), with a residue of diminution. The
sense of a space, in terms of coordinated common vanishing points, is
rather rare in such work, perhaps because that would draw attention from
the surfaces that are being marketed to just space, which is not. Of course
one does see there, as in cinema and TV work, much strong perspective,
though again shapes on a page—including print and other graphics—
rather than “pictorial space” are the goal.
Thus, very briefly, use. Now, what of conception, theory? I said that “we
call pictures perspective,” but who are we? Most people, including those
who ought to know better, seem rather vague about what perspective
and its main features actually are, typically conflating perspective with
a number of other projective techniques. For example, MoMA’s direc-
tor of painting describes an isometric cityscape as done by “perspectival
Therefore these remarks belong to only the most recent version of what
Panofsky long ago called “the ultra-modern criticism of art,” in words still
worth citing:
It is now clear that the perspective view of space (not merely perspective con-
struction) could be attacked from two quite different sides: if Plato condemned it
in its modest beginnings because it distorts the “true sizes” of things and puts sub-
jective appearance and caprice in the place of reality and law, the ultra-modern
criticism of art makes exactly the opposite charge that it is the product of a nar-
row and narrowing rationalism. The ancient East, classical antiquity, the Middle
Ages, and every archaistic art, such as Botticelli’s, have—more or less completely—
rejected perspective because it seemed to bring into a world that goes beyond
or above the subjective something individualistic and casual; expressionism (for
recently there has been another swing of the pendulum) shunned it for just the
opposite reason: perspective confirms and preserves the remnant of objectivity
that even impressionism had still had to remove from the power of the individual
will toward form, that is, three-dimensional real space as such. But at bottom this
polarity is the two-fold aspect of one and the same thing and those objections aim
at one and the same point. (Panofsky 1924/1925, p. 18)
It may seem paradoxical that the common strategy of the more recent
“ultra-moderns” has been to attribute to perspective, among all drawing
systems, the most extreme form of subjectivity. Thus a very influential
popular book for nearly three decades states that “the convention of per-
spective, which is unique to Europe . . . centres everything on the eye
of the beholder.” “Every drawing or painting that used perspective,” it
continues, “proposed to the spectator that he was the unique centre of
the world” (Berger 1972, pp. 16, 18). Such opinions produced in cinema
theory ideas about what has been called “the ideology of the visible,”
with remarks such as the following: “[P]erspective-system images bind the
spectator in place, the suturing central position that is the sense of the
image, that sets its place (in place, the spectator completes that image as
its subject)” and “[T]he installation of the viewer as subject depends on
reserving for him or her, the reciprocal in front of the image of the van-
ishing point ‘behind’ it. . . . To launch an assault at the window [as de-
picted in Alfred Hitchcock’s The Birds] is, in turn, to assault the place of
the viewer [of it]; it is an act of aggression against the eye of the beholder
and the ‘I’ of the self-as-subject.”4
Figure 10.1
Faces in strange places: memento mori soup.
Figure 10.2
Faces in strange places: birch eyes.
Figure 10.3
Natasha toothbrush.
though, after all, by a computer graphics specialist. Still, what many per-
spective theorists normally say is not far off that. No point to a parade of
quotations here: we know that the standard heuristic exposition of lin-
ear perspective is simply, without argument, to identify the perspective
construction point as the picture’s viewer’s point and to take that quite
literally. This is sometimes even expressed (again by one of our quoted
experts) as “the viewing position of the artist” or “the artist’s unique van-
tage point” (Shepard 1990, pp. 191, 196 f.), sometimes as “where the artist
stood”—as though perhaps Masaccio had stood before the Trinity, or
Raphael behind Saint Paul preaching in Athens—or at least behind an
actor on a set, and so on! Part of the pity of this amazingly careless way
of speaking, thinking, is that—just as viewer position in cinema is some-
times part of the content of the experience—in some cases it does matter
that the artist was at that point, just as it sometimes does matter for the
content of a picture that we understand there to have been a viewer at a
more-or-less precise point, just as it sometimes does matter that we view
the picture from near the construction point. Sometimes these concerns
matter, but, in the overgeneralized, a priori fog round the topic, such
“finer” points of picture meaning are soon lost.7
Figure 10.4
relativists, we will hold that some perspectives (in the second sense) afford
better perspectives (in the first sense) than do others—although, if we are
reasonable, we will also be willing to consider things from others’ faulty
perspectives. But what has this to do with pictorial perspective, except as
the agency that brought these usages into modern languages? One appeal
of a perspective paradigm for all depictions may have been that linear per-
spective renders what are considered views, and visually effective pictures
of any kind usually show views of things.9 That this is a constraint on
effective pictures is hardly surprising, because we often recognize things by
perspectival contours. Indeed, the famous word eidos, developed by Greek
philosophers and reworked by Latin as “form,” meant—and this root
meaning persisted through the theoretical elaborations—that about the
look, notably of the shape, of a thing that allows us visually to recognize
it. Consider, for example, a clear photograph that does not present such a
view (figure 10.4). When pictorial perspective does provide effective views,
there is nothing peculiar to it, and a “peephole” need no more be involved
in linear perspective than in any other method.10 Not only are most depic-
tive drawing systems designed to exploit such views, perspective has weak-
nesses as well as strengths in that regard. Such are the systems’ trade-offs,
or “trades”—one motive for that practical use of mixed devices discussed
at the outset.
have invented a special research project for perspective. Some favor deft
trigonometric extrapolation of the projection point. But, as illustrated by
Frankie and the photo, not only will people not know what we are talking
about here, it is implausible to suppose that their visual systems have
much motive, or means, to do so. As we watch movies, flip magazine
pages, and encounter a fast flux of camera positions and focal lengths—in
magazines, of cropping formats—we show scant tendency to adjust our
viewing distances. The sense of close-up and long-shot certainly does
register, but that so-called close-up may have been taken some distance off
with a telephoto lens, to keep the background blurred, whereas the wide-
angle lens that was used close in mainly distorts features so as to make
faces look ugly or menacing.
The argument is not that it is vain for perceptual theory to investigate
generally how we reckon shapes on or of surfaces. After all, most of the
surfaces we work on or walk on meet vision at acute angles. Getting the
three blocks from the Rijksmuseum to the Stedelijk along Paulus
Potterstraat to see the Schwitters would involve a lot of estimating in that
regard—probably harder at a child’s or wheelchair’s lower perspective.
My worry is, again, about what is behind the assumption, without argu-
ment, that optical inclination poses a particular problem regarding per-
spective pictures. Perspective is capable of winning its way empirically
as a method of producing pictorial space (and other things) and has no
need for the standard stacked decks—or opening, a priori, sleights of
hand.
I stress “back and forth” because I perceive another problem with that
heuristic beyond what I have indicated. It has continually—unwittingly,
to be sure, but influentially—imbued thinking with the prejudice that
drawing perspectively, or in any way projectively, has not only the con-
notation of passive, mechanical skills of projecting (“a machine could
do it”) but something deeper, which seems never articulated: the deriv-
ative connotation that underpins that heuristic, that the 3-D subject is
somehow given, whereas the drawing is gotten from it. Thus the stan-
dard talk about a “real world” or “real object” (mostly fictional) and “its”
picture-projection. If this connotation did not actively depress projective
drawing, it did little to help such drawing in its recent times of trouble.
Ironically, what actually happens is often the reverse; as remarked, most
of the real things that define the modern world (that is, all the artifacts in
it) are made from projective drawings, rather than the other way around.
I doubt that one will achieve a satisfactory account of the depiction of
space in pictures unless one has a good general account of depiction itself.
Not having attempted that, I close with two critical remarks in that direc-
tion, which are more radical than what I have argued so far. First, briefly,
insofar as the depictive challenge is spatial, why assume that this is neces-
sarily a matter of going between 2-D and 3-D, when 0-D, 1-D, 2-D, and
3-D are all involved, even in visual depiction? As visual depiction of space
is by no means confined to pictures, I suggest questioning the focus on
pictures that has dominated discussion—in isolation from reliefs, masks,
carving, sculpture, and so forth—even when talking narrowly about per-
spective. (Remember the meaning of three old words, arti del disegno.)11
Second, the idea “insofar as depiction is spatial” has been assumed, but
must it be? Far more radically, I suggest that we are unlikely to gain an
adequate conception of any kind of depiction so long as we think of it
even as an essentially spatial affair—less so, as long as we accept the stan-
dard idea that “the basic trick of pictures” (to quote one outstanding art
historian of our time) is that of “marking a flat plane to suggest the three-
dimensional” (Baxandall 1985, p. 106). Here is not the place to challenge
a dominant space-based conception of vision that reinforces the dominant
space-emphasizing approaches to visual depiction, which has less to say
about substance, process, movement.12 Still, as a “simple” example, the
earlier physiognomic examples should have already suggested what this
wonderful, eidos-capturing, 3,000-year-old, double-necked pot painting
shows: that not only are pictures not essentially flat or even plane, but that
we need to think whether the basic challenge of pictorial depiction has to
Figure 10.5
Stirrup jar: Octopi and fish. Mycenaean, terracotta, c. 1200–1125 B.C., h. 10 1/4 inches.
The Metropolitan Museum of Art, Louisa Eldridge McBurney Gift Fund,
1953. (53.11.6)
motion out and up emphasizes our sense of the pot rising and expanding
through belly to shoulder, then closing through handles and lips, as the
fish spring up from the foot.
That such decorative depiction forms a central kind of depiction seems
rather lost on modern theorists, misled by a few centuries’ dominance of
technologies of “unlocated” depictions—that is, of images put on sur-
faces made for just that purpose and devoid of intrinsic interest—such as
we find in cinema and TV screens. Technologies of photo reproduction
convey the impression that such is the historical norm. And those who
take decorative depiction as marginal to issues of pictorial space might
recall that a favorite example of those drawn to the topic, Andrea Pozzo’s
painted ceiling for the nave of the Chiesa di Sant’ Ignazio in Rome
(1688–1694), is essentially—not coincidentally—a decorative depiction.
That ceiling was not just a convenient ground on which Pozzo could place
his images. As we began with technological uses of perspective, let us
recall that here linear perspective’s primary function is to promote our
imagining, of the very nave in which we stand, that its walls rise much
higher above us and are occupied by diverse figures in motion, that its roof
is open to the clouds, that the saint is visible above us, carried up by
angels. As decorative, the image is more than what Maurice Pirenne calls
Pozzo’s “imaginary structure” (Pirenne 1970, p. 79), for (and this hap-
pens, too, with some restaurant decor) its practical purpose is to have us
imagine things of that actual architecture—and thereby of ourselves, as
well. Through attention to such cases of perspective and other methods in
actual use we should find improved bases for an understanding of visual
depiction, which can guide our research into pictorial space.
Notes
1. I have taken this approach to photography, for example in Maynard (1997a,
1997b), and to perspective in Maynard (1994).
2. See Damisch (1994, pp. 63, 107, 158, etc.). I make further comment in review
of the book in Maynard (1997b, pp. 84–85). The quotations are from editorial
reviews available on booksellers’ Websites.
3. This citation is from Noël Carroll (1988, p. 128). For more careful argument of
the material in sections 10.2 and 10.3, see Maynard (1996, pp. 23–40).
4. These citations are also from Carroll (1988, pp. 128 f.).
5. It is even a fallacy to go from “x is a depictively essential constituent of (pic-
ture) P” to “P depicts something as x,” because that fails for such values of “x” as
being a painting, being monochrome, and so forth.
6. This does not necessarily commit the fallacy we began with, that of claiming
that all pictures are essentially projections. Notice how Henry Talbot could only
invent photography, a more recent source of the projection/picture mix-up, by
fighting off that very tendency: “The picture [sic]” from his camera, he wrote,
“divested of the ideas which accompany it, and considered only in its ultimate
nature, is but a succession or variety of stronger lights thrown upon one part of
the paper, and deeper shadows on another” (Talbot, 1844–1846/1969). If Talbot
meant his camera lucida, where the image is virtual, there would not have been
lights upon the paper.
7. These careless, common phrases might be revealing of something: maybe
drafters’ often assuming their pictures’ construction points when doing observa-
tional drawings, in order to “eyeball” the perspective, has encouraged the idea that
viewers need to do the same. An odd inference, in any case: it is like saying that
because roofers have to go on the roof to shingle it, you have to go up there, too,
to see the roof.
8. Erwin Panofsky noted Dürer’s error here in Panofsky (1924/1925): “[D]ies
‘lateinisch Wort,’ das schon bei Boethius vorkommt, ursprünglich einen so präg-
nanten Sinn gar nicht besessen zu haben scheint” (p. 258).
9. For an account of “effective pictures” and “views,” see Willats (1997, pp. 22 f.,
207, 219).
10. As is claimed in Shepard: “[E]ach picture of a three-dimensional scene carries
its own peephole” (1990, p. 122, repeated on p. 196). The photograph shows the
top of a spaghetti tube.
11. As Vasari’s phrase arti del disegno indicates, sharp distinctions among paint-
ing and sculpture, architecture, and relief are recent. To call pictures 2-D media is
somewhat misleading, as there, with 0-D, 1-D, and 2-D marks, we depict not just
3-D but also 0-, 1-, and 2-D situations. In 3-D sculpture and architecture we depict
the same, using 0-D junctions, 1-D contours, 2-D shapes and expanses, and so on.
It is also possible to depict in 1-D media.
12. In Maynard (2001), I hazard remarks about biological and technological vision
as aimed more at process and motion than at objects and spatial forms. Ernst
Gombrich is one leading theorist of depiction who has confessed an overemphasis
on space at the expense of representation of transparency and life (Gombrich
1987, p. 16).
PART III
THE NATURE AND STRUCTURE OF RECONCEIVED
PICTORIAL SPACE
Chapter 11
Reconceiving Perceptual Space
James E. Cutting
11.1 INTRODUCTION
and taxonomies (which are nested nominal scales) remains critically impor-
tant in biology today. Ordinal scales categorize and order, but they say
nothing about distances between what is ordered. The division of U.S.
college students into freshmen, sophomores, juniors, and seniors is an
example, for this only gives a rough idea of their “distance” from gradua-
tion requirements. Interval scales categorize, order, and provide true dis-
tances, but there is no true zero on such a scale. Thus, one can say that
the distance between 5°C and 10°C is the same as that between 10°C
and 15°C, but one cannot say that 10°C is twice as warm as 5°C. Finally,
ratio scales categorize, order, provide true distances, and have a true zero.
Thus, one can say that 1 m is half as long as 2 m. In doing psychophysics,
for example, one often manipulates a physical variable along a ratio
scale, and records some psychological variable that is almost surely only
ordinal.
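The contrast between interval and ratio scales can be checked with a little arithmetic. The following is a minimal sketch, not part of the original argument: the function names are hypothetical, and the conversion to kelvin is introduced here only to supply a temperature scale with a true zero.

```python
def celsius_to_kelvin(celsius):
    """Convert from an interval scale (Celsius) to a ratio scale (kelvin)."""
    return celsius + 273.15


def interval_differences_equal(a, b, c, d):
    """Differences are meaningful on an interval scale: is (b - a) == (d - c)?"""
    return (b - a) == (d - c)


# Equal differences hold on the Celsius scale.
same_step = interval_differences_equal(5, 10, 10, 15)

# The naive Celsius ratio suggests "twice as warm" ...
naive_ratio = 10 / 5

# ... but on a scale with a true zero the ratio is close to 1.
true_ratio = celsius_to_kelvin(10) / celsius_to_kelvin(5)

# Length in metres already has a true zero, so its ratios are meaningful.
length_ratio = 1 / 2

print(same_step, naive_ratio, round(true_ratio, 3), length_ratio)
```

The Celsius ratio changes once the zero point moves, whereas the ratio of two lengths does not depend on any such choice.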
Stevens’s (1951) classification system of the four scales is itself ordinal.
That is, one cannot really know the “distance” between nominal and ordi-
nal scales, for example, as compared to interval and ratio. However true
this is logically, one can nonetheless state psychologically that through
various considerations one may find that ordinal information can be used
to approximate a metric scale (Shepard 1980). And this is the crux of what
I have to say. I will claim that all perceived spaces are really ordinal.
However, sometimes these spaces can be said to converge on a metric
space.
Yes—and for some pictures to a large extent. This is one reason nonpro-
fessional, candid photographs work so well; the cinema can act as such
a culturally important surrogate for the everyday world; and precious
little experience, if any, is needed to appreciate the content of pictures
(Hochberg and Brooks 1962) or film (Messaris 1994). Nevertheless,
almost all theorists who bother to address this question—and few actually
do—answer largely in the negative, choosing to focus instead on differences
between pictures and the world, regardless of how they define pictures
or how they might define “the world.” Consider views from the humanities
and then from psychology.
Among artists and art historians, statements about the similarity of
pictures and the world are not prevalent. To be sure, few would deny
the impressiveness of certain trompe l’oeil (e.g., Cadiou and Gilou 1989;
Kubovy 1986) as having the power to be mistaken for a certain type of
reality, but equally few would claim this is other than a relatively small
genre, and it does not legitimately extend even to photographs. Moreover,
the effectiveness of trompe l’oeil is predicated, in part, on a tightly con-
strained range of depicted distances from the observer, or simply depths.
Rather than concentrating on similarity or verisimilitude of images and
their naturalistic counterparts, many in the humanities have focused on
viewer response. Responses to images are often indistinguishable from
responses to real objects, and this was an important problem in the
development of the Protestant Church in the sixteenth century and in the
Catholic Counter Reformation. Worship with images could not always be
separated from the worship of images, a violation of the Old Testament’s
Second Commandment (Michalski 1993). This flirtation probably contributed
to the prohibition of images in Judaic and Islamic worship as well.
Over a broader cultural sweep, Freedberg offered powerful analyses of
how pictorial objects evoke the same responses in people as real objects, but he
never claimed that pictures are mistaken for reality. To be sure, “people
are sexually aroused by pictures . . . ; they break pictures . . . ; they kiss
them, cry before them, and go on journeys to them” (1989, p. 1), but they
don’t actually mistake them for the real objects they represent. Instead,
images (and sculptures) stand in reference to the objects they represent,
attaining an equal status with them, to be loved, scorned, appreciated, or
decried in full value.
More generally, there are simply the difficulties of generating mim-
icry. Ernst Gombrich, for example, suggested that “the demand that the
painter should stick to appearances to the extent of trying to forget what
he merely ‘knew’ proved to be in flagrant conflict with actual practice. . . .
The phenomenal world eluded the painter’s grasp and he turned to other
pursuits” (1974, p. 163). Indeed, with the invention of photography there
was even a strong sentiment that, as Rodin suggested, “it is the artist who
is truthful and it is the photograph which lies. . . . [H]is [the artist’s] work
is certainly much less conventional than the scientific image” (Scharf 1968,
p. 226).
Within psychology, James J. Gibson used the picture versus real world
difference as a fulcrum to make a distinction between direct and indirect
(or mediated) perception. He is often quoted: “Direct perception is what
one gets from seeing Niagara Falls, say, as distinguished from seeing a
picture of it. The latter kind of perception is mediated” (1979, p. 147).
Although the nature and the wider ramifications of this distinction have
been much debated (e.g., see Cutting 1986a, 1998, for overviews), it is
clear throughout all discussions that picture perception and real-world
perception are conceived as different.1 Although few other psychologi-
cal theorists use pictures to discuss a direct/indirect distinction (but see
Sedgwick 2001), Alan Costall (1990), Margaret Hagen (1986), Julian
Hochberg (1962, 1978), John Kennedy (1974), Michael Kubovy (1986),
Sheena Rogers (1995), and John Willats (1997) all emphasize differences
between the world and pictures of it. Indeed, for William Ittelson (1996)
there are few, if any, similarities between the perception of pictures and
the everyday perception of reality.
In most psychological discussions of pictures versus reality, the essen-
tial element centers on the truism that pictures are two-dimensional sur-
faces and that the world around us is arrayed in three dimensions. At
their photographic best, pictures are frozen cross sections of optical arrays
whose elements do not change their adjacent positions when the viewer
moves. In particular, what is left of a given object seen in a picture from
a given position is always left of that object; what is right, always right;
and so forth.2 This is not always true in the natural environment. As one
moves forward, to the side, or up or down, objects in the world cross over
one another, changing their relative positions. In the world the projective
arrangement of objects is not frozen.
To be sure, there are other differences between pictures and the world
than the 2-D versus 3-D difference and the lack of motion in pictures. At
a comfortable viewing distance, the sizes of objects as projected to the eye
are generally smaller in pictures than in real life; pictures typically have a
compressed range of luminance values (and often of color) compared to
the real world; and there are lens effects that compress or dilate space. I
will focus on lens effects later, but let’s first consider a second question.
of the perceived space in the world around us. Variously, I will call this
environmental space, physical space, and even reality. My intent in this
multiplicity (other than to avoid semantic satiation) is to emphasize the
assumption that I consider all of these identical.
Two interrelated facts about the perception of physical space must be
considered. First, perceived space is anisotropic (Luneburg 1947; Indow
1991). In particular, perceived distances are somewhat foreshortened as
compared to physical space, particularly as physical distances increase.
This fact has been noted in various ways by many researchers (e.g., Gogel
1993; Loomis and Philbeck 1999; and Wagner 1985; see Sedgwick 1986,
for a review), although some methods of judging distance yield quite dif-
ferent results from others (Da Silva 1985; Loomis et al. 1996).
Second, this compression is likely due to the decrease in information
available. Such decreases in available information have been shown experimentally
by Teodor Künnapas (1968) in the near range (<4 m) of
physical environments. They have also been demonstrated for near space
in computer-generated ones (Bruno and Cutting 1988). I contend that
the dearth of information about depth is responsible for compressions
throughout the visual range, particularly at extreme distances.
Figure 11.1
The top panel shows the exponents of perceived distance functions plotted by the
range of egocentric distances investigated in eleven studies. Notice the general
decline with increasing distance. This suggests an accelerated foreshortening of
space, from near to far. All studies chosen were done in naturalistic environments.
Those included, listed by order of decreasing exponents are Da Silva and Rozen-
straten (1979, reported in Da Silva, 1985), method of fractionation; Gårling (1970),
magnitude estimation; Harway (1963), method of partitioning; Teghtsoonian
(1973), magnitude estimation; Miskie et al. (1975), magnitude estimation; Da Silva
(1985), magnitude estimation; Kraft and Green (1989), verbal reports from pho-
tographs; Gilinsky (1951), method of fractionation; Flückiger (1991, two studies),
method of reproduction. The bottom panels select the data of naive viewers from
Gibson and Bergman (1954) and the data of Kraft and Green for special analysis.
That is, data are shown first with exponents and full egocentric depth range but
then with ranges incrementally truncated in the near range. This truncation sys-
tematically lowers the exponents. This means that the shape of perceived space
beyond about 50 to 100 m compresses faster than is captured by exponential
analysis. Exponents for the studies of Flückiger (1991), Gibson and Bergman
(1954), and Kraft and Green (1989) were calculated from the published data
for this analysis; others are available in Da Silva (1985).
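The exponents discussed in this caption are slopes of perceived distance against physical distance in log-log coordinates. The sketch below is a hedged illustration of why truncating the near range can lower a fitted exponent. The saturating function is a hypothetical stand-in chosen for illustration, not the data or the model of any study cited, and all names are assumptions.

```python
import math


def fit_exponent(distances, perceived):
    """Least-squares slope of log(perceived) on log(distance).

    For a power function perceived = k * distance**n the slope equals n.
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(p) for p in perceived]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def saturating_distance(distance, limit=100.0):
    """Hypothetical perceived distance that flattens as distance grows.

    The form limit * d / (limit + d) is an assumption made for this
    illustration only.
    """
    return limit * distance / (limit + distance)


distances = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]

# Sanity check: a pure power function returns its own exponent.
pure = [2.0 * d ** 0.68 for d in distances]
pure_exponent = fit_exponent(distances, pure)

# Fit the saturating function over the full range, then with the
# near range truncated step by step (from 1 m, 10 m, and 100 m outward).
truncated = [
    fit_exponent(distances[i:], [saturating_distance(d) for d in distances[i:]])
    for i in (0, 3, 6)
]
print(round(pure_exponent, 2), [round(e, 2) for e in truncated])
```

When perceived distance compresses faster than any single power function allows, each truncation of the near range yields a smaller fitted exponent, which is the pattern described for the bottom panels.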
measures for ordinal depth in the literature and then used them to make
suprathreshold comparisons (Cutting and Vishton 1995). The results are
shown in the top panel of figure 11.2.
In the panel, threshold functions for pairwise ordinal distance judg-
ments are shown for nine sources of information known to contribute to
the perception of layout and depth. They are based on various assump-
tions and data and applied to a pedestrian. Inspired by and elaborated
from Shojiro Nagata (1991), the data are plotted as a function of the mean
distance of two objects from the observer (log transformed) and of their
depth contrast. Depth contrast is defined as the metric difference in the
distance of two objects from the observer divided by the mean of their
two depths. This measure is similar to that of Michelson contrast in the
domain of spatial frequency analysis.
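The definition of depth contrast can be written out directly. This is a minimal sketch with hypothetical function names; the example distances are illustrative and are not taken from the studies cited.

```python
def depth_contrast(d1, d2):
    """Difference in two egocentric distances divided by their mean."""
    if d1 <= 0 or d2 <= 0:
        raise ValueError("distances must be positive")
    return abs(d1 - d2) / ((d1 + d2) / 2)


def michelson_form(d1, d2):
    """Michelson-style ratio: difference divided by the sum."""
    return abs(d1 - d2) / (d1 + d2)


# Two objects 1 m apart, at a mean distance of 10 m and of 100 m.
near_pair = depth_contrast(9.5, 10.5)
far_pair = depth_contrast(99.5, 100.5)
print(near_pair, far_pair)
```

The same 1 m separation gives a depth contrast ten times smaller at ten times the mean distance. Because the divisor is the mean rather than the sum, the measure is twice the Michelson-form ratio, which is why the two are similar rather than identical.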
Notice that, plotted in this manner, some sources are equally effica-
cious everywhere. In particular, I claim that potency of occlusion (the
most powerful source of information), relative size, and relative density do
not attenuate with the log of distance. Depth contrasts of 0.1%, 3%, and
10%, respectively, between two objects at any mean distance are sufficient
for observers to judge which of the two is in front. However, other sources
are differentially efficacious. Most—accommodation, convergence, binoc-
ular disparities, motion perspective, and height in the visual field—decline
with the log of distance. One, aerial perspective, actually increases with
the log of distance (it is constant with linear distance), but this source of
information, I claim, is used generally to support luminance contrast infor-
mation for occlusion. Notice further that, integrated across all sources,
the “amount” and even the “quality” of information generally decline
with the log of distance. This seems the likely cause of compressed per-
ceived distances.
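For the three sources said to be equally efficacious everywhere, the quoted thresholds can be applied to any pair of distances. The sketch below is illustrative only: the names are hypothetical, the distance-dependent sources are left out because their functions are not given numerically here, and the check ignores the fact that occlusion also requires the two objects to overlap in the optic array.

```python
# Threshold depth contrasts quoted in the text for the three sources said
# to be constant across distance (0.1%, 3%, and 10%).
CONSTANT_THRESHOLDS = {
    "occlusion": 0.001,
    "relative size": 0.03,
    "relative density": 0.10,
}


def depth_contrast(d1, d2):
    """Difference in two egocentric distances divided by their mean."""
    return abs(d1 - d2) / ((d1 + d2) / 2)


def sufficient_sources(d1, d2, thresholds=CONSTANT_THRESHOLDS):
    """Return the sources whose threshold the pair's depth contrast meets."""
    contrast = depth_contrast(d1, d2)
    return [name for name, limit in thresholds.items() if contrast >= limit]


near_sources = sufficient_sources(19.5, 20.5)    # mean distance 20 m
far_sources = sufficient_sources(195.0, 205.0)   # mean distance 200 m
print(near_sources, far_sources)
```

Both pairs have the same depth contrast, so the same constant sources suffice at either mean distance. A source whose threshold rises with the log of distance would drop out of the second list.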
The general shapes of the functions in figure 11.2, plus a few practical
considerations, encouraged us to parse egocentric space into three regions
(see also Grüsser and Landis 1991). Personal space, that which extends a
little beyond arm’s reach, is supported by many sources of information
and is perceived almost metrically (Loomis et al. 1996), that is, with an
exponent of 1.0. This means that observers can generally match a lateral
distance of, say, 40 cm with a 40 cm distance extended in depth. After 2 m
or so, height in the visual field becomes an important source (a normal
adult pedestrian cannot use this information until objects are at least
1.5 m distant), and motion perspective for a pedestrian is no longer a blur;
but accommodation and convergence cease to be effective.
Figure 11.2
The upper panel shows the ordinal depth thresholds for nine sources of infor-
mation (panel modified from Cutting and Vishton 1995). Egocentric depth (the
abscissa) is logarithmically represented. Depth contrast (the ordinate) is measured
in a manner similar to Michelson contrast in the spatial frequency domain. That
is, the difference in the metric depths of two objects under consideration (d1 − d2)
is divided by the mean egocentric distance of those objects from an observer
[(d1 + d2)/2]. Egocentric space is also segmented into three general regions, which
grade into one another. Personal space, out to about 2 m, is perceived as Euclidean
(exponent of 1.0); action space, from about 2 to about 30 m, demonstrates some
foreshortening (exponents near 1.0); and vista space, beyond about 30 m, often
demonstrates considerable foreshortening (exponents declining to .40 or so, as
suggested in figure 11.1). Different sources of information seem to work differen-
tially in the different regions. An assumption made is that threshold measure-
ments, shown in the panel, are good indicators of suprathreshold potency. The
lower panel shows those thresholds isolated for pictorial sources of information,
with personal space excluded because few pictures (before the twentieth century)
depict space within 2 m of the viewer.
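The three regions named in this caption can be expressed as a simple lookup. This is a rough sketch: the function name is hypothetical, and the hard cut-offs at 2 m and 30 m are a simplification, since the caption states that the regions grade into one another.

```python
def egocentric_region(distance_m, personal_limit_m=2.0, action_limit_m=30.0):
    """Name the region of egocentric space for a distance in metres.

    The default limits follow the approximate values in the caption and
    are treated here as sharp boundaries only for convenience.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if distance_m <= personal_limit_m:
        return "personal space"
    if distance_m <= action_limit_m:
        return "action space"
    return "vista space"


for d in (0.4, 10.0, 300.0):
    print(d, egocentric_region(d))
```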
deprivation (Wallach, Moore, and Davidson 1963); many people over the
age of forty can no longer accommodate; and, of course, adults are of dif-
ferent sizes and youngsters crawl and toddle, further changing the func-
tion for height in the visual field. All of these factors contribute, I claim,
to the lability of perceived distances across people and environments.
Figure 11.3
The two images on the left are from Swedlund (1981), reprinted with permission
of Holt, Rinehart, and Winston. These images show the changes in pictorial space
due to changes between a short and a long lens. The point from which the bottom
image was photographed was about four times the distance of that for the top
image, and both were cropped and sizes adjusted to make the appearance of the
sign in the foreground the same. The diagram to the right is a plan view of the lay-
out of the sign, the two backboards, and the two camera positions. It was recon-
structed knowing that backboards are usually about 25 m apart and from the
measurements of their heights (in pixels) in the digitized images. (Photographs
from Photography: A handbook of history, materials, and processes, 2nd edition by
Charles Swedlund and Elizabeth U. Swedlund, copyright © 1981 by Holt, Rine-
hart and Winston, reproduced by permission of the publisher.)
Figure 11.4
An affine reconstruction of pictorial space. The left panels show slices through
untransformed pictorial space as seen from the composition point and the image
seen by the observer; the other panels show transformations and views due to
changes in observer position away from the composition point. Top panels show
the affine transform in horizontal planes; the bottom-right panel shows a similar-
ity transform (enlargement). After Cutting (1986b).
person has the same image size as the former seen through a 50 mm
lens. Thus, familiarity with sizes of objects in the environment shifts, in
this case, the whole of the environment one log unit toward the
observer; familiarity with what one can do with those objects moves distant
ones into what appears to be action space. (In telephoto images, as in regular
images, there typically is no content in personal space.) Thus, what
passes for action space now really extends out to perhaps 300 m.
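The shift of one log unit can be illustrated numerically. The sketch below assumes that image size scales with focal length divided by distance, and it assumes a 500 mm telephoto lens. That focal length is inferred here from the phrase "one log unit" relative to a 50 mm lens and is not stated in the passage; the function names are likewise hypothetical.

```python
import math


def apparent_distance(physical_m, focal_mm, normal_focal_mm=50.0):
    """Distance at which a normal lens would give the same image size."""
    return physical_m * normal_focal_mm / focal_mm


def log_unit_shift(focal_mm, normal_focal_mm=50.0):
    """Shift toward the observer, in log10 units, relative to a normal lens."""
    return math.log10(focal_mm / normal_focal_mm)


TELEPHOTO_MM = 500.0  # assumed focal length, not given in the text

shift = log_unit_shift(TELEPHOTO_MM)
outer_edge = apparent_distance(300.0, TELEPHOTO_MM)
print(shift, outer_edge)
```

Under these assumptions an object 300 m away has the image size it would have at 30 m through a normal lens, so it falls at the nominal outer edge of action space.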
If we carry with us a set of expectations about objects and the efficacy
of information in real space to what we see in the telephoto image, per-
ceptual anomalies will occur (Cutting 1997). Compared to the perception
of environmental and normal pictorial spaces, relative size differences and
relative density differences are reduced. Consider size. Basketball back-
boards are normally about 25 m apart. In the Swedlund images in figure
11.3 the ratio of sizes (linear extent) of the two backboards is 1.5:1 in the
top image but only 1.24:1 in the bottom image. The perceived distance
difference in the top image could rather easily be 25 m; that in the lower
image could not. In near space, we expect smaller relative size differences
for similar objects closer together. In addition, height in the visual field
also changes; the difference in height between the base of the poles for the
two backboards is three times greater in the telephoto image. We expect
larger height-in-the-visual-field differences for nearer objects.
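The size ratios quoted for the two backboards can be turned into distances. The sketch below assumes that projected size is inversely proportional to distance and that the backboards are the same physical size and 25 m apart. The function names are hypothetical, and the second function offers one possible reading of why the lower image looks compressed, not a claim about how observers actually compute depth.

```python
def nearer_distance(size_ratio, separation_m):
    """Distance to the nearer of two same-sized objects.

    With projected size inversely proportional to distance,
    size_ratio = (d + separation) / d, so d = separation / (size_ratio - 1).
    """
    if size_ratio <= 1:
        raise ValueError("size_ratio must exceed 1")
    return separation_m / (size_ratio - 1)


def implied_separation(size_ratio, assumed_near_m):
    """Separation implied by a size ratio if the near distance is assumed."""
    return assumed_near_m * (size_ratio - 1)


top_near = nearer_distance(1.5, 25.0)      # short-lens image
bottom_near = nearer_distance(1.24, 25.0)  # long-lens image

# If a viewer took the long-lens image to be shot from the short-lens
# distance, the 1.24:1 ratio would imply a much smaller separation.
misread_gap = implied_separation(1.24, top_near)
print(top_near, round(bottom_near, 1), round(misread_gap, 1))
```

On these assumptions the nearer backboard is about 50 m from the camera in the top image and about 104 m in the bottom image. Read at the nearer distance, the bottom ratio would correspond to a gap of roughly 12 m rather than 25 m.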
Figure 11.5
The data of Kraft and Green (1989) are presented in the left panel, showing per-
ceived distances of posts as functions of their physical distance from a camera and
its lens. The middle panel shows a model prediction based on the combination of
physical distance and lens length; and the right panel shows a prediction based on
this parameter and an exponent of .68. The fit shown in the right panel is statistically
superior to that in the middle panel, revealing compression in perceived
depth.
Yes—at least in terms of principles if more than the end result. Earlier I
had suggested that psychologists and others interested in pictorial space
have spent perhaps too much time with photographs and linear perspec-
tive paintings, drawings, and etchings. These are among the more recent
of pictures that human beings have crafted. Manfredo Massironi and I
(Cutting and Massironi 1998) suggested one might equally start at the
other end of time, at least from the perspective of human culture. Cave
paintings—some of which are at least 30,000 years old—as well as car-
toons, caricatures, and doodles are made up of lines. These pictures are
not copies of any optical array, and yet as images they depict objects well.
Figure 11.6
A taxonomy of lines. Edge lines separate the regions on either side and assign dif-
ferent ordinal depth. Four figures that play with this relationship are shown: (a)
the faces/goblet illusion, after Rubin (1915); (b) an ambiguous occlusion figure,
after Ratoosh (1949); (c) the devil’s pitchfork, after Schuster (1964); and (d) a rec-
tangle/window, after Koffka (1935). Next are shown some object lines: (e) is a mid-
twentieth-century version of a television antenna; and (f) shows the twigs at the
end of a tree branch. Third are figures with crack lines: (g) is the mouth of a clam,
after Kennedy (1974); and (h) is a crack in a block. Note that (f) and (h) are exact
reciprocals—switching from object line to crack line. Finally, four texture line
types are shown: (i) texture edges of occlusion in cobblestones, after de Margerie
(1994); (j) texture objects as palm fronds, after Steinberg (1966); (k) texture cracks
as mortar between bricks, after Brodatz (1966); and (l) texture color, indicating
shadow. (Adapted from Cutting and Massironi, 1998)
Figure 11.7
A thresholded detail from a painting in La Grotte Chauvet demonstrating the use
of four types of lines—edges, objects, cracks, and textures. Because this image is
at least 30,000 years old, it would appear that this typology of lines is not cultur-
ally relative but possibly biologically engrained. (Image reprinted from Clottes,
2001, with the kind permission of Jean Clottes, Ministère de la Culture)
Figure 11.8
From pairwise ordinal depth relations to five ordinal depth planes. A thresholded
detail from perhaps the most famous and controversial image in La Grotte
Chauvet. Many claim it should not be read as four horses in depth, although I dis-
agree. If it is read in depth, one can follow the assignments of occlusions, using
edge lines [bAa]. Each of four pairs has this assignment. If this information is fed
into a comparator that knows that objects can occlude, the end result is five ordi-
nal pictorial depths, [a]–[e]. (Image reprinted from Clottes, 2001, with the kind
permission of Jean Clottes, Ministère de la Culture)
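The comparator step described in the caption of figure 11.8 can be sketched in a few lines. This is only an illustration under stated assumptions: each occlusion is encoded as a hypothetical (front, behind) pair, the labels a to e stand for the five surfaces, and the function name `ordinal_depths` is invented here. It is not a reconstruction of any procedure used in the chapter.

```python
from collections import defaultdict, deque

def ordinal_depths(occlusions):
    """Assign ordinal depth ranks from (front, behind) occlusion pairs.

    Rank 0 is nearest. Each surface is placed one rank behind the
    deepest surface that occludes it. Raises ValueError when the
    pairs contradict each other (a cycle of occlusions).
    """
    behind = defaultdict(set)       # surface -> surfaces it occludes
    n_occluders = defaultdict(int)  # surface -> number of occluders left
    surfaces = set()
    for front, back in occlusions:
        surfaces.update((front, back))
        if back not in behind[front]:
            behind[front].add(back)
            n_occluders[back] += 1

    rank = {s: 0 for s in surfaces}
    queue = deque(sorted(s for s in surfaces if n_occluders[s] == 0))
    processed = 0
    while queue:
        s = queue.popleft()
        processed += 1
        for b in sorted(behind[s]):
            rank[b] = max(rank[b], rank[s] + 1)
            n_occluders[b] -= 1
            if n_occluders[b] == 0:
                queue.append(b)
    if processed != len(surfaces):
        raise ValueError("occlusion pairs are cyclic")
    return rank

# Four pairwise relations (assumed encoding) yield five ordinal depths.
pairs = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
depths = ordinal_depths(pairs)
```

The sketch uses only ordinal information: it says which surface lies in front of which, and nothing about how far apart they are.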
ACKNOWLEDGMENT
I thank Claudia Lazzaro for many fruitful discussions and for her patience
in listening to many ill-formed ideas, Robert Kraft for sharing method-
ological information about this study of pictorial depth, and Patrick
Maynard for some insightful comments.
Notes
1. Earlier in his career, however, Gibson was less sure: “It is theoretically possible
to construct a dense sheaf of light rays to a certain point in a gallery or a labora-
tory, one identical in all respects to another dense sheaf of light rays to a unique
station point thousands of miles away” (1960, p. 223), although he denied it was
obtainable in practice.
2. This is an axiom from Euclid’s optics (Burton 1945). Euclid’s axioms and his
proofs deal with both stationary and moving observers.
3. This is not to say that people cannot be trained to perceive vista space more
veridically; clearly they can, and the data of E. Gibson and Bergman (1954)
and Galanter and Galanter (1973) show this. The focus of this chapter, however,
is on what normal adults, without specific distance training, will perceive and
judge.
4. Flückiger (1991) had observers judge the distance of boats floating on Lake
Leman, looking from near Geneva toward Montreux. Only relative size and
height in the visual field would provide firm information, perhaps with the addi-
tion of aerial perspective and relative densities of waves. Experimental results for
judgments between 0.2 km and 2.0 km and between 0.75 km and 2.25 km were
quite tidy; whereas those between 2.8 km and 5.6 km were not. Flückiger did not
fit exponents to his data, but this is easily done. Mean exponents for the first com-
plete data sets were .36 and .44, respectively. In addition, I have omitted the data
of Galanter and Galanter (1973) here. Their observers were highly trained and,
although they viewed objects up to 10 km into the distance, they viewed them
from airplanes at altitudes of about 60 m. This raises the function of height in the
visual field by a factor of 40, as discussed in the next section.
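As an illustration of how such an exponent can be fitted, the sketch below assumes that judged distance is modeled as a power function of physical distance, so that the exponent is the slope of a straight-line fit in log-log coordinates. The distances are synthetic values generated from an exact power function; they are not Flückiger's data, and the function name `fit_power_exponent` is hypothetical.

```python
import math

def fit_power_exponent(physical, perceived):
    """Fit perceived = scale * physical ** exponent by least squares
    on log(perceived) against log(physical)."""
    xs = [math.log(d) for d in physical]
    ys = [math.log(p) for p in perceived]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    scale = math.exp(mean_y - exponent * mean_x)
    return exponent, scale

# Synthetic example only: distances in km and judgments generated
# from a power function with exponent 0.4 and scale 3.0.
physical_km = [0.2, 0.5, 1.0, 1.5, 2.0]
perceived = [3.0 * d ** 0.4 for d in physical_km]
exponent, scale = fit_power_exponent(physical_km, perceived)
```

With noisy judgments the same fit returns the least-squares exponent rather than an exact one.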
5. Exponents become unstable when a first pair exhibits overconstancy (more per-
ceived distance between them than physically present). When this occurs, expo-
nents tend toward zero. This also happened in the analysis of the data of Flückiger
(1991).
6. Linear perspective, often cited as a single source of information, is really a con-
comitance of occlusion, relative size, relative density, and height in the visual field,
using the technology of parallel straight lines and their recession.
7. Kraft and Green (1989) believed that the foreshortening of space seen in pho-
tographs due to the use of lenses of different lengths is caused by the truncation
of the foreground. Although this is a possibility, the purest form of the argument
is that the foreground is truncated but the remainder of space remains the same.
Their own data (Kraft and Green, 1989, experiment 1) are not consistent with this
idea; they show a fan effect rather than parallel lines.
8. The Kraft and Green article did not give full details of the presentation of the
stimuli. Thus, I contacted Robert Kraft and he kindly provided additional exper-
imental details (Kraft, personal communication, May 24, 2000). First, the length
of the lens used to project the images was 100 mm. This means that the images are
half as large as they would be with a 50 mm projector lens from the distance of
the projector. Ideally, when seen from the projector, such diminution should dilate
depth by a factor of about 2.0 (multiplying distances by about 2.0; see Cutting,
1988, for a similar analysis). However, the distance of the projector to the screen
was at 6 m, twice the mean distance of the observers (3 m). This latter effect
enlarges the image by a factor of 2, compressing depth (multiplied by 0.5). These
two effects are linear and should cancel. Theoretically one would expect perceived
distances to be compressed by a factor of 0.5 · 2.0 (multiplied by a factor of 1.0).
This also means that the images seen in the experimental situations subtended the
same angle as the corresponding images in the real world.
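The arithmetic of this note can be written out as follows. The symbols are labels introduced here for illustration only: the first is the depth factor attributed to the projector lens, the second the depth factor attributed to the viewing distance.

```latex
\[
  m_{\text{lens}} \approx \frac{100\ \text{mm}}{50\ \text{mm}} = 2.0, \qquad
  m_{\text{view}} = \frac{3\ \text{m}}{6\ \text{m}} = 0.5, \qquad
  m_{\text{lens}} \cdot m_{\text{view}} = 2.0 \times 0.5 = 1.0 .
\]
```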
9. The most convincing comparisons in the data of Hecht, van Doorn, and
Koenderink (1999) are for angular judgments of the intersections of walls of
buildings at ZiF, which has few right angles.
Chapter 12
Pictorial Space
When you are handed a “straight” photograph of, say, a statue you can
either
• look at it and see a flat piece of paper covered with pigments in a certain
simultaneous order, or
• look into it and see an object (“the statue”) in “pictorial space.”
Issues of “Information”
Here is a remarkable fact: When you look at the scene in front of you
you pretty much see the scene in front of you. We mean this in the sense
that you find no problem in making judgments of either a geometrical
or a physical nature that you know are likely to be corroborated by vari-
ous explorations you might make were you to walk up to the scene and
explore it in various ways. Examples of geometrical attributes would be
sizes, distances, and spatial orientations; examples of physical attributes
would be such properties as being rough, smooth, metallic, wooden, or wet.
Although you know that your judgments are unlikely to be correct in
every detail, you are convinced that they are good enough for most inter-
actions with your trusty environment. When you get out of your generic
biotope, things may turn out to be different than expected, though, and
you have to be more careful; that is, you have to reckon with various
forms of erroneous—in the sense of leading you to unsuccessful action—
judgments. For instance, you are likely to seriously misjudge distances
and shapes when transferred to the submarine environment (Lythgoe
1979), and you will probably misinterpret X-ray pictures.
Because your only physical interaction with the scene is of an optical
nature, it must be the case that you interpret the “radiance” at the loca-
tion of your eye in terms of “information” relating to the geometry and
physical properties of the scene. For the moment we consider only monoc-
ular vision.
242 Jan J. Koenderink and Andrea J. van Doorn
The radiance is a function of position and direction that for each posi-
tion and direction specifies the photon spectral number density per unit
solid angle, unit area, unit time, and unit photon energy interval. It is sub-
ject to few constraints. Photon number density is nonnegative throughout,
and, at least in empty space, the radiance at different locations in the direc-
tion of their connecting line must be the same. Otherwise, the radiance is
quite arbitrary. The best intuitive way to think of radiance is as an infinite
storage cabinet containing all possible photographs, taken from any con-
ceivable viewpoint, field of view, view direction, spectral filter, and so
forth. Its structure is extremely complicated in the sense that when you
increase the resolution you obtain more and more variation. You would
see all the hairs on the body, the tiny organisms that live in the crevices of
the skin, all the tiny articulations of these, and so forth. In that sense the
radiance is not a simple function (i.e., it is not continuous and differen-
tiable) at all, but what mathematicians call a “distribution.”
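The two constraints just mentioned can be stated compactly. In this sketch the notation is an assumption made here: L denotes the radiance, x a position, and ω a unit direction. The spectral and temporal arguments are suppressed.

```latex
\[
  L(\mathbf{x}, \boldsymbol{\omega}) \ge 0,
  \qquad
  L(\mathbf{x} + t\,\boldsymbol{\omega}, \boldsymbol{\omega})
    = L(\mathbf{x}, \boldsymbol{\omega})
  \quad \text{whenever the segment from } \mathbf{x} \text{ to }
  \mathbf{x} + t\,\boldsymbol{\omega} \text{ lies in empty space.}
\]
```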
From the perspective of physics the radiance is due to the scattering
of photons emanating from certain “primary sources,” for example, the
sun, by the scene to your eye. Multiple scattering is the rule. A photon
that arrives at your eye is likely to have been scattered many times by
the atmosphere and various objects in the scene. All photons of a given
energy are the same; they don’t carry an imprint of their history. It is only
the radiance as a whole that may be interpreted in terms of a history of
scattering events in the scene (Born and Wolf 1959).
Your eye transduces the radiance at the position of the cornea into a
simultaneous-successive order of action potentials in the optic nerve. The
retina doesn’t distinguish between the energies of photons once they have
been absorbed (see the “Law of Univariance” as described by Wyszecki
and Stiles 1967), and many different patterns of photon absorptions lead
to identical optic nerve patterns. The structure of the “input to the brain”
reflects—in an extraordinarily complex way—the structure of the scene in
front of you (Zeki 1993). Some correlations between certain aspects of this
structure and certain aspects of the structure of the scene are understood,
or even well understood, but much still awaits discovery. That is one
reason, though not the most important one, why we deem it prudent to
distinguish between “structure” and “information.”
Structure simply refers to aspects of spatiotemporal order. It implies a
topology (adjacency in space-time) but has nothing to do with any refer-
ence to the scene. Information is structure referred to a real or imagi-
nary scene by an observer. Thus the concept of information implies an
observer with experience with generic scenes who posits a particular scene
and takes the available structure as referring to that scene. This is Franz
von Brentano’s “intentionality” (Intentionalität), which turns structure into
information in the sense of “meaning” (Brentano 1874). To take structure
for information in this way is an act of faith on the part of the observer.
Structure is not intrinsically information in any way, although most text-
books are designed to mislead you on the issue. That is why information
can be wrong, even information concerning the (real!) scene in front of
you. Structure is never wrong, nor is it ever true; it simply exists. Of course
the act of faith on the part of the observer need not be a conscious one.
It typically is not. Organisms simply have to stick their necks out or they
won’t survive.
Classically (Berkeley 1709/1975) optical information is discussed in
terms of “cues.” The meaning of cue is differently construed by various
authors. Here we take it to refer to structure interpreted by an observer in
terms of knowledge of “ecological optics” (Gibson 1950). “Knowledge” is
understood in the sense of “know-how,” or capability. Such knowledge is
of two kinds, generic and particular. Knowledge of generic ecological
optics coincides with the laws of physics as they apply to perception. An
example would be that large and small images of similar things indicate
near and far distances of these things. Generic knowledge holds good in
virtually all biotopes. In contradistinction, particular knowledge applies
to particular biotopes and may easily fail in the wrong setting. An exam-
ple would be that strong and otherwise unaccountable contrasts (“glitter”)
indicate wet or metallic surfaces. This won’t work in an art gallery where
chrome objects are rendered in paint on canvas. Notice that the knowl-
edge, be it generic or particular, is almost never overt. It may even be
anatomically implemented (Riedl 1988; Vollmer 1990). It always derives
from uncontradicted experience though, whether phylogenetically or
ontogenetically.
Because generic ecological optics (Gibson 1950) coincides with a subset
of physics, it is the same for all observers (species) and can be formally
understood via an analysis of the pertinent physics. Much of our present
knowledge derives from applied physics and computer vision. Whether
a given observer or species actually exploits such knowledge is of course
up to behavioral verification. Empirically we find that many generic facts
of ecological optics are reflected in the behavior of very different species,
though the precise implementations may differ widely. After all, flies are
constructed differently from man, though both orient and navigate partly
via “optical flow.” This indicates, by the way, that a mindless reduction-
ism is not going to get one very far in understanding such feats of living
organisms.
Types of “Pictures”
There exists a bewildering variety of artifacts that are commonly referred
to as “pictures.” In this chapter, we reserve the term “images” for percep-
tions and use “pictures” for such things as photographs, paintings, and
drawings. Although pictures are simply physical objects, they are pictures
to an observer. That is to say, an observer has to take them as pictures. A
footprint in the sand is a picture to human observers, whereas a photo-
graph is merely a piece of paper to your dog. Images are in the mind,
whereas pictures may be handled. Here we are mainly interested in pic-
tures “of” something, either of actually existing things, such as a horse
or a cow, or imaginary things that still fit into some observer’s realm of
expertise in ecological optics. Think of unicorns or dragons, for instance.
For our purposes it is of minor interest whether a picture has been pro-
duced by some mechanical means such as footprints on the beach, photo-
graphs, or computer graphics, or whether it came about through an act of
artistic rendering. Indeed, it is impossible to decide on the basis of the pic-
ture as object alone. What will be important is the particular bouquet of
cues that the picture offers the observer. For instance, a silhouette, a car-
toon (line) drawing, a painting in flat tones, a pointillist (realistic) paint-
ing, and a straight photograph offer quite different bouquets of cues (see
figures 12.1 and 12.2). Remember that cues are bits of information and
exist only in relation to certain observers.
Because cues can be, and often are, qualitatively different chunks of
information, different pictures offer not simply different amounts of infor-
mation but qualitatively different informational contents. Thus different
renderings of a single “fiducial object” may well specify distinct pictorial
objects to an observer (see figure 12.2). The fiducial object may be a real
(or physical) object in the case of photographs, or a concept in the head
of an artist (a unicorn, say) that may materialize as a painting, drawing,
and so forth.
In many cases it may not be very useful to speak of a fiducial object at
all, because artists often aim at a concrete result, such as a painting, rather
than (or in addition to) the expression of an idea. Whereas many ideas can
be drawn, painted, or described in words, this need not be universally
the case. For instance, painted unicorns cannot be drawn, for the simple
Figure 12.1
Four pictures with identical geometry but different rendering.
[Figure 12.2 panel labels, subject CC: silhouette; cartoon; illumination from lower right; illumination from upper left]
reason that drawings are not paintings. In such cases, as in the case of a
face seen in a cloud, the picture is all there is.
Modes of Seeing
Pictorial spaces are mental entities and depend upon the observer just
as much as upon the picture. Without observers there are no images and
pictures remain forever “planar objects covered with pigments in cer-
tain simultaneous orders”; to echo Maurice Denis: “Remember that a
picture—before being a battle horse, a nude woman, or some anecdote—
is essentially a plane surface covered with colors assembled in a certain
order” (Denis 1890/1976, p. 380). There are many different modes of see-
ing: a face may be seen in a cloud, a person may be seen frontally (facing
you) in a painting viewed obliquely, a photograph may be seen as a flat
object and you may see a pictorial space in it simultaneously, and so forth.
Of particular interest are cases in which the observer assumes a certain
viewing mode with the express purpose of influencing the structure of
pictorial space. Such viewing methods are often practiced by artists (Leonardo
da Vinci 1989) to either flatten optical space for purposes of drawing, or
deepen pictorial relief for purposes of monitoring the effectiveness of ren-
dering. Optical space is flattened through such devices as monocular view-
ing, “screwing up the eyes,” viewing through a wireframe, using a Claude
glass (tinted mirror) or camera obscura, and so forth (Dubery and Willats
1972). (See figure 12.3.) Pictorial space is deepened through monocular
viewing, deemphasis of the frame, a frontoparallel vantage point, and so
forth.
The case of deepening the relief in pictorial space is interesting, because
the structure relating to the pictorial relief that is sampled by the eye(s) is
obviously the same whether you view the picture monocularly or binocu-
larly. Yet the “information” is different because the structure relating to
the actual scene—you holding the photograph and looking at it—is dif-
ferent. With two eyes open, there are strong indications that you are look-
ing at a flat object. Apparently this cue “conflicts” with the pictorial cues
proper. More on that later. This conflict is an indication that a clean
Figure 12.2
Pictorial relief for the four pictures shown in figure 12.1 with identical geometry
but different rendering. The lines are loci of constant depth. The pictures represent
different pictorial objects to the observer CC. In the case of the silhouette you may
guess the object is a torso, but it is ambiguous whether you see the front or back
of it. This observer apparently split the difference.
Figure 12.3
(Left) “framing” the scene in front of you and looking at and through the frame
with one eye is a time-honored method to “flatten” a scene. Optical space starts to
look like a picture. (Right) cutting out context is an excellent way to remove depth
and relief in order to be able to focus on textural properties and color.
segregation between “pictorial space,” that is, the space in which the pic-
torial content unfolds, and “physical space,” that is, the space that con-
tains the picture, is not necessarily made by a given observer.
[Figure 12.4 labels: A, P, I; “Expertise in ecological optics”; “space of images”]
Figure 12.4
The actual scene A gives rise to an actual image I, which in turn gives rise to a
veridical perception P. A “nonveridical” perception would indicate a scene that
would lead to a different picture. All scenes (like M) that lead to the picture I are
“metameric” to the actual scene and would be equally veridical perceptions. All
such metameric scenes are related by a group of ambiguity transformations.
The first problem we have to face in the study of “pictorial relief” is how
to “measure it.” The way we phrase it here perhaps sounds reasonable but
actually puts you on the wrong track. The problem is that in order to measure
something, that “something” has to exist independently of the act of
measuring. If the something is operationally defined (by the measurement)
you don’t measure it, but rather the measurement creates “it.”
It is a priori evident that methods of measuring physical quantities will
not be applicable to the case of pictorial space, because pictorial space is
a mental entity. Likewise, physiological methods (electrophysiology, func-
tional brain imaging, etc.) are by their nature unable to address the prob-
lem. Pictorial relief is a mental entity and thus can only be probed via
mental operations. An example would be to have an observer describe the
pictorial space verbally or in terms of an artistic statement. Although such
methods certainly have their advantages, they are hardly suited to arrive
at a body of quantitative, parametric data.
At the next level we may ask the observer for magnitude estimates
of “depth” or “surface attitude” (slant and tilt) and so forth of pictorial
points or surface elements. Such (verbal) estimates are judged to be
extremely difficult by typical observers and, what is worse, they yield very
variable responses (Mingolla and Todd 1986). Much easier are judgments
of relative magnitudes, such as “nearer” or “more distant” (Koenderink,
van Doorn, and Kappers 1996). Here we may establish fairly well-defined
discrimination thresholds under various, parametrically varied, conditions.
Such methods helped to make up most of the classical corpus of scientific
data on the topic, and they remain important to this day.
A problem with these classical methods that is hardly recognized in the
literature is that (discrimination) thresholds don’t yield any handle on
what the observers actually experience. Another problem is that it is pro-
hibitive to collect the really rich data sets we desire, say the equivalent of
an elaborate description, such as a fairly fine triangulation, of pictorial
relief. These data are indeed desirable because they would enable us to
perform calculations of a differential geometric nature and thus relate
apparently disparate data sets. For instance, you might want to predict
curvature data from attitude data through differentiation. But the operation
of differentiation requires a fine-grained description. There is precious
little one can do with sparse data. In figures 12.5 and 12.6 we show
an example of what we mean by a “rich data set.” Here the response is
essentially a surface.
What is needed are methods that go beyond thresholds, yet avoid the
problems that bedevil absolute magnitude estimates, and that are fast,
that is, enable us to collect hundreds or thousands of data points within a
reasonable time span, such as an hour. Indeed, a picture may be roughly
characterized through at least 10 by 10 (100) to 100 by 100 (10,000)
samples, and a single experimental session cannot last for more than
roughly an hour (that is less than 10,000 seconds). Thus we need methods
that allow us to collect a data point at least every few seconds for periods
up to not much more than an hour. This evidently eliminates both mag-
nitude judgments, where observers say, “hmm, . . . , well, . . .” for a minute
per data point, and threshold measurements, which require a full psychometric
curve (at least 100 yes/no judgments) per data point. It cannot
be stressed enough that powerful methods to address pictorial relief have
to be based on methods that yield rich data, essentially “fields of esti-
mates.” For instance, a dense field of depth estimates can be considered
a “surface,” that is, a true geometrical entity, whereas a mere handful of
Figure 12.5
(Left) a stimulus with two superimposed gauge figures. Notice that the white one
“fits,” and the black one doesn’t. (Right) Result of one-half hour’s work: a dense
field of gauge figure settings.
Figure 12.6
The result from the attitude sampling can be integrated to a depth field (left) or a
“pictorial relief” (surface in three-dimensional space) (right).
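The integration step named in the caption of figure 12.6 can be sketched as follows. This is a deliberately naive scheme, and several things are assumptions made here: slant and tilt are converted to depth gradients under parallel projection, the samples lie on a regular grid with rows along y and columns along x, and the function names are hypothetical. The numerical integration actually used for the figures is not specified in the text and may well differ.

```python
import math

def gradients_from_attitude(slant, tilt):
    """Convert slant and tilt (radians) into depth gradients (dz/dx, dz/dy)."""
    g = math.tan(slant)
    return g * math.cos(tilt), g * math.sin(tilt)

def integrate_depth(p, q, step=1.0):
    """Integrate gradient grids p = dz/dx and q = dz/dy into a depth grid.

    Depth is fixed at 0 in the top-left sample. The first row is
    integrated along x, then every column is integrated along y,
    using the trapezoid rule between neighboring samples.
    """
    rows, cols = len(p), len(p[0])
    z = [[0.0] * cols for _ in range(rows)]
    for j in range(1, cols):
        z[0][j] = z[0][j - 1] + 0.5 * (p[0][j - 1] + p[0][j]) * step
    for i in range(1, rows):
        for j in range(cols):
            z[i][j] = z[i - 1][j] + 0.5 * (q[i - 1][j] + q[i][j]) * step
    return z

# Toy field: every sample has slant 45 degrees and tilt 0, that is,
# a plane receding toward the right of the picture.
p_val, q_val = gradients_from_attitude(math.radians(45), 0.0)
p = [[p_val] * 4 for _ in range(3)]
q = [[q_val] * 4 for _ in range(3)]
depth = integrate_depth(p, q)
```

A path-based scheme like this one gives a consistent result only when the field of settings is integrable. A least-squares surface fit is a common alternative when the settings are noisy.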
Notice that the slider bar might as well have been plotted outside the
picture frame proper.
In a “method of fit” an observer is asked to change the appearance of a
“gauge figure” in such a way that it “fits” the pictorial relief (see figure
12.7, right).
The method of fit is especially interesting because the judgment of fit
applies to pictorial space proper and to nothing external to it. This is not
true for the other methods, which introduce alien elements (markers,
pointers, etc.) that don’t properly belong to pictorial space but are graphi-
cal elements superimposed upon the picture that may (like the punc-
tate markers or the arrow) or may not (like the slider bar) relate to the
graphics sustaining pictorial space.
An instance of the method of fit that we have exploited frequently
(the method of fit is much more generally applicable, though) is the meas-
urement of surface attitude (Koenderink, van Doorn, and Kappers 1992).
(See figure 12.7, right, and figure 12.8.) We superimpose a gauge figure
that is under control of the observer, who is supposed to adjust it in such
Figure 12.7
(Left) example of a method of adjustment. Notice that the arrow exists in the pic-
ture plane, not in pictorial space. (Center) example of a method of reproduction.
Although the punctate markers seem to lie on the pictorial surface, the slider bar
is in the picture plane and seems fully unrelated to pictorial space. (Right) exam-
ple of a method of fit. The “gauge figure” belongs to pictorial space and relates to
pictorial objects.
a way that it “clings to the pictorial surface.” From the setting we derive
numerical values of slant and tilt, but at no time is the observer required
to estimate or even distinguish between slant and tilt. The task is simply
to establish a fit in pictorial space. Observers find this an easy (almost
trivial) task, whereas they are hard put to estimate slant and tilt values
when asked to do so.
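How numerical slant and tilt values might be read off a setting can be sketched under explicit assumptions. The sketch takes the polar interface described in the caption of figure 12.8 (slant as radius, tilt as orientation) and assumes a circular gauge figure drawn under parallel projection. The maximum slant of 80 degrees and both function names are hypothetical choices made for this illustration.

```python
import math

def attitude_from_control(x, y, max_slant=math.radians(80)):
    """Map a control point in the unit disk to (slant, tilt) in radians.

    The radius of the point encodes slant and its orientation encodes tilt.
    Points outside the unit disk are clamped to the maximum slant.
    """
    r = min(math.hypot(x, y), 1.0)
    slant = r * max_slant
    tilt = math.atan2(y, x) % (2 * math.pi)
    return slant, tilt

def gauge_ellipse(slant, tilt, radius=1.0):
    """Outline of a circular gauge figure under parallel projection.

    Returns (semi-major axis, semi-minor axis, direction of the minor
    axis in the picture plane). Foreshortening acts along the tilt
    direction, so the minor axis shrinks with the cosine of the slant.
    """
    return radius, radius * math.cos(slant), tilt

slant, tilt = attitude_from_control(0.0, 0.5)
major, minor, direction = gauge_ellipse(slant, tilt)
```

The observer only moves the control until the figure fits. The numbers are derived afterward and play no part in the task itself.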
We belabor this point because it is so often misconstrued. It is not that
the observer has to estimate the attitude of the pictorial surface and of the
gauge figure separately and somehow make them become equal. The task
is simply to produce a fit, and the slant and tilt never enter into this. The
fit can be judged cheerfully within a small fraction of a second, whereas
observers take up to a minute to estimate slant and tilt explicitly—and bit-
terly complain about that task. What is important here is that the method
of fit requires only judgments that are intrinsic to pictorial space. This
important fact is generally misunderstood. For instance, people have
[Figure 12.8 axis labels: Tilt, Slant]
Figure 12.8
Gauge figures used to probe spatial attitude of pictorial surface elements.
Parameters are the slant and tilt angles. The array shows a “natural” planar inter-
face: a polar diagram with slant as radius and tilt as orientation.
studied the judgment of slant and tilt of a gauge figure in isolation and
used the result to “correct” the results of the fit method. It is entirely
unclear that such a procedure is at all reasonable unless one is ready
to take very strong and—to our mind—unlikely prior assumptions for
granted. It seems much more reasonable to take the results of the method
of fit as an “operational definition” of pictorial relief and to take the
work on gauge figures in isolation as (perhaps interesting) essentially un-
related data. In order to relate such initially unrelated data, one stands in
need of some theory. The theory itself is subject to empirical verification
of course.
[Figure 12.9 labels: P (all three panels), T (center), Q (right)]
Figure 12.9
The relief in the neighborhood of a point P in the zeroth-order (leftmost, only
location), first-order (center, tangent plane T added), and second-order (right-
most, “osculating quadric” Q added) approximation.
Orders of Differentiation
When “pictorial relief” is a more or less smooth surface, like the surface
of a polished marble sculpture, then it should be possible to speak sen-
sibly of various “orders of approximation” to the surface (Koenderink
1990). (See figure 12.9.) In the “zeroth order” we merely indicate the loca-
tion of a point on the surface. Thus we know where the surface is, but we
don’t know its attitude, let alone its curvature, and so forth. In the “first-
order” approximation we also know the attitude of the tangent plane at
the point. Then we know both the location and the orientation of the sur-
face, but not its curvature, and so forth. In the “second-order” approxi-
mation we know in addition the “osculating quadric” of the surface. Then
we know the location, attitude, and (local) shape of the surface. The local
shape is defined via the directions of principal curvature and these princi-
pal curvatures themselves, and so forth, for the still higher orders.
When we indicate a point in the picture (two degrees of freedom, e.g.,
Cartesian coordinates), the zeroth order gives another degree of freedom
(the depth, a third Cartesian coordinate), the first order adds two degrees
of freedom (e.g., the slant and tilt angles of the surface), and the second
order adds another three degrees of freedom (the direction of the maxi-
mum curvature and the two principal curvatures, e.g.).
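The count of degrees of freedom can be made concrete with a local expansion of the depth around the chosen point. The notation is an assumption made here: x and y are picture coordinates measured from the point P, z is depth, subscripts denote partial derivatives at P, and σ and τ stand for slant and tilt under parallel projection.

```latex
\[
  z(x, y) \approx
  \underbrace{z_0}_{\text{zeroth order: 1}}
  + \underbrace{z_x\,x + z_y\,y}_{\text{first order: 2}}
  + \underbrace{\tfrac{1}{2}\left(z_{xx}\,x^2 + 2\,z_{xy}\,xy + z_{yy}\,y^2\right)}_{\text{second order: 3}},
  \qquad
  (z_x, z_y) = \tan\sigma\,(\cos\tau, \sin\tau).
\]
```

The three second-order coefficients carry the same information as the direction of maximum curvature together with the two principal curvatures.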
Figure 12.10
“Integrability constraints.” On the left, the “deck of cards” example. Second
from the left, a case where a field of tangent planes can easily be integrated to a
closed ribbon. For the case second from the right, this doesn’t work (the planar
elements at the endpoints are at different depths). The case on the right is differ-
ent: the planar facets at the endpoints have different spatial attitudes. In order to
obtain a “fit,” the field of planar elements has to be quite special.
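The closed-ribbon test in the caption of figure 12.10 can be sketched as a loop sum: accumulate the depth change implied by the planar elements along a closed path and see whether the path returns to its starting depth. The discretization (averaging the gradients at the two ends of each segment) and the function name `loop_depth_gap` are assumptions made for this illustration.

```python
def loop_depth_gap(points, gradients):
    """Depth mismatch after following a closed polygonal path.

    points[i] is an (x, y) location in the picture plane and
    gradients[i] is the (dz/dx, dz/dy) of the planar element there.
    Each segment uses the mean gradient of its two endpoints.
    A result of zero means the elements close up into a ribbon.
    """
    gap = 0.0
    n = len(points)
    for i in range(n):
        j = (i + 1) % n
        dx = points[j][0] - points[i][0]
        dy = points[j][1] - points[i][1]
        px = 0.5 * (gradients[i][0] + gradients[j][0])
        py = 0.5 * (gradients[i][1] + gradients[j][1])
        gap += px * dx + py * dy
    return gap

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# Elements of one plane (z = x + 2y): the ribbon closes.
planar = [(1, 2)] * 4
# Elements that rotate around the loop, gradient (-y, x): it does not.
rotating = [(-y, x) for x, y in square]
gap_planar = loop_depth_gap(square, planar)
gap_rotating = loop_depth_gap(square, rotating)
```

A field of settings can be integrated to a single surface only if such gaps vanish, up to noise, for every closed path.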
[Figure 12.11 panel labels: observers AD and AK]
Figure 12.11
On the left, the stimulus; on the right, two pictorial reliefs from different observers.
Depth runs to the right. Notice that the reliefs are different and differ mainly by a
depth stretch.
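One way to quantify a depth stretch between two reliefs is a straight-line fit of one set of depths on the other, sampled at the same picture locations. The sketch below is an illustration only: the depth values are invented, the offset term is an assumption made here to absorb the arbitrary depth origin, and the function name `depth_stretch` is hypothetical.

```python
def depth_stretch(depth_a, depth_b):
    """Least-squares fit depth_b = stretch * depth_a + offset.

    Both lists hold depths sampled at the same picture locations.
    """
    n = len(depth_a)
    mean_a = sum(depth_a) / n
    mean_b = sum(depth_b) / n
    saa = sum((a - mean_a) ** 2 for a in depth_a)
    sab = sum((a - mean_a) * (b - mean_b) for a, b in zip(depth_a, depth_b))
    stretch = sab / saa
    offset = mean_b - stretch * mean_a
    return stretch, offset

# Invented depths: the second relief is the first one stretched by 2.
relief_a = [0.0, 1.0, 3.0, 2.0, 5.0]
relief_b = [2.0 * z + 0.5 for z in relief_a]
stretch, offset = depth_stretch(relief_a, relief_b)
```

If the reliefs differ mainly by a depth stretch, the scatter around the fitted line should be small compared with the depth range.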
Figure 12.12
On the left, the stimulus, and in the center, a pictorial relief with depth increasing
toward the right. On the right, a picture of the actual scene taken at right angles
with respect to the viewing direction for the stimulus. “Naive veridicality” would
imply identity of the outlines of the relief (center) and picture of the rotated object
(right). Clearly it does not pertain.
Viewing Modes
When a method such as the one discussed yields a pictorial relief as psy-
chophysical response to a photograph (the stimulus), we may study the
stimulus-response relation over a large number of parametric variations
of both the stimulus and the viewing conditions. In this section we con-
sider the variation of viewing conditions.
Figure 12.13
Stimulus and three pictorial reliefs, from left to right, with binocular, monocular,
and synoptical (see figure 12.14) viewing. In these pictures the depth dimension
is vertically upward. Notice the differences in total depth range and qualitative
similarity of shape.
When only the viewing conditions are changed, the photograph is invari-
ant and so are all cues that relate to pictorial space. It is only the cues
that relate to the photograph as an object in optical space that are varied.
In conventional language we may speak of a “cue conflict” situation.
Because the observers are always simultaneously aware of both the opti-
cal space that contains the photograph as a physical object and the pic-
torial space that exists as a completely disparate entity, cue conflict is
perhaps an unfortunate term, as it suggests that different cues are in
conflict with respect to the resolution of a single entity. However that
may be, it is evident that a change of viewing conditions, such as viewing
monocularly versus binocularly, looking straight at the photograph versus
viewing it at an oblique angle, and so forth, affects the cues that relate to
the photograph as a flat object in optical space. They do not affect the cues
that relate to pictorial space.
We find that such changes in viewing conditions have a strong influence
on pictorial relief (see figures 12.13 and 12.14). As perhaps expected, the
relief remains qualitatively the same—after all, the pictorial cues are iden-
tical. What happens is that pictorial space suffers a “depth stretch,” whose
amount is immediately correlated with the viewing condition. We find that
the depth of pictorial relief is expanded by factors up to two when the
observer changes from binocular to monocular vision by closing one eye,
assuming normal binocular stereopsis. Similarly, we find that depth of
[Figure 12.14 plot residue: axes “binocular depth” and “synoptical depth”]
Cue Variation
When we fix viewing conditions, for example, monocular viewing, and
vary the photograph we expect qualitative changes of the pictorial relief.
If such changes do not occur, or are only minor, we may well use conven-
tional terminology and speak of “constancy.”
Figure 12.15
Four stimuli with identical geometry but different chiaroscuro. The illumination is
from top left in the top left figure; in the top right figure it is from the top right; in
the bottom left figure it is from bottom left; and the bottom right figure is illumi-
nated from the bottom right.
[Panels: UL, UR, LL, LR]
Figure 12.16
The pictorial reliefs obtained from the stimuli shown in figure 12.15 for a single
observer. The curves are loci of constant pictorial depth.
[Panels: UL, UR, LL, LR]
Figure 12.17
The residuals for the pictorial reliefs of a single observer (shown in figure 12.16)
obtained with the stimuli shown in figure 12.15. Notice that the distribution is
systematic, rather than random. The pattern of deviations from the mean of the
four reliefs depends systematically upon the direction of illumination of the scene.
space, that has been obtained via the integration of many (a field of) local
samples of surface attitudes. The local settings are the immediate result of
the observer’s efforts; the integration is done only after all settings are
completed, via a mechanical process of numerical integration. Remember
that the settings are obtained in random order, quite independently of
each other. It is an empirical question whether the surface as a global
entity is available to the observer, in the sense of being a causal factor in
the successful completion of some task, or not. A priori this availability
is not necessary to the observer’s success, even though all the necessary
data to determine the global surface (the individual settings) are evidently
available to the observer in the sense that they are produced when the
observer is asked to perform the gauge figure fitting task.
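The integration step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the local settings have been converted to depth gradients p = dz/dx and q = dz/dy on a regular grid, and it integrates them with the trapezoid rule along one fixed path. The function name and the grid layout are hypothetical; the numerical scheme actually used in the study is not specified in this passage.

```python
def integrate_gradients(p, q, step=1.0):
    """Integrate sampled depth gradients into a relief z, with z[0][0] = 0.

    p[i][j] is dz/dx (x runs along columns j), q[i][j] is dz/dy (y runs
    along rows i). Assumed layout: a regular grid with spacing `step`.
    """
    rows, cols = len(p), len(p[0])
    z = [[0.0] * cols for _ in range(rows)]
    # Go down the first column using the y-gradient (trapezoid rule).
    for i in range(1, rows):
        z[i][0] = z[i - 1][0] + 0.5 * (q[i - 1][0] + q[i][0]) * step
    # Then go along every row using the x-gradient (trapezoid rule).
    for i in range(rows):
        for j in range(1, cols):
            z[i][j] = z[i][j - 1] + 0.5 * (p[i][j - 1] + p[i][j]) * step
    return z


# Example: gradients of z = x**2 + 3*y sampled on a 3 x 3 grid.
p = [[2.0 * x for x in range(3)] for _ in range(3)]
q = [[3.0 for _ in range(3)] for _ in range(3)]
relief = integrate_gradients(p, q)
```

A single integration path is only adequate when the gradient samples are mutually consistent. With noisy settings, a least-squares fit over all samples would be the more robust choice.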
In order to establish whether the observer has access to the global sur-
face as an integral entity, we need a task of a global nature. A possibility
that we have explored (Koenderink and van Doorn 1995) is to put two
punctate marks on the photograph and let the observer decide which of
the marks appears nearer in depth. The marks are on the photograph, but
they are seen in pictorial space and they appear to be located on the picto-
rial surface. They don’t appear to hover in front of the pictorial surface,
for instance. Thus the task is accepted as a natural one by all observers.
When we fix one mark and vary the location of the other, we can estimate
the probability that the observer judges the variable mark “nearer” in a
two-alternative forced choice task (“nearer/farther”). If the observer
has access to the integral surface, we expect the result to be almost binary:
the variable mark should be judged nearer with 100% probability wherever
the global surface is nearer than at the fiducial mark, and with 0%
probability wherever it is farther away. Only on the curve on the surface
that is the locus of points all equal in depth to the fiducial mark do we
expect probabilities of intermediate values. The task should then be a
trivial one: simply threshold the integral surface at the depth of the
fiducial mark.
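The thresholding prediction described above can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis code of the study: the relief is taken to be a small grid of depth values, smaller depth is taken to mean nearer, and the function name is hypothetical.

```python
def predict_nearer(depth, fiducial):
    """Predicted probability that each grid point is judged nearer than
    the fiducial point, assuming full access to the global relief.

    Assumed convention: a smaller depth value means nearer to the observer.
    """
    row0, col0 = fiducial
    z0 = depth[row0][col0]

    def probability(z):
        if z < z0:
            return 1.0   # nearer than the fiducial mark
        if z > z0:
            return 0.0   # farther than the fiducial mark
        return 0.5       # on the equal-depth locus: intermediate value

    return [[probability(z) for z in row] for row in depth]


relief = [
    [3.0, 2.0, 3.0],
    [2.0, 1.0, 2.0],
    [3.0, 2.0, 3.0],
]
prediction = predict_nearer(relief, fiducial=(0, 1))
```

The predicted map takes only the values 0 and 1, apart from the equal-depth locus. This is the "almost binary" outcome that the empirical probabilities are compared against.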
But this is not at all what we find (see figures 12.18 and 12.19). We find
that the depth order of any two marks can be established with some
degree of certainty only in cases where the points can be joined by a path
that either goes into depth or toward the observer, but not when such
paths don’t exist and there is a hill or valley that separates the points
(see figure 12.19). It is as if the observers can only check whether the direc-
tion of the gradient at nearby points matches up. Thus the pictorial relief
as a global geometrical entity is not available to the observers, even
Figure 12.18
(Left) A stimulus. (Right) The pictorial relief as measured via gauge figure fits.
though a field of local data that is sufficient to determine the global entity
was obtained psychophysically and is thus in principle available to the
observers.
This is a very important and perhaps somewhat surprising finding.
There is much information (or structure?) in the brain that is not available
to our consciousness. What enters the brain need not enter the mind.
Task Variation
When the task is varied for the same picture and viewing conditions, we
have a case in which the “freedom” of the observer is constrained by the
picture, that is to say, the bouquet of cues. Though we may expect differ-
ent results from different tasks, all these results should be compatible with
the cues. We have found a particularly surprising example (Koenderink
et al. 2000b) of a pair of similar tasks that turn out to yield spectacularly
different results (see below). We use a method of reproduction that requires
the observer to produce a (global) “normal section” of a pictorial object.
A normal section is defined as the intersection of the pictorial object with
a plane that contains the visual direction. In the picture, such a plane can
be represented through a straight line. In the paradigm we simply draw
such a line over the picture and let the observer produce the shape of the
Figure 12.19
(Left) Prediction of the probability for any point to be closer than the fiducial
point (gray hexagonal dot) arrived at by thresholding the pictorial relief obtained
from a gauge figure fitting method. (Center) Prediction on the basis of depth gra-
dient following. (Right) The actual probabilities obtained empirically. Notice that
the result is perhaps closer to the latter than the former prediction, although the
observer can do the global task to some extent.
[Panels: A, B; each marked “adjust”]
Figure 12.20
The method of reproduction of normal sections. The observers have to drag points
in an auxiliary (A or B) orthogonally to the line.
[Two scatter plots, all axes running from -60 to 60; axes labeled A and B (left), A* and B (right)]
Figure 12.21
(Left) A straight scatter plot of the depths at corresponding points obtained with
methods A and B. (Right) Result of a multiple regression including the picture
plane coordinates.
[Panels: A, B, A*]
Figure 12.22
(Left) Pictorial relief (depth in the horizontal direction) obtained with method A.
(Center) Pictorial relief for the same stimulus, obtained with method B. These
reliefs produce the scatter plot shown in figure 12.21, left. (Right) The pictorial
relief shown on the left after an affine transformation obtained via multiple regres-
sion including the picture plane coordinates (figure 12.21, right). Now the reliefs
obtained with the two methods are nearly equal. Notice that the transformation
looks much like a rotation (although it is a depth shear).
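The regression mentioned in the captions of figures 12.21 and 12.22 can be illustrated with a small sketch. It assumes a model of the form z_B ≈ a·z_A + b·x + c·y + d, where (x, y) are picture plane coordinates, a is a depth scaling, and b and c are the depth shear components. The function names, the normal-equation solver, and the sample data are illustrative assumptions, not the authors' code or data.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_affine_depth(points_a, depths_b):
    """Least-squares fit of z_b = a*z_a + b*x + c*y + d.

    points_a: list of (x, y, z_a) from method A; depths_b: matching z_b values.
    Returns the tuple (a, b, c, d).
    """
    rows = [(z, x, y, 1.0) for (x, y, z) in points_a]
    n = 4
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Atb = [sum(r[i] * zb for r, zb in zip(rows, depths_b)) for i in range(n)]
    return tuple(solve(AtA, Atb))


# Synthetic example: method B equals method A after a scaling and a shear.
points_a = [(0, 0, 1), (1, 0, 2), (0, 1, 0), (1, 1, 4), (2, 1, 3), (1, 2, 5)]
depths_b = [1.2 * z + 0.5 * x - 0.25 * y + 2.0 for (x, y, z) in points_a]
a, b, c, d = fit_affine_depth(points_a, depths_b)
```

A plain scatter plot of z_B against z_A ignores the b and c terms, which is why two reliefs related by a depth shear can look poorly correlated until the picture plane coordinates are included as regressors.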
Figure 12.23
On the left the stimulus, a “turtle” by Constantin Brancusi; on the right, the
(rather coarse) triangulation used in the study.
even such a (at first blush well-defined and simple) geometrical prop-
erty as “frontoparallelity” need not be well defined, and indeed may be
extremely volatile with respect to minor task variations, in pictorial space.
Such a result should be shocking when you have been led to believe that
pictorial spaces due to photographs roughly resemble (in a Euclidean
way) the physical scenes in front of the camera at the moment of exposure.
They don’t, at least not in any naive sense.
We have found such transformations even in the case of single task
results. In such cases we sometimes find low correlations between observ-
ers. Scatter plots often reveal high correlations in certain areas of the pic-
ture. Apparently different transformations apply to different parts. An
Figure 12.25
A comparison of pictorial reliefs for two observers (same task, same picture). The
precise nature of the task is not important here; we find similar results with any
task. In the scatter plot of depths (on the right) we detect discrete branches. On
further analysis these correspond to contiguous regions (“parts”) in the picture
(figure at the left).
Does pictorial space have a similar structure to the space we move in? The
latter will be taken as the conventional three-dimensional Euclidian space.
Consider Euclidean space first. Euclidean space is infinitely extended and
looks essentially the same anywhere. Any line can be extended indefi-
nitely in either of its two directions and admits of an aperiodic, linear
metric. In contradistinction, the angular metric of the Euclidean plane is
periodic; the angles are confined to a finite range. In Euclidian space you
can make a full turn and end up in the same orientation. The difference
between the length and angle metrics precludes full duality between points
and planes. Although three distinct points determine a unique plane, three
different planes need not specify a unique point, because two of them may
turn out to be parallel. Although there exist parallel lines and planes, you
have no parallel points in Euclidian space, because any two points can be
joined with a line. Thus Euclidian space is much less symmetric than for
instance projective space. The group of orientation-preserving isometries
(also known as congruences, or motions) of Euclidian space contains rota-
tions and translations. “Euclidean shapes” are invariants under congru-
ences. By definition, two objects have “the same shape” when they can be
brought into superposition through a suitably chosen motion.
It is a priori evident that pictorial space is not at all like Euclidian space.
For instance, the “visual rays” are different in character from lines in fron-
toparallel planes. The length metric cannot be the same for these lines.
You cannot make a full turn in pictorial space, for it will be forever impos-
sible to obtain a dorsal view of a person (as pictorial object) when the
picture is a photograph in a ventral view. Thus the angle metric must
be aperiodic, like the length metric. When we define pictorial lines as
straight, one-dimensional entities that correspond to straight lines in the
picture, then there exist “parallel points,” namely, points that cannot be
connected by any straight line. Such points coincide in the picture and
thus fail to define a line in the picture. They lie on a single visual ray. But
notice that the visual rays fail to count as proper pictorial lines because
they do not correspond to any line in the picture.
Thus pictorial space is quite different from Euclidian space, and it
becomes of much interest to establish its geometry in formal terms. The
next few sections are devoted to this issue.
Hildebrand’s Contribution
Hildebrand has to be considered the first person to establish a formal
framework for pictorial space. He noticed that observers often fail to dis-
tinguish between flat relief and sculpture in the round. Thus he concluded
that arbitrary depth dilations and contractions have to be considered
either congruences or similarities of pictorial space.
This idea has immediate consequences for the formal analysis of
“shapes” and “features” of pictorial objects. These must be invariants
under arbitrary depth scalings. We have studied these invariants previ-
ously: they are the “depth flow” (Hildebrand’s term, it means essentially
the congruence of curves of fastest depth descent on pictorial surfaces), the
near and far points (which are points on the pictorial relief where the local
tangent planes are frontoparallel), the ridges (like “divides” of the flow)
and ruts (the “courses” of the flow) (Koenderink and van Doorn 1995,
1998). Such features as ridges and ruts are indeed important features that
have evidently been deployed by many sculptors. We can easily compute
them for the pictorial reliefs we obtain empirically (see figure 12.26).
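As an illustration of why such features qualify as invariants, the following sketch locates near and far points of a gridded relief and checks that they stay put under a depth scaling. It assumes a relief sampled on a regular grid, with smaller depth meaning nearer, and detects only strict local extrema in the interior. The function name is hypothetical, and ridges and ruts are not computed here.

```python
def frontoparallel_extrema(depth):
    """Return (near_points, far_points) as lists of interior (row, col).

    A near point is a strict local depth minimum over its 8 neighbours,
    a far point a strict local maximum (assumed convention: smaller = nearer).
    """
    rows, cols = len(depth), len(depth[0])
    near, far = [], []
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neighbours = [depth[i + di][j + dj]
                          for di in (-1, 0, 1) for dj in (-1, 0, 1)
                          if (di, dj) != (0, 0)]
            if all(depth[i][j] < n for n in neighbours):
                near.append((i, j))
            elif all(depth[i][j] > n for n in neighbours):
                far.append((i, j))
    return near, far


relief = [
    [5.0, 5.0, 5.0, 5.0, 5.0],
    [5.0, 1.0, 5.0, 9.0, 5.0],
    [5.0, 5.0, 5.0, 5.0, 5.0],
]
# A depth dilation by a positive factor, as in Hildebrand's observation.
stretched = [[2.5 * z for z in row] for row in relief]

near, far = frontoparallel_extrema(relief)
near_stretched, far_stretched = frontoparallel_extrema(stretched)
```

The scaling changes every depth value, yet the locations of the near and far points are unchanged, which is what makes them usable as descriptors of pictorial shape.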
Figure 12.26
The Hildebrand invariants of pictorial relief in a particular case. Shown are near
points (circles with an open center dot), far points (circles with a filled center
dot), and saddle points (circles with a cross), as well as ridges (dark lines) and ruts
(gray lines). Near, far, and saddle points that don’t appear on the boundary are
frontoparallel points.
Figure 12.27
An example of the ambiguities associated with “shape from shading.” The convex
and the saddle-shaped surface patch can be illuminated in such ways (the illumi-
nation direction is indicated by the arrows) that a picture (taken vertically down-
ward) turns out to be identical for both of them (the figure on the right).
Figure 12.28
The patches on the left and center are uniform. The shape from shading inference
is that they are planar, though their spatial attitudes are indeterminate. The patch
on the right reveals some nonuniformity; thus it cannot be planar.
[Figure labels: depth, nl, p, np, P]
Figure 12.29
“Normal lines” (denoted nl, these correspond to the pixels, e.g., p in the picture
plane P) and “normal planes” (denoted np, these are straight lines in the picture
plane P) are invariants of pictorial space.
Figure 12.30
The group of congruences in pictorial space allows the observer to adjust the
“direction of view of the mental eye” ad libitum. Any patch of pictorial relief can
be made to appear frontoparallel.
Figure 12.31
Illustration of pictorial congruences.
ian sense) have effects that are in many respects reminiscent of the trusty
rotations from Euclidean space. They are indeed very similar when small
but appear vastly different when large.
The relation between two points $\{x_1, y_1, z_1\}$ and $\{x_2, y_2, z_2\}$ that is
conserved by the congruences is $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$, that is, simply the
Euclidian distance in the picture plane. One says that pictorial space has
a “degenerate metric” because the metric “forgets” the depth dimension.
When two points coincide in the picture plane they can still be different
because they may differ in depth. For such points the depth difference
$z_1 - z_2$ is conserved, and it can be used as a “special distance.” For a pair
of generic points the depth difference is typically not conserved. Thus we
need two types of distances in pictorial space (Strubecker 1956). One is the
distance in the picture plane: it applies to any pair of points seen in differ-
ent directions. The other is the depth difference: it applies (only) to points
seen in the same direction. Both distances are conserved under arbitrary
pictorial congruences.
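The two distances can be checked numerically. The sketch below assumes a congruence of the following form: a rotation and translation in the picture plane, combined with a depth shear and a depth shift, z' = z + q1·x + q2·y + tz. This parametrization and all names are assumptions made for illustration; they are not quoted from the text.

```python
import math


def congruence(point, phi=0.0, tx=0.0, ty=0.0, q1=0.0, q2=0.0, tz=0.0):
    """Apply an assumed pictorial congruence to a point (x, y, z)."""
    x, y, z = point
    x_new = x * math.cos(phi) - y * math.sin(phi) + tx
    y_new = x * math.sin(phi) + y * math.cos(phi) + ty
    z_new = z + q1 * x + q2 * y + tz   # depth shear plus depth shift
    return (x_new, y_new, z_new)


def distance(p1, p2):
    """Distance in the picture plane; the depth coordinate is ignored."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])


def special_distance(p1, p2):
    """Depth difference, meaningful for points on the same visual ray."""
    return p1[2] - p2[2]


params = dict(phi=0.7, tx=1.0, ty=-2.0, q1=0.5, q2=0.25, tz=3.0)
a, b, c = (0.0, 0.0, 1.0), (3.0, 4.0, 2.0), (0.0, 0.0, 5.0)
a2, b2, c2 = (congruence(p, **params) for p in (a, b, c))

generic_before = distance(a, b)           # a and b lie in different directions
generic_after = distance(a2, b2)
special_before = special_distance(c, a)   # a and c lie on one visual ray
special_after = special_distance(c2, a2)
depth_gap_before = b[2] - a[2]            # not a conserved quantity
depth_gap_after = b2[2] - a2[2]
```

Under these assumptions the picture plane distance of the generic pair and the special distance of the coincident pair are both unchanged, whereas the depth difference of the generic pair is altered by the shear.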
The congruences (by design) conserve a family of parallel directions
(the direction of the z-axis, or “visual direction”). This conservation has
the immediate consequence that you can’t make a full turn in pictorial
space, except about a visual axis. These latter rotations are trivially the
rotations in the picture plane. If we define proper lines as any lines other
than a visual direction, and proper planes as any planes not containing
a visual direction, then any (proper) plane or line can be brought into
correspondence with any other (proper) plane or line through a suitable
congruence.
When you consider a point in the picture plane that does not lie on the
(projection of) the line of intersection of two planes, the point in the pic-
ture plane corresponds to a point on each plane in pictorial space. The dis-
tance between these points is their special distance. When you divide it by
the distance of either point to the line of intersection of the planes, you
obtain an invariant. This invariant relation between intersecting planes
may be taken as the “angle” between the planes (Strubecker 1956). This
angle is aperiodic. There exist proper planes that fail to subtend an angle
in the generic sense because they are parallel. Such planes have a well-
defined depth difference, though, which can be taken as a special angle.
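A short worked derivation may make this definition concrete. It is a sketch under one assumption: proper planes contain no visual direction, so each can be written as a graph over the picture plane with coefficients a_i, b_i, c_i (the notation is introduced here for illustration only).

```latex
% Assumption: proper planes written as graphs over the picture plane.
\begin{align*}
\Pi_1 &: z = a_1 x + b_1 y + c_1, \qquad \Pi_2 : z = a_2 x + b_2 y + c_2,\\
\Delta a &= a_1 - a_2, \quad \Delta b = b_1 - b_2, \quad \Delta c = c_1 - c_2,\\
d_s(x, y) &= \Delta a\, x + \Delta b\, y + \Delta c
  && \text{(special distance above the picture point $(x, y)$)},\\
d(x, y) &= \frac{\lvert \Delta a\, x + \Delta b\, y + \Delta c \rvert}{\sqrt{\Delta a^2 + \Delta b^2}}
  && \text{(distance to the projected line of intersection)},\\
\frac{\lvert d_s \rvert}{d} &= \sqrt{\Delta a^2 + \Delta b^2}
  && \text{(independent of $(x, y)$)}.
\end{align*}
```

The ratio depends only on the difference of the depth gradients of the two planes. It is unbounded, hence aperiodic, and it vanishes when the planes are parallel, in which case the special distance reduces to the constant \Delta c, the "special angle" of the text.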
There exists a full metric duality between planes and points, angles and
distances in pictorial space. Thus we speak of “parallel points” when their
distance, but not their special distance, is zero. This makes the space in
many respects simpler, though in many respects richer, than Euclidian
space. Because of the aperiodic and identical nature of the metrics, we
have two distinct types of similarities in pictorial space. One type ($\mu \neq 0$,
$h = 0$) scales distances and conserves angles; the other type ($\mu = 0$, $h \neq 0$)
Figure 12.32
A cyclical rotation in a normal plane (left). Compare the rotated copies of the
scribble: it doesn’t “turn around,” yet the elements of the scribble change orienta-
tion. On the right a “wagon wheel” in the normal plane. The “spokes” are normal
lines and thus invariant. (The “hub” is at infinity.) The parabolas are “circles” in
this geometry; a rotation shifts them inside themselves, like the rim of a wagon
wheel. The “circles” drawn here are concentric, though they may look “shifted” to
your Euclidian eye.
scales angles and conserves distances. The latter type is exactly the group
identified by Hildebrand. A “general” similarity (neither $\mu = 0$, nor $h = 0$)
is characterized through a pair of magnifications, one for the lengths and
one for the angles (Yaglom 1979).
Notice that the “depth shears” (the q parameters) take over the place of
rotations in the normal planes (figures 12.32 through 12.34). They supple-
ment the Euclidian rotations in the frontoparallel planes. With such a
shear you may turn any proper plane into a frontoparallel attitude. Yet
the angle between any pair of proper planes is conserved by these shears;
they thus appear indeed as “rigid rotations.” Although you can rotate any
proper plane into any attitude, the rotations don’t let you make a full turn.
Thus you can’t turn the pictorial object that corresponds to the picture of
a person in a ventral view such as to obtain a dorsal view. Full turns can
only be made about a visual ray, that is, in frontoparallel planes. Such
rotations appear as rotations of the picture plane in itself and are thus
trivial with respect to pictorial space.
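The behavior of a depth shear as a "rotation" can be illustrated in a single normal plane with coordinates (x, z). The sketch assumes the shear has the form z' = z + q·x and takes the angle between two proper lines to be the difference of their slopes. Both choices, and the function names, are assumptions made for illustration.

```python
def shear(point, q):
    """Depth shear in a normal plane: (x, z) -> (x, z + q * x)."""
    x, z = point
    return (x, z + q * x)


def slope(p1, p2):
    """Slope dz/dx of the proper line through two points with different x."""
    return (p2[1] - p1[1]) / (p2[0] - p1[0])


def pictorial_angle(line_a, line_b):
    """Angle between two proper lines, taken as the difference of slopes."""
    return slope(*line_b) - slope(*line_a)


line_a = ((0.0, 0.0), (2.0, 1.0))   # slope 0.5
line_b = ((0.0, 1.0), (2.0, 4.0))   # slope 1.5
q = -0.5                            # chosen to make line_a frontoparallel

rotated_a = tuple(shear(p, q) for p in line_a)
rotated_b = tuple(shear(p, q) for p in line_b)

angle_before = pictorial_angle(line_a, line_b)
angle_after = pictorial_angle(rotated_a, rotated_b)
slope_a_after = slope(*rotated_a)
```

The shear adds q to every slope, so it can bring any proper line to slope zero while leaving all angles unchanged. No finite q turns a line over or makes it a visual ray, which is the sense in which a full turn is impossible.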
The differential geometry of curves and surfaces in pictorial space is
much simpler (though certainly not less rich) than that in Euclidian space
(Sachs 1990). We can define curvature in such a way that it is conserved
by arbitrary motions in pictorial space. We can again introduce principal
directions of curvature, curvedness and shape index, ridges, and so forth.
Although such features have rather similar properties as their Euclidian
counterparts, they are not the same as these.
[Figure labels: protractor scale from -2 to +2; lines p, b, q]
Figure 12.33
A “protractor” (left) in a normal plane. The shaded sector subtends an angle of
2.25 (simply read the scale). For easier reading you may “turn” the protractor to
bring the bottom line of the sector in a horizontal position (figure 12.34, left,
shows you how). On the right we show the construction of a bisectrix (b bisects the
angle subtended by p and q).
Figure 12.34
On the left, we show the effect of “rotating a protractor.” On the right, a family of
congruent curves. These curves can be transformed into each other via rotations.
Multiple Pictures
The police would never record the scene of a crime via a single photo-
graph. It is well known that multiple photographs, taken from different
vantage points, are necessary to produce a reasonably “complete cover-
age” of a scene. This is what most people have done when they bring home
their holiday snapshots. In the limit you would have many pictures in an
almost continuous sequence, that is to say, a movie sequence. The lower
limit is a mere pair of pictures (Koenderink et al. 1997).
It is easily possible to study the type of information in a pair of pho-
tographs that is not available from a single one. We have devised the
“method of corresponding points” to do exactly that. The paradigm is a
most simple one. We present two pictures next to each other and put a
mark on one of them. An observer is asked to put a mark on the other pic-
ture such as to indicate the point in pictorial space that corresponds to the
first mark (see figures 12.35 and 12.36).
The paradigm is perhaps deceptively simple. Notice that the task requires
the observer to be simultaneously aware of two distinct pictorial
spaces, one for each photograph. The task cannot possibly be done on the
basis of mere (two-dimensional) image structure (see figure 12.36), unless
there are small local landmarks that show up in both pictures. In our
examples, such landmarks are virtually absent, though. The image
structures in the neighborhoods of corresponding points may indeed be
completely different. This task cannot be done by any algorithm available
today, because it requires the construction of two three-dimensional
interpretations, each based on a single image. Successful three-dimensional
interpretation in computer vision is still based on multiple image
correspondences (Faugeras 1993).
Figure 12.35
Three pictures depicting the same object in the same light field, photographed
from the same vantage point but rotated about the vertical between exposures (by
45° either way). Can the reader find the corresponding points for the black mark
in the center figure on the figures left and right?
Figure 12.36
Two local environments of corresponding points. Notice that the relation to the
contour, the contour itself, and the shading patterns are quite different. It is
unlikely that the task would be possible at all if you only had such a local neigh-
borhood. Observers no doubt use global information as context.
Figure 12.37
(Left) Result of a correspondences session. (Center) Profile view of a correspon-
dences result. (Right) Covariance ellipses indicating scatter in correspondences.
Figure 12.38
Profiles of pictorial reliefs obtained from correspondence settings for various rotations between the two images. For the sake of ref-
erence: the rotation due to ocular disparity (vergence angle) would only be a few degrees in this case.
[Scale bar: 10 pixels]
Figure 12.39
Scatter in correspondence settings. Notice that all disparities are actually limited
to the horizontal direction, but the scatter is roughly isotropic. Observers search
for correspondences in the pictorial spaces, both in the horizontal and vertical
directions.
arbitrary pictorial congruence, most likely different for each. Thus the
correspondences must be due to a comparison of local pictorial shape,
where “shape” is defined as invariants under pictorial congruence.
This use of comparison was studied in a setting where we had three pho-
tographs of a scene containing a mannequin (figure 12.40). The difference
between the three photographs was that the mannequin was rotated by
30° about the vertical between exposures. Because the viewing and light-
ing geometries remain unchanged, both the perspective and the chiaroscuro
of the mannequin in the pictures changed. In this study we did both cor-
respondence settings on pairs of pictures and gauge figure settings on the
individual pictures. We can define relief both from the correspondences
and from the gauge figure settings. We find excellent agreement in the sense
that the reliefs are very similar up to a 30° (Euclidian!) rotation about the
vertical if we first apply certain pictorial congruences (non-Euclidian!) to
the individual reliefs (van Doorn, Koenderink, and de Ridder 2001).
(See figure 12.41.) These results show that the observers apparently estab-
lish the correspondences on the basis of the pictorial shapes.
Figure 12.40
For the pictures on the left and right of the center the object was rotated by 30°
(either way) about the vertical. Notice that the light field in the scene was the same;
thus shading and geometry are different for these three stimuli.
face appear to “follow you around the room” (Busey, Brady, and Cutting
1990; Sedgwick 1991; Gombrich 1959). This is typical of pictorial relief,
for the effect fails to occur for a marble bust, for example. The same effect
occurs for people and some objects “pointing out of the picture” (see
figures 12.42 and 12.43).
Here we have a very complicated situation in which at least three dis-
tinct spatial entities play a key role, to wit:
1. The physical space contains both the observer and the picture.
2. The picture frame exists (as a physical object) in physical space, as a
visual object in optical space, and as a pictorial object in pictorial space,
all simultaneously.
3. The person depicted in the picture exists in pictorial space.
Although physical, optical, and pictorial space are distinct, (only) the
picture frame exists in all of them. It tends to appear as a frontoparallel
layer in pictorial space, but “mental eye movements” may freely skew it.
Thus the observer has the freedom to skew the frame in pictorial space
such that it “coincides” with the picture frame in optical space, thereby
removing possibly awkward ambiguities. Notice that this has the effect
of effectively connecting pictorial and optical space in the observer’s con-
sciousness. In such a case the frame would establish a kind of “wormhole”
Figure 12.41
Comparison of pictorial reliefs from attitude samples and correspondences. The
two rows illustrate opposite rotations of 30° about the vertical. In the left column,
we show a straight scatter plot; in the middle, scatter plots corrected for the best
shear. Clearly the shear component is appreciable here. On the right we show the
residuals after taking rotation and shear into account. Apparently the deviations
are systematic, indicating a piecewise transformation.
Figure 12.42
A framed picture hung on a wall.
Figure 12.43
An oblique view of the framed picture hung on a wall (figure 12.42). Notice that
the same features in both pictures appear frontoparallel. This is the same effect as
that of the portrait with “eyes following you.”
Figure 12.44
Pictorial reliefs obtained with frontally (left) and obliquely (right) viewed stimuli.
Due to foreshortening, the right relief is only $\sqrt{2}/2$ the width of the left one.
Figure 12.45
Scatter plot of pictorial depths at corresponding positions for the reliefs shown in
figure 12.44. Although there is a depth contraction (of about 15%), the shear is not
significant.
horizontal levels in the picture. These settings were done in random order.
In figure 12.46 they are seen to line up nicely as ruts and ridges of the
relief. These features are almost identical for frontal and oblique viewing.
The connection of optical and pictorial space is such that (mirabile
dictu) the visual direction may be oblique in physical and optical space
(the wall being seen slanted) and simultaneously in the “straight ahead”
direction in pictorial space (see figure 12.47). Thus the effect of the “eyes
following you around the room” is resolved due to the different angular
metrics in the various spaces (and geometries) involved in the situation. In
retrospect the effect is nothing special.
Figure 12.46
Frontoparallel points on horizontal scan lines for the reliefs shown in figure 12.44.
Notice that the points line up neatly to form “ruts” and “ridges.” Notice also that
the ruts and ridges are virtually identical for these two reliefs.
[Figure labels, both panels: pictorial space, physical space, normal n; symbols ν, θ, α]
Figure 12.47
An observer views a portrait hung on a wall seen frontally (left) and at an oblique
angle (right). The vector n is the normal to the picture plane. In the right-hand pic-
ture the angle b equals 45° (oblique viewing), but the angle o is a right angle! This
is because the line m is the viewing direction, and thus normal to the picture plane.
sets, of the order of 100 to 10,000 (more is better) samples in some geo-
metrical skeleton framework. Methods are needed that address all the var-
ious aspects of geometry.
It is important to relate the results from as many different operational-
izations as possible to each other. This is the only way to get to grips with
the structure of pictorial space in the generic, abstract sense, detached from
any specific method. Any claims to general conclusions on the basis of any
single method (by no means rare in the literature) should be regarded with
the highest suspicion.
We are still a long way from understanding the nature and efficacy
and—especially—the interrelations of the various cues. The conventional
methods, almost invariably based on stimulus or response reduction, are
highly questionable. We need results in realistic settings (many computer-
graphics-generated stimuli are not above suspicion, because it is rare to
see the relevant physics being taken into account). From experience with
computer vision, we know how difficult and vulnerable interpretations on
the basis of monocular cues really are. The existing algorithms are not
at all robust against small variations on the (typically extremely “ideal”
and thus unlikely) prior assumptions. Thus one should be very wary of
confronting the observer with “cues” that are quite unlike (or even a little
different from) those from the observer’s generic biotope.
We are still a long way from a quantitative theoretical understanding.
This is partly due to the extreme scarcity of empirical data. Most of the
classical data is hardly of the type that could successfully be confronted
with quantitative theories of the type that would enable you to predict
the geometrical properties of pictorial space or objects. It is also due
to the fact that the available data are very hard to detach from the—
often implicit—prior assumptions. In our opinion most of the available
knowledge is a hindrance rather than a help in gaining an understanding,
exactly because it is so intricately wrapped up in unfounded prior assump-
tions. What is sorely needed is a fresh approach, not based on the tangle
of unfortunate terminology that is taken for granted by even most
professionals. This is hard, because who doesn’t like to trade essential
uncertainty for the certainty of a word sanctified by silent consensus? For
example, we all nod in understanding when someone mentions “depth,”
yet there may be a dozen (or more) meanings of the term that we do not
know how to relate to one another. Many discussions going on in the lit-
erature are effectively void because one uses the same words for different
entities.
Sheena Rogers
Pictures of all kinds fill our world. They hold records of past events, prom-
ises of the future and now, simulations of an alternative present. Some pic-
tures are still, some move, and some surround us, interacting with us in a
virtual reality. All seem to present some truth about the world. Indeed,
perspective pictures, photographs, and movies seem to be fragments of
reality itself, like insects in amber, caught in some medium merely to ensure
their longevity. Seeing the picture and understanding what is depicted
there seem simple. We do not pause to ponder the picture’s success. Young
children follow a story in a picture book before they can read. Wordless
narratives depict biblical events in hundreds of medieval churches. We are
reassured daily of the truthfulness of pictures by their constant presence in
our lives. The pictures we encounter seem to be as simple to see as the real
world on which they present a window.
Photographic pictures seem to capture reality in an especially immedi-
ate way. The medium is transparent. We see through the picture to the
world beyond it (Walton 1984). It has even been argued that “the photo-
graphic image is the object itself” (Bazin 1967, p. 14, emphasis added). So,
when I look at a photograph I literally see my daughter, or a London bus,
the ocean, a chair. “Viewers of photographs are in perceptual contact with
the world” according to Kendall Walton (1984, emphasis added). On this
view, the photographed world is the world itself and so our perception of
the three-dimensional spaces and objects in both must proceed in the same
way. It is a seductive argument, and many writers on picture perception
have fallen for it (see, for example, Walton 1984).
Yet pictures pose a special challenge to theories of perception. Repre-
sentational pictures are flat, yet they depict a world with solidity and form.
Still pictures present frozen moments, torn from the comfortable sup-
porting arms of the spatial and temporal context of real life and held up
for our inspection, isolated and alone. Even moving pictures, where action
and observation unfold over time, fix the viewer in space, attached to the
camera, a couch potato unable to step up and engage in the life of the
depicted world (see Rogers in press for more on the perception of movies).
Discrepancies between the location of the observer’s eye and the loca-
tion of the original camera or center of projection generate changes in
the virtual layout of the picture or movie (Rogers 1995). These differences
between pictures and real scenes are important. The possibility of realism
under these conditions must be explained. If there is meaning in pictures,
it does not come easily.
Unquestionably, much of the credit for the success of pictures in carry-
ing meaning must go to the artists. The nature of their contribution
should be examined carefully, however. The devices and designs that pic-
ture makers employ to show a three-dimensional world in frozen frames
have suggested to some that pictures communicate by means of a lan-
guage, an agreed-upon system of signs (e.g., Kepes 1944). Pictorial signs
may “short-circuit” their route to meaning through their visual similar-
ity to the objects depicted (see Metz 1974), but meaning is nevertheless
obtained through a process of decoding. On this view, the image is never
equivalent to the object itself. Realism is an illusion. Although a picture
may tell about reality, it can never be a surrogate for it.
I have presented two extreme views and hinted at decades of argu-
ment about the fundamental nature of pictorial representation. There are,
of course, many variants of these views and many subtleties I have not
expressed. Instead of pursuing these arguments and weighing their relative
merits, I propose that the debate be shifted to empirical grounds. What
exactly do pictures and natural scenes have in common, and can we obtain
experimental evidence of similarities and differences in their perception?
Drawing on the ecological approach to the psychology of perception, I
have argued elsewhere that the perspective structure of pictures can pre-
sent geometrical structures to the eye of the observer that, under certain
constraints, are identical to those available in nature (Rogers 2000). I have
selected one of these structures, the horizon ratio, for empirical investiga-
tion, and will describe it here as a test case in our search for truth and
meaning in pictorial space. The results of the experiments I will describe
here show that although the same geometric structure is available, it is not
always perceived in the same way. Under certain conditions, observers
responded to a pictorial scene exactly as they did to the real scene, and
the picture acted as a reasonable surrogate for reality. The availability of
303 Truth and Meaning in Pictorial Space
Figure 13.1
The horizon ratio specifies the size of objects standing on the ground. Objects A
and B each have a horizon ratio of two and are twice eye-height. Object C has a
horizon ratio of four and is four times eye-height. Object C is twice as tall as
objects A and B.
objects standing on the ground around you. The horizon ratio relates your
eye-height to the height of these objects. It specifies precisely the sizes of
the objects relative to your eye-height. In figure 13.1, for example, objects
A and B are each twice eye-height, and object C is four times eye-height.
The horizon ratio is given as the relation between two visual angles,
though it can also be described as a relation of two extents on the pic-
ture plane (Sedgwick 1973, 1980, 1986). In figure 13.2, the angle O sub-
tended by the whole object o from base to top is divided by the angle H
subtended by the distance from the base of the object to the horizon h. The
general version of the ratio is (tan Y + tan H)/tan H, but the angles O and
H themselves approximate the tangents of the angles when they are small,
so o/h ≈ O/H (Sedgwick 1973, 1986).
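The two forms of the ratio can be compared numerically. The following is a minimal Python sketch, not taken from the chapter. It assumes that Y is the angle from the horizon up to the top of the object (so that O = Y + H), which the text here does not state; the function names and the sample angles are illustrative only.

```python
import math

def horizon_ratio(y_deg, h_deg):
    """Exact horizon ratio (tan Y + tan H) / tan H.

    y_deg: angle from the horizon up to the object's top (assumed meaning of Y).
    h_deg: angle from the object's base up to the horizon (H in the text).
    """
    y = math.radians(y_deg)
    h = math.radians(h_deg)
    return (math.tan(y) + math.tan(h)) / math.tan(h)

def horizon_ratio_small_angle(y_deg, h_deg):
    """Small-angle approximation O/H, taking O = Y + H (an assumption)."""
    return (y_deg + h_deg) / h_deg

def object_height(ratio, eye_height):
    """Object height expressed in the same unit as eye-height."""
    return ratio * eye_height

# Hypothetical angles: the top is 6 degrees above the horizon,
# the base is 2 degrees below it.
exact = horizon_ratio(6.0, 2.0)
approx = horizon_ratio_small_angle(6.0, 2.0)
```

With these small angles the exact and approximate ratios are both close to four, and `object_height(approx, eye_height)` would give the height of the object in whatever unit eye-height is known.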
Objects A and B in figure 13.1 each have a horizon ratio of two
(O/H = 2). If eye-height is known in some metric, such as feet and inches,
then the absolute sizes of objects in the scene can be labeled in that met-
ric. In addition to information about absolute size based on eye-height,
the sizes of objects in the scene relative to each other are also given by the
Figure 13.2
The horizon relations are given in terms of visual angle. The horizon ratio is
(tan Y + tan H)/tan H. The horizon distance relation is d = h ctn H.
horizon ratio. Object C, for example, has a horizon ratio of four and so
it is twice the height of objects A and B. (In the case of both absolute and
relative size, the constraint that objects are standing on the ground plane
must be met. The sizes of objects floating in space, like the classical stim-
uli of early psychology experiments, are not specified by horizon-based
information, e.g., Holway and Boring 1941.)
For our purposes, the horizon ratio is especially interesting. The struc-
ture is present both in real scenes and in all perspective pictures: paintings,
photographs, television, movies, and virtual reality representations. The
optical horizon is always implied in the perspective structure of the pic-
ture, and it may even be visible as a physical line or edge. Indeed, in the
construction of a perspective picture, the horizon may be the first line laid
down on the paper, canvas, or wall. A short stretch of it is often seen in
the background of Renaissance paintings. The horizon defines the limit to
the ground, and in so doing, creates the skeletal structure of a 3-D world.
Although, to my knowledge, horizon ratios have never been used to lay
out the objects in a perspective picture, they could be. Perspective manu-
als usually recommend the construction of vanishing points for parallel
lines lying on the ground plane as the next step (e.g., Ware 1900). Horizon
ratios fall naturally out of perspective structure, however. This is most
easily seen in a picture of a row of objects of the same size receding into
the distance. A perspective line connecting the tops of the objects will meet
a perspective line connecting their bases at a vanishing point on the hori-
zon. The objects, being of the same real size, will be intersected in the same
proportion by the horizon (see figure 13.3).
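That equal-sized objects are cut by the horizon in the same proportion can be checked with a simple projection. The sketch below is illustrative only and assumes an idealized pinhole camera looking horizontally over flat ground, with the horizon at picture height zero; the function names, the focal length, and the pole and eye heights are hypothetical.

```python
def project_pole(pole_height, distance, eye_height, focal_length=1.0):
    """Return (base_y, top_y) on the picture plane, with the horizon at y = 0.

    Assumes a pinhole camera at eye_height above flat ground, looking
    horizontally, and a vertical pole standing on the ground.
    """
    base_y = focal_length * (0.0 - eye_height) / distance
    top_y = focal_length * (pole_height - eye_height) / distance
    return base_y, top_y

def picture_horizon_ratio(base_y, top_y):
    """Whole pictured extent divided by the extent from base to horizon."""
    return (top_y - base_y) / (0.0 - base_y)

# Three poles of the same height receding into the distance.
ratios = [
    picture_horizon_ratio(*project_pole(3.2, d, 1.6))
    for d in (10.0, 20.0, 40.0)
]
```

Each entry of `ratios` is the pole height divided by the eye-height (here two), whatever the distance, although the pictured extents themselves shrink as the poles recede.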
Figure 13.3
The horizon ratio is a component of perspective structure.
question of truth and meaning in pictorial space and to the possibility that
a picture might act as a surrogate for reality.
horizon (or eye-height) relations, and that the information is equally effec-
tive in pictures and in real scenes. The perception of relative size appears
to be achieved in the same way in pictures and real scenes.
Another visual angle relation based on the horizon specifies distance along
the ground to the base of the object by relating the angle of gaze elevation
to eye-height. In figure 13.2, this distance, d, is equal to h ctn H, where h
is the height of the point of observation and H is the visual angle sub-
tended by the extent from the base of the object to the horizon (Sedgwick
1973, 1986). (H is the angle through which one’s gaze would travel if one
were to look at the base of the object and then raise one’s eyes to the hori-
zon.) The angle H becomes smaller the farther the object is from the
observer and the closer it is to the horizon (and regardless of the size of
the object). Observers could estimate the relative distances of two objects
by comparing the angle H for each object. (When the distances are large
relative to eye-height, as they were in my displays, H is small and its
cotangent is approximately the reciprocal of the angle, so the ratio of the
two angles approximates the inverse ratio of the two distances.)
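The horizon distance relation can be sketched in the same way. The Python below is a hedged illustration rather than the procedure used in the studies; the function names and the sample values are assumptions.

```python
import math

def ground_distance(eye_height, h_deg):
    """Distance along the ground, d = h ctn H, in the unit of eye_height."""
    return eye_height / math.tan(math.radians(h_deg))

def relative_distance(h_deg_a, h_deg_b):
    """Distance of object A divided by distance of object B.

    Eye-height cancels, so only the two angles H are needed.
    """
    return math.tan(math.radians(h_deg_b)) / math.tan(math.radians(h_deg_a))

# Hypothetical angles: A's base is 1 degree below the horizon, B's is 2 degrees.
ratio_exact = relative_distance(1.0, 2.0)
ratio_from_angles = 2.0 / 1.0  # small-angle shortcut: inverse ratio of the angles
```

For angles this small the exact ratio and the simple ratio of the angles agree closely, which is why comparing the angles themselves would suffice for distant objects.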
We asked observers to make estimates of the relative distance of many
pairs of poles using the magnitude estimation procedure described. In
desktop virtual reality (Rogers and Watson 1996a), in photographs, and
in real outdoor scenes (Rogers and Pollard 1998) estimates of relative dis-
tance were quite accurate. Estimates were between 95% and 100% of the
relative distance predicted by the horizon distance relation.
I conclude that in addition to information for relative size, the optic
array from both pictures and real scenes contains information for relative
distance, that this information is also based on horizon (or eye-height)
relations, and that the information is equally effective in pictures and in real
scenes. As we found for the perception of relative size, the perception of
relative distance appears to be achieved in the same way in pictures as it
is in real scenes.
Figure 13.4
When we relate our own standing eye-height to the horizon ratio in a photograph
with an unusual camera location, distortions of perceived size occur.
no consequent optical flow within the array from the picture. Instead, it
becomes immediately apparent that the pictorial horizon is not coincid-
ent with one’s own optical horizon. When we do not know the size of the
eye-height unit in the picture, the scale of the space is unknown. Artists
can take advantage of this ambiguity, of course. René Magritte, in The
Listening Room (1952), fills a room with an apple. The horizon is clearly
halfway up the apple and halfway up the room, so both have a horizon
ratio of two. We don’t know if we see a large apple in a normal room
(from a standing eye-height) or if we see a normal apple in a dollhouse
room (while lying on the ground). The pleasure of the picture is in the
perceptual uncertainty (see figure 13.5 and plate 4).
The foregoing options are not exclusive. Any one may apply in some
particular circumstance, and it may be possible to create particular per-
ceptual effects by forcing perceivers to adopt one option over another.
Fortunately, the question of how perceivers understand the scale of picto-
rial space is open to empirical testing. We can assess the similarity between
perception of absolute size and distance in real scenes and in photo-
graphs, and we can explore the perceptual options described through care-
Figure 13.5
The pleasure of René Magritte’s The Listening Room/La chambre d’écoute, 1952, is
in the uncertainty of its reality through the ambiguity of perceived size. Oil on
canvas, 45 cm × 54.7 cm, private collection, © A.D.A.P.G. Paris. © 2002 C. Hersovici,
Brussels / Artists Rights Society (ARS), New York. See plate 4 for color version.
Observers in one of the open field and pictorial space studies described
earlier were also asked to estimate the absolute sizes and distances, in feet,
of the poles in the field (Rogers and Pollard 1998). In the real outdoor
scenes, absolute size estimates of 4 to 12 ft. poles, positioned at distances
from 100 to 300 ft., were extremely accurate (close to 100% of the actual
sizes). Observers viewed photographs of the scene on a large-screen com-
puter display with the horizon at the observer’s standing eye-height, and
all visual angles matched to the real scene. Under these conditions esti-
mates of absolute size were near perfect at distances below 200 ft., and
slightly overestimated at distances beyond this.
Information for absolute size seems to have been available both in pic-
tures and in real scenes for the observers in our study. The experience of
absolute size in pictures is always perceived this way, as we will see after a
summary of our findings for perceived absolute distance.
so picture perception is even less truthful than perception of the real scene.
This is an important difference between pictures and life, one that should
be expected in all ordinary picture-viewing situations.
In the analysis presented here, size and distance are specified independ-
ently by related but distinct horizon relations. Traditional accounts of
size perception, on the other hand, have always held that perceived size
is dependent on perceived distance (the so-called size-distance invariance
hypothesis, e.g., Ittleson 1960). Size, it is claimed, cannot be perceived
directly but is instead obtained indirectly by scaling the single visual angle
subtended by the object with its perceived distance. Thus the causal chain
of perception builds one percept upon another (called percept-percept
coupling; see Epstein 1982). The results of the experiments I have described
indicate that size and distance are, indeed, perceived independently, in line
with the hypothesis that perception of these properties is based on horizon
relations. This finding supports the idea that perception of size is achieved
directly (see Epstein and Rogers, in press, for discussion of other appar-
ently coupled percepts). We found, for example, that despite tremendous
compression in perceived pictorial distance, perceived size remained sta-
ble and was not affected by this distortion. In addition, in the analysis of
individual observers’ data, the accuracy of their size estimates was only
weakly related, if at all, to the accuracy of their distance estimates.
The evidence presented so far strongly suggests that perception of size and
distance is achieved through detection of horizon relations (with the pos-
sible exception of absolute distance). These relations are available in pic-
tures, and the experiments demonstrate that the perception of pictorial
space is governed by the information they provide. Pictures do seem able
to reveal the truth about the spaces they depict (again, with the exception
of absolute distance). We saw, however, that in at least one important
respect, pictures are not reality: the pictorial horizon is not yoked to the
observer’s optical horizon, and therein resides the possibility for pictures
to lie. The height of the point of observation for the picture is not known,
and observers must use one of the solutions described earlier to make
sense of the pictured space. We have seen some tendency among observers
to use their own eye-height to understand the scale of a pictured scene. If
the location of the original station point (or camera location) for the pic-
tured scene was similar to the picture viewer’s current vantage point, no
harm will be done and the picture will be a truthful depiction of the scene.
But what happens when this constraint is not met? Do we continue to see
ourselves “in” the pictured scene—to see the pictorial space as coextensive
with our own? Can we resist the pull to scale the pictorial world in eye-
height units? Are there some conditions under which seeing the picture is
like seeing the world, and some where it is not?
The pictures used in my experiments so far have all been presented on
a large-screen computer display, with the observers at the station point,
their eye level and optical horizon coincident with the pictorial horizon.
These conditions may have enhanced the observers’ sense of immersion in
the scene through information that the pictured ground plane was co-
extensive with the real one. Observing the photographs this way was like
looking through a window to another place but one still very much a part
of the world we are in, whose ground continued outside the picture to pass
under our own feet. But what of a picture made on paper and hanging on
the wall? And what of a picture made from an unusual spectator point? Is
the perception of these pictures like the perception of the world?
In another experiment, a student and I (Rogers and Fichandler 2000)
manipulated this connection between the observer and the pictorial world.
We compared perception of size and distance in a real scene, with a paper
photograph and with a trompe l’oeil presentation of the photograph in
a peephole viewing box. In addition, we introduced two manipulations
designed to reveal the extent to which observers see themselves to be con-
nected to the pictorial space and so perceive the pictorial space as if it were
reality.
We set up a field of poles as before, on a field next to the ocean with an
unobstructed horizon line. We photographed this scene from four differ-
ent camera elevations: the photographer lay on the ground, sat on a box,
stood on the ground, and stood elevated on the box. We prepared two sets
of these four views as 8˝ × 10˝ photographs. One set we mounted on the
wall of the laboratory, the other we mounted in a viewing box with a peep-
hole positioned so that the visual angles of the picture matched those of
the real scene. The box was then hung on a structure erected outside
in the actual ocean-side field. The peephole could be positioned at each
of the original observation points for the photographs. Thus, when the
pictures were mounted at the correct elevation, they fit smoothly into the
ambient optic array from the real scene. The observer was anchored at the
station point, and the viewing box minimized binocular and motion-based
information for the picture’s flat surface, creating a trompe l’oeil effect.
Under these conditions, the picture came as close to a reality surrogate as
it could. Removing the photograph and flipping open the back wall of the
viewing box gave us the real scene comparison condition, which was
viewed with one eye through the peephole.
The four camera elevations generated four different sets of horizon
ratios for the same scene. If observers use a default value of eye-height,
perhaps their own standing eye-height, as suggested earlier in option 2, the
different horizon ratios in the picture should lead to differences in the per-
ceived sizes of the poles. Objects photographed from low viewpoints will
have larger horizon ratios and should appear larger than objects pho-
tographed from higher viewpoints.
We examined the possibility that as perceivers we relate our current
eye-height, which varies as we move about the world, to the horizon rela-
tions (option 1), rather than a fixed, default value (option 2). Observers
viewed the pictures from each of the four positions, lying down, sitting,
standing, and standing on a box. If they were to ignore their current
eye-height and use the default value, their viewing position should have
no effect on perceived absolute size. If, however, the observer’s current
eye-height is the unit by which horizon relations are understood, then
for a given horizon ratio, changes in eye-height elevation should lead to
changes in perceived absolute size. Lower observer viewpoints should lead
to the perception of shorter objects, and higher viewpoints should lead to
the perception of taller objects.
Observers viewed the four pictures from the one congruent viewing
position for each picture (e.g., sitting photographer with sitting observer)
and from each of the three incongruent ones (in this example, sitting
photographer with lying, standing, and raised observer). If picture view-
ers relate their own current eye-height to the horizon ratio (option 1),
absolute size should be accurately perceived in all congruent conditions,
but distortions of perceived size should occur in the incongruent condi-
tions. For example, the raised observer viewing a picture photographed
from a sitting position should overestimate the size of the object.
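The two sets of predictions can be made concrete with a small calculation. The sketch below is only an illustration of the reasoning above: the eye-height values are hypothetical, not the ones measured in the study, and the function names are invented for the example. It treats perceived size as the pictured horizon ratio multiplied by whatever eye-height the viewer takes as the unit.

```python
# Hypothetical eye-heights in feet for the four positions (not from the study).
EYE_HEIGHTS_FT = {"lying": 0.5, "sitting": 3.0, "standing": 5.5, "raised": 7.5}

def predicted_size(true_size, camera_height, assumed_eye_height):
    """Pictured horizon ratio scaled by the eye-height the viewer assumes."""
    ratio = true_size / camera_height  # horizon ratio fixed by the camera
    return ratio * assumed_eye_height

def option1(true_size, camera, observer):
    """Option 1: the viewer's current eye-height is the unit."""
    return predicted_size(true_size, EYE_HEIGHTS_FT[camera], EYE_HEIGHTS_FT[observer])

def option2(true_size, camera):
    """Option 2: a default standing eye-height is the unit."""
    return predicted_size(true_size, EYE_HEIGHTS_FT[camera], EYE_HEIGHTS_FT["standing"])

congruent = option1(8.0, "sitting", "sitting")   # observer matches the camera
incongruent = option1(8.0, "sitting", "raised")  # raised observer, sitting camera
```

Under option 1 the congruent case returns the true pole size and the raised observer overestimates it; under option 2 the prediction depends only on the camera elevation, with low cameras yielding larger predicted sizes.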
The most compelling finding of this study was a difference between the
perception of the trompe l’oeil presentation and the paper photographs.
Observers did not relate their current eye-height to the absolute size of the
poles in the paper photographs. Estimates of absolute size were very close
to the actual pole sizes and did not vary with viewing position, suggesting
that a default eye-height value was used (option 2). In the trompe l’oeil
condition, however, estimates of absolute size did vary with viewing posi-
tion. When in the lower viewing positions, observers perceived the poles
as slightly shorter than they did when they were at higher viewing posi-
tions. Under- and overestimates of pole size were not as large as would be
predicted by entering the actual value of eye-height into the horizon ratio,
but the difference in the pattern of responding for these two conditions
was clear, and it was statistically significant. I take this difference between
perception of the paper photographs and the trompe l’oeil presentations
to indicate that the observer’s current eye-height can influence perceived
size in pictures (option 1) but only when the observers are forced to treat
the picture as if it were part of their reality. In effect, the trompe l’oeil con-
dition prevents observers from noticing that the pictorial horizon is not in
fact their own. In other words, picture viewers do not necessarily perceive
pictorial space to be coextensive with the space they occupy in the real
world, although under special conditions they can. If they do not, then
there is no reason to scale the pictorial space in units determined by their
actions in the real world.
That is not to say that they do not bring the same psychological
process to bear on picture perception that they use in everyday
perception of the world. Instead of using the changing eye-height value
generated during movement, picture viewers could still use a default value
of their own standing eye-height (option 2) to extract meaning from hori-
zon relations. Indeed, there is evidence that this is what picture viewers
were doing in our study. Raising the camera made objects appear smaller,
and lowering it made objects appear larger, as option 2 predicts. Although
current actions and body position were not relevant to picture perception,
observers did in some sense see themselves in the pictorial space and see
the pictorial space as part of their world. Once more the observers’ own
body size mattered. Our taller participants saw the poles as taller than
they were, and our shorter participants saw them as shorter than they
were. These two effects occurred in both the paper photographs and in the
trompe l’oeil presentations, implying that the strategy of scaling horizon
ratios with our normal standing eye-height is a general one and not one
specific to picture perception.
I conclude from this last study that picture perception is very like the
perception of real scenes, though there are differences. Horizon relations
are the basis of perceived absolute size, relative size, and relative distance
in real scenes and in pictures. (Absolute distance is inaccurately perceived
in real scenes and worse in pictures. It is possible that this scene property
is not governed by available horizon-based information.) Horizon rela-
tions can be exploited extrinsically, that is, without reference to our own
bodies, and this seems to be what we do when we perceive relative size and
distance in both real scenes and pictures. Horizon relations can also be
related, intrinsically, to our bodies. We can use our own eye-height as
the unit for understanding absolute size when it is visible in the optic
array. The evidence of this last study is that at least in picture perception,
eye-height is understood as our usual standing eye-height. We are not
tempted to use a changing eye-height value that results from interacting
with the world (kneeling, climbing, etc.) with ordinary pictures. I would
argue that this is not a “decision” to adopt a picture-specific perceptual
strategy but a behavior guided by optical information. In ordinary picture
viewing there is optical evidence that the pictorial horizon is not linked to
our eye-height. Horizon relations do not change as we approach the pic-
ture and adopt one of the four viewing positions. When they do, as they
would in a real scene or in virtual reality, they reveal our place in the
world, and our momentary eye-height may well become the unit that
scales the dimensions of the pictorial space. When there is no information
that the pictorial horizon is not linked to our bodies (in the trompe l’oeil
condition), we proceed as if it were and relate to the picture as if it were a
real scene. In this case, perceptual judgments are influenced by the posi-
tion of our body and our momentary eye-height, even in pictures.
try to convince us that we do. If we are allowed to move our heads, the
optical flow patterns reveal the true nature of the picture as a marked sur-
face. The illusory world will deform and rotate as we move (see Rogers
1995). We cannot always trust the picture to tell us the truth about dis-
tance, or about the sizes of objects depicted.
How, then, are pictures so successful? Why are André Bazin, Kendall
Walton, and others convinced that pictures put them in contact with real-
ity? I claim that it is the very success of most pictures that blinds us to their
origins. Hand your camera to a three-year-old and show her the shutter
release. The randomly frozen optical moments that result in the images
will convince you that meaning is not so easily captured. The absence of
spatial and temporal context, and the absence of artistic control, can strip
meaning from images. To present the illusion of a window on reality, pic-
tures must first capture available informative structures and then support
the meaning in those structures through the satisfaction of the constraints
that underwrite them. Specifically, the horizon relations provide meaning
only when the objects in the scene are on the ground. If the scene is in a
picture, additional constraints apply. For the original horizon relations to
be preserved, the observer must view the picture from within a horizontal
plane extending out of the picture from the horizon. If these horizon rela-
tions are to tell the truth about the dimensions of the scene, the camera or
center of projection must be at a reasonable standing eye-height. Too high
or too low and the picture will lie. Further, if there is any risk that the
picture viewers will not realize they are looking at a picture, they must
observe it from a location that matches their eye-height to the height of
the camera. Too high or too low and the picture will lie.
The potential for distortions and ambiguities in pictorial space has
given fright to many who have pursued a geometrical account of picture
perception like this one (for expanded treatment of this problem, see
Kubovy 1986; Rogers 1995; and Rogers 1997). This potential is handled
here, not by the hypothesis of a shared set of signs that code for distance,
size, and 3-D shape in a picture, or by a cognitive juggling act on the part
of the beholder that fixes any problems, but by the recognition of the
important role of the artist in making pictures meaningful. The careful
deployment of conventional practices (not signs) on the part of the pic-
ture maker can ensure that distortion and ambiguity are minimized, that
the loss of temporal context is compensated for, and that the constraints
under which certain geometrical structures provide meaning are satisfied.
Figure 14.1
Piazza with columns.
Figure 14.2
Cube bars from discontinuous lines.
Figure 14.3
Dotted lines can outline a cube, and a cube’s bars can be shown by solid lines,
dotted strips, and hatched lines.
324 John M. Kennedy, Igor Juricevic, and Juan Bai
and vertical arms come together, the frontal bar can seem to be flanked by
a pair of contours, running across empty space. The pair is horizontal
when the horizontal bar is in front and vertical when the bars reverse in
perception.
Figure 14.3 contains a dotted-line picture of a cube. It also reverses in
depth. The dotted lines function as outlines, suggesting continuous bars,
corners, and obliques oriented in depth. The dots belong together in lines
even though no continuous contour joins them on the page, or in our per-
cept (Arnheim 1974). The dot at the center of the “+” where four lines
come together can be seen as belonging to the foreground bar.
Figure 14.3 also includes a picture of a cube with dotted strips forming
lines, a black line picture of a cube, and a hatched-line picture. Despite
their varied constituents, the cubes appear to have the same proportions.
The obliques recede in depth and are all foreshortened equally.
The dotted strips forming “+” shapes can be grouped as horizontal
bars in front of vertical bars, or the reverse. The central dots of each “+”
belong with the bar in front, in these percepts. They belong to the frontal
bar even with no continuous flanking contour in the percept, much as lines
of dots can group without needing continuous contours.
The black-line cube can seem to have contours crossing the empty space
at the center of the “+” intersections, much as in figure 14.2’s white cube.
In the hatched figure, some hatches are omitted to distinguish the front
and back bars. The hatches of the front bars are grouped together as a
long continuous row. The line of hatch marks has a discontinuous border
on the page, but the bars being depicted, we have the impression, have
continuous edges.
Figure 14.4
Copy of a raised-line drawing by Gaia. (From Kennedy, in press, with permission
from Prion Press)
a continuous edge, much as sighted people see small sections of line such
as hatch marks and group them as if they had continuous borders.
Gaia’s drawing uses lines for borders of tangible surfaces. The surfaces
of her house are flat, and so their borders are the occluding edges of flat
surfaces. Her hills have brows, that is, occluding boundaries of rounded
surfaces. All surfaces are either flat or curved, so Gaia has deployed lines
to show the borders of the only kinds of surfaces there are.
The skillful use of line in Gaia’s picture invites us to explore the pos-
sible use of outline in pictures. Lines have affinities for tangible borders, in
touch as well as vision, Gaia’s picture suggests (Kennedy 1993). What
kinds of borders can we perceive, besides occluding surfaces, and which
suit outline, and why?
Perceptual borders arise from features in the environment, like Gaia’s
surface edges. The inverse projection problem is to decide which feature.
In the light coming to our eye there may be a “contrast” border between
dark and light. But what was its origin? A flat surface’s occluding edge,
such as the roofline of Gaia’s house? Or a rounded surface’s occluding
boundary like the brow of a hill? Or what?
Perception takes in a limited sample of the light projected to it, and,
in inverse projection, gives us awareness of its possible cause. In this
respect, perception is like the person who notes a few blackbirds in a field
14.2 INDUCTION
Alas, poor observer, finite data fit an infinite number of curves. Any finite
perceptible pattern can be projected by an infinite number of origins. So
what are we to do?
One drastic error in the study of perception was to argue vision only has
brightness and color as its core because we only have luminance and spec-
tral values to work with. In fact there is nothing to stop us having impres-
sions of any kind whatsoever from stimulation X. We see purple, which is
not in the spectrum, when certain combinations of spectral values come to
us. We see continuity where there is only a group of separate marks. We
get spheres wrong in inverse projection. So it may not be entirely surpris-
ing that borders have affinities with some features of an environment but
not with all.
One cannot predict a priori which elements allow which percepts. The
problem of induction at borders has to be given an empirical solution, for
327 Line and Borders of Surfaces
Figure 14.5
Kinds of borders and pictures using them.
The light that comes to our eyes has luminance borders, between high
intensity and low intensity. Likewise, between what we informally call
two “colors,” it has what we may call a spectral border. (The luminance
borders are generally more effective in perception.) Figure 14.5 classifies
borders into luminance and spectral, moving and static, monocular and
binocular. It gives examples of pictures that fit these types.
Luminance and spectral borders can be arranged to form dots, lines,
and textures. The elements (dots or lines or other units) in a texture are
arranged with some regularity over a region. Vision groups the elements
in a region as a unit (a gradient like that in figure 14.1, or a simple form
or gestalt like the lines or squares in figure 14.2), despite the units’ being
spatially separate. Trees on a snow-covered hillside define its continuous
slope in vision even if each tree is quite separate. Dots, lines, and contin-
uous borders are evident in monocular vision, with our eye in one vantage
point and a static incoming array.
Motion in an array can also provide a monocular border, much like the
border between two regions. If we were standing looking at two distant
Celtic hills covered by yellow gorse bushes and purple heather, and the
hills partially overlapped each other from our vantage point, they might
look like a single slope. But if we move to one side, we will reveal new
bushes on the more distant hill. The occluding brow of the foreground hill
would become evident. The visible brow would be defined by new heather
bars, when we look at a picture like figure 14.2, whereas motion and stereo
vision do not (see also Sedgwick, chapter 3 in this volume): they record the
2-D flatness of the page. The bars look complete, but we also see gaps
between marks. Because each kind of border in vision can be independent
of the others, vision accepts two kinds of contradictions readily—3-D ver-
sus 2-D and continuous versus full of gaps.
If borders carried distinctive hallmarks, as shadows do, then perception
would find them unambiguous. But in fact many apparent borders use the
same inputs. Does a luminance border always coincide with the figure-
ground edge of a foreground object? The Alpine cowhide tells us no
(Kennedy 1974, 1993; Peterson 1999), a lesson we should now explore.
Figure 14.6
The outline depicts occlusions, corners, cracks, and wires.
Information or Affinity
What normally removes ambiguity at a border? One factor is what lies
between borders. The region between borders is textured. Indeed, the sur-
faces we find in our environs are generally textured. That is, color, lumi-
nance, and texture boundaries usually coincide. Recognizing the presence
of textured surfaces may help us understand the function of perceiving
continuity where the elements are separate. Surfaces have continuous
reflectance and continuous borders but separate texture elements.
Helpfully, the separate texture units on a continuous surface are usu-
ally uniform. On a breezy day, lake surfaces have waves (texture units)
very different from the rocky or grassy margins of the lake. Little wonder
vision has the useful ability to see a continuous region with a continuous
border when the input is separate texture units.
The texture gradients (figure 14.1) in a region help us see a single
surface stretching away from our vantage point. Perspective governs the
change in angle subtended by each unit as its distance increases (Sedgwick
Figure 14.7
The heights and depths of ghostly columns are shown by the scene’s meridians.
1986; Kubovy 1986). Vision detects how the variations fit with perspec-
tive. Also, it can readily detect the number of texture units between us and
the foot of an object until foreshortening reaches limits set by our acuity.
The result is good perception of depth up to the distance where foreshort-
ening crushes texture units to the degree where they are indistinct (twenty
or so units in a flat piazza such as that in figure 14.1).
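As a rough illustration of how perspective compresses texture units with distance, the sketch below computes the visual angle subtended by the depth of each tile on a flat ground plane. The eye height, tile size, and function name are assumptions chosen for illustration; the chapter gives no such values.

```python
import math


def tile_depth_angle_deg(n, tile_size=1.0, eye_height=1.6):
    """Visual angle (degrees) subtended by the depth of the n-th tile.

    Tile n spans ground distances n*tile_size to (n+1)*tile_size, measured
    from the point directly below the eye. Both defaults are assumed values.
    """
    near = math.atan2(n * tile_size, eye_height)
    far = math.atan2((n + 1) * tile_size, eye_height)
    return math.degrees(far - near)


if __name__ == "__main__":
    for n in (0, 1, 5, 10, 20, 40):
        print(f"tile {n:2d}: {tile_depth_angle_deg(n):.3f} degrees")
```

The angle per tile shrinks steadily with distance. Whether a given tile remains distinct depends on the observer's acuity, which this sketch does not model.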
Figure 14.7 is a version of figure 14.1 with two ghostly columns. The
grid of converging lines tells us the upper column is larger than the lower
one. Both columns stand on one of the converging lines, but the tops are
aligned with quite different lines in the grid. The upper column projects
three converging lines above the line grazed by the top of the lower col-
umn. Where the line grazed by the lower column meets the upper column
indicates the relative heights of the two. The converging meridians of
the sides of the tiles in the scene indicate height, just as the parallel
meridians of the fronts of the tiles indicate depth. Much of inverse
projection involves applying these informative meridian rules to texture
units in visual scenes.
However, outline omits texture, and the uniform texture between the
lines tells us the picture is simply flat. Is perspective irrelevant to outline
as a result? Outline also offers no large expanses of darkness, unlike shadows.
Is shading irrelevant, as a result? Shortly we will show just how aspects of
perspective and shadows do and do not matter to drawing in outline. Here
we wish to note simply this: an outline drawing omits perspective gradi-
ents of texture, and shading, and offers static flat surfaces. Its binocular
inspection and accretion-deletion simply indicate it is flat. This suggests,
intriguingly, outline operates despite countervailing information. Outline’s
ability to show a surface edge may therefore be independent of informa-
tion for depth, slant, and occlusion. Rather than information of this kind,
inverse projection of an outline may rely on an affinity between lines and
edges of surfaces. The affinity could be due to common features of lines
and surface edges. The general principle here is that perception sidelines
information and may home in on something in common between what we
use to see our normal surroundings and what we use when we depict the
same scene. If so, theory of pictures must reveal the “common features”
and what factors in perception govern their use.
For clarity’s sake, let us note here that people use two forms of repre-
sentation. One form involves fiat or conventions and the other uses com-
mon features.
Words represent by convention mostly. A hedgehog is smaller than a
pig, even though the word “hedgehog” has more letters than “pig.” Words
sometimes use onomatopoeia. The word “rivulet” nicely stands for a little
stream. A babbling brook is onomatopoeic for its referent. When words
stray like crooners into sounding like their subject, they offer a glimpse of
the world of representation by common features.
We have probably only the barest understanding of representation by
common features (Cazeaux 2002). John Searle (1993) once wrote that when
we say “Jill is icy” and mean she is unemotional, the connection between
emotion and being chilled is hard to explain. Being unemotional just is,
somehow, being cold. We detect something in common between the two.
But what? Those of us who have a gift for synesthesia will sympathize
with Searle’s puzzlement over common features we cannot make explicit,
ties that cannot speak their name. Astonishingly, George Lakoff and
Mark Johnson (1999) have argued that much of our thought resides
in implicit connections like synesthesia. The connections are said to be
metaphoric, like John Searle’s metaphor for cool Jill, or Shakespeare’s
warmer metaphor “Juliet is the sun.”
Juliet is as pleasant, comforting, attractive, and essential as a summer’s
day. We do not have to go far to find common features to use in a literal
sentence akin to Romeo’s, though our list of features may not have his
flair. Also, most if not all of Lakoff and Johnson’s metaphors can be
shown to have a literal base. But the search for relevant common features
in matters of representation is by no means easy.
The fact that there are common features for two objects does not
mean that one readily stands for another. Think of two objects. They have
in common that you have just thought of them. But the fact that you
thought of them does not mean one stands for the other. Look at two
objects. The fact that you looked at them together does not make one
Figure 14.8
Two C-shapes facing to the right, with the illusion that the right-hand C looks
bigger than the left-hand one. If each of the two is flipped in place, without
exchanging places, the left-hand C will look bigger. Subjects then report they are
the same size.
stand for another. Consider two objects that are nearly identical: two cars
off a Rover plant in England. They may be almost indistinguishable, but
that does not make one stand for another, Nelson Goodman (1968) noted.
Context shifts the relevance of features. All objects have many common
features. We have to find some common features and then select the one
pertinent at the time. “Man is a wolf” can shift “hunter” to “killer,” for
example. At a dance, we might say, “he is a wolf,” meaning predatory, and
we might be gossiping about some glamorous and assertive patroller of
the dance floor. In war, we might say he is a “wolf” and mean a vicious
killer.
The problem here, at heart, is the induction problem once again. All
objects have an infinite set of properties in common. Unaided, picking the
proper one has a probability of zero. Hence there have to be guides or
restrictions to our choice.
When we consider pictorial representation, we must ask about relevant
matching features. We can ask what the possible set could be and what
factors influence them. This should be a limited set. Out of the limited set,
only some should provide the relevant match.
Pictorial representation has limits that are not open to change. The ref-
erent of a word is arbitrary, but a picture must look like its referent to
some extent (Wollheim 1973).
It may be helpful to distinguish here the perceptual looks of things from
information gained by watching things alter. Two equal C-shapes placed
side by side, like “( (”, both facing the same way, look unequal (figure
14.8).
Figure 14.9
Illusions add effects: Müller-Lyer (top), Ponzo (middle), and combined figure
(bottom).
But if each of the two shapes is turned in place, to face the opposite way,
like “) )”, the apparent inequality will reverse. Observers exclaim, “Oh, they
are equal!” The appearance remains unequal, but the information gained
from the alteration is “they are equal.” Pictures are about providing
appearances, not about the information gained from perceptible transfor-
mations. In a phrase inspired by Heinz Wimmer and Josef Perner (1983),
pictures are about what allows false belief: viewing with restrictions. They
are about what arises when static conditions are provided and what then
induces appearance and illusion.
Helpfully, figure 14.9 shows a distinctive feature of inducers of appear-
ance. The top panel is the Müller-Lyer illusion. The upper horizontal line
looks longer than its twin, just below it. In the middle panel, the Ponzo
illusion also makes the upper line look longer than its twin, the lower line.
In the bottom panel, the two illusions are combined. Remarkably, the
effect is additive (Kennedy and Miyoshi 1995).
Appearances are additive. In this respect, appearances are quite differ-
ent from information. Information does not add; it confirms. Each of the
meridians in figure 14.7 confirms what the others tell us about the size and
distance of the ghostly columns. If five bits of information tell us one line
is twice another, the five tell us “twice” five times. They do not add to tell
us “ten times bigger.” (We should note that if the Ponzo causes a 5% illu-
sion and the Müller-Lyer 10%, their combined effect may be 14%, because
of common factors. If the first illusion is due to factors a, b, and c, and the
second to factors c, d, and e, their combination offers a, b, c, d, and e, with
c as a common factor only appearing once, not twice. The process is addi-
tive, but the sum is an imperfect addition.)
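The arithmetic of the parenthetical example can be sketched as follows. The factor names a to e come from the text, but the individual strengths are hypothetical numbers, chosen only so that the totals match the 5%, 10%, and 14% of the example. The sketch also assumes that a shared factor has the same strength in both illusions.

```python
# Hypothetical factor strengths, in percent of apparent length change.
# The factor names follow the text; the numbers are assumed.
PONZO = {"a": 2.0, "b": 2.0, "c": 1.0}
MULLER_LYER = {"c": 1.0, "d": 4.0, "e": 5.0}


def illusion_strength(factors):
    """Total effect of one illusion: the sum of its factor strengths."""
    return sum(factors.values())


def combined_strength(first, second):
    """Sum over the union of factors, so a shared factor counts once."""
    merged = {**first, **second}
    return sum(merged.values())


if __name__ == "__main__":
    print("Ponzo:", illusion_strength(PONZO))
    print("Muller-Lyer:", illusion_strength(MULLER_LYER))
    print("Combined:", combined_strength(PONZO, MULLER_LYER))
```

Under these assumed values the combined figure falls one point short of a simple sum because factor c contributes only once.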
Pictures govern appearance, straying at times into illusion and additive
effects. In particular they offer lines giving the appearance of edges, influ-
enced by perceptual factors given by the patterns of the lines.
Pictures are unlike words, because words, despite onomatopoeia, are not
essentially about looks. Also, phonemes, the components of words, do not
mean things, but the looks of pictorial borders may rely on common fea-
tures with other borders. So let us ask about the features in pictures such
as Gaia’s line drawing.
One simple theory is that for each border in the depicted world there is
one in the picture, and no more. This is hardly true for outline, a moment’s
thought reveals. An abrupt border on a surface we can call a contour, an
abrupt border in the light to the eye we can call a contrast, and an abrupt
change in depth at the border of a flat surface we can call an edge. Inter-
estingly, pictures such as Gaia’s have lines formed of two contours close
together, forming a line in an outline picture. Although the outline in the
picture has two contours projecting two contrasts, it generally is taken to
stand for one edge—a single edge of a single surface (Kennedy 1997). That
is, the referent of two contours is one edge of a surface. The affinity of line
for edges is independent of number of borders.
This theory explains too little (as Goodman’s Project Zero colleague
David Perkins pointed out at the time). Certainly common shape proper-
ties are important. But there is more. Edges of flat-surfaced objects such
as cubes are indeed depicted by straight lines. But the boundaries of cylin-
ders and water glasses are too, and these are curved. We see the straight
lines stand for boundaries of surfaces curving away from us. The affinity
of lines for edges does not depend on matching straight for straight and
curved for curved (Willats 1997). If a close match of shape were vital,
young children could not draw in outline.
Figure 14.10
Cups with partial deletion of outline. Omitting sections between vertices (left)
creates the same problems for perception of the cups as omitting sections with ver-
tices (right).
some parts of the figure with vertices removed, such as handles, are easier
to perceive. Further, five evenly distributed dots on each line in a picture
of a box suggest the overall figure more readily than dots concentrated
midway between vertices, and least effective is one with dots concentrated
at vertices (figure 14.11). If the figure has curved lines, like a bird does, one
with dashes evenly distributed suggests the overall form much more effec-
tively than one with dashes concentrated at changes of direction (figure
14.12, from my Toronto student Ramona Domander). A figure with many
indentations, such as a hand, also tells the story. The hand with evenly dis-
tributed dots is easier to see than one with dots concentrated at points of
change (figure 14.13, from my Salzburg student Christoph Obermair).
Figure 14.13 includes two hands made by joining the dots with straight
lines. Instructively, the one made from the concentrated-dots figure looks
more like a hand. Evidently, when we look at the even-dot figure we do
Figure 14.11
Boxes and dotted lines. The box shown by lines with evenly spaced dots (bottom
right) is easier to see than the box with dots concentrated at midlines (top right),
and the poorest figure is the one with dots concentrated at vertices (center left).
not just join the dots with straight lines. We fit curves. Joining the
concentrated-dot figure with straight lines succeeds because then its dense
dot regions approximate curves.
Even distribution is hardly the sole factor grouping elements in lines.
Density matters. Like the bird pictures, the hands use the same number
of elements, distributing them over two perimeters with the same length.
Overall density is constant, and even distribution is shown to be signifi-
cant. However, we group figure 14.14 (from my Toronto student Rizvana
Chishty) as five straight columns of diamonds with irregular spacing. We
Figure 14.12
The bird figure with lines evenly spaced is easier to see than the bird figure with
lines concentrated where contour changes direction.
cannot readily see figure 14.14’s seven rows of perfectly evenly (horizon-
tally) spaced diamonds, orthogonal to the five irregular columns. Vision
favors the five columns with irregular but denser spacing.
Figure 14.13
The hand with dots evenly spaced is easier to see than the hand with dots concen-
trated where contour changes direction. But if the dots are connected with straight
lines the one based on concentrated dots is more like a hand.
Figure 14.14
Columns of irregularly spaced diamonds are seen, not regularly spaced rows with
less density.
In another study, Nicholls and I changed the frontal square into a tra-
pezium (figure 14.16). This has four equal-length obliques and no right
angles. Now the oblique misjudgments should apply to both the front face
and the long obliques, when subjects assess their length on the page, and
they should be compared more fairly. Indeed they are. At 1 (front trape-
zium) to 1.5 (long oblique) they were judged veridically. Further, given an
oblique of 5 times the front trapezium the judged ratio was 1.1 times the
true ratio.
If the front face of a brick is depicted by a trapezium, the brick appears
to recede at an acute angle to the picture surface. If the front face is a
square, the brick recedes as if orthogonal to the picture surface. The result
is more foreshortening for the square-fronted drawing. Obliques of length
Figure 14.15
The obliques are seen as longer than the horizontals and verticals of the bricks.
Figure 14.16
Trapezium fronts invoke less foreshortening of obliques than do squares.
eyes. Now try to estimate the relative visual angles of the two hands (their
vertical extent). Most people say the right one looks about 90% of the visual
angle of the left. Now swing both arms to straight ahead, being sure to
keep one bent and the other straight. Now the right hand’s visual angle
looks to be about 70% of the left’s. Now close one eye. Be sure the hands
are optically adjacent, but the right is still twice the distance of the other.
Now the right’s visual angle looks close to 50% of the left’s. Evidently,
visual angle ratios of the hands are judged to be close to their ratio in
length (in centimeters) when the two hands are set widely apart in angle,
and we view binocularly (getting good length information). Igor Juricevic
and I found these effects in studies with real objects set on a tabletop,
and studies with pictures of objects like that in figure 14.1. Judgments
of visual angle ratios vary from 100% veridical when objects are aligned
(optically adjacent) to judgments that are increasingly like length ratios as
angular separation increases. The judgments move steadily toward length
ratios, and come close to 100% of length ratios when objects are set 180°
apart (one to our extreme left and one to our extreme right).
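For reference, the geometrically correct visual angle ratio in the two-hands demonstration can be computed directly. The sketch below uses the standard formula for the angle subtended by an object centred on the line of sight. The hand length and arm distances are assumed values for illustration, not measurements from the studies described.

```python
import math


def visual_angle_deg(size, distance):
    """Visual angle (degrees) of an object centred on the line of sight."""
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))


def angle_ratio(size, near_distance, far_distance):
    """Ratio of the far object's visual angle to the near object's."""
    return visual_angle_deg(size, far_distance) / visual_angle_deg(size, near_distance)


# Assumed values: an 18 cm hand at 35 cm (bent arm) and 70 cm (straight arm).
HAND_LENGTH_CM = 18.0
NEAR_DISTANCE_CM = 35.0
FAR_DISTANCE_CM = 70.0

if __name__ == "__main__":
    ratio = angle_ratio(HAND_LENGTH_CM, NEAR_DISTANCE_CM, FAR_DISTANCE_CM)
    print(f"far/near visual angle ratio: {ratio:.3f}")
```

With the far hand at twice the distance, the ratio comes out very slightly above one half, which is the value that judgments approach when the hands are optically adjacent and viewed with one eye.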
Little wonder we have trouble drawing in perspective! The relative
visual angles of nearby objects are misperceived quite massively. Photo-
graphs arrange patches on a picture surface to match visual angle ratios
of objects in the 3-D world. To draw with photographic realism, we need
to judge relative visual angle ratios of the 3-D target objects. We cannot
do this accurately without aligning objects optically. That is why artists
hold up pencils optically alongside target objects.
Consider using a pencil like an artist when drawing two identical
electricity pylons, one near and the other far, one to our left and one to our
extreme right. The ratio of the nearby pylon’s visual angle to the pencil’s
is judged correctly: say 0.6 to 1. Then the second pylon is assessed against
the pencil: say 0.3 to 1. This tells us the ratio of the two targets’ visual
angles is 2 to 1. If we had not used the pencil we might have judged the
visual angle ratio of the pylons as say 1.2 to 1, close to the height ratio.
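The pencil method amounts to comparing each target against a common reference and then dividing the two results. A minimal sketch, using the figures from the example above and a hypothetical helper name:

```python
from fractions import Fraction


def ratio_via_reference(first_to_reference, second_to_reference):
    """Ratio of two targets, given each target's ratio to a shared reference."""
    return first_to_reference / second_to_reference


# Values from the pylon example: each pylon's visual angle relative to the pencil.
NEAR_PYLON_TO_PENCIL = Fraction(6, 10)
FAR_PYLON_TO_PENCIL = Fraction(3, 10)

if __name__ == "__main__":
    ratio = ratio_via_reference(NEAR_PYLON_TO_PENCIL, FAR_PYLON_TO_PENCIL)
    print(f"near pylon to far pylon: {ratio} to 1")
```

Exact fractions are used here only to avoid floating-point rounding in the division. Because both comparisons are made with the pencil optically adjacent to the target, each is accurate, and so is their quotient.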
The effects of foreshortening are related more and more to length and
height information as target objects separate in angle. Thus vision is not
predictable just from the laws of perspective. We should use
foreshortening, because it is ever-present, but some impressions we get from
it are not predictable from laws of perspective alone. Pictorial impressions
have to do with these effects, which are dramatic and little understood.
The effects require empirical investigation, not just a priori geometrical
considerations.
In Gaia’s picture, lines clearly depict surface edges. But should we gener-
alize and suggest lines can stand for any discontinuity? Let us consider
two theories, one arguing that any discontinuity suits outline, the other
arguing that lines show surface edges but not purely visual margins such
as those of a shadow.
A line has two contours. A black line on a white page is a thin black
region with white surrounds on both sides. The luminance border that
relies on black giving way to white is the shadow border.
Lines can match the shapes of any spatial discontinuity, including a
border of a shadow (Kennedy 1974). But what can they look like? Inter-
estingly, they cannot look like shadows, or objects bearing shadows.
Figure 14.17
Shape-from-shadow fails in outline because it provides a negative contour along-
side the shadow.
not see familiar color forms look colored when shown purely in outline.
Hence the discontinuity theory is too general.
Could the explanation of outline be that a line uses one of its black-white
borders to match the referent’s border? That is, some borders, such as
shadow borders, are “polarized” so that they are darker on one side. A
shadow is darker than the illuminated region neighboring it, a factor that
plays an important role in shadow perception (Cavanagh and Leclerc
1989). Does outline work if it matches a polarized border?
A theorist might contend that an outline will work like a shadow if it
has a contour with the shadow’s polarization. This hypothesis fails, alas,
because one of the black outline’s two contours in figure 14.17 has pre-
cisely this pattern of polarization—the contour alongside the white, illu-
minated region.
Another explanation of the failure of outline to trigger shape-from-
shadow is that outline has two contours, and a shadow border has one.
Alas, a shape-from-shadow figure with two borders succeeds, provided
both are polarized in the correct direction (figure 14.17, second panel from
right). That figure has a black area for shadow, a gray area for a line, and
a white area for illuminated regions. Indeed, in studies with Juan Bai, I
have found that more than two borders can be provided and if they are
all polarized like a positive, shape-from-shadow succeeds. A successful
explanation of the failure of outline with shadow borders is that one of the
line’s contours offers incorrect polarity at the border of the shadowed
region. Hence the failure of a figure with a gray region for shadow and
a black outline (figure 14.17, third from left). The figure has gray for shad-
ow, a black line, and a white area for illuminated regions in the referent.
The line’s contour alongside the illuminated region has the correct polar-
ity. The line is dark and the illuminated area bright. But the contour it
provides to the shadowed region is polarized like a negative. It is gray on
the shadow side and darker on the line’s side.
Up to this point, the list of theories of outline has treated outline as using
continuous contours. But outlines with gaps work well too. We can draw
a cube with lines of dots. If outline can be served by lines with gaps, then
it is the grouping of the elements that is at work. The axis of the grouping
is crucial. This has a location and a direction from us, but no color or
brightness. Hence it does not matter to outline whether the marks on
the page are black or white dots, or dashes or continuous lines, as figures
14.2 and 14.3 reveal. (I should note that Willats [1997] disagrees with this
conclusion.)
What do lines with continuous and discontinuous borders share with
their referents? The referents are surface borders, that is, corners and occlu-
sions. Any given outline is therefore ambiguous. The pattern in which it
is found helps it take on a referent in perception, but inverse projection
of outline works with vague sketches with no decisive features. The key
therefore is some common feature of a dotted line, with gaps, and a con-
tinuous surface border. But what feature?
Surfaces can be textured sparsely or densely. The contrasts telling us
about borders of surfaces are sometimes continuous optically, but some-
times they are full of gaps. Likewise, random-dot stereograms have regions
with no continuous contour in the stimulus display, but binocularly they
give us an impression of a continuous border. The border has no bright-
ness polarity, just like dotted outline. Similarly, kinetic random textures
use accretion and deletion of elements, with discontinuous stretches where
grouping of elements occurs across gaps, to push us into percepts of sharp
borders. Again, these borders are not brightness-polarized.
In sum, the optical patterns that specify borders of surfaces have fea-
tures that match the patterns operating as outline. Both kinds of patterns
can provide continuous stretches of luminance or spectral contour, but
often they offer gaps that need to be bridged. The physical factors that
permit grouping and apparent continuity, without polarization, can be
triggered by both outline and information for surface edges.
Figure 14.18
Accretion and deletion occurring with moving opaque foreground objects, in uni-
form illumination, with no shadows.
Figure 14.19
Accretion and deletion occur for snow accreting in a beam from a light. The A
elements are accreting, and F elements are falling. A cross section of the beam has
accretion for elements at the left and right borders. A frontal view also has accre-
tion in the center regions between the left and right borders.
water surface. At the leading edge, optical units accrete, and at the trail-
ing edge units delete, and we see a single continuous surface. The elements
accreting and deleting are on the surface of water and the top layer of
wheat.
Related effects occur at night with directional illumination. As an obsta-
cle moves between us and a light source, its shadow sweeps through the
falling snow. Deletion occurs at the leading edge of the shadow and accre-
tion at its trailing edge. Snow can fall into a light beam, from a dark sky,
and accrete in the optic array. In figure 14.19, the A units are accreting and
the F units are falling. Snow can fall through the horizontal light beam
from a car headlight, deleting optically at the lower edge of the beam.
The particles seen in a street lamp’s light can be seen falling until their
background is the snow-covered sidewalk. The continuing dance down-
ward, visible when passing in front of a dark wall, is no longer visible
against the white sidewalk. The snowflakes accreting and deleting are in
Figure 14.20
An object racing to the right casts a shadow on elements (E) with deletion (D) at
its leading edge and accretion (A) at its trailing edge. The elements in the shadow
are not illuminated and are invisible.
the foreground, and the wall is the background. At night, insects swarm-
ing in a beam of light accrete and delete at the beam’s margins, contrast-
ing with dark buildings in the background.
In a 1930s gangster movie set in Alcatraz, a spotlight running across a
surface such as a brick wall at night could bring elements into the optic array
and then move on so they would fall back into darkness. Bricks accrete
and delete revealing a continuous surface, with no foreground or back-
ground. Cagney or Bogey could run through the spotlight and cast a deep
shadow on the ground that would delete elements at its leading edge, while
elements would accrete at the trailing edge (figure 14.20).
Other cases are provided by motion and highlights. As we move past a
series of small spots of water on a surface, a path of highlights can stretch
from us towards the sun, especially if it is low in the sky. The path moves
with us, staying aligned with the sun. The highlights accrete as we move
forward and delete as we move too far. The sparkling lights accrete at
the leading edge of the path and delete at its trailing edge. The accretion
and deletion tell us about a continuous surface below us. This is vividly
observed flying east-west past the thousands of kettle lakes of the Missis-
sippi, with the sun to the south, or walking over a fresh dry sparkling
Ontario snowfall on a sunny winter’s day.
On a breezy day, lake water often reflects sunlight as rippling light
patterns on boat hulls. The ripples run along the hull and delete at the
surface’s edge. The accretion and deletion tell us about the foreground
surface.
In sum, like outline, accretion and deletion require grouping to tell us
about a continuous border. Also like outline, they are ambiguous and can
14.15 CONCLUSION
Hermann Kalkofen
Figure 15.1
Ernst Mach. From J. Brozek and J. Hoskovec (1993, p. 229).
1885 in his Analysis of the Sensations and it has been revived in Gibson’s
Perception of the Visual World (1950). Gibson explained the portrayal
as a “literal representation” of Mach’s “visual field, with his right eye
closed, as he reclined in a nineteenth century chaise longue. His nose de-
limits the field on the right and his mustache appears below. His body
and the room are drawn in detail, although he could not see them in
detail without moving his eye” (Gibson 1950, p. 27). Mach’s portrait
from within represents what Hermann von Helmholtz called a Blickfeld,
whereby the eye is allowed here to move in its socket. Figure 15.3
shows the visual field of an arrested eye—in Helmholtz’s terms a Sehfeld
(see Metzger 1975, p. 127)—on top of Mach’s Blickfeld. The boundary
lines indicate appearance change as a function of where in the visual
field an object is placed. For example, a yellowish-green perimetric target
357 Irreconcilable Views
Figure 15.2
Self-portrait “from within.” From Mach (1903, p. 15).
appears, when first detected, whitish (solid line) and thereafter yellow
(broken line), and not until it reaches the dot-surrounded region does it
appear yellowish green. The so-called functional visual field is defined as
the area wherein some pattern discrimination with an arrested eye is just
possible. It has a horizontal extension of about 15° (Ikeda and Takeuchi
1975).1
Figure 15.3
A Sehfeld on top of Mach’s Blickfeld. From Mach (1903, p. 15) and Metzger
(1975, p. 127).
Figure 15.4
Pinhole photograph of a Greek statuette (probably Aphrodite, second century
BC, according to Pirenne). From Pirenne (1970, p. 101).
limit. The value recently declared by Michael Bischoff (1998, pp. 143–144),
however, amounts to 36°. Outside this visual angle, according
to him, objects are no longer clearly perceptible with an unmoving eye.
Bischoff traces the 36° back to Hans Jantzen (1911), who asked for the
limit up to which a picture can be clearly seen with an unmoving eye but
beyond which several shifts of gaze are required to make it perceptible
(ibid., p. 120). Jantzen’s question remains, strictly speaking, unanswered;
he suspected the region to ultimately extend inside the area beyond which
increasing marginal distortions emerge. This area has been defined by
Guido Hauck, to whom Jantzen referred. On the basis of his own per-
spective constructions, Hauck estimated it to be between 30° and, maxi-
mally, 36°.
Hauck was not actually concerned with shifts of gaze direction. He stood out
as a discerning critic of the geometrical-perspective teachings of his day,
which were still based, as he saw it, on the physiology of the Kepplers (sic)
and [Christoph] Scheiners. The latter considered the eye to be a resting
camera obscura, and yet they were certain of a direct mental apprehen-
sion of the retinal image as an instantaneous whole (Hauck 1879, p. 4).
The fact that this teaching, in 1879, had remained absolutely untouched
by the recent achievements of physiological optics might have been, in
Hauck’s eyes, at least partly, explained by the accident that the alleged
supreme fidelity of perspective received apparent confirmation by the
camera obscura pictures of photography (ibid., p. 9).3 Seeing, however,
consists in the eye ceaselessly moving up and down, fixating, and flying
over the integral object. According to Hauck, the skillfulness of the eye
in this traveling activity is so great that we do not in the least realize the
particulars of the process. In a trice, the total image (Gesammtbild) is cre-
ated, which we believe to be a combination of simultaneous detail impres-
sions, whereas in reality the same take place in succession (ibid., pp. 7–8).
Seeing means a transference of the successive to simultaneity. The result-
ing mental visual image (Hauck’s Gesammt-Anschauungsbild) would be
composed of the received detail impressions, in part contradicting each
other, whereas the duty to weigh and compare the importance of the
distinct detail impressions fell upon the mind. This is necessary in order
to even out the inconsistencies and to bring about an overall result that is
Figure 15.5
Fourfold row of pillars. From Hauck (1879).
consistent with itself and with the mind (ibid., p. 38). This is Helmholtzian
“cognitivist” thinking at its best.
The mathematician Hauck considered it conceivable that central perspective
might represent just one of an entire order of imaginable and
justifiable perspective systems (ibid., p. 6). However, he argued, “If we . . .
ask about the particular properties of our mental visual image two cardi-
nal features turn out: 1) The apparent size of a length is proportional
to the angle of view.—2) Any straight line appears again as straight line.”
The former feature would be called the principle of conformity, the latter
the principle of collinearity (ibid., pp. 39–40).
Being the aesthetician that he was, Hauck suggested that a graphic
representation, in order to equal the mental visual image as faithfully as
possible, should be endowed with the following features: (1) two indis-
pensable conditions to be fulfilled at all cost—the principle of the linear
horizon and the principle of verticality; (2) two secondary conditions to
be fulfilled if possible: the principle of conformity and the principle of
collinearity.4 The latter two conditions oppose each other and cannot be
satisfied at the same time; the one excludes the other completely or in part
(ibid., pp. 41–42).
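The opposition between the two secondary conditions can be made concrete with a small numerical sketch. It is not Hauck's own construction. It assumes a simplified model in which a collinear (central) projection places a point seen at angle θ from the principal axis at d·tan θ on the picture plane, whereas a conform picture places it at d·θ, so that depicted size is proportional to the angle of view. The function names and the unit viewing distance are illustrative assumptions.

```python
import math


def collinear_offset(angle_deg, distance=1.0):
    """Offset on the picture plane under central projection: d * tan(theta)."""
    return distance * math.tan(math.radians(angle_deg))


def conform_offset(angle_deg, distance=1.0):
    """Offset proportional to the visual angle: d * theta (theta in radians).

    Assumed simplification of the conformity principle, not Hauck's construction.
    """
    return distance * math.radians(angle_deg)


def relative_deviation(full_angle_deg):
    """Relative difference between the two offsets at the picture margin.

    The margin lies at half the full visual angle from the principal axis.
    """
    half_angle = full_angle_deg / 2.0
    collinear = collinear_offset(half_angle)
    conform = conform_offset(half_angle)
    return (collinear - conform) / collinear


for full_angle in (30, 36, 95):
    print(f"{full_angle:3d} degrees: {relative_deviation(full_angle):.1%}")
```

Under these assumptions the two placements at the picture margin differ by roughly 2 percent for a 30° view and roughly 3 percent for a 36° view, but by roughly a quarter for a 95° view, which suggests why the two principles can be reconciled only for narrow visual angles.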
Figure 15.5 shows a 95° view of a fourfold row of eight equidistant pil-
lars, whose appearance follows the rules of conform perspective and is
reshaped according to the verticality principle.5 Note the perceptible
curvatures at top and bottom.
Figure 15.6
Varieties of perspective. From Hauck (1879).
Figure 15.6 compares the twelve pillars at the left in figure 15.5 delin-
eated in revised conform perspective (above) and in collinear perspective
(below). In both cases, the visual angle is 36°. Up to this angle, the dif-
ferences between conform and collinear pictures are so minute that it
might be considered a region of absolute perspective. The deviations would
diminish further if the angle were reduced to 30°. An angle of 30° corresponds,
however, to an eye distance just 2 times the width of the picture, an angle
of 36° to a distance of 1.5 picture widths,6 distances generally used and
recommended (ibid., p. 76). Is Hauck’s estimate of the absolute perspec-
Figure 15.7
Hyperbolic checkerboard pattern. From Helmholtz (1910).
Figure 15.8
Chronophotograph of a high jump. From Ceram (1965, pp. 126–129).
Figure 15.9
Prisms and double prism. From Thiéry (1895, p. 319).
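The viewing distances that Hauck pairs with visual angles of 30° and 36° can be checked with elementary trigonometry. The following is a minimal sketch, assuming a flat picture viewed frontally from a point opposite its center, so that the picture width w subtends the full visual angle α and the eye distance is d = w / (2 tan(α/2)). The function name is hypothetical.

```python
import math


def viewing_distance_in_widths(visual_angle_deg):
    """Eye distance, in picture widths, at which the picture width
    subtends the given full visual angle (frontal, centered viewing assumed)."""
    half_angle = math.radians(visual_angle_deg) / 2.0
    return 1.0 / (2.0 * math.tan(half_angle))


for angle in (30, 36):
    print(f"{angle} degrees: {viewing_distance_in_widths(angle):.2f} picture widths")
```

Under these assumptions, 30° gives a distance of about 1.87 picture widths and 36° about 1.54, in line with the rounded values of 2 and 1.5 picture widths given in the text.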
seen it, or whether recreating imagination had been in the game—all that
is insubstantial regarding the conception of pictorial faithfulness” (Bühler
1925, p. 113). Depictions of really existing (referent) objects are, semiotically
speaking, pictographs; pictures of imaginary (referent) objects are
ideographs.
Figure 15.10
From Theodor De Bry’s Collectiones peregrinatorum in Indiam Orientalem et
Indiam Occidentalem (1590–1634) (1990, p. 27). See plate 5 for color version.
prisms (figures 15.9A and 15.9B; figs. 4 and 5 in Thiéry 1895);13 the one at
the left seen from below, the right one seen from above. If both get com-
bined and consequently have a lateral plane in common, an ambiguous
pattern results, which can be interpreted in a twofold way: “If one moves
one’s eyes from the right to the left, the plane in common with its right
edge comes to the fore and forms, with the plane at the right, the prism
seen from above. The same right edge recedes, however, when the eyes run
through the figure from the second to the third plane to the left. In this
case, the planes located at the left apparently portray the prism seen from
below” (ibid., p. 320). The two interpretations of Thiéry’s figure go along
with irreconcilable views.14
Prisms and double-prisms, too, are a variety of inanimate object, some sort
of existent. The depiction of an existent may be called its presentation.
Figure 15.11
Noble Maiden from Segota. From De Bry (1990, p. 23).
Figure 15.12
Two Venetian Women, Albrecht Dürer (1495). From Strieder (1996, p. 150).
that are both contained in the POV shot.16 This construction of the—then
composite—picture requires two centers of projection—one for a zoo-
graphic view and one for a “holistic” view. The moment intended here
corresponds to one point in time. The alternative construction assumes
a single center of projection and two points in time.17 Here the noble
maiden would have had to change her statuesque pose into its exact opposite
during the interval between t1 and t2.
Figure 15.12 is a drawing by Albrecht Dürer (1495). Admittedly, its caption
reads Two Venetian Women. The picture shall nonetheless be
regarded as the presentation of a single character. The sceno-
Figure 15.13
Angela Merkel. From Die Zeit, no. 19, 2000. Photographs by Uta Mahler with
permission by Ostkreuz Agentur der Fotografen.
Figure 15.14
From De Bry (1990, p. 156).
Figure 15.15
From Stemberger (1990).
Figure 15.16
From Racinet (1888/1995, p. 85).
Figure 15.17
From Kratzsch (1983, p. 39).
However, future time, too, is remote and distant from the present. The
picture Serveto and His Execution (Christof van Sichem, 1608) (figure
15.15) is an instance of what may be named prospective perspective (ibid.).
Figure 15.16 shows a detail from a Japanese woodcut, The Fitting out
of an Archer. It is indeed the selfsame soldier in both views, as can be
deduced from the omitted context. The scenographic cues are minimized
again. Pictorial space is reduced to the foreground. Thus, the depicted
action proceeds from the left to the right.18 Figure 15.17 shows a woodcut
from the Luther Bible (Kratzsch 1983), Samson Beats the Philistines, 1534.
This is another case of prospective perspective.
Pictorial narration may of course extend to more than two bare
moments. For example, De Bry’s engraving presented in figure 15.18
(Indians want to try out whether the Spaniards be immortal folks and drown
a Spaniard named Salsedo in the sea) comprises three moments and is
another example of prospective perspective. The instances of Wickhoff’s
continuous style with its characteristic figure repetitions could be enor-
Figure 15.18
From De Bry (1990, p. 158).
Figure 15.19
From Lübke and Pernice (1921, fig. 71/51), Sethos I. Offers a Picture to Truth
(Seth, Seti), nineteenth dynasty (1314–1200 B.C.).
Notes
1. The term has seemingly been coined by Sanders (1970); Ikeda and Takeuchi
report that “the functional visual field varied greatly among subjects, and also
changed for a particular subject when the data were collected on different occa-
sions” (1975, p. 260).
2. The author was among the founding fathers of Gestalt theory (cf. Metzger
1975, p. 133). Jantzen (1911, p. 120) characterized Cornelius’ book as “reich illus-
trierten Kommentar zu Adolf Hildebrands Problem der Form” [“richly illustrated
commentary on Adolf Hildebrand’s Problem of Form”] (Hildebrand 1908).
3. Compare Te Peerdt: “Vor Erfindung der Photographie ist nicht einmal der
Gedanke einer wahrhaft einheitlichen optischen Darstellung gefasst gewesen”
[“Before the invention of photography the thought of a truly consistent optical
representation had not even existed”] (1899, p. 6).
4. I have reversed the order of the two principles from the order in which he gave
them.
5. Hauck’s “reconceived pictorial space”—one system of “curvilinear perspec-
tive” among others (Pirenne 1970, p. 148)—has been assessed by Ten Doesschate:
“He introduced a new system of perspective which he called ‘conform’ in opposi-
tion to normal central perspective which is collinear.—Hauck is not consistent.
He represents horizontal straight lines by arcs; verticals, however, he ‘rectifies.’
This he does according to our ‘consciousness of verticality’ (Vertikalitätsbewusstsein)”
(1964, pp. 47–48). Hauck’s semiconformity has already been criticized by
Panofsky (1927, p. 262).
6. In the case of still smaller viewing distances, it would be feasible to begin the
construction collinearly first and to reshape possible conformity distortions later
on, according to conform perspective. Raphael always proceeded in this way; see,
for example, the spheres at the right and the base of the column at the left in The
School of Athens, 1510–11 (Hauck 1879, p. 75). Compare Pirenne: “The architecture
which extends over most of the painting is drawn in perspective as one whole,
from one main centre of projection. But the spheres (and the numerous human
figures) are not drawn as projections from this centre. They are drawn from a
number of subsidiary centres of projection, each in front of the position which the
respective sphere or figure would occupy in the painting” (1970, p. 121).
7. “[D]ie Richtkreise des Blickfeldes, welche mit der durch den Fixationspunkt
gehenden vertikalen und horizontalen Linie übereinstimmende Richtung haben,
auf eine ebene Tafel projiziert” [“the reference circles that have a common direc-
tion with the vertical and horizontal lines that cross the fixation point, projected
onto a plane”] (Helmholtz 1910, p. 150).
8. “[P]icture narratives have, of course, been common for centuries” (Chatman
1978, p. 34).
9. Chatman (1978) notes that this kind of distinction has been recognized since
antiquity, that is, in Aristotle’s Poetics (p. 19). Comparably, G. E. Lessing drew a
distinction between characters (Körper) and actions (Handlungen) (Lessing 1964).
10. “The conceptual framework within which we organize and describe our per-
ceptions of the physical world, whatever language we speak, is one in which we can
identify not only states-of-affairs of shorter or longer duration, but also events,
processes and actions.—There is, unfortunately, no satisfactory term that would
cover states, on the one hand, and events, processes and actions, on the other”
(Lyons 1977, p. 483). Lyons’ opposition reminds us of L. von Bertalanffy’s (1952)
saying that “what are called structures are slow processes.” Consistent with this
opposition is Strawson’s ontological distinction between first- and second-order
entities (Lyons 1977, p. 443).
11. An agent is, in the sense of Chatman, an existent; in Lyons’s sense he is a
static situation. Lyons refrains—understandably—from naming agents situations.
12. In the words of Franz Wickhoff: “Endlich, für unsere gewohnte Vorstel-
lung von der Kunstentwicklung spät genug, hatte sich auch im Bilde zusam-
mengeschlossen, was man auf der Szene schon lange zu sehen gewohnt war:
wandelnde Figuren vor einem geschlossenen Grund” [“Late enough for our accus-
tomed idea of the development of art, something long familiar on the scene finally
came together for the picture: changing figures in front of a closed ground”] (1912,
p. 104).
13. This is a somewhat strange composition in that Thiéry’s figures 4, 5, and 6
show nine, eight, and ten crossing transversals respectively.
14. The picture-prisms are, of course, perspective projections: Thiéry’s drawing
instructions for the prisms read as follows: “Man denke sich ein verticales Prisma,
von dem bloß zwei Seitenflächen gesehen werden. Das Prisma ist um eine hori-
zontale Achse beweglich. Auf jede der sichtbaren Flächen des Prismas zeichnet
man eine verticale durch horizontale Striche eingetheilte Linie. Wenn das Prisma
so gedreht worden ist, dass sein oberer Theil vom Beobachter entfernt liegt, dann
wird das Prisma als Figur 4 und die wagerechten Striche werden als Transversalen
erscheinen. Die eingetheilten Linien werden jetzt in Folge dessen nach oben con-
vergent gesehen. Da das Auge des Beobachters in der Höhe der Achse liegt, so sind
auch die oberen Eckpunkte der Linien dem Beobachter näher als die unteren;
der Gesichtswinkel der Distanz zwischen den oberen Endpunkten ist also kleiner
wie der zwischen den unteren. Ferner: wirklich nach oben divergirende Linien
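würden in diesem Fall als parallele Linien gezeichnet werden müssen” [“Imagine a
vertical prism of which only two lateral faces are seen. The prism can be turned
about a horizontal axis. On each of the visible faces of the prism one draws a
vertical line divided by horizontal strokes. When the prism has been turned so
that its upper part lies farther from the observer, the prism will appear as
figure 4 and the horizontal strokes will appear as transversals. As a consequence,
the divided lines are now seen as converging upward. Since the observer’s eye lies
at the height of the axis, the upper corner points of the lines are also nearer to
the observer than the lower ones; the visual angle of the distance between the
upper end points is therefore smaller than that between the lower ones.
Furthermore, lines actually diverging upward would in this case have to be drawn
as parallel lines”] (Thiéry 1895, pp. 321–322).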
15. The differential coloring is probably an indication of the advanced decay of
what has been called by Wickhoff the continuous style, with its characteristic figure
repetitions.
16. Or in the RVA shot: the picture can be “read” alternately.
17. Compare Vitz’s nice analysis of Edouard Manet’s Bar at the Folies Bergère,
1881 (Vitz 1979, p. 139).
18. The action proceeds with varying time intervals. Its depiction relates, in con-
trast to Marey’s chronophotography of the jump, to an intrinsic segmentation.
References
Adams, P. A., and Haire, M. (1959). The effect of orientation on the reversal of
one cube inscribed in another. American Journal of Psychology, 72, 296–299.
Adelson, E. H. (1993). Perceptual organization and the judgement of brightness.
Science, 262, 2042–2044.
Alberti, L. B. (1436/1966). On painting (trans. J. R. Spencer). New Haven, Conn.:
Yale University Press.
Alberti, L. B. (1436/1972). On painting (trans. C. Grayson). London: Phaidon.
Alhazen, I. (1039/1989). Book of optics. In A. I. Sabra (ed.), The optics of Ibn
Al-Haytham. London: Warburg Institute, University of London. (Alhazen died
about 1039 C. E. When he wrote this work is not known.)
Anderson, C. H., and Van Essen, D. C. (1987). Shifter circuits: A computational
strategy for dynamic aspects of visual processing. Proceedings of the National
Academy of Sciences, 84, 6297–6301.
Anderson, K. (1992). Brook Taylor’s work on linear perspective. Sources in the
history of mathematics and physical sciences, vol. 10. New York: Springer.
Arend, L., and Goldstein, R. (1987). Simultaneous constancy, lightness and bright-
ness. Journal of the Optical Society of America A, 4, 2281–2285.
Arnheim, R. (1974). Art and visual perception: A psychology of the creative eye.
Berkeley: University of California Press.
Atkins, J. E., Fiser, J., and Jacobs, R. A. (2001). Experience-dependent visual
cue integration based on consistencies between visual and haptic percepts. Vision
Research, 41, 449–461.
Attneave, F. (1954). Some informational aspects of visual perception. Psycholog-
ical Review, 61, 183–193.
Bahn, P., and Vertut, J. (1997). Journey through the Ice Age. Berkeley: University
of California Press.
Ballard, D. (1991). Animate vision. Artificial Intelligence, 48, 57–86.
Baxandall, M. (1985). Patterns of intention: On the historical explanation of pic-
tures. London: Yale University Press.
Buckley, D., Frisby, J. P., and Freeman, J. (1994). Lightness perception can be
affected by surface curvature from stereopsis. Perception, 23, 869–881.
Bühler, K. (1922). Die Erscheinungsweisen der Farben (The appearance of the
colors). In K. Bühler (ed.), Handbuch der Psychologie, part 1: Die Struktur der
Wahrnehmungen (Handbook of Psychology, part 1: The structure of Perception)
(pp. 1–201). Jena, Germany: Fischer.
Bühler, K. (1925). Abriß der geistigen Entwicklung des Kindes (Mental Develop-
ment of the Child). Leipzig, Germany: Quelle and Meyer.
Bunge, W. (1966). Theoretical geography. Lund, Sweden: C. W. K. Gleerup.
Burton, H. E. (1945). The optics of Euclid. Journal of the Optical Society of
America, 35, 357–372.
Busey, T. A., Brady, N. P., and Cutting, J. E. (1990). Compensation is unneces-
sary for the perception of faces in slanted pictures. Perception and Psychophysics,
48, 1–11.
Cabe, P. A., Abele, P. N., and Kennedy, J. M. (1999). Dominance of bino-
cular depth information over competing kinetic occlusion information. Paper
presented at the meeting of the Psychonomic Society, November 18–21, Los
Angeles.
Cadiou, H., and Gilou (1989). La peinture en trompe l’oeil (Trompe l’oeil Paint-
ing). Paris: Dessain et Tolra.
Carani, M. (1998). The semiotics of perspective. Semiotica, 119 (3/4), 309–357.
Carlbom, I., and Paciorek, J. (1978). Planar geometric projections and viewing
transformations. Computing Surveys, 10, 465–502.
Carl Zeiss Jena (1907). Instrument zum beidäugigen Betrachten von Gemälden
u. dgl. (Instrument for the Binocular Viewing of Paintings). Kaiserliches Patentamt
(patent no), Patentschrift no. 194480, Klasse 42h, Gruppe 34.
Carroll, N. (1988). Mystifying movies: Fads and fallacies in contemporary film
theory. New York: Columbia University Press.
Cavanagh, P. (1987). Reconstructing the third dimension: Interactions between
color, texture, motion, binocular disparity, and shape. Computer Vision, Graphics,
and Image Processing, 37, 171–195.
Cavanagh, P., and Kennedy, J. M. (2000). Close encounters: Small details veto
depth from shadows. Science, 287, 2421.
Cavanagh, P., and Leclerc, Y. G. (1989). Shape from shadows. Journal of Experimental
Psychology: Human Perception and Performance, 15, 3–27.
Cazeaux, C. (in press). Metaphor and the categorization of the senses. Metaphor
and Symbol.
Ceram, C. W. (1965). Eine Archäologie des Kinos (An Archeology of Cinema).
Reinbek: Rowohlt.
Chatman, S. (1978). Story and discourse: Narrative structure in fiction and film.
Ithaca, N.Y.: Cornell University Press.
Chauvet, J.-M., Brunel Deschamps, E., and Hillaire, C. (1995). La Grotte Chauvet
à Vallon-Pont-d’Arc. Paris: Seuil.
Chomsky, N. (1957/1971). Syntactic structures. The Hague, Netherlands:
Mouton.
Chomsky, N. (1995). The minimalist program. Cambridge, Mass.: MIT Press.
Chomsky, N. (1996). Powers and prospects: Reflections on human nature and the
social order. London: Pluto Press.
Chomsky, N. (2000). New horizons in the study of language and mind. Cambridge:
Cambridge University Press.
Churchland, P. M. (1989). Perceptual plasticity and theoretical neutrality: A reply
to Jerry Fodor. Philosophy of Science, 55, 167–187.
Churchland, P. M. (1992). A neurocomputational perspective: The nature of mind
and the structure of science. Cambridge, Mass.: MIT Press.
Churchland, P. S., Ramachandran, V. S., and Sejnowski, T. J. (1994). A critique
of pure vision. In C. Koch and J. C. Davis (eds.), Large-scale neuronal theories of
the brain (pp. 23–60). Cambridge, Mass.: MIT Press.
Clottes, J. (2001). La Grotte Chauvet: L’art des origines (The Art of the Origins).
Paris: Seuil.
Clowes, M. B. (1971). On seeing things. Artificial Intelligence, 2 (1), 79–116.
Cook, M. (1978). Judgment of distance on a plane surface. Perception and Psy-
chophysics, 23, 85–90.
Cornelius, H. (1908). Elementargesetze der bildenden Kunst: Grundlagen einer prak-
tischen Ästhetik (Elementary Laws of Visual Art: Foundations of a Practical
Aesthetics). Leipzig and Berlin: Teubner.
Costall, A. P. (1990). Seeing through pictures. Word and Image, 6, 273–277.
Cutting, J. E. (1986a). Perception with an eye for motion. Cambridge, Mass.: MIT
Press.
Cutting, J. E. (1986b). The shape and psychophysics of cinematic space. Behavior
Research Methods, Instruments, and Computers, 18, 551–558.
Cutting, J. E. (1987). Rigidity in cinema seen from the front row, side aisle. Journal
of Experimental Psychology: Human Perception and Performance, 13, 323–334.
Cutting, J. E. (1988). Affine distortions in pictorial space: Some predictions for
Goldstein (1987) that La Gournerie (1859) might have made. Journal of Experi-
mental Psychology: Human Perception and Performance, 14, 305–311.
Cutting, J. E. (1997). How the eye measures reality and virtual reality. Behavior
Research Methods, Instruments, and Computers, 29, 27–36.
Cutting, J. E. (1998). Information from the world around us. In J. Hochberg (ed.),
Perception and cognition at century’s end (pp. 69–93). New York: Academic Press.
Cutting, J. E., and Massironi, M. (1998). Pictures and their special status in per-
ceptual and cognitive inquiry. In J. Hochberg (ed.), Perception and cognition at
century’s end: History, philosophy, and theory (pp. 137–168). San Diego: Academic
Press.
Cutting, J. E., and Vishton, P. M. (1995). Perceiving layout and knowing dis-
tances: The integration, relative potency, and contextual use of different informa-
tion about depth. In W. Epstein and S. Rogers (eds.), Perception of space and
motion (pp. 69–117). San Diego, Calif.: Academic Press.
Damisch, H. (1994). The origin of perspective (trans. John Goodman). Cambridge,
Mass.: MIT Press.
Danto, A. (2001). Seeing and showing. Journal of Aesthetics and Art Criticism, 59,
1–10.
Da Silva, J. A. (1985). Scales of perceived egocentric distance in a large open field:
Comparison of three psychophysical methods. American Journal of Psychology,
98, 119–144.
De Bry, T. (1990). Reisen in das östliche und westliche Indien: Reisen in das west-
liche Indien (Amerika) (Travels to east and west Indies; Travels to the West Indies
(America)), ed. G. Sievenich. Berlin and New York: Casablanca Verlag.
Denis, M. (1890/1976). In R. Goldwater and M. Treves (eds.) Artists on art: From
the 14th to the 20th century (p. 380). London: John Murray.
Deregowski, J. B., Muldrow, E. S., and Muldrow, W. F. (1972). Pictorial recognition
in a remote Ethiopian population. Perception, 1, 417–425.
Descartes, R. (1637/1965). Discourse on method, optics, geometry, and meteorology
(trans. Paul J. Olscamp). New York: Bobbs-Merrill.
Diels, H. (1922). Die Fragmente der Vorsokratiker, 4th ed., vol. 1. Berlin: Weid-
mannsche Buchhandlung.
Doorn, A. J. van, Koenderink, J. J., and Ridder, H. de (2001). Pictorial space
correspondence in photographs of an object in different poses. In B. E. Rogowitz and
T. N. Pappas (eds.), Proceedings of SPIE: Human vision and electronic imaging VI.
4299, 321–329.
Dosher, B. A., Sperling, G., and Wurst, S. A. (1986). Tradeoffs between stereopsis
and proximity luminance covariance as determinants of perceived 3D structure.
Vision Research, 26, 973–990.
Dubery, F., and Willats, J. (1972). Drawing systems. London: Studio Vista.
Dubery, F., and Willats, J. (1983). Perspective and other drawing systems. London:
Herbert Press; New York: Van Nostrand Reinhold.
Eby, D. W., and Braunstein, M. L. (1995). The perceptual flattening of three-
dimensional scenes enclosed by a frame. Perception, 24, 981–993.
Epstein, W. (1968). Modification of the disparity-depth relationship as a result
of exposure to conflicting cues. The American Journal of Psychology, 81, 189–197.
Epstein, W. (1982). Percept-percept coupling. Perception, 11, 75–83.
Epstein, W., and Rogers, S. (in press). Percept-percept coupling revisited. In
U. Savardi (ed.), Festschrift in onore di Paolo Bozzi. Padua, Italy: Cleup.
Erens, R. G. F., Kappers, A. M. L., and Koenderink, J. J. (1991). Limits on the per-
ception of local shape from shading. In P. J. Beek, R. J. Bootsma, and P. C. W. van
Wieringen (eds.), Studies in perception and action (pp. 65–71). Amsterdam: Rodopi.
Gillam, B., and Sedgwick, H. A. (1996). The interaction of stereopsis and per-
spective in the perception of depth. Perception, 25 (supplement), 70.
Gillam, B., Sedgwick, H. A., and Cook, M. (1993). The interaction of surfaces
with each other and with discrete objects in stereoscopic depth. Perception, 22
(supplement), 35.
Glennerster, A., and McKee, S. P. (1999). Bias and sensitivity of stereo judge-
ments in the presence of a slanted reference plane. Vision Research, 39, 3057–
3069.
Gogel, W. C. (1993). The analysis of perceived space. In S. Masin (ed.), Founda-
tions of perceptual theory (pp. 113–182). Amsterdam: North-Holland.
Gogh, V. van (1959). The complete letters of Vincent van Gogh: With reproductions
of all the drawings in the correspondence, 2d ed., vol. 1. Greenwich, Conn.: New
York Graphic Society.
Goldsmith, T. H. (1990). Optimization, constraint, and history in the evolution
of eyes. The Quarterly Review of Biology, 65, 281–322.
Gombrich, E. (1956). Art and illusion: A study in the psychology of pictorial rep-
resentation. Princeton, N.J.: Princeton University Press.
Gombrich, E. H. (1959). Art and illusion, part 3: The beholder’s share. London:
Phaidon Press.
Gombrich, E. H. (1960). Art and illusion: A study in the psychology of pictorial
representation. New York: Pantheon.
Gombrich, E. H. (1960/1993). Art and illusion: A study in the psychology of picto-
rial representation. London: Phaidon.
Gombrich, E. H. (1974). “The sky is the limit”: The vault of heaven and pictorial
vision. In E. H. Gombrich, The image and the eye: Further studies in the psychol-
ogy of pictorial representation (pp. 162–171). Ithaca, N.Y.: Cornell University
Press.
Gombrich, E. H. (1982). The image and the eye: Further studies in the psychology
of pictorial representation. Oxford: Phaidon.
Gombrich, E. H. (1983). Art and illusion: A study in the psychology of pictorial
representation. Oxford: Phaidon Press.
Gombrich, E. H. (1987). The art of the Greeks. In R. Woodfield (ed.), Reflections
on the history of art (pp. 11–17). Berkeley: University of California Press.
Goodman, N. (1968). The Languages of art. New York: Bobbs-Merrill.
Goodman, N. (1968/1976). Languages of art: An approach to a theory of symbols.
Indianapolis: Hackett.
Goodman, N. (1969). Languages of Art, 2d ed. Oxford: Oxford University Press.
Gorea, A., and Julesz, B. (1990). Context superiority in a detection task with line-
element stimuli: A low-level effect. Perception, 19, 5–16.
Gregory, R. L. (1977). Eye and brain: The psychology of seeing. London: Weiden-
feld and Nicolson.
Grüsser, O. J., and Landis, T. (1991). Visual agnosias and related disorders. In J.
Cronly-Dillon (ed.), Vision and visual dysfunction. Basingstoke, United Kingdom:
MacMillan.
Haber, R. N. (1980). Perceiving space from pictures: A theoretical analysis. In
M. A. Hagen (ed.), The perception of pictures, vol. 1 (pp. 3–31). New York:
Academic Press.
Hagen, M. A. (1985). There is no development in art. In N. H. Freeman and
M. V. Cox (eds.), Visual order: The nature and development of pictorial representa-
tion (pp. 59–77). Cambridge: Cambridge University Press.
Hagen, M. A. (1986). Varieties of realism: Geometries of representational art.
Cambridge: Cambridge University Press.
Hagen, M. A., and Giorgi, R. (1993). Where’s the camera? Ecological Psychology,
5, 65–84.
Hagen, M. A., and Jones, R. K. (1978). Cultural effects on pictorial perception:
How many words is one picture really worth? In R. D. Walk and H. L. Pick Jr.
(eds.), Perception and experience (pp. 171–212). New York: Plenum Press.
Harris, P. L., and Kavanaugh, R. D. (1993). Young children’s understanding of
pretense. Monographs of the Society for Research in Child Development, 58.
Harway, N. I. (1963). Judgment of distance in children and adults. Journal of
Experimental Psychology, 65, 385–390.
Hauck, G. (1879). Die subjektive Perspektive und die horizontalen Curvaturen des
dorischen Styls: Eine perspektivisch-ästhetische Studie (The subjective perspective
and the horizontal curvatures of the doric style: A perspectival aesthetic study).
Stuttgart, Germany: Wittwer.
Hawking, S. W. (1993). A brief history of time: From the Big Bang to black holes.
London: Bantam Press.
Hecht, H., van Doorn, A., and Koenderink, J. J. (1999). Compression of visual
space in natural scenes and in their photographic counterparts. Perception and
Psychophysics, 61, 1269–1286.
Heider, F. (1926). Ding und Medium. Symposium, 1, 109–157.
Helmholtz, H. von (1867). Handbuch der Physiologischen Optik. Hamburg: Voss.
Helmholtz, H. von (1894). Über den Ursprung der richtigen Deutung unserer
Sinneseindrücke (The origin and correct interpretation of our sense impressions).
Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 7, 81–91.
Helmholtz, H. von (1896). Handbuch der physiologischen Optik, 2d ed. Hamburg:
Voss.
Helmholtz, H. von (1910). Handbuch der Physiologischen Optik. 3d ed., 3d vol.
Hamburg/Leipzig: Voss. (Expanded and coedited with A. Gullstrand, J. v. Kries,
and W. Nagel.)
Helmholtz, H. von (1911/1925). Treatise on physiological optics, 3d ed. (ed. and
trans. J. P. C. Southall). Rochester, N.Y.: Optical Society of America.
Kennedy, J. M., and Domander, R. (1985). Shape and contour: The points of
maximum change are least useful for recognition. Perception, 14, 367–370.
Kennedy, J. M., and Juricevic, I. (1999). Angular subtense and binocular vision:
Worse than worst. In M. A. Grealey and J. A. Thomson (eds.), Studies in perception
and action, vol. 5 (pp. 24–27). Mahwah, N.J.: L. Erlbaum.
Kennedy, J. M., and Merkas, C. (2000). Depictions of motion devised by a blind
person. Psychonomic Bulletin and Review, 7, 700–706.
Kennedy, J. M., and Miyoshi, H. (1995). Combining illusions: A perfect additiv-
ity model fits imperfect additivity results. Paper presented at the meeting of the
Psychonomic Society, November 10–12, Los Angeles.
Kennedy, J. M., Nicholls, A., and Desrochers, M. (1995). From line to outline. In
C. Lange-Kuettner and G. V. Thomas (eds.), Drawing and looking (pp. 62–74).
Hemel Hempstead, United Kingdom: Simon and Schuster.
Kepes, G. (1944). The language of vision. Chicago: Paul Theobald.
Kersten, D., Bülthoff, H. H., Schwartz, B. L., and Kurtz, K. J. (1992). Interaction
between transparency and structure from motion. Neural Computation, 4, 573–589.
Kilpatrick, F. P., and Ittelson, W. H. (1953). The size-distance invariance hypothesis.
Psychological Review, 60, 223–231.
Klee, P. (1925/1981). Pädagogisches Skizzenbuch (Pedagogical sketch book). Mainz
and Berlin.
Klee, P. (1961). Notebooks (The thinking eye), vol. 1, ed. J. Spiller. London: Lund
Humphries.
Klein, F. (1893). Vergleichende Betrachtungen über neuere geometrische
Forschungen (the Erlanger Programm) (Comparative reflection on newer geomet-
rical research). Mathematische Annalen, 43, 63–100.
Knill, D. C., and Kersten, D. (1991). Apparent surface curvature affects lightness
perception. Nature, 351, 228–230.
Koenderink, J. J. (1990). Solid shape. Cambridge, Mass.: MIT Press.
Koenderink, J. J. (1998). Pictorial relief. Philosophical Transactions of the Royal
Society London, A, 356, 1071–1086.
Koenderink, J. J. (1999). Virtual psychophysics. Perception, 28, 669–674.
Koenderink, J. J., and van Doorn, A. J. (1995). Relief: Pictorial and otherwise.
Image and Vision Computing, 13, 321–331.
Koenderink, J. J., and van Doorn, A. J. (1997). The generic bilinear
calibration-estimation problem. International Journal of Computer Vision, 23, 217–234.
Koenderink, J. J., and van Doorn, A. J. (1998). The structure of relief. Advances
in Imaging and Electron Physics, 103, 65–150.
Koenderink, J. J., van Doorn, A. J., and Kappers, A. M. L. (1992). Surface per-
ception in pictures. Perception and Psychophysics, 52, 487–496.
Koenderink, J. J., van Doorn, A. J., and Kappers, A. M. L. (1994). On so-called
paradoxical monocular stereoscopy. Perception, 23, 583–594.
Koenderink, J. J., van Doorn, A. J., and Kappers, A. M. L. (1995). Depth relief.
Perception, 24, 115–126.
Koenderink, J. J., van Doorn, A. J., and Kappers, A. M. L. (1996). Pictorial
surface attitude and local depth comparisons. Perception and Psychophysics, 58,
163–173.
Koenderink, J. J., van Doorn, A. J., Kappers, A. M. L., and Todd, J. T. (2000a).
Ambiguity and the “mental eye” in pictorial relief. Investigative Ophthalmology
and Visual Science, 41, S36.
Koenderink, J. J., van Doorn, A. J., Kappers, A. M. L., and Todd, J. T. (2000b).
Directing the mental eye in pictorial perception. In B. E. Rogowitz and T. N. Pap-
pas (eds.), Proceedings of SPIE: Human vision and electronic imaging, 3959, 2–13.
Koenderink, J. J., van Doorn, A. J., Kappers, A. M. L., and Todd, J. T. (2001).
Ambiguity and the “mental eye” in pictorial relief. Perception, 30, 431–448.
Koenderink, J. J., Kappers, A. M. L., Pollick, F. E., and Kawato, M. (1997).
Correspondence in pictorial space. Perception and Psychophysics, 59, 813–827.
Koffka, K. (1935). Principles of Gestalt psychology. New York: Harcourt.
Koffka, K. (1953). Principles of Gestalt psychology. New York: Harcourt, Brace,
and World.
Kosslyn, S. (1994). Image and the Brain. Cambridge, Mass.: MIT Press.
Kraft, R., and Green, J. S. (1989). Distance perception as a function of photo-
graphic area of view. Perception and Psychophysics, 45, 459–466.
Kratzsch, K. (ed.) (1983). Illuminierte Holzschnitte der Luther-Bibel von 1534
(Illuminated woodcuts of the Luther Bible of 1534). Hanau, Germany: Dausien.
Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to
a nonmetric hypothesis. Psychometrika, 29, 707–719.
Kubovy, M. (1986). The psychology of perspective and Renaissance art. Cambridge:
Cambridge University Press.
Kubovy, M., and Epstein, W. (in press). Internalization: A metaphor we can live
without. Behavioral and Brain Sciences, 24, 618–625.
Künnapas, T. (1968). Distance perception as a function of available visual cues.
Journal of Experimental Psychology, 77, 523–529.
Lakoff, G., and Johnson, M. (1999). Philosophy in the flesh. New York: Basic
Books.
Landy, M. S., Maloney, L. T., Johnston, E. B., and Young, M. (1995). Measure-
ment and modeling of depth cue combination: In defense of weak fusion. Vision
Research, 35, 389–412.
Landy, M. S., Maloney, L. T., and Young, M. (1991). Psychophysical estimation
of the human depth combination rule. In P. S. Shenker (ed.), Proceedings of the
SPIE, 1383, Sensor fusion III: 3-D perception and recognition, 247–254.
Lee, M., and Bremner, G. (1987). The representation of depth in children’s draw-
ings of a table. Quarterly Journal of Experimental Psychology, 39A, 479–496.
Leeman, F., Elffers, J., and Schuyt, M. (1976). Hidden images. New York: Harry
N. Abrams.
Lenk, E. (1926). Über die optische Auffassung geometrisch-regelmässiger
Gestalten (On the optic prehension of geometric regular Gestalts). In F. Krueger
(ed.), Neue Psychologische Studien, vol. 1 (pp. 577–612). Munich, Germany:
Beck’sche Verlagsbuchhandlung.
Leonardo da Vinci (1989). Leonardo on Painting (trans. and ed. M. Kemp,
M. Walker, and M. Kemp). New Haven, Conn.: Yale University Press.
Leslie, A. M. (1987). Pretense and representation: The origins of “theory of mind.”
Psychological Review, 94, 412–426.
Lessing, G. E. (1964). Laokoon oder über die Grenzen der Malerei und Poesie
[Laokoon or on the limits of painting and poetry]. Originally published 1766.
Stuttgart, Germany: Reclam.
Levelt, W. J. M. (1984). Some perceptual limitation in talking about space. In
A. J. van Doorn, W. A. van de Grind, and J. J. Koenderink (eds.), Limits in per-
ception (pp. 323–358). Utrecht, Netherlands: VNU Science Press.
Levin, C. A., and Haber, R. N. (1993). Visual angle as a determinant of perceived
interobject distance. Perception and Psychophysics, 54, 250–259.
Lewis, D. (1969). Convention: A philosophical study. Oxford: Blackwell.
Lindberg, D. C. (1976). Theories of vision from Al-Kindi to Kepler. Chicago: Uni-
versity of Chicago Press.
Loomis, J. M., Da Silva, J. A., Philbeck, J. W., and Fukushima, S. S. (1996).
Visual perception of location and distance. Current Directions in Psychological
Science, 5, 72–77.
Loomis, J. M., and Philbeck, J. W. (1999). Is the anisotropy of perceived 3-D
shape invariant across scale? Perception and Psychophysics, 61, 397–402.
Lopes, D. M. M. (1992). Pictures, styles and purposes. British Journal of
Aesthetics, 32 (4), 330–341.
Lopes, D. M. M. (1996). Understanding pictures. Oxford: Clarendon Press.
Lopes, D. M. M. (1997). Art media and the sense modalities: Tactile pictures.
Philosophical Quarterly, 47, 425–440.
Lübke, W., and Pernice, E. (1921). Die Kunst des Altertums (The art of antiquity).
Esslingen, Germany: Neff.
Luce, R. D., and Krumhansl, C. L. (1988). Measurement, scaling, and psy-
chophysics. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey, and R. D. Luce
(eds.), Stevens’ handbook of experimental psychology (pp. 3–74). New York: Wiley.
Lumsden, E. A. (1980). Problems of magnification and minification: An explana-
tion of the distortions of distance, slant, and velocity. In M. Hagen (ed.), The per-
ception of pictures, vol. 1 (pp. 91–135). New York: Academic Press.
Luneburg, R. K. (1947). Mathematical analysis of binocular vision. Princeton, N.J.:
Princeton University Press.
Luo, R. C., and Kay, M. G. (1992). Data fusion and sensor integration: State-of-
the-art 1990s. In M. A. Abidi and R. C. Gonzalez (eds.), Data fusion in robotics
and machine intelligence (pp. 7–135). San Diego, Calif.: Academic Press.
Lyons, J. (1977). Semantics, 2 vols. Cambridge: Cambridge University Press.
Lythgoe, J. N. (1979). The ecology of vision. Oxford: Clarendon Press.
Mach, E. (1903). Analyse der Empfindungen (The analysis of sensations). Jena,
Germany: Fischer.
Mackavey, W. R. (1980). Exceptional cases of pictorial perspective. In M. A.
Hagen (ed.), The perception of pictures, vol. 1 (pp. 213–223). New York: Academic
Press.
Margerie, A. de (ed.) (1994). Gustave Caillebotte, 1848–1894. Paris: Réunion des
Musées Nationaux.
Marler, P. (1999). On innateness: Are sparrow songs “learned” or “innate”?
In M. D. Hauser and M. Konishi (eds.), The design of animal communication
(pp. 293–318). Cambridge, Mass.: MIT Press.
Marr, D. (1982). Vision: A computational investigation into the human representa-
tion and processing of visual information. San Francisco: W.H. Freeman.
Marr, D., and Poggio, T. (1976). Cooperative computation of human stereo
vision. Science, 194, 283–287.
Massaro, D. W., and Cohen, M. M. (1993). The paradigm and the fuzzy logical
model of perception are alive and well. Journal of Experimental Psychology: Gen-
eral, 122, 115–124.
Maunsell, J., Sclar, G., Nealey, T., and DePriest, D. (1991). Extraretinal
representations in area V4 in the macaque monkey. Visual Neuroscience, 7, 561–573.
Mauries, P. (1997). Le trompe l’oeil. Paris: Gallimard.
Mausfeld, R. (2002). The physicalistic trap in perception theory. In D. Heyer and
R. Mausfeld (eds.), Perception and the physical world (pp. 75–112). Chichester,
United Kingdom: Wiley.
Mausfeld, R. (2003). The dual coding of colour: “Surface colour” and “illumina-
tion colour” as constituents of the representational format of perceptual primi-
tives. In R. Mausfeld and D. Heyer (eds.), Colour perception: From light to object.
Oxford: Oxford University Press.
Mausfeld, R., and Andres, J. (2002). Second-order statistics of colour codes mod-
ulate transformations that effectuate varying degrees of scene invariance and illu-
mination invariance. Perception, 31, 209–224.
Maynard, P. (1991). Real imaginings. Philosophy and Phenomenological Research,
51 (2), 389–394.
Maynard, P. (1994). The contest of meaning (book review). Journal of Aesthetics
and Art Criticism, 49 (4), 390–392.
Maynard, P. (1996). Perspective’s places. Journal of Aesthetics and Art Criticism,
54 (1), 23–40.
Salt, B. (1983). Film style and technology: History and analysis. London: Starword.
Sanders, A. F. (1970). Some aspects of the selective process in the functional
visual field. Ergonomics, 13, 101–117. (Cited in Ikeda and Takeuchi 1975).
Sartre, J.-P. (1940). L’imaginaire. Psychologie phénoménologique d’imagination
(The imaginary: Phenomenal psychology of imagination). Paris: Gallimard.
Scharf, A. (1968). Art and photography. London: Penguin.
Schier, F. (1986). Deeper into pictures. Cambridge: Cambridge University Press.
Scholz, O. R. (1991). Bild, Darstellung, Zeichen. Philosophische Theorien bildhafter
Darstellung (Image, representation, symbol: Philosophical theories of pictorial
representation). Freiburg and Munich: Alber.
Schöne, W. (1954). Über das Licht in der Malerei (On the light in painting). Berlin:
Gebr. Mann.
Schriever, W. (1925). Experimentelle Studien über stereoskopisches Sehen (Exper-
imental studies of stereoscopic vision). Zeitschrift für Psychologie, 96, 113–170.
Schuster, D. H. (1964). A new ambiguous figure. American Journal of Psychology,
77, 673.
Schwartz, B. J., and Sperling, G. (1983). Luminance controls the perceived 3-D
structure of dynamic 2-D displays. Bulletin of the Psychonomic Society, 21,
456–458.
Schwartz, R. (1994). Vision: Variations on some Berkeleian themes. Cambridge, Mass.:
Blackwell.
Schwartz, R. (1996). Directed perception. Philosophical Psychology, 9, 81–91.
Searle, J. R. (1993). Metaphor. In A. Ortony (ed.), Metaphor and thought, 2d ed.
(pp. 83–111). Cambridge: Cambridge University Press.
Sedgwick, H. A. (1973). The visible horizon: A potential source of visual infor-
mation for the perception of size and distance. Doctoral dissertation, Cornell
University. Dissertation Abstracts International, 34, 1301B–1302B. (University
Microfilms No. 73–22,530.)
Sedgwick, H. A. (1980). The geometry of spatial layout in pictorial
representations. In M. A. Hagen (ed.), The perception of pictures, vol. 1 (pp. 33–90). New
York: Academic Press.
Sedgwick, H. A. (1983). Environment-centered representation of spatial layout:
Available visual information from texture and perspective. In J. Beck, B. Hope,
and A. Rosenfeld (eds.), Human and machine vision (pp. 425–458). New York:
Academic Press.
Sedgwick, H. A. (1986). Space perception. In K. R. Boff, L. Kaufman, and J. P.
Thomas (eds.), Handbook of perception and performance, vol. 1: Sensory
processes and perception (pp. 21.1–21.57). New York: Wiley.
Sedgwick, H. A. (1987a). Layout2: A production system modeling visual perspec-
tive information. In Proceedings of the IEEE First International Conference on
Computer Vision (pp. 662–666). Washington, DC: IEEE Computer Society Press.
Spencer, J. R. (1966). Leon Battista Alberti on Painting. New Haven, Conn.: Yale
University Press.
Spivak, M. (1975). Differential geometry, vol. 3. Berkeley, Calif.: Publish or Perish,
Inc.
Steinberg, S. (1966). Le masque. Paris: Maeght Editeur.
Stemberger, G. (1990). 2000 Jahre Christentum–Illustrierte Kirchengeschichte in
Farbe (2000 years of Christianity—Illustrated ecclesiastic history in color). Erlan-
gen, Germany: Karl Müller.
Stern, W. (1930). Psychology of early childhood. London: George Allen and Unwin.
Stevens, K. A., and Brookes, A. (1988). Integrating stereopsis with monocular
interpretations of planar surfaces. Vision Research, 28, 371–386.
Stevens, K. A., Lees, M., and Brookes, A. (1991). Combining binocular and
monocular curvature features. Perception, 20, 425–440.
Stevens, S. S. (1951). Mathematics, measurement, and psychophysics. In S. S.
Stevens (ed.), Handbook of Experimental Psychology (pp. 1–49). New York: Wiley.
Stillings, N., Feinstein, M., Garfield, J., Rissland, E., Rosenbaum, D., Weisler, S.,
and Baker-Ward, L. (1987). Cognitive Science: An Introduction. Cambridge, Mass.:
MIT Press.
Strieder, P. (ed.) (1996). Dürer. Augsburg, Germany: Bechtermünz.
Strubecker, K. (1956). Einführung in die höhere Mathematik mit besonderer
Berücksichtigung ihrer Anwendungen auf Geometrie, Physik, Naturwissenschaften
und Technik (Introduction to higher mathematics with particular emphasis on
its application to geometry, physics, natural science and technology), vol. 1:
Grundlagen (Foundations). Munich: R. Oldenbourg Verlag.
Swedlund, C. (1981). Photography: A handbook of history, materials, and pro-
cesses, 2d ed. New York: Holt, Rinehart, and Winston.
Talbot, W. H. F. (1844–1846/1969). The pencil of nature, unpaginated reprint.
New York: Da Capo.
Taylor, B. (1719). New principles of linear perspective: Or the art of designing on a
plane the representations of all sorts of objects, in a more general and simple method
than has been done before. London: Printed for R. Knaplock at the Bishop’s Head
in St. Paul’s Churchyard.
Teghtsoonian, R. (1973). Range effects in psychophysical scaling and a revision
of Stevens’ law. American Journal of Psychology, 86, 3–27.
Ten Doesschate, G. (1964). Perspective: Fundamentals, controversials, history.
Nieuwkoop, Netherlands: B. de Graaf.
Te Peerdt, E. (1899). Das Problem der Darstellung des Momentes der Zeit in
den Werken der malenden und zeichnenden Kunst (The problem of representing a
moment in time in paintings and drawings). Strasbourg, Germany: Heitz.
Terzopoulos, D. (1986). Integrating visual information from multiple sources. In
A. Pentland (ed.), From pixels to predicates (pp. 111–142). Norwood, N.J.: Ablex.
  pictorial relief and, 251, 266–267, 280–281
  psychophysics and, 37–38
  reality and, 27–28
  segregation, 89–92
  surfaces and, 40–41
  triggering and, 30–36
  vagueness and, 51–56, 92
Curvilinearity, 181–182
Cutting, James E., xvii, 104, 117, 215–238, 405

Danto, Arthur, 107, 109–110
da Vinci, Leonardo, 180, 225, 247
de Barde, Alexandre I. Leroy, 7
De Bry, Theodor, 369, 371, 373
Della Pittura (Alberti), 127
Denis, Maurice, 247
de Paolo, Giovanni, 132
Depth. See Dimension
Descartes, René, 65
Dimension, 211n11, 319
  absolute, 308–311
  art and, 148–149
  borders and, 353–354 (see also Borders)
  conjoint representation and, 25–56 (see also Conjoint representations)
  continuous transition and, 26
  convention and, 147–148, 157
  cross-talk and, 71–74
  cue integration and, 90
  environmental space and, 67–71, 218–231
  eye and, 79–80
  foreshortening and, 322, 340–345
  horizon ratio and, 303–318
  illusion and, 24–25
  monocular depth enhancement and, 90
  occlusion and, 22–23
  parameter setting and, 35
  perspective fallacies and, 198
  pictorial relief and, 262–266 (see also Pictorial relief)
  projection and, 207–208
  proximal mode and, 56
  reality and, 27–28
  relative, 307–308
  representational primitives and, 36–38
  scale and, 221–222, 236
Direct perception, 61–64
Direct realism, 99
Dissociation, 96–97
Distal mode, 85–86, 93
Domander, Ramona, 338
Drawing, 133–134, 149–150
  borders and, 324–336, 346, 348–349, 353–354
  children and, 324–325, 346
  contour ambiguity and, 349–353
  curved lines and, 336–337
  discontinuities and, 346–348
  element concentration and, 337–340
  foreshortening and, 340–345
  induction and, 324–327
  layout and, 231–236
  line junctions and, 337
  optical edge and, 349–353
  outline and, 324–326
Dual perception
  attentional shift and, 81
  conjoint representations and, 25–56 (see also Conjoint representations)
  content strategies and, 99–122
  convention and, 105–113
  cross-talk and, 91
  cueing and, 89–92
  dissociation and, 96–97
  ecological specificity and, 101
  equivalence and, 88–89
  functional aspects of, 92–94
  imagination and, 78
  inverse projection and, 86–88
  language and, 98
  location of, 79–86
  meaning of, 80–82
  mirrors and, 85
  perceived occlusion and, 92
  proximal mode and, 84, 93
  reinforcement and, 117
  related forms of, 83–86
  seeing-in and, 3–15
  simultaneity and, 82
  spatial perception and, 61–75
  vagueness and, 92
  window metaphor and, 94–97
Dürer, Albrecht, 203, 370–371

Eby, David, 72
Ecological approach, 101, 303
Egyptian method, 375–376
Eidos, 204, 208
Eisenstein, Sergei, 47
Elementary Laws of the Visual Arts (Cornelius), 358
Emilie Ambre (Manet), 3–4, 13–14
Environmental space, 218
  borders, 324–326 (see also Borders)
  information in, 221–225
  layout perception and, 231–236
  lens effects and, 225–229
  scale convergence and, 221–222
  shape of, 219–225
Environment-centered perception, 67–71
Transmission of the Divine Spirit, The (Pozzo), 82
Transparency, 40
Triggering, 30–36, 51
Trompe-l’oeil pictures, 5, 58, 82, 96, 172
Turhan, Mümtaz, 23
Twofoldness, 3
  seeing-in and, 4–15
  tallying effect and, 8
Two Venetian Women (Dürer), 370

Ultramodernism, 195

Vagueness, 51–53
  perceived, 92
  perspective and, 193–194
  proximal mode and, 54–56
van Doorn, Andrea J., xvii–xviii, 66, 239–299, 405
Van Essen, David, 113, 115
Vanishing point. See Perspective
van Sichem, Christof, 374
Veridicality, 26, 57, 248–251
Views, 64, 70, 377–378
  alternative approaches and, 207–210
  beholder’s share and, 277–278
  close distance constructions and, 357–360
  Egyptian, 375–376
  gauge figures and, 252–256
  Hauck and, 360–363
  horizon ratio and, 303–318
  ideography and, 366–367
  modes of, 264–266
  multiple moments and, 368–375
  multiple pictures and, 287–291
  optical array and, 307–314
  perspective fallacies and, 203–207 (see also Perspective)
  pictorial relief and, 264–266 (see also Pictorial relief)
  POV shots and, 369–370
  resemblance, 150–165, 167–177
  reverse angle shots and, 369–370
  rotation and, 363–364
  scenography and, 364–366
  seeing modes and, 247–248
  theory of pictures and, 146–149
  Thiery and, 367–368
  veridicality and, 248–251
  visual fields and, 355–357
  zoography and, 364–366
Vir Heroicus Sublimis (Newman), 8
Virtual reality, 63
Vishton, Peter, 221
Vision. See also Dual perception; Eyes
  binocular, 329
  borders and, 328–336 (see also Borders)
  computer experiments and, 71–72
  conjoint representations and, 25–56 (see also Conjoint representations)
  convention and, 105–113
  cross-talk and, 71–74, 91, 94–96
  directed, 104
  ecological specificity and, 101
  environment-centered perception and, 67–71
  figure-ground ambiguity and, 41
  illusion and, 24
  imagination and, 78
  induction and, 324–328
  mirrors and, 85
  monocular, 24, 328
  motion capture and, 100–101
  occlusion and, 22, 37–38, 92, 202, 330
  optic array and, 65
  parameter setting and, 30–36
  perspective theory and, 184–186 (see also Perspective)
  psychophysics and, 37–41
  redeployment and, 115–116
  representational content and, 3–13
  retinal patterns and, xi
  spatial layout and, 61–75
  stereoscopic, 63–64, 66
  vagueness and, 51–56, 92
  visibility limits and, 9–13
Vista space, 224–225
Visual fields, 355–357
Visual rays, 281
Visual semiworlds, 99–100
von Brentano, Franz, 243
von Helmholtz, Hermann, xi, 39, 221, 356
von Uexküll, Jakob, 31

Wallach, Hans, 34–35
Walton, Kendall, 301, 319
Weeping Woman (Picasso), 48
Wickhoff, Franz, 374–375
Willats, John, xvi, 125–143, 218, 406
Wimmer, Heinz, 335
Window metaphor, 94. See also Parameters
  cross-talk and, 71–74, 91, 94–96
  dissociation and, 96–97
  spatial dissociation and, 96–97
With Green Stockings (Klee), 138–139, 141
Wollheim, Richard, xiv, 3–15, 147, 406

Yellott, John, 23

Zöllner illusions, 367
Zoography, 364–366