Shimon Edelman
School of Cognitive and Computing Sciences
University of Sussex at Brighton
Falmer BN1 9QH, UK
shimone@cogs.susx.ac.uk
www.cogs.susx.ac.uk/users/shimone
Elise Breen
Department of Neurology
Medical College of Wisconsin
Milwaukee, Wisconsin 53226, USA
December 1998
Abstract. Representational systems need to employ symbols as internal stand-ins for distal quantities and events. Barsalou's ideas go a long way towards making the symbol system theory of representation more appealing, by delegating one critical part of the representational burden -- dealing with the constituents of compound structures -- to image-like entities. The target paper, however, leaves the other critical component of any symbol system theory -- the compositional ability to bind the constituents together -- underspecified. We point out that the binding problem can be alleviated if a perceptual symbol system is made to rely on image-like entities not only for grounding the constituent symbols, but also for composing these into structures.
Supposing the symbol system postulated by Barsalou is perceptual through and through -- what then? The target article outlines an intriguing and exciting theory of cognition in which (1) well-specified, event- or object-linked percepts assume the role traditionally allotted to abstract and arbitrary symbols, and (2) perceptual simulation is substituted for processes traditionally believed to require symbol manipulation, such as deductive reasoning. We take a more extreme stance on the role of perception (in particular, vision) in shaping cognition, and propose, in addition to Barsalou's postulates, that (3) spatial frames, endowed with a perceptual structure not unlike that of the retinotopic space, pervade all sensory modalities and are used to support compositionality.
In the target article too, the concept of a frame is invoked as a main explanatory tool in the discussion of compositionality. The reader is even encouraged to think of a frame as a structure with slots into which pointers to things and events can be inserted. This, however, turns out to be merely a convenient way to visualize an entity borrowed from Artificial Intelligence: a formal expression in several variables, each of which needs to be bound to a thing or an event. An analogy between this use of frames and the second labor of Heracles suggests itself: opting for perceptual symbols without offering a perceptual solution to the binding problem is like chopping off the Hydra's heads without searing the stumps.
The good news is that there is a perceptually grounded alternative to abstract frames: spatial (e.g., retinotopic) frames. The origins of this idea can be traced to a number of sources. In vision, it is reminiscent of O'Regan's call to consider the visual world (which necessarily possesses an apparently two-dimensional spatial structure) as a kind of external memory [O'Regan, 1992]. In language, a model of sentence processing based on spatial data structures (two-dimensional activation maps) was proposed a few years ago [Miikkulainen, 1993]. In a review of the latter work, one of us pointed out that recourse to a spatial substrate in the processing of temporal structure may lead to a welcome unification of theories of visual and linguistic representation [Edelman, 1994].
From the computational standpoint, such a unification could rest on two related principles. The first is grounding the symbols [Harnad, 1990] in external reality; this can be done by imparting to the symbols a structure that helps both to disambiguate their referents and to manipulate the symbols so as to simulate the manipulation of the referent objects. This principle is already incorporated into Barsalou's theory (cf. his figure 6). The second principle, which can be seen as a generalization of the first, is grounding the structures built of symbols.
In the case of vision, structures (that is, scene descriptors) can be naturally grounded in their distal counterparts (scenes), simply by representing the scene internally in a spatial data structure (as envisaged by O'Regan). This can be done by ``spreading'' the perceptual symbols throughout the visual field, so that in the representation (as in the world it reflects) each thing is placed literally where it belongs. To keep down the hardware costs, the system may use channel coding [Snippe and Koenderink, 1992], i.e., represent the event ``object A at location L'' by a superposition of a few events of the form ``object A at location L_i''.
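As a concrete illustration of the latter point, consider the following minimal sketch (in Python) of the channel-coding idea: a location is encoded by the graded responses of a handful of broadly tuned channels and recovered by a weighted average of their preferred locations. The number of channels, their centres, and the Gaussian tuning profile are our illustrative assumptions, not a claim about the actual scheme of Snippe and Koenderink, or about the neural code.

    import numpy as np

    # A location L is carried by the graded responses of a few broadly
    # tuned channels, rather than by a dedicated unit per location.
    # The channel centres and the tuning width below are illustrative.
    CENTRES = np.array([0.0, 2.5, 5.0, 7.5, 10.0])  # preferred locations L_i
    SIGMA = 2.0                                      # tuning width

    def encode(location):
        """Response of each Gaussian channel to an object at `location`."""
        return np.exp(-((location - CENTRES) ** 2) / (2 * SIGMA ** 2))

    def decode(responses):
        """Recover the location as the response-weighted mean of centres."""
        return float(np.dot(responses, CENTRES) / responses.sum())

    r = encode(3.7)             # superposition of ``object A at L_i'' events
    print(np.round(r, 3))       # graded activity over the five channels
    print(decode(r))            # decoded location, close to 3.7

Note that a smooth distal quantity (the location) is represented here by a small fixed set of units, which is what keeps the hardware costs down.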
In the case of language, structures do not seem to have anything like a natural grounding in any kind of spatial substrate (not counting the parse trees that linguists of a certain persuasion like to draw on two-dimensional surfaces). We conjecture, however, that such a grounding is conceivable, and can be used both for representing and manipulating semantic spaces and for holding syntactic structures in memory (which would then need to be merely a replica, or perhaps a shared part, of the visual scene memory). To support this conjecture, one may look for a ``grammar'' of spatial relations that would mirror the requisite theoretical constructs invented by linguists for their purposes (the ``syntactic'' approach to vision, popular for a brief time in the 1980s, may have remained barren because it aimed to explain vision in terms of language, and not vice versa). Alternatively, it may be preferable to demonstrate performance based on our idea (e.g., by implementing a version of Barsalou's system in which spatial buffers play the role of frames, as sketched below), rather than to argue futilely about theories of competence.
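To make the notion of spatial buffers playing the role of frames somewhat more concrete, here is a toy sketch (in Python) in which constituents are bound to thematic roles by where they are placed in a one-row spatial buffer. The particular layout convention (agent to the left of the action, patient to its right) is purely our assumption, introduced for the sake of illustration; nothing in our proposal hinges on this choice.

    # A spatial buffer standing in for an abstract frame: constituents
    # are bound to roles by WHERE they are written in a retinotopic-like
    # map, not by labelled slots. The left-to-right layout (agent,
    # action, patient) is an assumption made for illustration only.
    GRID_W = 9  # width of the (one-row) spatial buffer in this sketch

    def place(buffer, symbol, x):
        """Bind `symbol` by writing it at horizontal position x."""
        buffer[x] = symbol

    def role_of(x):
        """Read a constituent's role off its position in the buffer."""
        if x < GRID_W // 3:
            return "agent"
        if x < 2 * GRID_W // 3:
            return "action"
        return "patient"

    buffer = {}
    place(buffer, "dog", 1)     # agent region of the frame
    place(buffer, "chases", 4)  # action region
    place(buffer, "cat", 7)     # patient region

    for x, symbol in sorted(buffer.items()):
        print(symbol, "->", role_of(x))
    # dog -> agent, chases -> action, cat -> patient: the ``slots'' are
    # regions of the spatial frame, and binding is literal placement.

The point of the sketch is that no abstract variable binding is needed: the structure of the compound is recoverable from the spatial arrangement of its grounded constituents.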
In neurobiology, perhaps the best piece of evidence for a perceptual symbol system of the sort proposed by Barsalou is provided by the phantom limb phenomena [Ramachandran and Hirstein, 1998]. These can set in very rapidly [Borsook et al., 1998], are known to occur in congenitally limb-deficient subjects and in early-childhood amputees [Melzack et al., 1997], and may even be induced in normal subjects [Ramachandran and Hirstein, 1998]. In a beautiful experiment, Ramachandran and Rogers-Ramachandran (1996) superimposed a mirror image of the intact arm of amputees onto the space occupied by the phantom arm, and found that movements of the mirrored intact hand produced corresponding kinesthetic sensations in the phantom hand, even in a subject who had not experienced feelings of movement in his phantom hand for some ten years prior to testing. Likewise, touching the mirrored intact hand produced corresponding, well-localized touch sensations in the phantom hand. These findings support the idea of the world -- somatosensory as well as visual -- serving as an external memory [O'Regan, 1992], and suggest a close relationship between the visuospatial and the tactile/proprioceptive representations of ``body space'' in normal subjects. Interestingly, these representations may be linked, in turn, to the mental lexicon: electromyogram (EMG) responses to words with semantic content relating to pain were found to differ significantly between the stumps of amputees with chronic phantom limb pain and the intact contralateral limbs of the same subjects [Larbig et al., 1996]. Finally, the phantom phenomenon is not limited to the percept of ``body space'' but may also be demonstrated in other modalities, notably in the auditory system [Mühlnickel et al., 1998]. All this suggests that perceptual symbols, along with a spatial frame provided by the experience of the external world, may (1) solve the symbol grounding problem and (2) circumvent the binding problem -- two apparently mortal heads of the Hydra that besets symbolic theories of knowledge representation.