Narrative Spaces: bridging architecture and entertainment via interactive technology
MIT Media Lab
Our society’s modalities of communication are rapidly changing. Large panel displays and screens are being installed in many public spaces, ranging from open plazas, to shopping malls, to private houses, to theater stages, classrooms, and museums. In parallel, wearable computers are transforming our technological landscape by reshaping the heavy, bulky desktop computer into a lightweight, portable device that is accessible to people at any time.
Computation and sensing are moving from computers and devices into the environment itself. The space around us is instrumented with sensors and displays, reflecting a widespread need to combine the information space with our physical space. This combination of large public and miniature personal digital displays, together with distributed computing and sensing intelligence, offers unprecedented opportunities to merge the virtual and the real, the information landscape of the Internet with the urban landscape of the city, and to transform digital animated media into storytellers, both in public installations and through personal wearable technology.
This paper describes technological platforms built at the MIT Media Lab between 1994 and 2002 that contribute to defining new trends in architecture that merge virtual and real spaces, and that are reshaping the way we live in and experience the museum, the house, the theater, and the modern city.
1. A new architecture for the information society
Architecture is no longer simply the play of masses in light. It now embraces the play of digital information in space.
William J. Mitchell, Dean of MIT’s School of Architecture and Planning, in: e-topia, pg. 41
Our daily lives are characterized by constant access to, and processing of, a large quantity and variety of information. In the last decade, the rapid spread of the information superhighways and the remarkable progress in the performance and processing power of computers, paralleled by a drop in their cost, have brought about a profound transformation of Western societies. In addition to being transmitted by traditional media, such as television, radio, newspapers, books, the telephone, and the mail, information reaches us in electronic form through home and office computers, public billboards, private hand-held PDAs, cellular phones, and soon even our wrist-worn watches and clothes. The rapid and efficient global exchange of data between individuals and organizations outlines new social, economic, and cultural models based on the exchange of knowledge. The information society is defined by the primary role of information: power and growth are associated with our ability to receive, store, process, and transmit information instantaneously.
Screens are everywhere: from the large billboards commonly embedded in the contemporary urban cityscape, to the video walls that welcome us in the entrance halls of corporate headquarters, the desktop computer monitor in our home, the PDA in our pocket, and the tiny private-eye screens of wearable computers [figures 1 and 2].
We split our daily activities between the real and the digital realm. More and more frequently we shop, visit the library, meet and chat with people, manage our finances, play, plan our entertainment, and even date over the Internet. These profound transformations of our lifestyle demand a new architecture that supports these new modalities of communication and living.
Space needs to be redesigned to favor information exchange through video walls, portable devices, and private-eye displays. Computation and sensing will move from computers and devices into the environment itself. The space around us will increasingly be instrumented with sensors and displays, reflecting the widespread need to combine the information space with our physical space. “Augmented reality” and “mixed reality” are the terms most often used to refer to this type of media-enhanced interactive space.
Several scenarios arise from the encounter and blending of the media design and architectural disciplines. Yet architecture’s agenda needs to encompass not just the design of new media- and information-enhanced spaces; it should also extend to investigating natural modalities of human-computer interaction that facilitate communication through cyberspace. Even today, the interfaces available for people to communicate electronically are limited to the primitive, low-bandwidth keyboard and mouse attached to desktop or mobile computers.
This paper describes technological platforms built at the MIT Media Lab between 1994 and 2002 that contribute to defining new trends in architecture that merge virtual and real spaces, and that are reshaping the way we live in and experience, specifically, the museum, the theater, the house, and the modern city. These platforms are grounded in research on real-time, computer-vision-based human-computer interaction, sensor fusion, and the mathematical modeling of perceptual intelligence. Our focus is on narrative spaces: sensor-enabled, people-driven, media-augmented interactive indoor or outdoor spaces that convey information as (audio-)visual micro-stories, or more simply as three-dimensional information landscape visualizations.
This paper also wishes to highlight that, in the author’s view, it is no coincidence that the contribution to the new architecture for the information society illustrated here comes from a research group with a strong technical and scientific background, combined with creativity, a sense for architecture and space design, knowledge of filmmaking, and attention to social change and needs. Specifically, the author’s knowledge of statistical modeling, image processing, and Bayesian networks has enabled her and her collaborators to build a whole series of platforms, experiment with various scenarios of interactive spaces, iterate the design process as many times as needed, and progressively adapt the science and technology of interaction to emerging design issues, so as to bring both the architectural and the technical aspects of the presented projects to the desired level of quality and performance.
2. Enabling Technologies
The problem, in my opinion, is that our current computers are both deaf and blind: they experience the world only by way of a keyboard and a mouse. Even multimedia machines, those that handle audiovisual signals, as well as text, simply transport strings of data. They do not understand the meaning behind the characters, sounds, and pictures they convey. I believe computers must be able to see and hear what we do before they can prove truly helpful. … To that end, my group at the Media Laboratory at the Massachusetts Institute of Technology has recently developed a family of computer systems for recognizing faces, expressions, and gestures. The technology has enabled us to build smart rooms … furnished with cameras and microphones that relay their recordings to a nearby network of computers.
The computers assess what people in the smart rooms are saying and doing. Thanks to this connection, visitors can use their actions, voices and expressions – instead of keyboards, sensors or goggles – to control computer programs, browse multimedia information or venture into realms of virtual reality.
Alex Pentland, head of the Perceptual Computing Group at the MIT Media Lab, in: Scientific American, April 1996, pg. 68 and 71.
An architect who wishes to reshape the space around us and our bodies, transforming them into technology-augmented devices for information exchange and communication, needs sensors that are reliable and robust, and (mathematical) modeling tools that allow the system to understand the public’s intentions and coordinate a narration. Information authoring tools need to be able to take input from people and deliver a (personalized) story articulated not only over time but also over space. Just as humans use vision as their main sensing modality to perceive and understand their surroundings, the narrative spaces presented here predominantly use real-time computer vision to locate people in space and understand what they are doing. This section offers a brief overview of the technologies and requirements that the author believes are important for people-driven narrative space design. These are the technologies used to build the narrative spaces presented in the following section.
Applications such as unencumbered virtual reality interfaces, performance spaces, and information browsers all share the need to track and interpret human action. The first step in this process is identifying and tracking key features in a robust, real-time, and non-intrusive way. Computer vision is a tool capable of solving this problem across many situations and application domains. Using real-time computer vision techniques, we are able to interpret people’s posture, movement, gestures, and identity. Wren and others have shown that a system with an image-based, two-dimensional description of the human body as a set of adjacent color regions (head/torso/hands/feet), a MAP (maximum a posteriori) estimator for color pixel classification, and a Kalman filter applied to the tracked features is a very powerful tool for describing the human body in motion, mathematically and computationally, in real time. Similar maximum likelihood statistical approaches are also effective in stereo vision tracking, to locate body features more accurately in 3-D space when pointing direction and accurate depth information are needed. Hidden Markov Models, and more recently Bayesian networks, have been used successfully to classify human movements and gestures.
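As a rough illustration of the two statistical ingredients named above, the Python sketch below classifies pixel colors with a MAP rule over per-class Gaussian color models, takes the centroid of one class as a tracked feature, and smooths that feature over time with a constant-velocity Kalman filter. It is a simplified, hypothetical sketch, not the Pfinder implementation: it assumes diagonal-covariance RGB models with invented means, variances, and priors (`CLASSES`), ignores the spatial component of each blob, and filters the x and y coordinates independently; all names are illustrative.

```python
import math
from dataclasses import dataclass, field

# Hypothetical class models: name -> (prior, RGB mean, RGB variance). Invented values.
CLASSES = {
    "background": (0.7, (40.0, 40.0, 40.0), (400.0, 400.0, 400.0)),
    "skin": (0.3, (200.0, 150.0, 120.0), (300.0, 300.0, 300.0)),
}

def log_gauss_diag(x, mean, var):
    """Log-density of a color vector under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def map_label(pixel, classes):
    """MAP classification: argmax over classes of log prior + log likelihood."""
    return max(classes, key=lambda k: math.log(classes[k][0])
               + log_gauss_diag(pixel, classes[k][1], classes[k][2]))

def blob_centroid(image, classes, label):
    """Centroid (x, y) of the pixels whose MAP label is `label`; image[y][x] is RGB."""
    pts = [(x, y) for y, row in enumerate(image) for x, px in enumerate(row)
           if map_label(px, classes) == label]
    if not pts:
        return None
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis; state = [position, velocity]."""
    q: float = 1e-2   # process noise added to each state variance per step (assumed)
    r: float = 4.0    # measurement noise variance, pixels^2 (assumed)
    x: list = field(default_factory=lambda: [0.0, 0.0])
    P: list = field(default_factory=lambda: [[100.0, 0.0], [0.0, 100.0]])

    def predict(self, dt: float = 1.0) -> float:
        p, v = self.x
        self.x = [p + dt * v, v]
        (a, b), (c, d) = self.P
        # P <- F P F^T + diag(q, q), with F = [[1, dt], [0, 1]]
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.x[0]

    def update(self, z: float) -> float:
        (a, b), (c, d) = self.P
        s = a + self.r                 # innovation variance, H = [1, 0]
        k0, k1 = a / s, c / s          # Kalman gain
        y = z - self.x[0]              # innovation: measured minus predicted position
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]
        return self.x[0]
```

In use, each frame’s `blob_centroid` output would feed `update` on one `AxisKalman` per coordinate after a `predict` step; when the feature is briefly lost, calling `predict` alone carries the estimate forward.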
2.2. Robustness of multimodal perception
Robust sensing is the premise for correctly interpreting the user’s intention. Monosensor applications, which rely on a single sensor modality to acquire information about people, are brittle and prone to error. However well that one sensor works on its own, whether it is a camera, a radar, or an electric field sensor, it only provides the system with a single view of what is going on. For a body-driven interactive application to respond reliably and robustly to a large number of people on a daily basis, as needed in a museum, or to meet the challenges of the variable and unpredictable factors of a real-life situation, we need to rely on a variety of sensors that cooperate to gather correct and reliable measurements on and about the user. Cooperation among sensor modalities with various degrees of redundancy and complementarity can provide robust, accurate perception. We can use the redundancy of the sensors to register the data they provide with one another. We then use the complementarity of the sensors to resolve ambiguity or reduce errors when an environmental perturbation affects the system.
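One simple way to make the redundancy/complementarity idea concrete is sketched below in Python. It is a hypothetical, textbook-style rule, not the fusion method used in the author’s systems: several sensors report the same quantity (here, a visitor’s position along one axis, with invented sensor names and variances); a median-based consistency gate cross-checks them, standing in for the use of redundancy, and an inverse-variance weighted mean of the readings that pass lets the unaffected sensors carry the estimate when one is perturbed.

```python
import statistics
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class Reading:
    sensor: str      # hypothetical sensor name, e.g. "camera"
    value: float     # estimated visitor position along one axis, metres
    variance: float  # the sensor's own uncertainty, metres^2

def fuse(readings: Sequence[Reading], gate: float = 3.0) -> Optional[float]:
    """Gate redundant readings against their median, then inverse-variance average.

    A reading is dropped if it lies more than `gate` of its own standard
    deviations from the median of all readings (e.g. a camera blinded by a
    lighting change). The survivors are combined so the most certain sensor
    dominates while the others still contribute.
    """
    valid = [r for r in readings if r.variance > 0]
    if not valid:
        return None
    ref = statistics.median([r.value for r in valid])
    kept = [r for r in valid if abs(r.value - ref) <= gate * r.variance ** 0.5]
    if not kept:  # no consensus at all: fall back to every valid reading
        kept = valid
    weights = [1.0 / r.variance for r in kept]
    return sum(w * r.value for w, r in zip(weights, kept)) / sum(weights)
```

For example, a camera at 2.0 m (variance 0.01), floor pads at 2.1 m (0.04), and a disturbed radar at 5.0 m (0.04) fuse to about 2.02 m: the radar fails the gate, and the two consistent sensors are weighted by their confidence.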
2.3. Context-based data interpretation
To make good use of reliable measurements about the user, we need to be able to interpret our measurements in the context of what the user is trying to do with the digital media, or what we actually want people to do to get the most out of the experiences we wish to offer.
The same or a similar gesture from the public can have different meanings depending on the context and history of interaction. For example, the same pointing gesture of the hand can be interpreted either as pushing a virtual character or, more simply, as a selection gesture. Similarly, the system needs to develop expectations about the likelihood of the user’s responses based on the specific content shown. These expectations in turn influence the interpretation of sensory data. Following the previous example, rather than teaching both the user and the system to perform or recognize two slightly different gestures, one for pushing and one for selecting, we can simply teach the system to correctly interpret nearly identical gestures based on the context and history of interaction, by developing expectations about the probability of the follow-on gesture. In summary, our systems need a user model that characterizes the behavior and the likelihood of responses of the public. This model also needs to be flexible, and should be adaptively revised by learning the user’s interaction profile.
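The push-versus-select example can be written as a small Bayesian update, sketched below in Python under loudly stated assumptions: a gesture recognizer (not shown) is assumed to supply a likelihood for each meaning, the context-dependent prior comes from a hypothetical count-based user model with invented pseudo-counts, and learning the interaction profile is reduced to incrementing counts when an interpretation is confirmed. The names (`UserModel`, `PSEUDO_COUNTS`, the context keys) are illustrative and not taken from the author’s systems.

```python
from typing import Dict

# Hypothetical designer-supplied pseudo-counts: context -> meaning -> count.
PSEUDO_COUNTS = {
    "character_on_screen": {"push": 8.0, "select": 2.0},
    "menu_on_screen": {"push": 1.0, "select": 9.0},
}

class UserModel:
    """Context-conditioned prior over gesture meanings, revised from experience."""

    def __init__(self, pseudo_counts: Dict[str, Dict[str, float]]):
        self.counts = {ctx: dict(c) for ctx, c in pseudo_counts.items()}

    def prior(self, context: str) -> Dict[str, float]:
        c = self.counts[context]
        total = sum(c.values())
        return {m: n / total for m, n in c.items()}

    def observe(self, context: str, meaning: str) -> None:
        """Count a confirmed interpretation, shifting future expectations."""
        self.counts[context][meaning] = self.counts[context].get(meaning, 0.0) + 1.0

def interpret(likelihood: Dict[str, float], prior: Dict[str, float]) -> Dict[str, float]:
    """Posterior P(meaning | gesture, context), proportional to P(gesture | meaning) * P(meaning | context)."""
    joint = {m: likelihood[m] * prior.get(m, 0.0) for m in likelihood}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("gesture has zero probability under every meaning in this context")
    return {m: p / total for m, p in joint.items()}
```

With an ambiguous pointing gesture scored `{"push": 0.55, "select": 0.45}` by the recognizer, the posterior favors “push” (about 0.83) when a character is on screen and “select” (about 0.88) when a menu is shown. After ten confirmed selections in the character context, the model’s prior shifts enough that the same gesture is read as “select” there too.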
2.4. Narrative engines
It is difficult to produce compelling applications simply by directly mapping sensor measurements to digital media output. While this strategy may work for very simple interactive environments, it is not an effective way to orchestrate digital information. Many current interactive systems are defined by a series of couplings between user input and system responses. The problem with these systems is that they are often repetitive: the same action of the participant always produces the same response from the system. Alternatively, most existing CD-ROM titles are scripted: they sequence micro-stories in multi-path narrative threads. While the content presentation in these applications tends to be more engaging, they often impose a rigid interaction modality and become boring after a while. The participant’s role is confined to clicking and choosing the sequence of the narrative thread, without real engagement or participation in the narrative. To create compelling narrative spaces, we need to be able to simulate encounters between the public and the digital media acting as a character. To accomplish this goal, we need to model the story we wish to narrate in a way that takes into account the user’s intentions and the context of interaction. Consequently, the story should develop on the basis of the system’s constant evaluation of how well the user’s actions match the system’s expectations about those actions, and the system’s goals.
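The paragraph above describes the narrative engine only in general terms. The Python sketch below is one hypothetical way to make “matching actions against expectations and goals” concrete; it is not the author’s Media Actors architecture. Each micro-story segment declares the visitor action it anticipates next, the actions it can react to, and an invented goal-progress score. When the visitor does what was expected, the engine mostly pursues the story goal; when the visitor surprises it, the engine strongly favors a segment that reacts. Segments are never replayed, which avoids the same-input, same-response repetition criticized above. The weights 0.2 and 1.0 and all segment names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(frozen=True)
class Segment:
    name: str               # hypothetical micro-story identifier
    expects: str            # visitor action anticipated after this segment plays
    responds_to: frozenset  # visitor actions this segment is a fitting reaction to
    progress: float         # contribution to the story goal (invented 0..1 scale)

@dataclass
class NarrativeEngine:
    segments: List[Segment]
    expected: str = "approach"                     # initial expectation (assumed)
    shown: List[str] = field(default_factory=list)

    def next_segment(self, action: str) -> Optional[Segment]:
        """Pick the next unseen segment, trading goal progress against reacting."""
        candidates = [s for s in self.segments if s.name not in self.shown]
        if not candidates:
            return None
        # Visitor "on script": pursue the goal, reward reacting only mildly.
        # Visitor off script: reacting to the unexpected action matters most.
        react_weight = 0.2 if action == self.expected else 1.0

        def score(s: Segment) -> float:
            return s.progress + (react_weight if action in s.responds_to else 0.0)

        choice = max(candidates, key=score)
        self.shown.append(choice.name)
        self.expected = choice.expects  # update the expectation for the next step
        return choice

SEGMENTS = [
    Segment("greeting", expects="point", responds_to=frozenset({"approach"}), progress=0.5),
    Segment("explain_artwork", expects="stay", responds_to=frozenset({"point"}), progress=0.6),
    Segment("call_back", expects="approach", responds_to=frozenset({"walk_away"}), progress=0.1),
]
```

A visitor who approaches and then points gets `greeting` followed by `explain_artwork`; a visitor who walks away at the start gets `call_back` first, because the unexpected action outweighs goal progress.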
These guidelines have driven the research described below on smart rooms [figure 3], smart desks [figure 4], and wearable computing, with special emphasis on interactive information presentation, digital storytelling, and cultural communication.
4. Narrative Spaces
We will characterize cities of the twenty-first century as systems of interlinked, interacting, silicon- and software-saturated smart, attentive, and responsive places. We will encounter them at the scale of clothing, rooms, buildings, campuses, and neighborhoods, metropolitan regions, and global infrastructures.
William J. Mitchell, Dean of MIT’s School of Architecture and Planning, in: e-topia, pg. 68
People experience their lives as a narrative. Among cognitive psychologists, Jerome Bruner stressed the importance of story as the means that structures our perception and communication. He reminded us that thinking cannot be reduced to mere information processing and sorting into categories, and that narrative is our main instrument for making meaning, the embodiment of culture, communication, and education. The history of architecture offers innumerable examples of places that embed and narrate a story through their spatial layout and décor. By looking at the sequence of floor plans of historical buildings through the centuries, from the Greek temple to the Roman church, the medieval cathedral, and on to today, we understand how a rectangle, a circle, a cross, or other more complex figures transmit a message across the centuries. This message is a story about how people through the ages have related to life, nature, and spirituality.
The new architecture of the information society can embed stories and information in its structure more explicitly, thanks to the tools offered by the digital revolution. Below are a few scenarios for changing places, namely the museum, the city, the house, and the theater, which draw on the examples presented in the previous section.
4.1. The museum
Museums have recently developed a strong interest in technology, as they are more than ever before in the orbit of the leisure industries. They face the challenge of designing appealing exhibitions, handling large volumes of visitors, and conserving precious artwork. They look at technology as a possible partner that can help them achieve a balance between leisure and learning, and be more effective in conveying story and meaning. Technology can help construct a coherent narrative of an exhibit for the visitor by creating experiences in which the objects on display narrate their own story in context. Using interactive techniques embedded in the physical space, museums can present a larger variety of more connected material in an engaging manner within the limited space available. They can also enrich and personalize the visit with wearable computers that act as a visual and auditory storyteller, guiding the public along the path of the exhibit. The presentation tables will be used as a playful interface for the public to access and explore the body of facts, content, and stories of the exhibit. The MetaSpace can be installed as an introductory immersive cinema space. Responsive portraits show artwork that reacts to how people approach it and is capable of explaining its origin and making. All these systems enhance the memory of the visit and help build a constructivist-style learning experience for the public.
5. Conclusions: new challenges for today’s architects
Traditional urban patterns cannot coexist with cyberspace … This will redefine the intellectual and professional agenda of architects, urban designers, and others who care about the spaces and places in which we spend our daily lives … This new agenda separates itself naturally into several distinct levels … We must put in the necessary digital telecommunications infrastructure, create innovative smart places from electronic hardware as well as traditional architectural elements, and develop the software that activates those places and makes them useful… To pursue this agenda effectively, we must extend the definitions of architecture and urban design to encompass virtual places as well as physical ones, software as well as hardware, and interconnection by means of telecommunications links as well as by physical adjacencies and transportation systems.
William J. Mitchell, Dean of MIT’s School of Architecture and Planning, in: e-topia, pg. 8
This paper described a series of sensor-enabled, media-augmented, people-driven narrative spaces and highlighted the role of the technologies that are key to their conception and making. It gave scenarios for how such spaces, and the related hardware and software platforms, can be used in, and can influence the way we experience, more traditional “narrative spaces” such as the museum, the theater, the house, and the city.
Another contribution of the paper is to underline that technology is not simply hardware or software that the space designer and the media artist add to their projects to make them work. It is not sufficient to wait for technologists to develop new modalities of interaction and human-machine communication in their laboratories, and later incorporate these into space design as software bought at a store. The architecture of the information society is truly driven and informed by technology, which in turn shapes architectural thinking and project development. Unless today’s architects are able to shape the tools they need to produce new space designs, their creations and aesthetics will always be limited by their technological competence. On the other hand, technologists with a robust knowledge of people tracking and statistical modeling, blended with creativity and a sense for experience and space design, seem to be in a better position than traditional architects to contribute to new trends in architecture.
To build narrative spaces, rather than stressing the importance of collaboration among people with different backgrounds and fields of competence, the author, based on her own education and experience, wishes to show that today’s architect can also be, in equal measure, a scientist, an engineer, and a visual artist and communicator. The role and required competencies of the contemporary architect point toward a new professional figure characterized by mastery of disciplines that are today considered separate practices and teachings. Yet this new professional figure has old and profound historical roots. The European Renaissance gave birth to two types of intellectual: the scientist, embodied by Galileo, who first established rules for scientific experimentation and the scientific method; and the artist-engineer, embodied by Leonardo, whose creative research was equally informed by art and science.
In modern times, Moholy-Nagy and, closer to MIT, Gyorgy Kepes represent models of the contemporary artist-engineer. While the Galileo-type scientist has predominated in Western culture since the Renaissance, the emergence of digital media favors the reappearance of the artist-engineer, equally versatile in artistic creation and engineering.
The nature of the projects presented here stresses the importance of statistical and mathematical modeling techniques and their corresponding technologies, such as pattern recognition and machine learning, for the field of interactive space design. The author believes that the main concepts and tools of these disciplines should become part of the everyday language of today’s and tomorrow’s architects, because they are the basis for any reliable sensor interpretation and simulation of intelligence by machines, both indispensable in the new architecture of the information society.
Starner, T., Mann, S., Rhodes, B., Levine, J., Healey, J., Kirsch, D., Picard, R., and Pentland, A. “Augmented Reality through Wearable Computing.” Presence, Vol. 6, No. 4, pp. 386-398, August 1997.
Rijken, D. “Information in Space: Explorations in Media and Architecture.” Interactions, May/June 1999.
Wren, C., Sparacino, F., et al. “Perceptive Spaces for Performance and Entertainment: Untethered Interaction using Computer Vision and Audition.” Applied Artificial Intelligence (AAI) Journal, June 1996.
Wren, C., Basu, S., Sparacino, F., and Pentland, A. “Combining Audio and Video in Perceptive Spaces.” Proceedings of Managing Interactions in Smart Environments (MANSE 99), Trinity College Dublin, Ireland, December 13-14, 1999.
Wren, C., Azarbayejani, A., Darrell, T., and Pentland, A. “Pfinder: Real-time tracking of the human body.” IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7):780-785, July 1997.
Darrell, T., Moghaddam, B., and Pentland, A. “Active face tracking and pose estimation in an interactive room.” In: Proceedings of CVPR 96. IEEE Computer Society, 1996.
Azarbayejani, A., and Pentland, A. “Real-time self-calibrating stereo person tracking using 3-D shape estimation from blob features.” In: Proceedings of 13th ICPR, 1996.
Starner, T., and Pentland, A. “Visual Recognition of American Sign Language Using Hidden Markov Models.” International Workshop on Automatic Face and Gesture Recognition (IWAFGR), Zurich, Switzerland, 1995.
Pavlovic, V.I. Dynamic Bayesian Networks for Information Fusion with Applications to Human-Computer Interfaces. PhD Thesis, University of Illinois at Urbana-Champaign, 1999.
Hall, D.L., and Llinas, J. “An Introduction to Multisensor Data Fusion.” In: Proceedings of the IEEE, special issue on Data Fusion, January 1997.
Sparacino, F., Davenport, G., and Pentland, A. “Media Actors: Characters in Search of an Author.” IEEE Multimedia Systems ’99, International Conference on Multimedia Computing and Systems (IEEE ICMCS’99), Centro Affari, Firenze, Italy, 7-11 June 1999.
Sparacino, F. “(Some) computer vision based interfaces for interactive art and entertainment installations.” In: INTER_FACE Body Boundaries (issue editor Emanuele Quinz), Anomalie, n.2, Paris, France, anomos, 2001.
Sparacino, F., Wren, C., Pentland, A., and Davenport, G. “Hyperplex: a world of 3D interactive digital movies.” In: IJCAI-95 Workshop on Entertainment and AI/Alife, 1995.
Sparacino, F., Larson, K., MacNeil, R., Davenport, G., and Pentland, A. “Technologies and methods for interactive exhibit design: from wireless object and body tracking to wearable computers.” In: Proceedings of the International Conference on Hypertext and Interactive Museums (ICHIM 99), Washington, DC, Sept. 22-26, 1999.
Sparacino, F., Wren, C., Azarbayejani, A., and Pentland, A. “Browsing 3-D spaces with 3-D vision: body-driven navigation through the Internet city.” Proceedings of 3DPVT: 1st International Symposium on 3D Data Processing Visualization and Transmission, Padova, Italy, June 19-21, 2002.
Sparacino, F., Wren, C., Davenport, G., and Pentland, A. “Digital Circus: a computer-vision based interactive Virtual Studio.” In: Proceedings of IMAGINA, Monte Carlo, Monaco, 18-20 January 1999.
Sparacino, F., Oliver, N., and Pentland, A. “Responsive Portraits.” In: Proceedings of the Eighth International Symposium on Electronic Art (ISEA 97), Chicago, IL, USA, September 22-27, 1997.
Sparacino, F. “The Museum Wearable: real-time sensor-driven understanding of visitors’ interests for personalized visually-augmented museum experiences.” In: Proceedings of Museums and the Web (MW2002), Boston, April 17-20, 2002.
Sparacino, F., Davenport, G., and Pentland, A. “Wearable Cinema/Wearable City: bridging physical and virtual spaces through wearable computing.” IMAGINA 2000, Monte Carlo, January 31-February 3, 2000.
Sparacino, F., Wren, C., Davenport, G., and Pentland, A. “Augmented Performance in Dance and Theater.” In: Proceedings of International Dance and Technology 1999 (IDAT99), Arizona State University, Feb. 25-28, 1999.
Bruner, J. Acts of Meaning. Cambridge, MA: Harvard University Press, 1990.
Sparacino, F., Corsino, N., and Corsino, N. “City-Dance: project proposal for a body driven outdoors installation in Shanghai.” Unpublished.
Lozano-Hemmer, R. “Relational Architecture.” Performance Research, Routledge, London, 1999.
Sparacino, F., Pentland, A., and Davenport, G. “Wearable Performance.” In: Proceedings of International Symposium on Wearable Computers, Cambridge, Massachusetts, USA, October 13-14, 1997.