This file contains abstracts and online locations of some of my papers.
It is currently somewhat out of date and not likely to be as current as
the . Papers are available by anonymous ftp from ftp.cs.rochester.edu in
directory pub/papers/ai, unless otherwise noted.
In order to make spoken dialogue systems more sophisticated, designers need to better understand the conventions that people use in structuring their speech and in interacting with their fellow conversants. In particular, it is crucial to discriminate the basic building blocks of dialogue and how they affect the way people process language. Many researchers have proposed the {\em utterance unit} as the primary object of study, but defining exactly what this is has remained a difficult issue. To shed light on this question, we consider grounding behavior in dialogue, and examine co-occurrences between turn-initial grounding acts and utterance unit signals that have been proposed in the literature, namely prosodic boundary tones and pauses. Preliminary results indicate a high correlation between grounding and boundary tones, with a secondary correlation for longer pauses. We also consider some of the dialogue processing issues that are affected by the definition of the utterance unit.
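As a rough illustration of the kind of co-occurrence analysis described above (the annotation format, the 0.5-second pause threshold, and the toy data are assumptions for the sketch, not the study's actual materials), a minimal Python sketch:

# Hypothetical sketch of a co-occurrence count between turn-initial grounding
# acts and candidate utterance-unit signals (boundary tones, pause length).
from collections import Counter

# Each record: (grounding_act_follows, boundary_tone_present, pause_seconds)
annotations = [
    (True,  True,  0.8),
    (True,  True,  0.2),
    (False, False, 0.1),
    (True,  False, 1.2),
    (False, True,  0.05),
]

LONG_PAUSE = 0.5  # assumed threshold, in seconds

counts = Counter()
for grounded, tone, pause in annotations:
    counts[("tone" if tone else "no_tone", grounded)] += 1
    counts[("long_pause" if pause >= LONG_PAUSE else "short_pause", grounded)] += 1

def proportion(signal):
    """Fraction of units with this signal that are followed by a grounding act."""
    with_act = counts[(signal, True)]
    total = with_act + counts[(signal, False)]
    return with_act / total if total else 0.0

for signal in ("tone", "no_tone", "long_pause", "short_pause"):
    print(f"{signal}: {proportion(signal):.2f}")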
We use the idea that actions performed in a conversation become part of the common ground as the basis for a model of context that reconciles in a general and systematic fashion the differences between the theories of discourse context used for reference resolution, intention recognition, and dialogue management. We start from the treatment of anaphoric accessibility developed in DRT, and we show first how to obtain a discourse model that, while preserving DRT's basic ideas about referential accessibility, includes information about the occurrence of speech acts and their relations. Next, we show how the different kinds of `structure' that play a role in conversation---discourse segmentation, turn-taking, and grounding---can be formulated in terms of information about speech acts, and use this same information as the basis for a model of the interpretation of fragmentary input.
Disparities between modules were bridged by careful design of the
interfaces, based on regular in-depth discussion of issues encountered by
the participants. Because of the goal of generality and principled
representation, the multiple representations ended up with a good deal
in common (for instance, the use of explicit event variables and the
ability to refer to complex abstract objects such as plans); and future
unifications seem quite possible. We explain some of the goals and
particulars of the KRs used, evaluate the extent to which they served their
purposes, and point out some of the tensions between representations that
needed to be resolved. On the whole, we found that using very expressive
representations minimized the tensions, since it is easier to extract
what one needs from an elaborate representation that retains all semantic
nuances than to make up for lost information.
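To make the point about explicit event variables concrete, here is a small illustrative sketch in Python, not the actual TRAINS-93 representation: a Davidsonian-style logical form in which the event is a first-class term that other modules (for instance, a plan representation) can refer to.

# Illustrative sketch only: a logical form with an explicit event variable,
# so that modules can refer to the event itself as well as to abstract
# objects such as plans.
from dataclasses import dataclass, field

@dataclass
class Term:
    name: str                      # a variable such as "e1" or a constant such as "ENGINE1"

@dataclass
class Predication:
    predicate: str                 # e.g. "move", "agent", "destination"
    args: list = field(default_factory=list)

# "Engine E1 moves to Avon" with an explicit event variable e1:
e1 = Term("e1")
lf = [
    Predication("move", [e1]),
    Predication("agent", [e1, Term("ENGINE1")]),
    Predication("destination", [e1, Term("AVON")]),
]

# Because e1 is a first-class term, other modules can refer to the event,
# e.g. a plan object that includes it as a step:
plan = Predication("plan-step", [Term("plan7"), e1])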
We present a model of context that tries to reconcile in a general and
systematic fashion the differences between the discourse models used for
reference resolution, conversation act recognition, and dialogue management in
a system dealing with conversations. Starting from the technical solutions
adopted in DRT, we first show how to obtain a discourse model that, while
preserving DRT's basic ideas about referential accessibility, includes
information about the occurrence of speech acts and their relations. We then
show how the information about speech acts can be used to formalize the basic
ideas of Grosz and Sidner's model of discourse structure. Finally, we extend
this model to incorporate an account of the grounding process.
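A minimal sketch of the kind of discourse model the abstract describes, with assumed field and relation names: a DRT-style structure that also records speech-act occurrences and the relations among them.

# Illustrative sketch only; field and relation names are assumptions.
from dataclasses import dataclass, field

@dataclass
class DRS:
    referents: set = field(default_factory=set)      # accessible discourse referents
    conditions: list = field(default_factory=list)   # DRS conditions

@dataclass
class SpeechAct:
    act_type: str        # e.g. "inform", "suggest", "ynq"
    speaker: str
    content: DRS

@dataclass
class DiscourseModel:
    drs: DRS = field(default_factory=DRS)
    speech_acts: list = field(default_factory=list)   # occurrences, in order
    relations: list = field(default_factory=list)     # e.g. ("answer", act_i, act_j)

    def add_act(self, act: SpeechAct):
        # Adding an act both updates referential accessibility and records
        # the occurrence of the act itself.
        self.drs.referents |= act.content.referents
        self.drs.conditions += act.content.conditions
        self.speech_acts.append(act)

m = DiscourseModel()
m.add_act(SpeechAct("suggest", "user", DRS({"x"}, ["engine(x)"])))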
We extend speech act theory to account for levels of action both above
and below the sentence level, including the level of grounding
acts described above. Traditional illocutionary acts are now seen to
be multi-agent acts which must be grounded to have their usual
effects.
A conversational agent model is provided, showing how grounding fits
in naturally with the other functions that an agent must perform in
engaging in conversation. These ideas are implemented within the
TRAINS conversation system.
Also presented is a situation-theoretic model of plan execution
relations, giving definitions of what it means for an action to begin,
continue, complete, or repair the execution of a plan. This framework
is then used to provide precise definitions of the grounding acts in
terms of agents executing a general communication plan in which
one agent must present the content and another acknowledge it.
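The following is a rough, illustrative sketch (act names assumed) of that idea: grounding acts viewed as executing a two-step communication plan of presentation followed by acknowledgement, with each act classified as beginning, continuing, completing, or repairing the plan's execution.

# Illustrative sketch only, not the formal situation-theoretic definitions.
RECIPE = ["present", "acknowledge"]   # the general communication plan

def execution_relation(done, act):
    """Classify how `act` relates to the plan, given the acts `done` so far."""
    if act == "repair":
        return "repairs"                       # fixes earlier presented content
    if act not in RECIPE:
        return "irrelevant"
    expected = RECIPE[len(done)] if len(done) < len(RECIPE) else None
    if act != expected:
        return "out-of-order"
    if not done:
        return "begins"
    if len(done) + 1 == len(RECIPE):
        return "completes"                     # content is now grounded
    return "continues"                         # only reachable for longer recipes

print(execution_relation([], "present"))                 # begins
print(execution_relation(["present"], "acknowledge"))    # completes
print(execution_relation(["present"], "repair"))         # repairs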
Keywords: TRAINS; spoken language corpus; task-oriented dialogue; conversation.
Keywords: speech acts; conversation; literal meaning; discourse; grounding;
turn taking.
The resulting notion of Conversation Acts is more general than
speech act theory, encompassing not only the traditional speech acts
but turn-taking, grounding, and higher-level argumentation acts as well.
Furthermore, the traditional speech acts in this scheme become fully joint
actions, whose successful performance requires full listener participation.
This paper presents a detailed analysis of spoken language dialogue.
It shows the role of each class of conversation acts
in discourse structure, and discusses how members of each class
can be recognized in conversation. Conversation acts, it will be seen,
better account for the success of conversation than speech act theory alone.
96.tn4.Knowledge_representation_in_the_TRAINS-93_conversation_system.ps.gz
We describe the goals, architecture, and functioning of the TRAINS-93
system, with emphasis on the representational issues involved in
putting together a complex language processing and reasoning agent.
The system is intended as an experimental prototype of an intelligent,
conversationally proficient planning advisor in a dynamic domain of
cargo trains and factories. For this team effort, our strategy at the
outset was to let the designers of the various language processing,
discourse processing, plan reasoning, execution and monitoring modules
choose whatever representations seemed best suited for their tasks, but with
the constraint that all should strive for principled, general approaches.
Multimedia examples from the paper: for example d93-13.2: wave form,
pitch contour and annotations; sound. For example d93-16.2: wave form,
pitch contour and annotations; sound.
Defining an utterance unit in spoken dialogue has remained a
difficult issue. To shed light on this question, we consider
grounding behavior in dialogue, and examine co-occurrences between
turn-initial grounding acts and utterance unit signals that have
been proposed in the literature, namely prosodic boundary tones and
pauses. Preliminary results indicate a high correlation between
grounding and boundary tones, with a secondary correlation for
longer pauses.
We explore grounding and the sub-phenomena of miscommunication and
repair from both theoretical and empirical perspectives. From a
theoretical perspective, we classify several types of miscommunication
as failures of action or perception, and as instances of a more general
case of non-alignment of the mental states of agents.
From an empirical perspective, we present a preliminary analysis of
examples of miscommunication in multi-modal collaboration. These
points of view converge towards a predictive model of grounding,
which considers costs and benefits of performing grounding acts
(including repairs of miscommunication).
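As a toy illustration of such a cost-benefit comparison (the act names, probabilities, and cost figures are invented for the sketch and are not the model proposed in the paper):

def choose_grounding_act(p_misunderstanding, cost_of_failure,
                         ack_cost=1.0, verify_cost=3.0):
    """Pick the cheapest option, weighing act cost against expected repair cost.

    - do nothing: risk the full expected cost of an undetected misunderstanding
    - acknowledge: cheap, assumed here to cut the risk in half
    - explicitly verify: expensive, assumed here to remove the risk
    """
    options = {
        "no_grounding": p_misunderstanding * cost_of_failure,
        "acknowledge":  ack_cost + 0.5 * p_misunderstanding * cost_of_failure,
        "verify":       verify_cost,
    }
    return min(options, key=options.get)

print(choose_grounding_act(p_misunderstanding=0.05, cost_of_failure=10))  # no_grounding
print(choose_grounding_act(p_misunderstanding=0.6,  cost_of_failure=20))  # verify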
Designing an agent to participate in natural
conversation requires more than just adapting a standard agent model
to perceive and produce language. In particular, the model must be
augmented with social attitudes (including mutual belief, shared
plans, and obligations) and a notion of discourse context. The
dialogue manager of the TRAINS-93 NL conversation system embodies such
an augmented theory of agency. This paper focuses on the
representation of mental state and discourse context and the
deliberation strategies used in the agent model of the dialogue
manager.
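A minimal sketch, with assumed field names, of the kind of augmented agent state and deliberation strategy the abstract describes:

# Illustrative sketch only, not the TRAINS-93 dialogue manager itself.
from dataclasses import dataclass, field

@dataclass
class AgentState:
    beliefs: set = field(default_factory=set)          # private beliefs
    mutual_beliefs: set = field(default_factory=set)   # grounded, shared with the user
    shared_plan: list = field(default_factory=list)    # jointly constructed plan steps
    obligations: list = field(default_factory=list)    # e.g. "address user's question"
    discourse_context: list = field(default_factory=list)  # recent speech acts

def deliberate(state: AgentState):
    """One simple deliberation strategy: discharge pending obligations first,
    then work on the shared plan."""
    if state.obligations:
        return ("address", state.obligations[0])
    if state.shared_plan:
        return ("continue-plan", state.shared_plan[0])
    return ("wait", None)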
ftp://ftp.cogsci.ed.ac.uk/pub/poesio/ijcai_context.ps.gz
Natural language dialogue systems require contextual
information for a variety of processing functions, including reference
resolution, speech act recognition, and dialogue management. While much has
been written about individual contextual problems, many of the proposed
representations are mutually incompatible, unusable by the agents involved in a
conversation, or both.
94.tr545.Computational_theory_of_grounding_in_NL_conversation.ps.gz
The process of adding to the common ground between conversational
participants (called grounding) has previously been either
oversimplified or studied in an off-line manner. This dissertation
presents a computational theory that includes a protocol for
determining, for any given state of the conversation, whether material
has been grounded or what it would take to ground it. This protocol is
related to the mental states
of participating agents, showing the motivations for performing
particular grounding acts and what their effects will be.
http://tecfa.unige.ch/tecfa/tecfa-research/traum/ukp94.ps.gz
We present a situation theoretic formalization of plan execution which
allows for an abstract characterization of the role an action
performance plays in the execution of a plan, including
characterizations of performance error and plan repair. The Plan
Execution Situation presented generalizes the mental state of
having a plan to include cases where the plan is in the midst of
execution, and allows for representation of dynamic change of the
plan's recipe as well as the attitudes of the agent towards previous
execution. We also show how this formalism can be used in different
plan inference tasks such as plan execution monitoring and plan
recognition.
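An illustrative data-structure sketch of a Plan Execution Situation, with assumed names and a much-simplified treatment of attitudes toward past performance:

# Illustrative sketch only, not the situation-theoretic formalism itself.
from dataclasses import dataclass, field

@dataclass
class PlanExecutionSituation:
    recipe: list                                      # remaining intended actions
    performed: list = field(default_factory=list)     # actions performed so far
    assessments: dict = field(default_factory=dict)   # action -> "ok" | "error"

    def record(self, action, ok=True):
        """Record a performance and assess it against the current recipe."""
        self.performed.append(action)
        if self.recipe and action == self.recipe[0]:
            self.recipe.pop(0)
            self.assessments[action] = "ok"
        else:
            self.assessments[action] = "error"        # candidate for repair

    def revise_recipe(self, new_recipe):
        """Dynamic change of the recipe in mid-execution."""
        self.recipe = list(new_recipe)

pes = PlanExecutionSituation(recipe=["load-oranges", "move-to-Avon"])
pes.record("load-oranges")
pes.revise_recipe(["move-to-Bath"])   # replan while partly executed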
94.traum-allen.ACL.ps.Z
We show that in modeling social interaction, particularly dialogue,
the attitude of obligation can be a useful adjunct to the
popularly considered attitudes of belief, goal, and intention and
their mutual and shared counterparts. In particular, we show how
discourse obligations can be used to account in a natural manner for
the connection between a question and its answer in dialogue and how
obligations can be used along with other parts of the discourse
context to extend the coverage of a dialogue system.
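A minimal sketch of the kind of obligation rule the paper motivates, with assumed act and obligation names: a question imposes a discourse obligation on the hearer to address it.

# Illustrative sketch only.
OBLIGATION_RULES = {
    "question": "address-question",   # answer, or at least respond to, the question
    "request":  "address-request",    # accept or reject the request
}

def update_obligations(obligations, act_type, hearer):
    """Add any obligation that the observed speech act imposes on the hearer."""
    if act_type in OBLIGATION_RULES:
        obligations.append((hearer, OBLIGATION_RULES[act_type]))
    return obligations

obligations = []
update_obligations(obligations, "question", hearer="system")
print(obligations)   # [('system', 'address-question')]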
The TRAINS project is an effort to build a conversationally
proficient planning assistant. A key part of the project is the
construction of the TRAINS system, which provides the research
platform for a wide range of issues in natural language
understanding, mixed-initiative planning systems, and representing
and reasoning about time, actions and events. Four years have now
passed since the beginning of the project. Each year we have
produced a demonstration system that focused on a dialog that
illustrates particular aspects of our research. The commitment to
building complete integrated systems is a significant overhead on
the research, but we feel it is essential to guarantee that the
results constitute real progress in the field. This paper describes
the goals of the project, and our experience with the effort so far.
94.traum-et-al.aaai-spring-94.integrating-nlu-trains.ps.Z
This paper describes the TRAINS-93 Conversation System, an
implemented system that acts as an intelligent planning assistant
and converses with the user in natural language. The architecture of
the system is described and particular attention is paid to the
interactions between the language understanding and plan reasoning
components. We examine how these two tasks constrain and inform
each other in an integrated NL-based system.
This paper contains an investigation of the relationship between
rhetorical relations and intentions. Rhetorical relations are claimed
to be actions, and thus the proper objects of intentions, although
some relations may occur independently of intentions. Explicit
identification of particular relations is shown to be unnecessary when
this information can be captured in other ways; nevertheless, relations
are often useful both in planning and recognition.
Task description: 92.tn1.trains_91_dialogues.ps.Z
Plain text dialogues: 92.tn1.trains_91_dialogues.txt
This report contains a small corpus of transcriptions of task-oriented
spoken conversations in the TRAINS domain. Included are 16 conversations,
amounting to over 80 minutes of speech. Also included are a description
of the task and collection situation and the conventions used in
transcription and utterance segmentation.
This paper describes the dialogue manager of the TRAINS-92 system.
The general dialogue model is described with emphasis on the
mentalistic attitudes represented.
A linguistic form's compositional, timeless meaning can be surrounded
or even contradicted by various social, aesthetic,
or analogistic companion meanings.
This paper addresses a series of problems in the structure of
spoken language discourse, including turn-taking and grounding.
It views these processes as composed of fine-grained actions,
which resemble speech acts both in resulting from a computational
mechanism of planning and in having a rich relationship to the
specific linguistic features which serve to indicate their presence.
We propose that Grounding, the process of achieving mutual
understanding between participants in a conversation, be analyzed in
terms of the actions performed by the conversants which contribute to
achieving this mutual understanding. We propose a set of
Grounding Acts which facilitate this analysis. This paper describes
Grounding Acts, and a ``grammar'' stipulating which sequences of
grounding acts result in grounded content.
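A small finite-state sketch of such a grammar; the act and state names are simplified assumptions rather than the exact grammar given in the paper.

# Illustrative sketch: which act sequences leave the presented content grounded.
TRANSITIONS = {
    # (state,   act)              -> next state
    ("start",   "initiate"):      "pending",
    ("pending", "continue"):      "pending",
    ("pending", "repair"):        "pending",
    ("pending", "reqRepair"):     "pending",
    ("pending", "ack"):           "grounded",
    ("pending", "cancel"):        "dead",
}

def is_grounded(acts):
    """Return True if the act sequence ends with the content grounded."""
    state = "start"
    for act in acts:
        state = TRANSITIONS.get((state, act), state)   # acts with no listed effect leave the state unchanged
        if state == "dead":
            return False
    return state == "grounded"

print(is_grounded(["initiate", "ack"]))              # True
print(is_grounded(["initiate", "repair", "ack"]))    # True
print(is_grounded(["initiate", "reqRepair"]))        # False (still pending)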
A general-purpose reasoning agent may come into contact with many types
of external forces, which have differing effects on the world and require
different kinds of reasoning about them. We present a classification
scheme for planning domains, based on differentiating the types of
causative forces present along two dimensions: their degree of
interaction and how cognitive they are.
We present some speculations on how best to reason in these domains.
Example problems are illustrated using the ARMTRAK domain.
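An illustrative sketch of such a classification, with invented category boundaries and scores:

def classify_domain(interaction, cognitive):
    """interaction, cognitive: rough scores in [0, 1] for the dominant forces."""
    if interaction < 0.3 and cognitive < 0.3:
        return "static world: classical planning is adequate"
    if cognitive < 0.3:
        return "dynamic but non-cognitive forces: plan with contingencies"
    if interaction < 0.3:
        return "other agents present but independent: predict, then plan"
    return "interacting cognitive agents: plan for cooperation or competition"

# e.g. a multi-agent rail domain such as ARMTRAK might score high on both:
print(classify_domain(interaction=0.8, cognitive=0.9))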