Planning for Human-robot Interaction: Representing Time and Human Intention
The approach to human-robot social interaction taken in this thesis focuses on creating more accurate models of social tasks for planning. Because the human participants are modeled as part of the environment, the world state in these problems is dynamic and partially observable. Human intention is represented as hidden state in a partially observable Markov decision process (POMDP), and the time-dependence of action outcomes is explicitly modeled. A model structure designed by a human expert is combined with human task performance data. The resulting models are large and complex. State aggregation over the time dimension of the state space is used to trade off the accuracy of the representation against its size, in order to find models that are sufficiently expressive yet can be solved tractably. The utility of this approach is demonstrated by implementing a controller for a mobile robot that rides elevators with people and an agent in a driving simulator that performs the Pittsburgh left with human drivers. Performance is evaluated by comparing the policies obtained using the proposed modeling technique to policies developed using less expressive representations. In interactions with human participants, the policies for time-dependent POMDP models with human intention as hidden state outperform the other policies, achieving both higher rewards and more positive evaluations for naturalness and social propriety of behavior.
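To illustrate the kind of time-state aggregation the abstract describes, the sketch below greedily merges consecutive timesteps whose action-outcome distributions are nearly identical, so that stretches of time with similar dynamics collapse into a single aggregated state. This is a minimal illustration under assumed details: the greedy scheme, the L1 distance criterion, and all names (`aggregate_time_states`, `threshold`) are hypothetical, not the thesis's exact method.

```python
def aggregate_time_states(dists, threshold):
    """Merge consecutive timesteps into intervals.

    dists: list indexed by timestep t, where dists[t] is the
           action-outcome probability distribution at time t.
    threshold: maximum L1 distance from the interval's first
           timestep for a timestep to join the same aggregate.

    Returns a list of (start_t, end_t) intervals; each interval
    becomes one aggregated state in the reduced model.
    """
    intervals = []
    start = 0
    for t in range(1, len(dists)):
        # Compare timestep t against the representative (first)
        # timestep of the current interval.
        l1 = sum(abs(a - b) for a, b in zip(dists[start], dists[t]))
        if l1 >= threshold:
            intervals.append((start, t - 1))
            start = t
    intervals.append((start, len(dists) - 1))
    return intervals


# Outcomes are stable for three steps, then shift abruptly:
dists = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
print(aggregate_time_states(dists, threshold=0.5))  # [(0, 2), (3, 4)]
```

A lower threshold keeps more (smaller) intervals and hence a larger, more faithful time-indexed state space; a higher threshold yields fewer aggregated states at the cost of representational accuracy, which is the trade-off the abstract refers to.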