Learning Actions That Reduce Variation in Objects
The variation in the data that a real-world robot receives from its sensory inputs (i.e., its sensory data) comes from many sources. Much of this variation reflects ground truths about the world, such as an object's class, shape, and condition; a robot would like to infer this information so it can reason about the world. A considerable amount of additional variation, however, arises from the robot's configuration relative to an object: its relative position, orientation, focal depth, and so on. Fortunately, a robot has direct control over this configural variation: it can perform actions such as tilting its head or shifting its gaze. Inferring ground truth from data is already difficult, and it becomes much harder when the data is affected by configural variation. This thesis explores an approach in which the robot learns to perform actions that minimize the configural variation in its sensory data, making the task of inferring information about objects considerably easier. The value of the approach is demonstrated by classifying digits from the MNIST and USPS datasets that have been transformed to introduce several kinds of configural variation.
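The core idea can be sketched in miniature. The following is a hypothetical illustration, not the thesis's actual method: a toy "digit" template is translated to simulate configural variation, and the agent selects among small translation actions (stand-ins for gaze shifts) the one that minimizes the residual difference from the canonical view. A learned policy would replace the brute-force search over actions used here.

```python
import numpy as np

def shift(img, dy, dx):
    """Translate an image by (dy, dx) with zero padding."""
    out = np.zeros_like(img)
    h, w = img.shape
    out[max(dy, 0):min(h, h + dy), max(dx, 0):min(w, w + dx)] = \
        img[max(-dy, 0):min(h, h - dy), max(-dx, 0):min(w, w - dx)]
    return out

# A "canonical" 8x8 digit-like template (a simple bar).
canonical = np.zeros((8, 8))
canonical[2:6, 3:5] = 1.0

# Configural variation: the sensor observes a translated copy.
observed = shift(canonical, dy=2, dx=-1)

# Action selection: search over small translations ("gaze shifts")
# for the one that minimizes the residual difference from the
# canonical template, i.e. removes the configural variation.
best_action, best_err = None, np.inf
for dy in range(-3, 4):
    for dx in range(-3, 4):
        err = np.abs(shift(observed, dy, dx) - canonical).sum()
        if err < best_err:
            best_action, best_err = (dy, dx), err

print(best_action)  # → (-2, 1): the action that undoes the translation
print(best_err)     # → 0.0: no configural variation remains
```

Once such an action is taken, a downstream classifier only ever sees near-canonical views, which is exactly why the inference task becomes easier.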