E-VAT: An Asymmetric End-to-End Approach to Visual Active Exploration and Tracking
Dionigi A.; Devo A.; Costante G.
2022
Abstract
The development of visual tracking systems is becoming a major goal for the Robotics community. Most work on this topic focuses exclusively on passive tracking, where the target is confined within the camera's field of view. Only a minority of works propose active approaches, which not only identify the object to be tracked but also produce motion control actions to maintain visual contact with it. However, all the methods introduced so far assume that the target is initially in the immediate proximity of the tracker. This assumption is an undesirable constraint on the applicability of these techniques. To overcome this limitation, we propose a novel End-to-End Deep Reinforcement Learning based system, capable of both exploring the surrounding environment to find the target and then tracking it. To do this, we develop a network consisting of two sub-components: i) the Target-Detection Network, which detects the target in the camera's field of view, and ii) the Exploration and Tracking Network, which uses this information to switch between the exploration policy and the tracking policy, with the goal of exploring the environment, finding the target and finally tracking it. Through different experiments, we demonstrate the effectiveness of our approach and its superior performance with respect to current state-of-the-art (SotA) methods.