Enhancing Counterfactual Data Augmentation for Offline Reinforcement Learning in Vision-Based Control
Brilli R.; Dionigi A.; Crocetti F.; Costante G.
2025
Abstract
Offline training of Deep Reinforcement Learning (DRL) agents is a valuable solution for addressing autonomous control and robotics challenges, especially when acquiring real-world data is particularly difficult and simulators are not available. Despite its promise, this approach suffers from sample inefficiency, particularly with high-dimensional data such as images, making it less effective for vision-based tasks. Counterfactual data augmentation addresses this issue by expanding the training dataset with plausible samples consistent with stochastic environments. However, its application in vision-based control remains underexplored. This study introduces a novel counterfactual data augmentation technique for vision-based tasks, leveraging Deep Generative Models to estimate the Structural Causal Model (SCM) and the associated reward model. The learned SCM is then used to augment the training dataset and optimize the DRL agents. We show the effectiveness of our method compared to existing state-of-the-art approaches on a series of stochastic control problems, both with discrete and continuous action spaces.
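The abstract's core idea — using a learned SCM to generate counterfactual transitions for dataset augmentation — can be sketched as follows. This is a minimal illustrative example, not the paper's method: the linear `scm_forward`, the noise-abduction step, and the `reward_model` are toy stand-ins for the deep generative models the paper learns, and all function names are hypothetical.

```python
# Hypothetical sketch of counterfactual data augmentation with a learned SCM.
# A toy 1-D linear SCM stands in for the paper's deep generative models;
# all names and dynamics here are illustrative assumptions.

def scm_forward(state, action, noise):
    # Structural equation: next state as a function of state, action, and
    # exogenous noise (toy linear dynamics).
    return 0.9 * state + 0.5 * action + noise

def infer_noise(state, action, next_state):
    # Abduction step: recover the exogenous noise that explains the
    # observed transition under the (invertible) toy SCM.
    return next_state - 0.9 * state - 0.5 * action

def reward_model(state, action, next_state):
    # Toy learned reward model: penalize distance from the origin.
    return -abs(next_state)

def augment(dataset, alt_actions):
    """Counterfactual augmentation: for each logged (s, a, s', r),
    infer the noise, intervene with alternative actions, and re-simulate
    the transition under the same noise realization."""
    augmented = list(dataset)
    for (s, a, s_next, _r) in dataset:
        u = infer_noise(s, a, s_next)        # abduction
        for a_cf in alt_actions:
            if a_cf == a:
                continue                      # skip the factual action
            s_cf = scm_forward(s, a_cf, u)    # intervention + prediction
            augmented.append((s, a_cf, s_cf, reward_model(s, a_cf, s_cf)))
    return augmented

# One logged transition; inferred noise u = 1.45 - 0.9 - 0.5 = 0.05.
dataset = [(1.0, 1.0, 1.45, -1.45)]
aug = augment(dataset, alt_actions=[-1.0, 0.0, 1.0])
```

The augmented dataset keeps the original transition and adds one counterfactual per alternative action, each consistent with the same inferred noise — the property that makes the generated samples plausible for the stochastic environment.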


