Robotic demonstration training using a diffusion model and reinforcement learning algorithms
Abstract
Incoming article date: 19.01.2025
The paper proposes a two-stage method for training a robot from demonstrations that combines a diffusion generative model with online fine-tuning via Proximal Policy Optimization (PPO). In the offline phase, the diffusion model is trained on a limited set of expert demonstrations and generates synthetic "pseudo-demonstrations", expanding the variability and coverage of the original dataset. This mitigates the narrow specialization of the learned policy and improves its ability to generalize. In the online phase, the robot, starting from the pre-trained policy, adjusts its actions in the real environment (or in a high-fidelity simulation), which substantially reduces the risk of unsafe actions and the number of environment interactions required. In addition, parameter-efficient fine-tuning is introduced to lower the computational cost of online learning, along with value guidance, which focuses the generation of new data on regions of the state-action space with high Q-values. Experiments on tasks from the D4RL benchmark (Hopper, Walker2d, HalfCheetah) show that the proposed approach achieves the highest cumulative reward at lower computational cost than the alternatives. A t-SNE analysis indicates that the synthetic data shifts toward regions of the space with high Q-values, which accelerates learning. The results confirm the promise of the proposed method for robotic applications that must combine a limited number of demonstrations with a safe and efficient online phase.
Keywords: robot learning from demonstrations, diffusion generative models, reinforcement learning, Proximal Policy Optimization
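To make the "value guidance" idea from the abstract concrete, the following is a minimal, hypothetical sketch (all names are illustrative, not the authors' implementation): a generator stands in for the diffusion model proposing candidate state-action pairs, a toy critic stands in for a learned Q-function, and guidance is approximated by keeping only the top-scoring candidates as pseudo-demonstrations.

```python
# Hypothetical sketch of value-guided pseudo-demonstration selection.
# toy_q_value, generate_candidates, and value_guided_select are illustrative
# stand-ins, not the paper's actual diffusion model or critic.
import random

def toy_q_value(state, action):
    # Stand-in for a learned critic Q(s, a): a fixed quadratic bowl
    # peaking at state = 1.0, action = 0.5.
    return -((state - 1.0) ** 2 + (action - 0.5) ** 2)

def generate_candidates(n, rng):
    # Stand-in for diffusion sampling: random (state, action) proposals.
    return [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(n)]

def value_guided_select(candidates, q_fn, top_k):
    # Rank candidates by Q and keep the best top_k as pseudo-demonstrations,
    # biasing the synthetic dataset toward high-Q regions.
    ranked = sorted(candidates, key=lambda sa: q_fn(*sa), reverse=True)
    return ranked[:top_k]

rng = random.Random(0)
cands = generate_candidates(1000, rng)
kept = value_guided_select(cands, toy_q_value, top_k=50)
avg_all = sum(toy_q_value(s, a) for s, a in cands) / len(cands)
avg_kept = sum(toy_q_value(s, a) for s, a in kept) / len(kept)
print(avg_kept > avg_all)  # the kept set concentrates in high-Q regions
```

In the paper's setting the guidance acts during diffusion sampling itself rather than as a post-hoc filter, but the effect illustrated here is the same: synthetic data shifts toward state-action regions with high Q-values, which is what the t-SNE analysis in the abstract reports.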