Robotic demonstration training using a diffusion model and reinforcement learning algorithms

Abstract

Gao Tianci

Received: 19.01.2025

The paper proposes a two-stage method for training a robot from demonstrations that combines a diffusion generative model with online fine-tuning by Proximal Policy Optimization (PPO). In the offline phase, the diffusion model learns from a limited set of expert demonstrations and generates synthetic "pseudo-demonstrations" that expand the variability and coverage of the original dataset. This keeps the policy from becoming narrowly specialized and improves its ability to generalize. In the online phase, the robot starts from the pre-trained policy and adjusts its actions in a real environment (or in a high-fidelity simulation); starting from a pre-trained policy significantly reduces the risk of unsafe actions and the number of interactions required. In addition, parameter-efficient fine-tuning is introduced to reduce the computational cost of online learning, along with value guidance, which focuses the generation of new data on regions of the state-action space with high Q-values. Experiments on tasks from the D4RL benchmark (Hopper, Walker2d, HalfCheetah) show that the approach achieves the highest cumulative reward at a lower computational cost than the alternatives. t-SNE analysis indicates that the synthetic data shift toward regions of the space with high Q-values, which contributes to faster learning. The results confirm that the proposed method is promising for robotic applications that must combine a limited number of demonstrations with a safe and efficient online phase.

Keywords: robot learning from demonstrations, diffusion generative models, reinforcement learning, Proximal Policy Optimization
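
The online phase described in the abstract relies on Proximal Policy Optimization. As a rough illustration only, the standard PPO clipped surrogate objective can be sketched as below. This is the generic textbook form, not the authors' implementation; the function name `ppo_clipped_objective`, its arguments, and the default `clip_eps=0.2` are assumptions made for the example.

```python
import math


def ppo_clipped_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    new_logps  -- log-probabilities of the taken actions under the current policy
    old_logps  -- log-probabilities of the same actions under the policy that
                  collected the data (here, e.g., the pre-trained policy)
    advantages -- advantage estimates for those actions
    clip_eps   -- clipping range; 0.2 is an assumed, commonly used value
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Because the ratio is clipped, a single update gains nothing from moving the policy far from the one that collected the data, which is one reason PPO suits cautious fine-tuning of a pre-trained policy.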