Benchmarking Action Spaces in Reinforcement Learning for Vision-based Robotic Manipulation

arXiv CS · June 18, 2026 · 2 min read · Engineering & Technology


Key Takeaways

The choice of action space significantly affects sim-to-real performance in vision-based manipulation.
The joint velocity action space performs best for vision-based picking and pushing tasks, in both motion smoothness and final task performance.

Why This Matters

The research provides practical guidance for RL practitioners on action space selection in both simulation and real-world experiments. Concrete recommendations of this kind can streamline development and improve the performance of vision-based robotic manipulation systems.

Overview

This research investigates the impact of action space representation in reinforcement learning (RL) for vision-based robotic manipulation. The study focused on two specific tasks: object picking and pushing. Four distinct action space types were evaluated: pose increment, pose velocity, joint position increment, and joint velocity. Policies were initially trained in simulation and subsequently deployed in real-world environments through sim-to-real transfer to assess their performance.

Research Context

The choice of action space in real-world reinforcement learning is described as significantly influencing motion smoothness, operational safety, and overall task performance.

Approach

The study benchmarked four action space representations: pose increment, pose velocity, joint position increment, and joint velocity. Each was applied to two vision-based manipulation tasks: object picking and object pushing. Policies were trained in a simulated environment and then transferred to real-world settings through sim-to-real transfer to evaluate how well they performed in practice.
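
To make the four representations concrete, the following minimal sketch shows one way a normalized policy output could be turned into a joint position target under each action space. It is not the study's implementation: the two-link planar arm, the link lengths, the control period, the scaling limits, and all function names are assumptions chosen for illustration, and the end-effector pose is reduced to an (x, y) position.

```python
import math

# All constants below are illustrative assumptions, not values from the study.
LINK_1, LINK_2 = 0.30, 0.25   # link lengths of a hypothetical planar arm (m)
DT = 0.05                     # assumed control period (s)
MAX_JOINT_VEL = 1.0           # assumed joint velocity limit (rad/s)
MAX_JOINT_STEP = 0.05         # assumed joint increment limit (rad per step)
MAX_EE_VEL = 0.2              # assumed end-effector velocity limit (m/s)
MAX_EE_STEP = 0.01            # assumed end-effector increment limit (m per step)


def forward_kinematics(q):
    """End-effector (x, y) position of the two-link planar arm."""
    q1, q2 = q
    return [LINK_1 * math.cos(q1) + LINK_2 * math.cos(q1 + q2),
            LINK_1 * math.sin(q1) + LINK_2 * math.sin(q1 + q2)]


def jacobian(q):
    """2x2 Jacobian mapping joint velocities to end-effector velocities."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-LINK_1 * s1 - LINK_2 * s12, -LINK_2 * s12],
            [LINK_1 * c1 + LINK_2 * c12, LINK_2 * c12]]


def solve_2x2(J, v):
    """Solve J @ x = v for x; refuse to act near a kinematic singularity."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    if abs(det) < 1e-6:
        raise ValueError("near-singular arm configuration")
    return [(J[1][1] * v[0] - J[0][1] * v[1]) / det,
            (-J[1][0] * v[0] + J[0][0] * v[1]) / det]


def joint_target(q, action, space):
    """Map a policy action in [-1, 1]^2 to the next joint position target."""
    a = [max(-1.0, min(1.0, x)) for x in action]  # clip to the valid range
    if space == "joint_velocity":
        dq = [MAX_JOINT_VEL * x * DT for x in a]
    elif space == "joint_position_increment":
        dq = [MAX_JOINT_STEP * x for x in a]
    elif space == "pose_velocity":
        dq = solve_2x2(jacobian(q), [MAX_EE_VEL * x * DT for x in a])
    elif space == "pose_increment":
        dq = solve_2x2(jacobian(q), [MAX_EE_STEP * x for x in a])
    else:
        raise ValueError(f"unknown action space: {space}")
    return [qi + d for qi, d in zip(q, dq)]


if __name__ == "__main__":
    q = [0.3, 1.2]
    for space in ("pose_increment", "pose_velocity",
                  "joint_position_increment", "joint_velocity"):
        print(space, joint_target(q, [1.0, 0.0], space))
```

In this sketch the two velocity spaces scale the action by the control period, while the two increment spaces apply a per-step displacement directly. The pose-based spaces additionally need a Jacobian inverse to reach joint space, which is one place where they can behave differently from the joint-based spaces on real hardware.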

Findings

The choice of action-space representation was found to significantly affect sim-to-real performance in vision-based robotic manipulation tasks.
Specifically, for the vision-based picking and pushing tasks investigated, the joint velocity action space gave the best results of the four evaluated.
This advantage appeared in both motion smoothness and final task performance.
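
This summary does not say how motion smoothness was measured. As one hedged illustration of how such a comparison could be quantified, the sketch below computes the mean squared jerk of a recorded joint trajectory using third-order finite differences. The metric choice and the function name `mean_squared_jerk` are assumptions, not details taken from the study.

```python
def mean_squared_jerk(positions, dt):
    """Mean squared jerk of a trajectory sampled every `dt` seconds.

    `positions` is a list of per-timestep joint position lists.
    Lower values indicate smoother motion. This is an assumed metric,
    not necessarily the one used in the study.
    """
    if len(positions) < 4:
        raise ValueError("need at least four samples for a third difference")
    total, count = 0.0, 0
    for t in range(len(positions) - 3):
        for j in range(len(positions[0])):
            # Third-order forward difference approximates the third derivative.
            jerk = (positions[t + 3][j] - 3 * positions[t + 2][j]
                    + 3 * positions[t + 1][j] - positions[t][j]) / dt ** 3
            total += jerk * jerk
            count += 1
    return total / count


if __name__ == "__main__":
    steady = [[0.0], [1.0], [2.0], [3.0], [4.0]]   # constant velocity
    jittery = [[0.0], [1.0], [0.0], [1.0], [0.0]]  # oscillating
    print(mean_squared_jerk(steady, 1.0), mean_squared_jerk(jittery, 1.0))
```

A constant-velocity trajectory scores zero under this measure, while an oscillating one scores higher, so the same rollout could be compared across action spaces on a common scale.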

Why This Matters

The findings indicate that a careful choice of action space is critical for deploying RL policies successfully in real-world robotic manipulation. The study gives RL practitioners practical guidance on selecting an action space for both simulated and real-world experiments, which can inform policy design for improved robotic control.

Research Information

Institution: arXiv CS
Source: arXiv CS

About ICANEWS

ICANEWS is a global research journal for emerging researchers, publishing student and emerging researcher work across all fields.