
Conversation

@yhnsu (Collaborator) commented Jan 16, 2026

Description

Add a reward manager for RL training.

Copilot AI left a comment

Pull request overview

This PR introduces a reward manager system for reinforcement learning training, enabling modular reward computation through configurable reward functors. The reward manager follows the same design pattern as the existing observation manager and event manager.

Changes:

  • Added RewardManager class to orchestrate reward computation with support for multiple weighted reward terms
  • Implemented 11 reusable reward functions covering distance-based rewards, penalties, and success bonuses
  • Integrated reward manager into EmbodiedEnv and BaseEnv for automatic reward computation
  • Refactored PushCubeEnv to use the reward manager instead of manual reward calculation
  • Added randomize_target_pose function for virtual goal poses without physical scene objects
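A minimal sketch of what one configured term might look like, pieced together from the RewardCfg fields mentioned in the review comments below (func, mode, weight, params); the constructor shape and anything beyond those field names are assumptions:

from embodichain.lab.gym.envs.managers import RewardCfg

# Hypothetical example: one weighted reward term; "func" names a functor in rewards.py.
reaching = RewardCfg(
    func="reaching_behind_object_reward",
    mode="add",
    weight=1.0,  # top-level weight applied when the manager sums terms
    params={
        "object_cfg": {"uid": "cube"},
        "distance_scale": 5.0,
    },
)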

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

File Description
embodichain/lab/gym/envs/managers/reward_manager.py New reward manager class for orchestrating reward computation
embodichain/lab/gym/envs/managers/rewards.py New module with 11 reward functor implementations
embodichain/lab/gym/envs/managers/cfg.py Added RewardCfg configuration class
embodichain/lab/gym/envs/managers/__init__.py Exported RewardCfg and RewardManager
embodichain/lab/gym/envs/embodied_env.py Integrated reward manager initialization and reset
embodichain/lab/gym/envs/base_env.py Added _extend_reward hook in get_reward method
embodichain/lab/gym/utils/gym_utils.py Added reward parsing logic in load_gym_cfg
embodichain/lab/gym/envs/tasks/rl/push_cube.py Refactored to use reward manager, removed manual reward code
embodichain/lab/gym/envs/managers/randomization/spatial.py Added randomize_target_pose function
embodichain/lab/gym/envs/managers/observations.py Added get_robot_ee_pose and target_position observation functions
configs/agents/rl/push_cube/gym_config.json Updated to use reward manager configuration
configs/agents/rl/push_cube/train_config.json Changed eval_freq from 2 to 200


Comment on lines +68 to +73
"end_effector_cfg": {"uid": "Manipulator", "body_ids": "ee_link"},
"object_cfg": {"uid": "cube"},
"goal_cfg": {"uid": "goal_sphere"},
"behind_offset": 0.015,
"height_offset": 0.015,
"distance_scale": 5.0

Copilot AI Jan 16, 2026


The reward function references the 'goal_sphere' object, which has been removed from the 'background' array (line 171). The reaching_behind_object_reward function will fail at runtime when it tries to access this non-existent object. Either restore the goal_sphere object in the scene or update the reward function to use the virtual goal_pose instead.

Suggested change
  "end_effector_cfg": {"uid": "Manipulator", "body_ids": "ee_link"},
  "object_cfg": {"uid": "cube"},
- "goal_cfg": {"uid": "goal_sphere"},
+ "target_pose_key": "goal_pose",
  "behind_offset": 0.015,
  "height_offset": 0.015,
  "distance_scale": 5.0

Args:
env: The environment instance.
obs: The observation dictionary.
robot_uid: The uid of the robot. If None, uses env.robot.

Copilot AI Jan 16, 2026


The docstring mentions a 'robot_uid' parameter but this parameter doesn't exist in the function signature. The function always uses env.robot. Remove this incorrect documentation line.

Suggested change
- robot_uid: The uid of the robot. If None, uses env.robot.

table.align["Name"] = "l"
for index, name in enumerate(self._mode_functor_names[mode]):
functor_cfg = self._mode_functor_cfgs[mode][index]
weight = functor_cfg.params.get("weight", 1.0)

Copilot AI Jan 16, 2026


The weight is being retrieved from params instead of from the RewardCfg.weight attribute. This means the weight configured at the top level of RewardCfg (line 168) is ignored. It should use functor_cfg.weight to correctly display and apply the configured weight.

Suggested change
- weight = functor_cfg.params.get("weight", 1.0)
+ weight = getattr(functor_cfg, "weight", 1.0)

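To make the fix concrete, a hedged sketch of how the manager might both display and apply the top-level weight; the loop variables and env attributes are illustrative, not from the PR:

# Illustrative only: accumulate weighted terms using RewardCfg.weight.
total = torch.zeros(env.num_envs, device=env.device)
for functor, functor_cfg in zip(functors, functor_cfgs):
    term = functor(env, **functor_cfg.params)            # shape: (num_envs,)
    total += getattr(functor_cfg, "weight", 1.0) * term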
) -> torch.Tensor:
"""Reward based on distance between two entities."""
# get source entity position
source_obj = env.sim[source_entity_cfg.uid]

Copilot AI Jan 16, 2026


The code uses direct dictionary access env.sim[uid] which will raise a KeyError if the entity doesn't exist. For consistency with the observation functions (see get_rigid_object_pose in observations.py), consider using env.sim.get_rigid_object() which provides safer access with None checking.

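A small sketch of the safer pattern this comment suggests, assuming env.sim.get_rigid_object returns None for unknown uids as in observations.py:

# Illustrative only: fail with a clear message instead of a bare KeyError.
source_obj = env.sim.get_rigid_object(source_entity_cfg.uid)
if source_obj is None:
    raise ValueError(f"Reward term references unknown entity '{source_entity_cfg.uid}'.")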
info["rewards"]["action_penalty"] = action_penalty
info["rewards"]["success_bonus"] = success_bonus
return reward
# Fallback: no virtual goal set

Copilot AI Jan 16, 2026


The fallback to a zero goal position could mask configuration errors. If goal_poses is not set, it likely means the randomize_target_pose event is missing from the configuration. Consider logging a warning when falling back to zeros to help debug such misconfigurations.

Suggested change
  # Fallback: no virtual goal set
+ logger.warning(
+     "PushCubeEnv: _goal_poses is not set; falling back to zero goal positions. "
+     "This may indicate that the randomize_target_pose event is missing from the configuration."
+ )

current_dist = torch.norm(source_pos - target_pos, dim=-1)

# initialize previous distance on first call
prev_dist_key = f"_prev_dist_{source_entity_cfg.uid}_{target_pose_key}"

Copilot AI Jan 16, 2026


The state key uses string concatenation which could create long attribute names. While functional, consider using a dictionary-based state management approach (e.g., env._reward_states[key]) for better organization when multiple incremental rewards are used.

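A possible shape for that dictionary-based approach; the _reward_states attribute is illustrative, not part of the PR:

# Illustrative only: one dict keyed by (uid, pose key) instead of dynamic attributes.
if not hasattr(env, "_reward_states"):
    env._reward_states = {}
key = (source_entity_cfg.uid, target_pose_key)
prev_dist = env._reward_states.get(key, current_dist)
env._reward_states[key] = current_dist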
raise_if_not_found=True,
)

from embodichain.lab.gym.envs.managers import RewardCfg

Copilot AI Jan 16, 2026


The import statement is placed inside the loop iteration. Move this import to the top of the file with other imports to follow Python best practices and avoid repeated import overhead during loop iterations.

Comment on lines +355 to +357
# Add individual reward terms to info for logging
for term_name, term_value in reward_info.items():
info[f"reward/{term_name}"] = term_value

Copilot AI Jan 16, 2026


Individual reward terms are added directly to the info dict with a 'reward/' prefix. Consider organizing them under info['rewards'] = reward_info for consistency with how the removed manual implementation organized reward components (see line 170 in the diff), making it easier for trainers to identify and log reward components.

Suggested change
- # Add individual reward terms to info for logging
- for term_name, term_value in reward_info.items():
-     info[f"reward/{term_name}"] = term_value
+ # Organize individual reward terms under info["rewards"] for logging
+ existing_rewards = info.get("rewards")
+ if isinstance(existing_rewards, dict):
+     existing_rewards.update(reward_info)
+ else:
+     info["rewards"] = reward_info

return torch.stack(masks, dim=-1)


def get_robot_ee_pose(
Contributor


Use get_robot_eef_pose

target_value: float = 0.0,
scale: float = 1.0,
) -> torch.Tensor:
"""Reward based on observation values."""
Contributor


Add docs for this function
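A possible docstring sketch, assuming (as the existing one-liner and the target_value/scale parameters suggest) that the term scores how close an observation value sits to a target; the exact formula is not visible in this diff:

"""Reward based on observation values.

Compares a configured observation against target_value and returns a
scaled per-env reward (exact formula per the implementation).

Args:
    target_value: Desired value of the observation.
    scale: Multiplier applied to the resulting reward.

Returns:
    Tensor of shape (num_envs,) with per-env rewards.
"""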

from embodichain.lab.gym.envs import EmbodiedEnv


def reward_from_obs(
Contributor


I can't understand the meaning of this reward function.

exponential: bool = False,
sigma: float = 1.0,
) -> torch.Tensor:
"""Reward based on distance between two entities."""
Contributor


Add full docs for this function
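A possible docstring sketch, using only what the signature shows (exponential, sigma) and hedging on the formula, which is not visible in this diff:

"""Reward based on distance between two entities.

Dense shaping term: smaller distance yields larger reward. With
exponential=True, the distance is mapped through an exponential kernel
controlled by sigma (e.g. exp(-d / sigma) is a common choice); otherwise
the raw (negative) distance is typically used.

Args:
    exponential: If True, map the distance through an exponential kernel.
    sigma: Length scale of the exponential kernel.

Returns:
    Tensor of shape (num_envs,) with per-env rewards.
"""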

robot_uid: str = "robot",
joint_ids: slice | list[int] = slice(None),
) -> torch.Tensor:
"""Penalize large joint velocities."""
Contributor


Add full docs for this function
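A possible docstring sketch; the squared-norm formulation is a common convention, not confirmed by this diff:

"""Penalize large joint velocities.

Typically the negative (squared) norm of the selected joint velocities,
encouraging smooth motion.

Args:
    robot_uid: The uid of the robot whose joints are penalized.
    joint_ids: Slice or list of indices selecting which joints to include.

Returns:
    Tensor of shape (num_envs,) with per-env penalty values.
"""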

"reaching_reward": {
"func": "reaching_behind_object_reward",
"mode": "add",
"name": "reaching",
Contributor


I think the 'name' arg is duplicated, since we already have the reward functor key ('reaching_reward').
