add reward manager #71
base: main
Conversation
Pull request overview
This PR introduces a reward manager system for reinforcement learning training, enabling modular reward computation through configurable reward functors. The reward manager follows the same design pattern as the existing observation manager and event manager.
Changes:
- Added RewardManager class to orchestrate reward computation with support for multiple weighted reward terms
- Implemented 11 reusable reward functions covering distance-based rewards, penalties, and success bonuses
- Integrated reward manager into EmbodiedEnv and BaseEnv for automatic reward computation
- Refactored PushCubeEnv to use the reward manager instead of manual reward calculation
- Added randomize_target_pose function for virtual goal poses without physical scene objects
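As a rough illustration of the pattern (not the actual API in this PR), a manager like this typically stores a mapping of named, weighted reward terms and sums them each step. The field names below (`func`, `weight`, `params`) mirror the config snippets visible later in this review; everything else, including the class shapes and the assumption that `env` exposes `num_envs` and `device`, is an assumption for the sketch:

```python
# Illustrative sketch only; names and fields are assumptions, not the PR's actual API.
from dataclasses import dataclass, field
from typing import Callable
import torch


@dataclass
class RewardCfg:  # hypothetical mirror of the cfg added in cfg.py
    func: Callable[..., torch.Tensor]   # reward functor: (env, **params) -> (num_envs,) tensor
    weight: float = 1.0                 # top-level weight applied to the term
    params: dict = field(default_factory=dict)


class RewardManager:  # hypothetical sketch of the orchestration pattern
    def __init__(self, cfgs: dict[str, RewardCfg], env):
        self._cfgs = cfgs
        self._env = env  # assumed to expose num_envs and device

    def compute(self) -> tuple[torch.Tensor, dict[str, torch.Tensor]]:
        """Sum weighted reward terms and return per-term values for logging."""
        total = torch.zeros(self._env.num_envs, device=self._env.device)
        terms: dict[str, torch.Tensor] = {}
        for name, cfg in self._cfgs.items():
            value = cfg.weight * cfg.func(self._env, **cfg.params)
            terms[name] = value
            total = total + value
        return total, terms
```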
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| embodichain/lab/gym/envs/managers/reward_manager.py | New reward manager class for orchestrating reward computation |
| embodichain/lab/gym/envs/managers/rewards.py | New module with 11 reward functor implementations |
| embodichain/lab/gym/envs/managers/cfg.py | Added RewardCfg configuration class |
| embodichain/lab/gym/envs/managers/__init__.py | Exported RewardCfg and RewardManager |
| embodichain/lab/gym/envs/embodied_env.py | Integrated reward manager initialization and reset |
| embodichain/lab/gym/envs/base_env.py | Added _extend_reward hook in get_reward method |
| embodichain/lab/gym/utils/gym_utils.py | Added reward parsing logic in load_gym_cfg |
| embodichain/lab/gym/envs/tasks/rl/push_cube.py | Refactored to use reward manager, removed manual reward code |
| embodichain/lab/gym/envs/managers/randomization/spatial.py | Added randomize_target_pose function |
| embodichain/lab/gym/envs/managers/observations.py | Added get_robot_ee_pose and target_position observation functions |
| configs/agents/rl/push_cube/gym_config.json | Updated to use reward manager configuration |
| configs/agents/rl/push_cube/train_config.json | Changed eval_freq from 2 to 200 |
| "end_effector_cfg": {"uid": "Manipulator", "body_ids": "ee_link"}, | ||
| "object_cfg": {"uid": "cube"}, | ||
| "goal_cfg": {"uid": "goal_sphere"}, | ||
| "behind_offset": 0.015, | ||
| "height_offset": 0.015, | ||
| "distance_scale": 5.0 |
Copilot AI (Jan 16, 2026):
The reward function references the 'goal_sphere' object, which has been removed from the 'background' array (line 171). The reaching_behind_object_reward function will fail at runtime when it tries to access this non-existent object. Either restore the goal_sphere object in the scene or update the reward function to use the virtual goal_pose instead.
| "end_effector_cfg": {"uid": "Manipulator", "body_ids": "ee_link"}, | |
| "object_cfg": {"uid": "cube"}, | |
| "goal_cfg": {"uid": "goal_sphere"}, | |
| "behind_offset": 0.015, | |
| "height_offset": 0.015, | |
| "distance_scale": 5.0 | |
| "end_effector_cfg": {"uid": "Manipulator", "body_ids": "ee_link"}, | |
| "object_cfg": {"uid": "cube"}, | |
| "target_pose_key": "goal_pose", | |
| "behind_offset": 0.015, | |
| "height_offset": 0.015, | |
| "distance_scale": 5.0 |
    Args:
        env: The environment instance.
        obs: The observation dictionary.
        robot_uid: The uid of the robot. If None, uses env.robot.
Copilot AI (Jan 16, 2026):
The docstring mentions a 'robot_uid' parameter but this parameter doesn't exist in the function signature. The function always uses env.robot. Remove this incorrect documentation line.
Suggested change:
    - robot_uid: The uid of the robot. If None, uses env.robot.
| table.align["Name"] = "l" | ||
| for index, name in enumerate(self._mode_functor_names[mode]): | ||
| functor_cfg = self._mode_functor_cfgs[mode][index] | ||
| weight = functor_cfg.params.get("weight", 1.0) |
Copilot AI (Jan 16, 2026):
The weight is being retrieved from params instead of from the RewardCfg.weight attribute. This means the weight configured at the top level of RewardCfg (line 168) is ignored. It should use functor_cfg.weight to correctly display and apply the configured weight.
Suggested change:
    - weight = functor_cfg.params.get("weight", 1.0)
    + weight = getattr(functor_cfg, "weight", 1.0)
    ) -> torch.Tensor:
        """Reward based on distance between two entities."""
        # get source entity position
        source_obj = env.sim[source_entity_cfg.uid]
Copilot AI (Jan 16, 2026):
The code uses direct dictionary access env.sim[uid] which will raise a KeyError if the entity doesn't exist. For consistency with the observation functions (see get_rigid_object_pose in observations.py), consider using env.sim.get_rigid_object() which provides safer access with None checking.
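A minimal sketch of the safer lookup, assuming env.sim.get_rigid_object(uid) exists as described above and returns None for unknown uids:

```python
# Sketch only: assumes env.sim.get_rigid_object returns None when the uid is unknown.
source_obj = env.sim.get_rigid_object(source_entity_cfg.uid)
if source_obj is None:
    raise ValueError(
        f"Reward term expected rigid object '{source_entity_cfg.uid}', "
        "but it was not found in the scene; check the entity config uid."
    )
```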
| info["rewards"]["action_penalty"] = action_penalty | ||
| info["rewards"]["success_bonus"] = success_bonus | ||
| return reward | ||
| # Fallback: no virtual goal set |
Copilot AI (Jan 16, 2026):
The fallback to zero goal position could mask configuration errors. If goal_poses is not set, it indicates the randomize_goal event is missing from configuration. Consider logging a warning when falling back to zeros to help with debugging misconfiguration.
Suggested change:
      # Fallback: no virtual goal set
    + logger.warning(
    +     "PushCubeEnv: _goal_poses is not set; falling back to zero goal positions. "
    +     "This may indicate that the randomize_target_pose event is missing from the configuration."
    + )
    current_dist = torch.norm(source_pos - target_pos, dim=-1)

    # initialize previous distance on first call
    prev_dist_key = f"_prev_dist_{source_entity_cfg.uid}_{target_pose_key}"
Copilot AI (Jan 16, 2026):
The state key uses string concatenation which could create long attribute names. While functional, consider using a dictionary-based state management approach (e.g., env._reward_states[key]) for better organization when multiple incremental rewards are used.
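A sketch of that dictionary-based variant, assuming the functor already has current_dist computed and that keeping state on env is acceptable; the _reward_states attribute name comes from the comment above and is not an existing attribute:

```python
# Sketch only: shared per-term state dict on the env instead of ad-hoc attribute names.
states = getattr(env, "_reward_states", None)
if states is None:
    states = env._reward_states = {}

state_key = (source_entity_cfg.uid, target_pose_key)
prev_dist = states.get(state_key)
if prev_dist is None:
    prev_dist = current_dist.detach().clone()  # first call: no progress to reward yet

reward = prev_dist - current_dist              # positive when the source moved closer
states[state_key] = current_dist.detach().clone()
```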
        raise_if_not_found=True,
    )

    from embodichain.lab.gym.envs.managers import RewardCfg
Copilot AI (Jan 16, 2026):
The import statement is placed inside the loop iteration. Move this import to the top of the file with other imports to follow Python best practices and avoid repeated import overhead during loop iterations.
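The fix is simply to hoist the import to module level (presumably in gym_utils.py, where load_gym_cfg lives):

```python
# At the top of the module, alongside the other imports:
from embodichain.lab.gym.envs.managers import RewardCfg
```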
    # Add individual reward terms to info for logging
    for term_name, term_value in reward_info.items():
        info[f"reward/{term_name}"] = term_value
Copilot AI (Jan 16, 2026):
Individual reward terms are added directly to the info dict with a 'reward/' prefix. Consider organizing them under info['rewards'] = reward_info for consistency with how the removed manual implementation organized reward components (see line 170 in the diff), making it easier for trainers to identify and log reward components.
Suggested change:
    - # Add individual reward terms to info for logging
    - for term_name, term_value in reward_info.items():
    -     info[f"reward/{term_name}"] = term_value
    + # Organize individual reward terms under info["rewards"] for logging
    + existing_rewards = info.get("rewards")
    + if isinstance(existing_rewards, dict):
    +     existing_rewards.update(reward_info)
    + else:
    +     info["rewards"] = reward_info
        return torch.stack(masks, dim=-1)


    def get_robot_ee_pose(
Use get_robot_eef_pose
        target_value: float = 0.0,
        scale: float = 1.0,
    ) -> torch.Tensor:
        """Reward based on observation values."""
Add docs for this function
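For example, something like the sketch below; the described behavior is only a guess from the visible target_value and scale parameters, so it should be corrected to whatever the functor actually computes:

```python
"""Reward derived from a value in the observation dictionary.

Assumed behavior (verify against the implementation): the selected observation
entry is compared to ``target_value`` and the result is scaled by ``scale``.

Args:
    env: The environment instance.
    target_value: Value the observation entry is driven towards.
    scale: Multiplier applied to the resulting reward term.

Returns:
    Tensor of shape (num_envs,) with the per-environment reward.
"""
```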
    from embodichain.lab.gym.envs import EmbodiedEnv


    def reward_from_obs(
I cannot understand the meaning of this reward function.
        exponential: bool = False,
        sigma: float = 1.0,
    ) -> torch.Tensor:
        """Reward based on distance between two entities."""
Add full docs for this function
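A possible expansion, documenting only the parameters visible in this diff (the entity config parameter name and the exact exponential mapping are assumptions):

```python
"""Reward based on the distance between two entities.

Args:
    env: The environment instance.
    source_entity_cfg: Config selecting the source entity (e.g. the end effector).
    exponential: If True, pass the distance through an exponential kernel
        instead of using the raw (negative) distance.
    sigma: Length scale of the exponential kernel; unused when ``exponential``
        is False.

Returns:
    Tensor of shape (num_envs,) with the per-environment reward.
"""
```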
        robot_uid: str = "robot",
        joint_ids: slice | list[int] = slice(None),
    ) -> torch.Tensor:
        """Penalize large joint velocities."""
Add full docs for this function
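Along these lines, with the exact formula filled in from the implementation:

```python
"""Penalize large joint velocities.

Args:
    env: The environment instance.
    robot_uid: uid of the robot whose joint velocities are penalized.
    joint_ids: Subset of joint indices to include; defaults to all joints.

Returns:
    Tensor of shape (num_envs,) with the per-environment penalty term.
"""
```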
| "reaching_reward": { | ||
| "func": "reaching_behind_object_reward", | ||
| "mode": "add", | ||
| "name": "reaching", |
I think the "name" argument is redundant, since we already have this reward functor key ("reaching_reward").
Description
Add a reward manager for RL training.