
Update training parameters; add automatic evaluation of training results after training completes

corvin_zhang 1 week ago
Commit
ae617468e0

+ 24 - 2
docs/guguji_biped_rl_guide.md

@@ -220,8 +220,8 @@ ros2 launch guguji_ros2 gazebo.launch.py gui:=false pause:=true
   Reward functions
 - `guguji_rl/scripts/train.py`
   PPO training
-- `guguji_rl/scripts/evaluate.py`
-  Load a policy and run it
+- `guguji_rl/scripts/evaluate_forward_progress.py`
+  Evaluate the model's `delta_x / mean_vx`
 - `guguji_rl/scripts/run_policy.py`
   Run the trained policy continuously as an online controller
 - `guguji_rl/scripts/check_env.py`
@@ -236,6 +236,28 @@ ros2 launch guguji_ros2 gazebo.launch.py gui:=false pause:=true
 5. Watch `/joint_states` and `/tf` to check whether the robot still falls over frequently and whether a slight forward trend starts to appear.
 6. After tuning the rewards and joint ranges, switch to `walk_ppo.yaml`.
 
+## 11. How to quickly check whether a training round actually walks further forward
+
+When tuning the walking parameters later, the most useful metrics are:
+
+- `delta_x`: how far the base has moved along x by the end of the episode
+- `mean_vx`: the average forward velocity, i.e. `delta_x` divided by the elapsed simulated time (steps × `control_dt`)
+
+The current code supports two ways to get them:
+
+1. After each `scripts/train.py` run finishes, it automatically prints a forward-progress evaluation summary.
+2. You can also run the evaluation manually:
+
+```bash
+cd /home/corvin/Project/guguji_simulation/guguji_rl
+source .venv/bin/activate
+python3 scripts/evaluate_forward_progress.py \
+  --config configs/walk_ppo.yaml \
+  --model outputs/<your_experiment_dir>/final_model.zip
+```
+
+This way, when you later compare different walking configs, you no longer have to eyeball whether the model walks forward any better; the terminal prints quantified results directly.
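+
+The printed summary looks roughly like this (illustrative numbers, not from a real run, assuming the default of two evaluation episodes):
+
+```
+Forward-progress evaluation results:
+  Model: outputs/<your_experiment_dir>/final_model.zip
+  episode=0 steps=500 delta_x=0.400 mean_vx=0.080 total_reward=800.00 terminated=False truncated=True
+  episode=1 steps=500 delta_x=0.450 mean_vx=0.090 total_reward=820.50 terminated=False truncated=True
+  Summary: mean_delta_x=0.425 mean_mean_vx=0.085 mean_total_reward=810.25
+```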
+
 ## 10. How to keep controlling the robot in ROS 2 after training
 
 After training completes, it is best to split the workflow into two steps:

+ 23 - 2
guguji_rl/README.md

@@ -17,7 +17,7 @@
 - `guguji_rl/envs/gazebo_biped_env.py`: Gymnasium environment wrapper
 - `guguji_rl/rewards.py`: reward functions
 - `scripts/train.py`: PPO training entry point
-- `scripts/evaluate.py`: load a trained policy and run it in Gazebo
+- `scripts/evaluate_forward_progress.py`: evaluate the model's `delta_x / mean_vx`
 - `scripts/run_policy.py`: run the trained policy continuously as an online controller
 - `scripts/check_env.py`: pre-training self-check script
 
@@ -28,7 +28,8 @@
 3. First run `balance_ppo.yaml` to train standing balance
 4. Then run `forward_transition_ppo.yaml` for the slight-forward transition stage
 5. Finally run `walk_ppo.yaml` to train proper forward walking
-6. Once a model is trained, first replay it with `evaluate.py`, then use `run_policy.py` for continuous inference control in the ROS 2 system
+6. Once a model is trained, first check its forward progress with `evaluate_forward_progress.py`, then use `run_policy.py` for continuous inference control in the ROS 2 system
+7. After each training round, `train.py` automatically prints a `delta_x / mean_vx` forward-progress evaluation
 
 ## Installing dependencies
 
@@ -68,6 +69,26 @@ python3 scripts/run_policy.py \
   --deterministic
 ```
 
+## Evaluating forward progress on its own
+
+If you want to compare the forward-walking ability of two models by hand, run:
+
+```bash
+cd /home/corvin/Project/guguji_simulation/guguji_rl
+source .venv/bin/activate
+python3 scripts/evaluate_forward_progress.py \
+  --config configs/walk_ppo.yaml \
+  --model outputs/<your_experiment_dir>/final_model.zip
+```
+
+For each episode, the evaluation prints:
+
+- `delta_x`
+- `mean_vx`
+- `total_reward`
+
+plus averaged summary values at the end, which makes comparing walking configs much easier later on.
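+
+If you would rather compare models from a script, here is a minimal sketch against the same `evaluate_forward_progress` API (run from the `guguji_rl` directory with Gazebo already launched; `run_a`/`run_b` are placeholder experiment directories):
+
+```python
+from guguji_rl.config import load_config
+from guguji_rl.evaluation import evaluate_forward_progress, print_forward_progress_summary
+
+config = load_config('configs/walk_ppo.yaml')
+for model_path in ('outputs/run_a/final_model.zip', 'outputs/run_b/final_model.zip'):
+    # Each call builds a fresh GazeboBipedEnv, rolls out the policy, and closes the env.
+    summary = evaluate_forward_progress(
+        config=config,
+        model_path=model_path,
+        episodes=2,
+        max_steps=500,
+        deterministic=True,
+    )
+    print_forward_progress_summary(summary)
+```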
+
 ## Curriculum-learning training example
 
 If you already have a trained balance model, it is best to continue training the slight-forward stage directly on top of it:

+ 30 - 21
guguji_rl/configs/walk_ppo.yaml

@@ -25,10 +25,20 @@ robot:
     right_knee_pitch_joint: 0.0
     right_ankle_pitch_joint: 0.0
     right_ankle_joint: 0.0
-  # After the reset fix, widen the action space slightly so the policy can more easily take real steps.
-  action_scale: 0.26
-  # Slightly lower the smoothing from the previous round to give alternating leg swings more room.
-  action_smoothing: 0.62
+  # This round switches to "reference gait + residual actions":
+  # first give both legs a clear alternating swing rhythm, then let the policy learn to stabilize it and turn the gait into forward motion.
+  reference_gait:
+    enabled: true
+    period: 0.80
+    # Make the swing phase shorter and the stance phase longer, closer to the cadence of real forward walking.
+    stance_ratio: 0.62
+    hip_pitch_amplitude: 0.28
+    knee_pitch_amplitude: 0.36
+    knee_pitch_bias: 0.08
+    ankle_pitch_amplitude: 0.18
+  # The reference gait already swings the legs on its own, so the residual action range can be trimmed a little.
+  action_scale: 0.18
+  action_smoothing: 0.72
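+  # (In the env, the commanded joint target is: nominal stance + reference-gait offset
+  #  + smoothed residual action, clipped to the joint limits.)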
 
 ros:
   joint_state_topic: /joint_states
@@ -58,8 +68,8 @@ sim:
   launch_hint: ros2 launch guguji_ros2 gazebo.launch.py pause:=true
 
 task:
-  # Push the target velocity up a bit further to squeeze out the "small shuffle in place" regime
-  target_forward_velocity: 0.18
+  # With the reference gait wired in, keep pushing the target velocity up, forcing the policy to turn leg swings into forward displacement
+  target_forward_velocity: 0.22
   target_base_height: null
   # Allow slightly larger body sway to leave room for dynamic balance while stepping.
   max_roll_rad: 0.90
@@ -68,27 +78,25 @@ task:
   termination_grace_steps: 12
 
 rewards:
-  # On top of the forward reward, this version adds a lightweight gait prior:
-  # 1. hip_alternation encourages alternating swings of the left/right hip joints
-  # 2. knee_flexion encourages moderate knee flexion to help lift the legs
-  alive_bonus: 0.8
-  velocity_tracking_scale: 3.8
-  velocity_tracking_sigma: 0.10
-  forward_progress_scale: 4.2
-  hip_alternation_scale: 0.8
+  # The reference gait now handles "getting the legs moving", so the rewards here focus on "moving forward once stepping".
+  alive_bonus: 0.6
+  velocity_tracking_scale: 4.2
+  velocity_tracking_sigma: 0.09
+  forward_progress_scale: 5.0
+  hip_alternation_scale: 0.4
   hip_target_separation: 0.35
   hip_antiphase_sigma: 0.18
-  knee_flexion_scale: 0.6
+  knee_flexion_scale: 0.3
   knee_target: 0.22
   knee_flexion_sigma: 0.12
   upright_scale: 1.6
   height_scale: 0.9
-  action_rate_penalty_scale: 0.008
+  action_rate_penalty_scale: 0.006
   joint_limit_penalty_scale: 0.05
   lateral_velocity_penalty_scale: 0.10
-  backward_velocity_penalty_scale: 2.0
-  stall_penalty_scale: 3.0
-  stall_velocity_threshold: 0.06
+  backward_velocity_penalty_scale: 2.5
+  stall_penalty_scale: 4.0
+  stall_velocity_threshold: 0.08
   fall_penalty: -15.0
 
 training:
@@ -98,8 +106,9 @@ training:
   max_episode_steps: 500
   seed: 42
   device: auto
-  # Continue training directly from the walking model produced right after the reset fix.
-  init_model_path: outputs/walk_ppo_20260411_180823/final_model.zip
+  # The control structure changes a lot once the reference gait is wired in, so we still initialize from the latest walking model,
+  # but push the learning rate even lower so the policy adapts to the new rhythm by fine-tuning.
+  init_model_path: outputs/walk_ppo_20260412_181640/final_model.zip
   # Significantly lower the learning rate and lengthen rollouts to avoid the regression caused by the previous overly aggressive updates.
   learning_rate: 0.00008
   n_steps: 256

+ 13 - 0
guguji_rl/guguji_rl/config.py

@@ -16,6 +16,15 @@ DEFAULT_CONFIG: dict[str, Any] = {
         'urdf_path': 'guguji_ros2_ws/src/guguji_ros2/urdf/guguji.urdf',
         'joint_names': [],
         'command_topic_prefix': '/guguji/command',
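+        # Defaults for the reference gait: disabled, with zero amplitudes, so existing configs are unaffected.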
+        'reference_gait': {
+            'enabled': False,
+            'period': 0.9,
+            'stance_ratio': 0.62,
+            'hip_pitch_amplitude': 0.0,
+            'knee_pitch_amplitude': 0.0,
+            'knee_pitch_bias': 0.0,
+            'ankle_pitch_amplitude': 0.0,
+        },
     },
     'ros': {
         'joint_state_topic': '/joint_states',
@@ -88,6 +97,10 @@ DEFAULT_CONFIG: dict[str, Any] = {
     'evaluation': {
         'episodes': 3,
         'deterministic': True,
+        'auto_forward_progress': True,
+        'forward_progress_episodes': 2,
+        'forward_progress_max_steps': 500,
+        'forward_progress_deterministic': True,
     },
 }
 

+ 74 - 9
guguji_rl/guguji_rl/envs/gazebo_biped_env.py

@@ -1,5 +1,6 @@
 from __future__ import annotations
 
+import math
 import time
 from typing import Any
 
@@ -27,6 +28,7 @@ class GazeboBipedEnv(gym.Env):
         self.task_config = config['task']
         self.training_config = config['training']
         self.configured_target_base_height = self.task_config['target_base_height']
+        self.reference_gait_config = self.robot_config.get('reference_gait', {})
 
         urdf_path = resolve_project_path(config, self.robot_config['urdf_path'])
         self.joint_limits: list[JointLimit] = parse_joint_limits(
@@ -52,6 +54,11 @@ class GazeboBipedEnv(gym.Env):
         self.action_smoothing = float(np.clip(self.action_smoothing, 0.0, 0.99))
         self.reset_hold_steps = int(self.sim_config.get('reset_hold_steps', 1))
         self.termination_grace_steps = int(self.task_config.get('termination_grace_steps', 0))
+        self.reference_gait_enabled = bool(self.reference_gait_config.get('enabled', False))
+        self.reference_gait_period = max(
+            float(self.reference_gait_config.get('period', 0.9)),
+            float(self.sim_config['control_dt']),
+        )
 
         self.interface = GugujiRos2Interface(
             joint_names=self.joint_names,
@@ -92,32 +99,87 @@ class GazeboBipedEnv(gym.Env):
         self.previous_snapshot: RobotStateSnapshot | None = None
         self.current_snapshot: RobotStateSnapshot | None = None
         self.current_joint_target = self.nominal_joint_targets.copy()
+        self.current_action_residual = np.zeros(len(self.joint_names), dtype=np.float32)
         self.step_count = 0
         self.target_base_height = self.configured_target_base_height
+        self.gait_phase = 0.0
 
     def _normalized_joint_position(self, joint_position: np.ndarray) -> np.ndarray:
         return (joint_position - self.joint_mid) / self.joint_half_range
 
     def _action_to_joint_targets(self, action: np.ndarray) -> np.ndarray:
         clipped_action = np.clip(action, -1.0, 1.0)
+        reference_offsets = self._reference_gait_offsets()
+        reference_targets = self.nominal_joint_targets + reference_offsets
         # Scale the [-1, 1] policy output into a small residual action around the nominal stance,
         # so PPO's early random actions cannot throw the joints to far-away positions.
-        desired_targets = (
-            self.nominal_joint_targets
-            + clipped_action * self.joint_half_range * self.action_scale
-        )
-        desired_targets = np.clip(desired_targets, self.joint_lower, self.joint_upper)
+        desired_residual = clipped_action * self.joint_half_range * self.action_scale
 
         if self.action_smoothing > 0.0:
-            joint_targets = (
-                self.action_smoothing * self.current_joint_target
-                + (1.0 - self.action_smoothing) * desired_targets
+            smoothed_residual = (
+                self.action_smoothing * self.current_action_residual
+                + (1.0 - self.action_smoothing) * desired_residual
             )
         else:
-            joint_targets = desired_targets
+            smoothed_residual = desired_residual
+
+        self.current_action_residual = smoothed_residual.astype(np.float32)
+        joint_targets = reference_targets + self.current_action_residual
 
         return np.clip(joint_targets, self.joint_lower, self.joint_upper).astype(np.float32)
 
+    def _reference_gait_offsets(self) -> np.ndarray:
+        """生成一个简洁的参考步态,让 walking 阶段先学会交替迈腿。"""
+        offsets = np.zeros(len(self.joint_names), dtype=np.float32)
+        if not self.reference_gait_enabled:
+            return offsets
+
+        hip_amplitude = float(self.reference_gait_config.get('hip_pitch_amplitude', 0.0))
+        knee_amplitude = float(self.reference_gait_config.get('knee_pitch_amplitude', 0.0))
+        knee_bias = float(self.reference_gait_config.get('knee_pitch_bias', 0.0))
+        ankle_amplitude = float(self.reference_gait_config.get('ankle_pitch_amplitude', 0.0))
+        stance_ratio = float(self.reference_gait_config.get('stance_ratio', 0.62))
+        stance_ratio = float(np.clip(stance_ratio, 0.05, 0.95))
+
+        def gait_profile(phase: float) -> tuple[float, float]:
+            """返回该相位下的髋关节目标形状和膝关节摆动形状。"""
+            phase = phase % 1.0
+            if phase < stance_ratio:
+                stance_progress = phase / stance_ratio
+                # Stance phase: the leg sweeps slowly backward, creating a push-off tendency.
+                hip_profile = -1.0 + 2.0 * stance_progress
+                knee_profile = 0.0
+            else:
+                swing_progress = (phase - stance_ratio) / (1.0 - stance_ratio)
+                # Swing phase: the leg swings forward faster, with the knee flexing to lift the foot.
+                hip_profile = 1.0 - 2.0 * swing_progress
+                knee_profile = math.sin(math.pi * swing_progress)
+            return hip_profile, knee_profile
+
+        left_hip_profile, left_knee_profile = gait_profile(self.gait_phase)
+        right_hip_profile, right_knee_profile = gait_profile(self.gait_phase + 0.5)
+
+        for index, joint_name in enumerate(self.joint_names):
+            if joint_name == 'left_hip_pitch_joint':
+                offsets[index] = hip_amplitude * left_hip_profile
+            elif joint_name == 'right_hip_pitch_joint':
+                offsets[index] = hip_amplitude * right_hip_profile
+            elif joint_name == 'left_knee_pitch_joint':
+                offsets[index] = knee_bias + knee_amplitude * left_knee_profile
+            elif joint_name == 'right_knee_pitch_joint':
+                offsets[index] = knee_bias + knee_amplitude * right_knee_profile
+            elif joint_name == 'left_ankle_pitch_joint':
+                offsets[index] = -ankle_amplitude * left_hip_profile
+            elif joint_name == 'right_ankle_pitch_joint':
+                offsets[index] = -ankle_amplitude * right_hip_profile
+
+        return offsets
+
+    def _advance_gait_phase(self) -> None:
+        if not self.reference_gait_enabled:
+            return
+        self.gait_phase = (self.gait_phase + float(self.sim_config['control_dt']) / self.reference_gait_period) % 1.0
+
     def _get_target_base_height(self, snapshot: RobotStateSnapshot) -> float:
         if self.target_base_height is None:
             self.target_base_height = float(snapshot.base_position[2])
@@ -208,7 +270,9 @@ class GazeboBipedEnv(gym.Env):
         self.previous_snapshot = snapshot
         self.previous_action = np.zeros(len(self.joint_names), dtype=np.float32)
         self.current_joint_target = self.nominal_joint_targets.copy()
+        self.current_action_residual = np.zeros(len(self.joint_names), dtype=np.float32)
         self.step_count = 0
+        self.gait_phase = 0.0
 
         observation = self._build_observation(snapshot, snapshot, self.previous_action)
         info = {
@@ -228,6 +292,7 @@ class GazeboBipedEnv(gym.Env):
         self.current_joint_target = joint_targets.copy()
         self._publish_joint_targets(joint_targets)
         self._advance_simulation()
+        self._advance_gait_phase()
 
         self.previous_snapshot = self.current_snapshot
         self.current_snapshot = self.interface.wait_for_snapshot()
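
The `gait_profile` helper added above is piecewise linear in the hip (slow ramp during stance, faster return during swing) with a half-sine knee lift during swing. A minimal standalone sketch of the same profile shape, independent of the env class and of ROS, to eyeball the left/right antiphase:

```python
import math


def gait_profile(phase: float, stance_ratio: float = 0.62) -> tuple[float, float]:
    """Same piecewise shape as GazeboBipedEnv._reference_gait_offsets uses."""
    phase = phase % 1.0
    if phase < stance_ratio:
        # Stance: hip profile ramps from -1 to +1; knee stays extended.
        return -1.0 + 2.0 * (phase / stance_ratio), 0.0
    swing = (phase - stance_ratio) / (1.0 - stance_ratio)
    # Swing: hip profile ramps back from +1 to -1 faster; knee lifts on a half sine.
    return 1.0 - 2.0 * swing, math.sin(math.pi * swing)


for step in range(8):
    phase = step / 8
    left_hip, left_knee = gait_profile(phase)
    right_hip, right_knee = gait_profile(phase + 0.5)  # right leg half a period out of phase
    print(f'phase={phase:.3f}  L hip={left_hip:+.2f} knee={left_knee:.2f}  '
          f'R hip={right_hip:+.2f} knee={right_knee:.2f}')
```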

+ 143 - 0
guguji_rl/guguji_rl/evaluation.py

@@ -0,0 +1,143 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Any
+
+import numpy as np
+
+
+@dataclass
+class ForwardEpisodeMetrics:
+    episode_index: int
+    steps: int
+    terminated: bool
+    truncated: bool
+    total_reward: float
+    start_x: float
+    end_x: float
+    delta_x: float
+    mean_vx: float
+    start_z: float
+    end_z: float
+
+
+@dataclass
+class ForwardProgressSummary:
+    model_path: str
+    episodes: list[ForwardEpisodeMetrics]
+
+    @property
+    def mean_delta_x(self) -> float:
+        return float(np.mean([episode.delta_x for episode in self.episodes])) if self.episodes else 0.0
+
+    @property
+    def mean_mean_vx(self) -> float:
+        return float(np.mean([episode.mean_vx for episode in self.episodes])) if self.episodes else 0.0
+
+    @property
+    def mean_total_reward(self) -> float:
+        return float(np.mean([episode.total_reward for episode in self.episodes])) if self.episodes else 0.0
+
+
+def resolve_input_path(script_root: Path, path_str: str | Path) -> Path:
+    path = Path(path_str)
+    if path.is_absolute() or path.exists():
+        return path
+    return script_root / path
+
+
+def resolve_sb3_model_path(model_path: str | Path) -> str:
+    """兼容 stable-baselines3 对 .zip 扩展名的处理。"""
+    path = Path(model_path)
+    # SB3's PPO.load("foo") automatically expands the name to foo.zip.
+    # If the Path passed in already ends with .zip, some code paths append .zip again and then fail to find the file.
+    if path.suffix == '.zip':
+        return str(path.with_suffix(''))
+    return str(path)
+
+
+def evaluate_forward_progress(
+    *,
+    config: dict[str, Any],
+    model_path: str | Path,
+    episodes: int,
+    max_steps: int,
+    deterministic: bool,
+) -> ForwardProgressSummary:
+    try:
+        from stable_baselines3 import PPO
+    except ImportError as error:
+        raise RuntimeError('stable-baselines3 is not installed; cannot run policy evaluation.') from error
+
+    from guguji_rl.envs import GazeboBipedEnv
+
+    env = GazeboBipedEnv(config)
+    summary_episodes: list[ForwardEpisodeMetrics] = []
+
+    try:
+        model = PPO.load(resolve_sb3_model_path(model_path))
+
+        for episode_index in range(max(int(episodes), 1)):
+            observation, _ = env.reset()
+            start_x = float(env.current_snapshot.base_position[0])
+            start_z = float(env.current_snapshot.base_position[2])
+            terminated = False
+            truncated = False
+            total_reward = 0.0
+            steps = 0
+
+            while not terminated and not truncated and steps < int(max_steps):
+                action, _ = model.predict(observation, deterministic=deterministic)
+                observation, reward, terminated, truncated, _ = env.step(action)
+                total_reward += float(reward)
+                steps += 1
+
+            end_x = float(env.current_snapshot.base_position[0])
+            end_z = float(env.current_snapshot.base_position[2])
+            delta_x = end_x - start_x
+            mean_vx = delta_x / max(steps * float(config['sim']['control_dt']), 1e-6)
+
+            summary_episodes.append(
+                ForwardEpisodeMetrics(
+                    episode_index=episode_index,
+                    steps=steps,
+                    terminated=terminated,
+                    truncated=truncated,
+                    total_reward=total_reward,
+                    start_x=start_x,
+                    end_x=end_x,
+                    delta_x=delta_x,
+                    mean_vx=mean_vx,
+                    start_z=start_z,
+                    end_z=end_z,
+                )
+            )
+    finally:
+        env.close()
+
+    return ForwardProgressSummary(
+        model_path=str(model_path),
+        episodes=summary_episodes,
+    )
+
+
+def print_forward_progress_summary(summary: ForwardProgressSummary) -> None:
+    print('Forward-progress evaluation results:')
+    print(f'  Model: {summary.model_path}')
+    for episode in summary.episodes:
+        print(
+            f'  episode={episode.episode_index} '
+            f'steps={episode.steps} '
+            f'delta_x={episode.delta_x:.3f} '
+            f'mean_vx={episode.mean_vx:.3f} '
+            f'total_reward={episode.total_reward:.2f} '
+            f'terminated={episode.terminated} '
+            f'truncated={episode.truncated}'
+        )
+    print(
+        '  Summary: '
+        f'mean_delta_x={summary.mean_delta_x:.3f} '
+        f'mean_mean_vx={summary.mean_mean_vx:.3f} '
+        f'mean_total_reward={summary.mean_total_reward:.2f}'
+    )
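
The summary types above can be exercised without ROS or Gazebo, which makes the mean aggregation easy to sanity-check. A minimal sketch with hand-written metric values (the model path is hypothetical):

```python
from guguji_rl.evaluation import ForwardEpisodeMetrics, ForwardProgressSummary

# Two hand-written episodes; only delta_x / mean_vx / total_reward feed the means.
episodes = [
    ForwardEpisodeMetrics(episode_index=0, steps=500, terminated=False, truncated=True,
                          total_reward=800.0, start_x=0.0, end_x=0.40, delta_x=0.40,
                          mean_vx=0.08, start_z=0.45, end_z=0.44),
    ForwardEpisodeMetrics(episode_index=1, steps=500, terminated=False, truncated=True,
                          total_reward=820.0, start_x=0.0, end_x=0.50, delta_x=0.50,
                          mean_vx=0.10, start_z=0.45, end_z=0.43),
]
summary = ForwardProgressSummary(model_path='outputs/demo/final_model.zip', episodes=episodes)
assert abs(summary.mean_delta_x - 0.45) < 1e-9
assert abs(summary.mean_mean_vx - 0.09) < 1e-9
```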

+ 54 - 0
guguji_rl/scripts/evaluate_forward_progress.py

@@ -0,0 +1,54 @@
+from __future__ import annotations
+
+import argparse
+import sys
+from pathlib import Path
+
+SCRIPT_ROOT = Path(__file__).resolve().parents[1]
+if str(SCRIPT_ROOT) not in sys.path:
+    sys.path.insert(0, str(SCRIPT_ROOT))
+
+from guguji_rl.config import load_config
+from guguji_rl.evaluation import (
+    evaluate_forward_progress,
+    print_forward_progress_summary,
+    resolve_input_path,
+)
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description='Evaluate forward progress metrics for a trained PPO policy.')
+    parser.add_argument('--config', default='configs/walk_ppo.yaml', help='Path to the config file')
+    parser.add_argument('--model', required=True, help='Path to the trained model, e.g. outputs/.../final_model.zip')
+    parser.add_argument('--episodes', type=int, default=None, help='Number of evaluation episodes; defaults to the config value')
+    parser.add_argument('--max-steps', type=int, default=None, help='Max steps per evaluation episode; defaults to the config value')
+    parser.add_argument(
+        '--deterministic',
+        action='store_true',
+        help='Force deterministic actions (when omitted, the config value is used)',
+    )
+    return parser.parse_args()
+
+
+def main() -> int:
+    args = parse_args()
+
+    config = load_config(resolve_input_path(SCRIPT_ROOT, args.config))
+    evaluation_config = config['evaluation']
+    episodes = args.episodes if args.episodes is not None else int(evaluation_config['forward_progress_episodes'])
+    max_steps = args.max_steps if args.max_steps is not None else int(evaluation_config['forward_progress_max_steps'])
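+    # Note: --deterministic can only switch determinism on; when the flag is omitted, the config value decides.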
+    deterministic = args.deterministic or bool(evaluation_config['forward_progress_deterministic'])
+
+    summary = evaluate_forward_progress(
+        config=config,
+        model_path=resolve_input_path(SCRIPT_ROOT, args.model),
+        episodes=episodes,
+        max_steps=max_steps,
+        deterministic=deterministic,
+    )
+    print_forward_progress_summary(summary)
+    return 0
+
+
+if __name__ == '__main__':
+    raise SystemExit(main())

+ 25 - 1
guguji_rl/scripts/train.py

@@ -12,6 +12,7 @@ if str(SCRIPT_ROOT) not in sys.path:
 import torch
 
 from guguji_rl.config import load_config, resolve_project_path, save_yaml
+from guguji_rl.evaluation import evaluate_forward_progress, print_forward_progress_summary
 
 
 def resolve_device(device_name: str) -> str:
@@ -45,6 +46,11 @@ def parse_args() -> argparse.Namespace:
         default=None,
        help='Optionally specify an existing PPO model for continued training or curriculum-learning initialization',
     )
+    parser.add_argument(
+        '--skip-auto-eval',
+        action='store_true',
+        help='Skip the automatic forward-progress evaluation after training',
+    )
     return parser.parse_args()
 
 
@@ -136,9 +142,27 @@ def main() -> int:
         callback=checkpoint_callback,
         progress_bar=True,
     )
-    model.save(run_dir / 'final_model')
+    final_model_path = run_dir / 'final_model'
+    model.save(final_model_path)
     env.close()
+
     print(f'Training finished, model saved to: {run_dir / "final_model.zip"}')
+
+    evaluation_config = config['evaluation']
+    if bool(evaluation_config.get('auto_forward_progress', True)) and not args.skip_auto_eval:
+        try:
+            # Run a forward-progress evaluation automatically after each training round, for a quick look at delta_x / mean_vx.
+            summary = evaluate_forward_progress(
+                config=config,
+                model_path=final_model_path,
+                episodes=int(evaluation_config['forward_progress_episodes']),
+                max_steps=int(evaluation_config['forward_progress_max_steps']),
+                deterministic=bool(evaluation_config['forward_progress_deterministic']),
+            )
+            print_forward_progress_summary(summary)
+        except Exception as error:
+            print(f'Automatic forward-progress evaluation failed: {error}', file=sys.stderr)
+
     return 0