PPO · Clipped Policy Gradient
unseel.com · ratio clip [1−ε, 1+ε] · ε = 0.2
[Figure: PPO clipped surrogate objective. Readouts: ratio r(θ) = 1.00, objective = 0.00. Legend: state; policy step (ratio); clip band [1−ε, 1+ε]; surrogate objective; clipped region (gradient = 0).]
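
The legend's note that the gradient is zero once the ratio is clipped can be illustrated with a minimal sketch. It assumes the standard PPO form of the objective, min(r·A, clip(r, 1−ε, 1+ε)·A), for a single sample. The advantage A is not shown in the figure, so the values used here are illustrative, and the function name `clipped_surrogate` is hypothetical.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Return (objective, d_objective/d_ratio) for one sample.

    Assumes the standard PPO clipped form; eps = 0.2 matches the
    figure, while the advantage values passed in are illustrative.
    """
    # Restrict the ratio to the clip band [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))

    unclipped_term = ratio * advantage
    clipped_term = clipped_ratio * advantage

    # PPO takes the pessimistic (smaller) of the two terms.
    objective = min(unclipped_term, clipped_term)

    # The gradient with respect to the ratio is the advantage when the
    # unclipped term is active, and zero when the clipped term is active.
    grad = advantage if unclipped_term <= clipped_term else 0.0
    return objective, grad


if __name__ == "__main__":
    for r, a in [(1.0, 1.0), (1.5, 1.0), (0.5, -1.0), (1.5, -1.0)]:
        obj, g = clipped_surrogate(r, a)
        print(f"ratio={r:.2f} advantage={a:+.1f} objective={obj:+.2f} grad={g:+.1f}")
```

Under these assumptions, a ratio inside the band passes the gradient through unchanged. A ratio above 1+ε with a positive advantage, or below 1−ε with a negative advantage, gives a zero gradient. A ratio outside the band in the direction that lowers the objective is not clipped, so the policy can still be corrected.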