PPO · Clipped Policy Gradient
unseel.com · ratio clip [1−ε, 1+ε] · ε = 0.2
[Interactive figure: policy step (ratio) against the surrogate objective, with the clip band [1−ε, 1+ε] marked. Readouts: ratio r(θ) = 1.00, objective = 0.00, state = (none). Clipped region: gradient = 0.]
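The legend above can be made concrete with a minimal sketch of the standard PPO clipped surrogate for a single sample, min(r·A, clip(r, 1−ε, 1+ε)·A), with ε = 0.2 as in the header. The function names and the explicit `advantage` argument are assumptions for illustration; the visualization itself does not show its code or the advantage it uses.

```python
def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A).

    `advantage` is a hypothetical input; the figure does not state one.
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


def surrogate_grad_wrt_ratio(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Derivative of the clipped surrogate with respect to the ratio.

    The gradient is zero only when the clipped term is the active minimum:
    ratio above the band with a positive advantage, or below the band with
    a negative advantage. Everywhere else it equals the advantage.
    """
    if advantage > 0 and ratio > 1.0 + eps:
        return 0.0
    if advantage < 0 and ratio < 1.0 - eps:
        return 0.0
    return advantage


if __name__ == "__main__":
    # Sweep the ratio across the clip band for a positive advantage.
    for r in (0.5, 0.8, 1.0, 1.2, 1.5):
        obj = clipped_surrogate(r, advantage=1.0)
        grad = surrogate_grad_wrt_ratio(r, advantage=1.0)
        print(f"ratio={r:.2f}  objective={obj:.2f}  gradient={grad:.2f}")
```

Note that clipping is one-sided: with a positive advantage, a ratio below 1−ε still receives a nonzero gradient, because the unclipped term is the smaller of the two and so remains active.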