@ebetica
Contributor

@ebetica ebetica commented Jan 18, 2017

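Adds a REINFORCE example. On CartPole-v0 it converges much more quickly than the actor-critic example: in the runs below, REINFORCE is solved shortly after episode 100, while actor-critic needs a little over 250 episodes.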

REINFORCE:

[2017-01-18 17:53:34,498] Making new env: CartPole-v0
Episode 10 Last length: 13 Average length: 10.64
Episode 20 Last length: 24 Average length: 11.37
Episode 30 Last length: 115 Average length: 15.63
Episode 40 Last length: 17 Average length: 19.16
Episode 50 Last length: 77 Average length: 22.33
Episode 60 Last length: 52 Average length: 24.56
Episode 70 Last length: 67 Average length: 28.63
Episode 80 Last length: 277 Average length: 41.01
Episode 90 Last length: 176 Average length: 155.29
Episode 100 Last length: 101 Average length: 151.26
Solved! Running reward is now 292.3398546817099 and the last episode runs to 9999 time steps!

Actor critic:

[2017-01-18 17:53:52,016] Making new env: CartPole-v0
Episode 10 Last length: 26 Average length: 10.70
Episode 20 Last length: 32 Average length: 12.37
Episode 30 Last length: 45 Average length: 17.65
Episode 40 Last length: 195 Average length: 29.62
Episode 50 Last length: 334 Average length: 48.34
Episode 60 Last length: 482 Average length: 67.54
Episode 70 Last length: 214 Average length: 94.47
Episode 80 Last length: 114 Average length: 97.07
Episode 90 Last length: 114 Average length: 96.25
Episode 100 Last length: 165 Average length: 100.08
Episode 110 Last length: 110 Average length: 101.56
Episode 120 Last length: 17 Average length: 101.07
Episode 130 Last length: 101 Average length: 101.89
Episode 140 Last length: 119 Average length: 100.53
Episode 150 Last length: 100 Average length: 101.34
Episode 160 Last length: 79 Average length: 99.88
Episode 170 Last length: 110 Average length: 99.92
Episode 180 Last length: 87 Average length: 99.62
Episode 190 Last length: 114 Average length: 98.74
Episode 200 Last length: 119 Average length: 98.36
Episode 210 Last length: 149 Average length: 101.76
Episode 220 Last length: 346 Average length: 114.29
Episode 230 Last length: 442 Average length: 135.77
Episode 240 Last length: 389 Average length: 152.02
Episode 250 Last length: 444 Average length: 199.64
Solved! Running reward is now 202.3074421413447 and the last episode runs to 466 time steps!

@soumith soumith merged commit 7f612f9 into pytorch:master Jan 18, 2017
@apaszke
Contributor

apaszke commented Jan 19, 2017

Is it still actor_critic after this change? It seems to me that the value head isn't used at all now.

@soumith
Member

soumith commented Jan 19, 2017

There are two files now: one REINFORCE and one actor-critic (that's what Zeming said).

@apaszke
Contributor

apaszke commented Jan 19, 2017

Yeah, but look at the code. The value head is still trained to learn the value function, but its output is no longer used. The code computes a baseline from the received rewards, not from the computed value estimate.
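
To make the distinction in this comment concrete, here is a minimal sketch in plain Python (not the code from this PR). It contrasts a baseline derived from the received rewards with one derived from a critic's value estimates. The function names, the mean/std standardization, and the `values` list standing in for the value head's output are all assumptions made for illustration.

```python
import statistics


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for each step of one episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reward_baseline_advantages(returns, eps=1e-8):
    """Baseline from received rewards (assumed form): standardize the returns.

    No value estimate is involved, so this is REINFORCE with a baseline.
    """
    mean = statistics.mean(returns)
    std = statistics.pstdev(returns)
    return [(g - mean) / (std + eps) for g in returns]


def critic_advantages(returns, values):
    """Baseline from the value head: advantage_t = G_t - V(s_t).

    `values` is a hypothetical list of per-step value estimates.
    """
    return [g - v for g, v in zip(returns, values)]


returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
reinforce_adv = reward_baseline_advantages(returns)
actor_critic_adv = critic_advantages(returns, values=[1.0, 1.0, 1.0])
```

In the first variant the policy gradient is weighted by standardized returns alone, so a value head can be trained alongside it without influencing the policy update. Only the second variant uses the critic's output, which is what would make the example actor-critic in the sense discussed here.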

@ebetica
Contributor Author

ebetica commented Jan 19, 2017

Good catch, I'll make a PR
