Conversation

@karanchahal

The update step for REINFORCE appears to be wrong. Instead of multiplying the log-probability by the reward at each individual step, we should sum the rewards-to-go and the log-probabilities across the entire episode, and then multiply these two sums together. This is mentioned in Sergey Levine's lecture on policy gradients, where he derives the full policy gradient algorithm.

Source: https://www.youtube.com/watch?v=Ds1trXd6pos&list=PLkFD6_40KJIwhWJpGazJ9VSj9CFMkb79A&index=5
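
For concreteness, the reward-to-go estimator derived in that lecture weights each step's log-probability by the return from that step onward and sums over the whole episode: ∇J ≈ Σ_t ∇ log π(a_t|s_t) · Ĝ_t, where Ĝ_t is the (discounted) sum of rewards from step t to the end. A minimal PyTorch sketch of that whole-episode update; `log_probs`, `rewards`, and `gamma` here are hypothetical stand-ins, not names from the example under discussion:

```python
import torch

# Hypothetical stand-ins for one collected episode: log-probabilities of
# the actions the policy actually took (attached to the autograd graph in
# a real agent) and the raw per-step rewards.
log_probs = torch.randn(5, requires_grad=True)
rewards = [1.0, 0.0, 1.0, 1.0, 0.0]
gamma = 0.99

# Rewards-to-go: G_t = sum_{t' >= t} gamma^(t' - t) * r_{t'},
# built with a single backward pass over the episode.
returns = []
g = 0.0
for r in reversed(rewards):
    g = r + gamma * g
    returns.insert(0, g)
returns = torch.tensor(returns)

# Whole-episode REINFORCE loss: weight each log-prob by the return from
# that step onward, then sum over the episode before the optimizer step
# (negated because optimizers minimize).
loss = -(log_probs * returns).sum()
loss.backward()
```

The point is that the loss is formed once from the full trajectory, rather than multiplying a single log-prob by a single immediate reward inside the step loop.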
