Fixed epsilon decay in dqn example #117
Merged
Before submitting
What does this PR do?
This fixes issue Lightning-AI/pytorch-lightning#10883, where the epsilon decay for the epsilon-greedy policy of the DQN example was incorrect. The original code runs a single train_step with self.hparams.eps_start and then immediately switches to self.hparams.eps_end. The intended behavior is to linearly decrease epsilon from self.hparams.eps_start to self.hparams.eps_end over the first self.hparams.eps_last_frame steps. I wrote a small function get_epsilon that fixes this logic and returns the correct epsilon (see the sketch below).
I have also made a few minor changes on other lines, because the code would not run on my local machine without them. Specifically, the type hint for the __iter__ method of the RLDataset class was Tuple and should be Iterator[Tuple], because the method returns a generator of tuples representing (state, action, reward, done, new_state). Additionally, on line 264 (formerly 276), I got an error that the index for the gather() function must be of type int64, so I cast the actions tensor to long. Finally, I added logging of self.episode_reward and epsilon, so that I could check that a) the agent still learns successfully, and b) epsilon decays as intended.
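For reference, here is a minimal sketch of the linear schedule described above (not necessarily the exact code in the patch; the hyperparameter names eps_start, eps_end, and eps_last_frame, and the global_step counter, are taken from the description):

```python
def get_epsilon(eps_start: float, eps_end: float, eps_last_frame: int, global_step: int) -> float:
    """Linearly anneal epsilon from eps_start to eps_end over the first eps_last_frame steps."""
    if global_step >= eps_last_frame:
        return eps_end
    # Interpolate between the start and end values based on how far we are into the schedule.
    fraction = global_step / eps_last_frame
    return eps_start + fraction * (eps_end - eps_start)
```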
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Sure did!