PyTorch implementation of the state-of-the-art distributional reinforcement learning algorithm Fully Parameterized Quantile Function (FQF) and Extensions: N-step Bootstrapping, PER, Noisy Layer, Dueling Networks, and parallelization.

BY571/FQF-and-Extensions

Fully Parameterized Quantile Function (FQF) and Extensions

PyTorch implementation of the state-of-the-art distributional reinforcement learning algorithm Fully Parameterized Quantile Function (FQF). The implementation includes the DQN extensions that, combined with FQF, give the most powerful Rainbow version, and it supports multiple parallel environments to reduce wall-clock time. The FQF baseline in this repository is already a Double FQF version with a target network!

For details on the algorithm check the article on medium
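As a rough illustration of what sets FQF apart, the sketch below shows how a fraction proposal step can turn raw scores into quantile fractions: a softmax gives N probabilities, their cumulative sum gives the fractions tau_0 = 0 to tau_N = 1, and the midpoints are the fractions at which quantile values are evaluated. The entropy of the probabilities is the quantity an entropy coefficient would weight. This is a minimal pure-Python sketch and not the code from this repository; the function name `propose_fractions` and the example logits are assumptions made for illustration.

```python
import math


def propose_fractions(logits):
    """Map N raw scores to fractions tau_0..tau_N, their midpoints, and the entropy.

    Hypothetical helper for illustration; the repository's network does this
    on batched tensors inside a PyTorch module.
    """
    # Numerically stable softmax over the logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # The cumulative sum yields tau_0 = 0 < tau_1 < ... < tau_N = 1.
    taus = [0.0]
    for p in probs:
        taus.append(taus[-1] + p)
    taus[-1] = 1.0  # guard against floating-point drift

    # Midpoints tau_hat_i = (tau_i + tau_{i+1}) / 2.
    tau_hats = [(a + b) / 2 for a, b in zip(taus[:-1], taus[1:])]

    # Entropy of the proposed probabilities (used as a regularizer in FQF).
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return taus, tau_hats, entropy


# Equal logits give evenly spaced fractions.
taus, tau_hats, entropy = propose_fractions([0.0, 0.0, 0.0, 0.0])
```

With equal logits the fractions are evenly spaced, which is the fixed grid that the learned proposal is free to move away from.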

Extensions included:

  • Prioritized Experience Replay Buffer (PER)
  • Noisy Layer for exploration
  • N-step Bootstrapping
  • Dueling Version
  • Munchausen RL
  • Parallelization with multiple environments. Four parallel environments reduced the wall-clock time for the CartPole environment to less than 1/3.
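To make one of the listed extensions concrete, here is a minimal sketch of the target used in N-step bootstrapping: the first n rewards are summed with discounting, and the value estimate of the state reached after n steps is added with weight gamma^n. The function name `n_step_return` and the numbers are illustrative assumptions. In the distributional setting the bootstrap term would be a set of quantile values rather than the single scalar used here.

```python
def n_step_return(rewards, gamma, bootstrap_value=0.0):
    """Discounted sum of n rewards plus a discounted bootstrap value.

    Hypothetical helper for illustration, not taken from the repository.
    """
    ret = 0.0
    for k, r in enumerate(rewards):
        ret += (gamma ** k) * r
    # The estimate for the state reached after n steps is discounted by gamma^n.
    ret += (gamma ** len(rewards)) * bootstrap_value
    return ret


# Three steps with reward 1.0 each, then bootstrap from a value estimate of 10.0.
g = n_step_return([1.0, 1.0, 1.0], gamma=0.99, bootstrap_value=10.0)
```

With a single reward this reduces to the usual one-step target.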

Dependencies

Trained and tested on:

  • Python 3.6
  • PyTorch 1.4.0
  • Numpy 1.15.2
  • gym 0.10.11

Train:

With the script version it is possible to train on simple environments like CartPole-v0 and LunarLander-v2 or on Atari games with image inputs!

To run the script version, execute in your command line: python run.py -info fqf_run1

To run the script version on the Atari game Pong: python run.py -env PongNoFrameskip-v4 -info fqf_pong1

Hyperparameters

To see the options: python run.py -h

-agent, choices=["iqn","fqf+per","noisy_fqf","noisy_fqf+per","dueling","dueling+per", "noisy_dueling","noisy_dueling+per"], Specify which type of FQF agent you want to train, default is FQF - baseline!
-env, Name of the environment, default = CartPole-v0
-frames, Number of frames to train, default = 60000
-eval_every, Evaluate every x frames, default = 1000
-eval_runs, Number of evaluation runs, default = 5
-seed, Random seed to replicate training runs, default = 1
-N, Number of quantiles, default = 32
-ec, --entropy_coeff, Entropy coefficient, default = 0.001
-bs, --batch_size, Batch size for updating the DQN, default = 32
-layer_size, Size of the hidden layer, default = 512
-n_step, Number of steps for N-step bootstrapping, default = 1
-m, --memory_size, Replay memory size, default = 1e5
-munchausen, choices=[0,1], Use Munchausen RL loss for training if set to 1 (True), default = 0
-lr, Learning rate, default = 5e-4
-g, --gamma, Discount factor gamma, default = 0.99
-t, --tau, Soft update parameter tau, default = 1e-3
-eps_frames, Number of frames over which epsilon is linearly annealed, default = 5000
-min_eps, Final epsilon greedy value, default = 0.025
-w, --worker, Number of parallel environments. Performance with more than 4 workers can be unstable since the batch size increases proportionally, default = 0
-info, Name of the training run
-save_model, choices=[0,1], Specify if the trained network shall be saved or not, default is 0 - not saved!
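For readers who want to see how options like these map onto a parser, the following is a minimal argparse sketch covering a subset of the flags listed above, with the defaults stated in this README. It is not the repository's run.py: the function name `build_parser`, the argument types, and the description string are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring a subset of the documented options."""
    parser = argparse.ArgumentParser(description="Sketch of the documented FQF options")
    parser.add_argument("-env", type=str, default="CartPole-v0")
    parser.add_argument("-frames", type=int, default=60000)
    parser.add_argument("-eval_every", type=int, default=1000)
    parser.add_argument("-N", type=int, default=32)
    parser.add_argument("-ec", "--entropy_coeff", type=float, default=0.001)
    parser.add_argument("-bs", "--batch_size", type=int, default=32)
    parser.add_argument("-lr", type=float, default=5e-4)
    parser.add_argument("-g", "--gamma", type=float, default=0.99)
    parser.add_argument("-t", "--tau", type=float, default=1e-3)
    parser.add_argument("-munchausen", type=int, choices=[0, 1], default=0)
    parser.add_argument("-info", type=str, default=None)
    return parser


# Equivalent to: python run.py -env PongNoFrameskip-v4 -info fqf_pong1
args = build_parser().parse_args(["-env", "PongNoFrameskip-v4", "-info", "fqf_pong1"])
```

Any flag that is not passed keeps its default, so `args.N` is 32 and `args.tau` is 1e-3 in the call above.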

Observe training results

tensorboard --logdir=runs

Results

CartPole Results

[Figure: CartPole training results]

LunarLander Results

200000 frames (~54 min), eps_frames: 20000, eval_every: 5000

[Figure: LunarLander training results]

Pong Results

800000 frames (IQN: ~95 min with 3 workers, FQF: ~240 min with 2 workers). The authors of the paper say that FQF is roughly 20% slower than IQN due to the additional fraction proposal network. Also note that IQN uses N=8 quantiles and FQF uses N=32!

Hyperparameters:

  • frames 800000
  • eps_frames 80000
  • min_eps 0.025
  • lr 2e-4
  • tau 1e-3
  • m 20000
  • gamma 0.99
  • layer_size 512
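To show how eps_frames and min_eps interact, here is a minimal sketch of a linear epsilon schedule using the Pong values above. It assumes epsilon starts at 1.0 and falls linearly to min_eps over eps_frames frames, then stays constant. The starting value and the function name `linear_epsilon` are assumptions, and the schedule in the repository may differ in detail.

```python
def linear_epsilon(frame, eps_frames, min_eps, start_eps=1.0):
    """Linearly anneal epsilon from start_eps to min_eps over eps_frames frames.

    Hypothetical helper; start_eps=1.0 is an assumption, not a documented default.
    """
    # Fraction of the annealing period completed, clipped to [0, 1].
    fraction = min(max(frame / eps_frames, 0.0), 1.0)
    return start_eps + fraction * (min_eps - start_eps)


# Pong settings from the list above: eps_frames 80000, min_eps 0.025.
eps_start = linear_epsilon(0, 80000, 0.025)
eps_mid = linear_epsilon(40000, 80000, 0.025)
eps_end = linear_epsilon(800000, 80000, 0.025)
```

Under these assumptions, exploration is fully annealed after the first tenth of the 800000 training frames.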

[Figure: Pong training results]

Help and issues:

I'm open to feedback, bug reports, improvements, or anything else. Just leave me a message or contact me.

Paper and References:

A big thank you also to Toshiki Watanabe, who helped me with the implementation and from whom I took the training routine for the fraction proposal network! His Repo

Author

  • Sebastian Dittert

Feel free to use this code for your own projects or research. For citation:

@misc{FQF and Extensions,
  author = {Dittert, Sebastian},
  title = {Fully Parameterized Quantile Function (FQF) and Extensions},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/BY571/FQF-and-Extensions}},
}
