PyTorch implementation of the state-of-the-art distributional reinforcement learning algorithm Fully Parameterized Quantile Function (FQF). The implementation includes several DQN extensions; combined with them, FQF forms the most powerful Rainbow variant. It also supports multiple parallel environments to reduce wall-clock time. The FQF baseline in this repository is already a Double FQF version with a target network.
For details on the algorithm, see the article on Medium.
Extensions included:
- Prioritized Experience Replay Buffer (PER)
- Noisy Layer for exploration
- N-step Bootstrapping
- Dueling Version
- Munchausen RL
- Parallelization with multiple environments. Running 4 parallel environments reduced the wall-clock time for the CartPole environment to less than 1/3.
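As a rough illustration of the N-step bootstrapping extension listed above, the sketch below computes an n-step return in plain Python. The function name `n_step_return` and its arguments are hypothetical and are not taken from this repository; the agent's actual buffer logic may differ.

```python
def n_step_return(rewards, gamma, bootstrap_value=0.0):
    """Discounted sum of n rewards plus a discounted bootstrap value.

    rewards:         the n rewards r_t, ..., r_{t+n-1} (hypothetical input)
    gamma:           discount factor
    bootstrap_value: value estimate for the state reached after n steps
    """
    ret = bootstrap_value
    # Work backwards so each step applies one more factor of gamma.
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret


if __name__ == "__main__":
    # Three rewards of 1.0 with gamma = 0.99 and no bootstrap value.
    print(n_step_return([1.0, 1.0, 1.0], gamma=0.99))
```

With n = 1 this reduces to the usual one-step target, which matches the default of the -n_step option.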
Trained and tested on:
- Python 3.6
- PyTorch 1.4.0
- NumPy 1.15.2
- gym 0.10.11
With the script version it is possible to train on simple environments like CartPole-v0 and LunarLander-v2 or on Atari games with image inputs!
To run the script version execute in your command line: python run.py -info fqf_run1
To run the script version on the Atari game Pong: python run.py -env PongNoFrameskip-v4 -info fqf_pong1
To see the options: python run.py -h
- -agent, choices=["iqn","fqf+per","noisy_fqf","noisy_fqf+per","dueling","dueling+per","noisy_dueling","noisy_dueling+per"]: Specify which type of FQF agent you want to train, default is the FQF baseline
- -env: Name of the environment, default = CartPole-v0
- -frames: Number of frames to train, default = 60000
- -eval_every: Evaluate every x frames, default = 1000
- -eval_runs: Number of evaluation runs, default = 5
- -seed: Random seed to replicate training runs, default = 1
- -N: Number of quantiles, default = 32
- -ec, --entropy_coeff: Entropy coefficient, default = 0.001
- -bs, --batch_size: Batch size for updating the DQN, default = 32
- -layer_size: Size of the hidden layer, default = 512
- -n_step: Multistep IQN, default = 1
- -m, --memory_size: Replay memory size, default = 1e5
- -munchausen, choices=[0,1]: Use the Munchausen RL loss for training if set to 1 (True), default = 0
- -lr: Learning rate, default = 5e-4
- -g, --gamma: Discount factor gamma, default = 0.99
- -t, --tau: Soft update parameter tau, default = 1e-3
- -eps_frames: Number of frames over which epsilon is linearly annealed, default = 5000
- -min_eps: Final epsilon-greedy value, default = 0.025
- -w, --worker: Number of parallel environments. Performance with more than 4 workers can be unstable since the batch size increases proportionally, default = 0
- -info: Name of the training run
- -save_model, choices=[0,1]: Specify whether the trained network shall be saved, default is 0 (not saved)

To observe the training results: tensorboard --logdir=runs
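To make the -eps_frames, -min_eps and -tau options more concrete, here is a minimal sketch of a linear epsilon schedule and a soft target update. The function names, the starting epsilon of 1.0, and the use of plain Python lists instead of network parameters are all assumptions made for illustration; the schedule in run.py may be implemented differently.

```python
def linear_epsilon(frame, eps_frames=5000, min_eps=0.025, start_eps=1.0):
    """Anneal epsilon linearly from start_eps to min_eps over eps_frames frames.

    start_eps = 1.0 is an assumption; the other defaults mirror the options list.
    """
    fraction = min(frame / eps_frames, 1.0)  # progress, clipped at 1
    return start_eps + fraction * (min_eps - start_eps)


def soft_update(target, local, tau=1e-3):
    """Polyak averaging: target <- tau * local + (1 - tau) * target.

    target and local are plain lists of floats standing in for weights.
    """
    return [tau * l + (1.0 - tau) * t for t, l in zip(target, local)]


if __name__ == "__main__":
    for frame in (0, 2500, 5000, 10000):
        print(frame, linear_epsilon(frame))
    print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.5))
```

After eps_frames frames the schedule stays at min_eps, and a small tau means the target weights follow the online weights only slowly.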
200000 Frames (~54 min), eps_frames: 20000, eval_every: 5000 
800000 Frames (IQN: ~95 min with 3 workers, FQF: ~240 min with 2 workers). The authors of the paper state that FQF is roughly 20% slower than IQN due to the additional fraction proposal network. Note also that IQN uses N=8 quantiles while FQF uses N=32.
Hyperparameters:
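The fraction proposal network mentioned above is what separates FQF from IQN: instead of sampling quantile fractions, FQF produces them from learned scores. The sketch below shows only the post-processing step as described in the FQF paper (softmax, cumulative sum, midpoints, entropy) in plain Python. The function name `propose_fractions` is hypothetical, and the learned network that would produce the scores from a state embedding is left out.

```python
import math


def propose_fractions(logits):
    """Turn N raw scores into quantile fractions.

    Returns (taus, tau_hats, entropy):
      taus     : N+1 increasing fractions with taus[0] = 0 and taus[-1] = 1
      tau_hats : N midpoints (taus[i] + taus[i+1]) / 2
      entropy  : entropy of the softmax probabilities
    """
    # Numerically stable softmax over the scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # The cumulative sum of the probabilities gives the fractions.
    taus = [0.0]
    for p in probs:
        taus.append(taus[-1] + p)
    taus[-1] = 1.0  # guard against floating-point drift

    tau_hats = [(a + b) / 2.0 for a, b in zip(taus[:-1], taus[1:])]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return taus, tau_hats, entropy


if __name__ == "__main__":
    taus, tau_hats, entropy = propose_fractions([0.0, 0.0, 0.0, 0.0])
    print(taus, tau_hats, entropy)
```

Equal scores give evenly spaced fractions and the maximum entropy, log N. The entropy term is the quantity that the -ec option weights during training.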
- frames 800000
- eps_frames 80000
- min_eps 0.025
- lr 2e-4
- tau 1e-3
- m 20000
- gamma 0.99
- layer_size 512
I'm open to feedback, bug reports, suggestions for improvement, or anything else. Just leave me a message or contact me.
A big thank you also goes to Toshiki Watanabe, who helped me with the implementation and from whose repo I took the training routine for the fraction proposal network.
- Sebastian Dittert
Feel free to use this code for your own projects or research. For citation:
@misc{FQF and Extensions,
  author = {Dittert, Sebastian},
  title = {Fully Parameterized Quantile Function (FQF) and Extensions},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/BY571/FQF-and-Extensions}},
}
