A student implementation of the AlphaGo Zero paper, with documentation.
This is an ongoing project.
## TODO

- File of constants matching the paper's constants
- OGS / KGS API
- Use logging instead of prints?
- Optimization
- MCTS
  - Improve the multithreading of the search
  - Fix the process leaking
- Clean up the code, create an install script, write documentation
## Currently doing

- Training a 10-layer ResNet on 9x9 with 64 simulations per move on my computer. After ~2800 games, 375k training steps and 55 improvements (a trained network replacing the current best network), it doesn't seem to have learned much. The bottleneck is the duration of the evaluation process: only 2 games run at a time (I only have 8 cores: 1 for the self-play process, 1 for training, 3 for self-play games, 1 for the evaluation process, 2 for evaluation games), at approximately 45 s per game, so 6 to 7 minutes per evaluation.
## Done

- Games longer than the move threshold are now used
- MCTS
  - Tree search
  - Dirichlet noise added to the prior probabilities in the root node
  - Adaptive temperature (either take the max or sample proportionally)
  - Sampling of a random rotation or reflection from the dihedral group
  - Multithreaded search (somewhat crudely implemented for now)
  - Batched evaluation to save computation
- Dihedral group transformations of the board for more training samples
- Learning without MCTS doesn't seem to work
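The two root-node tricks above (Dirichlet noise and the adaptive temperature) can be sketched in a few lines of plain Python. The function names are illustrative, not this repository's actual API; the paper uses ε = 0.25 and α = 0.03 on 19x19, and α is typically scaled up for smaller boards:

```python
import random

def dirichlet(alpha, n):
    """Draw one sample from a symmetric Dirichlet(alpha) over n moves,
    via normalized Gamma draws (no numpy needed)."""
    draws = [random.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def add_dirichlet_noise(priors, epsilon=0.25, alpha=0.03):
    """Mix Dirichlet noise into the root priors: P = (1 - eps) * p + eps * noise.
    This encourages exploration of moves the network currently dislikes."""
    noise = dirichlet(alpha, len(priors))
    return [(1 - epsilon) * p + epsilon * x for p, x in zip(priors, noise)]

def sample_move(visit_counts, temperature):
    """Pick a move from the root visit counts: temperature 0 means argmax
    (deterministic play), temperature 1 samples proportionally to counts."""
    if temperature == 0:
        return max(range(len(visit_counts)), key=visit_counts.__getitem__)
    weights = [c ** (1.0 / temperature) for c in visit_counts]
    return random.choices(range(len(visit_counts)), weights=weights)[0]
```

In self-play the temperature is usually kept at 1 for the first moves of the game and then dropped to 0, which is what "adaptive" refers to here.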
- Resume training
- GTP on trained models (human.py, to plug into Sabaki)
- Learning rate annealing (see this)
- Better display of games (viewer.py, converting self-play games into GTP and replaying them with Sabaki)
- Make the 3 components (self-play, training, evaluation) asynchronous
- Multiprocessing of games for self-play and evaluation
- Models and training without MCTS
- Evaluation
- Tromp-Taylor scoring
- Ring-buffer dataset of self-play games
- Loading saved models
- Database of self-play games
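The ring-buffer dataset and the dihedral-group augmentation above can be sketched as follows. This is a minimal illustration assuming boards are plain lists of rows; the real code stores tensors, and the policy vector must also be transformed consistently with the board:

```python
import random
from collections import deque

def rotate90(board):
    """Rotate a square board (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]

def reflect(board):
    """Reflect a board left-right."""
    return [row[::-1] for row in board]

def dihedral(board):
    """Return the 8 symmetries of the board (4 rotations x optional
    reflection) -- each self-play position yields 8 training samples."""
    boards = []
    b = board
    for _ in range(4):
        boards.append(b)
        boards.append(reflect(b))
        b = rotate90(b)
    return boards

class ReplayBuffer:
    """Fixed-size ring buffer of self-play positions: once full, the
    oldest positions are silently dropped (deque with maxlen)."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, position):
        self.buffer.append(position)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

The `maxlen` deque gives the ring-buffer behavior for free: training always sees a sliding window over the most recent self-play games.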
## Long term

- Compile my own version of Sabaki to watch games automatically while training
- Statistics
- Resignation?
- Training on a big computer / server once everything is ready?
## Resources

- Official AlphaGo Zero paper
- Custom environment implementation using pachi_py, following the original OpenAI Gym implementation
- PyTorch for the neural networks
- Sabaki for the GUI
- General scheme, cool design
- Monte Carlo tree search explanation
- Nice tree search implementation
## Benchmarks

| threads | batch_size_eval | simulations | time / move (s) | time / simulation (s) |
|--------:|----------------:|------------:|----------------:|-----------------------:|
|       2 |               2 |          64 |       0.2377991 |           0.0037156109 |
|       4 |               4 |          64 |       0.1624937 |           0.0025389641 |
|       8 |               8 |          64 |       0.1465123 |           0.0022892547 |
|      16 |              16 |          64 |       0.1401098 |           0.0021892156 |
|       2 |               2 |          50 |       0.6306054 |           0.0126121080 |
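For context, the batched evaluation being timed above works roughly like this: search threads queue up leaf positions, and a single evaluator drains up to `batch_size_eval` of them and runs one network forward pass. This is a hypothetical sketch of the mechanism, not the repository's actual code (`evaluate_batch` and the request protocol are made up for illustration):

```python
import queue
import threading

def evaluator(requests, batch_size_eval, evaluate_batch):
    """Serve leaf evaluations in batches. Each search thread submits a
    (board, reply_queue) pair and blocks on its reply queue until the
    result arrives. A None request is a (simplistic) stop sentinel."""
    while True:
        first = requests.get()            # block until some thread needs an eval
        if first is None:
            return
        batch = [first]
        while len(batch) < batch_size_eval:
            try:
                batch.append(requests.get_nowait())
            except queue.Empty:
                break                     # nothing else pending: run a smaller batch
        results = evaluate_batch([board for board, _ in batch])
        for (_, reply), result in zip(batch, results):
            reply.put(result)
```

With more search threads in flight, batches fill up more often and the per-simulation cost drops, which is consistent with the timings above improving from 2 up to 16 threads.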
## Differences with the official paper

- No resignation
- PyTorch instead of TensorFlow
- Python instead of (probably) C++ / C