A student implementation of the AlphaGo Zero paper, with documentation.
This is an ongoing project.
## TODO

- File of constants matching the paper's constants
- OGS / KGS API
- Use logging instead of prints?
- Optimization
- MCTS
  - Improve the multithreading of the search
  - Fix the process leaking
- Clean up the code, create an install script, write documentation
## Currently doing

- Training a 10-layer ResNet on 9x9 with 64 simulations per move on my computer. After ~2800 games, 375k training steps and 55 improvements (a trained network replacing the current best network), it doesn't seem to have learned much. The bottleneck is the duration of the evaluation process: only 2 games run at a time (I only have 8 cores: 1 for the self-play process, 1 for training, 3 for self-play games, 1 for the evaluation process, 2 for evaluation games), at approximately 45 s per game, so 6 to 7 minutes per evaluation.
## Done

- Games longer than the move threshold are now used
- MCTS
  - Tree search
  - Dirichlet noise added to the prior probabilities in the root node
  - Adaptive temperature (either take the max or sample proportionally)
  - Sampling of a random rotation or reflection from the dihedral group
  - Multithreaded search (somewhat crudely implemented for now)
  - Batched evaluation to save computation
- Dihedral group transformations of the board for more training samples
- Learning without MCTS doesn't seem to work
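The two root-node tricks above (Dirichlet noise and the adaptive temperature) can be sketched in a few lines of plain Python. The function names are illustrative, not this repository's actual API; the paper uses ε = 0.25 and α = 0.03 on 19x19, and α is typically scaled up for smaller boards:

```python
import random

def dirichlet(alpha, n):
    """Draw one sample from a symmetric Dirichlet(alpha) over n moves,
    via normalized Gamma draws (no numpy needed)."""
    draws = [random.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def add_dirichlet_noise(priors, epsilon=0.25, alpha=0.03):
    """Mix Dirichlet noise into the root priors: P = (1 - eps) * p + eps * noise.
    This encourages exploration of moves the network currently dislikes."""
    noise = dirichlet(alpha, len(priors))
    return [(1 - epsilon) * p + epsilon * x for p, x in zip(priors, noise)]

def sample_move(visit_counts, temperature):
    """Pick a move from the root visit counts: temperature 0 means argmax
    (deterministic play), temperature 1 samples proportionally to counts."""
    if temperature == 0:
        return max(range(len(visit_counts)), key=visit_counts.__getitem__)
    weights = [c ** (1.0 / temperature) for c in visit_counts]
    return random.choices(range(len(visit_counts)), weights=weights)[0]
```

In self-play the temperature is usually kept at 1 for the first moves of the game and then dropped to 0, which is what "adaptive" refers to here.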
- Resume training
- GTP on trained models (human.py, to plug into Sabaki)
- Learning rate annealing (see this)
- Better display of games (viewer.py, converting self-play games into GTP and replaying them with Sabaki)
- Make the 3 components (self-play, training, evaluation) asynchronous
- Multiprocessing of games for self-play and evaluation
- Models and training without MCTS
- Evaluation
- Tromp-Taylor scoring
- Ring-buffer dataset of self-play games
- Loading saved models
- Database of self-play games
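The ring-buffer dataset and the dihedral-group augmentation above can be sketched as follows. This is a minimal illustration assuming boards are plain lists of rows; the real code stores tensors, and the policy vector must also be transformed consistently with the board:

```python
import random
from collections import deque

def rotate90(board):
    """Rotate a square board (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]

def reflect(board):
    """Reflect a board left-right."""
    return [row[::-1] for row in board]

def dihedral(board):
    """Return the 8 symmetries of the board (4 rotations x optional
    reflection) -- each self-play position yields 8 training samples."""
    boards = []
    b = board
    for _ in range(4):
        boards.append(b)
        boards.append(reflect(b))
        b = rotate90(b)
    return boards

class ReplayBuffer:
    """Fixed-size ring buffer of self-play positions: once full, the
    oldest positions are silently dropped (deque with maxlen)."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, position):
        self.buffer.append(position)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

The `maxlen` deque gives the ring-buffer behavior for free: training always sees a sliding window over the most recent self-play games.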
## Long term

- Compile my own version of Sabaki to watch games automatically while training
- Statistics
- Resignation?
- Training on a big computer / server once everything is ready?
## Resources

- Official AlphaGo Zero paper
- Custom environment implementation using pachi_py, following the original OpenAI Gym implementation
- PyTorch for the neural networks
- Sabaki for the GUI
- General scheme, cool design
- Monte Carlo tree search explanation
- Nice tree search implementation
## Benchmarks

| threads | batch_size_eval | simulations | time / move (s) | time / simulation (s) |
|--------:|----------------:|------------:|----------------:|-----------------------:|
|       2 |               2 |          64 |       0.2377991 |           0.0037156109 |
|       4 |               4 |          64 |       0.1624937 |           0.0025389641 |
|       8 |               8 |          64 |       0.1465123 |           0.0022892547 |
|      16 |              16 |          64 |       0.1401098 |           0.0021892156 |
|       2 |               2 |          50 |       0.6306054 |           0.0126121080 |
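For context, the batched evaluation being timed above works roughly like this: search threads queue up leaf positions, and a single evaluator drains up to `batch_size_eval` of them and runs one network forward pass. This is a hypothetical sketch of the mechanism, not the repository's actual code (`evaluate_batch` and the request protocol are made up for illustration):

```python
import queue
import threading

def evaluator(requests, batch_size_eval, evaluate_batch):
    """Serve leaf evaluations in batches. Each search thread submits a
    (board, reply_queue) pair and blocks on its reply queue until the
    result arrives. A None request is a (simplistic) stop sentinel."""
    while True:
        first = requests.get()            # block until some thread needs an eval
        if first is None:
            return
        batch = [first]
        while len(batch) < batch_size_eval:
            try:
                batch.append(requests.get_nowait())
            except queue.Empty:
                break                     # nothing else pending: run a smaller batch
        results = evaluate_batch([board for board, _ in batch])
        for (_, reply), result in zip(batch, results):
            reply.put(result)
```

With more search threads in flight, batches fill up more often and the per-simulation cost drops, which is consistent with the timings above improving from 2 up to 16 threads.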
## Differences with the official paper

- No resignation
- PyTorch instead of TensorFlow
- Python instead of (probably) C++ / C