SuperGo

A student implementation of the AlphaGo Zero paper, with documentation.

Ongoing project.

TODO (in order of priority)

  • A file of constants matching the paper's values
  • OGS / KGS API
  • Use logging instead of prints?

CURRENTLY DOING

  • Optimization
  • MCTS
    • Improving the multithreading of the search
  • Do something about the process leaking
  • Clean code, create install script, write documentation
  • Currently training a 10-layer ResNet on 9x9 with 64 simulations per move on my computer; it doesn't seem to have learned much after ~2800 games, 375k training epochs and 55 improvements (= the trained network replacing the current best network). The bottleneck is the evaluation process: only 2 games run at a time (my computer has 8 cores: 1 for self-play, 1 for training, 3 for self-play games, 1 for evaluation, 2 for evaluation games), at roughly 45 s per game, so 6 to 7 minutes per evaluation.
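The core budget described above (and the "process leaking" item) can be sketched with `multiprocessing`: 3 self-play workers, 1 trainer, 2 evaluation workers, all joined so no children are left behind. The worker functions are placeholders, not the actual SuperGo entry points:

```python
import multiprocessing as mp
import time

# Placeholder workers standing in for the real self-play, training
# and evaluation loops.
def self_play_worker(i):
    time.sleep(0.01)

def train_worker():
    time.sleep(0.01)

def eval_worker(i):
    time.sleep(0.01)

if __name__ == "__main__":
    procs = (
        [mp.Process(target=self_play_worker, args=(i,)) for i in range(3)]  # 3 self-play games
        + [mp.Process(target=train_worker)]                                 # 1 training process
        + [mp.Process(target=eval_worker, args=(i,)) for i in range(2)]     # 2 evaluation games
    )
    for p in procs:
        p.start()
    # Joining every child is the simplest guard against leaked processes.
    for p in procs:
        p.join()
```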

DONE

  • Games that are longer than the move threshold are now used
  • MCTS
    • Tree search
    • Dirichlet noise added to the prior probabilities in the root node
    • Adaptive temperature (either take the max or sample proportionally)
    • Sample a random rotation or reflection from the dihedral group
    • Multithreading of the search (somewhat crudely implemented for now)
    • Batch size evaluation to save computation
  • Dihedral group of board for more training samples
  • Learning without MCTS doesn't seem to work
  • Resume training
  • GTP on trained models (human.py, to plug with Sabaki)
  • Learning rate annealing (see this)
  • Better display for game (viewer.py, converting self-play games into GTP and then using Sabaki)
  • Make the 3 components (self-play, training, evaluation) asynchronous
  • Multiprocessing of games for self-play and evaluation
  • Models and training without MCTS
  • Evaluation
  • Tromp-Taylor scoring
  • Dataset ring buffer of self-play games
  • Loading saved models
  • Database for self-play games
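The root-node Dirichlet noise and the adaptive temperature from the list above can be sketched as follows. The constants eps = 0.25 and alpha = 0.03 are the paper's values (alpha is the 19x19 setting); the function names are mine, not the repo's:

```python
import numpy as np

def add_root_noise(priors, alpha=0.03, eps=0.25):
    """Mix Dirichlet noise into the root prior probabilities.
    AlphaGo Zero uses eps = 0.25 and alpha = 0.03 on 19x19."""
    noise = np.random.dirichlet([alpha] * len(priors))
    return (1 - eps) * priors + eps * noise

def select_move(visit_counts, temperature=1.0):
    """Adaptive temperature: sample proportionally to N^(1/tau)
    early in the game, play the max-visited move as tau -> 0."""
    counts = np.asarray(visit_counts, dtype=np.float64)
    if temperature == 0:
        return int(np.argmax(counts))
    probs = counts ** (1.0 / temperature)
    probs /= probs.sum()
    return int(np.random.choice(len(counts), p=probs))
```

Since both the priors and the noise sum to 1, the mixture is still a valid distribution.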
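The dihedral-group augmentation mentioned above amounts to the 8 symmetries of a square board: 4 rotations, each with and without a reflection. A NumPy sketch (names are illustrative, not from the codebase):

```python
import numpy as np

def dihedral_transforms(board):
    """Yield the 8 symmetries of a square board:
    4 rotations, each plain and horizontally flipped."""
    for k in range(4):
        rotated = np.rot90(board, k)
        yield rotated
        yield np.fliplr(rotated)

def random_symmetry(board, rng=np.random):
    """Pick one of the 8 symmetries uniformly, e.g. before a
    network evaluation inside the search."""
    k = rng.randint(8)
    return list(dihedral_transforms(board))[k]
```

The same transforms multiply each self-play position into 8 training samples.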

LONG TERM PLAN?

  • Compile my own version of Sabaki to watch games automatically while training
  • Statistics
  • Resignation?
  • Training on a big computer / server once everything is ready?

Resources

Statistics

For a 10-layer ResNet, evaluated over 50 games with 64 simulations per move

9x9 board

  • 0.2377991 s / move - 0.00371561093 s / simulation (2 threads, batch_size_eval = 2)
  • 0.1624937 s / move - 0.00253896406 s / simulation (4 threads, batch_size_eval = 4)
  • 0.1465123 s / move - 0.00228925468 s / simulation (8 threads, batch_size_eval = 8)
  • 0.1401098 s / move - 0.00218921563 s / simulation (16 threads, batch_size_eval = 16)
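As a sanity check on the 9x9 table, the per-simulation figures are just the per-move times divided by the 64 simulations per move:

```python
# Per-move times from the 9x9 table above, divided by 64 simulations/move.
per_move = [0.2377991, 0.1624937, 0.1465123, 0.1401098]
per_sim = [t / 64 for t in per_move]
# Each entry matches the s/simulation column of the table.
```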

19x19 board

  • 0.6306054 s / move - 0.012612108 s / simulation (2 threads, batch_size_eval = 2, 50 simulations)

Differences with the official paper

  • No resignation
  • PyTorch instead of TensorFlow
  • Python instead of (probably) C++ / C
