Greetings dear community members!

I would like to share my efforts, and the resulting failures, at training a chess agent using TorchRL, and to ask you for guidance.
A few weeks back I had the idea of training a chess AI using PPO with self-learning. I had already made extensive use of PyTorch as a deep learning library, so when I discovered TorchRL I was excited to try it out, even though it is clearly not stable yet (I hope it will become stable eventually). So, I gave it a go. After a week of coding and debugging, I understood first-hand why it isn't considered a mature library yet: I had to create many of my own subclasses for utilities. But here is, more or less, what came out of it:
In the days that followed, I ran many experiments trying to reach an AI that could at least consistently win against a randomly acting agent. I tried:
- Self-learning from scratch (starting from a random actor policy)
- Self-learning from pretrained weights (pretrained on the winners' moves from ~20k chess games; see the pretraining sketch below)
- Learning against a random agent from scratch
- Learning against a random agent from pretrained weights
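For the pretraining, I fit the two action heads (described further down) with plain cross-entropy on the winners' moves. Here is a minimal sketch of what that step looks like, assuming a flat 64-square board encoding plus a turn flag; the layer sizes and names are illustrative, not my exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for action_net_0 / action_net_1 (sizes are assumptions).
piece_head = nn.Sequential(nn.Linear(65, 256), nn.ReLU(), nn.Linear(256, 64))
target_head = nn.Sequential(nn.Linear(65 + 64, 256), nn.ReLU(), nn.Linear(256, 64))
opt = torch.optim.Adam([*piece_head.parameters(), *target_head.parameters()], lr=1e-3)

def pretrain_step(board, turn, from_sq, to_sq):
    # board: (B, 64) float encoding, turn: (B, 1) float,
    # from_sq / to_sq: (B,) long, taken from moves the eventual winner played.
    x = torch.cat([board, turn], dim=-1)
    loss_piece = F.cross_entropy(piece_head(x), from_sq)
    # Teacher forcing: condition the second head on the true source square.
    x1 = torch.cat([x, F.one_hot(from_sq, 64).float()], dim=-1)
    loss_target = F.cross_entropy(target_head(x1), to_sq)
    opt.zero_grad()
    (loss_piece + loss_target).backward()
    opt.step()
    return loss_piece.item(), loss_target.item()
```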
But nothing seemed to work. The actors I trained consistently performed poorly, and by poorly I mean:
- Around 80% of the time, the game ends in a draw.
- In the remaining games, the random agent wins about half the time or less, yet it always manages to win a match within a few non-draw games.
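For reference, this is roughly how I measure those numbers (a simplified sketch using python-chess; `random_agent` is the baseline opponent, my trained policy plugs in for `agent_move`, and the ply cap is an assumption):

```python
import random
import chess

def play_match(agent_move, agent_is_white=True, max_plies=300):
    # agent_move: callable board -> chess.Move; the opponent plays uniformly at random.
    board = chess.Board()
    while not board.is_game_over() and board.ply() < max_plies:
        our_turn = (board.turn == chess.WHITE) == agent_is_white
        move = agent_move(board) if our_turn else random.choice(list(board.legal_moves))
        board.push(move)
    return board.result()  # "1-0", "0-1", "1/2-1/2", or "*" if the ply cap was hit

# Sanity check with a random-vs-random baseline.
random_agent = lambda b: random.choice(list(b.legal_moves))
results = [play_match(random_agent) for _ in range(100)]
print(sum(r == "1/2-1/2" for r in results), "draws out of", len(results))
```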
Here is some information on how the actions are decided at each step by my agent:
- The board state and the turn info -> `action_net_0` -> which piece to move (`action_0`)
  - Turn is `False` for black and `True` for white
  - `action_net_0` is an MLP
- The board state, the turn info, and `action_0` -> `action_net_1` -> where to move the selected piece (`action_1`)
- The actions are always drawn from the legal moves given by the chess engine.
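In code, that two-stage selection looks roughly like this (a minimal sketch, not my exact model; the flat encoding, layer sizes, and function names are assumptions, and promotions are simplified):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import chess

action_net_0 = nn.Sequential(nn.Linear(65, 256), nn.ReLU(), nn.Linear(256, 64))
action_net_1 = nn.Sequential(nn.Linear(65 + 64, 256), nn.ReLU(), nn.Linear(256, 64))

def encode(board: chess.Board) -> torch.Tensor:
    # 64 signed piece codes plus the turn flag (True = white, False = black).
    x = torch.zeros(65)
    for sq, piece in board.piece_map().items():
        x[sq] = piece.piece_type if piece.color == chess.WHITE else -piece.piece_type
    x[64] = float(board.turn)
    return x

def select_move(board: chess.Board) -> chess.Move:
    x = encode(board)
    legal = list(board.legal_moves)
    # action_0: mask the source-square logits to pieces that have a legal move.
    mask_0 = torch.full((64,), float("-inf"))
    mask_0[[m.from_square for m in legal]] = 0.0
    a0 = torch.distributions.Categorical(logits=action_net_0(x) + mask_0).sample()
    # action_1: mask the target-square logits to legal destinations of action_0.
    mask_1 = torch.full((64,), float("-inf"))
    mask_1[[m.to_square for m in legal if m.from_square == a0.item()]] = 0.0
    x1 = torch.cat([x, F.one_hot(a0, 64).float()])
    a1 = torch.distributions.Categorical(logits=action_net_1(x1) + mask_1).sample()
    # Take the first legal move matching (from, to); promotions are simplified here.
    return next(m for m in legal
                if m.from_square == a0.item() and m.to_square == a1.item())
```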
Running my training loop is super simple. After installing the dependencies, all it takes is `pipenv run python ./src/train.py`.
Currently, I feel quite stuck and don't know what to do next to get it working as I wish. So now I would like to ask you RL experts: what changes would I need to make so that my agent can at least consistently win against a random actor?