This is a machine learning (reinforcement learning) project. The goal was to explore how the DQN approach can be used to build an AI bot that plays the game known as Hip.
The game of Hip, introduced by Martin Gardner back in the 1950s, is a board game for two players. The board is a rectangular grid, originally proposed to be of dimension 6-by-6, but it can be generalized to any n-by-m grid with
On an n-by-n board for
No draws are possible on an n-by-n board for
On a 2n-by-2n board, the second player has a strategy that guarantees at least a draw if
We train a Deep Q-Network with a single agent playing against itself on a board of a given dimension. This is effectively a cooperative mode: the reward function returns the same penalty whenever either of the two players (both governed by the same agent) loses. Trained over 50,000 episodes on a 6-by-6 board, the bot achieves a mean game length of about 27 moves.
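The shared-penalty self-play setup can be sketched roughly as follows. This is a minimal illustration and not the code in this repository: it assumes the standard Hip rule (a player loses on completing the four corners of a square, in any orientation, with their own pieces), replaces the trained agent with a random placeholder policy, and uses hypothetical names (`PENALTY`, `all_squares`, `play_episode`).

```python
import random

PENALTY = -1.0  # hypothetical value; the same penalty applies whichever player loses


def all_squares(n):
    """Enumerate every square (axis-aligned or tilted) on an n-by-n grid of cells."""
    squares = set()
    for r in range(n):
        for c in range(n):
            for dr in range(n):
                for dc in range(1, n):
                    # (dr, dc) is one side; the adjacent side is its 90-degree rotation (-dc, dr).
                    corners = (
                        (r, c),
                        (r + dr, c + dc),
                        (r + dr - dc, c + dc + dr),
                        (r - dc, c + dr),
                    )
                    if all(0 <= x < n and 0 <= y < n for x, y in corners):
                        squares.add(frozenset(corners))
    return squares


def play_episode(n, squares, rng):
    """Play one self-play game with a random placeholder policy.

    Returns a list of (player, cell, reward) tuples, one per move.
    """
    board = {}  # cell -> player index (0 or 1)
    free = [(r, c) for r in range(n) for c in range(n)]
    transitions = []
    player = 0
    while free:
        cell = free.pop(rng.randrange(len(free)))
        board[cell] = player
        # The mover loses if the new piece completes a square of their own pieces.
        lost = any(
            cell in sq and all(board.get(q) == player for q in sq) for sq in squares
        )
        transitions.append((player, cell, PENALTY if lost else 0.0))
        if lost:
            break
        player = 1 - player
    return transitions


if __name__ == "__main__":
    rng = random.Random(0)
    squares = all_squares(6)
    lengths = [len(play_episode(6, squares, rng)) for _ in range(200)]
    print("mean game length (random play):", sum(lengths) / len(lengths))
```

Because the penalty does not depend on which side made the losing move, the single agent is pushed to prolong the game for both players, which is why mean game length is a natural progress metric.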
See game/ for the game logic module. Simple graphics are rendered with pygame (game_graphics/). The DQN agent uses a two-layer neural network implemented in PyTorch (ai/).
Use main.py to run a single game or a match between humans and/or bots, and train_agent.py to run a training session for a DQN agent. The reward function is fully customizable, so there is room for experiments or for adapting it to other pattern-avoidance or pattern-creation board games of a similar kind. The game and NN parameters are in config.py.
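As an illustration of what a customizable reward might look like, here is a small sketch. The names `RewardConfig`, `reward`, and the outcome labels are hypothetical and are not taken from config.py or train_agent.py; the default values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    """Hypothetical reward settings; every default here is an assumption."""
    pattern_reward: float = -1.0  # given when a move completes the pattern
    draw_reward: float = 0.0      # given when the board fills with no pattern
    step_reward: float = 0.0      # given for any other move


def reward(outcome: str, cfg: RewardConfig = RewardConfig()) -> float:
    """Map a move outcome ('pattern', 'draw' or 'ongoing') to a scalar reward."""
    if outcome == "pattern":
        return cfg.pattern_reward
    if outcome == "draw":
        return cfg.draw_reward
    if outcome == "ongoing":
        return cfg.step_reward
    raise ValueError(f"unknown outcome: {outcome!r}")


# Pattern avoidance (as in Hip): completing the pattern is penalized.
avoid = RewardConfig(pattern_reward=-1.0, step_reward=0.01)
# Pattern creation: the same hook rewards completing the pattern instead.
create = RewardConfig(pattern_reward=1.0)
```

Keeping the sign of the pattern reward in the configuration is one way to reuse the same training loop for both pattern-avoidance and pattern-creation games.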
The bot training and bot playing components require PyTorch. The board rendering and game logic modules can be used on their own without it.
Train a Monte Carlo tree search model for adversarial play. Alternatively, DFS with pruning would probably work fine on boards that are not too large.
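A rough sketch of the DFS-with-pruning idea is shown below, written as negamax with alpha-beta pruning over bitmask board states. It is not part of this repository: it assumes the standard Hip rule (the player who completes a square of their own pieces loses), the names `square_masks` and `solve` are hypothetical, and without move ordering or a transposition table it is practical only on very small boards.

```python
def square_masks(n):
    """Return a bitmask for every square (axis-aligned or tilted) on an n-by-n board."""
    masks = set()
    for r in range(n):
        for c in range(n):
            for dr in range(n):
                for dc in range(1, n):
                    # (dr, dc) is one side; the adjacent side is its 90-degree rotation (-dc, dr).
                    corners = [
                        (r, c),
                        (r + dr, c + dc),
                        (r + dr - dc, c + dc + dr),
                        (r - dc, c + dr),
                    ]
                    if all(0 <= x < n and 0 <= y < n for x, y in corners):
                        masks.add(sum(1 << (x * n + y) for x, y in corners))
    return sorted(masks)


def solve(n):
    """Game value for the first player: +1 win, 0 draw, -1 loss."""
    masks = square_masks(n)
    full = (1 << (n * n)) - 1

    def negamax(mine, theirs, alpha, beta):
        # Value from the point of view of the player about to move.
        occupied = mine | theirs
        if occupied == full:
            return 0  # board is full and nobody completed a square: draw
        best = -1
        for i in range(n * n):
            bit = 1 << i
            if occupied & bit:
                continue
            new_mine = mine | bit
            if any((m & bit) and (new_mine & m) == m for m in masks):
                score = -1  # this move completes the mover's own square
            else:
                score = -negamax(theirs, new_mine, -beta, -alpha)
            best = max(best, score)
            alpha = max(alpha, score)
            if alpha >= beta:
                break  # prune: the opponent already has a better alternative
        return best

    return negamax(0, 0, -1, 1)


if __name__ == "__main__":
    print("value of the 3-by-3 game for the first player:", solve(3))
```

Larger boards would likely need symmetry reduction and a transposition table on top of this before the search becomes feasible.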
[1] Martin Gardner, New Mathematical Diversions: Revised Edition, MAA Press, 2000
[2] The Deep Q-Learning Algorithm, https://huggingface.co/learn/deep-rl-course/en/unit3/deep-q-algorithm
[3] Linus Strömberg, Viktor Lind, Board Game AI Using Reinforcement Learning, https://www.diva-portal.org/smash/get/diva2:1680520/FULLTEXT01.pdf




