Go is a two player board game. The aim is to capture territory from your opponent 💂♂️. The player that has the most territory at the end of the game is the winner.
The rules of Go are complex and many different rule-sets exist. Wikipedia has a good description of the rules so we won't rehash them here.
The rule-set we will use here is the CGOS rules, which allow two bots to play each other (other forms of Go rely on agreements between human players). One amendment to the CGOS rules is that instead of 'Positional SuperKo', we only check for simple Ko.
Professional go is played on a 19x19 board, but we will use a 9x9 board so you don't need your own GPU farm 🐷!
Do not fret if you do not fully understand the rules: we will provide your bots with a list of legal moves for each turn!
Go starts with an empty board and black places the first stone. We will randomise which player is black on each game.
The game ends when both players pass.
As in the CGOS rules, we will use area scoring: a player's score is the number of stones they have on the board plus the number of empty points they surround at the end of the game.
The score also includes a komi, which is a handicap applied to black because laying the first stone is an advantage. We will use a komi of 7.5, which is subtracted from black's score at the end of the game.
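The scoring described above can be sketched in a few lines. This is a minimal illustration, not the environment's own scorer: the function names `area_scores` and `black_wins` are hypothetical, the board is a plain list of lists (`1` = black, `-1` = white, `0` = empty), and an empty region is assumed to count for a player only when that player's stones are the only ones bordering it.

```python
from collections import deque

BLACK, WHITE, EMPTY = 1, -1, 0
KOMI = 7.5  # subtracted from black's score, as described above


def area_scores(board):
    """Return (black_area, white_area) for a square board (list of lists)."""
    size = len(board)
    black = sum(row.count(BLACK) for row in board)
    white = sum(row.count(WHITE) for row in board)
    seen = set()
    for r in range(size):
        for c in range(size):
            if board[r][c] != EMPTY or (r, c) in seen:
                continue
            # Flood-fill one connected empty region, noting which colours border it.
            region, borders = 0, set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < size and 0 <= nx < size):
                        continue
                    if board[ny][nx] == EMPTY:
                        if (ny, nx) not in seen:
                            seen.add((ny, nx))
                            queue.append((ny, nx))
                    else:
                        borders.add(board[ny][nx])
            # Assumption: a region counts only if exactly one colour touches it.
            if borders == {BLACK}:
                black += region
            elif borders == {WHITE}:
                white += region
    return black, white


def black_wins(board, komi=KOMI):
    """Apply the komi to black's area and compare with white's."""
    black, white = area_scores(board)
    return black - komi > white


tiny_board = [
    [1, 0, -1],
    [1, 1, -1],
    [0, 1, -1],
]
print(area_scores(tiny_board))  # (5, 3): the bottom-left point is black's, the top-middle is shared
```

Because the komi is a half-integer and the areas are whole numbers, the comparison in `black_wins` can never be a tie.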
- You must build an agent to play Go using either a reinforcement learning algorithm or a planning algorithm (such as Monte Carlo tree search 🌳) or a mix of both!
- You can only write code in the `main` folder.
  - As previously, the entry point to your code will be `main.py`, but you can add, and import from, other custom Python files in the `main` folder.
  - You can only store data to be used in a competition in a `.pkl` file by `save_file()`.
  - You can pkl anything you want, even a dictionary of PyTorch networks (or nothing)! Just make sure your `choose_move` can read it.
  - In the competition, your agent will call the `choose_move()` function in `main.py` to select a move (`choose_move()` may call other functions in the `main` folder).
  - We provide an `MCTS` class for you to implement. This class is passed to your `choose_move` and will persist across calls to `choose_move`, allowing you to prune the tree. In the competition we will initialise this class for you, so please do not add any arguments to `__init__()`. (You do not have to use this class.)
  - Any code not in the `main` folder will not be used.
- Your `choose_move` function will have a limit of 1 second to run!
- Submission deadline: 2:30pm UTC, Sunday.
  - You can update your code after submitting, but not after the deadline.
  - Check your submission is valid with `check_submission()`.
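One way to respect the 1 second limit is to make the move search "anytime": keep the best move found so far and stop when a deadline passes. The sketch below is only an illustration of that pattern. `pick_move_within_budget` and its `evaluate` callback are hypothetical names, and the 0.9 second default is an assumed safety margin rather than a value from the competition.

```python
import random
import time


def pick_move_within_budget(legal_moves, evaluate, budget_s=0.9):
    """Score moves in random order until the budget runs out; return the best seen.

    Assumes `legal_moves` is non-empty and `evaluate(move)` returns a number.
    """
    deadline = time.monotonic() + budget_s
    candidates = list(legal_moves)
    random.shuffle(candidates)
    best_move, best_score = candidates[0], float("-inf")
    for move in candidates:
        score = evaluate(move)
        if score > best_score:
            best_move, best_score = move, score
        # Check the clock after every evaluation so no new work starts past the deadline.
        if time.monotonic() >= deadline:
            break
    return best_move
```

A real `choose_move` would replace `evaluate` with something more useful, such as a network evaluation or a batch of tree search simulations, but the deadline check works the same way.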
Each matchup will consist of one game of Go between two players, with the winning player progressing to the later rounds. The komi (handicap) controls for the advantage of going first.
The competition & discussion will be in Gather Town at 3:30pm UTC on Sunday (60 mins after submission deadline)!
| Reward | |
|---|---|
| `+1` | You win |
| `-1` | You lose |
| `0` | Otherwise |

(A draw is not possible as the Komi is 7.5)
The tuple returned from each `env.step()` has:
- The `State` object (defined in `state.py`). This is described in detail further down.
- The `reward` for each timestep.
- Whether the game is `done` (boolean).
- The `info` dictionary.
  - This contains a key `legal_moves` with a numpy array containing all legal moves. These are the legal moves that can be taken on the next turn.
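To show how the four returned values fit together, here is a rough sketch of an episode loop. `StubEnv` is a hypothetical stand-in written only for this example, not the real environment: its `reset()` method and the assumption that `reset()` returns the same four-item tuple as `step()` are guesses, and it uses a plain list where the real `info["legal_moves"]` is a numpy array.

```python
import random


class StubEnv:
    """Hypothetical stand-in that mimics the step() tuple described above."""

    def __init__(self, max_turns=5):
        self.max_turns = max_turns
        self.turn = 0

    def _info(self):
        # A fixed list of moves, including the pass action 81, keeps the sketch simple.
        return {"legal_moves": [0, 1, 2, 81]}

    def reset(self):
        self.turn = 0
        return "state-0", 0, False, self._info()

    def step(self, action):
        self.turn += 1
        done = self.turn >= self.max_turns
        reward = 1 if done else 0  # pretend we always win on the last turn
        return f"state-{self.turn}", reward, done, self._info()


def play_episode(env, rng):
    """Pick a random legal move each turn until the environment reports done."""
    state, reward, done, info = env.reset()
    while not done:
        action = rng.choice(list(info["legal_moves"]))
        state, reward, done, info = env.step(action)
    return reward


final_reward = play_episode(StubEnv(), random.Random(0))
```

The key point is that the legal moves for the next decision always come from the `info` dictionary returned by the previous call.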
Valid actions are integers in the range 0-81 (inclusive). Each position on the board is defined by an integer, e.g. (row 1, col 0) = 9, assuming zero-indexed rows and columns numbered row by row on the 9x9 board. You can convert an integer action `a` to its corresponding board coordinate through the `int_to_coord()` function.
The integer 81 is the pass action. Two consecutive passes end the game. If you never return this action, the game may never end!
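For intuition, the integer encoding can be sketched as below. This is not the provided `int_to_coord()`: the names `sketch_int_to_coord` and `sketch_coord_to_int` are hypothetical, and the sketch assumes zero-indexed, row-by-row numbering on the 9x9 board with 81 reserved for the pass.

```python
BOARD_SIZE = 9
PASS_MOVE = BOARD_SIZE * BOARD_SIZE  # 81, the pass action


def sketch_int_to_coord(action, board_size=BOARD_SIZE):
    """Map an action int to (row, col), or None for the pass move."""
    n_points = board_size * board_size
    if action == n_points:
        return None
    if not 0 <= action < n_points:
        raise ValueError(f"action {action} is outside 0-{n_points}")
    # Assumption: row-major numbering, so row = action // size, col = action % size.
    return divmod(action, board_size)


def sketch_coord_to_int(coord, board_size=BOARD_SIZE):
    """Inverse mapping: (row, col) or None back to an action int."""
    if coord is None:
        return board_size * board_size
    row, col = coord
    return row * board_size + col


print(sketch_int_to_coord(80))  # (8, 8), the last point on the board
print(sketch_int_to_coord(PASS_MOVE))  # None
```

In your own agent, prefer the provided `int_to_coord()` so that your coordinates always agree with the environment.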
train()
(Optional)
Write this to train your algorithm from experience in the environment.
(Optional) Returns a picklable object for your choose_move to use.
choose_move()
This acts greedily given the state and network.
In the competition, the choose_move() function is called to make your next move. Takes inputs of state, pkl_file and mcts (see below).
MCTS()
The skeleton of a class that you can use to implement MCTS. Use this to persist your MCTS tree between steps so it can be pruned.
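A rough sketch of what "persisting and pruning" could look like: after each move is played, the tree is re-rooted on the matching child so earlier search statistics are reused and every other branch is dropped. `Node`, `TreeSketch` and `advance` are hypothetical names for this illustration and are not part of the provided skeleton.

```python
class Node:
    """One position in the search tree."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}  # action int -> Node
        self.visits = 0
        self.total_value = 0.0


class TreeSketch:
    """Hypothetical container; __init__ takes no arguments, as the rules require."""

    def __init__(self):
        self.root = Node()

    def advance(self, action):
        """Re-root on the child reached by `action`, discarding all other branches."""
        child = self.root.children.get(action)
        if child is None:
            # The move was never explored, so start again from a fresh node.
            child = Node()
        child.parent = None  # drop the reference to the old tree so it can be freed
        self.root = child
        return self.root
```

In practice you would call something like `advance` once for your own move and once for the opponent's reply before searching again.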
GoEnv
The environment class controls the game and runs the opponents. It should be used for training your agent.
See example usage in `play_go()`.
The opponents' `choose_move` functions are input at initialisation (when `Env(opponent_choose_moves)` is called). Every time you call `Env.step()`, both players make a move according to their `choose_move` function. Players view the board from their own perspective (i.e. `player1_board = -player2_board`).
GoEnv has a verbose argument which prints the information about the game to the console when set to True. GoEnv also has a render argument which visualises the game in pygame when set to True. This allows you to visualise your AI's skills. You can play against your agent using the human_player() function!
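The perspective convention mentioned above (`player1_board = -player2_board`) amounts to negating every stone value. A minimal sketch on a plain list of lists follows. `flip_perspective` is a hypothetical helper, it only makes sense for the three stone values `-1`, `0` and `1`, and on a numpy array the same thing is simply `-board`.

```python
def flip_perspective(board):
    """Return a copy of the board with black and white stones swapped."""
    return [[-value for value in row] for row in board]


my_view = [
    [1, 0, -1],
    [0, 1, 0],
    [-1, 0, 0],
]
opponent_view = flip_perspective(my_view)
```

Flipping twice returns the original board, which is a handy sanity check when feeding boards to a network that always expects the same point of view.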
choose_move_randomly()
A basic Go-playing bot that makes random legal moves. Learn to beat this first!
Takes the state as input and outputs an action.
play_go()
Plays a game of Go, which can be rendered through pygame (if render=True).
You can play against your own bot if you set your_choose_move to human_player!
Inputs:
your_choose_move: Function that takes the state and outputs the action for your agent.
opponent_choose_move: Function that takes the state and outputs the action for the opponent.
game_speed_multiplier: controls the gameplay speed. High numbers mean fast games, low numbers mean slow games.
verbose: whether to print info to the console.
render: whether to render the match through pygame
human_player()
Use this in place of a choose_move function to play against your bot yourself!
Left click the board to place a stone, right click to pass.
Takes the state as input and outputs an action.
State
This is a big dataclass. Hold onto your hats.
However there are only 3 important attributes you need to know about:
- `board`: a (board size x board size) numpy array containing the board state. The board is represented as follows:
  - `-1` = white stone
  - `0` = empty
  - `1` = black stone
  - There are other possible values, but these aren't important
- `recent_moves`: a tuple of all `PlayerMove`s made in the game so far. This is useful for keeping track of the game history & as a unique identifier for a state. 😉
- `to_play`: signifies whose turn it is to play at the current state. Either `BLACK` or `WHITE`.
The other attributes are explained in the docstring, although they can be ignored (unless building a pro-level Go AI).
int_to_coord()
A function that converts from an integer to a coordinate tuple (or None, if the pass move).
PlayerMove
A dataclass that simply represents a move made by a player.
It has 2 attributes:
color: either WHITE or BLACK
move: the move made by the player. This is either an integer in the range 0-81 (inclusive) or None if the player passes.
reward_function()
Gives the reward that would be received in the State for the player playing as Black. This reward \* -1 is the reward received by the player playing as White. `1` if black wins, `-1` if white wins, `0` otherwise.
transition_function()
Gives the successor State object given the current State and the action int made by the player whose turn it is to play.
is_terminal()
Returns True if the game is over, False otherwise.
Takes the State as input.
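Together, a transition function, a terminal check and a reward function are enough to simulate games without the environment, which is the core of a planning algorithm's rollout step. The sketch below shows the pattern on a toy counting game so that it runs on its own. `random_rollout` and every `toy_*` function are hypothetical stand-ins, and passing the legal moves through a separate `legal_moves_fn` is an assumption about how you would obtain them.

```python
import random


def random_rollout(state, legal_moves_fn, transition, is_terminal, reward, rng, max_steps=500):
    """Play random moves from `state` until the game ends; return the final reward."""
    for _ in range(max_steps):
        if is_terminal(state):
            break
        action = rng.choice(legal_moves_fn(state))
        state = transition(state, action)
    return reward(state)


# Toy stand-in game: the state is a running total, each move adds 1 or 2,
# and the game ends once the total reaches 10 or more.
def toy_legal_moves(total):
    return [1, 2]


def toy_transition(total, action):
    return total + action


def toy_is_terminal(total):
    return total >= 10


def toy_reward(total):
    # Mirrors the 1 / -1 convention above: "black" wins on an even final total.
    return 1 if total % 2 == 0 else -1


result = random_rollout(0, toy_legal_moves, toy_transition, toy_is_terminal, toy_reward, random.Random(0))
```

With the real functions you would pass the game's own `State` in place of the integer total, and remember that the reward is given from Black's point of view, so negate it when your agent is playing as White.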
