This repository provides tools and datasets for converting Tenhou mahjong game logs into the MJAI format, as well as tools for downloading and converting Mahjong Soul (雀魂) game records. It also includes preprocessed yearly Tenhou datasets for AI research, data analysis, and mahjong strategy modeling.
- Dataset Overview
- Release Structure
- Understanding the MJAI Format
- Tools
- Preparing Your Own Dataset
- Licenses
- Attribution
Complete collection of Tenhou phoenix room (鳳凰卓) game logs in MJAI format, spanning 2009–2026.
| Year | Size | Games | Year | Size | Games |
|---|---|---|---|---|---|
| 2009 | 32MB | ~8k | 2018 | 760MB | ~195k |
| 2010 | 311MB | ~80k | 2019 | 751MB | ~190k |
| 2011 | 449MB | ~115k | 2020 | 958MB | ~245k |
| 2012 | 521MB | ~135k | 2021 | 820MB | ~210k |
| 2013 | 585MB | ~150k | 2022 | 1.1GB | ~280k |
| 2014 | 625MB | ~160k | 2023 | 1.2GB | ~310k |
| 2015 | 672MB | ~170k | 2024 | 1.3GB | ~335k |
| 2016 | 702MB | ~180k | 2025 | 861MB | 197,692 |
| 2017 | 743MB | ~190k | 2026 | 60MB | 13,639 (Jan) |
Total: ~12GB, 18 yearly archives, 2.5M+ games
All datasets contain only 4-player (四人麻雀) hanchan (半荘戦) games from the houou (phoenix) room. No 3-player or tonpuu (東風戦) matches are included.
Example structure of a yearly dataset:
2024.zip
└── 2024/
    ├── 2024010100gm-00a9-0000-005a39ba.mjson
    ├── 2024010100gm-00a9-0000-00abc123.mjson
    └── ...
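Note: The .mjson files are gzip-compressed but retain the .mjson extension for compatibility. Use tools like gzip, gunzip, or 7-Zip to decompress them. Because the files lack a .gz suffix, gzip may reject them when given the file name directly; passing the file on standard input works (for example, gzip -dc < game.mjson).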
Each release also includes a corresponding YYYY.db file containing Tenhou game metadata.
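The layout of the YYYY.db file is not described here, so the following is only a sketch for exploring it. It assumes (this is an assumption, not something this README states) that the file is a SQLite database, and it lists whatever tables and columns it finds. The function name `list_tables` is hypothetical.

```python
import sqlite3


def list_tables(db_path):
    """Return {table_name: [column names]} for a SQLite database file."""
    # Assumption: YYYY.db is a SQLite database; verify this before relying on it.
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for name in names:
            # PRAGMA table_info yields one row per column; index 1 is the column name.
            columns = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
            schema[name] = [column[1] for column in columns]
        return schema
    finally:
        conn.close()
```

Calling `list_tables("2024.db")` would then show which metadata fields are available before you write any queries against them.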
The .mjson files use the MJAI protocol, a standard for Mahjong AI communication. Each file contains a sequence of JSON objects, with one object per line, representing the events of a single game.
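As a minimal sketch of reading one of these files, the snippet below opens a gzip-compressed .mjson file and parses each line as JSON. The function name `read_mjson` is hypothetical, and the sketch assumes UTF-8 text.

```python
import gzip
import json


def read_mjson(path):
    """Return the list of MJAI events stored in a gzip-compressed .mjson file."""
    events = []
    # gzip.open in text mode decompresses transparently, whatever the file extension.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                events.append(json.loads(line))
    return events
```

Each element of the returned list is a plain dict, so `events[0]["type"]` gives the type of the first event in the game.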
Tiles are represented using the following string format:
| Category | Notation | Examples |
|---|---|---|
| Manzu (Characters) | 1m, 2m, ..., 9m | 1m, 5m, 9m |
| Pinzu (Circles) | 1p, 2p, ..., 9p | 1p, 5p, 9p |
| Souzu (Bamboo) | 1s, 2s, ..., 9s | 1s, 5s, 9s |
| Honors | E, S, W, N, P, F, C | East, South, West, North, White, Green, Red |
| Red Fives | 5mr, 5pr, 5sr | Red 5-Man, Red 5-Pin, Red 5-Sou |
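The notation in the table can be decoded with a few string checks. The sketch below is illustrative only: the function name `parse_tile` and the `(suit, rank, is_red)` return shape are choices made for this example, not part of the MJAI protocol.

```python
HONORS = {"E", "S", "W", "N", "P", "F", "C"}
SUITS = {"m": "manzu", "p": "pinzu", "s": "souzu"}


def parse_tile(tile):
    """Return (suit, rank, is_red) for a tile string; honors have rank None."""
    if tile in HONORS:
        return ("honor", None, False)
    # A trailing "r" marks a red five, e.g. "5sr".
    is_red = tile.endswith("r")
    core = tile[:-1] if is_red else tile
    if len(core) == 2 and core[0] in "123456789" and core[1] in SUITS:
        rank = int(core[0])
        if is_red and rank != 5:
            raise ValueError(f"only fives can be red: {tile!r}")
        return (SUITS[core[1]], rank, is_red)
    raise ValueError(f"unrecognized tile: {tile!r}")
```

For example, `parse_tile("5sr")` yields `("souzu", 5, True)`, while an unknown string such as `"0m"` raises `ValueError`.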
Each JSON object has a type field that describes the game event. Here are some of the most common events you will find in the logs:
- start_kyoku: Signals the start of a new round.
  - Provides initial state: round wind (bakaze), round number (kyoku), dealer (oya), dora indicator (dora_marker), and each player's starting hand (tehais).
  - Example: {"type":"start_kyoku","bakaze":"E","kyoku":1,"oya":0,"dora_marker":"7s", ...}
- tsumo: A player draws a tile from the wall.
  - actor is the player's ID (0-3). pai is the tile drawn.
  - Example: {"type":"tsumo","actor":1,"pai":"3m"}
- dahai: A player discards a tile.
  - tsumogiri is true if the discarded tile was the one just drawn.
  - Example: {"type":"dahai","actor":1,"pai":"7s","tsumogiri":false}
- pon / chi / daiminkan: A player makes an open call from another player's discard.
  - actor is the caller, target is the player who was called on. pai is the called tile, and consumed are the tiles from the actor's hand used to make the call.
  - Example: {"type":"pon","actor":0,"target":1,"pai":"5sr","consumed":["5s","5s"]}
- ankan / kakan: A player makes a closed or added kan.
  - Example: {"type":"ankan","actor":1,"consumed":["N","N","N","N"]}
- reach_accepted: A player's Riichi declaration is accepted.
  - Shows the 1000-point payment and updated scores.
  - Example: {"type":"reach_accepted","actor":1,"deltas":[0,-1000,0,0], ...}
- hora: A player wins the hand (Tsumo or Ron).
  - Contains full win details: winning tile (pai), yaku (yakus), fu, fan, points (hora_points), score changes (deltas), and ura-dora indicators (uradora_markers).
  - Example: {"type":"hora","actor":2,"target":2,"pai":"2m", ...}
- ryukyoku: The round ends in a draw.
  - Specifies the reason (reason) and score changes.
  - Example: {"type":"ryukyoku","reason":"fanpai", ...}
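To make the event structure concrete, here is a small sketch that walks over the lines of a decompressed log, counts events by their type field, and sums the score changes. It assumes that every score-changing event carries a four-element deltas list as in the reach_accepted example above; whether all such events do so is an assumption. The function name `summarize_events` is hypothetical.

```python
import json
from collections import Counter


def summarize_events(lines):
    """Count events by type and sum the per-seat score changes found in `deltas`."""
    counts = Counter()
    net = [0, 0, 0, 0]  # one running total per seat (player IDs 0-3)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        counts[event["type"]] += 1
        # Assumption: score-changing events expose their changes as `deltas`.
        deltas = event.get("deltas")
        if deltas is not None:
            for seat, change in enumerate(deltas):
                net[seat] += change
    return counts, net
```

The function accepts any iterable of strings, so it works equally well on an open text file or on a list of lines held in memory.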
For a complete and authoritative reference, please see the original MJAI Protocol Documentation (in Japanese).
git clone https://github.com/NikkeTryHard/tenhou-to-mjai.git
cd tenhou-to-mjai
cargo build --release

tenhou-scraper is the main CLI binary for both Tenhou and Majsoul pipelines.
| Command Group | Description | Docs |
|---|---|---|
| fetch, download, convert, package | Tenhou houou log pipeline | docs/tenhou-scraper.md |
| majsoul fetch-days, scrape-players, scrape-all | Majsoul UUID discovery | docs/majsoul-scraper.md |
| majsoul raw-download, bulk-download | Majsoul game download | docs/majsoul-scraper.md |
| majsoul convert, convert-raw | Majsoul → MJAI conversion | docs/majsoul-scraper.md |
| majsoul resolve-uuids, resolve-paipu | UUID resolution | docs/majsoul-scraper.md |
| majsoul stats | Pipeline statistics | docs/majsoul-scraper.md |
Quick start (Tenhou):
tenhou-scraper fetch --start 20250101 --end 20251231
tenhou-scraper download
tenhou-scraper convert --output mjai/ --players 4 --hanchan
tenhou-scraper package --input mjai/ --output houou-2025.zip

tensoul-download is a multi-account Majsoul downloader and protobuf converter. See docs/tensoul-download.md.
cd tensoul-download && uv sync
uv run python async_downloader.py --accounts accounts.txt --password "pass" --todo todo.txt
uv run python convert_pb.py

mjai-validator is a streaming validator for MJAI dataset archives. It validates all files inside .tar.zst archives without extracting them to disk. See docs/mjai-validator.md.
cd dataset/mjai-validator && cargo build --release
mjai-validator /path/to/dataset/mjai # validate
mjai-validator --clean /path/to/dataset # remove invalid files

To build your own dataset from Tenhou logs, follow this multi-stage conversion process:
Obtain Tenhou logs from official sources or your own game archive. These will be in .xml format.
First, convert the raw Tenhou XML logs into an intermediate JSON format using a tool like mjlog2json.
# This creates a directory of .json files from your .xml files
mjlog2json "C:\path\to\tenhou_xml\2024" -o "C:\path\to\tenhou_json\2024"

Next, use the convlog utility from the mjai-reviewer repository to convert the JSON files into the final MJAI format. This will produce uncompressed .mjson files.
# Assuming you are in the mjai-reviewer project directory
# This converts .json files into uncompressed .mjson files
cargo run --release --bin convlog -- "C:\path\to\tenhou_json\2024" "C:\path\to\tenhou_mjai\2024"

Navigate to the output directory (e.g., tenhou_mjai/2024) and compress each .mjson file individually using gzip.
# This will compress each file, creating a .mjson.gz file and deleting the original
gzip *.mjson

Important: This dataset uses the .mjson extension for the compressed files. After running gzip, rename the resulting .mjson.gz files back to .mjson, for example with a short shell script or a batch renaming tool.
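The compress-then-rename step can also be done in one pass. The following is a sketch rather than the project's own tooling: it gzips every uncompressed .mjson file in a directory and keeps the original file name, skipping files that already start with the gzip magic bytes. The function name `compress_in_place` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def compress_in_place(directory):
    """Gzip each uncompressed .mjson file in `directory`, keeping the .mjson name."""
    compressed = 0
    for path in sorted(Path(directory).glob("*.mjson")):
        with open(path, "rb") as f:
            if f.read(2) == GZIP_MAGIC:
                continue  # already compressed, leave untouched
        # Write to a temporary sibling first so a failure never truncates the original.
        tmp = path.with_name(path.name + ".tmp")
        with open(path, "rb") as src, gzip.open(tmp, "wb") as dst:
            shutil.copyfileobj(src, dst)
        tmp.replace(path)
        compressed += 1
    return compressed
```

Running it twice on the same directory is harmless: the second call finds only gzip-compressed files and returns 0.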
Once the files are compressed and correctly named, create yearly zip archives.
# Example: zipping the 2024 folder
zip -r 2024.zip 2024

Keep the Tenhou .db file alongside the yearly archive for metadata reference.
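Before publishing an archive, it can be worth checking that every .mjson member really is gzip-compressed. The sketch below reads only the first two bytes of each member and reports any that lack the gzip magic bytes. The function name `find_uncompressed` is hypothetical, and this check is a suggestion rather than part of the documented pipeline.

```python
import zipfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def find_uncompressed(zip_path):
    """Return names of .mjson members that do not start with the gzip magic bytes."""
    bad = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.endswith(".mjson"):
                continue
            # Members are read straight from the archive; nothing is extracted to disk.
            with zf.open(info) as member:
                if member.read(2) != GZIP_MAGIC:
                    bad.append(info.filename)
    return bad
```

An empty list from `find_uncompressed("2024.zip")` means every game file in the archive passed the check.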
This project’s code is licensed under the Apache License 2.0.
See LICENSE for details.
The datasets are licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
See DATA_LICENSE for details.
You may redistribute, remix, and build upon the data with proper attribution.
Data is derived from:
- Tenhou.net (天鳳) — Phoenix room (鳳凰卓) game logs
This dataset was prepared using the following open-source tools. We extend our gratitude to their authors and contributors.
- Mortal: https://github.com/Equim-chan/Mortal - The target AI engine for the MJAI format.
- mjai-reviewer: https://github.com/Equim-chan/mjai-reviewer - Specifically, the convlog utility for JSON to MJAI conversion.
- mjlog2json: https://github.com/tsubakisakura/mjlog2json - For converting raw Tenhou XML logs to JSON.
- Amae-Koromo: https://amae-koromo.sapk.ch - Mahjong Soul game record API.
- tensoul-py-ng: https://github.com/unStatiK/tensoul-py-ng - Majsoul protobuf to Tenhou JSON converter.
All rights to original game data remain with their respective owners. Converted datasets are redistributed for research and educational use under CC BY 4.0.