Converts Tenhou game logs into mjai format for Mortal AI training.

NikkeTryHard/tenhou-to-mjai

Tenhou to MJAI

This repository provides tools and datasets for converting Tenhou mahjong game logs into the MJAI format, as well as tools for downloading and converting Mahjong Soul (雀魂) game records. It also includes preprocessed yearly Tenhou datasets for AI research, data analysis, and mahjong strategy modeling.


Dataset Overview

Complete collection of Tenhou phoenix room (鳳凰卓) game logs in MJAI format, spanning 2009–2026.

Year  Size    Games      Year  Size    Games
2009  32MB    ~8k        2018  760MB   ~195k
2010  311MB   ~80k       2019  751MB   ~190k
2011  449MB   ~115k      2020  958MB   ~245k
2012  521MB   ~135k      2021  820MB   ~210k
2013  585MB   ~150k      2022  1.1GB   ~280k
2014  625MB   ~160k      2023  1.2GB   ~310k
2015  672MB   ~170k      2024  1.3GB   ~335k
2016  702MB   ~180k      2025  861MB   197,692
2017  743MB   ~190k      2026  60MB    13,639 (Jan)

Total: ~12GB, 18 yearly archives, 2.5M+ games

All datasets contain only 4-player (四人麻雀) hanchan (半荘戦) games from the houou (phoenix) room. No 3-player or tonpuu (東風戦) matches are included.


Release Structure

Example structure of a yearly dataset:

2024.zip
 └── 2024/
     ├── 2024010100gm-00a9-0000-005a39ba.mjson
     ├── 2024010100gm-00a9-0000-00abc123.mjson
     └── ...

Note

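The .mjson files are gzip-compressed but retain the .mjson extension for compatibility. Use tools like gzip, gunzip, or 7-Zip to decompress them. Because the files lack a .gz suffix, some tools may refuse them by name; reading from standard input (for example, gzip -dc < file.mjson) avoids the suffix check.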

Each release also includes a corresponding YYYY.db file containing Tenhou game metadata.


Understanding the MJAI Format

The .mjson files use the MJAI protocol, a standard for Mahjong AI communication. Each file contains a sequence of JSON objects, with one object per line, representing the events of a single game.
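Since each file is gzip-compressed and holds one JSON object per line, it can be read without decompressing to disk first. The following is a minimal sketch using only the Python standard library; the function name read_mjson is a hypothetical helper chosen for illustration, not part of this repository.

```python
import gzip
import json


def read_mjson(path):
    """Yield one event dict per non-empty line of a gzip-compressed .mjson file.

    Hypothetical helper for illustration; not part of the repository's tooling.
    """
    # gzip.open detects the format from the file contents, so the missing
    # .gz suffix does not matter here.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)
```

A typical use is to iterate lazily, for example `for event in read_mjson("some_game.mjson"): ...`, which keeps memory use low because only one line is decoded at a time.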

Tile Notation

Tiles are represented using the following string format:

Category            Notation               Examples
Manzu (Characters)  1m, 2m, ..., 9m        1m, 5m, 9m
Pinzu (Circles)     1p, 2p, ..., 9p        1p, 5p, 9p
Souzu (Bamboo)      1s, 2s, ..., 9s        1s, 5s, 9s
Honors              E, S, W, N, P, F, C    East, South, West, North, White, Green, Red
Red Fives           5mr, 5pr, 5sr          Red 5-Man, Red 5-Pin, Red 5-Sou
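The notation above is regular enough to parse with a few string checks. The sketch below assumes tile strings follow the table exactly; the Tile type and the parse_tile function are illustrative names, not part of the MJAI protocol or of this repository.

```python
from typing import NamedTuple

# Honor tiles are single uppercase letters; suit letters are lowercase,
# so "S" (South) and "s" (souzu) never collide.
HONORS = {"E", "S", "W", "N", "P", "F", "C"}
SUITS = {"m": "manzu", "p": "pinzu", "s": "souzu"}


class Tile(NamedTuple):
    suit: str      # "manzu", "pinzu", "souzu", or "honor"
    rank: int      # 1-9 for suited tiles, 0 for honors
    is_red: bool   # True only for 5mr, 5pr, 5sr


def parse_tile(pai: str) -> Tile:
    """Parse a tile string such as "3m", "5sr", or "E" (illustrative helper)."""
    if pai in HONORS:
        return Tile("honor", 0, False)
    is_red = pai.endswith("r")
    body = pai[:-1] if is_red else pai
    if len(body) != 2 or body[0] not in "123456789" or body[1] not in SUITS:
        raise ValueError(f"not a valid tile: {pai!r}")
    rank = int(body[0])
    if is_red and rank != 5:
        raise ValueError(f"only fives can be red: {pai!r}")
    return Tile(SUITS[body[1]], rank, is_red)
```

Keeping the red flag separate from the rank lets code that only cares about tile identity compare (suit, rank) while scoring code can still see the red five.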

Common Event Types

Each JSON object has a type field that describes the game event. Here are some of the most common events you will find in the logs:

  • start_kyoku: Signals the start of a new round.

    • Provides initial state: round wind (bakaze), round number (kyoku), dealer (oya), dora indicator (dora_marker), and each player's starting hand (tehais).
    • {"type":"start_kyoku","bakaze":"E","kyoku":1,"oya":0,"dora_marker":"7s", ...}
  • tsumo: A player draws a tile from the wall.

    • actor is the player's ID (0-3). pai is the tile drawn.
    • {"type":"tsumo","actor":1,"pai":"3m"}
  • dahai: A player discards a tile.

    • tsumogiri is true if the discarded tile was the one just drawn.
    • {"type":"dahai","actor":1,"pai":"7s","tsumogiri":false}
  • pon / chi / daiminkan: A player makes an open call from another player's discard.

    • actor is the caller, target is the player who was called on. pai is the called tile, and consumed are the tiles from the actor's hand used to make the call.
    • {"type":"pon","actor":0,"target":1,"pai":"5sr","consumed":["5s","5s"]}
  • ankan / kakan: A player makes a closed or added kan.

    • {"type":"ankan","actor":1,"consumed":["N","N","N","N"]}
  • reach_accepted: A player's Riichi declaration is accepted.

    • Shows the 1000-point payment and updated scores.
    • {"type":"reach_accepted","actor":1,"deltas":[0,-1000,0,0], ...}
  • hora: A player wins the hand (Tsumo or Ron).

    • Contains full win details: winning tile (pai), yaku (yakus), fu, fan, points (hora_points), score changes (deltas), and ura-dora indicators (uradora_markers).
    • {"type":"hora","actor":2,"target":2,"pai":"2m", ...}
  • ryukyoku: The round ends in a draw.

    • Specifies the reason (reason) and score changes.
    • {"type":"ryukyoku","reason":"fanpai", ...}
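As a rough illustration of how these events can be consumed, the sketch below tallies event types and collects each player's discards from dahai events. It relies only on the fields described above (type, actor, pai, tsumogiri); the summarize function and the four sample lines are made up for this example and do not come from a real log.

```python
import json
from collections import Counter, defaultdict


def summarize(lines):
    """Count events by type and collect (pai, tsumogiri) discards per actor."""
    type_counts = Counter()
    discards = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        type_counts[event["type"]] += 1
        if event["type"] == "dahai":
            discards[event["actor"]].append((event["pai"], event["tsumogiri"]))
    return type_counts, dict(discards)


# Hand-written sample: player 1 draws and discards 7s, player 0 calls pon
# on it and then discards 2p.
sample = [
    '{"type":"tsumo","actor":1,"pai":"3m"}',
    '{"type":"dahai","actor":1,"pai":"7s","tsumogiri":false}',
    '{"type":"pon","actor":0,"target":1,"pai":"7s","consumed":["7s","7s"]}',
    '{"type":"dahai","actor":0,"pai":"2p","tsumogiri":false}',
]

type_counts, discards = summarize(sample)
print(dict(type_counts))
print(discards)
```

The same loop structure extends naturally to other event types, for example by branching on hora or ryukyoku to record how each round ended.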

For a complete and authoritative reference, please see the original MJAI Protocol Documentation (in Japanese).


Tools

Installation

git clone https://github.com/NikkeTryHard/tenhou-to-mjai.git
cd tenhou-to-mjai
cargo build --release

tenhou-scraper (Rust CLI)

Main CLI binary for both Tenhou and Majsoul pipelines.

Command Group                                   Description                 Docs
fetch, download, convert, package               Tenhou houou log pipeline   docs/tenhou-scraper.md
majsoul fetch-days, scrape-players, scrape-all  Majsoul UUID discovery      docs/majsoul-scraper.md
majsoul raw-download, bulk-download             Majsoul game download       docs/majsoul-scraper.md
majsoul convert, convert-raw                    Majsoul → MJAI conversion   docs/majsoul-scraper.md
majsoul resolve-uuids, resolve-paipu            UUID resolution             docs/majsoul-scraper.md
majsoul stats                                   Pipeline statistics         docs/majsoul-scraper.md

Quick start (Tenhou):

tenhou-scraper fetch --start 20250101 --end 20251231
tenhou-scraper download
tenhou-scraper convert --output mjai/ --players 4 --hanchan
tenhou-scraper package --input mjai/ --output houou-2025.zip

tensoul-download (Python)

Multi-account Majsoul downloader and protobuf converter. See docs/tensoul-download.md.

cd tensoul-download && uv sync
uv run python async_downloader.py --accounts accounts.txt --password "pass" --todo todo.txt
uv run python convert_pb.py

mjai-validator (Rust)

Streaming validator for MJAI dataset archives. Validates all files inside .tar.zst archives without extracting to disk. See docs/mjai-validator.md.

cd dataset/mjai-validator && cargo build --release
mjai-validator /path/to/dataset/mjai     # validate
mjai-validator --clean /path/to/dataset  # remove invalid files

Preparing Your Own Dataset

To build your own dataset from Tenhou logs, follow this multi-stage conversion process:

1. Download Logs Directly From Tenhou

Obtain Tenhou logs from official sources or your own game archive. These will be in .xml format.

2. Convert XML to JSON (Intermediate Step)

First, convert the raw Tenhou XML logs into an intermediate JSON format using a tool like mjlog2json.

# This creates a directory of .json files from your .xml files
mjlog2json "C:\path\to\tenhou_xml\2024" -o "C:\path\to\tenhou_json\2024"

3. Convert JSON to MJAI Format

Next, use the convlog utility from the mjai-reviewer repository to convert the JSON files into the final MJAI format. This will produce uncompressed .mjson files.

# Assuming you are in the mjai-reviewer project directory
# This converts .json files into uncompressed .mjson files
cargo run --release --bin convlog -- "C:\path\to\tenhou_json\2024" "C:\path\to\tenhou_mjai\2024"

4. Compress Each MJAI File

Navigate to the output directory (e.g., tenhou_mjai/2024) and compress each .mjson file individually using gzip.

# This will compress each file, creating a .mjson.gz file and deleting the original
gzip *.mjson

Important

The standard for this dataset is to use the .mjson extension for the compressed files. After running gzip, you must rename the resulting .mjson.gz files back to .mjson. This can be done with a simple shell script or batch renaming tool.
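One possible way to do the rename is a short Python script. This is a sketch under the assumption that the directory contains the .mjson.gz files produced by the gzip step; the function name restore_mjson_extension is hypothetical.

```python
from pathlib import Path


def restore_mjson_extension(directory):
    """Rename every *.mjson.gz in `directory` to *.mjson; return the count renamed.

    The file contents stay gzip-compressed; only the name changes.
    """
    renamed = 0
    for path in sorted(Path(directory).glob("*.mjson.gz")):
        target = path.with_suffix("")  # drops only the trailing ".gz"
        if target.exists():
            # Avoid silently overwriting an existing file of the same name.
            print(f"skipping {path.name}: {target.name} already exists")
            continue
        path.rename(target)
        renamed += 1
    return renamed
```

For example, `restore_mjson_extension("tenhou_mjai/2024")` would process the output directory used in the earlier steps. Skipping existing targets is a conservative choice so that a partially processed directory is not clobbered.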

5. Package by Year

Once the files are compressed and correctly named, create yearly zip archives.

# Example: zipping the 2024 folder
zip -r 2024.zip 2024

6. Include Corresponding DB File

Keep the Tenhou .db file alongside the yearly archive for metadata reference.


Licenses

Code License

This project’s code is licensed under the Apache License 2.0. See LICENSE for details.

Data License

The datasets are licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). See LICENSE-DATA for details. You may redistribute, remix, and build upon the data with proper attribution.


Attribution

Data Sources

Data is derived from:

  • Tenhou.net (天鳳) — Phoenix room (鳳凰卓) game logs

Tooling

This dataset was prepared using open-source tools. We extend our gratitude to their authors and contributors.

All rights to original game data remain with their respective owners. Converted datasets are redistributed for research and educational use under CC BY 4.0.
