turing-bench

Graph database benchmarking tool that measures query performance across TuringDB, Neo4j, and Memgraph using real-world datasets (Reactome, PoleDB).

Prerequisites

  • Linux (Debian-based, e.g. Ubuntu 22.04)
  • Python >= 3.13
  • uv (Python package manager)
  • wget, git, tar, dpkg-deb
  • AWS CLI configured with the turingdb_intern profile (for downloading datasets)

Getting started

1. Clone the repository

git clone https://github.com/turing-db/turing-bench.git
cd turing-bench

2. Install database engines

Warning

The install script only supports Debian-based systems. It downloads and builds Neo4j from source, which requires Java 17 and Maven (both installed automatically).

./install.sh
# To start fresh, run: ./install.sh --clean

This installs the following under install/:

  • Java 17 and Maven (needed to build Neo4j)
  • Neo4j (community edition, built from source) + APOC plugin
  • Memgraph (extracted from .deb package)

TuringDB is installed automatically as a Python dependency when running the benchmark.

3. Set up the environment

This must be run in every new shell session before using bench or running benchmarks:

source env.sh

This adds database binaries to your PATH and defines the bench alias used to manage servers.
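For orientation, env.sh is roughly equivalent to the sketch below. The exact paths and the bench implementation are illustrative assumptions; consult the actual file in the repository:

```shell
# Hypothetical sketch of env.sh -- the actual file may differ.
# Put the locally installed engine binaries on PATH.
export PATH="$PWD/install/neo4j/bin:$PWD/install/memgraph:$PATH"

# Define the bench alias used to manage database servers.
# (The script path here is illustrative.)
alias bench="$PWD/scripts/bench.sh"
```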

4. Download and import datasets

Each dataset must be downloaded and converted into the formats used by all three engines. The run_all.sh script handles the full pipeline:

./scripts/neo4j-43-imports/run_all.sh reactome
./scripts/neo4j-43-imports/run_all.sh poledb

For each dataset, this runs the following steps (see scripts/neo4j-43-imports/):

  1. Download the raw Neo4j 4.3 dump (0_download.sh)
  2. Migrate the dump to Neo4j 5 and save it to dumps/<dataset>.neo4j (1_migrate.sh)
  3. Export to Cypher -- generates a .cypher script containing all nodes, relationships, and indexes, later used to build the Memgraph dump (2_gen_cypher.sh)
  4. Export to JSONL -- generates a .jsonl file with the dataset in JSON Lines format, later used to create the TuringDB graph (3_gen_jsonl.sh)
  5. Load into Memgraph and save snapshot to dumps/<dataset>.memgraph (4_load_in_memgraph.sh)
  6. Load into TuringDB and save to dumps/<dataset>.turingdb (5_load_in_turingdb.sh)
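If one stage fails, the numbered scripts can also be run one at a time. A sketch, assuming each script accepts the dataset name as its argument the same way run_all.sh does:

```shell
# Hypothetical manual invocation of the import pipeline stages.
# Assumes each script takes the dataset name, like run_all.sh.
cd scripts/neo4j-43-imports
for step in 0_download.sh 1_migrate.sh 2_gen_cypher.sh \
            3_gen_jsonl.sh 4_load_in_memgraph.sh 5_load_in_turingdb.sh; do
    ./"$step" reactome || break   # stop at the first failing stage
done
```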

Note

All three database engines must be installed (step 2) before importing datasets, since the pipeline starts and stops each engine during the process.

Running benchmarks

Full benchmark (all three engines)

The run.sh script stops all databases, loads the specified dataset, and benchmarks each engine sequentially:

./run.sh reactome                       # uses default query file
./run.sh poledb queries_poledb.cypher   # specify dataset + query file
./run.sh --report reactome              # also generate full benchmark report (.md)
./run.sh --no-readme reactome           # skip README summary table update

Individual engine benchmarks

Start a database, run the benchmark, then stop it:

source env.sh

# TuringDB
bench turingdb start -- -turing-dir dumps/reactome.turingdb -load reactome
uv run python -m turingbench turingdb --query-file sample_queries/reactome/queries_reactome.cypher --database=reactome
bench turingdb stop

# Neo4j
bench neo4j start
uv run python -m turingbench neo4j --query-file sample_queries/reactome/queries_reactome.cypher
bench neo4j stop

# Memgraph
bench memgraph start -- --data-directory=dumps/reactome.memgraph
uv run python -m turingbench memgraph --query-file sample_queries/reactome/queries_reactome.cypher --url=bolt://localhost:7688
bench memgraph stop

Server management

bench <engine> start    # start a database (turingdb, neo4j, memgraph)
bench <engine> stop     # stop a database
bench all stop          # stop all databases

Report generation

run.sh produces three types of output:

| Output | Location | Generated by |
|---|---|---|
| Raw benchmark output | reports/{dataset}_raw_benchmark.txt | Automatic -- per-engine timing tables |
| README summary table | Embedded in README.md | Automatic -- skip with --no-readme |
| Full benchmark report | reports/benchmark_report.md | Opt-in with --report |

You can also run the report tools standalone:

# Parse a raw benchmark and update the README summary table
uv run python report_summary/parse_raw_benchmark.py reports/reactome_raw_benchmark.txt --dataset reactome --update-readme

# Generate the full benchmark report from all raw benchmarks
uv run python report_summary/generate_benchmark_report.py --reports-dir reports/ -o reports/benchmark_report.md

Available datasets

| Dataset | Query file |
|---|---|
| reactome | sample_queries/reactome/queries_reactome.cypher |
| poledb | sample_queries/poledb/queries_poledb.cypher |

Benchmark Results

PoleDB

CPU: Intel(R) Xeon(R) Gold 5412U | Cores: 48 | RAM: 251.4 GB | OS: Ubuntu 24.04.3 LTS | Storage: SSD

| Query | TuringDB | Neo4j | Memgraph | Speedup vs Neo4j | Speedup vs Memgraph |
|---|---|---|---|---|---|
| MATCH (n) RETURN n | 57ms | 3130ms | 1858ms | 55x | 33x |
| MATCH (p:Person) RETURN p | 3ms | 134ms | 25ms | 45x | 8.3x |
| MATCH (p:Person) RETURN count(p) | 1ms | 115ms | 8ms | 115x | 8.0x |
| MATCH (c:Crime) RETURN c | 15ms | 842ms | 772ms | 56x | 51x |
| MATCH (c:Crime) RETURN count(c) | 1ms | 45ms | 13ms | 45x | 13x |
| MATCH ()-[r]->() RETURN r | 64ms | 2660ms | 2386ms | 42x | 37x |
| MATCH ()-[r]->() RETURN count(r) | 9ms | 55ms | 28ms | 6.1x | 3.1x |
| MATCH (p:Person {name: 'John'})-[:PARTY_TO]->(c:Crime) RETURN p, c | 4ms | 24ms | 0ms | 6.0x | - |
| MATCH (p:Person)-[:PARTY_TO]->(c:Crime) RETURN p.name, p.surname, c.type | 1ms | 28ms | 11ms | 28x | 11x |
| MATCH (p:Person {surname: 'Smith'})-[r]->(n) RETURN p | 3ms | 64ms | 2ms | 21x | 0.7x |
| MATCH (p:Person)-[r]->(n) WHERE p.surname = 'Smith' RETURN p | 3ms | 60ms | 1ms | 20x | 0.3x |
| MATCH (p1:Person)-[:PARTY_TO]->(c:Crime)<-[:PARTY_TO]-(p2:Person) WHERE p1 <> p2 RETURN p1.name, p2.name, c.type | 1ms | 126ms | 8ms | 126x | 8.0x |
| MATCH (p1:Person)-[:KNOWS]->(p2:Person)-[:PARTY_TO]->(c:Crime) RETURN p1.name, p2.name | 1ms | 24ms | 11ms | 24x | 11x |
| MATCH (c:Crime)-[:OCCURRED_AT]->(l:Location) RETURN l.postcode | 30ms | 553ms | 362ms | 18x | 12x |

Reactome

CPU: Intel(R) Xeon(R) Gold 5412U | Cores: 48 | RAM: 251.4 GB | OS: Ubuntu 24.04.3 LTS | Storage: SSD

| Query | TuringDB | Neo4j | Memgraph | Speedup vs Neo4j | Speedup vs Memgraph |
|---|---|---|---|---|---|
| MATCH (n:Drug) RETURN n | 2ms | 977ms | 371ms | 488x | 186x |
| MATCH (n:ProteinDrug) RETURN n | 1ms | 221ms | 340ms | 221x | 340x |
| MATCH (n:Drug:ProteinDrug) RETURN n | 1ms | 270ms | 359ms | 270x | 359x |
| MATCH (n:Taxon)-->(m:Species) RETURN n,m | 1ms | 259ms | 301ms | 259x | 301x |
| MATCH (n)-->(m:Interaction)-->(o) RETURN n,m,o | 707ms | 33117ms | 32609ms | 47x | 46x |
| MATCH (n{displayName:"Autophagy"}) RETURN n | 283ms | 918ms | 629ms | 3.2x | 2.2x |
| MATCH (n{displayName:"Autophagy"})-->(m) RETURN m | 216ms | 628ms | 540ms | 2.9x | 2.5x |
| MATCH (n{displayName:"Autophagy"})-->(m)-->(p) RETURN p | 215ms | 622ms | 569ms | 2.9x | 2.6x |
| MATCH (n{displayName:"Autophagy"})-->(m)-->(p)-->(q) RETURN q | 370ms | 878ms | 702ms | 2.4x | 1.9x |
| MATCH (n{displayName:"Autophagy"})-->(m)-->(p)-->(q)-->(r) RETURN r | 236ms | 2776ms | 2595ms | 12x | 11x |
| MATCH (n{displayName:"Autophagy"})-->(m)-->(p)-->(q)-->(r)-->(s) RETURN s | 296ms | 5784ms | 5012ms | 20x | 17x |
| MATCH (n{displayName:"Autophagy"})-->(m)-->(p)-->(q)-->(r)-->(s)-->(t) RETURN t | 493ms | 17983ms | 17256ms | 36x | 35x |
| MATCH (n{displayName:"Autophagy"})-->(m)-->(p)-->(q)-->(r)-->(s)-->(t)-->(v) RETURN v | 1149ms | 66876ms | 54252ms | 58x | 47x |
| MATCH (n{displayName:"APOE-4 [extracellular region]"}) RETURN n | 351ms | 887ms | 705ms | 2.5x | 2.0x |
| MATCH (n{displayName:"APOE-4 [extracellular region]"})-->(m) RETURN m | 215ms | 734ms | 567ms | 3.4x | 2.6x |
| MATCH (n{displayName:"APOE-4 [extracellular region]"})-->(m)-->(p) RETURN p | 211ms | 616ms | 603ms | 2.9x | 2.9x |
| MATCH (n{displayName:"APOE-4 [extracellular region]"})-->(m)-->(p)-->(q) RETURN q | 213ms | 613ms | 585ms | 2.9x | 2.7x |
| MATCH (n{displayName:"APOE-4 [extracellular region]"})-->(m)-->(p)-->(q)-->(r) RETURN r | 213ms | 649ms | 560ms | 3.0x | 2.6x |
| MATCH (n{displayName:"APOE-4 [extracellular region]"})-->(m)-->(p)-->(q)-->(r)-->(s) RETURN s | 211ms | 669ms | 537ms | 3.2x | 2.5x |
| MATCH (n{displayName:"APOE-4 [extracellular region]"})-->(m)-->(p)-->(q)-->(r)-->(s)-->(t) RETURN t | 211ms | 644ms | 557ms | 3.1x | 2.6x |
| MATCH (n{displayName:"APOE-4 [extracellular region]"})-->(m)-->(p)-->(q)-->(r)-->(s)-->(t)-->(v) RETURN v | 215ms | 11087ms | 465ms | 52x | 2.2x |
| MATCH (n)-[e:release]->(m) RETURN n,m | 240ms | 4046ms | 3162ms | 17x | 13x |
| MATCH (n)-[e:interactor]->(m) RETURN n,m | 318ms | 26174ms | 25184ms | 82x | 79x |
| MATCH (n)-[e:surroundedBy]->(m) RETURN n,m | 202ms | 1548ms | 367ms | 7.7x | 1.8x |
| MATCH (n)-[:hasEvent]->(m) RETURN n,m | 292ms | 11204ms | 11008ms | 38x | 38x |
| MATCH (n:Pathway)-[:hasEvent]->(m:ReactionLikeEvent) RETURN n,m | 94ms | 8442ms | 8696ms | 90x | 93x |
| MATCH (r:ReactionLikeEvent)-[:output]->(s:PhysicalEntity) RETURN r,s | 184ms | 13383ms | 13591ms | 73x | 74x |
| MATCH (n:DatabaseObject{isChimeric:false}) RETURN n | 239ms | 2507ms | 1538ms | 10x | 6.4x |
| MATCH (n:DatabaseObject{isChimeric:true}) RETURN n | 213ms | 799ms | 429ms | 3.8x | 2.0x |
| MATCH (b)-->(a:Pathway) RETURN a | 324ms | 4447ms | 5917ms | 14x | 18x |
| MATCH (c)-->(b)-->(a:Pathway) RETURN a, c | 2318ms | 35138ms | 34946ms | 15x | 15x |
| MATCH (c)-->(b)-->(a:Pathway) RETURN b | 2109ms | 22699ms | 22932ms | 11x | 11x |
| MATCH (c)-->(b)-->(a:Pathway) RETURN c | 1978ms | 18090ms | 17969ms | 9.1x | 9.1x |
| MATCH (c)-->(b)-->(a:Pathway) RETURN a | 1977ms | 22424ms | 23598ms | 11x | 12x |
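The speedup columns above are plain ratios of query times: the baseline engine's time divided by TuringDB's time. A minimal sketch of that computation (the function name and rounding rule are illustrative, not part of the turingbench package):

```python
def speedup(baseline_ms: float, turingdb_ms: float) -> str:
    """Ratio of a baseline engine's time to TuringDB's time, e.g. '45x'.

    Illustrative helper -- not the tool's actual API.
    """
    if baseline_ms == 0:
        return "-"  # undefined when the baseline rounds to 0ms
    ratio = baseline_ms / turingdb_ms
    # One decimal below 10x, whole numbers above, matching the tables.
    return f"{ratio:.1f}x" if ratio < 10 else f"{round(ratio):d}x"

print(speedup(134, 3))  # Neo4j vs TuringDB on "MATCH (p:Person) RETURN p"
```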
