
Conversation

@wvaske (Contributor) commented Jan 13, 2026

Summary

  • Refactors the BenchmarkRun class to use a data-contract pattern, with a BenchmarkRunData dataclass, for cleaner separation of concerns
  • Introduces comprehensive pytest-based unit testing framework that works without DLIO installed
  • Adds GitHub Actions CI workflow for automated testing on Python 3.10, 3.11, and 3.12

Changes

BenchmarkRun Refactoring

  • New BenchmarkRunData dataclass encapsulates all benchmark run data
  • Factory methods: from_benchmark(), from_result_dir(), from_data()
  • Property delegation for backward compatibility
  • Cleaner separation between data and behavior
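
The data/behavior split described above can be sketched roughly as follows; the field names and method bodies are illustrative assumptions, not the actual implementation (only the class and factory names come from this PR):

```python
from dataclasses import dataclass, field


@dataclass
class BenchmarkRunData:
    """Plain data contract: holds everything rules verification needs."""
    model: str = ""
    num_hosts: int = 1
    parameters: dict = field(default_factory=dict)


class BenchmarkRun:
    """Behavior lives here; the data lives in BenchmarkRunData."""

    def __init__(self, data: BenchmarkRunData):
        self._data = data

    @classmethod
    def from_data(cls, data: BenchmarkRunData) -> "BenchmarkRun":
        return cls(data)

    # Property delegation keeps the pre-refactor attribute API working.
    @property
    def model(self) -> str:
        return self._data.model
```

Callers that previously read attributes directly off BenchmarkRun keep working, because each property simply forwards to the dataclass.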

Testing Framework (259 tests, 44% coverage)

  • test_config.py: Environment variables, enums, constants
  • test_utils.py: JSON encoder, datetime functions, nested dict operations, MPI commands
  • test_rules_dataclasses.py: Issue, RunID, ProcessedRun, HostInfo, ClusterInformation
  • test_rules_extractors.py: BenchmarkInstanceExtractor, DLIOResultParser, ResultFilesExtractor
  • test_rules_checkers.py: Rules validation for training and checkpointing
  • test_rules_calculations.py: Data size calculations, output location generation
  • test_benchmark_run.py: Factory methods, property delegation, serialization
  • test_full_submission.py: Integration tests with real submission data
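
Running the suite without DLIO installed typically hinges on an import guard like the one below; the module name `dlio_benchmark` is an assumption here, and the real tests may gate differently (e.g. via `pytest.mark.skipif` on this flag):

```python
# Guard so project modules (and their tests) import cleanly when the
# optional DLIO dependency is absent.
try:
    import dlio_benchmark  # noqa: F401  (assumed module name)
    HAS_DLIO = True
except ImportError:
    HAS_DLIO = False
```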

Bug Fixes

  • Fixed check_env() to properly handle 'false' string values
  • Added default values to HostMemoryInfo dataclass fields
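
A hedged reconstruction of the check_env() fix; the real signature and the exact set of accepted strings may differ. The essential point is that the string 'false' read from the environment must map to boolean False rather than being treated as truthy:

```python
import os


def check_env(name: str, default: bool = False) -> bool:
    """Read a boolean environment variable, treating 'false'/'0'/'no' as False."""
    value = os.environ.get(name)
    if value is None:
        return default
    # A bare truthiness check would make the string "false" truthy;
    # normalize and compare against known falsy spellings instead.
    return value.strip().lower() not in ("false", "0", "no", "")
```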

CI/CD

  • GitHub Actions workflow runs tests on every push/PR
  • Coverage reporting with Codecov integration

Test plan

  • All 259 unit tests pass
  • Tests run without DLIO installed
  • Integration tests work with sample submission data
  • Coverage at 44% overall (97% config.py, 81% rules.py, 81% utils.py)

🤖 Generated with Claude Code

Claude and others added 2 commits January 12, 2026 23:00
This refactoring improves the separation of concerns between data extraction
and data representation for benchmark runs:

- Add BenchmarkRunData dataclass as the data contract for rules verification
- Add BenchmarkInstanceExtractor for extracting data from live Benchmark instances
- Add DLIOResultParser for parsing DLIO-specific result files (summary.json, Hydra configs)
- Add ResultFilesExtractor to orchestrate extraction from result files
- Refactor BenchmarkRun to use factory methods (from_benchmark, from_result_dir)
- Update BenchmarkVerifier to accept multiple input types (BenchmarkRun, Benchmark, path)
- Update Benchmark.metadata to explicitly include all fields needed by BenchmarkRunData
- Update get_runs_files to use new factory method

Benefits:
- Clear data contract via BenchmarkRunData dataclass
- DLIO-specific parsing isolated in DLIOResultParser (extensible for future tools)
- Consistent API: run_id always returns RunID object
- Backward compatible: legacy constructor still works
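
The multi-input BenchmarkVerifier change can be sketched as a simple type dispatch onto the new factory methods; the class and factory names follow this commit message, while the stand-in bodies below are assumptions:

```python
from pathlib import Path


class BenchmarkRun:
    """Minimal stand-in; the real class builds BenchmarkRunData internally."""

    def __init__(self, source):
        self.source = source

    @classmethod
    def from_result_dir(cls, path):
        return cls(Path(path))

    @classmethod
    def from_benchmark(cls, benchmark):
        return cls(benchmark)


class BenchmarkVerifier:
    """Accepts a BenchmarkRun, a result-dir path, or a live Benchmark."""

    def __init__(self, target):
        if isinstance(target, BenchmarkRun):
            self.run = target
        elif isinstance(target, (str, Path)):
            self.run = BenchmarkRun.from_result_dir(target)
        else:
            # Assume anything else is a live Benchmark instance.
            self.run = BenchmarkRun.from_benchmark(target)
```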

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add pytest test infrastructure with fixtures for benchmark results
- Implement unit tests for config, utils, rules modules (259 tests total)
- Create test fixtures for training and checkpointing benchmark results
- Add GitHub Actions CI workflow for Python 3.10, 3.11, 3.12
- Make DLIO dependency optional in pyproject.toml for testing
- Fix check_env() to properly handle False boolean values
- Add default values to HostMemoryInfo dataclass
- Add BenchmarkRun.from_data() factory method
- Add missing datetime import to utils.py

Test coverage:
- config.py: 97%
- rules.py: 81%
- utils.py: 81%
- mlps_logging.py: 81%

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@wvaske wvaske requested review from a team January 13, 2026 01:20
@wvaske wvaske requested a review from a team as a code owner January 13, 2026 01:20
@github-actions

MLCommons CLA bot:
Thank you very much for your submission, we really appreciate it. Before we can accept your contribution, we ask that you sign the MLCommons CLA (Apache 2). Please use this [Google form] (https://forms.gle/Ew1KkBVpyeJDuRw67) to initiate authorization. If you are from an MLCommons member organization, we will request that you be added to the CLA. If you are not from a member organization, we will email you a CLA to sign. For any questions, please contact support@mlcommons.org.
0 out of 1 committers have signed the MLCommons CLA.
@claude
Claude does not appear to be a GitHub user. You need a GitHub account once you become an MLCommons member. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request

@codecov-commenter

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

Add comprehensive tests for:
- cli.py: Argument parsing for training, checkpointing, vectordb,
  reports, and history commands, plus YAML config overrides
- history.py: HistoryTracker class including entry management,
  parsing, and command retrieval
- reporting.py: ReportGenerator and Result dataclass including
  JSON/CSV output and results accumulation
- benchmarks/base.py: Benchmark abstract base class including
  metadata, command execution, and verification
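
The cli.py argument-parsing tests likely follow the standard argparse-testing shape sketched below; the program name and options here are assumptions, with only the training/checkpointing subcommands taken from this commit message:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Stand-in parser with the subcommands named in the commit message."""
    parser = argparse.ArgumentParser(prog="mlpstorage")  # assumed name
    sub = parser.add_subparsers(dest="command")
    training = sub.add_parser("training")
    training.add_argument("--hosts", type=int, default=1)  # assumed option
    sub.add_parser("checkpointing")
    return parser


def test_training_defaults():
    args = build_parser().parse_args(["training"])
    assert args.command == "training"
    assert args.hosts == 1
```

Testing against `parse_args` with an explicit argv list keeps the tests hermetic: no real command line, no subprocess.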

Coverage improvements:
- cli.py: 0% -> 85%
- history.py: 0% -> 86%
- reporting.py: 0% -> 94%
- benchmarks/base.py: 0% -> 91%
- Overall: 44% -> 74%

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@FileSystemGuy (Contributor)

Wes, in case you hadn't noticed this yet... This PR includes an additional user "@claude" (and/or "Cocodev") that the MLCommons CLA bot cannot find in the list of people who have signed the MLCommons CLA, so it is preventing the merge. It sounds like a good tool to use, but it will need to be removed from the PR before anyone can approve the PR for merging.

@wvaske wvaske closed this Jan 13, 2026
@github-actions github-actions bot locked and limited conversation to collaborators Jan 13, 2026
