
Conversation

@wvaske (Contributor) commented Jan 13, 2026

Summary

  • Refactors the BenchmarkRun class to use a data-contract pattern, with a BenchmarkRunData dataclass, for cleaner separation of concerns
  • Introduces comprehensive pytest-based unit testing framework that works without DLIO installed
  • Adds GitHub Actions CI workflow for automated testing on Python 3.10, 3.11, and 3.12

Changes

BenchmarkRun Refactoring

  • New BenchmarkRunData dataclass encapsulates all benchmark run data
  • Factory methods: from_benchmark(), from_result_dir(), from_data()
  • Property delegation for backward compatibility
  • Cleaner separation between data and behavior
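
The data/behavior split described above can be sketched roughly as follows; the field names and method bodies are illustrative assumptions, not the actual implementation (only the class and factory names come from this PR):

```python
from dataclasses import dataclass, field


@dataclass
class BenchmarkRunData:
    """Plain data contract: holds everything rules verification needs."""
    model: str = ""
    num_hosts: int = 1
    parameters: dict = field(default_factory=dict)


class BenchmarkRun:
    """Behavior lives here; the data lives in BenchmarkRunData."""

    def __init__(self, data: BenchmarkRunData):
        self._data = data

    @classmethod
    def from_data(cls, data: BenchmarkRunData) -> "BenchmarkRun":
        return cls(data)

    # Property delegation keeps the pre-refactor attribute API working.
    @property
    def model(self) -> str:
        return self._data.model
```

Callers that previously read attributes directly off BenchmarkRun keep working, because each property simply forwards to the dataclass.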

Testing Framework (259 tests, 44% coverage)

  • test_config.py: Environment variables, enums, constants
  • test_utils.py: JSON encoder, datetime functions, nested dict operations, MPI commands
  • test_rules_dataclasses.py: Issue, RunID, ProcessedRun, HostInfo, ClusterInformation
  • test_rules_extractors.py: BenchmarkInstanceExtractor, DLIOResultParser, ResultFilesExtractor
  • test_rules_checkers.py: Rules validation for training and checkpointing
  • test_rules_calculations.py: Data size calculations, output location generation
  • test_benchmark_run.py: Factory methods, property delegation, serialization
  • test_full_submission.py: Integration tests with real submission data
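
Running the suite without DLIO installed typically hinges on an import guard like the one below; the module name `dlio_benchmark` is an assumption here, and the real tests may gate differently (e.g. via `pytest.mark.skipif` on this flag):

```python
# Guard so project modules (and their tests) import cleanly when the
# optional DLIO dependency is absent.
try:
    import dlio_benchmark  # noqa: F401  (assumed module name)
    HAS_DLIO = True
except ImportError:
    HAS_DLIO = False
```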

Bug Fixes

  • Fixed check_env() to properly handle 'false' string values
  • Added default values to HostMemoryInfo dataclass fields
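
A hedged reconstruction of the check_env() fix; the real signature and the exact set of accepted strings may differ. The essential point is that the string 'false' read from the environment must map to boolean False rather than being treated as truthy:

```python
import os


def check_env(name: str, default: bool = False) -> bool:
    """Read a boolean environment variable, treating 'false'/'0'/'no' as False."""
    value = os.environ.get(name)
    if value is None:
        return default
    # A bare truthiness check would make the string "false" truthy;
    # normalize and compare against known falsy spellings instead.
    return value.strip().lower() not in ("false", "0", "no", "")
```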

CI/CD

  • GitHub Actions workflow runs tests on every push/PR
  • Coverage reporting with Codecov integration

Test plan

  • All 259 unit tests pass
  • Tests run without DLIO installed
  • Integration tests work with sample submission data
  • Coverage at 44% overall (97% config.py, 81% rules.py, 81% utils.py)

🤖 Generated with Claude Code

Claude and others added 2 commits January 12, 2026 23:00
This refactoring improves the separation of concerns between data extraction
and data representation for benchmark runs:

- Add BenchmarkRunData dataclass as the data contract for rules verification
- Add BenchmarkInstanceExtractor for extracting data from live Benchmark instances
- Add DLIOResultParser for parsing DLIO-specific result files (summary.json, Hydra configs)
- Add ResultFilesExtractor to orchestrate extraction from result files
- Refactor BenchmarkRun to use factory methods (from_benchmark, from_result_dir)
- Update BenchmarkVerifier to accept multiple input types (BenchmarkRun, Benchmark, path)
- Update Benchmark.metadata to explicitly include all fields needed by BenchmarkRunData
- Update get_runs_files to use new factory method

Benefits:
- Clear data contract via BenchmarkRunData dataclass
- DLIO-specific parsing isolated in DLIOResultParser (extensible for future tools)
- Consistent API: run_id always returns RunID object
- Backward compatible: legacy constructor still works
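
The multi-input BenchmarkVerifier change can be sketched as a simple type dispatch onto the new factory methods; the class and factory names follow this commit message, while the stand-in bodies below are assumptions:

```python
from pathlib import Path


class BenchmarkRun:
    """Minimal stand-in; the real class builds BenchmarkRunData internally."""

    def __init__(self, source):
        self.source = source

    @classmethod
    def from_result_dir(cls, path):
        return cls(Path(path))

    @classmethod
    def from_benchmark(cls, benchmark):
        return cls(benchmark)


class BenchmarkVerifier:
    """Accepts a BenchmarkRun, a result-dir path, or a live Benchmark."""

    def __init__(self, target):
        if isinstance(target, BenchmarkRun):
            self.run = target
        elif isinstance(target, (str, Path)):
            self.run = BenchmarkRun.from_result_dir(target)
        else:
            # Assume anything else is a live Benchmark instance.
            self.run = BenchmarkRun.from_benchmark(target)
```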

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add pytest test infrastructure with fixtures for benchmark results
- Implement unit tests for config, utils, rules modules (259 tests total)
- Create test fixtures for training and checkpointing benchmark results
- Add GitHub Actions CI workflow for Python 3.10, 3.11, 3.12
- Make DLIO dependency optional in pyproject.toml for testing
- Fix check_env() to properly handle False boolean values
- Add default values to HostMemoryInfo dataclass
- Add BenchmarkRun.from_data() factory method
- Add missing datetime import to utils.py

Test coverage:
- config.py: 97%
- rules.py: 81%
- utils.py: 81%
- mlps_logging.py: 81%

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@wvaske wvaske requested review from a team January 13, 2026 01:20
@wvaske wvaske requested a review from a team as a code owner January 13, 2026 01:20
@github-actions

MLCommons CLA bot:
Thank you very much for your submission, we really appreciate it. Before we can accept your contribution, we ask that you sign the MLCommons CLA (Apache 2). Please use this [Google form] (https://forms.gle/Ew1KkBVpyeJDuRw67) to initiate authorization. If you are from an MLCommons member organization, we will request that you be added to the CLA. If you are not from a member organization, we will email you a CLA to sign. For any questions, please contact support@mlcommons.org.
0 out of 1 committers have signed the MLCommons CLA.
@claude
Claude does not appear to be a GitHub user. You need a GitHub account once you become an MLCommons member. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request

@codecov-commenter

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

Add comprehensive tests for:
- cli.py: Argument parsing for training, checkpointing, vectordb,
  reports, and history commands, plus YAML config overrides
- history.py: HistoryTracker class including entry management,
  parsing, and command retrieval
- reporting.py: ReportGenerator and Result dataclass including
  JSON/CSV output and results accumulation
- benchmarks/base.py: Benchmark abstract base class including
  metadata, command execution, and verification
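
The cli.py argument-parsing tests likely follow the standard argparse-testing shape sketched below; the program name and options here are assumptions, with only the training/checkpointing subcommands taken from this commit message:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Stand-in parser with the subcommands named in the commit message."""
    parser = argparse.ArgumentParser(prog="mlpstorage")  # assumed name
    sub = parser.add_subparsers(dest="command")
    training = sub.add_parser("training")
    training.add_argument("--hosts", type=int, default=1)  # assumed option
    sub.add_parser("checkpointing")
    return parser


def test_training_defaults():
    args = build_parser().parse_args(["training"])
    assert args.command == "training"
    assert args.hosts == 1
```

Testing against `parse_args` with an explicit argv list keeps the tests hermetic: no real command line, no subprocess.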

Coverage improvements:
- cli.py: 0% -> 85%
- history.py: 0% -> 86%
- reporting.py: 0% -> 94%
- benchmarks/base.py: 0% -> 91%
- Overall: 44% -> 74%

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@FileSystemGuy (Contributor)

Wes, in case you hadn't noticed this yet... This PR includes an additional user "@claude" (and/or "Cocodev") that the MLCommons CLA bot cannot find in the list of people who have signed the MLCommons CLA, so it is preventing the merge. It sounds like a good tool to use, but it will need to be removed from the PR before anyone can approve the PR for merging.

@wvaske wvaske closed this Jan 13, 2026
@github-actions github-actions bot locked and limited conversation to collaborators Jan 13, 2026
