
fix(evaluator): inject configured LLM into custom metrics#134

Open
thesamet wants to merge 1 commit into arklexai:main from thesamet:fix/inject-llm-into-custom-metrics

Conversation


@thesamet thesamet commented Apr 4, 2026

Summary

Custom metrics previously loaded the LLM themselves by reading a hardcoded `config.yaml` path at import time, causing an immediate crash when the configured provider differed from what the metric expected (e.g. running with Anthropic while metrics tried to instantiate an OpenAI client).

  • The evaluator now inspects each custom metric's `__init__` signature and passes the already-configured LLM instance when an `llm` parameter is declared
  • Metrics without the parameter continue to work unchanged (fully backward-compatible)
  • Accessing `self.llm` when no LLM was injected raises a descriptive `RuntimeError` instead of an opaque `AttributeError`
  • Updated both built-in example metrics (bank-insurance, e-commerce) to use the injected LLM
  • Updated docs to show the `llm=None` injection pattern

Closes #131
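The opt-in detection described above can be sketched with `inspect.signature` (function and class names here are illustrative, not the actual implementation):

```python
import inspect

def instantiate_metric(metric_cls, llm):
    """Instantiate a custom metric class, injecting the configured LLM
    only when its __init__ declares an `llm` parameter."""
    params = inspect.signature(metric_cls.__init__).parameters
    if "llm" in params:
        return metric_cls(llm=llm)
    # Backward-compatible path: the metric did not opt in.
    return metric_cls()

class LegacyMetric:
    """A pre-existing metric with no `llm` parameter."""
    def __init__(self):
        pass

class LlmAwareMetric:
    """A metric that opts in to injection by declaring `llm`."""
    def __init__(self, llm=None):
        self.llm = llm
```

Legacy metrics are constructed exactly as before, so no existing user code needs to change.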

Changes

  • `arksim/evaluator/evaluator.py`: detect the `llm` parameter in custom metric constructors and inject the configured LLM instance
  • `arksim/evaluator/base_metric.py`: expose `llm` as a property on both base classes that raises a clear `RuntimeError` when accessed without injection; add `llm` parameter docs to both `QuantitativeMetric` and `QualitativeMetric` docstrings
  • `examples/bank-insurance/custom_metrics.py`, `examples/e-commerce/custom_metrics.py`: remove hardcoded config loading, accept the injected LLM via `self.llm`
  • `docs/main/evaluate-conversation.mdx`: update both metric type examples to show the `llm=None` injection pattern
  • `tests/unit/test_evaluator_class.py`: new tests covering injection for quantitative and qualitative metrics, the backward-compatible (no-parameter) case, and the `RuntimeError` raised when `self.llm` is accessed without injection
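The `llm` property guard in `base_metric.py` might look roughly like this (a minimal sketch; the attribute name and error wording are assumptions, not copied from the diff):

```python
class QuantitativeMetric:
    """Sketch of the base-class behavior: store the injected LLM and
    guard access to it with a descriptive error."""
    def __init__(self, llm=None):
        self._llm = llm

    @property
    def llm(self):
        # Fail loudly with guidance instead of an opaque AttributeError.
        if self._llm is None:
            raise RuntimeError(
                "This metric was constructed without an LLM. Declare an "
                "`llm` parameter in __init__ so the evaluator can inject "
                "the configured LLM instance."
            )
        return self._llm
```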

Documentation

  • Updated relevant docs in docs/ (if behavior, config, or API changed)
  • Updated README.md (if installation, quickstart, or usage changed)
  • No docs needed (explain why below)

How to Test

  • `ruff check .` passes
  • `ruff format --check .` passes
  • `pytest tests/` passes
  • Manual verification: run an evaluation with an Anthropic config against the example metrics -- no crash, and the LLM is injected correctly

Notes

Backward-compatible: any existing custom metric that does not declare an `llm` parameter in `__init__` is instantiated as before. Only metrics that opt in by declaring the parameter receive the injected LLM. Metrics that declare `llm` but try to use `self.llm` without injection now get a descriptive error pointing to the fix.
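A custom metric opting in via the `llm=None` pattern from the docs might look like this (the base class here is a minimal stand-in, and the metric and LLM method names are hypothetical):

```python
class QualitativeMetric:
    """Minimal stand-in for the library base class (assumed API)."""
    def __init__(self, llm=None):
        self._llm = llm

    @property
    def llm(self):
        if self._llm is None:
            raise RuntimeError(
                "No LLM injected; declare an `llm` parameter in __init__."
            )
        return self._llm

class PolitenessMetric(QualitativeMetric):
    """Hypothetical custom metric using the docs' llm=None pattern."""
    def __init__(self, llm=None):
        super().__init__(llm=llm)

    def evaluate(self, conversation: str) -> str:
        # Uses whichever provider the evaluator configured
        # (Anthropic, OpenAI, ...), with no hardcoded config.yaml read.
        return self.llm.complete(f"Rate the politeness of: {conversation}")

class FakeLLM:
    """Test double standing in for the configured LLM client."""
    def complete(self, prompt: str) -> str:
        return "polite"
```

Because the metric never loads a config itself, it works with whatever provider the CLI was given.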

Reviewers

/cc @arklexai/arksim-maintainers

Custom metrics previously loaded the LLM themselves by reading a
hardcoded `config.yaml` path at import time. This caused an immediate
crash when the configured provider differed from what the metric
expected (e.g. running with Anthropic while metrics tried to
instantiate an OpenAI client).

The evaluator now passes the already-configured LLM instance to any
custom metric whose __init__ declares an `llm` parameter. Metrics
without the parameter continue to work unchanged (backward-compatible).
Update both built-in example metrics to use the injected LLM.

Fixes arklexai#131
@thesamet thesamet requested a review from a team as a code owner April 4, 2026 00:18


Development

Successfully merging this pull request may close these issues.

Custom metrics always read config.yaml, ignoring the config passed to the CLI
