
Eval Runner API

EvalRunner

from raghelm.eval.runner import EvalRunner

runner = EvalRunner("raghelm/eval/golden_dataset.json")
results = runner.run_suite(suite="quick")

Constructor

EvalRunner(dataset_path: str)

Loads the golden dataset from dataset_path and validates it. If validation fails, the process exits with code 1.
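Because a failed validation ends the process, a caller that embeds the runner in a larger script may want to intercept the exit. The sketch below is not raghelm code: load_dataset_or_exit is a hypothetical stand-in for the constructor's load step, the "non-empty JSON list" rule is an invented validation check, and it assumes the exit is raised through sys.exit (which surfaces as SystemExit) rather than a hard process kill.

```python
import json
import sys
from pathlib import Path


def load_dataset_or_exit(dataset_path):
    """Hypothetical stand-in for the constructor's load-and-validate step."""
    try:
        data = json.loads(Path(dataset_path).read_text())
        # Assumed rule for illustration only: a non-empty JSON list.
        if not isinstance(data, list) or not data:
            raise ValueError("dataset must be a non-empty list")
    except (OSError, ValueError) as exc:
        # Covers missing files, invalid JSON, and the assumed rule above.
        print(f"dataset validation failed: {exc}", file=sys.stderr)
        sys.exit(1)
    return data


def try_load(dataset_path):
    """Return (data, exit_code), turning SystemExit into a return value."""
    try:
        return load_dataset_or_exit(dataset_path), 0
    except SystemExit as exc:
        return None, exc.code
```

If the real constructor behaves this way, the same try/except SystemExit pattern could wrap EvalRunner(...) directly, for example to log the failure before exiting.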

run_suite

run_suite(suite: str = "full") -> dict

  • suite="full": runs every example in the dataset
  • suite="quick": runs 10 randomly selected examples
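The two suite modes can be sketched as a small selection function. This is an illustration under assumptions, not the library's implementation: select_examples, quick_size, and seed are hypothetical names, and how the real runner handles a dataset with fewer than 10 examples or an unknown suite name is not stated in this page.

```python
import random


def select_examples(examples, suite="full", quick_size=10, seed=None):
    """Hypothetical helper: choose which examples a suite would run."""
    examples = list(examples)
    if suite == "full":
        return examples
    if suite == "quick":
        rng = random.Random(seed)  # a seed makes the sample reproducible
        # Assumed behavior: cap the sample at the dataset size.
        k = min(quick_size, len(examples))
        return rng.sample(examples, k)
    # Assumed behavior: reject suite names other than "full" and "quick".
    raise ValueError(f"unknown suite: {suite!r}")
```

Sampling without replacement means no example is evaluated twice in a quick run, which seems the natural reading of "10 random examples".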

Returns a dict with metrics, generation_scores, and per-example details.

Each run saves its results to data/eval_results/eval_YYYYMMDD_HHMMSS.json, where the suffix is the run timestamp.
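The file name pattern above maps onto a strftime format string. The following sketch shows one way to build such a path; results_path and its parameters are hypothetical, and whether the runner uses local time or UTC is not specified here (local time is assumed).

```python
from datetime import datetime
from pathlib import Path


def results_path(now=None, out_dir="data/eval_results"):
    """Hypothetical helper: build eval_YYYYMMDD_HHMMSS.json under out_dir."""
    now = now or datetime.now()  # assumption: local time, not UTC
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(out_dir) / f"eval_{stamp}.json"
```

Because the timestamp is zero-padded and ordered from year to second, sorting the file names alphabetically also sorts the runs chronologically.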

If data/baseline.json exists, the results are also checked for regressions against it.
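A regression check of this kind might look like the sketch below. Everything about the comparison is an assumption made for illustration: the baseline is taken to be a flat JSON object mapping metric names to numbers, higher values are taken to be better, and the tolerance of 0.02 is invented. The function names are hypothetical and do not come from raghelm.

```python
import json
from pathlib import Path


def find_regressions(metrics, baseline, tolerance=0.02):
    """Return {metric: (baseline_value, current_value)} for metrics that dropped."""
    regressions = {}
    for name, base_value in baseline.items():
        current = metrics.get(name)
        if current is None:
            continue  # metric missing from this run; nothing to compare
        # Assumption: higher is better, so a drop beyond tolerance regresses.
        if current < base_value - tolerance:
            regressions[name] = (base_value, current)
    return regressions


def check_against_baseline(metrics, baseline_path="data/baseline.json"):
    """Skip the check when no baseline file exists, as described above."""
    path = Path(baseline_path)
    if not path.exists():
        return {}
    baseline = json.loads(path.read_text())
    return find_regressions(metrics, baseline, tolerance=0.02)
```

A tolerance keeps small run-to-run noise, which is likely with a 10-example quick suite, from being reported as a regression.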