Eval Runner API
EvalRunner
from raghelm.eval.runner import EvalRunner
runner = EvalRunner("raghelm/eval/golden_dataset.json")
results = runner.run_suite(suite="quick")
Constructor
EvalRunner(dataset_path: str)
Loads and validates the golden dataset at dataset_path. Exits the process with code 1 if validation fails.
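Because the constructor exits rather than raising a validation error, a caller that embeds EvalRunner in a larger program may want to intercept the exit. The sketch below assumes the exit happens through `sys.exit(1)`, which raises `SystemExit`; the source does not confirm this. The helper `construct_or_none` is hypothetical and not part of raghelm.

```python
import sys
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def construct_or_none(factory: Callable[[], T]) -> Optional[T]:
    """Run factory(); return None if it tries to exit the process."""
    try:
        return factory()
    except SystemExit as exc:
        # Assumption: the constructor exits via sys.exit(1), which raises
        # SystemExit. An exit through os._exit() could not be caught here.
        print(f"construction aborted with exit code {exc.code}", file=sys.stderr)
        return None


# Intended use (commented out so the sketch runs without raghelm installed):
# from raghelm.eval.runner import EvalRunner
# runner = construct_or_none(lambda: EvalRunner("raghelm/eval/golden_dataset.json"))
```

In a command-line script the default behavior is usually what you want; a wrapper like this is mainly useful in notebooks or test harnesses, where a process exit would be disruptive.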
run_suite
run_suite(suite: str = "full") -> dict
suite="full": Run all examples
suite="quick": Run 10 random examples
Returns a dict with metrics, generation_scores, and per-example details.
Saves results to data/eval_results/eval_YYYYMMDD_HHMMSS.json.
Checks regression against data/baseline.json if it exists.
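Since each run writes a timestamped file, reading back the most recent run comes down to picking the last file name in sorted order. The sketch below assumes only the documented naming pattern eval_YYYYMMDD_HHMMSS.json and that the file contains a JSON object. The helper `load_latest_results` is hypothetical and not part of raghelm.

```python
import json
from pathlib import Path
from typing import Optional


def load_latest_results(results_dir: str = "data/eval_results") -> Optional[dict]:
    """Return the newest eval_*.json in results_dir as a dict, or None if none exist."""
    directory = Path(results_dir)
    if not directory.is_dir():
        return None
    # YYYYMMDD_HHMMSS timestamps sort chronologically when compared as strings,
    # so the last name in sorted order is the most recent run.
    files = sorted(directory.glob("eval_*.json"))
    if not files:
        return None
    with files[-1].open(encoding="utf-8") as fh:
        return json.load(fh)
```

Sorting by file name rather than modification time keeps the result stable if files are copied or restored from a backup.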