Skip to content

Regression Testing

The evaluation framework automatically detects performance regressions.

How It Works

  1. First run saves data/baseline.json with current metrics
  2. Each subsequent run compares against the baseline
  3. If any metric drops significantly, the run fails

CLI Output

When a regression is detected:

Regression detected
  recall@5: 1.0000 -> 0.8500 (-15.00%)
  mrr: 1.0000 -> 0.9200 (-8.00%)

Exit code is non-zero, failing CI pipelines.

Resetting the Baseline

rm data/baseline.json
uv run python -m raghelm eval --suite full