Quick Start

Run Your First Evaluation

Quick Suite (10 random examples)

uv run python -m raghelm eval --suite quick

This command samples 10 examples at random from the 100-item golden dataset, then:

  1. Validates that the golden dataset has no issues
  2. Runs mock retrieval and generation for each example
  3. Computes Recall@5, MRR, and NDCG@5 scores
  4. Scores each generated answer on faithfulness, relevance, and completeness, and reports an overall score
  5. Saves results to a timestamped JSON file under data/eval_results/ (for example, data/eval_results/eval_20260613_065436.json)
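
The retrieval metrics named in step 3 can be sketched as follows. This is a minimal illustration using the standard binary-relevance definitions, not raghelm's actual implementation; the function names and the sample document ids are hypothetical.

```python
import math


def recall_at_k(retrieved, relevant, k=5):
    """Fraction of the relevant ids that appear in the top-k retrieved ids."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant id, or 0.0 if none is retrieved.

    MRR is the mean of this value over all evaluated examples.
    """
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved, relevant, k=5):
    """NDCG@k with binary gains: DCG of the ranking divided by the ideal DCG."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(retrieved[:k], start=1)
        if doc_id in relevant
    )
    # The ideal ranking places every relevant id first (capped at k positions).
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical example: two relevant documents, retrieved at ranks 2 and 4.
retrieved = ["d3", "d1", "d7", "d2", "d9"]
relevant = ["d1", "d2"]
print(recall_at_k(retrieved, relevant))      # 1.0
print(reciprocal_rank(retrieved, relevant))  # 0.5
print(round(ndcg_at_k(retrieved, relevant), 3))
```

A score of 1.0 on all three metrics, as in the sample output on this page, means every relevant document was retrieved and ranked ahead of any irrelevant one.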

Full Suite (all 100 examples)

uv run python -m raghelm eval --suite full

Understanding the Output

A quick-suite run produces a JSON report like this:

{
  "timestamp": "2026-06-13T06:54:36.862454",
  "suite": "quick",
  "total_examples": 10,
  "metrics": {
    "recall@5": 1.0,
    "mrr": 1.0,
    "ndcg@5": 1.0
  },
  "generation_scores": {
    "faithfulness": 2.48,
    "relevance": 5.0,
    "completeness": 5.0,
    "overall": 4.16
  }
}
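
A report like the one above can be inspected with a few lines of Python. The sketch below embeds the sample report as a string so it runs on its own; the `summarize` helper is hypothetical and not part of raghelm. In this sample the overall score equals the mean of the three component scores, but whether raghelm always computes it that way is an assumption.

```python
import json
from statistics import mean

# The sample report from this page, embedded so the sketch is self-contained.
SAMPLE_REPORT = """
{
  "timestamp": "2026-06-13T06:54:36.862454",
  "suite": "quick",
  "total_examples": 10,
  "metrics": {"recall@5": 1.0, "mrr": 1.0, "ndcg@5": 1.0},
  "generation_scores": {
    "faithfulness": 2.48,
    "relevance": 5.0,
    "completeness": 5.0,
    "overall": 4.16
  }
}
"""

COMPONENTS = ("faithfulness", "relevance", "completeness")


def summarize(report: dict) -> dict:
    """Hypothetical helper: pull the headline numbers out of a report."""
    gen = report["generation_scores"]
    return {
        "suite": report["suite"],
        "examples": report["total_examples"],
        "retrieval": report["metrics"],
        # The component with the lowest score is the first place to look.
        "weakest_component": min(COMPONENTS, key=lambda name: gen[name]),
        # Assumption: overall is the plain mean of the three components.
        "mean_of_components": round(mean(gen[name] for name in COMPONENTS), 2),
        "reported_overall": gen["overall"],
    }


summary = summarize(json.loads(SAMPLE_REPORT))
for key, value in summary.items():
    print(f"{key}: {value}")
```

For the sample report, the weakest component is faithfulness (2.48), while retrieval is perfect on all three metrics.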

Next Steps (Ingestion)

After running evaluations, use the ingestion CLI to populate your Pinecone namespaces:

uv run python -m raghelm ingest ./knowledge --dry-run --namespace default

See the full CLI reference for options, verification guarantees, and cost tracking.