Quick Start¶
Run Your First Evaluation¶
Quick Suite (10 random examples)¶
The quick suite runs 10 randomly selected examples from the 100-item golden dataset. It:
- Validates that the golden dataset has no issues
- Runs mock retrieval and generation for each example
- Computes Recall@5, MRR, and NDCG@5 scores
- Scores each generated answer for faithfulness, relevance, and completeness, plus an overall score
- Saves results to a file such as data/eval_results/eval_20260613_065436.json
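For orientation, the three retrieval metrics named above can be sketched in a few lines of Python. This is a minimal illustration of the standard definitions, assuming binary relevance and unique document IDs; the function names and the sample IDs are hypothetical, and the evaluator's own implementation may differ.

```python
import math

def recall_at_k(retrieved, relevant, k=5):
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant ID, or 0.0 if none was retrieved."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved, relevant, k=5):
    """Binary-relevance NDCG: DCG of the top-k divided by the ideal DCG."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, doc_id in enumerate(retrieved[:k])
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

# Hypothetical example: two relevant documents, found at ranks 2 and 4.
retrieved = ["d3", "d1", "d7", "d2", "d9"]
relevant = ["d1", "d2"]

recall = recall_at_k(retrieved, relevant)    # both relevant docs are in the top 5
rr = reciprocal_rank(retrieved, relevant)    # first relevant doc is at rank 2
ndcg = ndcg_at_k(retrieved, relevant)        # penalized for the late ranks
```

MRR is the mean of `reciprocal_rank` over all evaluated examples, so a suite-level score of 1.0 means the first retrieved document was relevant for every example.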
Full Suite (all 100 examples)¶
Understanding the Output¶
{
  "timestamp": "2026-06-13T06:54:36.862454",
  "suite": "quick",
  "total_examples": 10,
  "metrics": {
    "recall@5": 1.0,
    "mrr": 1.0,
    "ndcg@5": 1.0
  },
  "generation_scores": {
    "faithfulness": 2.48,
    "relevance": 5.0,
    "completeness": 5.0,
    "overall": 4.16
  }
}
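A results file with this shape can be inspected with the standard `json` module. The sketch below parses the sample output shown above (embedded as a string so it runs on its own) and picks out the weakest generation score. In this sample, `overall` happens to equal the arithmetic mean of the other three scores; whether the evaluator always computes it that way is an assumption, not something this page states.

```python
import json

# The sample output from above, embedded so the snippet is self-contained.
# In practice you would read the saved file with json.load() instead.
sample = """
{
  "timestamp": "2026-06-13T06:54:36.862454",
  "suite": "quick",
  "total_examples": 10,
  "metrics": {"recall@5": 1.0, "mrr": 1.0, "ndcg@5": 1.0},
  "generation_scores": {
    "faithfulness": 2.48,
    "relevance": 5.0,
    "completeness": 5.0,
    "overall": 4.16
  }
}
"""

result = json.loads(sample)
gen = result["generation_scores"]

# Find the component score that drags the overall score down the most.
components = ("faithfulness", "relevance", "completeness")
weakest = min(components, key=lambda name: gen[name])

# Assumption: "overall" is the plain mean of the three component scores.
mean_of_components = sum(gen[name] for name in components) / len(components)

print(f"suite={result['suite']} examples={result['total_examples']}")
print(f"weakest generation score: {weakest} = {gen[weakest]}")
print(f"mean of components: {mean_of_components:.2f} (reported overall: {gen['overall']})")
```

Here the retrieval metrics are perfect while faithfulness is the lowest generation score, which makes faithfulness the first thing to investigate.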
Next Steps¶
- Learn about the golden dataset
- Read the metrics reference for score definitions
- Set up regression testing to catch score drops between runs
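To make the idea of a regression check concrete, here is a minimal sketch that compares the metrics of a new run against a baseline and reports any metric that dropped by more than a tolerance. The function name, the tolerance value, and the `current` scores are hypothetical and purely illustrative; the project's own regression tooling is described in the linked guide and may work differently.

```python
def find_regressions(baseline, current, tolerance=0.02):
    """Return {metric: (baseline_value, current_value)} for every metric
    whose score dropped by more than `tolerance` relative to the baseline."""
    regressions = {}
    for name, base_value in baseline.items():
        new_value = current.get(name)
        # Metrics missing from the current run are skipped, not flagged.
        if new_value is not None and base_value - new_value > tolerance:
            regressions[name] = (base_value, new_value)
    return regressions

# Baseline taken from the sample output above; "current" values are made up.
baseline = {"recall@5": 1.0, "mrr": 1.0, "ndcg@5": 1.0}
current = {"recall@5": 0.9, "mrr": 0.99, "ndcg@5": 1.0}

regressions = find_regressions(baseline, current)
for name, (old, new) in regressions.items():
    print(f"REGRESSION {name}: {old} -> {new}")
```

A tolerance keeps small run-to-run noise, such as the variation from sampling 10 random examples, from failing the check.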
Next Steps (Ingestion)¶
After running evaluations, use the ingestion CLI to populate your Pinecone namespaces:
See the full CLI reference for options, verification guarantees, and cost tracking.