Skip to content

Benchmark API

benchmark_retrieval

from raghelm.eval.benchmark import benchmark_retrieval

results = benchmark_retrieval(
    namespace="cairn",
    queries=["What is STR?", "How does armor work?"],
    n_runs=20
)

Returns P50/P95/P99 latency percentiles in milliseconds.

Currently uses mock delays. Production integration with actual Pinecone retrieval is planned.

benchmark_generation

from raghelm.eval.benchmark import benchmark_generation

results = benchmark_generation(
    model="gpt-4o",
    prompts=["Answer this query..."],
    n_runs=20
)

Returns first-token and total latency percentiles.