Benchmark API¶
benchmark_retrieval¶
from raghelm.eval.benchmark import benchmark_retrieval
results = benchmark_retrieval(
namespace="cairn",
queries=["What is STR?", "How does armor work?"],
n_runs=20
)
Returns P50/P95/P99 latency percentiles in milliseconds.
Currently uses mock delays. Production integration with actual Pinecone retrieval is planned.
benchmark_generation¶
from raghelm.eval.benchmark import benchmark_generation
results = benchmark_generation(
model="gpt-4o",
prompts=["Answer this query..."],
n_runs=20
)
Returns first-token and total latency percentiles.