Skip to content

RAGhelm

Benchmark

greynewell/raghelm

Benchmark API¶

Benchmark helpers measure latency characteristics for retrieval and generation paths. Benchmark output can inform scorecards and gates when linked to manifests and runtime configuration.

`benchmark_retrieval`¶

from raghelm.eval.benchmark import benchmark_retrieval

results = benchmark_retrieval(
    namespace="cairn",
    queries=["What is STR?", "How does armor work?"],
    n_runs=20,
)

Returns P50/P95/P99 latency percentiles in milliseconds.

`benchmark_generation`¶

from raghelm.eval.benchmark import benchmark_generation

results = benchmark_generation(
    model="gpt-4o",
    prompts=["Answer this query..."],
    n_runs=20,
)

Returns first-token and total latency percentiles.

Evidence rules¶

Benchmark claims should identify mode, backend, model/provider, sample size, and artifact hash before being used in release gates or public proof.