ADR-001: Pinecone Multi-Namespace Indexing Strategy

Status: Accepted
Date: 2026-06-13
Deciders: Grey Newell
Tags: rag-pipeline, infrastructure, reliability, cost

Primary Goal

Build production-grade RAG systems that operate reliably with minimal failure or manual intervention.

This decision directly supports the requirement to own RAG pipelines across multiple Pinecone namespaces while maintaining reliability, performance, and clean abstractions.

Context & Goals

We need a Pinecone indexing strategy that enables:

- Multiple retrieval strategies (different chunk sizes + embedding models) without quality collisions
- Safe, incremental updates with low operational overhead
- Future agentic query routing (namespace selection)
- Cost-efficient model tiering
- Strong isolation when bad data or schema changes occur
- Easy observability and regression detection per strategy

Decision

Use one Pinecone serverless index with multiple namespaces.

Namespace naming convention: {strategy}-{chunk-size}-{model-tier}
Examples: semantic-512-small, chunk-1024-large, legal-256-small
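As a minimal sketch of how this convention could be made machine-actionable, the helper below builds and parses namespace names. The `NamespaceSpec` type, the function name `parse_namespace`, the lowercase-only strategy rule, and the allowed tier set (`small`, `large`) are assumptions inferred from the examples above, not part of the decision itself.

```python
import re
from dataclasses import dataclass

# Hypothetical tier set, inferred from the example names in this ADR.
ALLOWED_TIERS = {"small", "large"}

# Pattern for {strategy}-{chunk-size}-{model-tier}. Restricting strategy to
# lowercase letters keeps the hyphen an unambiguous separator (an assumption).
_NAMESPACE_RE = re.compile(
    r"^(?P<strategy>[a-z]+)-(?P<chunk_size>[0-9]+)-(?P<tier>[a-z]+)$"
)


@dataclass(frozen=True)
class NamespaceSpec:
    strategy: str
    chunk_size: int
    model_tier: str

    def name(self) -> str:
        """Render the spec back into its namespace string."""
        return f"{self.strategy}-{self.chunk_size}-{self.model_tier}"


def parse_namespace(name: str) -> NamespaceSpec:
    """Parse a namespace string, rejecting names that break the convention."""
    match = _NAMESPACE_RE.match(name)
    if match is None:
        raise ValueError(f"namespace does not follow the convention: {name!r}")
    tier = match["tier"]
    if tier not in ALLOWED_TIERS:
        raise ValueError(f"unknown model tier {tier!r} in namespace {name!r}")
    return NamespaceSpec(match["strategy"], int(match["chunk_size"]), tier)
```

Rejecting malformed names at ingestion time is one way to keep the namespace list parseable for a later routing layer, which could then select on `strategy`, `chunk_size`, or `model_tier` rather than on raw strings.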

All vector upserts are idempotent using deterministic IDs derived from content hash + metadata. Destructive namespace-level deletes are avoided during normal operation.
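The idempotency property can be illustrated with a short sketch. It assumes SHA-256 over a canonical JSON encoding of the chunk text plus its identifying metadata; the function names and the in-memory `store` dict are hypothetical stand-ins, and the ADR does not specify the hash function or which metadata fields feed the ID.

```python
import hashlib
import json


def deterministic_id(content: str, metadata: dict) -> str:
    """Derive a stable vector ID from chunk content plus identifying metadata."""
    # sort_keys and fixed separators give a canonical JSON form, so the order
    # of keys in the metadata dict does not change the resulting hash.
    canonical = json.dumps(
        {"content": content, "metadata": metadata},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def upsert(store: dict, namespace: str, content: str, metadata: dict, vector: list) -> str:
    """In-memory stand-in for a vector-store upsert, keyed by deterministic ID."""
    vector_id = deterministic_id(content, metadata)
    # Writing under the same ID overwrites the earlier record instead of
    # adding a second copy, which is what makes re-runs safe.
    store.setdefault(namespace, {})[vector_id] = {"values": vector, "metadata": metadata}
    return vector_id
```

Re-running ingestion over the same chunk then leaves exactly one record in the namespace. Only stable fields (for example a source identifier and chunk position) should feed the hash under this scheme; including a timestamp or run ID would produce a new ID on every run and defeat the purpose.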

Tradeoffs & Rationale

Tradeoff	Option Chosen	Why	Rejected Alternative	Reason Rejected
Index Architecture	Single index + multiple namespaces	Minimizes operational surface area while providing logical isolation. Aligns with "minimal manual intervention."	Multiple indexes	Higher cost, more secrets/monitoring, increases chance of configuration drift
Namespace Taxonomy	`{strategy}-{chunk-size}-{model-tier}`	Human-readable + machine-actionable for future routing agents. Enables A/B testing and targeted improvements.	Per-source or flat namespaces	Causes namespace explosion and prevents strategy-based routing
Upsert Model	Idempotent (content-hash IDs)	Prevents duplicate data and enables safe re-runs without manual cleanup	Timestamp-based or non-deterministic IDs	High risk of data corruption and manual intervention
Delete Strategy	Metadata invalidation + re-ingest	Safer and more observable than namespace deletes	Namespace-level delete + rebuild	High blast radius and violates reliability goal
Metadata Schema	Consistent core fields + strategy-specific extensions	Enables unified observability and evaluation across namespaces	Completely separate schemas	Breaks cross-namespace analysis and increases maintenance burden
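The last two rows can be sketched together: a shared core schema with strategy-specific extensions, and invalidation by flagging metadata rather than deleting. All field names here (`source_id`, `content_hash`, `is_active`, the `ext_` prefix) are illustrative assumptions, and the `records` dict stands in for one namespace. In a real deployment this would map onto a metadata update plus a query-time metadata filter.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkMetadata:
    # Core fields shared by every namespace (names are illustrative).
    source_id: str
    content_hash: str
    strategy: str
    is_active: bool = True
    # Strategy-specific fields are kept apart so the core stays uniform.
    extensions: dict = field(default_factory=dict)

    def to_flat(self) -> dict:
        """Flatten to top-level keys, prefixing extensions with 'ext_'."""
        flat = {
            "source_id": self.source_id,
            "content_hash": self.content_hash,
            "strategy": self.strategy,
            "is_active": self.is_active,
        }
        flat.update({f"ext_{key}": value for key, value in self.extensions.items()})
        return flat


def invalidate_source(records: dict, source_id: str) -> int:
    """Soft-delete: flag every active record from a source instead of removing it."""
    flagged = 0
    for meta in records.values():
        if meta.source_id == source_id and meta.is_active:
            meta.is_active = False
            flagged += 1
    return flagged


def active_ids(records: dict) -> list:
    """IDs that retrieval should still consider."""
    return [vector_id for vector_id, meta in records.items() if meta.is_active]
```

Because flagged records remain in place, the number invalidated per source can be counted and audited before the re-ingest runs, which is the observability advantage over a namespace-level delete.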

Consequences

Positive:
- Failures are contained to specific namespaces/strategies
- Re-ingestion and experimentation can happen safely
- Directly enables model routing and cost optimization in later work
- Supports structured evaluation and regression detection per namespace

Negative / Accepted Risks:
- Namespace count must be actively managed
- Query routing logic becomes more important in M3
- Slight increase in initial setup complexity

Neutral:
- We accept a small amount of operational discipline in exchange for long-term reliability and iteration speed.

Alignment with Production Requirements

This decision supports:
- Owning RAG pipelines across multiple Pinecone namespaces
- Building reliable systems with minimal manual intervention
- Enabling evaluation-driven iteration and regression detection
- Preparing for model routing and cost optimization

Review Note: This ADR demonstrates clear ownership of production RAG architecture, explicit tradeoff reasoning, and direct alignment with reliability and business impact goals.