ADR-001: Pinecone Multi-Namespace Indexing Strategy
Status: Accepted
Date: 2026-06-13
Deciders: Grey Newell
Tags: rag-pipeline, infrastructure, reliability, cost
Primary Goal
Build production-grade RAG systems that operate reliably with minimal failures and minimal manual intervention.
This decision directly supports the requirement to own RAG pipelines across multiple Pinecone namespaces while maintaining reliability, performance, and clean abstractions.
Context & Goals
We need a Pinecone indexing strategy that enables:
- Multiple retrieval strategies (different chunk sizes and embedding models) that do not degrade each other's retrieval quality
- Safe, incremental updates with low operational overhead
- Future agentic query routing (namespace selection)
- Cost-efficient model tiering
- Strong isolation when bad data or schema changes occur
- Easy observability and regression detection per strategy
Decision
Use one Pinecone serverless index with multiple namespaces.
Namespace naming convention: {strategy}-{chunk-size}-{model-tier}
Examples: semantic-512-small, chunk-1024-large, legal-256-small
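The convention is meant to be machine-actionable for later routing, so it helps to validate names when they are created. Below is a minimal Python sketch, not a prescribed implementation. `NamespaceSpec`, `parse_namespace`, `build_namespace`, and the `MODEL_TIERS` set are hypothetical names. The sketch also assumes three things: strategy names are lowercase, strategy names may contain hyphens, and chunk sizes are positive integers.

```python
import re
from typing import NamedTuple

# Assumption: the allowed model tiers. The ADR's examples only show "small" and "large".
MODEL_TIERS = frozenset({"small", "large"})

# {strategy}-{chunk-size}-{model-tier}; the strategy part may itself contain hyphens.
_NAMESPACE_RE = re.compile(
    r"(?P<strategy>[a-z][a-z0-9]*(?:-[a-z0-9]+)*)"
    r"-(?P<chunk_size>[1-9][0-9]*)"
    r"-(?P<model_tier>[a-z]+)"
)


class NamespaceSpec(NamedTuple):
    strategy: str
    chunk_size: int
    model_tier: str

    def render(self) -> str:
        """Format the parts back into a namespace name."""
        return f"{self.strategy}-{self.chunk_size}-{self.model_tier}"


def parse_namespace(namespace: str) -> NamespaceSpec:
    """Split a namespace name into its parts, rejecting names that break the convention."""
    match = _NAMESPACE_RE.fullmatch(namespace)
    if match is None:
        raise ValueError(f"namespace {namespace!r} does not match strategy-size-tier")
    tier = match["model_tier"]
    if tier not in MODEL_TIERS:
        raise ValueError(f"unknown model tier {tier!r} in {namespace!r}")
    return NamespaceSpec(match["strategy"], int(match["chunk_size"]), tier)


def build_namespace(strategy: str, chunk_size: int, model_tier: str) -> str:
    """Render a namespace name and validate it by parsing it back."""
    name = NamespaceSpec(strategy, chunk_size, model_tier).render()
    parse_namespace(name)  # raises ValueError if any part breaks the convention
    return name
```

The chunk size and tier are always the final two segments. As a result, multi-word strategies such as `legal-contracts-256-small` still parse unambiguously.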
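All vector upserts are idempotent. Each vector ID is derived deterministically from a hash of the chunk content plus identifying metadata, so re-running ingestion overwrites existing records instead of creating duplicates. Destructive namespace-level deletes are avoided during normal operation.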
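One way to derive such IDs is sketched below in Python. Several choices here are illustrative assumptions that this ADR does not specify: SHA-256, the canonical JSON encoding, and the hypothetical `IDENTITY_FIELDS` (`source_id`, `chunk_index`). The dict-based `upsert` is a stand-in that mimics Pinecone's behavior of overwriting a record whose ID already exists.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any, Iterable, Mapping

# Assumption: which metadata keys define a chunk's identity. Volatile fields
# (e.g. an ingestion timestamp) are left out so re-runs reproduce the same ID.
IDENTITY_FIELDS = ("source_id", "chunk_index")


def chunk_id(content: str, metadata: Mapping[str, Any]) -> str:
    """Deterministic vector ID: SHA-256 over the chunk text plus identity metadata."""
    identity = {key: metadata[key] for key in IDENTITY_FIELDS if key in metadata}
    # sort_keys and fixed separators make the serialization canonical.
    payload = json.dumps(
        {"content": content, "identity": identity},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def upsert(
    namespace_store: dict[str, dict[str, Any]],
    chunks: Iterable[tuple[str, Mapping[str, Any]]],
) -> None:
    """In-memory stand-in for an upsert: an existing ID is overwritten, never duplicated."""
    for content, metadata in chunks:
        namespace_store[chunk_id(content, metadata)] = {"text": content, **metadata}
```

Running the same batch twice leaves the store unchanged. Edited chunk text yields a new ID, however, so the superseded vector stays in the namespace until it is invalidated under the metadata-invalidation delete strategy.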
Tradeoffs & Rationale
| Tradeoff | Option Chosen | Why | Rejected Alternative | Reason Rejected |
|---|---|---|---|---|
| Index Architecture | Single index + multiple namespaces | Minimizes operational surface area while providing logical isolation. Aligns with "minimal manual intervention." | Multiple indexes | Higher cost, more secrets/monitoring, increases chance of configuration drift |
| Namespace Taxonomy | {strategy}-{chunk-size}-{model-tier} | Human-readable and machine-actionable for future routing agents; enables A/B testing and targeted improvements | Per-source or flat namespaces | Causes namespace explosion and prevents strategy-based routing |
| Upsert Model | Idempotent (content-hash IDs) | Prevents duplicate data and enables safe re-runs without manual cleanup | Timestamp-based or non-deterministic IDs | High risk of data corruption and manual intervention |
| Delete Strategy | Metadata invalidation + re-ingest | Safer and more observable than namespace deletes | Namespace-level delete + rebuild | High blast radius and violates reliability goal |
| Metadata Schema | Consistent core fields + strategy-specific extensions | Enables unified observability and evaluation across namespaces | Completely separate schemas | Breaks cross-namespace analysis and increases maintenance burden |
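The delete-strategy and metadata-schema rows can be illustrated together in one Python sketch. It assumes hypothetical core fields (`source_id`, `content_hash`, `is_active`) and uses an in-memory dict in place of a namespace. When a source is re-ingested, its stale vectors are flagged inactive rather than deleted. Queries then apply a Pinecone-style `$eq` metadata filter so flagged vectors drop out of results. In the Pinecone Python client, the flag flip would roughly correspond to `update` with `set_metadata`, and the filter to the `filter` argument of `query`; verify this against the client version in use. The sketch leaves open how stale IDs are discovered in a real index (for example, through a per-source manifest).

```python
from __future__ import annotations

from typing import Any

# Assumption: hypothetical core fields every namespace must carry.
# Strategy-specific extensions (e.g. a clause type for legal chunks) pass through.
CORE_FIELDS = ("source_id", "content_hash", "is_active")

# Pinecone-style metadata filter attached to every query so invalidated vectors are skipped.
ACTIVE_ONLY = {"is_active": {"$eq": True}}


def validate_metadata(metadata: dict[str, Any]) -> dict[str, Any]:
    """Reject records missing a core field; extra strategy-specific keys are allowed."""
    missing = [name for name in CORE_FIELDS if name not in metadata]
    if missing:
        raise ValueError(f"metadata missing core fields: {missing}")
    return metadata


def reingest_source(
    namespace_store: dict[str, dict[str, Any]],
    source_id: str,
    new_records: dict[str, dict[str, Any]],
) -> list[str]:
    """Flag a source's stale vectors inactive, then upsert its new records.

    Nothing is deleted, so superseded vectors remain available for inspection.
    Returns the IDs that were invalidated.
    """
    invalidated = []
    for vector_id, metadata in namespace_store.items():
        stale = vector_id not in new_records
        if metadata["source_id"] == source_id and metadata["is_active"] and stale:
            metadata["is_active"] = False
            invalidated.append(vector_id)
    for vector_id, metadata in new_records.items():
        namespace_store[vector_id] = validate_metadata(dict(metadata))
    return invalidated


def matches(metadata: dict[str, Any], flt: dict[str, dict[str, Any]]) -> bool:
    """Tiny evaluator supporting only the `$eq` operator."""
    return all(metadata.get(key) == cond["$eq"] for key, cond in flt.items())


def queryable_ids(namespace_store: dict[str, dict[str, Any]]) -> list[str]:
    """IDs a filtered query could return: active vectors only."""
    return sorted(vid for vid, meta in namespace_store.items() if matches(meta, ACTIVE_ONLY))
```

Only vectors belonging to the re-ingested source are touched. A bad input therefore affects one source in one namespace rather than the whole namespace.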
Consequences
Positive:
- Failures are contained to specific namespaces/strategies
- Re-ingestion and experimentation can happen safely
- Directly enables model routing and cost optimization in later work
- Supports structured evaluation and regression detection per namespace

Negative / Accepted Risks:
- Namespace count must be actively managed
- Query routing logic becomes more important in M3
- Slight increase in initial setup complexity
- All namespaces share the index's single vector dimension, which is fixed at index creation, so every embedding model used across strategies must emit vectors of that dimension

Neutral:
- We accept a small amount of operational discipline in exchange for long-term reliability and iteration speed.
Alignment with Production Requirements
This decision supports:
- Owning RAG pipelines across multiple Pinecone namespaces
- Building reliable systems with minimal manual intervention
- Enabling evaluation-driven iteration and regression detection
- Preparing for model routing and cost optimization
Review Note: This ADR demonstrates clear ownership of production RAG architecture, explicit tradeoff reasoning, and direct alignment with reliability and business impact goals.