ADR-001: Pinecone Multi-Namespace Indexing Strategy

Status: Accepted
Date: 2026-06-13
Deciders: Grey Newell
Tags: rag-pipeline, infrastructure, reliability, cost

Primary Goal

Build production-grade RAG systems that operate reliably with minimal failure or manual intervention.

This decision directly supports the requirements of owning RAG pipelines across multiple Pinecone namespaces while maintaining reliability, performance, and clean abstractions.

Context & Goals

We need a Pinecone indexing strategy that enables:

  • Multiple retrieval strategies (different chunk sizes + embedding models) without quality collisions
  • Safe, incremental updates with low operational overhead
  • Future agentic query routing (namespace selection)
  • Cost-efficient model tiering
  • Strong isolation when bad data or schema changes occur
  • Easy observability and regression detection per strategy

Decision

Use one Pinecone serverless index with multiple namespaces.

Namespace naming convention: {strategy}-{chunk-size}-{model-tier}
Examples: semantic-512-small, chunk-1024-large, legal-256-small
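
The naming convention above can be enforced in code so that ingestion jobs and future routing agents agree on one format. The following is a minimal sketch, not the project's actual implementation: the `NamespaceSpec` class, the `parse_namespace` helper, and the exact character rules in the regular expression (lowercase alphanumerics for strategy and tier, digits for chunk size) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Assumed format: {strategy}-{chunk-size}-{model-tier}, e.g. "semantic-512-small".
# The allowed characters are an illustrative assumption, not a rule from the ADR.
_NAMESPACE_RE = re.compile(
    r"(?P<strategy>[a-z][a-z0-9]*)-(?P<chunk_size>\d+)-(?P<model_tier>[a-z][a-z0-9]*)"
)


@dataclass(frozen=True)
class NamespaceSpec:
    """Structured view of a namespace name (hypothetical helper type)."""

    strategy: str
    chunk_size: int
    model_tier: str

    def name(self) -> str:
        # Render the spec back into the canonical namespace string.
        return f"{self.strategy}-{self.chunk_size}-{self.model_tier}"


def parse_namespace(name: str) -> NamespaceSpec:
    """Parse a namespace string, rejecting anything off-convention."""
    match = _NAMESPACE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"namespace {name!r} does not follow the naming convention")
    return NamespaceSpec(
        strategy=match["strategy"],
        chunk_size=int(match["chunk_size"]),
        model_tier=match["model_tier"],
    )
```

Parsing at the boundary (rather than passing raw strings around) is one way to keep a routing layer from silently creating a misspelled namespace.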

All vector upserts are idempotent using deterministic IDs derived from content hash + metadata. Destructive namespace-level deletes are avoided during normal operation.
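
To illustrate why deterministic IDs make upserts idempotent, here is a small sketch using only the Python standard library. It is an assumption-laden example rather than the pipeline's real code: the function names, the choice of SHA-256, and the choice of which metadata fields feed the hash (`source_id` and `chunk_index` below) are hypothetical. The in-memory `store` dictionary stands in for a namespace; a real upsert would go through the Pinecone client instead.

```python
import hashlib
import json


def deterministic_vector_id(content: str, identity: dict) -> str:
    """Derive a stable ID from chunk content plus identity metadata.

    `identity` should contain only fields that are stable across runs
    (assumed here: source_id, chunk_index). Volatile fields such as
    ingestion timestamps would change the ID and defeat idempotency.
    """
    # Canonical JSON: sorted keys and fixed separators, so that dict
    # ordering or whitespace never changes the resulting hash.
    canonical = json.dumps(
        {"content": content, "identity": identity},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def upsert(store: dict, content: str, identity: dict) -> str:
    """Write a record keyed by its deterministic ID (in-memory stand-in)."""
    vector_id = deterministic_vector_id(content, identity)
    store[vector_id] = {"content": content, **identity}
    return vector_id


# Re-running the same ingestion step overwrites the same key
# instead of creating a duplicate record.
store: dict = {}
first = upsert(store, "Section 1 text", {"source_id": "doc-42", "chunk_index": 0})
second = upsert(store, "Section 1 text", {"chunk_index": 0, "source_id": "doc-42"})
assert first == second and len(store) == 1
```

Note that changed content produces a new ID, so the old vector remains until it is invalidated or replaced; this is one reason the ID scheme and the delete strategy need to be designed together.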

Tradeoffs & Rationale

| Tradeoff | Option Chosen | Why | Rejected Alternative | Reason Rejected |
|---|---|---|---|---|
| Index Architecture | Single index + multiple namespaces | Minimizes operational surface area while providing logical isolation. Aligns with "minimal manual intervention." | Multiple indexes | Higher cost, more secrets/monitoring, increases chance of configuration drift |
| Namespace Taxonomy | {strategy}-{size}-{model} | Human-readable + machine-actionable for future routing agents. Enables A/B testing and targeted improvements | Per-source or flat namespaces | Causes namespace explosion and prevents strategy-based routing |
| Upsert Model | Idempotent (content-hash IDs) | Prevents duplicate data and enables safe re-runs without manual cleanup | Timestamp-based or non-deterministic IDs | High risk of data corruption and manual intervention |
| Delete Strategy | Metadata invalidation + re-ingest | Safer and more observable than namespace deletes | Namespace-level delete + rebuild | High blast radius and violates reliability goal |
| Metadata Schema | Consistent core fields + strategy-specific extensions | Enables unified observability and evaluation across namespaces | Completely separate schemas | Breaks cross-namespace analysis and increases maintenance burden |
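
The "Delete Strategy" and "Metadata Schema" rows can be sketched together: records share a small set of core fields, strategies add their own extensions, and a bad record is flagged inactive rather than deleted. Everything named below is a hypothetical illustration, including the core field names, the `is_active` flag, and the `invalidated_*` fields; the ADR does not specify them. The filter dictionary uses Pinecone's `$eq` metadata filter operator and is meant to be passed as the `filter` argument of a query.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical core fields that every namespace would share.
CORE_FIELDS = ("source_id", "chunk_index", "content_hash", "is_active")

# Query-time filter so invalidated records are never retrieved.
ACTIVE_ONLY_FILTER = {"is_active": {"$eq": True}}


def build_metadata(core: dict, extensions: Optional[dict] = None) -> dict:
    """Merge core fields with strategy-specific extensions, validating both."""
    extensions = extensions or {}
    missing = [field for field in CORE_FIELDS if field not in core]
    if missing:
        raise ValueError(f"missing core metadata fields: {missing}")
    clashes = sorted(set(core) & set(extensions))
    if clashes:
        raise ValueError(f"extensions may not override core fields: {clashes}")
    return {**core, **extensions}


def invalidate(metadata: dict, reason: str, now: Optional[datetime] = None) -> dict:
    """Return a copy flagged inactive; the original record is left untouched."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        **metadata,
        "is_active": False,
        "invalidated_reason": reason,
        "invalidated_at": stamp,
    }
```

Because invalidation only changes metadata, it can be audited and reversed, whereas a namespace-level delete removes every vector in the namespace at once.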

Consequences

Positive:

  • Failures are contained to specific namespaces/strategies
  • Re-ingestion and experimentation can happen safely
  • Directly enables model routing and cost optimization in later work
  • Supports structured evaluation and regression detection per namespace

Negative / Accepted Risks:

  • Namespace count must be actively managed
  • Query routing logic becomes more important in M3
  • Slight increase in initial setup complexity

Neutral:

  • We accept a small amount of operational discipline in exchange for long-term reliability and iteration speed.

Alignment with Production Requirements

This decision supports:

  • Owning RAG pipelines across multiple Pinecone namespaces
  • Building reliable systems with minimal manual intervention
  • Enabling evaluation-driven iteration and regression detection
  • Preparing for model routing and cost optimization


Review Note: This ADR demonstrates clear ownership of production RAG architecture, explicit tradeoff reasoning, and direct alignment with reliability and business impact goals.