For software engineers, confronting new architectural paradigms and implementing them in live production code invariably forces rigorous self-reflection. To be entirely candid, while designing the Vault module, the second core governance layer of the Bastion-RAG framework, I fell into an embarrassing architectural misconception.

Perhaps it was long-standing habit from building traditional monolithic RDBMS-backed systems and large-scale web applications, but I initially dismissed multi-tenancy as merely a slightly more complex variant of role- or attribute-based access control (RBAC/ABAC). My naive assumption was that adding a tenant ID column to a database table, extracting that ID from the session context or JWT upon authentication, and appending a WHERE tenant_id = ? clause to every query would suffice to enforce total data isolation.

I spent a long time stuck in this confusion, unsure whether the core concepts were inherently ambiguous or whether I simply lacked a clear model of the threat landscape in modern AI architectures. My initial skepticism was stubborn: “If the API gateway or upstream web layers already handle token validation and session isolation before a request enters the pipeline, why enforce multi-tenancy verification yet again at the perimeter of our data pipeline?” Since the Large Language Model (LLM) engine itself remains globally immutable and static, I wrongly assumed that logical partitioning inside the retrieval engine (Navigator), simply splitting vector database queries across segregated index collections, would completely solve the multi-tenancy problem.

It was only after dissolving this conceptual ambiguity, and redefining tenant isolation not as application-level source code logic but as an uncompromising cryptographic and infrastructural primitive, that the concrete solution took shape. The moment a user payload hits the Bastion-RAG ingress path, what must the pipeline do first? While standard RAG architectures rush to evaluate the semantic or vector proximity of an input string, an enterprise-grade governance ecosystem demands something far stricter. Before any similarity computation runs, the system must definitively validate the request’s Identity, Structural Format, and Freshness, establishing a cryptographic barrier around the underlying data assets.

No matter how robust your downstream logical partitioning or collection-level boundaries are within the retrieval space, that isolation crumbles if data can be corrupted, or unauthorized metadata can slip through, at the ingress gateway or during de-identification. The Vault module is explicitly engineered to inspect incoming request envelopes comprehensively, intercepting and dropping malformed or manipulated payloads at the perimeter before any downstream retrieval routine runs.

This post breaks down the memory and caching optimizations we use to minimize runtime overhead, along with the technical specifications of our four-layer synchronous validation pipeline and its egress verification components.

In fact, much of this post revisits metadata filtering, because Vault can only achieve strict cryptographic isolation if the metadata it relies on has been precisely verified.

URL Site > https://github.com/zafrem/bastion-vault

Series Name: Bastion – Project Security RAG



1. Forward Data Flow: Vault-IN (Phase 1) Ingress Processing Pipeline

The moment a user’s raw natural language query and its structural metadata index envelope clear the Sentinel-IN perimeter guardrails, the Vault-IN pipeline initializes as a synchronous state machine. This forward execution path is responsible for isolating runtime context and enforcing hard cryptographic boundaries before downstream systems touch the payload.

[User Request & Metadata Envelope Ingress]
                   │
                   ▼
1.1 Runtime Context Propagation ──────▶ Binds identity via tenant.WithTenant(ctx, tenantID)
                   │
                   ▼
1.2 High-Speed PII Detection Indexing ─▶ Builds structural schema map via Index 0 (O(1))
                   │
                   ▼
1.3 Cryptographic Namespace Derivation ▶ Derives compound masterKeyID = "tenant:{id}:category:{cat}"
                   │
                   ▼
1.4 Deterministic De-identification ──▶ Persists mapping via compound storeKey = tenantID + ":" + token
                   │
                   ▼
[Vault-IN Forward Execution Complete] ──▶ Transfers payload contract to the Navigator module

1.1 Runtime Context Propagation

  • Process Input: A verified tenant_id string passed down from the structural validations executed at the Sentinel-IN perimeter layer.
  • State Transformation: The Vault-IN pipeline immediately invokes tenant.WithTenant(ctx, tenantID) to anchor the tenant identifier inside the Go context.Context structure.

Go

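// internal/tenant/isolation.go

package tenant

import "context"

// contextKey is unexported: no other package can construct a key of this type.
type contextKey string

const tenantContextKey contextKey = "tenant"

// WithTenant returns a child context that carries the verified tenant ID.
func WithTenant(ctx context.Context, tenantID string) context.Context {
    return context.WithValue(ctx, tenantContextKey, tenantID)
}

// FromContext returns the tenant ID; ok is false when none (or an empty one) was attached.
func FromContext(ctx context.Context) (string, bool) {
    id, ok := ctx.Value(tenantContextKey).(string)
    return id, ok && id != ""
}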
  • Architectural Constraint: To ensure type-level encapsulation, the context key is an unexported package-level type (type contextKey string) rather than a primitive string literal ("tenant"). Because no code outside the tenant package can construct a value of that type, it is structurally impossible for other packages, third-party libraries, or context-merging middleware to accidentally collide with or override the tenant entry; a plain context.WithValue(ctx, "tenant", ...) call creates a different key altogether. If a handler detects that a valid tenant ID is missing from the context, the engine rejects the transaction immediately as an Authentication Failure rather than falling back to a default shared tenant partition.

1.2 High-Speed PII Detection Indexing

  • State Transformation: In high-throughput streaming environments that process batches of up to 1,000 records, running full natural language PII classification (Classifier) on every text chunk introduces severe latency. To eliminate this bottleneck, Vault-IN enforces a Static Sampling Indexing Model based on structural schema homogeneity: it assumes every record in a batch shares the same field layout.

Go

// internal/anonymizer/engine.go

func (e *Engine) buildDetectionIndex(records []map[string]any, policies []fieldPolicy) map[string]fieldPolicy {
    index := make(map[string]fieldPolicy)
    if len(records) == 0 {
        return index
    }

    // Isolate PII classification exclusively to the first record in the batch (Index 0)
    detections := e.classifier.DetectPII(records[0])

    // Map detected PII attributes against pre-defined data category policies
    for _, det := range detections {
        for _, pol := range policies {
            if pol.piiType == det.PIIType {
                index[det.Field] = pol // Bind field name as hash key for O(1) static lookup map
                break
            }
        }
    }
    return index
}
  • Architectural Constraint: The framework runs DetectPII only on the first record in the batch (records[0]) to construct a static field-rule lookup blueprint (Field → PII_Type → Strategy). The remaining 999 records skip the heavy classification routine entirely and resolve each field's policy through this static index with an $O(1)$ map lookup. Classification cost per batch is therefore constant (a single DetectPII call), while the remaining work grows only linearly with the number of records.
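To make the static-index idea concrete, here is a sketch of how the remaining records might consume the index built from records[0]. The fieldPolicy shape (a PII type plus an apply function) and applyIndex are assumptions for illustration; the real engine's strategy types are not shown in the excerpt above.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldPolicy is a hypothetical stand-in for the engine's policy type.
type fieldPolicy struct {
	piiType string
	apply   func(string) string
}

// applyIndex applies the policy bound to each indexed field. Every field
// costs one map lookup; no classifier is invoked for these records.
func applyIndex(records []map[string]any, index map[string]fieldPolicy) {
	for _, rec := range records {
		for field, val := range rec {
			pol, ok := index[field]
			if !ok {
				continue // field was not flagged as PII in records[0]
			}
			if s, isStr := val.(string); isStr {
				rec[field] = pol.apply(s) // updating an existing key during range is safe
			}
		}
	}
}

func main() {
	mask := func(s string) string { return strings.Repeat("*", len(s)) }
	index := map[string]fieldPolicy{"email": {piiType: "EMAIL", apply: mask}}
	records := []map[string]any{
		{"email": "a@x.io", "plan": "pro"},
		{"email": "bob@y.io", "plan": "free"},
	}
	applyIndex(records, index)
	fmt.Println(records[1]["email"], records[1]["plan"]) // ******** free
}
```

The trade-off follows directly from the homogeneity assumption: a PII field that is absent from records[0] but present in later records never enters the index, so it passes through untransformed.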

1.3 Cryptographic Namespace Derivation

  • State Transformation: The pipeline takes the verified tenant token and pairs it with the designated data category scope (e.g., DC-01 for Customer Records, DC-03 for HR/Finance) to derive a unique, compound Key Management Service identifier (masterKeyID):
      masterKeyID := fmt.Sprintf("tenant:%s:category:%s", tenantID, string(cat))
    This derived value is passed to the hardware-backed KMS to obtain the tenant’s dedicated HMAC key material and the plaintext Data Encryption Key (DEK), which are then held in isolated in-memory caches.
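A compound identifier joined with ':' is only unambiguous if its components cannot themselves contain ':'. The sketch below assumes validation of that kind, which is not shown in the original code: without it, tenant "a:category:b" with category "c" and tenant "a" with category "b:category:c" would both produce tenant:a:category:b:category:c. The Category type, masterKeyID function, and error value are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Category is a hypothetical type for data-category codes such as DC-01.
type Category string

var errBadComponent = errors.New("key component must be non-empty and free of ':'")

// masterKeyID builds "tenant:{id}:category:{cat}" after rejecting components
// that would make the compound ID ambiguous.
func masterKeyID(tenantID string, cat Category) (string, error) {
	for _, part := range []string{tenantID, string(cat)} {
		if part == "" || strings.Contains(part, ":") {
			return "", errBadComponent
		}
	}
	return fmt.Sprintf("tenant:%s:category:%s", tenantID, string(cat)), nil
}

func main() {
	id, err := masterKeyID("acme", "DC-01")
	fmt.Println(id, err) // tenant:acme:category:DC-01 <nil>

	_, err = masterKeyID("a:category:b", "c")
	fmt.Println(err) // key component must be non-empty and free of ':'
}
```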

Go

// internal/kms/manager.go

func (m *Manager) GetOrGenerateDEK(ctx context.Context, masterKeyID, cacheKey string) (*DataKey, error) {
    if dk := m.fromCache(cacheKey); dk != nil {
        return dk, nil // Cache hit eliminates high-latency KMS network RPC round-trips
    }
    dk, err := m.generateWithFailover(ctx, masterKeyID)
    if err != nil {
        return nil, err
    }
    m.toCache(cacheKey, dk) // cacheKey is rigidly bounded to the tenant × category layer
    return dk, nil
}
  • Architectural Constraint: Plaintext encryption keys are cached in memory for at most 5 minutes (cacheTTL) to preserve transaction speed. An asynchronous eviction worker (evict()) sweeps the cache every 30 seconds; any entry past its TTL is removed and passed through a strict Memory Zeroization routine that overwrites the key bytes in place with zeros. Expired key material therefore lingers for at most one sweep interval beyond its TTL, which narrows the window in which it could leak through memory dumps or side-channel exploits.

Go

import "runtime"

// zeroize overwrites b in place so key material does not linger on the heap.
func zeroize(b []byte) {
    for i := range b {
        b[i] = 0
    }
    // Defensive measure: keep b reachable until after the loop so an optimizer
    // cannot treat the stores as dead writes to an unused buffer.
    runtime.KeepAlive(b)
}

1.4 Deterministic De-identification and Registry Storage

  • State Transformation: Utilizing the isolated, tenant-specific key material, the pipeline engages eight deterministic anonymization strategies to securely transform raw identifiers into structured pseudonyms. For fields requiring reversible deanonymization capabilities at a later stage, the engine persists the token-to-plaintext mapping inside the token database registry (TokenDB).
  • Isolation Constraint: Every database index and cache key in the TokenDB registry uses the compound format storeKey = tenantID + ":" + token. Because tokens are derived from tenant-specific key material, identical raw values should already yield different tokens for different tenants; the composite key adds a second guarantee, so that even if two tenants ever produced the same token string (for example, through a misconfigured or shared key), their search boundaries would remain strictly segregated at the data layer. Every index scan is confined to its tenant prefix, preventing cross-tenant data traversal at the lowest storage tier.
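A sketch of the tenant-scoped registry, assuming HMAC-SHA256 as one of the deterministic strategies (the eight strategies are not listed in this post). deterministicToken, tokenDB, the TOK_ prefix, and the 16-hex-character truncation are all hypothetical; the hard-coded keys are demo values standing in for KMS-provided material.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// deterministicToken derives a pseudonym with HMAC-SHA256 under a
// tenant-specific key: the same value always maps to the same token
// within one tenant, and to a different token under another tenant's key.
func deterministicToken(tenantKey []byte, value string) string {
	mac := hmac.New(sha256.New, tenantKey)
	mac.Write([]byte(value))
	return "TOK_" + hex.EncodeToString(mac.Sum(nil))[:16]
}

// tokenDB is a hypothetical in-memory stand-in for the TokenDB registry.
type tokenDB struct{ m map[string]string }

// storeKey applies the compound tenantID + ":" + token format.
// Tenant IDs must not contain ':' for this key to stay unambiguous.
func storeKey(tenantID, token string) string { return tenantID + ":" + token }

func (db *tokenDB) put(tenantID, token, plaintext string) {
	db.m[storeKey(tenantID, token)] = plaintext
}

// lookup only ever searches inside the caller's tenant prefix.
func (db *tokenDB) lookup(tenantID, token string) (string, bool) {
	v, ok := db.m[storeKey(tenantID, token)]
	return v, ok
}

func main() {
	db := &tokenDB{m: map[string]string{}}
	keyA, keyB := []byte("demo-key-tenant-a"), []byte("demo-key-tenant-b")

	tokA := deterministicToken(keyA, "alice@example.com")
	db.put("tenantA", tokA, "alice@example.com")

	fmt.Println(tokA == deterministicToken(keyA, "alice@example.com")) // true: deterministic
	fmt.Println(tokA == deterministicToken(keyB, "alice@example.com")) // false: tenant-specific
	_, found := db.lookup("tenantB", tokA)
	fmt.Println(found) // false: tenantB cannot resolve tenantA's token
}
```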

2. Reverse Data Flow: Vault-OUT (Phase 2) Egress Verification Pipeline

Once the downstream retrieval engine (Navigator) completes vector graph execution against segregated Qdrant collections and the LLM finishes text synthesis, the Vault-OUT validation pipeline activates. Acting as the final backstop before response packets travel back to the client interface, this phase neutralizes unexpected mutations in the generated output and records that leak across logical index boundaries.

[Downstream Navigator Artifacts & Raw LLM Generation Pool Ingress]
                               │
                               ▼
2.1 Egress Cross-Tenant Verification ──▶ Runs O(N) sweep across record TenantID headers
                               │
                               ├───▶ Anomaly Ratio > 10% (Pipeline Compromise)?
                               │        │
                               │        ▼ [CRITICAL Incident] Memory Zeroization & Absolute Batch Abort
                               │
                               ▼ (Anomaly Ratio ≤ 10% Safe Operation Limit)
2.2 Granular Access Control Evaluation ─▶ Maps Department/Role via declarative OPA Policy Engine
                               │
                               ▼
2.3 Seven-Tier AccessLevel View Generation ▶ Regulates visibility via column-level masking matrices
                               │
                               ▼
2.4 Selective Token Reversal & Eviction ─▶ Decrypts ENC: data arrays, clears heap allocations
                               │
                               ▼
[Sanitized Secure Envelope Dispatched to Client]

2.1 Egress Cross-Tenant Verification

  • Process Input: The array of generated candidate records (model.DataRecord) and the authenticated session’s original userTenantID.
  • State Transformation: The engine initiates a synchronous $O(N)$ sweep across the incoming dataset, auditing the TenantID attribute stamped onto every record against the caller’s validated session token.

Go

// internal/output/cross_tenant.go

const crossTenantSuspiciousRatio = 0.10

func (v *CrossTenantVerifier) Verify(
    ctx context.Context,
    records []model.DataRecord,
    userTenantID string,
) ([]model.DataRecord, error) {

    if userTenantID == "" {
        return nil, fmt.Errorf("cross-tenant verifier: empty user tenant_id")
    }

    out := make([]model.DataRecord, 0, len(records))
    mismatched := 0

    for _, rec := range records {
        if rec.TenantID == userTenantID {
            out = append(out, rec)
            continue
        }
        mismatched++
        // Log critical isolation violation to upstream SIEM systems immediately
        log.Printf("CRITICAL vault-out cross_tenant_violation: record_id=%s record_tenant=%s user_tenant=%s",
            rec.RecordID, rec.TenantID, userTenantID)
    }

    // If cross-tenant data contamination breaches the 10% threshold, 
    // classify the transaction as a data poisoning attempt and abort execution.
    if mismatched > 0 && float64(mismatched)/float64(len(records)) > crossTenantSuspiciousRatio {
        return nil, fmt.Errorf("cross-tenant verifier: execution halted, poisoned batch refused")
    }

    return out, nil
}
  • Decision Matrix: When a record fails the tenant check, the engine prunes it from the output buffer and writes a CRITICAL alert to the security incident pipeline. If the share of foreign records exceeds the 10% threshold (crossTenantSuspiciousRatio), the pipeline escalates to a critical incident state: it treats the entire batch as actively poisoned, aborts the execution path, zeroizes local response memory, and returns no records at all, refusing even valid entries to prevent structural data exfiltration. A ratio of exactly 10% still counts as safe operation, since the check uses a strict greater-than comparison.
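The abort rule in Verify reduces to simple arithmetic, and its boundary behavior is worth spelling out. The hypothetical helper below restates mismatched/total > 0.10 in integer form (mismatched*10 > total), which avoids floating-point rounding at the boundary.

```go
package main

import "fmt"

// exceedsThreshold restates Verify's abort condition,
// mismatched/total > 0.10, as mismatched*10 > total. Hypothetical helper.
func exceedsThreshold(mismatched, total int) bool {
	if total == 0 || mismatched == 0 {
		return false
	}
	return mismatched*10 > total
}

func main() {
	cases := [][2]int{{1, 10}, {2, 10}, {1, 9}, {10, 100}, {11, 100}}
	for _, c := range cases {
		fmt.Printf("%d/%d foreign -> abort=%v\n", c[0], c[1], exceedsThreshold(c[0], c[1]))
	}
}
```

One consequence: for batches of fewer than 10 records, a single foreign record already exceeds 10% (1 of 9 is about 11%), so any mismatch aborts the whole batch; only from 10 records upward can a lone foreign record be pruned while the rest is returned.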

2.2 Granular Access Control and OPA Compliance Evaluation

  • State Transformation: Record subsets that clear the cross-tenant sweep pass to our entitlement engine for a secondary access-matrix audit, where the user’s department and role attributes are mapped directly against our embedded RBAC rules.
  • Architectural Constraint: If a central Open Policy Agent (OPA) deployment is active, policy decisions are delegated to it via high-speed gRPC API calls. If the OPA cluster times out or an instance fails, the system falls back to the locally compiled static RBAC rules. This Fail-open Fallback is "open" only with respect to availability: requests are not rejected outright, but authorization is never skipped, so data minimization is preserved. In parallel, the transaction purpose (purposeMatrix) is evaluated; even if a user holds valid role permissions, a purpose violation (e.g., pulling live records for a PurposeTrainingData scenario without an authorized break-glass lifecycle) drops the access tier to a strict AccessNone state.

2.3 Seven-Tier AccessLevel View Generation

  • State Transformation: The engine computes the absolute maximum permission ceiling allowed for the transaction (AccessLevel) and applies column-level text masking transformations across the fields.

Go

// internal/access/controller.go

func EnforceMinimumLevel(requested, allowed model.AccessLevel) model.AccessLevel {
    // Restrict the transaction context to the pre-authorized RBAC/OPA ceiling,
    // explicitly preventing client-side permission amplification over wire requests.
    if model.AccessLevelRank(requested) > model.AccessLevelRank(allowed) {
        return allowed
    }
    return requested
}
  • Field-Level Data Masking: Data visibility is strictly constrained based on our 7-tier ranking structure (ranging from AccessFull down to AccessAuditLog). While an executive-level role may retain AccessFull clearance to read original values, an analyst role mapped to AccessKAnonymized triggers spatial generalization, stripping discrete street-level addresses down to broad regional headers to enforce continuous data minimization.
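A sketch of the ceiling-then-mask flow for a single address column. The integer ranks model only the four levels named in this post (the other tiers of the 7-tier scale are omitted), the comma-separated address layout is an assumption, and "[REDACTED]" is a placeholder for whatever the lower tiers actually return. ceiling mirrors the logic of EnforceMinimumLevel.

```go
package main

import (
	"fmt"
	"strings"
)

// AccessLevel ranks are hypothetical; higher means more visibility.
type AccessLevel int

const (
	AccessNone AccessLevel = iota
	AccessAuditLog
	AccessKAnonymized
	AccessFull
)

// ceiling caps the requested level at the RBAC/OPA-approved level.
func ceiling(requested, allowed AccessLevel) AccessLevel {
	if requested > allowed {
		return allowed
	}
	return requested
}

// generalizeAddress keeps only the leading region component,
// e.g. "Seoul, Gangnam-gu, Teheran-ro 123" -> "Seoul".
func generalizeAddress(addr string) string {
	region, _, _ := strings.Cut(addr, ",")
	return strings.TrimSpace(region)
}

// viewAddress returns the column value visible at a given access level.
func viewAddress(addr string, level AccessLevel) string {
	switch {
	case level >= AccessFull:
		return addr
	case level >= AccessKAnonymized:
		return generalizeAddress(addr)
	default:
		return "[REDACTED]"
	}
}

func main() {
	addr := "Seoul, Gangnam-gu, Teheran-ro 123"
	analyst := ceiling(AccessFull, AccessKAnonymized) // asks for more than allowed
	fmt.Println(viewAddress(addr, analyst))           // Seoul
	fmt.Println(viewAddress(addr, AccessFull))        // Seoul, Gangnam-gu, Teheran-ro 123
	fmt.Println(viewAddress(addr, AccessAuditLog))    // [REDACTED]
}
```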

2.4 Selective Token Reversal and Memory Eviction

  • State Transformation: Reversible token data (e.g., encrypted salary fields matching an authorized HR payroll audit request) is decrypted using the cached tenant DEK material.
  • Process Termination: As soon as the plaintext has been assembled and the final secure envelope has been written to the client socket, the engine purges intermediate plaintext buffers and any plaintext key blocks held for the request using the zeroize() routine, completing a zero-trust transactional loop. Because Go strings are immutable, only data held in []byte buffers can be overwritten this way, so plaintext should stay in byte slices until it is written out.

3. Core Blueprint: Synthesizing Standalone Modules and Asynchronous Processes

The definitive breakthrough of the Bastion-RAG architecture lies in decoupling multi-tenancy from static, fragile application-level constraints. Zero-trust data sovereignty comes from tightly coordinating the ingestion and egress processes: the forward pipeline (Vault-IN) locks down runtime contexts and isolates encryption key spaces on write, while the reverse pipeline (Vault-OUT) performs exhaustive $O(N)$ sweeps across record headers on read, aborting the batch and zeroizing response buffers as soon as the 10% anomaly threshold is breached. By anchoring these synchronous data-diode guards directly in the core microservices, Bastion-RAG meets its production p95 target of keeping pseudonymization under 5 milliseconds, while keeping tenant isolation intact even if an upstream layer such as the API gateway is compromised.

By Mark

-_-