
The Context Engineering Loop: Write, Select, Compress, Isolate — Designing Memory for Production AI Agents

Polystreak Team · 2026-03-24 · 11 min read

Every AI agent runs on context. The model itself — GPT-4, Claude, Gemini — is a reasoning engine. What it reasons about is determined entirely by the context you provide: the system prompt, the conversation history, the retrieved documents, the tool outputs, the user profile, the task state. Get the context wrong and a brilliant model gives a wrong answer. Get it right and a cheaper model outperforms an expensive one.

But context isn't a single decision. It's a continuous engineering loop — a pipeline that runs on every agent invocation. The loop has four stages, and skipping any one of them degrades agent quality in production.

Context that was never written can't be selected. Context that wasn't pruned drowns the signal. The four stages aren't optional — they're load-bearing.

The Four Stages of the Context Loop

The loop is: WRITE → SELECT → COMPRESS → ISOLATE — and then back to WRITE again. Each agent turn flows through all four stages. Each stage has a distinct purpose, a distinct failure mode, and a distinct infrastructure requirement.

| Stage | Purpose | Failure Mode |
|---|---|---|
| WRITE | Capture scratchpads, memories, observations, tool outputs | Agent forgets what it learned. Repeats work. Makes contradictory decisions. |
| SELECT | Hydrate only what matters for the current task | Irrelevant context floods the prompt. Token waste. Confused reasoning. |
| COMPRESS | Prune and summarize aggressively. Keep signal-to-noise high. | Context window fills up. Old, noisy data pushes out critical recent data. |
| ISOLATE | Scoped sub-agents with their own context. No cross-contamination. | One agent's context leaks into another's reasoning. Hallucination from irrelevant memory. |

Stage 1: WRITE — Capture Everything Worth Remembering

Before context can be retrieved, it must be stored. The WRITE stage captures three categories of information that the agent generates or encounters during execution.

Scratchpads

Intermediate reasoning, partial results, chain-of-thought outputs, and draft plans. These are ephemeral — they matter for the current task but not for next week. A coding agent that reasons through 5 approaches before picking one should write the reasoning somewhere so it doesn't re-derive the same analysis if it loops back.

Memories

Long-term facts the agent learns about the user, the environment, or the domain. 'This user prefers concise answers.' 'The production database is on MongoDB Atlas, us-east-1.' 'The last deployment failed because of a missing environment variable.' Memories persist across sessions and conversations.

Observations

Tool call results, API responses, document retrieval outputs, database query results. Every time the agent interacts with the outside world, the response is an observation that may be relevant later in the same turn or in future turns.

The critical principle: write aggressively, filter later. If you don't write it, the SELECT stage can't retrieve it. A memory that was never persisted is permanently lost. Over-writing is cheap (storage costs pennies). Under-writing is expensive (the agent hallucinates or repeats work).

The WRITE stage is a one-way door. Once a conversation turn passes, context that wasn't captured is gone. Write first, prune later.

Where Redis and MongoDB Fit: WRITE

Scratchpads are short-lived and high-frequency. Redis is the natural store — use Hashes for structured scratchpad fields (agent:task:123 → { reasoning, draft, status }) with a TTL that matches the task lifetime. Writes are sub-millisecond. When the task completes, the TTL expires and the scratchpad is gone.
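A minimal sketch of that scratchpad write, assuming redis-py's `hset`/`expire` interface and a hypothetical one-hour task lifetime:

```python
import json

def write_scratchpad(r, task_id, fields, ttl_seconds=3600):
    """Write scratchpad fields to a Redis Hash and set a task-lifetime TTL.

    `r` is any client exposing redis-py's hset/expire interface. When the
    task completes, the TTL expires and the scratchpad is gone.
    """
    key = f"agent:task:{task_id}"
    # Hash fields must be flat strings; serialize any nested values.
    mapping = {k: v if isinstance(v, str) else json.dumps(v)
               for k, v in fields.items()}
    r.hset(key, mapping=mapping)
    r.expire(key, ttl_seconds)
    return key
```

The key pattern and field names mirror the `agent:task:123 → { reasoning, draft, status }` example above; adapt both to your own conventions.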

Memories are long-lived and need to be searchable. MongoDB Atlas is the durable store — write each memory as a document with the content, a vector embedding (for semantic retrieval later), metadata (user_id, agent_id, created_at, topic), and a confidence score. MongoDB gives you flexible schemas for evolving memory structures and Atlas Vector Search for semantic retrieval.
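A sketch of the memory document shape, using the fields named above (the field names are suggestions, not a fixed schema):

```python
from datetime import datetime, timezone

def build_memory_document(content, embedding, user_id, agent_id, topic,
                          confidence=0.5):
    """Shape a long-term memory as a MongoDB document.

    `embedding` is the precomputed vector for Atlas Vector Search
    (a list of floats, e.g. 1536 dimensions).
    """
    return {
        "content": content,
        "embedding": embedding,
        "user_id": user_id,
        "agent_id": agent_id,
        "topic": topic,
        "confidence": confidence,
        "created_at": datetime.now(timezone.utc),
    }

# Persisting it is then a single pymongo call:
# memories.insert_one(build_memory_document(
#     "Prefers concise answers", vec, "u1", "a1", "preferences"))
```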

Observations from tool calls go to both: the raw result to MongoDB for audit and replay, a summarized or key-value version to Redis for fast access during the current conversation.

Stage 2: SELECT — Hydrate Only What Matters

The agent now needs to act. It has a new user message, a task to perform, or a tool to call. The SELECT stage decides which stored context gets pulled into the prompt — and critically, which context stays out.

This is retrieval. But not 'dump everything' retrieval — targeted, multi-source hydration. The agent needs the 3 most relevant memories for this user's question, the 2 most recent scratchpad entries for the current task, the tool output from the last step, and the user's preference profile. That's 6-8 pieces of context, not 600.

  • Semantic retrieval — Use vector search to find memories and documents similar to the current query. This catches conceptually relevant context even when keywords don't match.
  • Recency retrieval — The most recent N conversation turns, scratchpad entries, or observations. Time-based relevance is often as important as semantic similarity.
  • Explicit retrieval — Hardcoded context that always gets included: system prompt, user profile, active task definition, tool schemas.
  • Conditional retrieval — Context pulled only when a condition is met: 'If the user mentions billing, retrieve the pricing document.' 'If the task involves the database, retrieve the schema.'

The SELECT stage is where most agents fail in production. They either retrieve too much (the prompt is 80% irrelevant context, the model gets confused, token costs spike) or too little (the model hallucinates because it's missing a critical fact that exists in the memory store but wasn't retrieved).

SELECT is not 'give the model everything.' It's 'give the model the minimum context for a correct answer.' Every irrelevant token dilutes the signal.

Where Redis and MongoDB Fit: SELECT

Semantic retrieval runs against MongoDB Atlas Vector Search. Your memories and documents are stored with vector embeddings — a $vectorSearch aggregation stage finds the top-K most similar entries in 5-15ms. For AI agents, this is how you retrieve 'the 3 memories most relevant to the user's current question' without keyword matching.
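The aggregation pipeline for that top-K memory lookup can be sketched as below; the index name `memory_index` and the `embedding`/`user_id` field names are assumptions to be matched to your own index definition:

```python
def build_memory_search_pipeline(query_vector, user_id, k=3,
                                 num_candidates=100):
    """Build an Atlas Vector Search pipeline for the top-K memories
    most similar to the current query, scoped to one user."""
    return [
        {
            "$vectorSearch": {
                "index": "memory_index",
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": k,
                "filter": {"user_id": user_id},
            }
        },
        {
            # Return only what the prompt needs, plus the similarity score.
            "$project": {
                "content": 1,
                "topic": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]

# results = list(db.memories.aggregate(build_memory_search_pipeline(vec, "u1")))
```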

Recency retrieval runs against Redis. The last 10 conversation turns are in a Redis List or Sorted Set (keyed by session_id, scored by timestamp). ZREVRANGEBYSCORE gives you the last N entries in sub-millisecond time. The current scratchpad is a Redis Hash — one HGETALL returns the full working state.
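Both fast-path reads can be sketched as thin wrappers over a redis-py-style client; the key patterns here are illustrative:

```python
def get_recent_turns(r, session_id, n=10):
    """Last `n` turns from a Sorted Set keyed by session, newest first.
    Members are turn payloads; scores are timestamps set at WRITE time."""
    return r.zrevrangebyscore(f"session:{session_id}:turns",
                              "+inf", "-inf", start=0, num=n)

def get_active_scratchpad(r, task_id):
    """One HGETALL returns the full working state of the current task."""
    return r.hgetall(f"agent:task:{task_id}")
```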

For hybrid retrieval (semantic + keyword), Redis supports vector search with tag-based filtering via RediSearch. If your hot context (recent conversations, active tasks) lives in Redis and your long-term knowledge lives in MongoDB, the SELECT stage queries both in parallel and merges the results. The latency-critical path (recent turns, scratchpads) hits Redis; the knowledge-retrieval path (memories, documents) hits MongoDB Atlas Vector Search.
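One way to run both paths in parallel is a small thread pool; `fetch_recent` and `fetch_memories` are hypothetical zero-arg callables wrapping the Redis and MongoDB queries above:

```python
from concurrent.futures import ThreadPoolExecutor

def hydrate_context(fetch_recent, fetch_memories):
    """SELECT stage: query the Redis fast path and the MongoDB knowledge
    path concurrently, then merge the results into one context bundle."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        recent_f = pool.submit(fetch_recent)
        memories_f = pool.submit(fetch_memories)
        return {
            "recent_turns": recent_f.result(),
            "memories": memories_f.result(),
        }
```

The point of the design is that neither store serializes behind the other: total SELECT latency is max(Redis, MongoDB), not their sum.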

Stage 3: COMPRESS — Keep Signal-to-Noise High

You've selected the relevant context. Now you have 15,000 tokens of conversation history, 3,000 tokens of retrieved memories, 2,000 tokens of tool outputs, and a 1,500-token system prompt. That's 21,500 tokens — approaching the practical limit for many models, and already expensive at $0.01-0.03 per 1K input tokens.

COMPRESS reduces the token count while preserving the information density. Three techniques, used in combination.

Summarization

Replace long conversation history with a running summary. Instead of sending 50 conversation turns (15,000 tokens), send a 500-token summary of the conversation so far plus the last 5 raw turns. The summary captures the arc; the recent turns capture the immediate context. Use a smaller, cheaper model (GPT-4o-mini, Claude Haiku) for summarization — it doesn't need to be perfect, just accurate enough to preserve key facts.

Pruning

Remove context that has become irrelevant. If the agent explored 3 approaches and committed to approach #2, the detailed reasoning for approaches #1 and #3 can be dropped. If a tool call returned a 5,000-token JSON response but the agent only used 2 fields, replace the full response with just those 2 fields. Aggressive pruning can cut context by 60-80% without information loss.
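The tool-output case reduces to a one-line filter; this sketch handles top-level keys only (nested dot paths are left out for brevity):

```python
def prune_tool_output(response, consumed_fields):
    """Replace a large tool response with only the fields the agent
    actually consumed, dropping everything else before prompt assembly."""
    return {k: response[k] for k in consumed_fields if k in response}
```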

Deduplication

Over multiple conversation turns, the same facts get repeated — by the user, by retrieved memories, by tool outputs. 'The database is MongoDB Atlas' might appear in the system prompt, in a retrieved memory, and in a tool output. Deduplicate before prompt assembly. Semantic dedup (using embedding similarity) catches paraphrased duplicates that keyword dedup misses.
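A minimal semantic-dedup sketch over (text, embedding) pairs; the 0.9 similarity threshold is a tunable assumption, and a production version would use precomputed embeddings from your model of choice:

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def dedupe_semantic(items, threshold=0.9):
    """Keep each context item only if its embedding is not a near-duplicate
    of one already kept. `items` is a list of (text, embedding) pairs."""
    kept = []
    for text, vec in items:
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((text, vec))
    return [text for text, _ in kept]
```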

| Technique | Token Reduction | Information Loss Risk | When to Use |
|---|---|---|---|
| Conversation summarization | 70-90% of history | Low — summary preserves key facts | Always, after 10+ turns |
| Tool output pruning | 60-80% per tool response | Low — keep only consumed fields | Every tool call with large response |
| Approach/reasoning pruning | 50-70% of scratchpad | Medium — keep winning approach, drop alternatives | After agent commits to a path |
| Semantic deduplication | 10-30% of assembled context | Very low — duplicates add no information | Always, before final prompt assembly |
| Truncation (last resort) | Variable | High — information is lost | Only when above techniques are insufficient |

Compression isn't about fitting into the context window. It's about signal-to-noise ratio. A 4,000-token prompt with perfect signal beats a 100,000-token prompt where 95% is noise.

Where Redis and MongoDB Fit: COMPRESS

Conversation summaries go to Redis — replace the full turn history in the Redis List with a summary document and keep only the last N raw turns. The summary is keyed by session_id:summary and updated every 10-15 turns. This is a write-back-to-Redis operation: the COMPRESS stage modifies the WRITE stage's data.
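That write-back can be done in one round trip with a pipeline; this sketch assumes turns are appended with RPUSH (newest at the tail) in a List keyed `session:{id}:turns`:

```python
def compress_history(r, session_id, summary, keep_last=5):
    """COMPRESS stage write-back: store the running summary and trim the
    turn list to only the last `keep_last` raw turns."""
    pipe = r.pipeline()
    pipe.set(f"session:{session_id}:summary", summary)
    # LTRIM with negative indices keeps the tail of the list,
    # i.e. the most recent turns.
    pipe.ltrim(f"session:{session_id}:turns", -keep_last, -1)
    pipe.execute()
```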

In MongoDB, memory compression happens as a background job. A scheduled process reviews memories older than N days, merges duplicates, and creates consolidated memory documents. If the agent has 50 memories about the user's database preferences from 50 different conversations, a weekly compression job merges them into 3-5 canonical memories with higher confidence scores.
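The core of such a job can be sketched as a pure merge step. This version groups by (user_id, topic) and uses a hypothetical confidence formula; a real job would also merge the content itself with embeddings or a small model:

```python
from collections import defaultdict

def consolidate_memories(memories):
    """Merge memories sharing (user_id, topic) into one canonical memory
    whose confidence grows with the number of corroborating entries."""
    groups = defaultdict(list)
    for m in memories:
        groups[(m["user_id"], m["topic"])].append(m)
    merged = []
    for (user_id, topic), group in groups.items():
        newest = max(group, key=lambda m: m["created_at"])
        merged.append({
            "user_id": user_id,
            "topic": topic,
            "content": newest["content"],  # keep the most recent phrasing
            # Hypothetical formula: base 0.5, +0.1 per corroborating memory.
            "confidence": min(1.0, 0.5 + 0.1 * len(group)),
            "source_count": len(group),
        })
    return merged
```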

Stage 4: ISOLATE — Scoped Agents, No Context Pollution

Modern agent architectures use sub-agents. A planning agent delegates to a coding agent, a research agent, and a review agent. Each sub-agent has a specialized role — and each one needs its own context scope. If the coding agent's debug logs leak into the research agent's context, the research agent starts reasoning about stack traces instead of the user's question.

ISOLATE means each sub-agent gets its own context window, its own scratchpad, its own selected memories, and its own compressed history. The parent agent controls what context flows into each child and what results flow back. This is context access control for AI.

  • Scoped scratchpads — Each sub-agent writes to its own Redis namespace. coding_agent:task:456 is invisible to research_agent:task:456. No key collisions, no cross-reads.
  • Scoped memory retrieval — The coding agent retrieves memories tagged with topic: 'code', 'architecture', 'debugging'. The research agent retrieves memories tagged with topic: 'user_requirements', 'domain_knowledge'. The same memory store, different retrieval filters.
  • Result summarization at handoff — When a sub-agent completes, its full scratchpad (which might be 50,000 tokens of reasoning) gets compressed into a 500-token result summary before being passed back to the parent. The parent sees the conclusion, not the entire reasoning chain.
  • No shared mutable state — Sub-agents don't write to each other's scratchpads. Communication happens through the parent agent, which controls the information flow.

Context isolation isn't about security. It's about accuracy. An agent reasoning about code shouldn't have research notes in its prompt. An agent writing a summary shouldn't see debug logs. Isolation keeps each agent focused.

Where Redis and MongoDB Fit: ISOLATE

Redis namespacing gives you natural isolation. Each sub-agent's scratchpad uses a prefixed key pattern: {agent_type}:{agent_id}:{task_id}. Redis key patterns ensure SCAN and retrieval only return keys within the agent's scope. TTLs on sub-agent scratchpads auto-clean after task completion — no orphaned context from dead sub-agents.
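Two trivial helpers make the key pattern and the scoped SCAN explicit; `scan_iter(match=...)` is the redis-py call a real client would use:

```python
def scratchpad_key(agent_type, agent_id, task_id):
    """Build the namespaced key {agent_type}:{agent_id}:{task_id}."""
    return f"{agent_type}:{agent_id}:{task_id}"

def scope_pattern(agent_type, agent_id):
    """MATCH pattern so SCAN only iterates one sub-agent's keys, e.g.
    r.scan_iter(match=scope_pattern("coding_agent", "a7"))"""
    return f"{agent_type}:{agent_id}:*"
```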

MongoDB provides isolation through query filters. All memories live in one collection, but each document has an agent_scope field. The coding agent's retrieval query includes { agent_scope: { $in: ['code', 'shared'] } } — it sees code-related memories and shared facts, but not research notes. The same Vector Search index serves all agents; the metadata filter provides the isolation.
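The per-agent filter can be built once and dropped into the `$vectorSearch` stage's `filter` field; this assumes `agent_scope` and `user_id` are indexed as filterable fields in the Vector Search index:

```python
def scoped_vector_filter(agent_scopes, user_id):
    """Metadata filter limiting retrieval to the sub-agent's own scopes
    plus shared facts. The same index serves every agent; only this
    filter differs per sub-agent."""
    return {
        "$and": [
            {"agent_scope": {"$in": list(agent_scopes) + ["shared"]}},
            {"user_id": user_id},
        ]
    }
```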

The Full Architecture: Redis + MongoDB in the Context Loop

Putting it all together, the infrastructure stack for a production context loop uses both Redis and MongoDB — each in the layer where it's strongest.

| Stage | Redis (Speed Layer) | MongoDB (Knowledge Layer) |
|---|---|---|
| WRITE | Scratchpads (Hashes + TTL), recent conversation turns (Lists/Sorted Sets), tool output cache | Long-term memories (documents + embeddings), observations archive, audit trail |
| SELECT | Recency retrieval (last N turns), active scratchpad (HGETALL), session state, cached embeddings | Semantic retrieval (Atlas Vector Search), long-term memory search, document retrieval |
| COMPRESS | Overwrite conversation list with summary + recent turns, expire old scratchpad entries | Background memory consolidation (merge duplicates, update confidence scores) |
| ISOLATE | Namespaced keys per sub-agent ({type}:{id}:{task}), TTL auto-cleanup | Metadata-filtered retrieval (agent_scope field), shared vs private memories |

The latency-critical path — every agent turn — hits Redis. Sub-millisecond reads for scratchpads, conversation history, and session state. The knowledge-retrieval path — semantic search over long-term memory — hits MongoDB Atlas Vector Search. Both execute in parallel during the SELECT stage. Total context assembly time: 10-20ms for a fully hydrated prompt.

The Context Engineering Checklist

  • 1. WRITE everything — scratchpads, memories, observations. Write aggressively, prune later. Use Redis for ephemeral, MongoDB for durable.
  • 2. SELECT with precision — semantic search for relevance, recency for immediacy, explicit rules for mandatory context. Never dump everything.
  • 3. COMPRESS after 10+ turns — summarize history, prune tool outputs, deduplicate semantically. Target 60-80% token reduction.
  • 4. ISOLATE every sub-agent — namespaced scratchpads, scoped memory retrieval, summarized handoffs between agents.
  • 5. Run SELECT against Redis and MongoDB in parallel — don't serialize the fast path behind the knowledge path.
  • 6. Set TTLs on all ephemeral context in Redis — scratchpads, session state, cached tool outputs. No orphaned keys.
  • 7. Consolidate long-term memories in MongoDB on a schedule — merge duplicates, update confidence, prune stale facts.
  • 8. Measure signal-to-noise ratio — track what percentage of the prompt actually influenced the model's output. High waste = COMPRESS needs tuning.
  • 9. Monitor token costs per agent turn — rising costs usually mean SELECT is retrieving too much or COMPRESS isn't aggressive enough.
  • 10. The loop is continuous — every agent turn WRITEs new context that flows into the next turn's SELECT. Design for the cycle, not a single pass.

The model is the engine. Context is the fuel. The Write-Select-Compress-Isolate loop is the fuel injection system — it controls what goes in, at what concentration, and at what moment. Engineer the loop, and a $0.01 model call outperforms a $0.10 one.