Why Teams Move from MongoDB to Redis for AI Workloads
MongoDB is the default document database for good reason — flexible schema, rich queries, and a mature ecosystem. But when teams start running AI agents in production, they hit a pattern: MongoDB handles the storage layer well, but the real-time layer — context retrieval, session memory, vector similarity, and caching — needs something faster.
MongoDB stores what your AI agent knows. Redis serves what your AI agent needs — right now, in under a millisecond.
The Latency Gap
MongoDB reads from disk (even with WiredTiger's cache, hot data isn't guaranteed to stay in memory). Redis reads from memory — always. For AI agents making 5-10 context lookups per inference call, the difference between 2-5ms (MongoDB) and 0.1-0.5ms (Redis) per read compounds fast. At 1,000 requests per second, that's the difference between a responsive agent and a sluggish one.
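The compounding effect is simple arithmetic. A minimal sketch, using the midpoints of the per-read ranges above (the function name and figures are illustrative, not benchmarks):

```python
# Back-of-the-envelope math for the latency gap described above.
# Figures are illustrative midpoints of the ranges in the text.
LOOKUPS_PER_CALL = 10  # context lookups per inference call


def context_latency_ms(per_read_ms, lookups=LOOKUPS_PER_CALL):
    """Total milliseconds spent on context reads for one inference call."""
    return per_read_ms * lookups


mongo_ms = context_latency_ms(3.0)   # ~3 ms per MongoDB read
redis_ms = context_latency_ms(0.3)   # ~0.3 ms per Redis read
print(f"MongoDB: {mongo_ms:.1f} ms of context reads per call")
print(f"Redis:   {redis_ms:.1f} ms of context reads per call")
```

Thirty milliseconds of lookups per call is already most of a 50 ms latency budget before the model has generated a single token; three milliseconds leaves room to spare.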
What Teams Actually Move to Redis
Most teams don't replace MongoDB entirely. They move specific AI workloads to Redis while keeping MongoDB as the system of record. The pattern looks like this:
- Session memory — Conversation context with TTL expiry. Redis expires keys individually and promptly; MongoDB's TTL index only sweeps expired documents about once a minute, so stale context can linger.
- Semantic cache — Cache LLM responses keyed by embedding similarity. A cache hit avoids a $0.01-0.10 API call. Redis vector search makes this sub-millisecond.
- Context retrieval — Vector similarity search for RAG pipelines. Redis HNSW indexes return top-K results faster than MongoDB Atlas Vector Search for hot, frequently-queried data.
- Feature store — Real-time user features, preferences, and agent state. Redis hashes and JSON are purpose-built for this.
- Rate limiting and token tracking — Per-user token budgets and API rate limits. Redis atomic counters and sorted sets handle this natively.
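To make the semantic-cache idea concrete, here is a toy in-process stand-in for what a Redis-backed semantic cache computes: return a stored LLM response when a new prompt's embedding is close enough to a cached one. The `SemanticCache` class, its threshold, and the tiny 2-dimensional embeddings are all hypothetical; a production version would store vectors in Redis and let its vector search do the similarity scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    """Toy semantic cache: a hit is any stored entry whose embedding
    is within `threshold` cosine similarity of the query embedding."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached_response)

    def get(self, embedding):
        best, best_sim = None, 0.0
        for emb, response in self.entries:
            sim = cosine(embedding, emb)
            if sim > best_sim:
                best, best_sim = response, sim
        return best if best_sim >= self.threshold else None

    def put(self, embedding, response):
        self.entries.append((embedding, response))
```

A near-duplicate prompt ("what's the refund policy" vs "tell me the refund policy") lands close in embedding space, returns the cached response, and skips the paid API call entirely.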
Vector Search: Atlas vs Redis
MongoDB Atlas Vector Search is solid for moderate-scale similarity queries co-located with your documents. But it runs as a dedicated $vectorSearch stage in the aggregation pipeline, and the vectors are indexed by a separate Lucene-based search process, not the core database. For high-throughput, latency-sensitive AI workloads, Redis vector search (HNSW or FLAT) runs in-memory with no pipeline overhead.
Atlas Vector Search is great when your vectors change slowly and latency is flexible. Redis is what you reach for when every millisecond counts and your agents query thousands of times per second.
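For intuition on the two Redis index types: FLAT does an exact brute-force scan, while HNSW approximates the same result by walking a layered proximity graph. A minimal sketch of what a FLAT top-K query computes (pure Python, with made-up 2-dimensional vectors; real indexes operate on float32 arrays inside Redis):

```python
import heapq
import math


def top_k(query, vectors, k=3):
    """Exact (FLAT-style) top-K by cosine similarity: score every
    stored vector against the query, keep the K highest scores.
    HNSW returns (approximately) the same top-K without the full scan."""

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    scored = ((cosine(query, vec), doc_id) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored)  # [(score, doc_id), ...] best first
```

The full scan is why FLAT latency grows linearly with corpus size, and why HNSW is the usual choice once the index holds more than a few hundred thousand vectors.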
The Hybrid Architecture
The winning pattern we see in production is MongoDB + Redis, not MongoDB or Redis. MongoDB remains the durable store — user profiles, documents, audit logs, transaction history. Redis becomes the real-time engine — session state, vector cache, context retrieval, feature serving. CDC (change data capture) with Debezium keeps them in sync.
- MongoDB Atlas — Source of truth. Durable storage, complex queries, aggregation pipelines, Atlas Search for full-text.
- Redis Cloud — Real-time layer. Sub-millisecond reads, vector search, session memory, semantic cache, rate limiting.
- Debezium / Kafka — Sync layer. Consume MongoDB change streams and apply the updates to Redis in near real-time.
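The sync layer boils down to replaying change events against the cache. A minimal sketch of the apply step, with a plain dict standing in for Redis and a simplified event shape (real change streams deliver update deltas unless full-document lookup is enabled, and a production pipeline would consume Debezium records from Kafka):

```python
def apply_change(event, cache):
    """Apply one MongoDB change-stream-style event to a dict that
    stands in for Redis. The key scheme ("user:<id>") and event
    fields are illustrative assumptions, not a fixed contract."""
    op = event["operationType"]
    key = f"user:{event['documentKey']['_id']}"
    if op in ("insert", "replace", "update"):
        cache[key] = event["fullDocument"]  # upsert the latest document
    elif op == "delete":
        cache.pop(key, None)  # evict; missing key is fine
    return cache
```

Because each event carries the full latest document, the apply step is idempotent: replaying the same event twice leaves the cache in the same state, which makes recovery after a consumer restart straightforward.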
When to Stay on MongoDB Alone
Not every team needs Redis. If your AI agent handles fewer than 100 requests per minute, your vector index has under 100K documents, and your latency budget is 10-50ms, MongoDB Atlas alone is fine. The complexity of a second data system isn't worth it at low scale.
The inflection point is usually around 500+ requests per second, 1M+ vectors, or when you start seeing P99 latency spikes on context retrieval. That's when Redis pays for itself — in speed, in cost (fewer LLM cache misses), and in user experience.
The best AI infrastructure isn't one database — it's the right database for each layer. MongoDB for durability. Redis for speed. Both for production.