Golden Signals for Your AI Data Layer: What to Monitor in Redis and MongoDB
Google's SRE book defined the four Golden Signals: Latency, Traffic, Errors, and Saturation. They were designed for services, but they apply just as powerfully to your data layer. When your AI agent's context retrieval slows down, it's rarely the LLM — it's almost always Redis or MongoDB underneath. Knowing which signals to watch is the difference between catching a problem in minutes and debugging it for hours.
If you can only look at four metrics for your data layer, make them the Golden Signals. Everything else is noise until these are healthy.
The Four Golden Signals — Applied to Data
1. Latency — How Fast Are Your Reads and Writes?
For Redis, this means command latency — the time from when Redis receives a command to when it sends the response. Track it at P50, P90, and P99. A healthy Redis instance should serve simple commands in under 1ms at P99. If P99 drifts above 5ms, something is wrong — large keys, blocking commands, or memory pressure. The key metric is redis_commands_duration_seconds broken down by command type.
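A quick way to sanity-check these percentiles outside your metrics pipeline is to time commands client-side and compute them yourself. A minimal sketch using nearest-rank percentiles — the sample values are illustrative; in practice you would fill samples_ms by timing commands such as GET with a Redis client:

```python
import math

def percentile(samples_ms, pct):
    """Nearest-rank percentile over a list of latency samples (milliseconds)."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Illustrative samples; populate by timing real commands client-side.
samples_ms = [0.21, 0.34, 0.27, 0.30, 0.25, 0.33, 0.29, 0.26, 0.31, 6.2]

p50 = percentile(samples_ms, 50)
p99 = percentile(samples_ms, 99)
print(f"P50={p50:.2f}ms P99={p99:.2f}ms")  # a single slow outlier dominates P99
```

Note how one 6.2ms outlier leaves P50 untouched but blows out P99 — which is exactly why median-only dashboards hide the problems your agent's users feel.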
For MongoDB, track operation latency by type — reads (find, aggregate) and writes (insert, update). MongoDB exposes this via the serverStatus command and the opLatencies metric. Atlas surfaces it directly in the UI. Healthy read latency for indexed queries should be under 5-10ms. If aggregate operations spike above 50ms, check your indexes and working set size.
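opLatencies reports cumulative latency in microseconds alongside an operation count, so the useful number is the mean per-operation latency (ideally as a delta between two samples). A hedged sketch with serverStatus-shaped data — the counter values here are illustrative:

```python
def mean_latency_ms(op_latency):
    """opLatencies buckets report cumulative latency in microseconds plus an op count."""
    if op_latency["ops"] == 0:
        return 0.0
    return op_latency["latency"] / op_latency["ops"] / 1000.0

# Illustrative shape of serverStatus()["opLatencies"]; live values come from
# db.command("serverStatus") with PyMongo.
op_latencies = {
    "reads":    {"latency": 84_000, "ops": 21},    # 4.0 ms mean
    "writes":   {"latency": 460_000, "ops": 40},   # 11.5 ms mean
    "commands": {"latency": 0, "ops": 0},
}

read_ms = mean_latency_ms(op_latencies["reads"])
```

Because the counters are cumulative since startup, subtract two snapshots taken a minute apart to get a current-window mean rather than a lifetime average.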
2. Traffic — How Much Load Is Hitting Your Data Layer?
For Redis, traffic means commands per second (ops/sec). This is the most fundamental throughput metric. Track it globally and per command type — if GET operations suddenly double, something upstream changed. Redis Cloud exposes this as redis_commands_processed_total. A healthy baseline is whatever your normal traffic pattern looks like; the alarm is when it deviates by more than 2-3x without a known deployment.
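The "deviates by more than 2-3x" rule is easy to encode as a check. A sketch, assuming the current value comes from the instantaneous_ops_per_sec field of INFO stats — the 2.5x factor is a tunable assumption, not a standard:

```python
def traffic_anomaly(current_ops, baseline_ops, factor=2.5):
    """Flag throughput that deviates from baseline by more than `factor` either way."""
    if baseline_ops <= 0:
        return False
    return current_ops > baseline_ops * factor or current_ops < baseline_ops / factor

# current_ops would come from r.info("stats")["instantaneous_ops_per_sec"]
# with redis-py; these values are illustrative.
print(traffic_anomaly(30_000, 10_000))  # sudden 3x spike -> True
print(traffic_anomaly(12_000, 10_000))  # normal drift -> False
```

Checking both directions matters: a sudden drop in ops/sec is just as suspicious as a spike, since it often means clients upstream are failing to reach Redis at all.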
For MongoDB, traffic is operations per second — reads, writes, commands. Atlas shows this as opcounters. For AI workloads specifically, watch the ratio of reads to writes. Context retrieval is read-heavy; if writes suddenly spike, you may be re-indexing or have a runaway upsert loop.
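The read/write ratio falls straight out of the opcounters document. A sketch — the field names match serverStatus opcounters, but the sample values are made up:

```python
def read_write_ratio(opcounters):
    """Reads = queries + getmores; writes = inserts + updates + deletes."""
    reads = opcounters["query"] + opcounters["getmore"]
    writes = opcounters["insert"] + opcounters["update"] + opcounters["delete"]
    return reads / max(writes, 1)

# Illustrative counters; live values come from db.command("serverStatus")["opcounters"].
opcounters = {"query": 9_000, "getmore": 1_000, "insert": 300,
              "update": 150, "delete": 50, "command": 20_000}

ratio = read_write_ratio(opcounters)  # 10_000 reads / 500 writes = 20.0
```

For a context-retrieval workload you would expect this ratio to stay high and stable; a sudden collapse toward 1.0 is the runaway-upsert signature described above.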
3. Errors — What's Failing?
Redis errors are rare but critical when they happen. Watch for rejected connections (maxclients reached), OOM (out-of-memory) responses, and replication errors. The metric redis_rejected_connections_total should always be zero in production. If it's not, you're dropping client requests.
For MongoDB, track failed operations, assertion counts, and replication lag. The asserts metric in serverStatus covers regular, warning, message, and user-level assertions. For Atlas, watch for retryable write errors and timeout errors — these indicate the cluster is under pressure.
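Both error families reduce to counters that should stay flat, which makes a combined check straightforward. A minimal alerting sketch, assuming you have already fetched INFO stats from Redis and serverStatus from MongoDB — the message strings and the zero thresholds are my own choices:

```python
def firing_error_alerts(redis_stats, mongo_status):
    """Return human-readable alerts for error counters that should stay at zero."""
    alerts = []
    if redis_stats.get("rejected_connections", 0) > 0:
        alerts.append("redis: rejected connections -- raise maxclients or pool connections")
    asserts = mongo_status["asserts"]
    if asserts["user"] > 0 or asserts["regular"] > 0:
        alerts.append("mongodb: assertion counters rising -- inspect failed operations")
    return alerts

# Illustrative snapshots; live values come from r.info("stats") and
# db.command("serverStatus") respectively.
alerts = firing_error_alerts(
    {"rejected_connections": 3},
    {"asserts": {"regular": 0, "warning": 0, "msg": 0, "user": 0}},
)
```

In a real pipeline you would compare deltas between scrapes rather than absolute values, since assertion counters only reset on restart.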
4. Saturation — How Full Is It?
This is where most production incidents hide. For Redis, saturation means memory usage relative to maxmemory. When Redis hits 100% memory and the eviction policy kicks in, your AI agent starts getting stale or missing context. Track redis_memory_used_bytes vs redis_memory_max_bytes. Alert at 80%. For Redis Cloud with clustering, also track each shard's memory independently — one hot shard can cause evictions while the cluster average looks healthy.
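The 80% alert is a one-liner once you have INFO memory, with one wrinkle worth encoding: a maxmemory of 0 means no limit is configured, so the ratio is undefined rather than zero. A sketch:

```python
def memory_saturation(info_memory):
    """Fraction of maxmemory in use, or None when no limit is configured."""
    maxmem = info_memory.get("maxmemory", 0)
    if maxmem == 0:
        return None  # maxmemory=0 means unlimited; the ratio is meaningless
    return info_memory["used_memory"] / maxmem

def should_alert(info_memory, threshold=0.80):
    saturation = memory_saturation(info_memory)
    return saturation is not None and saturation >= threshold

# Illustrative snapshot; live values come from r.info("memory") with redis-py.
print(should_alert({"used_memory": 850_000_000, "maxmemory": 1_000_000_000}))  # True at 85%
```

For a clustered deployment, run this check per shard and alert on the worst shard, not the average — which is exactly the hot-shard failure mode described above.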
For MongoDB, saturation is the working set size relative to available RAM. When your frequently accessed data no longer fits in the WiredTiger cache, page faults spike and reads go to disk. Track cache dirty bytes and pages evicted. On Atlas, watch the disk IOPS metric — when it approaches your provisioned limit, everything slows down.
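The WiredTiger cache statistics live under serverStatus().wiredTiger.cache, with verbose string keys. A sketch computing the fill and dirty ratios — the key names below match what serverStatus reports, but verify them against your server version, and treat any thresholds you pick as assumptions:

```python
def wt_cache_pressure(wt_cache):
    """Fill and dirty ratios of the WiredTiger cache, from serverStatus key names."""
    maximum = wt_cache["maximum bytes configured"]
    fill = wt_cache["bytes currently in the cache"] / maximum
    dirty = wt_cache["tracked dirty bytes in the cache"] / maximum
    return fill, dirty

# Illustrative snapshot; live values come from
# db.command("serverStatus")["wiredTiger"]["cache"] with PyMongo.
cache = {
    "maximum bytes configured": 1_000_000_000,
    "bytes currently in the cache": 920_000_000,
    "tracked dirty bytes in the cache": 80_000_000,
}
fill, dirty = wt_cache_pressure(cache)  # 0.92 full, 0.08 dirty
```

A persistently high fill ratio combined with a rising dirty ratio means eviction is falling behind — the point at which reads start going to disk and the silent latency creep begins.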
Memory saturation is a silent killer. Your data layer won't throw an error — it'll just get slower, and your AI agent's P99 latency will quietly double.
Beyond the Four: AI-Specific Signals
For AI workloads, there are two additional signals worth tracking that the classic Golden Signals don't cover.
- Cache hit ratio (Redis) — For semantic cache and session memory, the hit ratio directly impacts your LLM API costs. A 90% hit ratio means 90% of context lookups avoid an embedding or LLM call. Track redis_keyspace_hits_total vs redis_keyspace_misses_total. If the ratio drops, your TTLs may be too aggressive or your cache keys are too specific.
- Vector search latency (Redis / MongoDB) — Separate from general read latency, track the time for FT.SEARCH (Redis) or $vectorSearch (MongoDB Atlas) specifically. These operations are computationally heavier than key lookups. Healthy Redis HNSW search on 1M vectors should be under 5-10ms. If it drifts, check your HNSW parameters (EF_RUNTIME, M) or consider index partitioning.
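The hit ratio itself is derived from two counters in INFO stats, and the same arithmetic lets you estimate how many expensive calls the cache is absorbing. A sketch — the avoided-call estimate is a deliberately crude cost model, not a billing calculation:

```python
def hit_ratio(hits, misses):
    """keyspace_hits / (keyspace_hits + keyspace_misses), or 0.0 with no traffic."""
    total = hits + misses
    return hits / total if total else 0.0

# Illustrative counters; live values come from r.info("stats") with redis-py
# (keyspace_hits and keyspace_misses).
ratio = hit_ratio(hits=90_000, misses=10_000)  # 0.9
avoided_calls = int((90_000 + 10_000) * ratio)  # lookups that skipped an LLM/embedding call
```

Tracking the ratio as a delta over a rolling window (rather than since startup) is what makes TTL tuning visible: shorten TTLs and you should see the windowed ratio fall within minutes, not days.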
The Monitoring Stack That Actually Works
Once you know what to measure, you need to decide how to collect it. For most production AI stacks, there are two proven approaches: Prometheus-native scraping and Datadog agent collection.
Both get you to the same place — dashboards and alerts on your Golden Signals. The choice depends on what you're already running. If you have Prometheus and Grafana, extend it. If you're on Datadog, use the agent. Don't run both for the same metrics.
The best observability stack is the one your team already knows. Add data layer metrics to it — don't build a new system.