Integrating Prometheus with MongoDB Atlas: Metrics Collection for AI Data Layers
If your AI agents run on MongoDB Atlas, you need visibility into what the database is doing — operation latency, connection pressure, replication lag, cache efficiency. Atlas has its own monitoring dashboard, but most production teams already run Prometheus and Grafana for everything else. Getting Atlas metrics into that same stack means one place to look when things go wrong.
Atlas ships solid built-in monitoring. But if your Prometheus already tracks Redis, Kubernetes, and your application, the last thing you want is a separate tab for your database.
The Challenge: Atlas Doesn't Speak Prometheus Natively
Unlike Redis Enterprise (which exposes a /metrics endpoint on port 8070), MongoDB Atlas is a fully managed service. You don't have access to the underlying mongod process or the host machine. That means you can't just point Prometheus at a port and start scraping. You need a bridge.
There are two proven approaches: the MongoDB Atlas Prometheus integration (available on M10+ clusters) and the community mongodb_exporter running as a sidecar or standalone process.
Option 1: Atlas Built-in Prometheus Integration
Atlas clusters on M10 tier and above support a native Prometheus integration. You enable it in the Atlas console under Integrations > Prometheus, and Atlas generates a scrape endpoint with authentication credentials. Add this endpoint to your prometheus.yml as a scrape target with basic_auth, and Prometheus will pull Atlas cluster metrics on your configured interval.
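As a rough sketch, the resulting scrape job looks something like this. The job name, target host, and credentials below are placeholders; Atlas generates the real endpoint and a username/password pair when you enable the integration (the console also offers a service-discovery variant, so check what your project generates):

```yaml
# prometheus.yml -- scrape job for the Atlas-generated endpoint.
# All values here are placeholders; copy the real ones from the
# Atlas console under Integrations > Prometheus.
scrape_configs:
  - job_name: "mongodb-atlas"
    scheme: https
    metrics_path: /metrics
    basic_auth:
      username: "prom_user_xxxx"            # generated by Atlas
      password: "<atlas-generated-password>"
    static_configs:
      - targets:
          - "cluster0-shard-00-00.example.mongodb.net:27018"  # placeholder host
```

Once the target shows as UP on Prometheus's Targets page, the Atlas metrics appear under the job's label set on your normal scrape interval.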
This is the cleanest approach — no exporter to deploy, no sidecar to manage. Atlas handles the metric exposition. The trade-off is that you're limited to the metrics Atlas chooses to expose, which covers the essentials but may not include every low-level WiredTiger stat.
Option 2: The mongodb_exporter
The percona/mongodb_exporter is an open-source exporter that connects to MongoDB via a connection string, runs diagnostic commands (serverStatus, replSetGetStatus, dbStats), and translates the results into the Prometheus exposition format on a /metrics HTTP endpoint, typically on port 9216.
Deploy the exporter as a container alongside your application (in EKS, as a sidecar or a separate deployment). Point it at your Atlas connection string (with a monitoring user that has the clusterMonitor role). Then add the exporter's endpoint to your Prometheus scrape config. This gives you deeper metric coverage than the native integration, including WiredTiger internals.
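A minimal sketch of the sidecar wiring in a Kubernetes pod spec, assuming the connection string lives in a Secret. The image tag, secret name, and key are assumptions for illustration; pin a version you've tested:

```yaml
# Sidecar container sketch for an EKS pod spec. The secret name
# (atlas-monitoring) and image tag are hypothetical -- adjust both.
containers:
  - name: mongodb-exporter
    image: percona/mongodb_exporter:0.40    # pin a tested version
    args:
      - --mongodb.uri=$(MONGODB_URI)        # Atlas SRV connection string
      - --collect-all                       # enable the extended collectors
    env:
      - name: MONGODB_URI
        valueFrom:
          secretKeyRef:
            name: atlas-monitoring          # hypothetical Secret
            key: uri
    ports:
      - containerPort: 9216                 # default /metrics port
```

The connection string should use a dedicated monitoring user with the clusterMonitor role, not your application credentials, so the exporter can read diagnostics without write access.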
The mongodb_exporter translates MongoDB's internal diagnostics into a language Prometheus understands. One connection string, one sidecar, full visibility.
Which Metrics Matter for AI Workloads
MongoDB exposes 100+ metrics through these integrations. For AI agent data layers, focus on these:
- mongodb_ss_opLatencies — Operation latency by type (reads, writes, commands). The most important metric. If read latency spikes above 10ms for indexed queries, your AI agent's context retrieval is suffering.
- mongodb_ss_opcounters — Operations per second by type. Track the read/write ratio. AI workloads are heavily read-biased; a sudden write spike often means a runaway upsert or re-indexing job.
- mongodb_ss_connections_current — Active connections. Atlas has connection limits per tier. Approaching the limit means new requests queue or fail. Monitor against the max for your cluster tier.
- mongodb_ss_wiredTiger_cache_bytes — WiredTiger cache usage. When your working set exceeds available cache, reads go to disk and latency jumps. Track used vs maximum cache size.
- mongodb_ss_wiredTiger_cache_pages_evicted — Page evictions. Non-zero eviction rates mean your working set doesn't fit in memory. This directly impacts read performance for context retrieval.
- mongodb_rs_members_replicationLag — Replication lag in seconds. For read-from-secondary patterns, lag means stale data. For AI agents reading context from secondaries, high lag means outdated context.
- mongodb_ss_asserts_total — Assertion counts (regular, warning, user). A rising assertion rate often signals an impending issue — invalid queries, resource pressure, or driver bugs.
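The metrics above map naturally onto alerting rules. A sketch of what those might look like follows; the metric names mirror the list above, but exact names and labels vary between the Atlas integration and mongodb_exporter versions, so verify each expression against your own /metrics output before relying on it. The connection ceiling (500) is a placeholder for your tier's limit:

```yaml
# Sketch of Prometheus alerting rules for the key MongoDB signals.
# Metric names follow the article; confirm them against /metrics.
groups:
  - name: mongodb-atlas
    rules:
      - alert: MongoReplicationLagHigh
        expr: mongodb_rs_members_replicationLag > 10
        for: 5m
        labels: {severity: warning}
        annotations:
          summary: "Secondary is lagging; reads from it may return stale context"
      - alert: MongoConnectionsNearLimit
        # 500 is a placeholder -- substitute your cluster tier's ceiling
        expr: mongodb_ss_connections_current > 0.8 * 500
        for: 5m
        labels: {severity: warning}
        annotations:
          summary: "Connections above 80% of the tier limit"
      - alert: MongoCacheEvictionsSustained
        expr: rate(mongodb_ss_wiredTiger_cache_pages_evicted[5m]) > 0
        for: 15m
        labels: {severity: info}
        annotations:
          summary: "Sustained evictions: working set no longer fits in cache"
```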
Building the Dashboard
In Grafana, create a MongoDB panel group with four rows mapping to the Golden Signals. Latency: opLatencies with PromQL rate() and percentile calculations. Traffic: opcounters rate by type. Errors: assertion rates and connection failures. Saturation: WiredTiger cache usage as a percentage, with alert thresholds at 80% and 90%.
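As a sketch, the four rows might be backed by PromQL along these lines. Note that opLatencies exposes cumulative latency totals and operation counts, so per-operation latency is a ratio of rates; the metric suffixes and label names below are assumptions based on common exporter naming, so check them against your scrape output:

```
# Latency: average read latency in ms (cumulative micros / cumulative ops)
rate(mongodb_ss_opLatencies_latency{op_type="reads"}[5m])
  / rate(mongodb_ss_opLatencies_ops{op_type="reads"}[5m]) / 1000

# Traffic: operations per second, broken out by type
sum by (op_type) (rate(mongodb_ss_opcounters[5m]))

# Errors: assertion rate by assertion type
rate(mongodb_ss_asserts_total[5m])

# Saturation: cache used as a percentage of maximum
# ("used" / "max" label values are placeholders for your exporter's labels)
100 * mongodb_ss_wiredTiger_cache_bytes{type="used"}
    / mongodb_ss_wiredTiger_cache_bytes{type="max"}
```

Wiring the saturation query to Grafana alert thresholds at 80% and 90% gives you the early warning before evictions start showing up in read latency.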
For AI-specific panels, add a vector search latency panel if you're using Atlas Vector Search — track the $vectorSearch aggregation stage duration separately from regular find operations. And add a connections panel with the tier limit overlaid as a constant line so you can see how close you are to the ceiling.
Atlas Monitoring vs Prometheus: Do You Need Both?
Atlas's built-in monitoring is excellent for cluster-level health checks and historical analysis. But it lives in the Atlas console, a separate tab from your Redis dashboards, your Kubernetes metrics, and your application traces. For production AI systems, a single Grafana dashboard showing Redis latency, MongoDB latency, and application P99 side by side is worth the setup cost. That's what the Prometheus integration gives you: a single pane of glass.
The goal isn't to replace Atlas monitoring. It's to put MongoDB metrics next to Redis metrics next to application metrics — so when latency spikes, you know which layer caused it in one glance.