Monitoring MongoDB Atlas with Datadog: Integration, Metrics, and Production Dashboards
MongoDB Atlas includes excellent built-in monitoring — real-time performance panels, slow query analysis, and cluster health metrics. But it lives in the Atlas console. If your production stack already uses Datadog for application APM, Redis monitoring, Kubernetes metrics, and log management, having MongoDB in a separate tab means slower incident response. When latency spikes, you need to see application P99, Redis latency, and MongoDB latency on the same graph — not in three different browser tabs.
The goal isn't to replace Atlas monitoring. It's to put MongoDB metrics next to everything else — so when something breaks, you see the full picture in one place.
Setting Up the Integration
Datadog offers a native MongoDB Atlas integration that pulls metrics directly from the Atlas API. No agent installation on the Atlas cluster is needed — Atlas is a managed service, so there's no host to install an agent on. The integration uses Atlas's monitoring API to fetch cluster-level and process-level metrics.
Step 1: Create an Atlas API Key
In the Atlas console, navigate to Organization Access Manager > API Keys. Create a new API key with the Organization Read Only role (minimum) or Project Read Only if you want to scope it to a specific project. Save the Public Key and Private Key — you'll enter these in Datadog.
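Before wiring the key into Datadog, it's worth confirming it works against the Atlas Administration API directly. A minimal Python sketch using the requests library (the Atlas API uses HTTP digest auth, with the public key as username and the private key as password; the key values below are placeholders):

```python
import requests
from requests.auth import HTTPDigestAuth

# Atlas Administration API uses HTTP digest auth:
# public key = username, private key = password.
PUBLIC_KEY = "<your-public-key>"    # placeholder
PRIVATE_KEY = "<your-private-key>"  # placeholder

resp = requests.get(
    "https://cloud.mongodb.com/api/atlas/v1.0/groups",  # lists projects visible to the key
    auth=HTTPDigestAuth(PUBLIC_KEY, PRIVATE_KEY),
)
resp.raise_for_status()
for project in resp.json().get("results", []):
    print(project["id"], project["name"])
```

If this returns your projects, the key has at least the read scope the integration needs.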
Step 2: Configure the Datadog Integration
- In Datadog, go to Integrations > MongoDB Atlas.
- Click Add New and enter your Atlas API Public Key and Private Key.
- Optionally restrict to specific Atlas projects by entering the Project ID.
- Enable the integration. Datadog begins polling the Atlas API for metrics (default interval: 60 seconds).
- Metrics appear under the mongodb.atlas namespace within 2-5 minutes; a quick way to verify ingestion is sketched after this list.
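Once polling starts, you can confirm ingestion programmatically as well as in the Metrics Explorer. A sketch using the datadog Python library; the API and application keys are placeholders, and the metric query is illustrative:

```python
import time
from datadog import initialize, api

# Datadog API + application keys (placeholders)
initialize(api_key="<DD_API_KEY>", app_key="<DD_APP_KEY>")

# Query the last 15 minutes of connection counts pulled from Atlas
now = int(time.time())
result = api.Metric.query(
    start=now - 900,
    end=now,
    query="avg:mongodb.atlas.connections.current{*}",
)
for series in result.get("series", []):
    print(series["metric"], series["pointlist"][-1])
```

An empty `series` list after 5 minutes usually means the Atlas key lacks read access or the project filter is wrong.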
Step 3: Enable Atlas Database Metrics (Optional but Recommended)
For deeper monitoring, enable the Datadog MongoDB Database Monitoring (DBM) integration. This provides query-level visibility — slow queries, query plans, and per-operation latency — on top of the cluster-level metrics. DBM requires the Datadog Agent running in your VPC with network access to the Atlas cluster (via VPC peering).
| Integration | Agent Required | What You Get | Best For |
|---|---|---|---|
| Atlas API integration | No | Cluster metrics: connections, ops, memory, disk, replication | Quick setup, cluster-level visibility |
| Datadog DBM | Yes (agent in your VPC) | Query-level: slow queries, explain plans, per-operation stats | Deep query performance analysis |
| Both together | Yes | Full stack: cluster health + query-level diagnostics | Production AI workloads |
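One DBM prerequisite worth calling out: the Agent connects with a dedicated read-only database user, and on Atlas database users are managed through the Atlas control plane rather than db.createUser. A sketch of creating that user via the Atlas Admin API; the project ID, credentials, and exact role set are assumptions (clusterMonitor plus read on local is the commonly documented minimum for the Agent's MongoDB check; verify against the current Datadog docs), and note this call needs an API key with project write access, not the read-only key from Step 1:

```python
import requests
from requests.auth import HTTPDigestAuth

GROUP_ID = "<atlas-project-id>"  # placeholder: your Atlas project ID

# Atlas manages database users through its Admin API, not db.createUser,
# so the Agent's monitoring user is created with a POST to databaseUsers.
resp = requests.post(
    f"https://cloud.mongodb.com/api/atlas/v1.0/groups/{GROUP_ID}/databaseUsers",
    auth=HTTPDigestAuth("<public-key>", "<private-key>"),  # needs project write access
    json={
        "databaseName": "admin",          # authentication database
        "username": "datadog",
        "password": "<secure-password>",  # placeholder
        "roles": [
            {"roleName": "clusterMonitor", "databaseName": "admin"},
            {"roleName": "read", "databaseName": "local"},
        ],
    },
)
resp.raise_for_status()
print("Monitoring user created:", resp.json()["username"])
```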
Key Metrics for MongoDB Atlas
Atlas exposes dozens of metrics through the Datadog integration. For AI agent data layers, focus on these categories mapped to the Golden Signals.
Latency
| Metric | What It Tells You | Alert Threshold |
|---|---|---|
| mongodb.atlas.oplatencies.reads.avg | Average read operation latency (microseconds) | Alert if > 10ms for indexed queries |
| mongodb.atlas.oplatencies.writes.avg | Average write operation latency | Alert if > 20ms sustained |
| mongodb.atlas.oplatencies.commands.avg | Command latency (aggregations, $vectorSearch) | Monitor — varies by query complexity |
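One gotcha when turning these into monitors: the oplatencies metrics are reported in microseconds, while the thresholds above are written in milliseconds, so the monitor query needs the converted value. A sketch using the datadog library; the cluster_name:prod tag and notification handle are assumptions:

```python
from datadog import initialize, api

initialize(api_key="<DD_API_KEY>", app_key="<DD_APP_KEY>")

# 10 ms expressed in microseconds, since oplatencies is reported in µs
api.Monitor.create(
    type="metric alert",
    query="avg(last_10m):avg:mongodb.atlas.oplatencies.reads.avg{cluster_name:prod} > 10000",
    name="MongoDB Atlas read latency > 10ms",
    message="Read latency above 10ms for 10 minutes. @slack-db-oncall",
    options={"thresholds": {"critical": 10000}},
)
```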
Throughput
| Metric | What It Tells You | Alert Threshold |
|---|---|---|
| mongodb.atlas.opcounters.query | Read operations per second | Alert on sudden drop > 50% |
| mongodb.atlas.opcounters.insert | Insert operations per second | Monitor for write spikes |
| mongodb.atlas.opcounters.update | Update operations per second | Alert if unexpected spike (runaway upserts) |
| mongodb.atlas.opcounters.getmore | Cursor getMore operations per second; high values indicate large result sets | Monitor for sustained highs, a sign queries return too many documents |
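The "sudden drop" condition maps to a change-based monitor, which compares a recent window against an earlier one rather than testing a fixed threshold. A sketch; the window sizes, tag, and -50% cutoff are illustrative:

```python
from datadog import initialize, api

initialize(api_key="<DD_API_KEY>", app_key="<DD_APP_KEY>")

# Alert when the 5-minute average query rate drops more than 50%
# compared to the same metric 30 minutes earlier.
api.Monitor.create(
    type="query alert",  # change-based metric monitor
    query=(
        "pct_change(avg(last_5m),last_30m):"
        "avg:mongodb.atlas.opcounters.query{cluster_name:prod} < -50"
    ),
    name="MongoDB Atlas query throughput dropped >50%",
    message="Read traffic fell off sharply. Check app deploys and connectivity. @pagerduty",
)
```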
Connections
| Metric | What It Tells You | Alert Threshold |
|---|---|---|
| mongodb.atlas.connections.current | Current open connections | Alert at 80% of tier limit |
| mongodb.atlas.connections.available | Remaining available connections | Alert if < 20% remaining |
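Connection limits vary by tier, so instead of hard-coding a limit you can alert on the ratio of current to total (current plus available) connections. A sketch, assuming Datadog's support for arithmetic between metric queries in a monitor:

```python
from datadog import initialize, api

initialize(api_key="<DD_API_KEY>", app_key="<DD_APP_KEY>")

# current / (current + available) > 0.8, i.e. 80% of the tier's
# connection limit without hard-coding the limit itself.
api.Monitor.create(
    type="metric alert",
    query=(
        "avg(last_5m):"
        "avg:mongodb.atlas.connections.current{cluster_name:prod} / ("
        "avg:mongodb.atlas.connections.current{cluster_name:prod} + "
        "avg:mongodb.atlas.connections.available{cluster_name:prod}) > 0.8"
    ),
    name="MongoDB Atlas connections above 80% of tier limit",
    message="Connection pool nearing the tier limit. @slack-db-oncall",
)
```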
Memory and Storage
| Metric | What It Tells You | Alert Threshold |
|---|---|---|
| mongodb.atlas.mem.resident | Resident memory used by the mongod process | Alert if approaching tier limit |
| mongodb.atlas.extra_info.page_faults | Page faults — reads hitting disk instead of cache | Alert on sustained increase |
| mongodb.atlas.wiredtiger.cache.bytes_currently_in_cache | Data in WiredTiger cache vs configured max | Alert at 80% of cache size |
| mongodb.atlas.dbstats.storage_size | On-disk storage used | Alert at 80% of provisioned storage |
Replication
| Metric | What It Tells You | Alert Threshold |
|---|---|---|
| mongodb.atlas.replset.replication_lag | Replication lag from primary to secondary (seconds) | Alert if > 10 seconds |
| mongodb.atlas.replset.oplog_window | Hours of operations retained in the oplog. If a secondary's lag exceeds this window, it needs a full resync. | Alert if < 2 hours |
Building the Dashboard
Structure your Datadog MongoDB Atlas dashboard around the four Golden Signals, with a fifth section for AI-specific workload patterns.
- Row 1 — Latency: Timeseries of read, write, and command latency. Overlay with application P99 from APM for correlation.
- Row 2 — Traffic: Timeseries of opcounters by type (query, insert, update, delete). Stacked area chart shows read/write ratio over time.
- Row 3 — Errors: Timeseries of assertion rates. Connection failures. Query targeting ratio (docsExamined vs docsReturned — from DBM).
- Row 4 — Saturation: WiredTiger cache usage as percentage. Connection count vs tier limit. Disk IOPS vs provisioned.
- Row 5 — AI Workloads: $vectorSearch latency (from DBM query stats), document retrieval ops/sec, page faults during context retrieval, replication lag for read-from-secondary patterns.
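This layout can be clicked together in the UI or managed as code. A sketch of the first row using the datadog library's Dashboard API; widget definitions for the remaining rows follow the same pattern, and the tag filter is an assumption:

```python
from datadog import initialize, api

initialize(api_key="<DD_API_KEY>", app_key="<DD_APP_KEY>")

api.Dashboard.create(
    title="MongoDB Atlas - Golden Signals",
    layout_type="ordered",
    widgets=[
        {
            "definition": {
                "type": "timeseries",
                "title": "Row 1 - Latency: reads vs writes vs commands (µs)",
                "requests": [
                    {"q": "avg:mongodb.atlas.oplatencies.reads.avg{cluster_name:prod}"},
                    {"q": "avg:mongodb.atlas.oplatencies.writes.avg{cluster_name:prod}"},
                    {"q": "avg:mongodb.atlas.oplatencies.commands.avg{cluster_name:prod}"},
                ],
            }
        },
        # ...remaining rows (traffic, errors, saturation, AI workloads)
        # follow the same widget pattern.
    ],
)
```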
Alerting Recommendations
| Alert | Condition | Severity |
|---|---|---|
| Connection exhaustion | connections.current > 80% of tier max for 5 min | P1 — New connections will fail |
| Replication lag critical | replication_lag > 30 seconds for 5 min | P1 — Secondaries serving stale data |
| Read latency spike | oplatencies.reads.avg > 50ms for 10 min | P2 — Context retrieval degraded |
| Page faults sustained | page_faults rate > 100/min for 15 min | P2 — Working set exceeds cache |
| Oplog window shrinking | oplog_window < 2 hours | P2 — Risk of replica resync |
| Storage 80% | storage_size > 80% provisioned | P3 — Plan capacity increase |
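These rows translate mechanically into API calls, so a table-driven loop keeps the thresholds reviewable in one place. A sketch covering three of the alerts; queries, tags, and routing handles are illustrative, and the read-latency threshold is converted to microseconds as discussed earlier:

```python
from datadog import initialize, api

initialize(api_key="<DD_API_KEY>", app_key="<DD_APP_KEY>")

# (name, query, priority) rows mirroring the alert table above.
ALERTS = [
    ("Replication lag critical",
     "avg(last_5m):avg:mongodb.atlas.replset.replication_lag{cluster_name:prod} > 30",
     1),
    ("Read latency spike",  # 50ms = 50000 µs
     "avg(last_10m):avg:mongodb.atlas.oplatencies.reads.avg{cluster_name:prod} > 50000",
     2),
    ("Oplog window shrinking",  # metric is reported in hours
     "avg(last_5m):avg:mongodb.atlas.replset.oplog_window{cluster_name:prod} < 2",
     2),
]

for name, query, priority in ALERTS:
    api.Monitor.create(
        type="metric alert",
        query=query,
        name=f"MongoDB Atlas: {name}",
        message="@slack-db-oncall",  # placeholder routing
        priority=priority,           # Datadog monitor priority, 1 (highest) to 5
    )
```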
Correlating MongoDB with the Rest of Your Stack
The real power of Datadog for MongoDB monitoring isn't the MongoDB metrics alone — it's the correlation. When your AI agent's P99 latency spikes from 200ms to 2 seconds, you need to see in one view: Was it the application code? Redis cache miss spike? MongoDB read latency? Kubernetes pod restart? Network issue?
- Create a Service Map that links your application → Redis → MongoDB. Datadog APM traces show which database call contributed to end-to-end latency.
- Use Datadog Notebooks for incident investigation — pull MongoDB latency, Redis latency, application errors, and Kubernetes events into one timeline.
- Tag MongoDB metrics with environment, cluster_name, and database to filter by service or team.
- Set up Composite Monitors that fire only when MongoDB latency AND application error rate both spike, which reduces false positives from either signal alone (a sketch follows this list).
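A composite monitor is just a boolean expression over existing monitor IDs. A sketch assuming two hypothetical monitors, 123 for MongoDB read latency and 456 for application error rate:

```python
from datadog import initialize, api

initialize(api_key="<DD_API_KEY>", app_key="<DD_APP_KEY>")

# Fire only when BOTH underlying monitors are in alert.
# 123 = MongoDB read latency monitor, 456 = app error-rate monitor
# (hypothetical IDs; look yours up in the monitor URL or via the API).
api.Monitor.create(
    type="composite",
    query="123 && 456",
    name="MongoDB latency AND app errors elevated",
    message="Correlated database and application degradation. @pagerduty",
)
```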
A MongoDB latency spike means nothing in isolation. It means everything when you can see it alongside the Redis cache miss that caused it and the application timeout it produced.