Tags: Observability, Datadog, MongoDB Atlas, Monitoring

Monitoring MongoDB Atlas with Datadog: Integration, Metrics, and Production Dashboards

Polystreak Team · 2026-03-21 · 9 min read

MongoDB Atlas includes excellent built-in monitoring — real-time performance panels, slow query analysis, and cluster health metrics. But it lives in the Atlas console. If your production stack already uses Datadog for application APM, Redis monitoring, Kubernetes metrics, and log management, having MongoDB in a separate tab means slower incident response. When latency spikes, you need to see application P99, Redis latency, and MongoDB latency on the same graph — not in three different browser tabs.

The goal isn't to replace Atlas monitoring. It's to put MongoDB metrics next to everything else — so when something breaks, you see the full picture in one place.

Setting Up the Integration

Datadog offers a native MongoDB Atlas integration that pulls metrics directly from the Atlas API. No agent installation on the Atlas cluster is needed — Atlas is a managed service, so there's no host to install an agent on. The integration uses Atlas's monitoring API to fetch cluster-level and process-level metrics.

Step 1: Create an Atlas API Key

In the Atlas console, navigate to Organization Access Manager > API Keys. Create a new API key with the Organization Read Only role (minimum) or Project Read Only if you want to scope it to a specific project. Save the Public Key and Private Key — you'll enter these in Datadog.
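Before pasting the key pair into Datadog, it can be worth confirming it works against the Atlas Admin API directly. A minimal sketch using only the standard library — the key values are placeholders, and the `Accept` version header reflects the versioned Atlas Admin API (adjust to the API version you target):

```python
import json
import urllib.request

ATLAS_BASE = "https://cloud.mongodb.com/api/atlas/v2"

def atlas_opener(public_key: str, private_key: str) -> urllib.request.OpenerDirector:
    """Build a urllib opener that answers Atlas's HTTP digest-auth challenge."""
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, ATLAS_BASE, public_key, private_key)
    return urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(mgr))

def list_projects(opener: urllib.request.OpenerDirector) -> dict:
    """Cheap read-only call that confirms the key works (requires network access)."""
    req = urllib.request.Request(ATLAS_BASE + "/groups")
    req.add_header("Accept", "application/vnd.atlas.2023-01-01+json")
    with opener.open(req) as resp:
        return json.load(resp)

# Usage (with a real key pair):
# projects = list_projects(atlas_opener("my-public-key", "my-private-key"))
# print([g["name"] for g in projects["results"]])
```

Atlas uses HTTP digest authentication (Public Key as username, Private Key as password), which is why a plain basic-auth header won't work here.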

Step 2: Configure the Datadog Integration

  • In Datadog, go to Integrations > MongoDB Atlas.
  • Click Add New and enter your Atlas API Public Key and Private Key.
  • Optionally restrict to specific Atlas projects by entering the Project ID.
  • Enable the integration. Datadog begins polling the Atlas API for metrics (default interval: 60 seconds).
  • Metrics appear under the mongodb.atlas namespace within 2-5 minutes.
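Once the integration is enabled, a quick way to confirm metrics are arriving is to query the Datadog metrics API for one of the `mongodb.atlas` series. A sketch using the v1 query endpoint — the API/application keys are placeholders, and the site URL assumes the US1 Datadog site:

```python
import json
import time
import urllib.parse
import urllib.request

def build_query_request(api_key: str, app_key: str, query: str,
                        window_s: int = 300) -> urllib.request.Request:
    """Build a GET /api/v1/query request covering the last `window_s` seconds."""
    now = int(time.time())
    params = urllib.parse.urlencode({"from": now - window_s, "to": now, "query": query})
    req = urllib.request.Request("https://api.datadoghq.com/api/v1/query?" + params)
    req.add_header("DD-API-KEY", api_key)
    req.add_header("DD-APPLICATION-KEY", app_key)
    return req

# Usage (requires valid keys and network access):
# req = build_query_request("<api-key>", "<app-key>",
#                           "avg:mongodb.atlas.connections.current{*}")
# with urllib.request.urlopen(req) as resp:
#     print(len(json.load(resp).get("series", [])), "series returned")
```

An empty `series` list after the 2-5 minute warm-up usually points at an API key scope problem or a project ID filter that excludes your cluster.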

Step 3: Enable Atlas Database Metrics (Optional but Recommended)

For deeper monitoring, enable the Datadog MongoDB Database Monitoring (DBM) integration. This provides query-level visibility — slow queries, query plans, and per-operation latency — on top of the cluster-level metrics. DBM requires the Datadog Agent running in your VPC with network access to the Atlas cluster (via VPC peering).

| Integration | Agent Required | What You Get | Best For |
| --- | --- | --- | --- |
| Atlas API integration | No | Cluster metrics: connections, ops, memory, disk, replication | Quick setup, cluster-level visibility |
| Datadog DBM | Yes (agent in your VPC) | Query-level: slow queries, explain plans, per-operation stats | Deep query performance analysis |
| Both together | Yes | Full stack: cluster health + query-level diagnostics | Production AI workloads |
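If you go the DBM route, the Agent's MongoDB check is configured in `mongo.d/conf.yaml` on the host running in your VPC. A minimal sketch — the hostname, credentials, and option names here are assumptions to adapt to your cluster and Agent version, and DBM also requires a dedicated read-only database user for the Agent:

```yaml
init_config:

instances:
  - hosts:
      - my-cluster-shard-00-00.abcde.mongodb.net:27017   # reachable via VPC peering
    username: datadog          # dedicated monitoring user, not an app user
    password: <MONITORING_USER_PASSWORD>
    options:
      authSource: admin
    tls: true
    dbm: true                  # enables query-level Database Monitoring
```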

Key Metrics for MongoDB Atlas

Atlas exposes dozens of metrics through the Datadog integration. For AI agent data layers, focus on these categories mapped to the Golden Signals.

Latency

| Metric | What It Tells You | Alert Threshold |
| --- | --- | --- |
| mongodb.atlas.oplatencies.reads.avg | Average read operation latency (microseconds) | Alert if > 10ms for indexed queries |
| mongodb.atlas.oplatencies.writes.avg | Average write operation latency | Alert if > 20ms sustained |
| mongodb.atlas.oplatencies.commands.avg | Command latency (aggregations, $vectorSearch) | Monitor — varies by query complexity |
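One easy mistake: these latency metrics are reported in microseconds, while you think in milliseconds, so a "10ms" monitor threshold must be entered as 10000. A small sketch that does the conversion when rendering a Datadog monitor query (the window and scope defaults are illustrative):

```python
def latency_monitor_query(metric: str, threshold_ms: float,
                          window: str = "last_10m", scope: str = "*") -> str:
    """Render a Datadog metric-monitor query with the threshold in microseconds."""
    threshold_us = threshold_ms * 1000  # metric unit is microseconds
    return f"avg({window}):avg:{metric}{{{scope}}} > {threshold_us:g}"

print(latency_monitor_query("mongodb.atlas.oplatencies.reads.avg", 10))
# → avg(last_10m):avg:mongodb.atlas.oplatencies.reads.avg{*} > 10000
```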

Throughput

| Metric | What It Tells You | Alert Threshold |
| --- | --- | --- |
| mongodb.atlas.opcounters.query | Read operations per second | Alert on sudden drop > 50% |
| mongodb.atlas.opcounters.insert | Insert operations per second | Monitor for write spikes |
| mongodb.atlas.opcounters.update | Update operations per second | Alert if unexpected spike (runaway upserts) |
| mongodb.atlas.opcounters.getmore | Cursor getMore operations | Monitor — high getmore means queries returning too many docs |

Connections

| Metric | What It Tells You | Alert Threshold |
| --- | --- | --- |
| mongodb.atlas.connections.current | Current open connections | Alert at 80% of tier limit |
| mongodb.atlas.connections.available | Remaining available connections | Alert if < 20% remaining |
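The "80% of tier limit" threshold depends on the cluster tier, since each tier caps connections differently. A sketch of that check — the limits in the dict are assumed values, so confirm them against the Atlas documentation for your tier:

```python
# Assumed per-tier connection limits — verify against the Atlas docs.
TIER_CONNECTION_LIMITS = {"M10": 1500, "M20": 3000, "M30": 3000}

def connection_alert(current: int, tier: str, threshold_pct: float = 80.0):
    """Return (percent used, whether the 80%-of-limit alert should fire)."""
    limit = TIER_CONNECTION_LIMITS[tier]
    used_pct = 100.0 * current / limit
    return used_pct, used_pct >= threshold_pct

pct, firing = connection_alert(current=1280, tier="M10")
print(f"{pct:.1f}% used, alert={firing}")  # → 85.3% used, alert=True
```

Because the limit is a constant per tier, it is often easier to alert on connections.current against a hardcoded threshold than to compute the percentage inside Datadog.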

Memory and Storage

| Metric | What It Tells You | Alert Threshold |
| --- | --- | --- |
| mongodb.atlas.mem.resident | Resident memory used by the mongod process (includes the WiredTiger cache) | Alert if approaching tier limit |
| mongodb.atlas.extra_info.page_faults | Page faults — reads hitting disk instead of cache | Alert on sustained increase |
| mongodb.atlas.wiredtiger.cache.bytes_currently_in_cache | Data in WiredTiger cache vs configured max | Alert at 80% of cache size |
| mongodb.atlas.dbstats.storage_size | On-disk storage used | Alert at 80% of provisioned storage |

Replication

| Metric | What It Tells You | Alert Threshold |
| --- | --- | --- |
| mongodb.atlas.replset.replication_lag | Replication lag from primary to secondary (seconds) | Alert if > 10 seconds |
| mongodb.atlas.replset.oplog_window | Hours of oplog retained. If lag exceeds this, resync is needed. | Alert if < 2 hours |
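The two replication thresholds interact: lag is measured in seconds, the oplog window in hours, and the dangerous case is lag approaching the window. A small sketch of the combined check, using the thresholds from the table:

```python
def replication_risk(lag_s: float, oplog_window_h: float,
                     max_lag_s: float = 10.0, min_window_h: float = 2.0) -> dict:
    """Evaluate the two replication alerts from the table, plus the resync case."""
    return {
        "lag_alert": lag_s > max_lag_s,            # secondaries serving stale data
        "window_alert": oplog_window_h < min_window_h,
        # If lag ever reaches the oplog window, the secondary falls off the
        # oplog entirely and needs a full initial sync.
        "resync_risk": lag_s >= oplog_window_h * 3600,
    }

print(replication_risk(lag_s=15, oplog_window_h=1.5))
```

Note the unit mismatch (seconds vs hours) — keep the conversion in one place so dashboards and monitors agree.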

Building the Dashboard

Structure your Datadog MongoDB Atlas dashboard around the four Golden Signals, with a fifth section for AI-specific workload patterns.

  • Row 1 — Latency: Timeseries of read, write, and command latency. Overlay with application P99 from APM for correlation.
  • Row 2 — Traffic: Timeseries of opcounters by type (query, insert, update, delete). Stacked area chart shows read/write ratio over time.
  • Row 3 — Errors: Timeseries of assertion rates. Connection failures. Query targeting ratio (docsExamined vs docsReturned — from DBM).
  • Row 4 — Saturation: WiredTiger cache usage as percentage. Connection count vs tier limit. Disk IOPS vs provisioned.
  • Row 5 — AI Workloads: $vectorSearch latency (from DBM query stats), document retrieval ops/sec, page faults during context retrieval, replication lag for read-from-secondary patterns.
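The five rows above can be created once by hand, but for repeatable environments it is handy to generate the dashboard as a Datadog API payload (POST /api/v1/dashboard). A sketch — the one query per row is a placeholder, not a definitive widget set, and the asserts metric name is an assumption:

```python
import json

# One representative query per dashboard row; real rows hold several widgets.
ROWS = [
    ("Latency", "avg:mongodb.atlas.oplatencies.reads.avg{*}"),
    ("Traffic", "sum:mongodb.atlas.opcounters.query{*}.as_rate()"),
    ("Errors", "sum:mongodb.atlas.asserts.regular{*}.as_rate()"),   # assumed metric name
    ("Saturation", "avg:mongodb.atlas.connections.current{*}"),
    ("AI Workloads", "avg:mongodb.atlas.extra_info.page_faults{*}"),
]

def build_dashboard(title: str = "MongoDB Atlas — Golden Signals") -> dict:
    """Render a minimal ordered dashboard payload, one timeseries per row."""
    return {
        "title": title,
        "layout_type": "ordered",
        "widgets": [
            {"definition": {"type": "timeseries", "title": row_title,
                            "requests": [{"q": query, "display_type": "line"}]}}
            for row_title, query in ROWS
        ],
    }

print(json.dumps(build_dashboard(), indent=2)[:120], "...")
```

Generated dashboards also make it trivial to stamp out one copy per environment tag.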

Alerting Recommendations

| Alert | Condition | Severity |
| --- | --- | --- |
| Connection exhaustion | connections.current > 80% of tier max for 5 min | P1 — New connections will fail |
| Replication lag critical | replication_lag > 30 seconds for 5 min | P1 — Secondaries serving stale data |
| Read latency spike | oplatencies.reads.avg > 50ms for 10 min | P2 — Context retrieval degraded |
| Page faults sustained | page_faults rate > 100/min for 15 min | P2 — Working set exceeds cache |
| Oplog window shrinking | oplog_window < 2 hours | P2 — Risk of replica resync |
| Storage 80% | storage_size > 80% provisioned | P3 — Plan capacity increase |
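As a concrete example, the P1 connection-exhaustion row can be expressed as a Datadog monitor payload (POST /api/v1/monitor). A sketch — the M10 limit of 1500 and the PagerDuty handle are assumptions:

```python
def connection_exhaustion_monitor(tier_limit: int = 1500, pct: float = 0.8) -> dict:
    """Metric monitor for the 'connections > 80% of tier max for 5 min' alert."""
    critical = tier_limit * pct
    return {
        "name": "MongoDB Atlas — connection exhaustion",
        "type": "metric alert",
        "query": f"avg(last_5m):avg:mongodb.atlas.connections.current{{*}} > {critical:g}",
        "message": "P1: new connections will fail soon. @pagerduty-db-oncall",
        "options": {"thresholds": {"critical": critical}},
    }

print(connection_exhaustion_monitor()["query"])
# → avg(last_5m):avg:mongodb.atlas.connections.current{*} > 1200
```

The other rows follow the same shape; only the query, window, and message change.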

Correlating MongoDB with the Rest of Your Stack

The real power of Datadog for MongoDB monitoring isn't the MongoDB metrics alone — it's the correlation. When your AI agent's P99 latency spikes from 200ms to 2 seconds, you need to see in one view: Was it the application code? Redis cache miss spike? MongoDB read latency? Kubernetes pod restart? Network issue?

  • Create a Service Map that links your application → Redis → MongoDB. Datadog APM traces show which database call contributed to end-to-end latency.
  • Use Datadog Notebooks for incident investigation — pull MongoDB latency, Redis latency, application errors, and Kubernetes events into one timeline.
  • Tag MongoDB metrics with environment, cluster_name, and database to filter by service or team.
  • Set up Composite Monitors — alert when MongoDB latency AND application error rate both spike (reduces false positives from either alone).
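Composite monitors reference the IDs of existing monitors and combine their states with boolean operators. A sketch of the "MongoDB latency AND application error rate" case — the monitor IDs are placeholders for monitors you have already created:

```python
def composite_monitor(mongo_latency_id: int, app_errors_id: int) -> dict:
    """Composite that fires only when both child monitors are alerting."""
    return {
        "name": "MongoDB latency + app errors (correlated)",
        "type": "composite",
        # '&&' means both children must be in alert state, which suppresses
        # false positives from either signal spiking alone.
        "query": f"{mongo_latency_id} && {app_errors_id}",
        "message": "Correlated degradation: check MongoDB and the app together.",
    }

print(composite_monitor(101, 202)["query"])  # → 101 && 202
```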

A MongoDB latency spike means nothing in isolation. It means everything when you can see it alongside the Redis cache miss that caused it and the application timeout it produced.